* Re: [dpdk-dev] [PATCH 2/2] uio: new driver to support PCI MSI-X
2015-10-01 8:33 ` Michael S. Tsirkin
@ 2015-10-01 10:37 ` Michael S. Tsirkin
2015-10-01 16:06 ` Michael S. Tsirkin
2015-10-01 14:50 ` Stephen Hemminger
` (2 subsequent siblings)
3 siblings, 1 reply; 44+ messages in thread
From: Michael S. Tsirkin @ 2015-10-01 10:37 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: dev, hjk, gregkh, linux-kernel
On Thu, Oct 01, 2015 at 11:33:06AM +0300, Michael S. Tsirkin wrote:
> On Wed, Sep 30, 2015 at 03:28:58PM -0700, Stephen Hemminger wrote:
> > This driver allows using PCI device with Message Signalled Interrupt
> > from userspace. The API is similar to the igb_uio driver used by the DPDK.
> > Via ioctl it provides a mechanism to map MSI-X interrupts into event
> > file descriptors similar to VFIO.
> >
> > VFIO is a better choice if IOMMU is available, but often userspace drivers
> > have to work in environments where IOMMU support (real or emulated) is
> > not available. All UIO drivers that support DMA are not secure against
> > rogue userspace applications programming DMA hardware to access
> > private memory; this driver is no less secure than existing code.
> >
> > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
>
> I don't think copying the igb_uio interface is a good idea.
> What DPDK is doing with igb_uio (and indeed uio_pci_generic)
> is abusing the sysfs BAR access to provide unlimited
> access to hardware.
>
> MSI messages are memory writes so any generic device capable
> of MSI is capable of corrupting kernel memory.
> This means that a bug in userspace will lead to kernel memory corruption
> and crashes. This is something distributions can't support.
>
> uio_pci_generic is already abused like that, mostly
> because when I wrote it, I didn't add enough protections
> against using it with DMA capable devices,
> and we can't go back and break working userspace.
> But at least it does not bind to VFs which all of
> them are capable of DMA.
>
> The result of merging this driver will be userspace abusing the
> sysfs BAR access with VFs as well, and we do not want that.
>
>
> Just forwarding events is not enough to make a valid driver.
> What is missing is a way to access the device in a safe way.
>
> On a more positive note:
>
> What would be a reasonable interface? One that does the following
> in kernel:
>
> 1. initializes device rings (can be in pinned userspace memory,
> but can not be writeable by userspace), brings up interface link
> 2. pins userspace memory (unless using e.g. hugetlbfs)
> 3. gets request, make sure it's valid and belongs to
> the correct task, put it in the ring
> 4. in the reverse direction, notify userspace when buffers
> are available in the ring
> 5. notify userspace about MSI (what this driver does)
>
> What userspace can be allowed to do:
>
> format requests (e.g. transmit, receive) in userspace
> read ring contents
>
> What userspace can't be allowed to do:
>
> access BAR
> write rings
Thinking about it some more, many devices
Have separate rings for DMA: TX (device reads memory)
and RX (device writes memory).
With such devices, a mode where userspace can write TX ring
but not RX ring might make sense.
This will mean userspace might read kernel memory
through the device, but can not corrupt it.
That's already a big win!
And RX buffers do not have to be added one at a time.
If we assume 0.2usec per system call, batching some 100 buffers per
system call gives you 2 nano seconds overhead. That seems quite
reasonable.
>
> This means that the driver can not be a generic one,
> and there will be a system call overhead when you
> write the ring, but that's the price you have to
> pay for ability to run on systems without an IOMMU.
>
>
>
>
> > ---
> > drivers/uio/Kconfig | 9 ++
> > drivers/uio/Makefile | 1 +
> > drivers/uio/uio_msi.c | 378 +++++++++++++++++++++++++++++++++++++++++++
> > include/uapi/linux/Kbuild | 1 +
> > include/uapi/linux/uio_msi.h | 22 +++
> > 5 files changed, 411 insertions(+)
> > create mode 100644 drivers/uio/uio_msi.c
> > create mode 100644 include/uapi/linux/uio_msi.h
> >
> > diff --git a/drivers/uio/Kconfig b/drivers/uio/Kconfig
> > index 52c98ce..04adfa0 100644
> > --- a/drivers/uio/Kconfig
> > +++ b/drivers/uio/Kconfig
> > @@ -93,6 +93,15 @@ config UIO_PCI_GENERIC
> > primarily, for virtualization scenarios.
> > If you compile this as a module, it will be called uio_pci_generic.
> >
> > +config UIO_PCI_MSI
> > + tristate "Generic driver supporting MSI-x on PCI Express cards"
> > + depends on PCI
> > + help
> > + Generic driver that provides Message Signalled IRQ events
> > + similar to VFIO. If IOMMMU is available please use VFIO
> > + instead since it provides more security.
> > + If you compile this as a module, it will be called uio_msi.
> > +
> > config UIO_NETX
> > tristate "Hilscher NetX Card driver"
> > depends on PCI
> > diff --git a/drivers/uio/Makefile b/drivers/uio/Makefile
> > index 8560dad..62fc44b 100644
> > --- a/drivers/uio/Makefile
> > +++ b/drivers/uio/Makefile
> > @@ -9,3 +9,4 @@ obj-$(CONFIG_UIO_NETX) += uio_netx.o
> > obj-$(CONFIG_UIO_PRUSS) += uio_pruss.o
> > obj-$(CONFIG_UIO_MF624) += uio_mf624.o
> > obj-$(CONFIG_UIO_FSL_ELBC_GPCM) += uio_fsl_elbc_gpcm.o
> > +obj-$(CONFIG_UIO_PCI_MSI) += uio_msi.o
> > diff --git a/drivers/uio/uio_msi.c b/drivers/uio/uio_msi.c
> > new file mode 100644
> > index 0000000..802b5c4
> > --- /dev/null
> > +++ b/drivers/uio/uio_msi.c
> > @@ -0,0 +1,378 @@
> > +/*-
> > + *
> > + * Copyright (c) 2015 by Brocade Communications Systems, Inc.
> > + * Author: Stephen Hemminger <stephen@networkplumber.org>
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 only.
> > + */
> > +
> > +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> > +
> > +#include <linux/device.h>
> > +#include <linux/interrupt.h>
> > +#include <linux/eventfd.h>
> > +#include <linux/module.h>
> > +#include <linux/pci.h>
> > +#include <linux/uio_driver.h>
> > +#include <linux/msi.h>
> > +#include <linux/uio_msi.h>
> > +
> > +#define DRIVER_VERSION "0.1.1"
> > +#define MAX_MSIX_VECTORS 64
> > +
> > +/* MSI-X vector information */
> > +struct uio_msi_pci_dev {
> > + struct uio_info info; /* UIO driver info */
> > + struct pci_dev *pdev; /* PCI device */
> > + struct mutex mutex; /* open/release/ioctl mutex */
> > + int ref_cnt; /* references to device */
> > + unsigned int max_vectors; /* MSI-X slots available */
> > + struct msix_entry *msix; /* MSI-X vector table */
> > + struct uio_msi_irq_ctx {
> > + struct eventfd_ctx *trigger; /* vector to eventfd */
> > + char *name; /* name in /proc/interrupts */
> > + } *ctx;
> > +};
> > +
> > +static irqreturn_t uio_intx_irqhandler(int irq, void *arg)
> > +{
> > + struct uio_msi_pci_dev *udev = arg;
> > +
> > + if (pci_check_and_mask_intx(udev->pdev)) {
> > + eventfd_signal(udev->ctx->trigger, 1);
> > + return IRQ_HANDLED;
> > + }
> > +
> > + return IRQ_NONE;
> > +}
> > +
> > +static irqreturn_t uio_msi_irqhandler(int irq, void *arg)
> > +{
> > + struct eventfd_ctx *trigger = arg;
> > +
> > + eventfd_signal(trigger, 1);
> > + return IRQ_HANDLED;
> > +}
> > +
> > +/* set the mapping between vector # and existing eventfd. */
> > +static int set_irq_eventfd(struct uio_msi_pci_dev *udev, u32 vec, int fd)
> > +{
> > + struct eventfd_ctx *trigger;
> > + int irq, err;
> > +
> > + if (vec >= udev->max_vectors) {
> > + dev_notice(&udev->pdev->dev, "vec %u >= num_vec %u\n",
> > + vec, udev->max_vectors);
> > + return -ERANGE;
> > + }
> > +
> > + irq = udev->msix[vec].vector;
> > + trigger = udev->ctx[vec].trigger;
> > + if (trigger) {
> > + /* Clearup existing irq mapping */
> > + free_irq(irq, trigger);
> > + eventfd_ctx_put(trigger);
> > + udev->ctx[vec].trigger = NULL;
> > + }
> > +
> > + /* Passing -1 is used to disable interrupt */
> > + if (fd < 0)
> > + return 0;
> > +
> > + trigger = eventfd_ctx_fdget(fd);
> > + if (IS_ERR(trigger)) {
> > + err = PTR_ERR(trigger);
> > + dev_notice(&udev->pdev->dev,
> > + "eventfd ctx get failed: %d\n", err);
> > + return err;
> > + }
> > +
> > + if (udev->msix)
> > + err = request_irq(irq, uio_msi_irqhandler, 0,
> > + udev->ctx[vec].name, trigger);
> > + else
> > + err = request_irq(irq, uio_intx_irqhandler, IRQF_SHARED,
> > + udev->ctx[vec].name, udev);
> > +
> > + if (err) {
> > + dev_notice(&udev->pdev->dev,
> > + "request irq failed: %d\n", err);
> > + eventfd_ctx_put(trigger);
> > + return err;
> > + }
> > +
> > + udev->ctx[vec].trigger = trigger;
> > + return 0;
> > +}
> > +
> > +static int
> > +uio_msi_ioctl(struct uio_info *info, unsigned int cmd, unsigned long arg)
> > +{
> > + struct uio_msi_pci_dev *udev
> > + = container_of(info, struct uio_msi_pci_dev, info);
> > + struct uio_msi_irq_set hdr;
> > + int err;
> > +
> > + switch (cmd) {
> > + case UIO_MSI_IRQ_SET:
> > + if (copy_from_user(&hdr, (void __user *)arg, sizeof(hdr)))
> > + return -EFAULT;
> > +
> > + mutex_lock(&udev->mutex);
> > + err = set_irq_eventfd(udev, hdr.vec, hdr.fd);
> > + mutex_unlock(&udev->mutex);
> > + break;
> > + default:
> > + err = -EOPNOTSUPP;
> > + }
> > + return err;
> > +}
> > +
> > +/* Opening the UIO device for first time enables MSI-X */
> > +static int
> > +uio_msi_open(struct uio_info *info, struct inode *inode)
> > +{
> > + struct uio_msi_pci_dev *udev
> > + = container_of(info, struct uio_msi_pci_dev, info);
> > + int err = 0;
> > +
> > + mutex_lock(&udev->mutex);
> > + if (udev->ref_cnt++ == 0) {
> > + if (udev->msix)
> > + err = pci_enable_msix(udev->pdev, udev->msix,
> > + udev->max_vectors);
> > + }
> > + mutex_unlock(&udev->mutex);
> > +
> > + return err;
> > +}
> > +
> > +/* Last close of the UIO device releases/disables all IRQ's */
> > +static int
> > +uio_msi_release(struct uio_info *info, struct inode *inode)
> > +{
> > + struct uio_msi_pci_dev *udev
> > + = container_of(info, struct uio_msi_pci_dev, info);
> > + int i;
> > +
> > + mutex_lock(&udev->mutex);
> > + if (--udev->ref_cnt == 0) {
> > + for (i = 0; i < udev->max_vectors; i++) {
> > + int irq = udev->msix[i].vector;
> > + struct eventfd_ctx *trigger = udev->ctx[i].trigger;
> > +
> > + if (!trigger)
> > + continue;
> > +
> > + free_irq(irq, trigger);
> > + eventfd_ctx_put(trigger);
> > + udev->ctx[i].trigger = NULL;
> > + }
> > +
> > + if (udev->msix)
> > + pci_disable_msix(udev->pdev);
> > + }
> > + mutex_unlock(&udev->mutex);
> > +
> > + return 0;
> > +}
> > +
> > +/* Unmap previously ioremap'd resources */
> > +static void
> > +release_iomaps(struct uio_mem *mem)
> > +{
> > + int i;
> > +
> > + for (i = 0; i < MAX_UIO_MAPS; i++, mem++) {
> > + if (mem->internal_addr)
> > + iounmap(mem->internal_addr);
> > + }
> > +}
> > +
> > +static int
> > +setup_maps(struct pci_dev *pdev, struct uio_info *info)
> > +{
> > + int i, m = 0, p = 0, err;
> > + static const char * const bar_names[] = {
> > + "BAR0", "BAR1", "BAR2", "BAR3", "BAR4", "BAR5",
> > + };
> > +
> > + for (i = 0; i < ARRAY_SIZE(bar_names); i++) {
> > + unsigned long start = pci_resource_start(pdev, i);
> > + unsigned long flags = pci_resource_flags(pdev, i);
> > + unsigned long len = pci_resource_len(pdev, i);
> > +
> > + if (start == 0 || len == 0)
> > + continue;
> > +
> > + if (flags & IORESOURCE_MEM) {
> > + void __iomem *addr;
> > +
> > + if (m >= MAX_UIO_MAPS)
> > + continue;
> > +
> > + addr = ioremap(start, len);
> > + if (addr == NULL) {
> > + err = -EINVAL;
> > + goto fail;
> > + }
> > +
> > + info->mem[m].name = bar_names[i];
> > + info->mem[m].addr = start;
> > + info->mem[m].internal_addr = addr;
> > + info->mem[m].size = len;
> > + info->mem[m].memtype = UIO_MEM_PHYS;
> > + ++m;
> > + } else if (flags & IORESOURCE_IO) {
> > + if (p >= MAX_UIO_PORT_REGIONS)
> > + continue;
> > +
> > + info->port[p].name = bar_names[i];
> > + info->port[p].start = start;
> > + info->port[p].size = len;
> > + info->port[p].porttype = UIO_PORT_X86;
> > + ++p;
> > + }
> > + }
> > +
> > + return 0;
> > + fail:
> > + for (i = 0; i < m; i++)
> > + iounmap(info->mem[i].internal_addr);
> > + return err;
> > +}
> > +
> > +static int uio_msi_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> > +{
> > + struct uio_msi_pci_dev *udev;
> > + int i, err, vectors;
> > +
> > + udev = kzalloc(sizeof(struct uio_msi_pci_dev), GFP_KERNEL);
> > + if (!udev)
> > + return -ENOMEM;
> > +
> > + err = pci_enable_device(pdev);
> > + if (err != 0) {
> > + dev_err(&pdev->dev, "cannot enable PCI device\n");
> > + goto fail_free;
> > + }
> > +
> > + err = pci_request_regions(pdev, "uio_msi");
> > + if (err != 0) {
> > + dev_err(&pdev->dev, "Cannot request regions\n");
> > + goto fail_disable;
> > + }
> > +
> > + pci_set_master(pdev);
> > +
> > + /* remap resources */
> > + err = setup_maps(pdev, &udev->info);
> > + if (err)
> > + goto fail_release_iomem;
> > +
> > + /* fill uio infos */
> > + udev->info.name = "uio_msi";
> > + udev->info.version = DRIVER_VERSION;
> > + udev->info.priv = udev;
> > + udev->pdev = pdev;
> > + udev->info.ioctl = uio_msi_ioctl;
> > + udev->info.open = uio_msi_open;
> > + udev->info.release = uio_msi_release;
> > + udev->info.irq = UIO_IRQ_CUSTOM;
> > + mutex_init(&udev->mutex);
> > +
> > + vectors = pci_msix_vec_count(pdev);
> > + if (vectors > 0) {
> > + udev->max_vectors = min_t(u16, vectors, MAX_MSIX_VECTORS);
> > + dev_info(&pdev->dev, "using up to %u MSI-X vectors\n",
> > + udev->max_vectors);
> > +
> > + err = -ENOMEM;
> > + udev->msix = kcalloc(udev->max_vectors,
> > + sizeof(struct msix_entry), GFP_KERNEL);
> > + if (!udev->msix)
> > + goto fail_release_iomem;
> > + } else if (!pci_intx_mask_supported(pdev)) {
> > + dev_err(&pdev->dev,
> > + "device does not support MSI-X or INTX\n");
> > + err = -EINVAL;
> > + goto fail_release_iomem;
> > + } else {
> > + dev_notice(&pdev->dev, "using INTX\n");
> > + udev->info.irq_flags = IRQF_SHARED;
> > + udev->max_vectors = 1;
> > + }
> > +
> > + udev->ctx = kcalloc(udev->max_vectors,
> > + sizeof(struct uio_msi_irq_ctx), GFP_KERNEL);
> > + if (!udev->ctx)
> > + goto fail_free_msix;
> > +
> > + for (i = 0; i < udev->max_vectors; i++) {
> > + udev->msix[i].entry = i;
> > +
> > + udev->ctx[i].name = kasprintf(GFP_KERNEL,
> > + KBUILD_MODNAME "[%d](%s)",
> > + i, pci_name(pdev));
> > + if (!udev->ctx[i].name)
> > + goto fail_free_ctx;
> > + }
> > +
> > + /* register uio driver */
> > + err = uio_register_device(&pdev->dev, &udev->info);
> > + if (err != 0)
> > + goto fail_free_ctx;
> > +
> > + pci_set_drvdata(pdev, udev);
> > + return 0;
> > +
> > +fail_free_ctx:
> > + for (i = 0; i < udev->max_vectors; i++)
> > + kfree(udev->ctx[i].name);
> > + kfree(udev->ctx);
> > +fail_free_msix:
> > + kfree(udev->msix);
> > +fail_release_iomem:
> > + release_iomaps(udev->info.mem);
> > + pci_release_regions(pdev);
> > +fail_disable:
> > + pci_disable_device(pdev);
> > +fail_free:
> > + kfree(udev);
> > +
> > + pr_notice("%s ret %d\n", __func__, err);
> > + return err;
> > +}
> > +
> > +static void uio_msi_remove(struct pci_dev *pdev)
> > +{
> > + struct uio_info *info = pci_get_drvdata(pdev);
> > + struct uio_msi_pci_dev *udev
> > + = container_of(info, struct uio_msi_pci_dev, info);
> > + int i;
> > +
> > + uio_unregister_device(info);
> > + release_iomaps(info->mem);
> > +
> > + pci_release_regions(pdev);
> > + for (i = 0; i < udev->max_vectors; i++)
> > + kfree(udev->ctx[i].name);
> > + kfree(udev->ctx);
> > + kfree(udev->msix);
> > + pci_disable_device(pdev);
> > +
> > + pci_set_drvdata(pdev, NULL);
> > + kfree(udev);
> > +}
> > +
> > +static struct pci_driver uio_msi_pci_driver = {
> > + .name = "uio_msi",
> > + .probe = uio_msi_probe,
> > + .remove = uio_msi_remove,
> > +};
> > +
> > +module_pci_driver(uio_msi_pci_driver);
> > +MODULE_VERSION(DRIVER_VERSION);
> > +MODULE_LICENSE("GPL v2");
> > +MODULE_AUTHOR("Stephen Hemminger <stephen@networkplumber.org>");
> > +MODULE_DESCRIPTION("UIO driver for MSI PCI devices");
> > diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
> > index f7b2db4..d9497691 100644
> > --- a/include/uapi/linux/Kbuild
> > +++ b/include/uapi/linux/Kbuild
> > @@ -411,6 +411,7 @@ header-y += udp.h
> > header-y += uhid.h
> > header-y += uinput.h
> > header-y += uio.h
> > +header-y += uio_msi.h
> > header-y += ultrasound.h
> > header-y += un.h
> > header-y += unistd.h
> > diff --git a/include/uapi/linux/uio_msi.h b/include/uapi/linux/uio_msi.h
> > new file mode 100644
> > index 0000000..297de00
> > --- /dev/null
> > +++ b/include/uapi/linux/uio_msi.h
> > @@ -0,0 +1,22 @@
> > +/*
> > + * UIO_MSI API definition
> > + *
> > + * Copyright (c) 2015 by Brocade Communications Systems, Inc.
> > + * All rights reserved.
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License version 2 as
> > + * published by the Free Software Foundation.
> > + */
> > +#ifndef _UIO_PCI_MSI_H
> > +#define _UIO_PCI_MSI_H
> > +
> > +struct uio_msi_irq_set {
> > + u32 vec;
> > + int fd;
> > +};
> > +
> > +#define UIO_MSI_BASE 0x86
> > +#define UIO_MSI_IRQ_SET _IOW('I', UIO_MSI_BASE+1, struct uio_msi_irq_set)
> > +
> > +#endif
> > --
> > 2.1.4
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [dpdk-dev] [PATCH 2/2] uio: new driver to support PCI MSI-X
2015-10-01 10:37 ` Michael S. Tsirkin
@ 2015-10-01 16:06 ` Michael S. Tsirkin
0 siblings, 0 replies; 44+ messages in thread
From: Michael S. Tsirkin @ 2015-10-01 16:06 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: dev, hjk, gregkh, linux-kernel
On Thu, Oct 01, 2015 at 01:37:12PM +0300, Michael S. Tsirkin wrote:
> On Thu, Oct 01, 2015 at 11:33:06AM +0300, Michael S. Tsirkin wrote:
> > On Wed, Sep 30, 2015 at 03:28:58PM -0700, Stephen Hemminger wrote:
> > > This driver allows using PCI device with Message Signalled Interrupt
> > > from userspace. The API is similar to the igb_uio driver used by the DPDK.
> > > Via ioctl it provides a mechanism to map MSI-X interrupts into event
> > > file descriptors similar to VFIO.
> > >
> > > VFIO is a better choice if IOMMU is available, but often userspace drivers
> > > have to work in environments where IOMMU support (real or emulated) is
> > > not available. All UIO drivers that support DMA are not secure against
> > > rogue userspace applications programming DMA hardware to access
> > > private memory; this driver is no less secure than existing code.
> > >
> > > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> >
> > I don't think copying the igb_uio interface is a good idea.
> > What DPDK is doing with igb_uio (and indeed uio_pci_generic)
> > is abusing the sysfs BAR access to provide unlimited
> > access to hardware.
> >
> > MSI messages are memory writes so any generic device capable
> > of MSI is capable of corrupting kernel memory.
> > This means that a bug in userspace will lead to kernel memory corruption
> > and crashes. This is something distributions can't support.
> >
> > uio_pci_generic is already abused like that, mostly
> > because when I wrote it, I didn't add enough protections
> > against using it with DMA capable devices,
> > and we can't go back and break working userspace.
> > But at least it does not bind to VFs which all of
> > them are capable of DMA.
> >
> > The result of merging this driver will be userspace abusing the
> > sysfs BAR access with VFs as well, and we do not want that.
> >
> >
> > Just forwarding events is not enough to make a valid driver.
> > What is missing is a way to access the device in a safe way.
> >
> > On a more positive note:
> >
> > What would be a reasonable interface? One that does the following
> > in kernel:
> >
> > 1. initializes device rings (can be in pinned userspace memory,
> > but can not be writeable by userspace), brings up interface link
> > 2. pins userspace memory (unless using e.g. hugetlbfs)
> > 3. gets request, make sure it's valid and belongs to
> > the correct task, put it in the ring
> > 4. in the reverse direction, notify userspace when buffers
> > are available in the ring
> > 5. notify userspace about MSI (what this driver does)
> >
> > What userspace can be allowed to do:
> >
> > format requests (e.g. transmit, receive) in userspace
> > read ring contents
> >
> > What userspace can't be allowed to do:
> >
> > access BAR
> > write rings
>
> Thinking about it some more, many devices
>
>
> Have separate rings for DMA: TX (device reads memory)
> and RX (device writes memory).
> With such devices, a mode where userspace can write TX ring
> but not RX ring might make sense.
>
> This will mean userspace might read kernel memory
> through the device, but can not corrupt it.
>
> That's already a big win!
>
> And RX buffers do not have to be added one at a time.
> If we assume 0.2usec per system call, batching some 100 buffers per
> system call gives you 2 nano seconds overhead. That seems quite
> reasonable.
>
To add to that, there's no reason to do this on the same core.
Re-arming descriptors can happen in parallel with packet
processing, so this overhead won't affect PPS or latency at all:
only the CPU utilization.
>
>
>
> >
> > This means that the driver can not be a generic one,
> > and there will be a system call overhead when you
> > write the ring, but that's the price you have to
> > pay for ability to run on systems without an IOMMU.
> >
> >
> >
> >
> > > ---
> > > drivers/uio/Kconfig | 9 ++
> > > drivers/uio/Makefile | 1 +
> > > drivers/uio/uio_msi.c | 378 +++++++++++++++++++++++++++++++++++++++++++
> > > include/uapi/linux/Kbuild | 1 +
> > > include/uapi/linux/uio_msi.h | 22 +++
> > > 5 files changed, 411 insertions(+)
> > > create mode 100644 drivers/uio/uio_msi.c
> > > create mode 100644 include/uapi/linux/uio_msi.h
> > >
> > > diff --git a/drivers/uio/Kconfig b/drivers/uio/Kconfig
> > > index 52c98ce..04adfa0 100644
> > > --- a/drivers/uio/Kconfig
> > > +++ b/drivers/uio/Kconfig
> > > @@ -93,6 +93,15 @@ config UIO_PCI_GENERIC
> > > primarily, for virtualization scenarios.
> > > If you compile this as a module, it will be called uio_pci_generic.
> > >
> > > +config UIO_PCI_MSI
> > > + tristate "Generic driver supporting MSI-x on PCI Express cards"
> > > + depends on PCI
> > > + help
> > > + Generic driver that provides Message Signalled IRQ events
> > > + similar to VFIO. If IOMMMU is available please use VFIO
> > > + instead since it provides more security.
> > > + If you compile this as a module, it will be called uio_msi.
> > > +
> > > config UIO_NETX
> > > tristate "Hilscher NetX Card driver"
> > > depends on PCI
> > > diff --git a/drivers/uio/Makefile b/drivers/uio/Makefile
> > > index 8560dad..62fc44b 100644
> > > --- a/drivers/uio/Makefile
> > > +++ b/drivers/uio/Makefile
> > > @@ -9,3 +9,4 @@ obj-$(CONFIG_UIO_NETX) += uio_netx.o
> > > obj-$(CONFIG_UIO_PRUSS) += uio_pruss.o
> > > obj-$(CONFIG_UIO_MF624) += uio_mf624.o
> > > obj-$(CONFIG_UIO_FSL_ELBC_GPCM) += uio_fsl_elbc_gpcm.o
> > > +obj-$(CONFIG_UIO_PCI_MSI) += uio_msi.o
> > > diff --git a/drivers/uio/uio_msi.c b/drivers/uio/uio_msi.c
> > > new file mode 100644
> > > index 0000000..802b5c4
> > > --- /dev/null
> > > +++ b/drivers/uio/uio_msi.c
> > > @@ -0,0 +1,378 @@
> > > +/*-
> > > + *
> > > + * Copyright (c) 2015 by Brocade Communications Systems, Inc.
> > > + * Author: Stephen Hemminger <stephen@networkplumber.org>
> > > + *
> > > + * This work is licensed under the terms of the GNU GPL, version 2 only.
> > > + */
> > > +
> > > +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> > > +
> > > +#include <linux/device.h>
> > > +#include <linux/interrupt.h>
> > > +#include <linux/eventfd.h>
> > > +#include <linux/module.h>
> > > +#include <linux/pci.h>
> > > +#include <linux/uio_driver.h>
> > > +#include <linux/msi.h>
> > > +#include <linux/uio_msi.h>
> > > +
> > > +#define DRIVER_VERSION "0.1.1"
> > > +#define MAX_MSIX_VECTORS 64
> > > +
> > > +/* MSI-X vector information */
> > > +struct uio_msi_pci_dev {
> > > + struct uio_info info; /* UIO driver info */
> > > + struct pci_dev *pdev; /* PCI device */
> > > + struct mutex mutex; /* open/release/ioctl mutex */
> > > + int ref_cnt; /* references to device */
> > > + unsigned int max_vectors; /* MSI-X slots available */
> > > + struct msix_entry *msix; /* MSI-X vector table */
> > > + struct uio_msi_irq_ctx {
> > > + struct eventfd_ctx *trigger; /* vector to eventfd */
> > > + char *name; /* name in /proc/interrupts */
> > > + } *ctx;
> > > +};
> > > +
> > > +static irqreturn_t uio_intx_irqhandler(int irq, void *arg)
> > > +{
> > > + struct uio_msi_pci_dev *udev = arg;
> > > +
> > > + if (pci_check_and_mask_intx(udev->pdev)) {
> > > + eventfd_signal(udev->ctx->trigger, 1);
> > > + return IRQ_HANDLED;
> > > + }
> > > +
> > > + return IRQ_NONE;
> > > +}
> > > +
> > > +static irqreturn_t uio_msi_irqhandler(int irq, void *arg)
> > > +{
> > > + struct eventfd_ctx *trigger = arg;
> > > +
> > > + eventfd_signal(trigger, 1);
> > > + return IRQ_HANDLED;
> > > +}
> > > +
> > > +/* set the mapping between vector # and existing eventfd. */
> > > +static int set_irq_eventfd(struct uio_msi_pci_dev *udev, u32 vec, int fd)
> > > +{
> > > + struct eventfd_ctx *trigger;
> > > + int irq, err;
> > > +
> > > + if (vec >= udev->max_vectors) {
> > > + dev_notice(&udev->pdev->dev, "vec %u >= num_vec %u\n",
> > > + vec, udev->max_vectors);
> > > + return -ERANGE;
> > > + }
> > > +
> > > + irq = udev->msix[vec].vector;
> > > + trigger = udev->ctx[vec].trigger;
> > > + if (trigger) {
> > > + /* Clearup existing irq mapping */
> > > + free_irq(irq, trigger);
> > > + eventfd_ctx_put(trigger);
> > > + udev->ctx[vec].trigger = NULL;
> > > + }
> > > +
> > > + /* Passing -1 is used to disable interrupt */
> > > + if (fd < 0)
> > > + return 0;
> > > +
> > > + trigger = eventfd_ctx_fdget(fd);
> > > + if (IS_ERR(trigger)) {
> > > + err = PTR_ERR(trigger);
> > > + dev_notice(&udev->pdev->dev,
> > > + "eventfd ctx get failed: %d\n", err);
> > > + return err;
> > > + }
> > > +
> > > + if (udev->msix)
> > > + err = request_irq(irq, uio_msi_irqhandler, 0,
> > > + udev->ctx[vec].name, trigger);
> > > + else
> > > + err = request_irq(irq, uio_intx_irqhandler, IRQF_SHARED,
> > > + udev->ctx[vec].name, udev);
> > > +
> > > + if (err) {
> > > + dev_notice(&udev->pdev->dev,
> > > + "request irq failed: %d\n", err);
> > > + eventfd_ctx_put(trigger);
> > > + return err;
> > > + }
> > > +
> > > + udev->ctx[vec].trigger = trigger;
> > > + return 0;
> > > +}
> > > +
> > > +static int
> > > +uio_msi_ioctl(struct uio_info *info, unsigned int cmd, unsigned long arg)
> > > +{
> > > + struct uio_msi_pci_dev *udev
> > > + = container_of(info, struct uio_msi_pci_dev, info);
> > > + struct uio_msi_irq_set hdr;
> > > + int err;
> > > +
> > > + switch (cmd) {
> > > + case UIO_MSI_IRQ_SET:
> > > + if (copy_from_user(&hdr, (void __user *)arg, sizeof(hdr)))
> > > + return -EFAULT;
> > > +
> > > + mutex_lock(&udev->mutex);
> > > + err = set_irq_eventfd(udev, hdr.vec, hdr.fd);
> > > + mutex_unlock(&udev->mutex);
> > > + break;
> > > + default:
> > > + err = -EOPNOTSUPP;
> > > + }
> > > + return err;
> > > +}
> > > +
> > > +/* Opening the UIO device for first time enables MSI-X */
> > > +static int
> > > +uio_msi_open(struct uio_info *info, struct inode *inode)
> > > +{
> > > + struct uio_msi_pci_dev *udev
> > > + = container_of(info, struct uio_msi_pci_dev, info);
> > > + int err = 0;
> > > +
> > > + mutex_lock(&udev->mutex);
> > > + if (udev->ref_cnt++ == 0) {
> > > + if (udev->msix)
> > > + err = pci_enable_msix(udev->pdev, udev->msix,
> > > + udev->max_vectors);
> > > + }
> > > + mutex_unlock(&udev->mutex);
> > > +
> > > + return err;
> > > +}
> > > +
> > > +/* Last close of the UIO device releases/disables all IRQ's */
> > > +static int
> > > +uio_msi_release(struct uio_info *info, struct inode *inode)
> > > +{
> > > + struct uio_msi_pci_dev *udev
> > > + = container_of(info, struct uio_msi_pci_dev, info);
> > > + int i;
> > > +
> > > + mutex_lock(&udev->mutex);
> > > + if (--udev->ref_cnt == 0) {
> > > + for (i = 0; i < udev->max_vectors; i++) {
> > > + int irq = udev->msix[i].vector;
> > > + struct eventfd_ctx *trigger = udev->ctx[i].trigger;
> > > +
> > > + if (!trigger)
> > > + continue;
> > > +
> > > + free_irq(irq, trigger);
> > > + eventfd_ctx_put(trigger);
> > > + udev->ctx[i].trigger = NULL;
> > > + }
> > > +
> > > + if (udev->msix)
> > > + pci_disable_msix(udev->pdev);
> > > + }
> > > + mutex_unlock(&udev->mutex);
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +/* Unmap previously ioremap'd resources */
> > > +static void
> > > +release_iomaps(struct uio_mem *mem)
> > > +{
> > > + int i;
> > > +
> > > + for (i = 0; i < MAX_UIO_MAPS; i++, mem++) {
> > > + if (mem->internal_addr)
> > > + iounmap(mem->internal_addr);
> > > + }
> > > +}
> > > +
> > > +static int
> > > +setup_maps(struct pci_dev *pdev, struct uio_info *info)
> > > +{
> > > + int i, m = 0, p = 0, err;
> > > + static const char * const bar_names[] = {
> > > + "BAR0", "BAR1", "BAR2", "BAR3", "BAR4", "BAR5",
> > > + };
> > > +
> > > + for (i = 0; i < ARRAY_SIZE(bar_names); i++) {
> > > + unsigned long start = pci_resource_start(pdev, i);
> > > + unsigned long flags = pci_resource_flags(pdev, i);
> > > + unsigned long len = pci_resource_len(pdev, i);
> > > +
> > > + if (start == 0 || len == 0)
> > > + continue;
> > > +
> > > + if (flags & IORESOURCE_MEM) {
> > > + void __iomem *addr;
> > > +
> > > + if (m >= MAX_UIO_MAPS)
> > > + continue;
> > > +
> > > + addr = ioremap(start, len);
> > > + if (addr == NULL) {
> > > + err = -EINVAL;
> > > + goto fail;
> > > + }
> > > +
> > > + info->mem[m].name = bar_names[i];
> > > + info->mem[m].addr = start;
> > > + info->mem[m].internal_addr = addr;
> > > + info->mem[m].size = len;
> > > + info->mem[m].memtype = UIO_MEM_PHYS;
> > > + ++m;
> > > + } else if (flags & IORESOURCE_IO) {
> > > + if (p >= MAX_UIO_PORT_REGIONS)
> > > + continue;
> > > +
> > > + info->port[p].name = bar_names[i];
> > > + info->port[p].start = start;
> > > + info->port[p].size = len;
> > > + info->port[p].porttype = UIO_PORT_X86;
> > > + ++p;
> > > + }
> > > + }
> > > +
> > > + return 0;
> > > + fail:
> > > + for (i = 0; i < m; i++)
> > > + iounmap(info->mem[i].internal_addr);
> > > + return err;
> > > +}
> > > +
> > > +static int uio_msi_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> > > +{
> > > + struct uio_msi_pci_dev *udev;
> > > + int i, err, vectors;
> > > +
> > > + udev = kzalloc(sizeof(struct uio_msi_pci_dev), GFP_KERNEL);
> > > + if (!udev)
> > > + return -ENOMEM;
> > > +
> > > + err = pci_enable_device(pdev);
> > > + if (err != 0) {
> > > + dev_err(&pdev->dev, "cannot enable PCI device\n");
> > > + goto fail_free;
> > > + }
> > > +
> > > + err = pci_request_regions(pdev, "uio_msi");
> > > + if (err != 0) {
> > > + dev_err(&pdev->dev, "Cannot request regions\n");
> > > + goto fail_disable;
> > > + }
> > > +
> > > + pci_set_master(pdev);
> > > +
> > > + /* remap resources */
> > > + err = setup_maps(pdev, &udev->info);
> > > + if (err)
> > > + goto fail_release_iomem;
> > > +
> > > + /* fill uio infos */
> > > + udev->info.name = "uio_msi";
> > > + udev->info.version = DRIVER_VERSION;
> > > + udev->info.priv = udev;
> > > + udev->pdev = pdev;
> > > + udev->info.ioctl = uio_msi_ioctl;
> > > + udev->info.open = uio_msi_open;
> > > + udev->info.release = uio_msi_release;
> > > + udev->info.irq = UIO_IRQ_CUSTOM;
> > > + mutex_init(&udev->mutex);
> > > +
> > > + vectors = pci_msix_vec_count(pdev);
> > > + if (vectors > 0) {
> > > + udev->max_vectors = min_t(u16, vectors, MAX_MSIX_VECTORS);
> > > + dev_info(&pdev->dev, "using up to %u MSI-X vectors\n",
> > > + udev->max_vectors);
> > > +
> > > + err = -ENOMEM;
> > > + udev->msix = kcalloc(udev->max_vectors,
> > > + sizeof(struct msix_entry), GFP_KERNEL);
> > > + if (!udev->msix)
> > > + goto fail_release_iomem;
> > > + } else if (!pci_intx_mask_supported(pdev)) {
> > > + dev_err(&pdev->dev,
> > > + "device does not support MSI-X or INTX\n");
> > > + err = -EINVAL;
> > > + goto fail_release_iomem;
> > > + } else {
> > > + dev_notice(&pdev->dev, "using INTX\n");
> > > + udev->info.irq_flags = IRQF_SHARED;
> > > + udev->max_vectors = 1;
> > > + }
> > > +
> > > + udev->ctx = kcalloc(udev->max_vectors,
> > > + sizeof(struct uio_msi_irq_ctx), GFP_KERNEL);
> > > + if (!udev->ctx)
> > > + goto fail_free_msix;
> > > +
> > > + for (i = 0; i < udev->max_vectors; i++) {
> > > + udev->msix[i].entry = i;
> > > +
> > > + udev->ctx[i].name = kasprintf(GFP_KERNEL,
> > > + KBUILD_MODNAME "[%d](%s)",
> > > + i, pci_name(pdev));
> > > + if (!udev->ctx[i].name)
> > > + goto fail_free_ctx;
> > > + }
> > > +
> > > + /* register uio driver */
> > > + err = uio_register_device(&pdev->dev, &udev->info);
> > > + if (err != 0)
> > > + goto fail_free_ctx;
> > > +
> > > + pci_set_drvdata(pdev, udev);
> > > + return 0;
> > > +
> > > +fail_free_ctx:
> > > + for (i = 0; i < udev->max_vectors; i++)
> > > + kfree(udev->ctx[i].name);
> > > + kfree(udev->ctx);
> > > +fail_free_msix:
> > > + kfree(udev->msix);
> > > +fail_release_iomem:
> > > + release_iomaps(udev->info.mem);
> > > + pci_release_regions(pdev);
> > > +fail_disable:
> > > + pci_disable_device(pdev);
> > > +fail_free:
> > > + kfree(udev);
> > > +
> > > + pr_notice("%s ret %d\n", __func__, err);
> > > + return err;
> > > +}
> > > +
> > > +static void uio_msi_remove(struct pci_dev *pdev)
> > > +{
> > > + struct uio_info *info = pci_get_drvdata(pdev);
> > > + struct uio_msi_pci_dev *udev
> > > + = container_of(info, struct uio_msi_pci_dev, info);
> > > + int i;
> > > +
> > > + uio_unregister_device(info);
> > > + release_iomaps(info->mem);
> > > +
> > > + pci_release_regions(pdev);
> > > + for (i = 0; i < udev->max_vectors; i++)
> > > + kfree(udev->ctx[i].name);
> > > + kfree(udev->ctx);
> > > + kfree(udev->msix);
> > > + pci_disable_device(pdev);
> > > +
> > > + pci_set_drvdata(pdev, NULL);
> > > + kfree(udev);
> > > +}
> > > +
> > > +static struct pci_driver uio_msi_pci_driver = {
> > > + .name = "uio_msi",
> > > + .probe = uio_msi_probe,
> > > + .remove = uio_msi_remove,
> > > +};
> > > +
> > > +module_pci_driver(uio_msi_pci_driver);
> > > +MODULE_VERSION(DRIVER_VERSION);
> > > +MODULE_LICENSE("GPL v2");
> > > +MODULE_AUTHOR("Stephen Hemminger <stephen@networkplumber.org>");
> > > +MODULE_DESCRIPTION("UIO driver for MSI PCI devices");
> > > diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
> > > index f7b2db4..d9497691 100644
> > > --- a/include/uapi/linux/Kbuild
> > > +++ b/include/uapi/linux/Kbuild
> > > @@ -411,6 +411,7 @@ header-y += udp.h
> > > header-y += uhid.h
> > > header-y += uinput.h
> > > header-y += uio.h
> > > +header-y += uio_msi.h
> > > header-y += ultrasound.h
> > > header-y += un.h
> > > header-y += unistd.h
> > > diff --git a/include/uapi/linux/uio_msi.h b/include/uapi/linux/uio_msi.h
> > > new file mode 100644
> > > index 0000000..297de00
> > > --- /dev/null
> > > +++ b/include/uapi/linux/uio_msi.h
> > > @@ -0,0 +1,22 @@
> > > +/*
> > > + * UIO_MSI API definition
> > > + *
> > > + * Copyright (c) 2015 by Brocade Communications Systems, Inc.
> > > + * All rights reserved.
> > > + *
> > > + * This program is free software; you can redistribute it and/or modify
> > > + * it under the terms of the GNU General Public License version 2 as
> > > + * published by the Free Software Foundation.
> > > + */
> > > +#ifndef _UIO_PCI_MSI_H
> > > +#define _UIO_PCI_MSI_H
> > > +
> > > +struct uio_msi_irq_set {
> > > + u32 vec;
> > > + int fd;
> > > +};
> > > +
> > > +#define UIO_MSI_BASE 0x86
> > > +#define UIO_MSI_IRQ_SET _IOW('I', UIO_MSI_BASE+1, struct uio_msi_irq_set)
> > > +
> > > +#endif
> > > --
> > > 2.1.4
> > >
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at http://vger.kernel.org/majordomo-info.html
> > > Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [dpdk-dev] [PATCH 2/2] uio: new driver to support PCI MSI-X
2015-10-01 8:33 ` Michael S. Tsirkin
2015-10-01 10:37 ` Michael S. Tsirkin
@ 2015-10-01 14:50 ` Stephen Hemminger
2015-10-01 15:22 ` Michael S. Tsirkin
2015-10-01 16:31 ` Michael S. Tsirkin
2015-10-05 21:54 ` Michael S. Tsirkin
3 siblings, 1 reply; 44+ messages in thread
From: Stephen Hemminger @ 2015-10-01 14:50 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: dev, hjk, gregkh, linux-kernel
On Thu, 1 Oct 2015 11:33:06 +0300
"Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Wed, Sep 30, 2015 at 03:28:58PM -0700, Stephen Hemminger wrote:
> > This driver allows using PCI device with Message Signalled Interrupt
> > from userspace. The API is similar to the igb_uio driver used by the DPDK.
> > Via ioctl it provides a mechanism to map MSI-X interrupts into event
> > file descriptors similar to VFIO.
> >
> > VFIO is a better choice if IOMMU is available, but often userspace drivers
> > have to work in environments where IOMMU support (real or emulated) is
> > not available. All UIO drivers that support DMA are not secure against
> > rogue userspace applications programming DMA hardware to access
> > private memory; this driver is no less secure than existing code.
> >
> > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
>
> I don't think copying the igb_uio interface is a good idea.
> What DPDK is doing with igb_uio (and indeed uio_pci_generic)
> is abusing the sysfs BAR access to provide unlimited
> access to hardware.
>
> MSI messages are memory writes so any generic device capable
> of MSI is capable of corrupting kernel memory.
> This means that a bug in userspace will lead to kernel memory corruption
> and crashes. This is something distributions can't support.
>
> uio_pci_generic is already abused like that, mostly
> because when I wrote it, I didn't add enough protections
> against using it with DMA capable devices,
> and we can't go back and break working userspace.
> But at least it does not bind to VFs which all of
> them are capable of DMA.
>
> The result of merging this driver will be userspace abusing the
> sysfs BAR access with VFs as well, and we do not want that.
>
>
> Just forwarding events is not enough to make a valid driver.
> What is missing is a way to access the device in a safe way.
>
> On a more positive note:
>
> What would be a reasonable interface? One that does the following
> in kernel:
>
> 1. initializes device rings (can be in pinned userspace memory,
> but can not be writeable by userspace), brings up interface link
> 2. pins userspace memory (unless using e.g. hugetlbfs)
> 3. gets request, make sure it's valid and belongs to
> the correct task, put it in the ring
> 4. in the reverse direction, notify userspace when buffers
> are available in the ring
> 5. notify userspace about MSI (what this driver does)
>
> What userspace can be allowed to do:
>
> format requests (e.g. transmit, receive) in userspace
> read ring contents
>
> What userspace can't be allowed to do:
>
> access BAR
> write rings
>
>
> This means that the driver can not be a generic one,
> and there will be a system call overhead when you
> write the ring, but that's the price you have to
> pay for ability to run on systems without an IOMMU.
I think I understand what you are proposing, but it really doesn't
fit into the high speed userspace networking model.
1. Device rings are device specific, can't be in a generic driver.
2. DPDK uses huge mememory.
3. Performance requires all ring requests be done in pure userspace,
(ie no syscalls)
4. Ditto, can't have kernel to userspace notification per packet
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [dpdk-dev] [PATCH 2/2] uio: new driver to support PCI MSI-X
2015-10-01 14:50 ` Stephen Hemminger
@ 2015-10-01 15:22 ` Michael S. Tsirkin
0 siblings, 0 replies; 44+ messages in thread
From: Michael S. Tsirkin @ 2015-10-01 15:22 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: dev, hjk, gregkh, linux-kernel
On Thu, Oct 01, 2015 at 07:50:37AM -0700, Stephen Hemminger wrote:
> On Thu, 1 Oct 2015 11:33:06 +0300
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
>
> > On Wed, Sep 30, 2015 at 03:28:58PM -0700, Stephen Hemminger wrote:
> > > This driver allows using PCI device with Message Signalled Interrupt
> > > from userspace. The API is similar to the igb_uio driver used by the DPDK.
> > > Via ioctl it provides a mechanism to map MSI-X interrupts into event
> > > file descriptors similar to VFIO.
> > >
> > > VFIO is a better choice if IOMMU is available, but often userspace drivers
> > > have to work in environments where IOMMU support (real or emulated) is
> > > not available. All UIO drivers that support DMA are not secure against
> > > rogue userspace applications programming DMA hardware to access
> > > private memory; this driver is no less secure than existing code.
> > >
> > > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> >
> > I don't think copying the igb_uio interface is a good idea.
> > What DPDK is doing with igb_uio (and indeed uio_pci_generic)
> > is abusing the sysfs BAR access to provide unlimited
> > access to hardware.
> >
> > MSI messages are memory writes so any generic device capable
> > of MSI is capable of corrupting kernel memory.
> > This means that a bug in userspace will lead to kernel memory corruption
> > and crashes. This is something distributions can't support.
> >
> > uio_pci_generic is already abused like that, mostly
> > because when I wrote it, I didn't add enough protections
> > against using it with DMA capable devices,
> > and we can't go back and break working userspace.
> > But at least it does not bind to VFs which all of
> > them are capable of DMA.
> >
> > The result of merging this driver will be userspace abusing the
> > sysfs BAR access with VFs as well, and we do not want that.
> >
> >
> > Just forwarding events is not enough to make a valid driver.
> > What is missing is a way to access the device in a safe way.
> >
> > On a more positive note:
> >
> > What would be a reasonable interface? One that does the following
> > in kernel:
> >
> > 1. initializes device rings (can be in pinned userspace memory,
> > but can not be writeable by userspace), brings up interface link
> > 2. pins userspace memory (unless using e.g. hugetlbfs)
> > 3. gets request, make sure it's valid and belongs to
> > the correct task, put it in the ring
> > 4. in the reverse direction, notify userspace when buffers
> > are available in the ring
> > 5. notify userspace about MSI (what this driver does)
> >
> > What userspace can be allowed to do:
> >
> > format requests (e.g. transmit, receive) in userspace
> > read ring contents
> >
> > What userspace can't be allowed to do:
> >
> > access BAR
> > write rings
> >
> >
> > This means that the driver can not be a generic one,
> > and there will be a system call overhead when you
> > write the ring, but that's the price you have to
> > pay for ability to run on systems without an IOMMU.
>
> I think I understand what you are proposing, but it really doesn't
> fit into the high speed userspace networking model.
I'm aware of the fact currently the model does everything including
bringing up the link in user-space.
But there's really no justification for this.
Only data path things should be in userspace.
A userspace bug should not be able to do things like over-writing the
on-device EEPROM.
> 1. Device rings are device specific, can't be in a generic driver.
So that's more work, and it is not going to happen if people
can get by with insecure hacks.
> 2. DPDK uses huge mememory.
Hugetlbfs? Don't see why this is an issue. Might make things simpler.
> 3. Performance requires all ring requests be done in pure userspace,
> (ie no syscalls)
Make only the TX ring writeable then. At least you won't be
able to corrupt the kernel memory.
> 4. Ditto, can't have kernel to userspace notification per packet
RX ring can be read-only, so userspace can read it directly.
--
MST
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [dpdk-dev] [PATCH 2/2] uio: new driver to support PCI MSI-X
2015-10-01 8:33 ` Michael S. Tsirkin
2015-10-01 10:37 ` Michael S. Tsirkin
2015-10-01 14:50 ` Stephen Hemminger
@ 2015-10-01 16:31 ` Michael S. Tsirkin
2015-10-01 17:26 ` Stephen Hemminger
2015-10-05 21:54 ` Michael S. Tsirkin
3 siblings, 1 reply; 44+ messages in thread
From: Michael S. Tsirkin @ 2015-10-01 16:31 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: dev, hjk, gregkh, linux-kernel
On Thu, Oct 01, 2015 at 11:33:06AM +0300, Michael S. Tsirkin wrote:
> On Wed, Sep 30, 2015 at 03:28:58PM -0700, Stephen Hemminger wrote:
> > This driver allows using PCI device with Message Signalled Interrupt
> > from userspace. The API is similar to the igb_uio driver used by the DPDK.
> > Via ioctl it provides a mechanism to map MSI-X interrupts into event
> > file descriptors similar to VFIO.
> >
> > VFIO is a better choice if IOMMU is available, but often userspace drivers
> > have to work in environments where IOMMU support (real or emulated) is
> > not available. All UIO drivers that support DMA are not secure against
> > rogue userspace applications programming DMA hardware to access
> > private memory; this driver is no less secure than existing code.
> >
> > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
>
> I don't think copying the igb_uio interface is a good idea.
> What DPDK is doing with igb_uio (and indeed uio_pci_generic)
> is abusing the sysfs BAR access to provide unlimited
> access to hardware.
>
> MSI messages are memory writes so any generic device capable
> of MSI is capable of corrupting kernel memory.
> This means that a bug in userspace will lead to kernel memory corruption
> and crashes. This is something distributions can't support.
>
> uio_pci_generic is already abused like that, mostly
> because when I wrote it, I didn't add enough protections
> against using it with DMA capable devices,
> and we can't go back and break working userspace.
> But at least it does not bind to VFs which all of
> them are capable of DMA.
>
> The result of merging this driver will be userspace abusing the
> sysfs BAR access with VFs as well, and we do not want that.
>
>
> Just forwarding events is not enough to make a valid driver.
> What is missing is a way to access the device in a safe way.
>
> On a more positive note:
>
> What would be a reasonable interface? One that does the following
> in kernel:
>
> 1. initializes device rings (can be in pinned userspace memory,
> but can not be writeable by userspace), brings up interface link
> 2. pins userspace memory (unless using e.g. hugetlbfs)
> 3. gets request, make sure it's valid and belongs to
> the correct task, put it in the ring
> 4. in the reverse direction, notify userspace when buffers
> are available in the ring
> 5. notify userspace about MSI (what this driver does)
>
> What userspace can be allowed to do:
>
> format requests (e.g. transmit, receive) in userspace
> read ring contents
>
> What userspace can't be allowed to do:
>
> access BAR
> write rings
>
>
> This means that the driver can not be a generic one,
> and there will be a system call overhead when you
> write the ring, but that's the price you have to
> pay for ability to run on systems without an IOMMU.
>
The device specific parts can be taken from John Fastabend's patches
BTW:
https://patchwork.ozlabs.org/patch/396713/
IIUC what was missing there was exactly the memory protection
we are looking for here.
>
>
> > ---
> > drivers/uio/Kconfig | 9 ++
> > drivers/uio/Makefile | 1 +
> > drivers/uio/uio_msi.c | 378 +++++++++++++++++++++++++++++++++++++++++++
> > include/uapi/linux/Kbuild | 1 +
> > include/uapi/linux/uio_msi.h | 22 +++
> > 5 files changed, 411 insertions(+)
> > create mode 100644 drivers/uio/uio_msi.c
> > create mode 100644 include/uapi/linux/uio_msi.h
> >
> > diff --git a/drivers/uio/Kconfig b/drivers/uio/Kconfig
> > index 52c98ce..04adfa0 100644
> > --- a/drivers/uio/Kconfig
> > +++ b/drivers/uio/Kconfig
> > @@ -93,6 +93,15 @@ config UIO_PCI_GENERIC
> > primarily, for virtualization scenarios.
> > If you compile this as a module, it will be called uio_pci_generic.
> >
> > +config UIO_PCI_MSI
> > + tristate "Generic driver supporting MSI-x on PCI Express cards"
> > + depends on PCI
> > + help
> > + Generic driver that provides Message Signalled IRQ events
> > + similar to VFIO. If IOMMMU is available please use VFIO
> > + instead since it provides more security.
> > + If you compile this as a module, it will be called uio_msi.
> > +
> > config UIO_NETX
> > tristate "Hilscher NetX Card driver"
> > depends on PCI
> > diff --git a/drivers/uio/Makefile b/drivers/uio/Makefile
> > index 8560dad..62fc44b 100644
> > --- a/drivers/uio/Makefile
> > +++ b/drivers/uio/Makefile
> > @@ -9,3 +9,4 @@ obj-$(CONFIG_UIO_NETX) += uio_netx.o
> > obj-$(CONFIG_UIO_PRUSS) += uio_pruss.o
> > obj-$(CONFIG_UIO_MF624) += uio_mf624.o
> > obj-$(CONFIG_UIO_FSL_ELBC_GPCM) += uio_fsl_elbc_gpcm.o
> > +obj-$(CONFIG_UIO_PCI_MSI) += uio_msi.o
> > diff --git a/drivers/uio/uio_msi.c b/drivers/uio/uio_msi.c
> > new file mode 100644
> > index 0000000..802b5c4
> > --- /dev/null
> > +++ b/drivers/uio/uio_msi.c
> > @@ -0,0 +1,378 @@
> > +/*-
> > + *
> > + * Copyright (c) 2015 by Brocade Communications Systems, Inc.
> > + * Author: Stephen Hemminger <stephen@networkplumber.org>
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 only.
> > + */
> > +
> > +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> > +
> > +#include <linux/device.h>
> > +#include <linux/interrupt.h>
> > +#include <linux/eventfd.h>
> > +#include <linux/module.h>
> > +#include <linux/pci.h>
> > +#include <linux/uio_driver.h>
> > +#include <linux/msi.h>
> > +#include <linux/uio_msi.h>
> > +
> > +#define DRIVER_VERSION "0.1.1"
> > +#define MAX_MSIX_VECTORS 64
> > +
> > +/* MSI-X vector information */
> > +struct uio_msi_pci_dev {
> > + struct uio_info info; /* UIO driver info */
> > + struct pci_dev *pdev; /* PCI device */
> > + struct mutex mutex; /* open/release/ioctl mutex */
> > + int ref_cnt; /* references to device */
> > + unsigned int max_vectors; /* MSI-X slots available */
> > + struct msix_entry *msix; /* MSI-X vector table */
> > + struct uio_msi_irq_ctx {
> > + struct eventfd_ctx *trigger; /* vector to eventfd */
> > + char *name; /* name in /proc/interrupts */
> > + } *ctx;
> > +};
> > +
> > +static irqreturn_t uio_intx_irqhandler(int irq, void *arg)
> > +{
> > + struct uio_msi_pci_dev *udev = arg;
> > +
> > + if (pci_check_and_mask_intx(udev->pdev)) {
> > + eventfd_signal(udev->ctx->trigger, 1);
> > + return IRQ_HANDLED;
> > + }
> > +
> > + return IRQ_NONE;
> > +}
> > +
> > +static irqreturn_t uio_msi_irqhandler(int irq, void *arg)
> > +{
> > + struct eventfd_ctx *trigger = arg;
> > +
> > + eventfd_signal(trigger, 1);
> > + return IRQ_HANDLED;
> > +}
> > +
> > +/* set the mapping between vector # and existing eventfd. */
> > +static int set_irq_eventfd(struct uio_msi_pci_dev *udev, u32 vec, int fd)
> > +{
> > + struct eventfd_ctx *trigger;
> > + int irq, err;
> > +
> > + if (vec >= udev->max_vectors) {
> > + dev_notice(&udev->pdev->dev, "vec %u >= num_vec %u\n",
> > + vec, udev->max_vectors);
> > + return -ERANGE;
> > + }
> > +
> > + irq = udev->msix[vec].vector;
> > + trigger = udev->ctx[vec].trigger;
> > + if (trigger) {
> > + /* Clearup existing irq mapping */
> > + free_irq(irq, trigger);
> > + eventfd_ctx_put(trigger);
> > + udev->ctx[vec].trigger = NULL;
> > + }
> > +
> > + /* Passing -1 is used to disable interrupt */
> > + if (fd < 0)
> > + return 0;
> > +
> > + trigger = eventfd_ctx_fdget(fd);
> > + if (IS_ERR(trigger)) {
> > + err = PTR_ERR(trigger);
> > + dev_notice(&udev->pdev->dev,
> > + "eventfd ctx get failed: %d\n", err);
> > + return err;
> > + }
> > +
> > + if (udev->msix)
> > + err = request_irq(irq, uio_msi_irqhandler, 0,
> > + udev->ctx[vec].name, trigger);
> > + else
> > + err = request_irq(irq, uio_intx_irqhandler, IRQF_SHARED,
> > + udev->ctx[vec].name, udev);
> > +
> > + if (err) {
> > + dev_notice(&udev->pdev->dev,
> > + "request irq failed: %d\n", err);
> > + eventfd_ctx_put(trigger);
> > + return err;
> > + }
> > +
> > + udev->ctx[vec].trigger = trigger;
> > + return 0;
> > +}
> > +
> > +static int
> > +uio_msi_ioctl(struct uio_info *info, unsigned int cmd, unsigned long arg)
> > +{
> > + struct uio_msi_pci_dev *udev
> > + = container_of(info, struct uio_msi_pci_dev, info);
> > + struct uio_msi_irq_set hdr;
> > + int err;
> > +
> > + switch (cmd) {
> > + case UIO_MSI_IRQ_SET:
> > + if (copy_from_user(&hdr, (void __user *)arg, sizeof(hdr)))
> > + return -EFAULT;
> > +
> > + mutex_lock(&udev->mutex);
> > + err = set_irq_eventfd(udev, hdr.vec, hdr.fd);
> > + mutex_unlock(&udev->mutex);
> > + break;
> > + default:
> > + err = -EOPNOTSUPP;
> > + }
> > + return err;
> > +}
> > +
> > +/* Opening the UIO device for first time enables MSI-X */
> > +static int
> > +uio_msi_open(struct uio_info *info, struct inode *inode)
> > +{
> > + struct uio_msi_pci_dev *udev
> > + = container_of(info, struct uio_msi_pci_dev, info);
> > + int err = 0;
> > +
> > + mutex_lock(&udev->mutex);
> > + if (udev->ref_cnt++ == 0) {
> > + if (udev->msix)
> > + err = pci_enable_msix(udev->pdev, udev->msix,
> > + udev->max_vectors);
> > + }
> > + mutex_unlock(&udev->mutex);
> > +
> > + return err;
> > +}
> > +
> > +/* Last close of the UIO device releases/disables all IRQ's */
> > +static int
> > +uio_msi_release(struct uio_info *info, struct inode *inode)
> > +{
> > + struct uio_msi_pci_dev *udev
> > + = container_of(info, struct uio_msi_pci_dev, info);
> > + int i;
> > +
> > + mutex_lock(&udev->mutex);
> > + if (--udev->ref_cnt == 0) {
> > + for (i = 0; i < udev->max_vectors; i++) {
> > + int irq = udev->msix[i].vector;
> > + struct eventfd_ctx *trigger = udev->ctx[i].trigger;
> > +
> > + if (!trigger)
> > + continue;
> > +
> > + free_irq(irq, trigger);
> > + eventfd_ctx_put(trigger);
> > + udev->ctx[i].trigger = NULL;
> > + }
> > +
> > + if (udev->msix)
> > + pci_disable_msix(udev->pdev);
> > + }
> > + mutex_unlock(&udev->mutex);
> > +
> > + return 0;
> > +}
> > +
> > +/* Unmap previously ioremap'd resources */
> > +static void
> > +release_iomaps(struct uio_mem *mem)
> > +{
> > + int i;
> > +
> > + for (i = 0; i < MAX_UIO_MAPS; i++, mem++) {
> > + if (mem->internal_addr)
> > + iounmap(mem->internal_addr);
> > + }
> > +}
> > +
> > +static int
> > +setup_maps(struct pci_dev *pdev, struct uio_info *info)
> > +{
> > + int i, m = 0, p = 0, err;
> > + static const char * const bar_names[] = {
> > + "BAR0", "BAR1", "BAR2", "BAR3", "BAR4", "BAR5",
> > + };
> > +
> > + for (i = 0; i < ARRAY_SIZE(bar_names); i++) {
> > + unsigned long start = pci_resource_start(pdev, i);
> > + unsigned long flags = pci_resource_flags(pdev, i);
> > + unsigned long len = pci_resource_len(pdev, i);
> > +
> > + if (start == 0 || len == 0)
> > + continue;
> > +
> > + if (flags & IORESOURCE_MEM) {
> > + void __iomem *addr;
> > +
> > + if (m >= MAX_UIO_MAPS)
> > + continue;
> > +
> > + addr = ioremap(start, len);
> > + if (addr == NULL) {
> > + err = -EINVAL;
> > + goto fail;
> > + }
> > +
> > + info->mem[m].name = bar_names[i];
> > + info->mem[m].addr = start;
> > + info->mem[m].internal_addr = addr;
> > + info->mem[m].size = len;
> > + info->mem[m].memtype = UIO_MEM_PHYS;
> > + ++m;
> > + } else if (flags & IORESOURCE_IO) {
> > + if (p >= MAX_UIO_PORT_REGIONS)
> > + continue;
> > +
> > + info->port[p].name = bar_names[i];
> > + info->port[p].start = start;
> > + info->port[p].size = len;
> > + info->port[p].porttype = UIO_PORT_X86;
> > + ++p;
> > + }
> > + }
> > +
> > + return 0;
> > + fail:
> > + for (i = 0; i < m; i++)
> > + iounmap(info->mem[i].internal_addr);
> > + return err;
> > +}
> > +
> > +static int uio_msi_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> > +{
> > + struct uio_msi_pci_dev *udev;
> > + int i, err, vectors;
> > +
> > + udev = kzalloc(sizeof(struct uio_msi_pci_dev), GFP_KERNEL);
> > + if (!udev)
> > + return -ENOMEM;
> > +
> > + err = pci_enable_device(pdev);
> > + if (err != 0) {
> > + dev_err(&pdev->dev, "cannot enable PCI device\n");
> > + goto fail_free;
> > + }
> > +
> > + err = pci_request_regions(pdev, "uio_msi");
> > + if (err != 0) {
> > + dev_err(&pdev->dev, "Cannot request regions\n");
> > + goto fail_disable;
> > + }
> > +
> > + pci_set_master(pdev);
> > +
> > + /* remap resources */
> > + err = setup_maps(pdev, &udev->info);
> > + if (err)
> > + goto fail_release_iomem;
> > +
> > + /* fill uio infos */
> > + udev->info.name = "uio_msi";
> > + udev->info.version = DRIVER_VERSION;
> > + udev->info.priv = udev;
> > + udev->pdev = pdev;
> > + udev->info.ioctl = uio_msi_ioctl;
> > + udev->info.open = uio_msi_open;
> > + udev->info.release = uio_msi_release;
> > + udev->info.irq = UIO_IRQ_CUSTOM;
> > + mutex_init(&udev->mutex);
> > +
> > + vectors = pci_msix_vec_count(pdev);
> > + if (vectors > 0) {
> > + udev->max_vectors = min_t(u16, vectors, MAX_MSIX_VECTORS);
> > + dev_info(&pdev->dev, "using up to %u MSI-X vectors\n",
> > + udev->max_vectors);
> > +
> > + err = -ENOMEM;
> > + udev->msix = kcalloc(udev->max_vectors,
> > + sizeof(struct msix_entry), GFP_KERNEL);
> > + if (!udev->msix)
> > + goto fail_release_iomem;
> > + } else if (!pci_intx_mask_supported(pdev)) {
> > + dev_err(&pdev->dev,
> > + "device does not support MSI-X or INTX\n");
> > + err = -EINVAL;
> > + goto fail_release_iomem;
> > + } else {
> > + dev_notice(&pdev->dev, "using INTX\n");
> > + udev->info.irq_flags = IRQF_SHARED;
> > + udev->max_vectors = 1;
> > + }
> > +
> > + udev->ctx = kcalloc(udev->max_vectors,
> > + sizeof(struct uio_msi_irq_ctx), GFP_KERNEL);
> > + if (!udev->ctx)
> > + goto fail_free_msix;
> > +
> > + for (i = 0; i < udev->max_vectors; i++) {
> > + udev->msix[i].entry = i;
> > +
> > + udev->ctx[i].name = kasprintf(GFP_KERNEL,
> > + KBUILD_MODNAME "[%d](%s)",
> > + i, pci_name(pdev));
> > + if (!udev->ctx[i].name)
> > + goto fail_free_ctx;
> > + }
> > +
> > + /* register uio driver */
> > + err = uio_register_device(&pdev->dev, &udev->info);
> > + if (err != 0)
> > + goto fail_free_ctx;
> > +
> > + pci_set_drvdata(pdev, udev);
> > + return 0;
> > +
> > +fail_free_ctx:
> > + for (i = 0; i < udev->max_vectors; i++)
> > + kfree(udev->ctx[i].name);
> > + kfree(udev->ctx);
> > +fail_free_msix:
> > + kfree(udev->msix);
> > +fail_release_iomem:
> > + release_iomaps(udev->info.mem);
> > + pci_release_regions(pdev);
> > +fail_disable:
> > + pci_disable_device(pdev);
> > +fail_free:
> > + kfree(udev);
> > +
> > + pr_notice("%s ret %d\n", __func__, err);
> > + return err;
> > +}
> > +
> > +static void uio_msi_remove(struct pci_dev *pdev)
> > +{
> > + struct uio_info *info = pci_get_drvdata(pdev);
> > + struct uio_msi_pci_dev *udev
> > + = container_of(info, struct uio_msi_pci_dev, info);
> > + int i;
> > +
> > + uio_unregister_device(info);
> > + release_iomaps(info->mem);
> > +
> > + pci_release_regions(pdev);
> > + for (i = 0; i < udev->max_vectors; i++)
> > + kfree(udev->ctx[i].name);
> > + kfree(udev->ctx);
> > + kfree(udev->msix);
> > + pci_disable_device(pdev);
> > +
> > + pci_set_drvdata(pdev, NULL);
> > + kfree(udev);
> > +}
> > +
> > +static struct pci_driver uio_msi_pci_driver = {
> > + .name = "uio_msi",
> > + .probe = uio_msi_probe,
> > + .remove = uio_msi_remove,
> > +};
> > +
> > +module_pci_driver(uio_msi_pci_driver);
> > +MODULE_VERSION(DRIVER_VERSION);
> > +MODULE_LICENSE("GPL v2");
> > +MODULE_AUTHOR("Stephen Hemminger <stephen@networkplumber.org>");
> > +MODULE_DESCRIPTION("UIO driver for MSI PCI devices");
> > diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
> > index f7b2db4..d9497691 100644
> > --- a/include/uapi/linux/Kbuild
> > +++ b/include/uapi/linux/Kbuild
> > @@ -411,6 +411,7 @@ header-y += udp.h
> > header-y += uhid.h
> > header-y += uinput.h
> > header-y += uio.h
> > +header-y += uio_msi.h
> > header-y += ultrasound.h
> > header-y += un.h
> > header-y += unistd.h
> > diff --git a/include/uapi/linux/uio_msi.h b/include/uapi/linux/uio_msi.h
> > new file mode 100644
> > index 0000000..297de00
> > --- /dev/null
> > +++ b/include/uapi/linux/uio_msi.h
> > @@ -0,0 +1,22 @@
> > +/*
> > + * UIO_MSI API definition
> > + *
> > + * Copyright (c) 2015 by Brocade Communications Systems, Inc.
> > + * All rights reserved.
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License version 2 as
> > + * published by the Free Software Foundation.
> > + */
> > +#ifndef _UIO_PCI_MSI_H
> > +#define _UIO_PCI_MSI_H
> > +
> > +struct uio_msi_irq_set {
> > + u32 vec;
> > + int fd;
> > +};
> > +
> > +#define UIO_MSI_BASE 0x86
> > +#define UIO_MSI_IRQ_SET _IOW('I', UIO_MSI_BASE+1, struct uio_msi_irq_set)
> > +
> > +#endif
> > --
> > 2.1.4
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [dpdk-dev] [PATCH 2/2] uio: new driver to support PCI MSI-X
2015-10-01 16:31 ` Michael S. Tsirkin
@ 2015-10-01 17:26 ` Stephen Hemminger
2015-10-01 18:25 ` Michael S. Tsirkin
0 siblings, 1 reply; 44+ messages in thread
From: Stephen Hemminger @ 2015-10-01 17:26 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: dev, hjk, gregkh, linux-kernel
On Thu, 1 Oct 2015 19:31:08 +0300
"Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Thu, Oct 01, 2015 at 11:33:06AM +0300, Michael S. Tsirkin wrote:
> > On Wed, Sep 30, 2015 at 03:28:58PM -0700, Stephen Hemminger wrote:
> > > This driver allows using PCI device with Message Signalled Interrupt
> > > from userspace. The API is similar to the igb_uio driver used by the DPDK.
> > > Via ioctl it provides a mechanism to map MSI-X interrupts into event
> > > file descriptors similar to VFIO.
> > >
> > > VFIO is a better choice if IOMMU is available, but often userspace drivers
> > > have to work in environments where IOMMU support (real or emulated) is
> > > not available. All UIO drivers that support DMA are not secure against
> > > rogue userspace applications programming DMA hardware to access
> > > private memory; this driver is no less secure than existing code.
> > >
> > > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> >
> > I don't think copying the igb_uio interface is a good idea.
> > What DPDK is doing with igb_uio (and indeed uio_pci_generic)
> > is abusing the sysfs BAR access to provide unlimited
> > access to hardware.
> >
> > MSI messages are memory writes so any generic device capable
> > of MSI is capable of corrupting kernel memory.
> > This means that a bug in userspace will lead to kernel memory corruption
> > and crashes. This is something distributions can't support.
> >
> > uio_pci_generic is already abused like that, mostly
> > because when I wrote it, I didn't add enough protections
> > against using it with DMA capable devices,
> > and we can't go back and break working userspace.
> > But at least it does not bind to VFs which all of
> > them are capable of DMA.
> >
> > The result of merging this driver will be userspace abusing the
> > sysfs BAR access with VFs as well, and we do not want that.
> >
> >
> > Just forwarding events is not enough to make a valid driver.
> > What is missing is a way to access the device in a safe way.
> >
> > On a more positive note:
> >
> > What would be a reasonable interface? One that does the following
> > in kernel:
> >
> > 1. initializes device rings (can be in pinned userspace memory,
> > but can not be writeable by userspace), brings up interface link
> > 2. pins userspace memory (unless using e.g. hugetlbfs)
> > 3. gets request, make sure it's valid and belongs to
> > the correct task, put it in the ring
> > 4. in the reverse direction, notify userspace when buffers
> > are available in the ring
> > 5. notify userspace about MSI (what this driver does)
> >
> > What userspace can be allowed to do:
> >
> > format requests (e.g. transmit, receive) in userspace
> > read ring contents
> >
> > What userspace can't be allowed to do:
> >
> > access BAR
> > write rings
> >
> >
> > This means that the driver can not be a generic one,
> > and there will be a system call overhead when you
> > write the ring, but that's the price you have to
> > pay for ability to run on systems without an IOMMU.
> >
>
>
> The device specific parts can be taken from John Fastabend's patches
> BTW:
>
> https://patchwork.ozlabs.org/patch/396713/
>
> IIUC what was missing there was exactly the memory protection
> we are looking for here.
The bifuricated drivers are interesting from an architecture
point of view, but do nothing to solve the immediate use case.
The problem is not on bare metal environment, most of those already have IOMMU.
The issues are on environments like VMWare with SRIOV or vmxnet3,
neither of those are really helped by bifirucated driver or VFIO.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [dpdk-dev] [PATCH 2/2] uio: new driver to support PCI MSI-X
2015-10-01 17:26 ` Stephen Hemminger
@ 2015-10-01 18:25 ` Michael S. Tsirkin
0 siblings, 0 replies; 44+ messages in thread
From: Michael S. Tsirkin @ 2015-10-01 18:25 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: dev, hjk, gregkh, linux-kernel
On Thu, Oct 01, 2015 at 10:26:19AM -0700, Stephen Hemminger wrote:
> On Thu, 1 Oct 2015 19:31:08 +0300
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
>
> > On Thu, Oct 01, 2015 at 11:33:06AM +0300, Michael S. Tsirkin wrote:
> > > On Wed, Sep 30, 2015 at 03:28:58PM -0700, Stephen Hemminger wrote:
> > > > This driver allows using PCI device with Message Signalled Interrupt
> > > > from userspace. The API is similar to the igb_uio driver used by the DPDK.
> > > > Via ioctl it provides a mechanism to map MSI-X interrupts into event
> > > > file descriptors similar to VFIO.
> > > >
> > > > VFIO is a better choice if IOMMU is available, but often userspace drivers
> > > > have to work in environments where IOMMU support (real or emulated) is
> > > > not available. All UIO drivers that support DMA are not secure against
> > > > rogue userspace applications programming DMA hardware to access
> > > > private memory; this driver is no less secure than existing code.
> > > >
> > > > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> > >
> > > I don't think copying the igb_uio interface is a good idea.
> > > What DPDK is doing with igb_uio (and indeed uio_pci_generic)
> > > is abusing the sysfs BAR access to provide unlimited
> > > access to hardware.
> > >
> > > MSI messages are memory writes so any generic device capable
> > > of MSI is capable of corrupting kernel memory.
> > > This means that a bug in userspace will lead to kernel memory corruption
> > > and crashes. This is something distributions can't support.
> > >
> > > uio_pci_generic is already abused like that, mostly
> > > because when I wrote it, I didn't add enough protections
> > > against using it with DMA capable devices,
> > > and we can't go back and break working userspace.
> > > But at least it does not bind to VFs which all of
> > > them are capable of DMA.
> > >
> > > The result of merging this driver will be userspace abusing the
> > > sysfs BAR access with VFs as well, and we do not want that.
> > >
> > >
> > > Just forwarding events is not enough to make a valid driver.
> > > What is missing is a way to access the device in a safe way.
> > >
> > > On a more positive note:
> > >
> > > What would be a reasonable interface? One that does the following
> > > in kernel:
> > >
> > > 1. initializes device rings (can be in pinned userspace memory,
> > > but can not be writeable by userspace), brings up interface link
> > > 2. pins userspace memory (unless using e.g. hugetlbfs)
> > > 3. gets request, make sure it's valid and belongs to
> > > the correct task, put it in the ring
> > > 4. in the reverse direction, notify userspace when buffers
> > > are available in the ring
> > > 5. notify userspace about MSI (what this driver does)
> > >
> > > What userspace can be allowed to do:
> > >
> > > format requests (e.g. transmit, receive) in userspace
> > > read ring contents
> > >
> > > What userspace can't be allowed to do:
> > >
> > > access BAR
> > > write rings
> > >
> > >
> > > This means that the driver can not be a generic one,
> > > and there will be a system call overhead when you
> > > write the ring, but that's the price you have to
> > > pay for ability to run on systems without an IOMMU.
> > >
> >
> >
> > The device specific parts can be taken from John Fastabend's patches
> > BTW:
> >
> > https://patchwork.ozlabs.org/patch/396713/
> >
> > IIUC what was missing there was exactly the memory protection
> > we are looking for here.
>
> The bifuricated drivers are interesting from an architecture
> point of view, but do nothing to solve the immediate use case.
> The problem is not on bare metal environment, most of those already have IOMMU.
> The issues are on environments like VMWare with SRIOV or vmxnet3,
> neither of those are really helped by bifirucated driver or VFIO.
Two points I tried to make (and apparently failed, so I'm trying again,
more verbosely):
- bifurcated drivers do DMA into both kernel and userspace
memory from same pci address (bus/dev/fn). As IOMMU uses this
source address to validated accesses, there is no way to
have IOMMU prevent userspace from accessing kernel memory.
If you are prepared to use dynamic mappings for kernel
memory, it might be possible to limit the harm that userspace
can do, but this will slow down kernel networking
(changing IOMMU mappings is expensive) and userspace will
likely be able to at least disrupt kernel networking.
So what I am discussing might still have value there.
- bifurcated drivers have code to bring up link and map rings into
userspace (they also map other rings into kernel, and tweak rx filter
in hardware, that might not be necessary for this usecase).
What I proposed above can use that code, with
the twist that the RX ring is made RO for userspace, and a system call
to safely copy from userspace ring there is supported.
In other words, here's the device specific part in
kernel that you wanted to use, which will only
need some tweaks.
--
MST
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [dpdk-dev] [PATCH 2/2] uio: new driver to support PCI MSI-X
2015-10-01 8:33 ` Michael S. Tsirkin
` (2 preceding siblings ...)
2015-10-01 16:31 ` Michael S. Tsirkin
@ 2015-10-05 21:54 ` Michael S. Tsirkin
2015-10-05 22:09 ` Vladislav Zolotarov
3 siblings, 1 reply; 44+ messages in thread
From: Michael S. Tsirkin @ 2015-10-05 21:54 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: dev, hjk, gregkh, linux-kernel
On Thu, Oct 01, 2015 at 11:33:06AM +0300, Michael S. Tsirkin wrote:
> Just forwarding events is not enough to make a valid driver.
> What is missing is a way to access the device in a safe way.
Thinking about it some more, maybe some devices don't do DMA, and merely
signal events with MSI/MSI-X.
The fact you mention igb_uio in the cover letter seems to hint that this
isn't the case, and that the real intent is to abuse it for DMA-capable
devices, but still ...
If we assume such a simple device, we need to block userspace from
tweaking at least the MSI control and the MSI-X table.
And changing BARs might make someone else corrupt the MSI-X
table, so we need to block it from changing BARs, too.
Things like device reset will clear the table. I guess this means we
need to track access to reset, too, make sure we restore the
table to a sane config.
PM capability can be used to reset things tooI think. Better be
careful about that.
And a bunch of devices could be doing weird things that need
to be special-cased.
All of this is what VFIO is already dealing with.
Maybe extending VFIO for this usecase, or finding another way to share
code might be a better idea than duplicating the code within uio?
--
MST
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [dpdk-dev] [PATCH 2/2] uio: new driver to support PCI MSI-X
2015-10-05 21:54 ` Michael S. Tsirkin
@ 2015-10-05 22:09 ` Vladislav Zolotarov
2015-10-05 22:49 ` Michael S. Tsirkin
0 siblings, 1 reply; 44+ messages in thread
From: Vladislav Zolotarov @ 2015-10-05 22:09 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: dev, hjk, gregkh, linux-kernel
On Oct 6, 2015 12:55 AM, "Michael S. Tsirkin" <mst@redhat.com> wrote:
>
> On Thu, Oct 01, 2015 at 11:33:06AM +0300, Michael S. Tsirkin wrote:
> > Just forwarding events is not enough to make a valid driver.
> > What is missing is a way to access the device in a safe way.
>
> Thinking about it some more, maybe some devices don't do DMA, and merely
> signal events with MSI/MSI-X.
>
> The fact you mention igb_uio in the cover letter seems to hint that this
> isn't the case, and that the real intent is to abuse it for DMA-capable
> devices, but still ...
>
> If we assume such a simple device, we need to block userspace from
> tweaking at least the MSI control and the MSI-X table.
> And changing BARs might make someone else corrupt the MSI-X
> table, so we need to block it from changing BARs, too.
>
> Things like device reset will clear the table. I guess this means we
> need to track access to reset, too, make sure we restore the
> table to a sane config.
>
> PM capability can be used to reset things tooI think. Better be
> careful about that.
>
> And a bunch of devices could be doing weird things that need
> to be special-cased.
>
> All of this is what VFIO is already dealing with.
>
> Maybe extending VFIO for this usecase, or finding another way to share
> code might be a better idea than duplicating the code within uio?
How about instead of trying to invent the wheel just go and attack the
problem directly just like i've proposed already a few times in the last
days: instead of limiting the UIO limit the users that are allowed to use
UIO to privileged users only (e.g. root). This would solve all clearly
unresolvable issues u are raising here all together, wouldn't it?
>
> --
> MST
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [dpdk-dev] [PATCH 2/2] uio: new driver to support PCI MSI-X
2015-10-05 22:09 ` Vladislav Zolotarov
@ 2015-10-05 22:49 ` Michael S. Tsirkin
2015-10-06 7:33 ` Stephen Hemminger
2015-10-06 8:23 ` Vlad Zolotarov
0 siblings, 2 replies; 44+ messages in thread
From: Michael S. Tsirkin @ 2015-10-05 22:49 UTC (permalink / raw)
To: Vladislav Zolotarov; +Cc: dev, hjk, gregkh, linux-kernel
On Tue, Oct 06, 2015 at 01:09:55AM +0300, Vladislav Zolotarov wrote:
> How about instead of trying to invent the wheel just go and attack the problem
> directly just like i've proposed already a few times in the last days: instead
> of limiting the UIO limit the users that are allowed to use UIO to privileged
> users only (e.g. root). This would solve all clearly unresolvable issues u are
> raising here all together, wouldn't it?
No - root or no root, if the user can modify the addresses in the MSI-X
table and make the chip corrupt random memory, this is IMHO a non-starter.
And tainting kernel is not a solution - your patch adds a pile of
code that either goes completely unused or taints the kernel.
Not just that - it's a dedicated userspace API that either
goes completely unused or taints the kernel.
> >
> > --
> > MST
>
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [dpdk-dev] [PATCH 2/2] uio: new driver to support PCI MSI-X
2015-10-05 22:49 ` Michael S. Tsirkin
@ 2015-10-06 7:33 ` Stephen Hemminger
2015-10-06 12:15 ` Avi Kivity
2015-10-06 13:42 ` Michael S. Tsirkin
2015-10-06 8:23 ` Vlad Zolotarov
1 sibling, 2 replies; 44+ messages in thread
From: Stephen Hemminger @ 2015-10-06 7:33 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: dev, hjk, gregkh, linux-kernel
Other than implementation objections, so far the two main arguments
against this reduce to:
1. If you allow UIO ioctl then it opens an API hook for all the crap out
of tree UIO drivers to do what they want.
2. If you allow UIO MSI-X then you are expanding the usage of userspace
device access in an insecure manner.
Another alternative which I explored was making a version of VFIO that
works without IOMMU. It solves #1 but actually increases the likely negative
response to arguent #2. This would keep same API, and avoid having to
modify UIO. But we would still have the same (if not more resistance)
from IOMMU developers who believe all systems have to be secure against
root.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [dpdk-dev] [PATCH 2/2] uio: new driver to support PCI MSI-X
2015-10-06 7:33 ` Stephen Hemminger
@ 2015-10-06 12:15 ` Avi Kivity
2015-10-06 14:07 ` Michael S. Tsirkin
2015-10-16 17:11 ` Thomas Monjalon
2015-10-06 13:42 ` Michael S. Tsirkin
1 sibling, 2 replies; 44+ messages in thread
From: Avi Kivity @ 2015-10-06 12:15 UTC (permalink / raw)
To: Stephen Hemminger, Michael S. Tsirkin; +Cc: dev, hjk, gregkh, linux-kernel
On 10/06/2015 10:33 AM, Stephen Hemminger wrote:
> Other than implementation objections, so far the two main arguments
> against this reduce to:
> 1. If you allow UIO ioctl then it opens an API hook for all the crap out
> of tree UIO drivers to do what they want.
> 2. If you allow UIO MSI-X then you are expanding the usage of userspace
> device access in an insecure manner.
>
> Another alternative which I explored was making a version of VFIO that
> works without IOMMU. It solves #1 but actually increases the likely negative
> response to arguent #2. This would keep same API, and avoid having to
> modify UIO. But we would still have the same (if not more resistance)
> from IOMMU developers who believe all systems have to be secure against
> root.
vfio's charter was explicitly aiming for modern setups with iommus.
This could be revisited, but I agree it will have even more resistance,
justified IMO.
btw, (2) doesn't really add any insecurity. The user could already poke
at the msix tables (as well as perform DMA); they just couldn't get a
useful interrupt out of them.
Maybe a module parameter "allow_insecure_dma" can be added to
uio_pci_generic. Without the parameter, bus mastering and msix is
disabled, with the parameter it is allowed. This requires the sysadmin
to take a positive step in order to make use of their hardware.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [dpdk-dev] [PATCH 2/2] uio: new driver to support PCI MSI-X
2015-10-06 12:15 ` Avi Kivity
@ 2015-10-06 14:07 ` Michael S. Tsirkin
2015-10-06 15:41 ` Avi Kivity
2015-10-16 17:11 ` Thomas Monjalon
1 sibling, 1 reply; 44+ messages in thread
From: Michael S. Tsirkin @ 2015-10-06 14:07 UTC (permalink / raw)
To: Avi Kivity; +Cc: dev, hjk, gregkh, linux-kernel
On Tue, Oct 06, 2015 at 03:15:57PM +0300, Avi Kivity wrote:
> btw, (2) doesn't really add any insecurity. The user could already poke at
> the msix tables (as well as perform DMA); they just couldn't get a useful
> interrupt out of them.
Poking at msix tables won't cause memory corruption unless msix and bus
mastering is enabled. It's true root can enable msix and bus mastering
through sysfs - but that's easy to block or detect. Even if you don't
buy a security story, it seems less likely to trigger as a result
of a userspace bug.
--
MST
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [dpdk-dev] [PATCH 2/2] uio: new driver to support PCI MSI-X
2015-10-06 14:07 ` Michael S. Tsirkin
@ 2015-10-06 15:41 ` Avi Kivity
0 siblings, 0 replies; 44+ messages in thread
From: Avi Kivity @ 2015-10-06 15:41 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: dev, hjk, gregkh, linux-kernel
On 10/06/2015 05:07 PM, Michael S. Tsirkin wrote:
> On Tue, Oct 06, 2015 at 03:15:57PM +0300, Avi Kivity wrote:
>> btw, (2) doesn't really add any insecurity. The user could already poke at
>> the msix tables (as well as perform DMA); they just couldn't get a useful
>> interrupt out of them.
> Poking at msix tables won't cause memory corruption unless msix and bus
> mastering is enabled.
It's a given that bus mastering is enabled. It's true that msix is
unlikely to be enabled, unless msix support is added.
> It's true root can enable msix and bus mastering
> through sysfs - but that's easy to block or detect. Even if you don't
> buy a security story, it seems less likely to trigger as a result
> of a userspace bug.
If you're doing DMA, that's the least of your worries.
Still, zero-mapping the msix space seems reasonable, and can protect
userspace from silly stuff. It can't be considered to have anything to
do with security though, as long as users can simply DMA to every bit of
RAM in the system they want to.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [dpdk-dev] [PATCH 2/2] uio: new driver to support PCI MSI-X
2015-10-06 12:15 ` Avi Kivity
2015-10-06 14:07 ` Michael S. Tsirkin
@ 2015-10-16 17:11 ` Thomas Monjalon
2015-10-16 17:20 ` Stephen Hemminger
1 sibling, 1 reply; 44+ messages in thread
From: Thomas Monjalon @ 2015-10-16 17:11 UTC (permalink / raw)
To: Michael S. Tsirkin, linux-kernel; +Cc: Avi Kivity, dev, hjk, gregkh
To sum it up,
We want to remove the need of the out-of-tree module igb_uio.
3 possible implementations were discussed so far:
- new UIO driver
- extend uio_pci_generic
- VFIO without IOMMU
It is preferred to avoid creating yet another module to support.
That's why the uio_pci_generic extension would be nice.
In my understanding, there are currently 2 issues with the patches
from Vlad and Stephen:
- IRQ must be mapped to a fd without using a new ioctl
- MSI-X handling in userspace breaks the memory protection
I'm confident the first issue can be fixed with something like sysfs.
About the "security" concern, mainly expressed by MST, I think the idea
of Avi (below) deserves to be discussed.
2015-10-06 15:15, Avi Kivity:
> On 10/06/2015 10:33 AM, Stephen Hemminger wrote:
> > Other than implementation objections, so far the two main arguments
> > against this reduce to:
> > 1. If you allow UIO ioctl then it opens an API hook for all the crap out
> > of tree UIO drivers to do what they want.
> > 2. If you allow UIO MSI-X then you are expanding the usage of userspace
> > device access in an insecure manner.
[...]
> btw, (2) doesn't really add any insecurity. The user could already poke
> at the msix tables (as well as perform DMA); they just couldn't get a
> useful interrupt out of them.
>
> Maybe a module parameter "allow_insecure_dma" can be added to
> uio_pci_generic. Without the parameter, bus mastering and msix is
> disabled, with the parameter it is allowed. This requires the sysadmin
> to take a positive step in order to make use of their hardware.
Giving the control of the memory protection level to the distribution or
the administrator looks a good idea.
When allowing insecure DMA, a log will make clear how it is supported
-or not- by the system provider.
>From another thread:
2015-10-01 14:09, Michael S. Tsirkin:
> If Linux keeps enabling hacks, no one will bother doing the right thing.
> Upstream inclusion is the only carrot Linux has to make people do the
> right thing.
The "right thing" should be guided by the users needs at a given time.
The "carrot" for a better solution will be to have a well protected system.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [dpdk-dev] [PATCH 2/2] uio: new driver to support PCI MSI-X
2015-10-16 17:11 ` Thomas Monjalon
@ 2015-10-16 17:20 ` Stephen Hemminger
0 siblings, 0 replies; 44+ messages in thread
From: Stephen Hemminger @ 2015-10-16 17:20 UTC (permalink / raw)
To: Thomas Monjalon
Cc: Avi Kivity, Michael S. Tsirkin, dev, linux-kernel, hjk, gregkh
On Fri, 16 Oct 2015 19:11:35 +0200
Thomas Monjalon <thomas.monjalon@6wind.com> wrote:
> To sum it up,
> We want to remove the need of the out-of-tree module igb_uio.
> 3 possible implementations were discussed so far:
> - new UIO driver
> - extend uio_pci_generic
> - VFIO without IOMMU
There is recent progress on VFIO without IOMMU. This looks the most promising
long term solution.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [dpdk-dev] [PATCH 2/2] uio: new driver to support PCI MSI-X
2015-10-06 7:33 ` Stephen Hemminger
2015-10-06 12:15 ` Avi Kivity
@ 2015-10-06 13:42 ` Michael S. Tsirkin
1 sibling, 0 replies; 44+ messages in thread
From: Michael S. Tsirkin @ 2015-10-06 13:42 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: dev, hjk, gregkh, linux-kernel
On Tue, Oct 06, 2015 at 08:33:56AM +0100, Stephen Hemminger wrote:
> Other than implementation objections, so far the two main arguments
> against this reduce to:
> 1. If you allow UIO ioctl then it opens an API hook for all the crap out
> of tree UIO drivers to do what they want.
> 2. If you allow UIO MSI-X then you are expanding the usage of userspace
> device access in an insecure manner.
That's not all. Without MSI one can detect insecure usage by detecting
userspace enabling bus mastering. This can be detected simply using
lspci. Or one can also imagine a configuration where this ability is
disabled, is logged, or taints kernel. This seems like something that
might be worth having for some locked-down systems.
OTOH enabling MSI requires enabling bus mastering so suddenly we have no
idea whether device can be/is used in a safe way.
>
> Another alternative which I explored was making a version of VFIO that
> works without IOMMU. It solves #1 but actually increases the likely negative
> response to arguent #2.
No - because VFIO has limited protection against device misuse by
userspace, by limiting access to sub-ranges of device BARs and config
space. For a device that doesn't do DMA, that will be enough to make it
secure to use.
That's a pretty weak excuse to support userspace drivers for PCI devices
without an IOMMU, but it's the best I heard so far.
Is that worth the security trade-off? I'm still not sure.
> This would keep same API, and avoid having to
> modify UIO. But we would still have the same (if not more resistance)
> from IOMMU developers who believe all systems have to be secure against
> root.
"Secure against root" is a confusing way to put it IMHO. We are talking
about memory protection.
So that's not IOMMU developers IIUC. I believe most kernel developers will
agree it's not a good idea to let userspace corrupt kernel memory.
Otherwise, the driver can't be supported, and maintaining upstream
drivers that can't be supported serves no useful purpose. Anyone can
load out of tree ones just as well.
VFIO already supports MSI so VFIO developers already have a lot of
experience with these issues. Getting their input would be valuable.
--
MST
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [dpdk-dev] [PATCH 2/2] uio: new driver to support PCI MSI-X
2015-10-05 22:49 ` Michael S. Tsirkin
2015-10-06 7:33 ` Stephen Hemminger
@ 2015-10-06 8:23 ` Vlad Zolotarov
2015-10-06 13:58 ` Michael S. Tsirkin
1 sibling, 1 reply; 44+ messages in thread
From: Vlad Zolotarov @ 2015-10-06 8:23 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: dev, hjk, gregkh, linux-kernel
On 10/06/15 01:49, Michael S. Tsirkin wrote:
> On Tue, Oct 06, 2015 at 01:09:55AM +0300, Vladislav Zolotarov wrote:
>> How about instead of trying to invent the wheel just go and attack the problem
>> directly just like i've proposed already a few times in the last days: instead
>> of limiting the UIO limit the users that are allowed to use UIO to privileged
>> users only (e.g. root). This would solve all clearly unresolvable issues u are
>> raising here all together, wouldn't it?
> No - root or no root, if the user can modify the addresses in the MSI-X
> table and make the chip corrupt random memory, this is IMHO a non-starter.
Michael, how this or any other related patch is related to the problem u
r describing? The above ability is there for years and if memory serves
me well it was u who wrote uio_pci_generic with this "security flaw". ;)
This patch in general only adds the ability to receive notifications per
MSI-X interrupt and it has nothing to do with the ability to reprogram
the MSI-X related registers from the user space which was always there.
>
> And tainting kernel is not a solution - your patch adds a pile of
> code that either goes completely unused or taints the kernel.
> Not just that - it's a dedicated userspace API that either
> goes completely unused or taints the kernel.
>
>>> --
>>> MST
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [dpdk-dev] [PATCH 2/2] uio: new driver to support PCI MSI-X
2015-10-06 8:23 ` Vlad Zolotarov
@ 2015-10-06 13:58 ` Michael S. Tsirkin
2015-10-06 14:49 ` Vlad Zolotarov
0 siblings, 1 reply; 44+ messages in thread
From: Michael S. Tsirkin @ 2015-10-06 13:58 UTC (permalink / raw)
To: Vlad Zolotarov; +Cc: dev, hjk, gregkh, linux-kernel
On Tue, Oct 06, 2015 at 11:23:11AM +0300, Vlad Zolotarov wrote:
> Michael, how this or any other related patch is related to the problem u r
> describing?
> The above ability is there for years and if memory serves me
> well it was u who wrote uio_pci_generic with this "security flaw". ;)
I answered all this already.
This patch enables bus mastering, enables MSI or MSI-X, and requires
userspace to map the MSI-X table and read/write the config space.
This means that a single userspace bug is enough to corrupt kernel
memory.
uio_pci_generic does not enable bus mastering or MSI, and
it might be a good idea to have uio_pci_generic block
access to MSI/MSI-X config.
--
MST
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [dpdk-dev] [PATCH 2/2] uio: new driver to support PCI MSI-X
2015-10-06 13:58 ` Michael S. Tsirkin
@ 2015-10-06 14:49 ` Vlad Zolotarov
2015-10-06 15:00 ` Michael S. Tsirkin
0 siblings, 1 reply; 44+ messages in thread
From: Vlad Zolotarov @ 2015-10-06 14:49 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: dev, hjk, gregkh, linux-kernel
On 10/06/15 16:58, Michael S. Tsirkin wrote:
> On Tue, Oct 06, 2015 at 11:23:11AM +0300, Vlad Zolotarov wrote:
>> Michael, how this or any other related patch is related to the problem u r
>> describing?
>> The above ability is there for years and if memory serves me
>> well it was u who wrote uio_pci_generic with this "security flaw". ;)
> I answered all this already.
>
> This patch enables bus mastering, enables MSI or MSI-X
This may be done from the user space right now without this patch...
> , and requires
> userspace to map the MSI-X table
Hmmm... I must have missed this requirement. Could u, pls., clarify?
From what I see, MSI/MSI-X table is configured completely in the kernel
here...
> and read/write the config space.
> This means that a single userspace bug is enough to corrupt kernel
> memory.
Could u, pls., provide and example of this simple bug? Because it's
absolutely not obvious...
>
> uio_pci_generic does not enable bus mastering or MSI, and
> it might be a good idea to have uio_pci_generic block
> access to MSI/MSI-X config.
Since device bars may be mapped bypassing the UIO/uio_pci_generic - this
won't solve any issue.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [dpdk-dev] [PATCH 2/2] uio: new driver to support PCI MSI-X
2015-10-06 14:49 ` Vlad Zolotarov
@ 2015-10-06 15:00 ` Michael S. Tsirkin
2015-10-06 16:40 ` Vlad Zolotarov
0 siblings, 1 reply; 44+ messages in thread
From: Michael S. Tsirkin @ 2015-10-06 15:00 UTC (permalink / raw)
To: Vlad Zolotarov; +Cc: dev, hjk, gregkh, linux-kernel
On Tue, Oct 06, 2015 at 05:49:21PM +0300, Vlad Zolotarov wrote:
> >and read/write the config space.
> >This means that a single userspace bug is enough to corrupt kernel
> >memory.
>
> Could u, pls., provide and example of this simple bug? Because it's
> absolutely not obvious...
Stick a value that happens to match a kernel address in Msg Addr field
in an unmasked MSI-X entry.
--
MST
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [dpdk-dev] [PATCH 2/2] uio: new driver to support PCI MSI-X
2015-10-06 15:00 ` Michael S. Tsirkin
@ 2015-10-06 16:40 ` Vlad Zolotarov
0 siblings, 0 replies; 44+ messages in thread
From: Vlad Zolotarov @ 2015-10-06 16:40 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: dev, hjk, gregkh, linux-kernel
On 10/06/15 18:00, Michael S. Tsirkin wrote:
> On Tue, Oct 06, 2015 at 05:49:21PM +0300, Vlad Zolotarov wrote:
>>> and read/write the config space.
>>> This means that a single userspace bug is enough to corrupt kernel
>>> memory.
>> Could u, pls., provide and example of this simple bug? Because it's
>> absolutely not obvious...
> Stick a value that happens to match a kernel address in Msg Addr field
> in an unmasked MSI-X entry.
This patch neither configures MSI-X entries in the user space nor
provides additional means to do so therefore this "sticking" would be a
matter of some extra code that is absolutely unrelated to this patch.
So, this example seems absolutely irrelevant to this particular discussion.
thanks,
vlad
>
^ permalink raw reply [flat|nested] 44+ messages in thread