Date: Thu, 1 Oct 2015 18:22:51 +0300
From: "Michael S. Tsirkin"
To: Stephen Hemminger
Cc: dev@dpdk.org, hjk@hansjkoch.de, gregkh@linux-foundation.org, linux-kernel@vger.kernel.org
Subject: Re: [dpdk-dev] [PATCH 2/2] uio: new driver to support PCI MSI-X
Message-ID: <20151001152251.GA25009@redhat.com>
In-Reply-To: <20151001075037.61c43f63@urahara>
References: <1443652138-31782-1-git-send-email-stephen@networkplumber.org> <1443652138-31782-3-git-send-email-stephen@networkplumber.org> <20151001104505-mutt-send-email-mst@redhat.com> <20151001075037.61c43f63@urahara>

On Thu, Oct 01, 2015 at 07:50:37AM -0700, Stephen Hemminger wrote:
> On Thu, 1 Oct 2015 11:33:06 +0300
> "Michael S. Tsirkin" wrote:
>
> > On Wed, Sep 30, 2015 at 03:28:58PM -0700, Stephen Hemminger wrote:
> > > This driver allows using PCI device with Message Signalled Interrupt
> > > from userspace. The API is similar to the igb_uio driver used by the DPDK.
> > > Via ioctl it provides a mechanism to map MSI-X interrupts into event
> > > file descriptors similar to VFIO.
> > >
> > > VFIO is a better choice if IOMMU is available, but often userspace drivers
> > > have to work in environments where IOMMU support (real or emulated) is
> > > not available. All UIO drivers that support DMA are not secure against
> > > rogue userspace applications programming DMA hardware to access
> > > private memory; this driver is no less secure than existing code.
> > >
> > > Signed-off-by: Stephen Hemminger
> >
> > I don't think copying the igb_uio interface is a good idea.
> > What DPDK is doing with igb_uio (and indeed uio_pci_generic)
> > is abusing the sysfs BAR access to provide unlimited
> > access to hardware.
> >
> > MSI messages are memory writes so any generic device capable
> > of MSI is capable of corrupting kernel memory.
> > This means that a bug in userspace will lead to kernel memory corruption
> > and crashes. This is something distributions can't support.
> >
> > uio_pci_generic is already abused like that, mostly
> > because when I wrote it, I didn't add enough protections
> > against using it with DMA capable devices,
> > and we can't go back and break working userspace.
> > But at least it does not bind to VFs, all of which
> > are capable of DMA.
> >
> > The result of merging this driver will be userspace abusing the
> > sysfs BAR access with VFs as well, and we do not want that.
> >
> >
> > Just forwarding events is not enough to make a valid driver.
> > What is missing is a way to access the device in a safe way.
> >
> > On a more positive note:
> >
> > What would be a reasonable interface? One that does the following
> > in kernel:
> >
> > 1. initializes device rings (can be in pinned userspace memory,
> >    but can not be writeable by userspace), brings up interface link
> > 2. pins userspace memory (unless using e.g. hugetlbfs)
> > 3. gets a request, makes sure it's valid and belongs to
> >    the correct task, puts it in the ring
> > 4. in the reverse direction, notifies userspace when buffers
> >    are available in the ring
> > 5. notifies userspace about MSI (what this driver does)
> >
> > What userspace can be allowed to do:
> >
> >	format requests (e.g. transmit, receive) in userspace
> >	read ring contents
> >
> > What userspace can't be allowed to do:
> >
> >	access BAR
> >	write rings
> >
> >
> > This means that the driver can not be a generic one,
> > and there will be a system call overhead when you
> > write the ring, but that's the price you have to
> > pay for the ability to run on systems without an IOMMU.
>
> I think I understand what you are proposing, but it really doesn't
> fit into the high speed userspace networking model.

I'm aware that currently the model does everything, including
bringing up the link, in userspace.
But there's really no justification for this.
Only data path things should be in userspace.

A userspace bug should not be able to do things like overwriting the
on-device EEPROM.

> 1. Device rings are device specific, can't be in a generic driver.

So that's more work, and it is not going to happen if people
can get by with insecure hacks.

> 2. DPDK uses huge memory.

Hugetlbfs? Don't see why this is an issue. Might make things simpler.

> 3. Performance requires all ring requests be done in pure userspace,
>    (ie no syscalls)

Make only the TX ring writeable then. At least you won't be able
to corrupt the kernel memory.

> 4. Ditto, can't have kernel to userspace notification per packet

RX ring can be read-only, so userspace can read it directly.

-- 
MST
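
Illustration of step 3 in the list above (get a request, make sure it is
valid, put it in the ring): a minimal sketch in plain C, not code from the
patch or from any existing driver. The descriptor layout, RING_SIZE, and the
names pinned_region, request_is_valid and ring_submit are all hypothetical;
real rings are device specific, and a real implementation would also have to
check that the request belongs to the calling task. The point is only that
the ring is written by trusted code after a bounds check against the pinned
memory, so a userspace bug cannot point the device at arbitrary addresses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8u /* hypothetical; must be a power of two */

/* Hypothetical descriptor layout; real rings are device specific. */
struct desc {
    uint64_t addr; /* start of the buffer the device will access */
    uint32_t len;  /* buffer length in bytes */
};

/* Memory the task has pinned and may legitimately use for DMA. */
struct pinned_region {
    uint64_t base;
    uint64_t size;
};

/* Ring written only by the trusted side; userspace would map it read-only. */
struct ring {
    struct desc slots[RING_SIZE];
    uint32_t head; /* next slot the producer fills */
    uint32_t tail; /* next slot the device consumes */
};

/* True if [addr, addr + len) lies entirely inside the pinned region.
 * Written so that addr + len is never computed and cannot overflow. */
bool request_is_valid(const struct pinned_region *r, uint64_t addr,
                      uint32_t len)
{
    if (len == 0 || addr < r->base)
        return false;
    uint64_t off = addr - r->base;
    return off < r->size && len <= r->size - off;
}

/* Validate a request and, if acceptable, place it in the ring.
 * Returns 0 on success, -1 for an invalid buffer, -2 if the ring is full. */
int ring_submit(struct ring *ring, const struct pinned_region *r,
                uint64_t addr, uint32_t len)
{
    assert((RING_SIZE & (RING_SIZE - 1)) == 0);

    if (!request_is_valid(r, addr, len))
        return -1;
    if (ring->head - ring->tail == RING_SIZE)
        return -2;

    ring->slots[ring->head & (RING_SIZE - 1)] =
        (struct desc){ .addr = addr, .len = len };
    ring->head++;
    return 0;
}
```

In this sketch the validation costs one call per request, which is the
system call overhead mentioned above; making only the TX ring writeable
trades that check away for speed.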