From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
In-Reply-To: <1511891412.2692.95.camel@intel.com>
References:
 <1483044080.11975.1.camel@intel.com> <1483565664.9482.3.camel@intel.com>
 <6c6766f0-145e-9354-e275-d107d69173c3@intel.com> <2179627.cU6MQpMJOa@xps>
 <1511805495.2692.82.camel@intel.com> <1511891412.2692.95.camel@intel.com>
From: Alejandro Lucero
Date: Tue, 28 Nov 2017 19:13:05 +0000
To: "Walker, Benjamin"
Cc: "thomas@monjalon.net", "Gonzalez Monroy, Sergio", "Burakov, Anatoly",
 "Tan, Jianfeng", "dev@dpdk.org"
Content-Type: text/plain; charset="UTF-8"
Subject: Re: [dpdk-dev] Running DPDK as an unprivileged user
List-Id: DPDK patches and discussions

On Tue, Nov 28, 2017 at 5:50 PM, Walker, Benjamin wrote:

> On Tue, 2017-11-28 at 14:16 +0000, Alejandro Lucero wrote:
> >
> >
> > On Mon, Nov 27, 2017 at 5:58 PM, Walker, Benjamin <benjamin.walker@intel.com> wrote:
> > > On Sun, 2017-11-05 at 01:17 +0100, Thomas Monjalon wrote:
> > > > Hi, restarting an old topic,
> > > >
> > > > 05/01/2017 16:52, Tan, Jianfeng:
> > > > > On 1/5/2017 5:34 AM, Walker, Benjamin wrote:
> > > > > > > > Note that this
> > > > > > > > probably means that using uio on recent kernels is subtly
> > > > > > > > broken and cannot be supported going forward because there
> > > > > > > > is no uio mechanism to pin the memory.
> > > > > > > >
> > > > > > > > The first open question I have is whether DPDK should allow
> > > > > > > > uio at all on recent (4.x) kernels. My current understanding
> > > > > > > > is that there is no way to pin memory and hugepages can now
> > > > > > > > be moved around, so uio would be unsafe. What does the
> > > > > > > > community think here?
> > > > >
> > > > > Back to this question, removing uio support in DPDK seems a little
> > > > > overkill to me. Can we just document it down?
> > > > > Like, firstly warn users do not invoke migrate_pages() or move_pages() to a
> > > > > DPDK process; as for the kcompactd daemon and some more cases (like
> > > > > compaction could be triggered by alloc_pages()), could we just recommend
> > > > > to disable CONFIG_COMPACTION?
> > > >
> > > > We really need to better document the limitations of UIO.
> > > > May we have some suggestions here?
> > > >
> > > > > Another side, how does vfio pin those memory? Through memlock (from code
> > > > > in vfio_pin_pages())? So why not just mlock those hugepages?
> > > >
> > > > Good question. Why not mlock the hugepages?
> > >
> > > mlock just guarantees that a virtual page is always backed by *some* physical
> > > page of memory. It does not guarantee that over the lifetime of the process a
> > > virtual page is mapped to the *same* physical page. The kernel is free to
> > > transparently move memory around, compress it, dedupe it, etc.
> > >
> > > vfio is not pinning the memory, but instead is using the IOMMU (a piece of
> > > hardware) to participate in the memory management on the platform. If a device
> > > begins a DMA transfer to an I/O virtual address, the IOMMU will coordinate with
> > > the main MMU to make sure that the data ends up in the correct location, even as
> > > the virtual to physical mappings are being modified.
> >
> > This last comment confused me because you said VFIO did the page pinning in
> > your first email.
> > I have been looking at the kernel code and the VFIO driver does pin the pages,
> > at least the iommu type 1.
>
> The vfio driver does flag the page in a way that prevents some types of
> movement, so in that sense it is pinning it. I haven't done an audit to
> guarantee that it prevents all types of movement - that would be very difficult.
> My point was more that vfio is not strictly relying on pinning to function, but
> instead relying on the IOMMU.
> In my previous email I said "pinning" when I really meant "programs the IOMMU".
> Of course, with vfio-noiommu you'd be back to relying on pinning again, in which
> case you'd really have to do that full audit of the kernel memory manager to
> confirm that the flags vfio is setting prevent all movement for any reason.
>
>

If you are saying that the kernel's page-migration code knows how to reprogram the IOMMU, I think that is unlikely. What the VFIO code does is set a flag on the affected pages saying they are "writable", so it is not safe to migrate them. For the mm code to reprogram the IOMMU, it would need to know not just the process whose page table is being modified, but also the device assigned to that process, because IOMMU mappings belong to devices, not processes. I'm not 100% sure, but I don't think the kernel does this.

> >
> > I can see a problem adding support to UIO for doing the same, because that
> > implies there is a device doing DMAs and programmed from user space, which is
> > something the UIO maintainer is against. But because vfio-noiommu mode was
> > implemented just for this, I guess that could be added to the VFIO driver.
> > This does not solve the problem of software not using vfio though.
>
> vfio-noiommu is intended for devices programmed in user space, but primarily for
> devices that don't require physical addresses to perform data transfers (like
> RDMA NICs). Those devices don't actually require pinned memory and already
> participate in the regular memory management on the platform, so putting them
> behind an IOMMU is of no additional value.
>
>

AFAIK, noiommu mode was added to VFIO mainly for DPDK, to solve both the problem of igb_uio not being upstreamable and the problem of adding more features to uio.ko.

> >
> > Apart from improving the UIO documentation when used with DPDK, maybe some
> > sort of check could be done and DPDK requiring an explicit parameter for
> > making the user aware of the potential risk when UIO is used and the kernel
> > page migration is enabled. Not sure if this last thing could be easily known
> > from user space.
>
> The challenge is that there are so many reasons for a page to move, and more are
> added all the time. It would be really hard to correctly prevent the user from
> using uio in every case. Further, if the user is using uio inside of a virtual
> machine that happens to be deployed using the IOMMU on the host system, most of
> the reasons for a page to move (besides explicit requests to move pages) are
> alleviated and it is more or less safe. But the user would have no idea from
> within the guest that they're actually protected. I think this case - using uio
> from within a guest VM that is protected by the IOMMU - is common.
>
>

That is true, but a driver can know whether the system is virtualized, so in that case the explicit flag might not be needed.

> >
> > On another side, we suffered a similar problem when VMs were using SRIOV and
> > memory ballooning. The IOMMU was removing the mapping for the memory removed,
> > but the kernel inside the VM did not get any event and the device ended up
> > doing some wrong DMA operation.