From mboxrd@z Thu Jan 1 00:00:00 1970
From: Alejandro Lucero
Date: Thu, 14 Feb 2019 12:27:38 +0000
To: Shahaf Shuler
Cc: "Burakov, Anatoly", Yongseok Koh, Thomas Monjalon, Ferruh Yigit,
 "nhorman@tuxdriver.com", Gaetan Rivet, dev
Subject: Re: [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for external memory
List-Id: DPDK patches and discussions

On Thu, Feb 14, 2019 at 12:22 PM Alejandro Lucero <alejandro.lucero@netronome.com> wrote:

>
>
> On Wed, Feb 13, 2019 at 7:24 PM Shahaf Shuler wrote:
>
>> Wednesday, February 13, 2019 1:43 PM, Alejandro Lucero:
>> > Subject: Re: [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for
>> > external memory
>> >
>> > On Wed, Feb 13, 2019 at 9:11 AM Shahaf Shuler wrote:
>> >
>> > > This series is a continuation of RFC [1].
>> > >
>> > > The DPDK APIs expose 3 different modes to work with memory used for
>> > > DMA:
>> > >
>> > > 1. Use the DPDK-owned memory (backed by the DPDK-provided
>> > > hugepages). This memory is allocated by the DPDK libraries,
>> > > included in the DPDK memory system (memseg lists) and
>> > > automatically DMA mapped by the DPDK layers.
>> > >
>> > > 2. Use memory allocated by the user and registered to the DPDK
>> > > memory system. This is also referred to as external memory. Upon
>> > > registration of the external memory, the DPDK layers will DMA map
>> > > it to all needed devices.
>> > >
>> > > 3. Use memory allocated by the user and not registered to the DPDK
>> > > memory system. This is for users who want to have tight control
>> > > over this memory. The user will need to explicitly call the DMA
>> > > map function in order to register such memory to the different
>> > > devices.
>> > >
>> > > The scope of this patch focuses on #3 above.
>> > >
>> >
>> > Why can we not have case 2 covering case 3?
>>
>> Because it is not our choice but rather the DPDK application's.
>> We could disallow it, and force applications to register their
>> external memory to the DPDK memory management system. However, IMO it
>> would be wrong.
>> The use case exists: some applications want to manage their memory by
>> themselves, without the extra overhead of rte_malloc, without creating
>> a special socket to populate the memory and without redundant API calls
>> to rte_extmem_*.
>>
>> Simply allocate a chunk of memory, DMA map it to the device and that's it.
>>
>
> Usability is a strong point, but only up to some extent. DPDK is all about
> performance, and adding options the user can choose from will add pressure
> and complexity for keeping the performance. Your proposal makes sense from
> a user point of view, but will it avoid having to modify things in the DPDK
> core to support this case broadly in the future? Multiprocess will be hard
> to get, if not impossible, without adding more complexity, and although you
> likely do not expect that use case to require multiprocess support, once we
> have DPDK apps using this model, sooner or later those companies with
> products based on such an option will demand broad support. I can foresee
> that not just multiprocess support will require changes in the future.
>
> This reminds me of the case of obtaining real time: the more complexity,
> the less determinism can be obtained. It is not impossible, simply far
> more complex. Pure real-time operating systems can add new functionalities,
> but it is hard to do it properly without jeopardising the main goal.
> General-purpose operating systems can try to improve determinism, but only
> up to some extent and with important complexity costs. DPDK is the
> real-time operating system in this comparison.
>
>
>> >
>> >
>> > > Currently the only way to map external memory is through VFIO
>> > > (rte_vfio_dma_map).
>> > > While VFIO is common, there are other vendors
>> > > which use different ways to map memory (e.g. Mellanox and NXP).
>> > >
>> >
>> > As you say, VFIO is common, and when allowing DMAs programmed in user
>> > space, the right thing to do.
>>
>> It is common indeed. Why is it the right thing to do?
>>
>
> Compared with UIO, for sure. VFIO does have the right view of the system
> in terms of which devices can properly be isolated. Can you confirm that a
> specific implementation by a vendor can ensure the same behaviour? If so, do
> you have duplicated code then? If the answer is that you are using VFIO
> data, why not use VFIO as the interface and add the required connection
> between VFIO and drivers?
>
> What about mapping validation? Is the driver doing that part or relying on
> kernel code? Or is it just assuming the mapping is safe?
>
>
>> > I'm assuming there is an IOMMU hardware and
>> > this is what Mellanox and NXP rely on in some way or another.
>>
>> For Mellanox, the device works with virtual memory, not physical. If you
>> think about it, it is more secure for a user space application. The
>> Mellanox device has an internal memory translation unit between virtual
>> memory and physical memory.
>> An IOMMU can be added on top of it, in case the host doesn't trust the
>> device or the device is given to an untrusted entity like a VM.
>>
>
> Any current NIC or device will work with virtual addresses if an IOMMU is
> in place, no matter if the device is IOMMU-aware or not. Any vendor with
> that capability in their devices should follow generic paths and a common
> interface, with the vendor drivers being the executors. The drivers know how
> to tell the device, but they should be told what to tell, and not by the
> user but by the kernel.
>
> I think reading your comment "in case the host doesn't trust the device"
> makes it easier to understand what you are trying to obtain, and at the
> same time makes my concerns not a problem at all.
> This is a case for DPDK being used
> in certain scenarios where the full system is trusted, which I think is a
> completely rightful option. My only concern then is the complexity it could
> imply sooner or later, although I have to admit it is not a strong one :-)
>
>
I forgot to mention the problem of leaving that option open in not fully
trusted systems. I do not know how it could be avoided, maybe some checks
in EAL initialization, but maybe this is not possible at all. Anyway, I
think this is worth discussing further.

>
>
>> > Having each driver doing things in their own way will end up in a
>> > harder to validate system.
>>
>> Different vendors will have different HW implementations. We cannot force
>> everybody to align the IOMMU.
>> What we can do is ease the user's life and provide vendor-agnostic APIs
>> which just provide the needed functionality. In our case, DMA map and unmap.
>> The user should not care if it's IOMMU, Mellanox memory registration
>> through verbs or NXP special mapping.
>>
>> The sys admin should set/unset the IOMMU as a general means of protection.
>> And this of course will work also w/ Mellanox devices.
>>
>> > If there is an IOMMU hardware, same mechanism should be
>> > used always, leaving to the IOMMU hw specific implementation to deal
>> > with the details. If a NIC is IOMMU-able, that should not be supported
>> > by specific vendor drivers but through a generic solution like VFIO
>> > which will validate a device with such capability and perform the
>> > required actions for that case.
>> > VFIO and IOMMU should be modified as needed for supporting this
>> > requirement instead of leaving vendor drivers to implement their own
>> > solution.
>>
>> Again - I am against forcing every PCI device to use VFIO, and I don't
>> think IOMMU as a HW device should control other PCI devices.
>> I see nothing wrong with a device which also has extra capabilities of
>> memory translation, and adds another level of security to the user
>> application.
>>
>
> In a system with untrusted components using the device, a generic way of
> properly configuring the system with the right protections should be used
> instead of relying on a specific vendor implementation.
>
>
>> >
>> > In any case, I think this support should be in a different patchset
>> > than the private user space mappings.
>> >
>> > > The work in this patch moves the DMA mapping to vendor-agnostic APIs.
>> > > New map and unmap ops were added to the rte_bus structure.
>> > > Implementation of those was done currently only on the PCI bus. The
>> > > implementation takes the driver map and unmap implementation as a
>> > > bypass to the VFIO mapping.
>> > > That is, in case of no specific map/unmap from the PCI driver, VFIO
>> > > mapping, if possible, will be used.
>> > >
>> > > Application use with those APIs is quite simple:
>> > > * allocate memory
>> > > * take a device, and query its rte_device.
>> > > * call the bus map function for this device.
>> > >
>> > > Future work will deprecate the rte_vfio_dma_map and
>> > > rte_vfio_dma_unmap APIs, leaving the PCI device APIs as the
>> > > preferred option for the user.
>> > >
>> > > [1]
>> > > https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatches.dpdk.org%2Fpatch%2F47796%2F&data=02%7C01%7Cshahafs%40mellanox.com%7Cdc209a2ceace48c0452008d691a8762d%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C636856550053348339&sdata=3TEUJfS9jBOsbvaPYwoitQLj7oh9VCrtaK7We%2FItg5c%3D&reserved=0
>> > >
>> > > Shahaf Shuler (6):
>> > >   vfio: allow DMA map of memory for the default vfio fd
>> > >   vfio: don't fail to DMA map if memory is already mapped
>> > >   bus: introduce DMA memory mapping for external memory
>> > >   net/mlx5: refactor external memory registration
>> > >   net/mlx5: support PCI device DMA map and unmap
>> > >   doc: deprecate VFIO DMA map APIs
>> > >
>> > >  doc/guides/prog_guide/env_abstraction_layer.rst |   2 +-
>> > >  doc/guides/rel_notes/deprecation.rst            |   4 +
>> > >  drivers/bus/pci/pci_common.c                    |  78 +++++++
>> > >  drivers/bus/pci/rte_bus_pci.h                   |  14 ++
>> > >  drivers/net/mlx5/mlx5.c                         |   2 +
>> > >  drivers/net/mlx5/mlx5_mr.c                      | 232 ++++++++++++++++---
>> > >  drivers/net/mlx5/mlx5_rxtx.h                    |   5 +
>> > >  lib/librte_eal/common/eal_common_bus.c          |  22 ++
>> > >  lib/librte_eal/common/include/rte_bus.h         |  57 +++++
>> > >  lib/librte_eal/common/include/rte_vfio.h        |  12 +-
>> > >  lib/librte_eal/linuxapp/eal/eal_vfio.c          |  26 ++-
>> > >  lib/librte_eal/rte_eal_version.map              |   2 +
>> > >  12 files changed, 418 insertions(+), 38 deletions(-)
>> > >
>> > > --
>> > > 2.12.0
>> > >
>> > >
>> >
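
For readers following the thread, the fallback described in the cover letter (a driver-provided map taking precedence, with VFIO used when the driver has none) can be sketched roughly as below. This is only an illustration in plain C under stated assumptions: every sketch_* name, the struct layouts, the vfio_capable flag and the SKETCH_PATH_* values are hypothetical, and none of it is the actual ops added to rte_bus or the PCI bus by the series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names here are hypothetical; they are not DPDK definitions. */
struct sketch_device;

typedef int (*sketch_dma_op_t)(struct sketch_device *dev, void *addr,
			       uint64_t iova, size_t len);

/* A driver may or may not provide its own DMA mapping callbacks. */
struct sketch_driver {
	sketch_dma_op_t dma_map;   /* NULL: no driver-specific mapping */
	sketch_dma_op_t dma_unmap; /* NULL: no driver-specific unmapping */
};

enum { SKETCH_PATH_NONE = 0, SKETCH_PATH_DRIVER = 1, SKETCH_PATH_VFIO = 2 };

struct sketch_device {
	const struct sketch_driver *driver;
	int vfio_capable; /* nonzero if the generic VFIO path is usable */
	int last_path;    /* records which path served the last request */
};

/* Stand-in for the generic VFIO mapping mentioned in the thread. */
int sketch_vfio_map(struct sketch_device *dev, void *addr,
		    uint64_t iova, size_t len)
{
	(void)addr;
	(void)iova;
	(void)len;
	if (!dev->vfio_capable)
		return -1; /* VFIO mapping not possible for this device */
	dev->last_path = SKETCH_PATH_VFIO;
	return 0;
}

/* Stand-in for a vendor-specific mapping (e.g. memory registration). */
int sketch_vendor_map(struct sketch_device *dev, void *addr,
		      uint64_t iova, size_t len)
{
	(void)addr;
	(void)iova;
	(void)len;
	dev->last_path = SKETCH_PATH_DRIVER;
	return 0;
}

/*
 * Bus-level map: the driver callback, when present, bypasses VFIO;
 * otherwise the request falls back to the VFIO path if it is available.
 */
int sketch_bus_dma_map(struct sketch_device *dev, void *addr,
		       uint64_t iova, size_t len)
{
	if (dev == NULL || addr == NULL || len == 0)
		return -1;
	dev->last_path = SKETCH_PATH_NONE;
	if (dev->driver != NULL && dev->driver->dma_map != NULL)
		return dev->driver->dma_map(dev, addr, iova, len);
	return sketch_vfio_map(dev, addr, iova, len);
}
```

In this sketch the application-side steps from the cover letter reduce to allocating a buffer, obtaining the device, and calling sketch_bus_dma_map() on it; the caller does not need to know which path ends up serving the request.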