DPDK patches and discussions
From: Alejandro Lucero <alejandro.lucero@netronome.com>
To: Shahaf Shuler <shahafs@mellanox.com>
Cc: "Burakov, Anatoly" <anatoly.burakov@intel.com>,
	Yongseok Koh <yskoh@mellanox.com>,
	 Thomas Monjalon <thomas@monjalon.net>,
	Ferruh Yigit <ferruh.yigit@intel.com>,
	 "nhorman@tuxdriver.com" <nhorman@tuxdriver.com>,
	Gaetan Rivet <gaetan.rivet@6wind.com>, dev <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for external memory
Date: Thu, 14 Feb 2019 12:27:38 +0000
Message-ID: <CAD+H992MH5b59m1=NF1PqM_oKGniNjCp6d0tqun2N+QLrXw3sg@mail.gmail.com>
In-Reply-To: <CAD+H992TkEwN3m7yS5Qo-F7+9H8b6E71Hf6pn-WFEsECCpemYw@mail.gmail.com>

On Thu, Feb 14, 2019 at 12:22 PM Alejandro Lucero <
alejandro.lucero@netronome.com> wrote:

>
>
> On Wed, Feb 13, 2019 at 7:24 PM Shahaf Shuler <shahafs@mellanox.com>
> wrote:
>
>> Wednesday, February 13, 2019 1:43 PM, Alejandro Lucero:
>> > Subject: Re: [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for
>> > external memory
>> >
>> > On Wed, Feb 13, 2019 at 9:11 AM Shahaf Shuler <shahafs@mellanox.com>
>> > wrote:
>> >
>> > > This series is a continuation of RFC [1].
>> > >
>> > > The DPDK APIs expose 3 different modes to work with memory used for DMA:
>> > >
>> > > 1. Use the DPDK owned memory (backed by the DPDK provided hugepages).
>> > > This memory is allocated by the DPDK libraries, included in the DPDK
>> > > memory system (memseg lists) and automatically DMA mapped by the DPDK
>> > > layers.
>> > >
>> > > 2. Use memory allocated by the user and registered with the DPDK memory
>> > > system. This is also referred to as external memory. Upon registration
>> > > of the external memory, the DPDK layers will DMA map it to all needed
>> > > devices.
>> > >
>> > > 3. Use memory allocated by the user and not registered with the DPDK
>> > > memory system. This is for users who want tight control over
>> > > this memory. The user will need to explicitly call the DMA map function
>> > > in order to register such memory with the different devices.
>> > >
>> > > The scope of this patch series is #3 above.
>> > >
>> > >
>> > Why can't case 2 cover case 3?
>>
>> Because it is not our choice but the DPDK application's.
>> We could disallow it and force applications to register their
>> external memory with the DPDK memory management system. However, IMO that
>> would be wrong.
>> The use case exists: some applications want to manage their memory
>> themselves, without the extra overhead of rte_malloc, without creating a
>> special socket to populate the memory and without redundant API calls to
>> rte_extmem_*.
>>
>> Simply allocate a chunk of memory, DMA map it to the device, and that’s it.
>>
>>
> Usability is a strong point, but only up to a point. DPDK is all about
> performance, and adding options for the user to choose from will add
> pressure and complexity to maintaining that performance. Your proposal makes
> sense from a user's point of view, but will it avoid having to modify the
> DPDK core to support this case broadly in the future? Multiprocess support
> will be hard to get, if not impossible, without adding more complexity.
> Although you likely do not expect this use case to require multiprocess
> support, once we have DPDK apps using this model, sooner or later the
> companies with products based on it will demand broader support. I can
> foresee that multiprocess support will not be the only thing requiring
> changes in the future.
>
> This reminds me of the problem of achieving real-time behaviour: the more
> complexity, the less determinism can be obtained. It is not impossible; it
> is simply far more complex. Pure real-time operating systems can add new
> functionality, but it is hard to do so properly without jeopardising the
> main goal. General-purpose operating systems can try to improve determinism,
> but only up to a point and at a significant cost in complexity. DPDK is the
> real-time operating system in this comparison.
>
>
>> >
>> >
>> > > Currently the only way to map external memory is through VFIO
>> > > (rte_vfio_dma_map). While VFIO is common, there are other vendors
>> > > which use different ways to map memory (e.g. Mellanox and NXP).
>> > >
>> > >
>> > As you say, VFIO is common and, when allowing DMA to be programmed from
>> > user space, it is the right thing to do.
>>
>> It is common indeed. Why is it the right thing to do?
>>
>>
> Compared with UIO, for sure. VFIO does have the right view of the system
> in terms of which devices can properly be isolated. Can you confirm that a
> vendor-specific implementation can ensure the same behaviour? If so, do
> you then have duplicated code? If the answer is that you are using VFIO data,
> why not use VFIO as the interface and add the required connection
> between VFIO and the drivers?
>
> What about mapping validation? Is the driver doing that part, or is it
> relying on kernel code? Or is it just assuming the mapping is safe?
>
>
>> > I'm assuming there is IOMMU hardware and
>> > this is what Mellanox and NXP rely on in some way or another.
>>
>> For Mellanox, the device works with virtual memory, not physical. If you
>> think about it, this is more secure for a user space application. The
>> Mellanox device has an internal memory translation unit between virtual
>> memory and physical memory.
>> An IOMMU can be added on top of it, in case the host doesn't trust the
>> device or the device is given to an untrusted entity like a VM.
>>
>>
> Any current NIC or device will work with virtual addresses if an IOMMU is in
> place, no matter whether the device is IOMMU-aware or not. Any vendor with
> that capability in their devices should follow generic paths and a common
> interface, with the vendor drivers being the executors. The drivers know how
> to tell the device, but they should be told what to tell it, and not by the
> user but by the kernel.
>
> I think your comment "in case the host doesn't trust the device" makes it
> easier to understand what you are trying to achieve, and at the same time
> makes my concerns not a problem at all. This is a case of DPDK being used
> in certain scenarios where the full system is trusted, which I think is a
> completely legitimate option. My only concern then is the complexity it could
> imply sooner or later, although I have to admit it is not a strong one :-)
>
>

I forgot to mention the problem of leaving that option open in systems that
are not fully trusted. I do not know how it could be avoided; maybe with some
checks in EAL initialization, but maybe this is not possible at all. Anyway, I
think this is worth discussing further.
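
As a rough illustration of the kind of check meant here, the sketch below
gates the driver-private mapping path on whether an IOMMU appears to be
active. It assumes that a populated /sys/kernel/iommu_groups directory on
Linux is an acceptable sign of an active IOMMU. The function names and the
"system_is_trusted" flag are hypothetical and are not part of EAL or of this
patch series.

```c
#include <dirent.h>
#include <string.h>

/*
 * Count the entries under `path`, skipping "." and "..".
 * Returns -1 if the directory cannot be opened.
 */
static int count_dir_entries(const char *path)
{
	DIR *dir = opendir(path);
	struct dirent *ent;
	int n = 0;

	if (dir == NULL)
		return -1;
	while ((ent = readdir(dir)) != NULL) {
		if (strcmp(ent->d_name, ".") == 0 ||
		    strcmp(ent->d_name, "..") == 0)
			continue;
		n++;
	}
	closedir(dir);
	return n;
}

/*
 * Hypothetical policy, for illustration only: allow driver-private DMA
 * mapping when at least one IOMMU group exists, or when the administrator
 * has explicitly declared the whole system trusted.
 * Returns 1 if allowed, 0 otherwise.
 */
static int private_dma_map_allowed(const char *iommu_groups_path,
				   int system_is_trusted)
{
	int groups = count_dir_entries(iommu_groups_path);

	return groups > 0 || system_is_trusted;
}
```

A caller would pass "/sys/kernel/iommu_groups" as the path. Whether a check
like this is sufficient, or even possible for every device, is exactly the
open question.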


>> >
>> > Having each driver do things in its own way will end up in a
>> > system that is harder to validate.
>>
>> Different vendors will have different HW implementations. We cannot force
>> everybody to align on the IOMMU.
>> What we can do is ease the user's life and provide vendor-agnostic APIs
>> that provide just the needed functionality: in our case, DMA map and unmap.
>> The user should not care whether it is IOMMU, Mellanox memory registration
>> through verbs or NXP special mapping.
>>
>> The sys admin should set/unset the IOMMU as a general means of protection.
>> And this of course will also work with Mellanox devices.
>>
>> > If there is IOMMU hardware, the same mechanism should always be
>> > used, leaving the IOMMU hardware-specific implementation to deal with
>> > the details. If a NIC is IOMMU-able, that should not be supported by
>> > specific vendor drivers but through a generic solution like VFIO, which
>> > will validate a device with such a capability and perform the required
>> > actions for that case. VFIO and IOMMU should be modified as needed to
>> > support this requirement instead of leaving vendor drivers to implement
>> > their own solution.
>>
>> Again - I am against forcing every PCI device to use VFIO, and I don't
>> think the IOMMU as a HW device should control other PCI devices.
>> I see nothing wrong with a device that also has extra memory translation
>> capabilities and adds another level of security to the user
>> application.
>>
>>
> In a system with untrusted components using the device, a generic way of
> properly configuring the system with the right protections should be used
> instead of relying on a specific vendor implementation.
>
>
>> >
>> > In any case, I think this support should be in a different patchset
>> > than the private user space mappings.
>> >
>> >
>
> >
>> > > The work in this patch moves the DMA mapping to vendor-agnostic APIs.
>> > > New map and unmap ops were added to the rte_bus structure. They are
>> > > currently implemented only on the PCI bus. The implementation treats
>> > > the driver's map and unmap implementation as a bypass of the VFIO
>> > > mapping. That is, if the PCI driver provides no specific map/unmap,
>> > > VFIO mapping will be used, if possible.
>> > >
>> > > Application use with those APIs is quite simple:
>> > > * allocate memory
>> > > * take a device, and query its rte_device.
>> > > * call the bus map function for this device.
>> > >
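
To make the fallback described above concrete, here is a minimal sketch of
the dispatch order: a driver-provided map callback takes precedence, and the
VFIO path is used only when the driver has none. All types and function names
below are hypothetical stand-ins written for this illustration. They are not
the rte_bus, rte_device or VFIO definitions from the patch, and the two map
functions only count calls instead of programming any hardware.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for illustration; NOT the DPDK types. */
typedef int (*sketch_dma_map_t)(void *addr, uint64_t iova, size_t len);

struct sketch_driver {
	sketch_dma_map_t dma_map; /* NULL when the driver has no private mapping */
};

struct sketch_device {
	const struct sketch_driver *driver;
	int vfio_usable; /* non-zero when a VFIO mapping is possible */
};

/* Call counters so the chosen path can be observed. */
static int sketch_vendor_calls;
static int sketch_vfio_calls;

/* Stand-in for a vendor-specific registration (e.g. through verbs). */
static int sketch_vendor_map(void *addr, uint64_t iova, size_t len)
{
	(void)addr; (void)iova; (void)len;
	sketch_vendor_calls++;
	return 0;
}

/* Stand-in for the generic VFIO mapping path. */
static int sketch_vfio_map(void *addr, uint64_t iova, size_t len)
{
	(void)addr; (void)iova; (void)len;
	sketch_vfio_calls++;
	return 0;
}

/* Bus-level map: the driver callback bypasses VFIO; VFIO is the fallback. */
static int sketch_bus_dma_map(const struct sketch_device *dev, void *addr,
			      uint64_t iova, size_t len)
{
	if (dev == NULL || addr == NULL || len == 0)
		return -EINVAL;
	if (dev->driver != NULL && dev->driver->dma_map != NULL)
		return dev->driver->dma_map(addr, iova, len);
	if (dev->vfio_usable)
		return sketch_vfio_map(addr, iova, len);
	return -ENOTSUP;
}
```

From the application side this matches the three steps listed: allocate a
buffer, obtain the device, and call the bus-level map with the buffer address,
its IOVA and its length.
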
>> > > Future work will deprecate the rte_vfio_dma_map and rte_vfio_dma_unmap
>> > > APIs, leaving the PCI device APIs as the preferred option for the user.
>> > >
>> > > [1] https://patches.dpdk.org/patch/47796/
>> > >
>> > > Shahaf Shuler (6):
>> > >   vfio: allow DMA map of memory for the default vfio fd
>> > >   vfio: don't fail to DMA map if memory is already mapped
>> > >   bus: introduce DMA memory mapping for external memory
>> > >   net/mlx5: refactor external memory registration
>> > >   net/mlx5: support PCI device DMA map and unmap
>> > >   doc: deprecate VFIO DMA map APIs
>> > >
>> > >  doc/guides/prog_guide/env_abstraction_layer.rst |   2 +-
>> > >  doc/guides/rel_notes/deprecation.rst            |   4 +
>> > >  drivers/bus/pci/pci_common.c                    |  78 +++++++
>> > >  drivers/bus/pci/rte_bus_pci.h                   |  14 ++
>> > >  drivers/net/mlx5/mlx5.c                         |   2 +
>> > >  drivers/net/mlx5/mlx5_mr.c                      | 232 ++++++++++++++++---
>> > >  drivers/net/mlx5/mlx5_rxtx.h                    |   5 +
>> > >  lib/librte_eal/common/eal_common_bus.c          |  22 ++
>> > >  lib/librte_eal/common/include/rte_bus.h         |  57 +++++
>> > >  lib/librte_eal/common/include/rte_vfio.h        |  12 +-
>> > >  lib/librte_eal/linuxapp/eal/eal_vfio.c          |  26 ++-
>> > >  lib/librte_eal/rte_eal_version.map              |   2 +
>> > >  12 files changed, 418 insertions(+), 38 deletions(-)
>> > >
>> > > --
>> > > 2.12.0
>> > >
>> > >
>>
>
