DPDK patches and discussions
From: Shahaf Shuler <shahafs@mellanox.com>
To: "Burakov, Anatoly" <anatoly.burakov@intel.com>,
	"dev@dpdk.org" <dev@dpdk.org>
Cc: Olga Shern <olgas@mellanox.com>,
	Yongseok Koh <yskoh@mellanox.com>,
	"pawelx.wodkowski@intel.com" <pawelx.wodkowski@intel.com>,
	"gowrishankar.m@linux.vnet.ibm.com"
	<gowrishankar.m@linux.vnet.ibm.com>,
	"ferruh.yigit@intel.com" <ferruh.yigit@intel.com>,
	Thomas Monjalon <thomas@monjalon.net>,
	"arybchenko@solarflare.com" <arybchenko@solarflare.com>,
	"shreyansh.jain@nxp.com" <shreyansh.jain@nxp.com>
Subject: Re: [dpdk-dev] [RFC] ethdev: introduce DMA memory mapping for external memory
Date: Wed, 14 Nov 2018 14:53:29 +0000
Message-ID: <DB7PR05MB442617F5BEE7251AAD8836A6C3C30@DB7PR05MB4426.eurprd05.prod.outlook.com>
In-Reply-To: <ba00bccc-f20d-47a9-052f-3e5b6bc1c2c7@intel.com>

Hi Anatoly, 

Wednesday, November 14, 2018 1:19 PM, Burakov, Anatoly:
> Subject: Re: [RFC] ethdev: introduce DMA memory mapping for external
> memory
> 
> Hi Shahaf,
> 
> Great to see such effort! Few comments below.
> 
> Note: halfway through writing my comments i realized that i am starting with
> an assumption that this API is a replacement for current VFIO DMA mapping
> API's. So, if my comments seem out of left field, this is probably why :)
> 
> On 04-Nov-18 12:41 PM, Shahaf Shuler wrote:
> > Request for comment on the high level changes present on this patch.
> >
> > The need to use external memory (memory that belongs to the application
> > and is not part of the DPDK hugepages) is already present.
> > It starts with storage apps, which prefer to manage their own memory
> > blocks for efficient use of the storage device. It continues with
> > GPU-based applications, which strive to achieve zero copy while
> > processing the packet payload on the GPU core. And finally there are
> > vSwitch/vRouter applications that simply prefer to have full control
> > over the memory in use (e.g. VPP).
> >
> > Recent work[1] in DPDK enabled the use of external memory; however,
> > it mostly focuses on VFIO as the only way to map memory.
> > While VFIO is common, other vendors use different ways
> > to map memory (e.g. Mellanox and NXP[2]).
> >
> > The work in this patch moves the DMA mapping to vendor-agnostic APIs
> > located under ethdev. ethdev was chosen because a memory mapping
> > should be associated with specific port(s). Otherwise the memory is
> > mapped multiple times to different frameworks, and memory ends up
> > wasted on redundant translation tables in the host or in the device.
> 
> So, anything other than ethdev (e.g. cryptodev) will not be able to map
> memory for DMA?

That is a fair point. 

> 
> I have thought about this for some length of time, and i think DMA mapping
> belongs in EAL (more specifically, somewhere at the bus layer), rather than at
> device level.

I am not sure I agree here. Take Intel and Mellanox devices, for example. Both are PCI devices, so how would you distinguish which mapping API to use? 
Also, I still think the mapping should be at device granularity and not bus/system granularity, since memory used for DMA is typically used by one specific device. 

Maybe we can say the DMA mapping is an rte_device attribute, since rte_device is the parent class of all DPDK devices. 
We need to check how this works with vport representors (which all share the same rte_device). In that case I believe the rte_device.map call can register the memory to all of the representors as well (if needed). 

> Placing this functionality at device level comes with more work
> to support different device types and puts a burden on device driver
> developers to implement their own mapping functions.

The mapping function can be shared. For example, we can keep the VFIO mapping scheme as part of EAL and have all the relevant drivers call this function. 
The only overhead will be maintaining the function pointer for the DMA call. 
With this work, instead of the EAL layer guessing which type of DMA mapping the devices in the system need, or alternatively forcing them all to work with VFIO, each driver selects its own function. 
The driver is the only one that knows what type of DMA mapping its device needs. 
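
As an illustration only, the function-pointer idea could be sketched as below. Every name is hypothetical and not taken from the patch: sketch_device stands in for rte_device, and the two mapping functions are stubs that only record which backend was selected, where real ones would call into VFIO or into the vendor's registration path.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* All names here are hypothetical; sketch_device stands in for rte_device. */
struct sketch_device;

/* Per-device DMA mapping callback, selected by the driver. */
typedef int (*sketch_dma_map_t)(struct sketch_device *dev, void *addr,
				uint64_t iova, size_t len);

enum sketch_backend { SKETCH_NONE, SKETCH_VFIO, SKETCH_VENDOR };

struct sketch_device {
	sketch_dma_map_t dma_map;      /* set by the driver at probe time */
	enum sketch_backend last_used; /* bookkeeping for this sketch only */
};

/* Shared helper kept in one place (e.g. EAL); a real one would call VFIO. */
static int
sketch_vfio_dma_map(struct sketch_device *dev, void *addr, uint64_t iova,
		    size_t len)
{
	(void)addr; (void)iova; (void)len;
	dev->last_used = SKETCH_VFIO;
	return 0;
}

/* Vendor-private mapping; a real one would register with the device. */
static int
sketch_vendor_dma_map(struct sketch_device *dev, void *addr, uint64_t iova,
		      size_t len)
{
	(void)addr; (void)iova; (void)len;
	dev->last_used = SKETCH_VENDOR;
	return 0;
}

/* Generic entry point: dispatches to whatever the driver selected. */
static int
sketch_dev_dma_map(struct sketch_device *dev, void *addr, uint64_t iova,
		   size_t len)
{
	if (dev == NULL || addr == NULL || len == 0)
		return -EINVAL;
	if (dev->dma_map == NULL)
		return -ENOTSUP;
	return dev->dma_map(dev, addr, iova, len);
}
```

A VFIO-based driver would point dma_map at the shared helper, so nothing is re-implemented per driver, while a driver with its own scheme would point it at its private function.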

> 
> However, i have no familiarity with how MLX/NXP devices do their DMA
> mapping, so maybe the device-centric approach would be better. We could
> provide "standard" mapping functions at the bus level (such as VFIO mapping
> functions for PCI bus), so that this code would not have to be
> reimplemented in the devices.

Yes, as I said above, I did not intend to re-implement all the mapping functions in each driver. Still, I believe the mapping should be per device. 

> 
> Moreover, i'm not sure how this is going to work for VFIO. If this is to be
> called for each NIC that needs access to the memory, then we'll end up with
> double mappings for any NIC that uses VFIO, unless you want each NIC to be
> in a separate container.

I am not very familiar with VFIO (you are the expert😊). 
What happens if we map the same memory twice (under the same container)? Will the translation in the IOMMU be duplicated? Or will the map call return an error saying this memory mapping already exists? 

> 
> >
> > For example, consider a host with Mellanox and Intel devices. Mapping a
> > memory without specifying to which port will end up with IOMMU
> > registration and Verbs (Mellanox DMA map) registration.
> > Another example can be two Mellanox devices on the same host. The
> > memory will be mapped for both, even though the application will use
> > a mempool per device.
> >
> > To use the suggested APIs the application will allocate a memory block
> > and will call rte_eth_dma_map. It will map it to every port that needs
> > DMA access to this memory.
> 
> This bit is unclear to me. What do you mean "map it to every port that
> needs DMA access to this memory"? I don't see how this API solves the
> above problem of mapping the same memory to all devices. How does a
> device know which memory it will need? Does the user specifically have
> to call this API for each and every NIC they're using?

Yes, the user will call this API for every port that needs DMA access to this memory.
Remember that we are talking here about external memory that the application allocated and wants to use for send/receive. The device doesn't guess which memory it will need; the user tells it explicitly. 
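
A minimal sketch of that application-side flow follows, for illustration only. The signature of rte_eth_dma_map is not spelled out in this message, so sketch_eth_dma_map and sketch_eth_dma_unmap below are hypothetical stand-ins with an assumed (port_id, addr, len) signature, stubbed to record the mapping in a table. Rolling back on failure is a choice made for the sketch, not something the RFC prescribes.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_PORTS 8

/* Stub state: length mapped per port (0 means nothing is mapped). */
static size_t sketch_mapped_len[SKETCH_MAX_PORTS];

/* Hypothetical stand-in for rte_eth_dma_map; the signature is assumed. */
static int
sketch_eth_dma_map(uint16_t port_id, void *addr, size_t len)
{
	if (port_id >= SKETCH_MAX_PORTS)
		return -ENODEV;
	if (addr == NULL || len == 0)
		return -EINVAL;
	sketch_mapped_len[port_id] = len;
	return 0;
}

/* Hypothetical stand-in for rte_eth_dma_unmap; the signature is assumed. */
static int
sketch_eth_dma_unmap(uint16_t port_id, void *addr, size_t len)
{
	(void)addr; (void)len;
	if (port_id >= SKETCH_MAX_PORTS)
		return -ENODEV;
	sketch_mapped_len[port_id] = 0;
	return 0;
}

/*
 * Map one external buffer to each port the application will use it with.
 * If any port fails, undo the mappings already made and return the error.
 */
static int
sketch_map_to_ports(const uint16_t *ports, size_t nb_ports, void *addr,
		    size_t len)
{
	size_t i;
	int ret;

	for (i = 0; i < nb_ports; i++) {
		ret = sketch_eth_dma_map(ports[i], addr, len);
		if (ret != 0) {
			while (i-- > 0)
				sketch_eth_dma_unmap(ports[i], addr, len);
			return ret;
		}
	}
	return 0;
}
```

Only the ports named by the application get a mapping; ports it does not list are left untouched.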

> 
> For DPDK-managed memory, everything will still get mapped to every
> device automatically, correct? 

Yes, even though it is not the case today. 

> If so, then such a manual approach for
> external memory will be bad for both usability and drop-in replacement
> of internal-to-external memory, because it introduces inconsistency
> between using internal and external memory. From my point of view,
> either we do *everything* manually (i.e. register all memory for DMA
> explicitly) and thereby avoid this problem but keep the consistency, or
> we do *everything* automatically and deal with duplication of mappings
> somehow (say, by MLX/NXP drivers sharing their mappings through bus
> interface).

I understand your point; however, I am not sure external and internal memory *must* be consistent.
The DPDK-managed memory is part of the DPDK subsystems, and the DPDK libs prepare it for optimal use by the underlying devices. The external memory is different: it is proprietary memory that the application allocated, and DPDK cannot do anything with it in advance. 
Even today there is inconsistency, because if the user wants to use external memory they must map it (rte_vfio_dma_map), while they don't need to do that for the DPDK-managed memory. 

I guess we can add a flag to the device mapping, say MAP_TO_ALL_DEVICES, to make the application's life easier when there are multiple devices in the host. 
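
One possible shape for such a flag, as a sketch only: SKETCH_MAP_TO_ALL_DEVICES, the device table, and the function names are all hypothetical, and the per-device map is a stub that just counts calls. Stopping at the first failing device is an arbitrary choice made here.

```c
#include <errno.h>
#include <stddef.h>

#define SKETCH_MAP_TO_ALL_DEVICES (1u << 0) /* hypothetical flag */
#define SKETCH_NB_DEVS 3

struct sketch_dev {
	int probed;               /* non-zero if the device is in use */
	unsigned int nb_mappings; /* bookkeeping for this sketch only */
};

/* Device 1 is deliberately left unprobed. */
static struct sketch_dev sketch_devs[SKETCH_NB_DEVS] = {
	{ 1, 0 }, { 0, 0 }, { 1, 0 },
};

/* Stub per-device map; a real one would call the driver's callback. */
static int
sketch_dma_map_one(struct sketch_dev *dev, void *addr, size_t len)
{
	(void)addr; (void)len;
	if (!dev->probed)
		return -ENODEV;
	dev->nb_mappings++;
	return 0;
}

/*
 * Map to one device by default; with SKETCH_MAP_TO_ALL_DEVICES set,
 * dev_id is ignored and the memory is mapped to every probed device.
 */
static int
sketch_dma_map(unsigned int dev_id, void *addr, size_t len,
	       unsigned int flags)
{
	unsigned int i;
	int ret;

	if (addr == NULL || len == 0)
		return -EINVAL;
	if (!(flags & SKETCH_MAP_TO_ALL_DEVICES)) {
		if (dev_id >= SKETCH_NB_DEVS)
			return -ENODEV;
		return sketch_dma_map_one(&sketch_devs[dev_id], addr, len);
	}
	for (i = 0; i < SKETCH_NB_DEVS; i++) {
		if (!sketch_devs[i].probed)
			continue;
		ret = sketch_dma_map_one(&sketch_devs[i], addr, len);
		if (ret != 0)
			return ret;
	}
	return 0;
}
```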

> 
> > Later on the application could use this memory to populate a mempool or
> > to attach mbuf with external buffer.
> > When the memory should no longer be used by the device the application
> > will call rte_eth_dma_unmap from every port it did registration to.
> >
> > The Drivers will implement the DMA map/unmap, and it is very likely they
> > will use the help of the existing VFIO mapping.
> >
> > Support for hotplug/unplug of device is out of the scope for this patch,
> > however can be implemented in the same way it is done on VFIO.
> >
> > Cc: pawelx.wodkowski@intel.com
> > Cc: anatoly.burakov@intel.com
> > Cc: gowrishankar.m@linux.vnet.ibm.com
> > Cc: ferruh.yigit@intel.com
> > Cc: thomas@monjalon.net
> > Cc: arybchenko@solarflare.com
> > Cc: shreyansh.jain@nxp.com
> >
> > Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
> >
> > [1]
> > commit 73a639085938 ("vfio: allow to map other memory regions")
> > [2]
> > http://mails.dpdk.org/archives/dev/2018-September/111978.html
> > ---
> --
> Thanks,
> Anatoly

Thread overview: 17+ messages
2018-11-04 12:41 Shahaf Shuler
2018-11-14 11:19 ` Burakov, Anatoly
2018-11-14 14:53   ` Shahaf Shuler [this message]
2018-11-14 17:06     ` Burakov, Anatoly
2018-11-15  9:46       ` Shahaf Shuler
2018-11-15 10:59         ` Burakov, Anatoly
2018-11-19 11:20           ` Shahaf Shuler
2018-11-19 17:18             ` Burakov, Anatoly
     [not found]               ` <DB7PR05MB442643DFD33B71797CD34B5EC3D90@DB7PR05MB4426.eurprd05.prod.outlook.com>
2018-11-20 10:55                 ` Burakov, Anatoly
2018-11-22 10:06                   ` Shahaf Shuler
2018-11-22 10:41                     ` Burakov, Anatoly
2018-11-22 11:31                       ` Shahaf Shuler
2018-11-22 11:34                         ` Burakov, Anatoly
2019-01-14  6:12                         ` Shahaf Shuler
2019-01-15 12:07                           ` Burakov, Anatoly
2019-01-16 11:04                             ` Shahaf Shuler
2018-11-19 17:04           ` Stephen Hemminger
