To: Shahaf Shuler, yskoh@mellanox.com, thomas@monjalon.net, ferruh.yigit@intel.com, nhorman@tuxdriver.com, gaetan.rivet@6wind.com
Cc: dev@dpdk.org
From: "Burakov, Anatoly"
Date: Thu, 28 Feb 2019 12:14:56 +0000
Subject: Re: [dpdk-dev] [PATCH v2 3/6] bus: introduce device level DMA memory mapping

On 21-Feb-19 2:50 PM, Shahaf Shuler wrote:
> The DPDK APIs expose 3 different modes to work with memory used for DMA:
>
> 1. Use the DPDK owned memory (backed by the DPDK provided hugepages).
> This memory is allocated by the DPDK libraries, included in the DPDK
> memory system (memseg lists) and automatically DMA mapped by the DPDK
> layers.
>
> 2. Use memory allocated by the user and register to the DPDK memory
> systems. Upon registration of memory, the DPDK layers will DMA map it
> to all needed devices. After registration, allocation of this memory
> will be done with rte_*malloc APIs.
>
> 3. Use memory allocated by the user and not registered to the DPDK memory
> system. This is for users who wants to have tight control on this
> memory (e.g. avoid the rte_malloc header).
> The user should create a memory, register it through rte_extmem_register
> API, and call DMA map function in order to register such memory to
> the different devices.
>
> The scope of the patch focus on #3 above.
>
> Currently the only way to map external memory is through VFIO
> (rte_vfio_dma_map). While VFIO is common, there are other vendors
> which use different ways to map memory (e.g. Mellanox and NXP).
>
> The work in this patch moves the DMA mapping to vendor agnostic APIs.
> Device level DMA map and unmap APIs were added. Implementation of those
> APIs was done currently only for PCI devices.
>
> For PCI bus devices, the pci driver can expose its own map and unmap
> functions to be used for the mapping. In case the driver doesn't provide
> any, the memory will be mapped, if possible, to IOMMU through VFIO APIs.
>
> Application usage with those APIs is quite simple:
> * allocate memory
> * call rte_extmem_register on the memory chunk.
> * take a device, and query its rte_device.
> * call the device specific mapping function for this device.
>
> Future work will deprecate the rte_vfio_dma_map and rte_vfio_dma_unmap
> APIs, leaving the rte device APIs as the preferred option for the user.
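
In case it helps to see the above flow end to end, usage would look roughly
like this (rough, untested sketch only - the helper name and the ethdev-based
rte_device lookup are just for illustration, and it assumes the memory chunk
is page-aligned and a multiple of pg_sz):

#include <rte_dev.h>
#include <rte_ethdev.h>
#include <rte_memory.h>

/* addr and len must be aligned to pg_sz, as rte_extmem_register() expects */
static int
map_ext_mem_to_port(uint16_t port_id, void *addr, rte_iova_t iova,
		size_t len, size_t pg_sz)
{
	struct rte_eth_dev_info info;

	/* register the chunk with DPDK (no malloc heap, just registration) */
	if (rte_extmem_register(addr, len, NULL, 0, pg_sz) != 0)
		return -1;

	/* take the device and query its rte_device */
	rte_eth_dev_info_get(port_id, &info);

	/* DMA map the chunk to this particular device */
	return rte_dev_dma_map(info.device, addr, iova, len);
}

Tearing down would then be the reverse: rte_dev_dma_unmap() followed by
rte_extmem_unregister().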
>
> Signed-off-by: Shahaf Shuler
> ---
> +
> +	if (!pdev || !pdev->driver) {
> +		rte_errno = EINVAL;
> +		return -rte_errno;
> +	}

We could put a check in here to see if the memory has been registered with
DPDK. Just call rte_mem_virt2memseg_list(addr) - if it returns NULL, the
memory wasn't registered, so you can throw an error. Not sure of appropriate
errno in that case - ENODEV? EINVAL?

> +	if (pdev->driver->dma_map)
> +		return pdev->driver->dma_map(pdev, addr, iova, len);
> +	/**
> +	 *  In case driver don't provides any specific mapping
> +	 *  try fallback to VFIO.
> +	 */
> +	if (pdev->kdrv == RTE_KDRV_VFIO)
> +		return rte_vfio_container_dma_map
> +				(RTE_VFIO_DEFAULT_CONTAINER_FD, (uintptr_t)addr,
> +				 iova, len);
> +rte_dev_dma_map(struct rte_device *dev, void *addr, uint64_t iova,
> +		size_t len)
> +{
> +	if (dev->bus->dma_map == NULL || len == 0) {
> +		rte_errno = EINVAL;
> +		return -rte_errno;
> +	}
> +	/* Memory must be registered through rte_extmem_* APIs */
> +	if (rte_mem_virt2memseg(addr, NULL) == NULL) {

No need to call rte_mem_virt2memseg - rte_mem_virt2memseg_list will do.

> +		rte_errno = EINVAL;
> +		return -rte_errno;
> +	}
> +
> +	return dev->bus->dma_map(dev, addr, iova, len);
> +}
> +
> +int
> +rte_dev_dma_unmap(struct rte_device *dev, void *addr, uint64_t iova,
> +		size_t len)
> +{
> +	if (dev->bus->dma_unmap == NULL || len == 0) {
> +		rte_errno = EINVAL;
> +		return -rte_errno;
> +	}

I think attempting to unmap a memory region that isn't registered should be
an error, so rte_mem_virt2memseg_list call should be here too.

> +
> +	return dev->bus->dma_unmap(dev, addr, iova, len);
> +}
> diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
> index 6be4b5cabe..4faf2d20a0 100644
> --- a/lib/librte_eal/common/include/rte_bus.h
> +++ b/lib/librte_eal/common/include/rte_bus.h
> @@ -168,6 +168,48 @@ typedef int (*rte_bus_unplug_t)(struct rte_device *dev);
>   typedef int (*rte_bus_parse_t)(const char *name, void *addr);
> --- a/lib/librte_eal/common/include/rte_dev.h
> +++ b/lib/librte_eal/common/include/rte_dev.h
> @@ -515,4 +515,47 @@ rte_dev_hotplug_handle_enable(void);
>   int __rte_experimental
>   rte_dev_hotplug_handle_disable(void);
>
> +/**
> + * Device level DMA map function.
> + * After a successful call, the memory segment will be mapped to the
> + * given device.

here and in unmap: @note please register memory first ?

> + *
> + * @param dev
> + *	Device pointer.
> + * @param addr
> + *	Virtual address to map.
> + * @param iova
> + *	IOVA address to map.
> + * @param len
> + *	Length of the memory segment being mapped.
> + *
> + * @return
> + *	0 if mapping was successful.
> + *	Negative value and rte_errno is set otherwise.

Here and in other similar places: why are we setting rte_errno *and*
returning -rte_errno? Wouldn't returning -1 be enough?

-- 
Thanks,
Anatoly