* [dpdk-dev] [PATCH 1/6] vfio: allow DMA map of memory for the default vfio fd
2019-02-13 9:10 [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for external memory Shahaf Shuler
@ 2019-02-13 9:10 ` Shahaf Shuler
2019-02-13 9:45 ` Gaëtan Rivet
2019-02-13 14:41 ` Burakov, Anatoly
2019-02-13 9:10 ` [dpdk-dev] [PATCH 2/6] vfio: don't fail to DMA map if memory is already mapped Shahaf Shuler
` (12 subsequent siblings)
13 siblings, 2 replies; 79+ messages in thread
From: Shahaf Shuler @ 2019-02-13 9:10 UTC (permalink / raw)
To: anatoly.burakov, yskoh, thomas, ferruh.yigit, nhorman, gaetan.rivet; +Cc: dev
Enable users to call rte_vfio_dma_map with a request to map
to the default vfio fd.
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
---
lib/librte_eal/common/include/rte_vfio.h | 6 ++++--
lib/librte_eal/linuxapp/eal/eal_vfio.c | 14 ++++++++++++--
2 files changed, 16 insertions(+), 4 deletions(-)
diff --git a/lib/librte_eal/common/include/rte_vfio.h b/lib/librte_eal/common/include/rte_vfio.h
index cae96fab90..2a6827012f 100644
--- a/lib/librte_eal/common/include/rte_vfio.h
+++ b/lib/librte_eal/common/include/rte_vfio.h
@@ -347,7 +347,8 @@ rte_vfio_container_group_unbind(int container_fd, int iommu_group_num);
* Perform DMA mapping for devices in a container.
*
* @param container_fd
- * the specified container fd
+ * the specified container fd. In case of -1 the default container
+ * fd will be used.
*
* @param vaddr
* Starting virtual address of memory to be mapped.
@@ -370,7 +371,8 @@ rte_vfio_container_dma_map(int container_fd, uint64_t vaddr,
* Perform DMA unmapping for devices in a container.
*
* @param container_fd
- * the specified container fd
+ * the specified container fd. In case of -1 the default container
+ * fd will be used.
*
* @param vaddr
* Starting virtual address of memory to be unmapped.
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index c821e83826..48ca9465d4 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -1897,7 +1897,12 @@ rte_vfio_container_dma_map(int container_fd, uint64_t vaddr, uint64_t iova,
return -1;
}
- vfio_cfg = get_vfio_cfg_by_container_fd(container_fd);
+ if (container_fd > 0) {
+ vfio_cfg = get_vfio_cfg_by_container_fd(container_fd);
+ } else {
+ /* when no fd provided use the default. */
+ vfio_cfg = &vfio_cfgs[0];
+ }
if (vfio_cfg == NULL) {
RTE_LOG(ERR, EAL, "Invalid container fd\n");
return -1;
@@ -1917,7 +1922,12 @@ rte_vfio_container_dma_unmap(int container_fd, uint64_t vaddr, uint64_t iova,
return -1;
}
- vfio_cfg = get_vfio_cfg_by_container_fd(container_fd);
+ if (container_fd > 0) {
+ vfio_cfg = get_vfio_cfg_by_container_fd(container_fd);
+ } else {
+ /* when no fd provided use the default. */
+ vfio_cfg = &vfio_cfgs[0];
+ }
if (vfio_cfg == NULL) {
RTE_LOG(ERR, EAL, "Invalid container fd\n");
return -1;
--
2.12.0
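As a rough caller-facing sketch of the semantics this patch aims for, the lookup can be modeled in isolation as below. The struct and function names here are illustrative stand-ins, not the real eal_vfio.c internals, and the fds are made up. Note that the model treats any non-matching fd (including 0) as invalid, which is the point Gaëtan raises in review below about the `container_fd > 0` test.

```c
#include <stddef.h>

/* -1 selects the default container; Gaëtan later suggests giving this
 * sentinel a name such as RTE_VFIO_DEFAULT_CONTAINER_FD. */
#define DEFAULT_CONTAINER_FD (-1)

struct vfio_cfg {
	int fd;   /* container fd owned by this config */
};

/* Hypothetical configs; in eal_vfio.c the first entry is the default. */
static struct vfio_cfg vfio_cfgs[] = {
	{ .fd = 7 },    /* vfio_cfgs[0]: the default container */
	{ .fd = 12 },   /* a user-created container */
};

/* Resolve a container fd to its config; -1 means "use the default",
 * any fd with no matching config (0 included) is invalid. */
static struct vfio_cfg *
get_cfg(int container_fd)
{
	size_t i;

	if (container_fd == DEFAULT_CONTAINER_FD)
		return &vfio_cfgs[0];
	for (i = 0; i < sizeof(vfio_cfgs) / sizeof(vfio_cfgs[0]); i++)
		if (vfio_cfgs[i].fd == container_fd)
			return &vfio_cfgs[i];
	return NULL;   /* invalid container fd */
}
```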
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: [dpdk-dev] [PATCH 1/6] vfio: allow DMA map of memory for the default vfio fd
2019-02-13 9:10 ` [dpdk-dev] [PATCH 1/6] vfio: allow DMA map of memory for the default vfio fd Shahaf Shuler
@ 2019-02-13 9:45 ` Gaëtan Rivet
2019-02-13 11:38 ` Gaëtan Rivet
2019-02-13 15:23 ` Shahaf Shuler
2019-02-13 14:41 ` Burakov, Anatoly
1 sibling, 2 replies; 79+ messages in thread
From: Gaëtan Rivet @ 2019-02-13 9:45 UTC (permalink / raw)
To: Shahaf Shuler; +Cc: anatoly.burakov, yskoh, thomas, ferruh.yigit, nhorman, dev
Hello Shahaf,
On Wed, Feb 13, 2019 at 11:10:21AM +0200, Shahaf Shuler wrote:
> Enable users the option to call rte_vfio_dma_map with request to map
> to the default vfio fd.
>
> Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
> ---
> lib/librte_eal/common/include/rte_vfio.h | 6 ++++--
> lib/librte_eal/linuxapp/eal/eal_vfio.c | 14 ++++++++++++--
> 2 files changed, 16 insertions(+), 4 deletions(-)
>
> diff --git a/lib/librte_eal/common/include/rte_vfio.h b/lib/librte_eal/common/include/rte_vfio.h
> index cae96fab90..2a6827012f 100644
> --- a/lib/librte_eal/common/include/rte_vfio.h
> +++ b/lib/librte_eal/common/include/rte_vfio.h
> @@ -347,7 +347,8 @@ rte_vfio_container_group_unbind(int container_fd, int iommu_group_num);
> * Perform DMA mapping for devices in a container.
> *
> * @param container_fd
> - * the specified container fd
> + * the specified container fd. In case of -1 the default container
> + * fd will be used.
I think
#define RTE_VFIO_DEFAULT_CONTAINER_FD (-1)
might help reading the code that will later call these functions.
> *
> * @param vaddr
> * Starting virtual address of memory to be mapped.
> @@ -370,7 +371,8 @@ rte_vfio_container_dma_map(int container_fd, uint64_t vaddr,
> * Perform DMA unmapping for devices in a container.
> *
> * @param container_fd
> - * the specified container fd
> + * the specified container fd. In case of -1 the default container
> + * fd will be used.
> *
> * @param vaddr
> * Starting virtual address of memory to be unmapped.
> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> index c821e83826..48ca9465d4 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> @@ -1897,7 +1897,12 @@ rte_vfio_container_dma_map(int container_fd, uint64_t vaddr, uint64_t iova,
> return -1;
> }
>
> - vfio_cfg = get_vfio_cfg_by_container_fd(container_fd);
> + if (container_fd > 0) {
Should it not be container_fd >= 0? This seems inconsistent with the doc
above. Reading the code quickly, it's not clear that the container_fd==0
would be at vfio_cfgs[0], so this seems wrong.
> + vfio_cfg = get_vfio_cfg_by_container_fd(container_fd);
> + } else {
> + /* when no fd provided use the default. */
> + vfio_cfg = &vfio_cfgs[0];
> + }
Can you use:
vfio_cfg = default_vfio_cfg;
instead? Then the comment is redundant.
Actually, to keep with my comment above, it might be better to simply
have
if (container_fd == RTE_VFIO_DEFAULT_CONTAINER_FD)
vfio_cfg = default_vfio_cfg;
else
vfio_cfg = get_vfio_cfg_by_group_num(container_fd);
> if (vfio_cfg == NULL) {
> RTE_LOG(ERR, EAL, "Invalid container fd\n");
> return -1;
> @@ -1917,7 +1922,12 @@ rte_vfio_container_dma_unmap(int container_fd, uint64_t vaddr, uint64_t iova,
> return -1;
> }
>
> - vfio_cfg = get_vfio_cfg_by_container_fd(container_fd);
> + if (container_fd > 0) {
> + vfio_cfg = get_vfio_cfg_by_container_fd(container_fd);
> + } else {
> + /* when no fd provided use the default. */
> + vfio_cfg = &vfio_cfgs[0];
> + }
> if (vfio_cfg == NULL) {
> RTE_LOG(ERR, EAL, "Invalid container fd\n");
> return -1;
> --
> 2.12.0
>
--
Gaëtan Rivet
6WIND
* Re: [dpdk-dev] [PATCH 1/6] vfio: allow DMA map of memory for the default vfio fd
2019-02-13 9:45 ` Gaëtan Rivet
@ 2019-02-13 11:38 ` Gaëtan Rivet
2019-02-13 15:23 ` Shahaf Shuler
1 sibling, 0 replies; 79+ messages in thread
From: Gaëtan Rivet @ 2019-02-13 11:38 UTC (permalink / raw)
To: Shahaf Shuler; +Cc: anatoly.burakov, yskoh, thomas, ferruh.yigit, nhorman, dev
On Wed, Feb 13, 2019 at 10:45:05AM +0100, Gaëtan Rivet wrote:
> Hello Shahaf,
>
> On Wed, Feb 13, 2019 at 11:10:21AM +0200, Shahaf Shuler wrote:
> > Enable users the option to call rte_vfio_dma_map with request to map
> > to the default vfio fd.
> >
> > Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
> > ---
> > lib/librte_eal/common/include/rte_vfio.h | 6 ++++--
> > lib/librte_eal/linuxapp/eal/eal_vfio.c | 14 ++++++++++++--
> > 2 files changed, 16 insertions(+), 4 deletions(-)
> >
> > diff --git a/lib/librte_eal/common/include/rte_vfio.h b/lib/librte_eal/common/include/rte_vfio.h
> > index cae96fab90..2a6827012f 100644
> > --- a/lib/librte_eal/common/include/rte_vfio.h
> > +++ b/lib/librte_eal/common/include/rte_vfio.h
> > @@ -347,7 +347,8 @@ rte_vfio_container_group_unbind(int container_fd, int iommu_group_num);
> > * Perform DMA mapping for devices in a container.
> > *
> > * @param container_fd
> > - * the specified container fd
> > + * the specified container fd. In case of -1 the default container
> > + * fd will be used.
>
> I think
>
> #define RTE_VFIO_DEFAULT_CONTAINER_FD (-1)
>
> might help reading the code that will later call these functions.
>
> > *
> > * @param vaddr
> > * Starting virtual address of memory to be mapped.
> > @@ -370,7 +371,8 @@ rte_vfio_container_dma_map(int container_fd, uint64_t vaddr,
> > * Perform DMA unmapping for devices in a container.
> > *
> > * @param container_fd
> > - * the specified container fd
> > + * the specified container fd. In case of -1 the default container
> > + * fd will be used.
> > *
> > * @param vaddr
> > * Starting virtual address of memory to be unmapped.
> > diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> > index c821e83826..48ca9465d4 100644
> > --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
> > +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> > @@ -1897,7 +1897,12 @@ rte_vfio_container_dma_map(int container_fd, uint64_t vaddr, uint64_t iova,
> > return -1;
> > }
> >
> > - vfio_cfg = get_vfio_cfg_by_container_fd(container_fd);
> > + if (container_fd > 0) {
>
> Should it not be container_fd >= 0? This seems inconsistent with the doc
> above. Reading the code quickly, it's not clear that the container_fd==0
> would be at vfio_cfgs[0], so this seems wrong.
>
> > + vfio_cfg = get_vfio_cfg_by_container_fd(container_fd);
> > + } else {
> > + /* when no fd provided use the default. */
> > + vfio_cfg = &vfio_cfgs[0];
> > + }
>
> Can you use:
>
> vfio_cfg = default_vfio_cfg;
>
> instead? Then the comment is redundant.
> Actually, to keep with my comment above, it might be better to simply
> have
>
> if (container_fd == RTE_VFIO_DEFAULT_CONTAINER_FD)
> vfio_cfg = default_vfio_cfg;
> else
> vfio_cfg = get_vfio_cfg_by_group_num(container_fd);
>
Errm, copy error, this line should be
vfio_cfg = get_vfio_cfg_by_container_fd(container_fd);
of course.
> > if (vfio_cfg == NULL) {
> > RTE_LOG(ERR, EAL, "Invalid container fd\n");
> > return -1;
> > @@ -1917,7 +1922,12 @@ rte_vfio_container_dma_unmap(int container_fd, uint64_t vaddr, uint64_t iova,
> > return -1;
> > }
> >
> > - vfio_cfg = get_vfio_cfg_by_container_fd(container_fd);
> > + if (container_fd > 0) {
> > + vfio_cfg = get_vfio_cfg_by_container_fd(container_fd);
> > + } else {
> > + /* when no fd provided use the default. */
> > + vfio_cfg = &vfio_cfgs[0];
> > + }
> > if (vfio_cfg == NULL) {
> > RTE_LOG(ERR, EAL, "Invalid container fd\n");
> > return -1;
> > --
> > 2.12.0
> >
>
> --
> Gaëtan Rivet
> 6WIND
--
Gaëtan Rivet
6WIND
* Re: [dpdk-dev] [PATCH 1/6] vfio: allow DMA map of memory for the default vfio fd
2019-02-13 9:45 ` Gaëtan Rivet
2019-02-13 11:38 ` Gaëtan Rivet
@ 2019-02-13 15:23 ` Shahaf Shuler
1 sibling, 0 replies; 79+ messages in thread
From: Shahaf Shuler @ 2019-02-13 15:23 UTC (permalink / raw)
To: Gaëtan Rivet
Cc: anatoly.burakov, Yongseok Koh, Thomas Monjalon, ferruh.yigit,
nhorman, dev
Wednesday, February 13, 2019 11:45 AM, Gaëtan Rivet:
> Subject: Re: [PATCH 1/6] vfio: allow DMA map of memory for the default vfio
> fd
>
> Hello Shahaf,
Hi Gaetan.
>
> On Wed, Feb 13, 2019 at 11:10:21AM +0200, Shahaf Shuler wrote:
> > Enable users the option to call rte_vfio_dma_map with request to map
> > to the default vfio fd.
> >
> > Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
> > ---
> > lib/librte_eal/common/include/rte_vfio.h | 6 ++++--
> > lib/librte_eal/linuxapp/eal/eal_vfio.c | 14 ++++++++++++--
> > 2 files changed, 16 insertions(+), 4 deletions(-)
> >
>
[...]
> Can you use:
>
> vfio_cfg = default_vfio_cfg;
>
> instead? Then the comment is redundant.
> Actually, to keep with my comment above, it might be better to simply have
>
> if (container_fd == RTE_VFIO_DEFAULT_CONTAINER_FD)
> vfio_cfg = default_vfio_cfg;
> else
> vfio_cfg = get_vfio_cfg_by_group_num(container_fd);
>
Good suggestion. Will adjust in v2.
> > if (vfio_cfg == NULL) {
> > RTE_LOG(ERR, EAL, "Invalid container fd\n");
> > return -1;
> > @@ -1917,7 +1922,12 @@ rte_vfio_container_dma_unmap(int
> container_fd, uint64_t vaddr, uint64_t iova,
> > return -1;
> > }
> >
> > - vfio_cfg = get_vfio_cfg_by_container_fd(container_fd);
> > + if (container_fd > 0) {
> > + vfio_cfg = get_vfio_cfg_by_container_fd(container_fd);
> > + } else {
> > + /* when no fd provided use the default. */
> > + vfio_cfg = &vfio_cfgs[0];
> > + }
> > if (vfio_cfg == NULL) {
> > RTE_LOG(ERR, EAL, "Invalid container fd\n");
> > return -1;
> > --
> > 2.12.0
> >
>
> --
> Gaëtan Rivet
> 6WIND
* Re: [dpdk-dev] [PATCH 1/6] vfio: allow DMA map of memory for the default vfio fd
2019-02-13 9:10 ` [dpdk-dev] [PATCH 1/6] vfio: allow DMA map of memory for the default vfio fd Shahaf Shuler
2019-02-13 9:45 ` Gaëtan Rivet
@ 2019-02-13 14:41 ` Burakov, Anatoly
1 sibling, 0 replies; 79+ messages in thread
From: Burakov, Anatoly @ 2019-02-13 14:41 UTC (permalink / raw)
To: Shahaf Shuler, yskoh, thomas, ferruh.yigit, nhorman, gaetan.rivet; +Cc: dev
On 13-Feb-19 9:10 AM, Shahaf Shuler wrote:
> Enable users the option to call rte_vfio_dma_map with request to map
> to the default vfio fd.
>
> Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
> ---
> lib/librte_eal/common/include/rte_vfio.h | 6 ++++--
> lib/librte_eal/linuxapp/eal/eal_vfio.c | 14 ++++++++++++--
> 2 files changed, 16 insertions(+), 4 deletions(-)
>
> diff --git a/lib/librte_eal/common/include/rte_vfio.h b/lib/librte_eal/common/include/rte_vfio.h
> index cae96fab90..2a6827012f 100644
> --- a/lib/librte_eal/common/include/rte_vfio.h
> +++ b/lib/librte_eal/common/include/rte_vfio.h
> @@ -347,7 +347,8 @@ rte_vfio_container_group_unbind(int container_fd, int iommu_group_num);
> * Perform DMA mapping for devices in a container.
> *
> * @param container_fd
> - * the specified container fd
> + * the specified container fd. In case of -1 the default container
> + * fd will be used.
I haven't looked at the patchset in depth yet, however, this changes the
public API, so warrants a note in release notes.
--
Thanks,
Anatoly
* [dpdk-dev] [PATCH 2/6] vfio: don't fail to DMA map if memory is already mapped
2019-02-13 9:10 [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for external memory Shahaf Shuler
2019-02-13 9:10 ` [dpdk-dev] [PATCH 1/6] vfio: allow DMA map of memory for the default vfio fd Shahaf Shuler
@ 2019-02-13 9:10 ` Shahaf Shuler
2019-02-13 9:58 ` Gaëtan Rivet
2019-02-13 9:10 ` [dpdk-dev] [PATCH 3/6] bus: introduce DMA memory mapping for external memory Shahaf Shuler
` (11 subsequent siblings)
13 siblings, 1 reply; 79+ messages in thread
From: Shahaf Shuler @ 2019-02-13 9:10 UTC (permalink / raw)
To: anatoly.burakov, yskoh, thomas, ferruh.yigit, nhorman, gaetan.rivet; +Cc: dev
Currently the vfio DMA map function will fail in case the same memory
segment is mapped twice.
This is too strict, as it is not an error to map the same memory
twice.
Instead, use the kernel return value to detect such a state and have the
DMA function return success.
For type1 mapping the kernel driver's first implementation returns EBUSY,
and since kernel 3.11 it returns EEXIST. For spapr mapping EBUSY is
returned since kernel 4.10.
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
---
lib/librte_eal/linuxapp/eal/eal_vfio.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index 48ca9465d4..2a2d655b37 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -1263,7 +1263,11 @@ vfio_type1_dma_mem_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova,
VFIO_DMA_MAP_FLAG_WRITE;
ret = ioctl(vfio_container_fd, VFIO_IOMMU_MAP_DMA, &dma_map);
- if (ret) {
+ /**
+ * In case the mapping was already done EEXIST will be
+ * returned from kernel.
+ */
+ if ((ret != -EEXIST) && (ret != -EBUSY)) {
RTE_LOG(ERR, EAL, " cannot set up DMA remapping, error %i (%s)\n",
errno, strerror(errno));
return -1;
@@ -1324,7 +1328,11 @@ vfio_spapr_dma_do_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova,
VFIO_DMA_MAP_FLAG_WRITE;
ret = ioctl(vfio_container_fd, VFIO_IOMMU_MAP_DMA, &dma_map);
- if (ret) {
+ /**
+ * In case the mapping was already done EBUSY will be
+ * returned from kernel.
+ */
+ if (ret != -EBUSY) {
RTE_LOG(ERR, EAL, " cannot set up DMA remapping, error %i (%s)\n",
errno, strerror(errno));
return -1;
--
2.12.0
* Re: [dpdk-dev] [PATCH 2/6] vfio: don't fail to DMA map if memory is already mapped
2019-02-13 9:10 ` [dpdk-dev] [PATCH 2/6] vfio: don't fail to DMA map if memory is already mapped Shahaf Shuler
@ 2019-02-13 9:58 ` Gaëtan Rivet
2019-02-13 19:52 ` Shahaf Shuler
0 siblings, 1 reply; 79+ messages in thread
From: Gaëtan Rivet @ 2019-02-13 9:58 UTC (permalink / raw)
To: Shahaf Shuler; +Cc: anatoly.burakov, yskoh, thomas, ferruh.yigit, nhorman, dev
On Wed, Feb 13, 2019 at 11:10:22AM +0200, Shahaf Shuler wrote:
> Currently vfio DMA map function will fail in case the same memory
> segment is mapped twice.
>
> This is too strict, as this is not an error to map the same memory
> twice.
>
> Instead, use the kernel return value to detect such state and have the
> DMA function to return as successful.
>
> For type1 mapping the kernel driver first implementation returns EBUSY
> and since kernel 3.11 returns EEXISTS. For spapr mapping EBUSY is returned
> since kernel 4.10.
>
What is the earliest version supported by DPDK? I thought 3.10 was
dropped, should we care about the 3.11 return value?
> Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
> ---
> lib/librte_eal/linuxapp/eal/eal_vfio.c | 12 ++++++++++--
> 1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> index 48ca9465d4..2a2d655b37 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> @@ -1263,7 +1263,11 @@ vfio_type1_dma_mem_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova,
> VFIO_DMA_MAP_FLAG_WRITE;
>
> ret = ioctl(vfio_container_fd, VFIO_IOMMU_MAP_DMA, &dma_map);
> - if (ret) {
> + /**
> + * In case the mapping was already done EEXIST will be
> + * returned from kernel.
> + */
> + if ((ret != -EEXIST) && (ret != -EBUSY)) {
Won't a ret == 0 trigger the error then?
It seems ifdef about linux versions are not common in vfio code, but
bar that I think it would be cleaner to restrict the acceptable
error to it.
When a version will be dropped it will be much easier to remove the
related code instead of digging in the commit logs, and leaving both
would not be clean.
> RTE_LOG(ERR, EAL, " cannot set up DMA remapping, error %i (%s)\n",
> errno, strerror(errno));
> return -1;
> @@ -1324,7 +1328,11 @@ vfio_spapr_dma_do_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova,
> VFIO_DMA_MAP_FLAG_WRITE;
>
> ret = ioctl(vfio_container_fd, VFIO_IOMMU_MAP_DMA, &dma_map);
> - if (ret) {
> + /**
> + * In case the mapping was already done EBUSY will be
> + * returned from kernel.
> + */
> + if (ret != -EBUSY) {
> RTE_LOG(ERR, EAL, " cannot set up DMA remapping, error %i (%s)\n",
> errno, strerror(errno));
> return -1;
> --
> 2.12.0
>
--
Gaëtan Rivet
6WIND
* Re: [dpdk-dev] [PATCH 2/6] vfio: don't fail to DMA map if memory is already mapped
2019-02-13 9:58 ` Gaëtan Rivet
@ 2019-02-13 19:52 ` Shahaf Shuler
0 siblings, 0 replies; 79+ messages in thread
From: Shahaf Shuler @ 2019-02-13 19:52 UTC (permalink / raw)
To: Gaëtan Rivet
Cc: anatoly.burakov, Yongseok Koh, Thomas Monjalon, ferruh.yigit,
nhorman, dev
Wednesday, February 13, 2019 11:59 AM, Gaëtan Rivet:
> Subject: Re: [PATCH 2/6] vfio: don't fail to DMA map if memory is already
> mapped
>
> On Wed, Feb 13, 2019 at 11:10:22AM +0200, Shahaf Shuler wrote:
> > Currently vfio DMA map function will fail in case the same memory
> > segment is mapped twice.
> >
> > This is too strict, as this is not an error to map the same memory
> > twice.
> >
> > Instead, use the kernel return value to detect such state and have the
> > DMA function to return as successful.
> >
> > For type1 mapping the kernel driver first implementation returns EBUSY
> > and since kernel 3.11 returns EEXISTS. For spapr mapping EBUSY is
> > returned since kernel 4.10.
> >
>
> What is the earliest version supported by DPDK? I thought 3.10 was dropped,
> should we care about the 3.11 return value?
According to the DPDK doc it is 3.16; however, compatibility with CentOS/RHEL 7 should be kept.
I have looked at the CentOS 7 source code (linux-3.10.0-957.5.1.el7); it uses EEXIST for vfio type 1.
So I think I can drop the EBUSY error check and leave only the EEXIST one.
>
> > Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
> > ---
> > lib/librte_eal/linuxapp/eal/eal_vfio.c | 12 ++++++++++--
> > 1 file changed, 10 insertions(+), 2 deletions(-)
> >
> > diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c
> > b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> > index 48ca9465d4..2a2d655b37 100644
> > --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
> > +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> > @@ -1263,7 +1263,11 @@ vfio_type1_dma_mem_map(int
> vfio_container_fd, uint64_t vaddr, uint64_t iova,
> > VFIO_DMA_MAP_FLAG_WRITE;
> >
> > ret = ioctl(vfio_container_fd, VFIO_IOMMU_MAP_DMA,
> &dma_map);
> > - if (ret) {
> > + /**
> > + * In case the mapping was already done EEXIST will be
> > + * returned from kernel.
> > + */
> > + if ((ret != -EEXIST) && (ret != -EBUSY)) {
>
> Won't a ret == 0 trigger the error then?
Yes good catch.
>
> It seems ifdef about linux versions are not common in vfio code, but bar that
> I think it would be cleaner to restrict the acceptable error to it.
>
> When a version will be dropped it will be much easier to remove the related
> code instead of digging in the commit logs, and leaving both would not be
> clean.
Relying on the kernel version is not safe enough. Many distros backport to their kernels without updating the version. This will lead to unexpected behavior.
>
> > RTE_LOG(ERR, EAL, " cannot set up DMA remapping,
> error %i (%s)\n",
> > errno, strerror(errno));
> > return -1;
> > @@ -1324,7 +1328,11 @@ vfio_spapr_dma_do_map(int vfio_container_fd,
> uint64_t vaddr, uint64_t iova,
> > VFIO_DMA_MAP_FLAG_WRITE;
> >
> > ret = ioctl(vfio_container_fd, VFIO_IOMMU_MAP_DMA,
> &dma_map);
> > - if (ret) {
> > + /**
> > + * In case the mapping was already done EBUSY will be
> > + * returned from kernel.
> > + */
> > + if (ret != -EBUSY) {
> > RTE_LOG(ERR, EAL, " cannot set up DMA remapping,
> error %i (%s)\n",
> > errno, strerror(errno));
> > return -1;
> > --
> > 2.12.0
> >
>
> --
> Gaëtan Rivet
> 6WIND
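Putting the two review points together (a ret of 0 must stay a success, and per the CentOS 7 check only EEXIST need be accepted), the check could be restructured as sketched below. This is only an illustration of the control flow under those assumptions: mock_ioctl_map() stands in for the real ioctl(VFIO_IOMMU_MAP_DMA) call, which returns -1 on failure with errno set.

```c
#include <errno.h>

/* Stand-in for ioctl(vfio_container_fd, VFIO_IOMMU_MAP_DMA, &dma_map):
 * returns 0 on success, -1 with errno set on failure. */
static int
mock_ioctl_map(int already_mapped)
{
	if (already_mapped) {
		errno = EEXIST;   /* type1 driver behaviour, kernel >= 3.11 */
		return -1;
	}
	return 0;
}

/* Sketch of the corrected check: inspect errno only when the ioctl
 * actually failed, so ret == 0 remains a success. */
static int
do_dma_map(int already_mapped)
{
	int ret = mock_ioctl_map(already_mapped);

	if (ret) {
		if (errno == EEXIST)
			return 0;   /* same segment mapped twice: benign */
		return -1;          /* genuine mapping failure */
	}
	return 0;
}
```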
* [dpdk-dev] [PATCH 3/6] bus: introduce DMA memory mapping for external memory
2019-02-13 9:10 [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for external memory Shahaf Shuler
2019-02-13 9:10 ` [dpdk-dev] [PATCH 1/6] vfio: allow DMA map of memory for the default vfio fd Shahaf Shuler
2019-02-13 9:10 ` [dpdk-dev] [PATCH 2/6] vfio: don't fail to DMA map if memory is already mapped Shahaf Shuler
@ 2019-02-13 9:10 ` Shahaf Shuler
2019-02-13 11:17 ` Gaëtan Rivet
2019-02-13 9:10 ` [dpdk-dev] [PATCH 4/6] net/mlx5: refactor external memory registration Shahaf Shuler
` (10 subsequent siblings)
13 siblings, 1 reply; 79+ messages in thread
From: Shahaf Shuler @ 2019-02-13 9:10 UTC (permalink / raw)
To: anatoly.burakov, yskoh, thomas, ferruh.yigit, nhorman, gaetan.rivet; +Cc: dev
The DPDK APIs expose 3 different modes to work with memory used for DMA:
1. Use the DPDK-owned memory (backed by the DPDK provided hugepages).
This memory is allocated by the DPDK libraries, included in the DPDK
memory system (memseg lists) and automatically DMA mapped by the DPDK
layers.
2. Use memory allocated by the user and registered to the DPDK memory
system. This is also referred to as external memory. Upon registration of
the external memory, the DPDK layers will DMA map it to all needed
devices.
3. Use memory allocated by the user and not registered to the DPDK memory
system. This is for users who want tight control over this memory. The
user will need to explicitly call the DMA map function in order to
register such memory to the different devices.
The scope of this patch focuses on #3 above.
Currently the only way to map external memory is through VFIO
(rte_vfio_dma_map). While VFIO is common, there are other vendors
which use different ways to map memory (e.g. Mellanox and NXP).
The work in this patch moves the DMA mapping to vendor-agnostic APIs.
New map and unmap ops were added to the rte_bus structure. Currently
they are implemented only for the PCI bus. The implementation treats
the driver's map and unmap callbacks as a bypass of the VFIO mapping.
That is, when the PCI driver provides no specific map/unmap, VFIO
mapping, if possible, will be used.
Application use of those APIs is quite simple:
* allocate memory
* take a device and query its rte_device
* call the bus map function for this device
Future work will deprecate the rte_vfio_dma_map and rte_vfio_dma_unmap
APIs, leaving the PCI device APIs as the preferred option for the user.
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
---
drivers/bus/pci/pci_common.c | 78 ++++++++++++++++++++++++++++
drivers/bus/pci/rte_bus_pci.h | 14 +++++
lib/librte_eal/common/eal_common_bus.c | 22 ++++++++
lib/librte_eal/common/include/rte_bus.h | 57 ++++++++++++++++++++
lib/librte_eal/rte_eal_version.map | 2 +
5 files changed, 173 insertions(+)
diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index 6276e5d695..018080c48b 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -528,6 +528,82 @@ pci_unplug(struct rte_device *dev)
return ret;
}
+/**
+ * DMA Map memory segment to device. After a successful call the device
+ * will be able to read/write from/to this segment.
+ *
+ * @param dev
+ * Pointer to the PCI device.
+ * @param addr
+ * Starting virtual address of memory to be mapped.
+ * @param iova
+ * Starting IOVA address of memory to be mapped.
+ * @param len
+ * Length of memory segment being mapped.
+ * @return
+ * - 0 On success.
+ * - Negative value and rte_errno is set otherwise.
+ */
+static int __rte_experimental
+pci_dma_map(struct rte_device *dev, void *addr, uint64_t iova, size_t len)
+{
+ struct rte_pci_device *pdev = RTE_DEV_TO_PCI(dev);
+
+ if (!pdev || !pdev->driver) {
+ rte_errno = EINVAL;
+ return -rte_errno;
+ }
+ if (pdev->driver->map)
+ return pdev->driver->map(pdev, addr, iova, len);
+ /**
+ * In case driver don't provides any specific mapping
+ * try fallback to VFIO.
+ */
+ if (pdev->kdrv == RTE_KDRV_VFIO)
+ return rte_vfio_container_dma_map(-1, (uintptr_t)addr, iova,
+ len);
+ rte_errno = ENOTSUP;
+ return -rte_errno;
+}
+
+/**
+ * Un-map memory segment to device. After a successful call the device
+ * will not be able to read/write from/to this segment.
+ *
+ * @param dev
+ * Pointer to the PCI device.
+ * @param addr
+ * Starting virtual address of memory to be unmapped.
+ * @param iova
+ * Starting IOVA address of memory to be unmapped.
+ * @param len
+ * Length of memory segment being unmapped.
+ * @return
+ * - 0 On success.
+ * - Negative value and rte_errno is set otherwise.
+ */
+static int __rte_experimental
+pci_dma_unmap(struct rte_device *dev, void *addr, uint64_t iova, size_t len)
+{
+ struct rte_pci_device *pdev = RTE_DEV_TO_PCI(dev);
+
+ if (!pdev || !pdev->driver) {
+ rte_errno = EINVAL;
+ return -rte_errno;
+ }
+ if (pdev->driver->unmap)
+ return pdev->driver->unmap(pdev, addr, iova, len);
+ /**
+ * In case driver don't provides any specific mapping
+ * try fallback to VFIO.
+ */
+ if (pdev->kdrv == RTE_KDRV_VFIO)
+ return rte_vfio_container_dma_unmap(-1, (uintptr_t)addr, iova,
+ len);
+ rte_errno = ENOTSUP;
+ return -rte_errno;
+}
+
struct rte_pci_bus rte_pci_bus = {
.bus = {
.scan = rte_pci_scan,
@@ -536,6 +612,8 @@ struct rte_pci_bus rte_pci_bus = {
.plug = pci_plug,
.unplug = pci_unplug,
.parse = pci_parse,
+ .map = pci_dma_map,
+ .unmap = pci_dma_unmap,
.get_iommu_class = rte_pci_get_iommu_class,
.dev_iterate = rte_pci_dev_iterate,
.hot_unplug_handler = pci_hot_unplug_handler,
diff --git a/drivers/bus/pci/rte_bus_pci.h b/drivers/bus/pci/rte_bus_pci.h
index f0d6d81c00..00b2d412c7 100644
--- a/drivers/bus/pci/rte_bus_pci.h
+++ b/drivers/bus/pci/rte_bus_pci.h
@@ -114,6 +114,18 @@ typedef int (pci_probe_t)(struct rte_pci_driver *, struct rte_pci_device *);
typedef int (pci_remove_t)(struct rte_pci_device *);
/**
+ * Driver-specific DMA mapping.
+ */
+typedef int (pci_dma_map_t)(struct rte_pci_device *dev, void *addr,
+ uint64_t iova, size_t len);
+
+/**
+ * Driver-specific DMA unmapping.
+ */
+typedef int (pci_dma_unmap_t)(struct rte_pci_device *dev, void *addr,
+ uint64_t iova, size_t len);
+
+/**
* A structure describing a PCI driver.
*/
struct rte_pci_driver {
@@ -122,6 +134,8 @@ struct rte_pci_driver {
struct rte_pci_bus *bus; /**< PCI bus reference. */
pci_probe_t *probe; /**< Device Probe function. */
pci_remove_t *remove; /**< Device Remove function. */
+ pci_dma_map_t *map; /**< device dma map function. */
+ pci_dma_unmap_t *unmap; /**< device dma unmap function. */
const struct rte_pci_id *id_table; /**< ID table, NULL terminated. */
uint32_t drv_flags; /**< Flags RTE_PCI_DRV_*. */
};
diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
index c8f1901f0b..b7911d5ddd 100644
--- a/lib/librte_eal/common/eal_common_bus.c
+++ b/lib/librte_eal/common/eal_common_bus.c
@@ -285,3 +285,25 @@ rte_bus_sigbus_handler(const void *failure_addr)
return ret;
}
+
+int __rte_experimental
+rte_bus_dma_map(struct rte_device *dev, void *addr, uint64_t iova,
+ size_t len)
+{
+ if (dev->bus->map == NULL || len == 0) {
+ rte_errno = EINVAL;
+ return -rte_errno;
+ }
+ return dev->bus->map(dev, addr, iova, len);
+}
+
+int __rte_experimental
+rte_bus_dma_unmap(struct rte_device *dev, void *addr, uint64_t iova,
+ size_t len)
+{
+ if (dev->bus->unmap == NULL || len == 0) {
+ rte_errno = EINVAL;
+ return -rte_errno;
+ }
+ return dev->bus->unmap(dev, addr, iova, len);
+}
diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index 6be4b5cabe..90e4bf51b2 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -168,6 +168,48 @@ typedef int (*rte_bus_unplug_t)(struct rte_device *dev);
typedef int (*rte_bus_parse_t)(const char *name, void *addr);
/**
+ * Bus specific DMA map function.
+ * After a successful call, the memory segment will be mapped to the
+ * given device.
+ *
+ * @param dev
+ * Device pointer.
+ * @param addr
+ * Virtual address to map.
+ * @param iova
+ * IOVA address to map.
+ * @param len
+ * Length of the memory segment being mapped.
+ *
+ * @return
+ * 0 if mapping was successful.
+ * Negative value and rte_errno is set otherwise.
+ */
+typedef int (*rte_bus_map_t)(struct rte_device *dev, void *addr,
+ uint64_t iova, size_t len);
+
+/**
+ * Bus specific DMA unmap function.
+ * After a successful call, the memory segment will no longer be
+ * accessible by the given device.
+ *
+ * @param dev
+ * Device pointer.
+ * @param addr
+ * Virtual address to unmap.
+ * @param iova
+ * IOVA address to unmap.
+ * @param len
+ * Length of the memory segment being mapped.
+ *
+ * @return
+ * 0 if un-mapping was successful.
+ * Negative value and rte_errno is set otherwise.
+ */
+typedef int (*rte_bus_unmap_t)(struct rte_device *dev, void *addr,
+ uint64_t iova, size_t len);
+
+/**
* Implement a specific hot-unplug handler, which is responsible for
* handle the failure when device be hot-unplugged. When the event of
* hot-unplug be detected, it could call this function to handle
@@ -238,6 +280,8 @@ struct rte_bus {
rte_bus_plug_t plug; /**< Probe single device for drivers */
rte_bus_unplug_t unplug; /**< Remove single device from driver */
rte_bus_parse_t parse; /**< Parse a device name */
+ rte_bus_map_t map; /**< DMA map for device in the bus */
+ rte_bus_unmap_t unmap; /**< DMA unmap for device in the bus */
struct rte_bus_conf conf; /**< Bus configuration */
rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
rte_dev_iterate_t dev_iterate; /**< Device iterator. */
@@ -356,6 +400,19 @@ struct rte_bus *rte_bus_find_by_name(const char *busname);
enum rte_iova_mode rte_bus_get_iommu_class(void);
/**
+ * Wrapper to call the bus specific DMA map function.
+ */
+int __rte_experimental
+rte_bus_dma_map(struct rte_device *dev, void *addr, uint64_t iova, size_t len);
+
+/**
+ * Wrapper to call the bus specific DMA unmap function.
+ */
+int __rte_experimental
+rte_bus_dma_unmap(struct rte_device *dev, void *addr, uint64_t iova,
+ size_t len);
+
+/**
* Helper for Bus registration.
* The constructor has higher priority than PMD constructors.
*/
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index eb5f7b9cbd..23f3adb73a 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -364,4 +364,6 @@ EXPERIMENTAL {
rte_service_may_be_active;
rte_socket_count;
rte_socket_id_by_idx;
+ rte_bus_dma_map;
+ rte_bus_dma_unmap;
};
--
2.12.0
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: [dpdk-dev] [PATCH 3/6] bus: introduce DMA memory mapping for external memory
2019-02-13 9:10 ` [dpdk-dev] [PATCH 3/6] bus: introduce DMA memory mapping for external memory Shahaf Shuler
@ 2019-02-13 11:17 ` Gaëtan Rivet
2019-02-13 19:07 ` Shahaf Shuler
0 siblings, 1 reply; 79+ messages in thread
From: Gaëtan Rivet @ 2019-02-13 11:17 UTC (permalink / raw)
To: Shahaf Shuler; +Cc: anatoly.burakov, yskoh, thomas, ferruh.yigit, nhorman, dev
On Wed, Feb 13, 2019 at 11:10:23AM +0200, Shahaf Shuler wrote:
> The DPDK APIs expose 3 different modes to work with memory used for DMA:
>
> 1. Use the DPDK owned memory (backed by the DPDK provided hugepages).
> This memory is allocated by the DPDK libraries, included in the DPDK
> memory system (memseg lists) and automatically DMA mapped by the DPDK
> layers.
>
> 2. Use memory allocated by the user and register to the DPDK memory
> systems. This is also referred as external memory. Upon registration of
> the external memory, the DPDK layers will DMA map it to all needed
> devices.
>
> 3. Use memory allocated by the user and not registered to the DPDK memory
> system. This is for users who wants to have tight control on this
> memory. The user will need to explicitly call DMA map function in order
> to register such memory to the different devices.
>
> The scope of the patch focus on #3 above.
>
> Currently the only way to map external memory is through VFIO
> (rte_vfio_dma_map). While VFIO is common, there are other vendors
> which use different ways to map memory (e.g. Mellanox and NXP).
>
How are those other vendors' devices mapped initially right now? Are
they using #2 scheme instead? Then the user will remap everything using
#3?
Would it be interesting to be able to describe a mapping prior to
probing a device and refer to it upon hotplug?
> The work in this patch moves the DMA mapping to vendor agnostic APIs.
> A new map and unmap ops were added to rte_bus structure. Implementation
> of those was done currently only on the PCI bus. The implementation takes
> the driver map and umap implementation as bypass to the VFIO mapping.
> That is, in case of no specific map/unmap from the PCI driver,
> VFIO mapping, if possible, will be used.
This paragraph should be rewritten to better fit a commit log.
>
> Application use with those APIs is quite simple:
> * allocate memory
> * take a device, and query its rte_device.
> * call the bus map function for this device.
Is the device already configured with the existing mappings? Should the
application stop it before attempting to map its allocated memory?
>
> Future work will deprecate the rte_vfio_dma_map and rte_vfio_dma_unmap
> APIs, leaving the PCI device APIs as the preferred option for the user.
>
> Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
> ---
> drivers/bus/pci/pci_common.c | 78 ++++++++++++++++++++++++++++
> drivers/bus/pci/rte_bus_pci.h | 14 +++++
> lib/librte_eal/common/eal_common_bus.c | 22 ++++++++
> lib/librte_eal/common/include/rte_bus.h | 57 ++++++++++++++++++++
> lib/librte_eal/rte_eal_version.map | 2 +
> 5 files changed, 173 insertions(+)
>
> diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
> index 6276e5d695..018080c48b 100644
> --- a/drivers/bus/pci/pci_common.c
> +++ b/drivers/bus/pci/pci_common.c
> @@ -528,6 +528,82 @@ pci_unplug(struct rte_device *dev)
> return ret;
> }
>
> +/**
> + * DMA Map memory segment to device. After a successful call the device
> + * will be able to read/write from/to this segment.
> + *
> + * @param dev
> + * Pointer to the PCI device.
> + * @param addr
> + * Starting virtual address of memory to be mapped.
> + * @param iova
> + * Starting IOVA address of memory to be mapped.
> + * @param len
> + * Length of memory segment being mapped.
> + * @return
> + * - 0 On success.
> + * - Negative value and rte_errno is set otherwise.
> + */
This doc should be on the callback typedef, not their implementation.
The rte_errno error spec should also be documented higher-up in the
abstraction pile, on the bus callback I think. Everyone should follow
the same error codes for applications to really be able to use any
implementation generically.
> +static int __rte_experimental
The __rte_experimental is not necessary in compilation units themselves,
only in the headers.
In any case, it would only be the publicly available API that must be
marked as such, so more the callback typedefs than their
implementations.
> +pci_dma_map(struct rte_device *dev, void *addr, uint64_t iova, size_t len)
> +{
> + struct rte_pci_device *pdev = RTE_DEV_TO_PCI(dev);
> +
> + if (!pdev || !pdev->driver) {
pdev cannot be null here, nor should its driver be.
> + rte_errno = EINVAL;
> + return -rte_errno;
> + }
> + if (pdev->driver->map)
> + return pdev->driver->map(pdev, addr, iova, len);
> + /**
> + * In case driver don't provides any specific mapping
> + * try fallback to VFIO.
> + */
> + if (pdev->kdrv == RTE_KDRV_VFIO)
> + return rte_vfio_container_dma_map(-1, (uintptr_t)addr, iova,
> + len);
Reiterating: RTE_VFIO_DEFAULT_CONTAINER_FD is more readable I think than
-1 here.
> + rte_errno = ENOTSUP;
> + return -rte_errno;
> +}
> +
> +/**
> + * Un-map memory segment to device. After a successful call the device
> + * will not be able to read/write from/to this segment.
> + *
> + * @param dev
> + * Pointer to the PCI device.
> + * @param addr
> + * Starting virtual address of memory to be unmapped.
> + * @param iova
> + * Starting IOVA address of memory to be unmapped.
> + * @param len
> + * Length of memory segment being unmapped.
> + * @return
> + * - 0 On success.
> + * - Negative value and rte_errno is set otherwise.
> + */
> +static int __rte_experimental
Same as before for __rte_experimental and doc.
> +pci_dma_unmap(struct rte_device *dev, void *addr, uint64_t iova, size_t len)
> +{
> + struct rte_pci_device *pdev = RTE_DEV_TO_PCI(dev);
> +
> + if (!pdev || !pdev->driver) {
> + rte_errno = EINVAL;
> + return -rte_errno;
> + }
> + if (pdev->driver->unmap)
> + return pdev->driver->unmap(pdev, addr, iova, len);
> + /**
> + * In case driver don't provides any specific mapping
> + * try fallback to VFIO.
> + */
> + if (pdev->kdrv == RTE_KDRV_VFIO)
> + return rte_vfio_container_dma_unmap(-1, (uintptr_t)addr, iova,
> + len);
> + rte_errno = ENOTSUP;
> + return -rte_errno;
> +}
> +
> struct rte_pci_bus rte_pci_bus = {
> .bus = {
> .scan = rte_pci_scan,
> @@ -536,6 +612,8 @@ struct rte_pci_bus rte_pci_bus = {
> .plug = pci_plug,
> .unplug = pci_unplug,
> .parse = pci_parse,
> + .map = pci_dma_map,
> + .unmap = pci_dma_unmap,
> .get_iommu_class = rte_pci_get_iommu_class,
> .dev_iterate = rte_pci_dev_iterate,
> .hot_unplug_handler = pci_hot_unplug_handler,
> diff --git a/drivers/bus/pci/rte_bus_pci.h b/drivers/bus/pci/rte_bus_pci.h
> index f0d6d81c00..00b2d412c7 100644
> --- a/drivers/bus/pci/rte_bus_pci.h
> +++ b/drivers/bus/pci/rte_bus_pci.h
> @@ -114,6 +114,18 @@ typedef int (pci_probe_t)(struct rte_pci_driver *, struct rte_pci_device *);
> typedef int (pci_remove_t)(struct rte_pci_device *);
>
> /**
> + * Driver-specific DMA mapping.
> + */
> +typedef int (pci_dma_map_t)(struct rte_pci_device *dev, void *addr,
> + uint64_t iova, size_t len);
> +
> +/**
> + * Driver-specific DMA unmapping.
> + */
> +typedef int (pci_dma_unmap_t)(struct rte_pci_device *dev, void *addr,
> + uint64_t iova, size_t len);
> +
> +/**
> * A structure describing a PCI driver.
> */
> struct rte_pci_driver {
> @@ -122,6 +134,8 @@ struct rte_pci_driver {
> struct rte_pci_bus *bus; /**< PCI bus reference. */
> pci_probe_t *probe; /**< Device Probe function. */
> pci_remove_t *remove; /**< Device Remove function. */
> + pci_dma_map_t *map; /**< device dma map function. */
> + pci_dma_unmap_t *unmap; /**< device dma unmap function. */
I'd call both callbacks dma_map and dma_unmap. It's clearer and more
consistent.
> const struct rte_pci_id *id_table; /**< ID table, NULL terminated. */
> uint32_t drv_flags; /**< Flags RTE_PCI_DRV_*. */
> };
> diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
> index c8f1901f0b..b7911d5ddd 100644
> --- a/lib/librte_eal/common/eal_common_bus.c
> +++ b/lib/librte_eal/common/eal_common_bus.c
> @@ -285,3 +285,25 @@ rte_bus_sigbus_handler(const void *failure_addr)
>
> return ret;
> }
> +
> +int __rte_experimental
> +rte_bus_dma_map(struct rte_device *dev, void *addr, uint64_t iova,
> + size_t len)
> +{
> + if (dev->bus->map == NULL || len == 0) {
> + rte_errno = EINVAL;
> + return -rte_errno;
> + }
> + return dev->bus->map(dev, addr, iova, len);
> +}
> +
> +int __rte_experimental
> +rte_bus_dma_unmap(struct rte_device *dev, void *addr, uint64_t iova,
> + size_t len)
> +{
> + if (dev->bus->unmap == NULL || len == 0) {
> + rte_errno = EINVAL;
> + return -rte_errno;
> + }
> + return dev->bus->unmap(dev, addr, iova, len);
> +}
These functions should be called rte_dev_dma_{map,unmap} and be part of
eal_common_dev.c instead.
> diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
> index 6be4b5cabe..90e4bf51b2 100644
> --- a/lib/librte_eal/common/include/rte_bus.h
> +++ b/lib/librte_eal/common/include/rte_bus.h
> @@ -168,6 +168,48 @@ typedef int (*rte_bus_unplug_t)(struct rte_device *dev);
> typedef int (*rte_bus_parse_t)(const char *name, void *addr);
>
> /**
> + * Bus specific DMA map function.
> + * After a successful call, the memory segment will be mapped to the
> + * given device.
> + *
> + * @param dev
> + * Device pointer.
> + * @param addr
> + * Virtual address to map.
> + * @param iova
> + * IOVA address to map.
> + * @param len
> + * Length of the memory segment being mapped.
> + *
> + * @return
> + * 0 if mapping was successful.
> + * Negative value and rte_errno is set otherwise.
> + */
> +typedef int (*rte_bus_map_t)(struct rte_device *dev, void *addr,
> + uint64_t iova, size_t len);
> +
> +/**
> + * Bus specific DMA unmap function.
> + * After a successful call, the memory segment will no longer be
> + * accessible by the given device.
> + *
> + * @param dev
> + * Device pointer.
> + * @param addr
> + * Virtual address to unmap.
> + * @param iova
> + * IOVA address to unmap.
> + * @param len
> + * Length of the memory segment being mapped.
> + *
> + * @return
> + * 0 if un-mapping was successful.
> + * Negative value and rte_errno is set otherwise.
> + */
> +typedef int (*rte_bus_unmap_t)(struct rte_device *dev, void *addr,
> + uint64_t iova, size_t len);
> +
> +/**
> * Implement a specific hot-unplug handler, which is responsible for
> * handle the failure when device be hot-unplugged. When the event of
> * hot-unplug be detected, it could call this function to handle
> @@ -238,6 +280,8 @@ struct rte_bus {
> rte_bus_plug_t plug; /**< Probe single device for drivers */
> rte_bus_unplug_t unplug; /**< Remove single device from driver */
> rte_bus_parse_t parse; /**< Parse a device name */
> + rte_bus_map_t map; /**< DMA map for device in the bus */
> + rte_bus_unmap_t unmap; /**< DMA unmap for device in the bus */
Same as for the driver callbacks, dma_map and dma_unmap seem a better
fit for the field names.
> struct rte_bus_conf conf; /**< Bus configuration */
> rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
> rte_dev_iterate_t dev_iterate; /**< Device iterator. */
> @@ -356,6 +400,19 @@ struct rte_bus *rte_bus_find_by_name(const char *busname);
> enum rte_iova_mode rte_bus_get_iommu_class(void);
>
> /**
> + * Wrapper to call the bus specific DMA map function.
> + */
> +int __rte_experimental
> +rte_bus_dma_map(struct rte_device *dev, void *addr, uint64_t iova, size_t len);
> +
> +/**
> + * Wrapper to call the bus specific DMA unmap function.
> + */
> +int __rte_experimental
> +rte_bus_dma_unmap(struct rte_device *dev, void *addr, uint64_t iova,
> + size_t len);
> +
> +/**
Same as earlier -> these seem device-level functions, not bus-related.
You won't map those addresses to all devices on the bus.
> * Helper for Bus registration.
> * The constructor has higher priority than PMD constructors.
> */
> diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
> index eb5f7b9cbd..23f3adb73a 100644
> --- a/lib/librte_eal/rte_eal_version.map
> +++ b/lib/librte_eal/rte_eal_version.map
> @@ -364,4 +364,6 @@ EXPERIMENTAL {
> rte_service_may_be_active;
> rte_socket_count;
> rte_socket_id_by_idx;
> + rte_bus_dma_map;
> + rte_bus_dma_unmap;
> };
> --
> 2.12.0
>
--
Gaëtan Rivet
6WIND
* Re: [dpdk-dev] [PATCH 3/6] bus: introduce DMA memory mapping for external memory
2019-02-13 11:17 ` Gaëtan Rivet
@ 2019-02-13 19:07 ` Shahaf Shuler
2019-02-14 14:00 ` Gaëtan Rivet
0 siblings, 1 reply; 79+ messages in thread
From: Shahaf Shuler @ 2019-02-13 19:07 UTC (permalink / raw)
To: Gaëtan Rivet
Cc: anatoly.burakov, Yongseok Koh, Thomas Monjalon, ferruh.yigit,
nhorman, dev
Wednesday, February 13, 2019 1:17 PM, Gaëtan Rivet:
> Subject: Re: [PATCH 3/6] bus: introduce DMA memory mapping for external
> memory
>
> On Wed, Feb 13, 2019 at 11:10:23AM +0200, Shahaf Shuler wrote:
> > The DPDK APIs expose 3 different modes to work with memory used for
> DMA:
> >
> > 1. Use the DPDK owned memory (backed by the DPDK provided
> hugepages).
> > This memory is allocated by the DPDK libraries, included in the DPDK
> > memory system (memseg lists) and automatically DMA mapped by the
> DPDK
> > layers.
> >
> > 2. Use memory allocated by the user and register to the DPDK memory
> > systems. This is also referred as external memory. Upon registration
> > of the external memory, the DPDK layers will DMA map it to all needed
> > devices.
> >
> > 3. Use memory allocated by the user and not registered to the DPDK
> > memory system. This is for users who wants to have tight control on
> > this memory. The user will need to explicitly call DMA map function in
> > order to register such memory to the different devices.
> >
> > The scope of the patch focus on #3 above.
> >
> > Currently the only way to map external memory is through VFIO
> > (rte_vfio_dma_map). While VFIO is common, there are other vendors
> > which use different ways to map memory (e.g. Mellanox and NXP).
> >
>
> How are those other vendors' devices mapped initially right now? Are they
> using #2 scheme instead? Then the user will remap everything using #3?
It is not a re-map; it is a completely different mode of memory management.
The first question to ask is "how does the application want to manage its memory?"
If it is either #1 or #2 above, there is no problem making the mapping internal on the "other vendor" devices, as they can register for the memory event callback, which triggers every time new memory is added to the DPDK memory management system.
For #3 the memory does not exist in the DPDK memory management system, and there are no memory events. Hence the application needs to explicitly call the DMA map function.
The change in this patch is just to make it more generic than calling only VFIO.
>
> Would it be interesting to be able to describe a mapping prior to probing a
> device and refer to it upon hotplug?
Not sure it is an interesting use case. I don't see the need to set up the application memory before the probing of the devices.
Regarding hotplug - this is a feature we can add on top of this series (for example if a device was removed and hotplugged back). This will require storing the mappings in some database, like VFIO does.
>
> > The work in this patch moves the DMA mapping to vendor agnostic APIs.
> > A new map and unmap ops were added to rte_bus structure.
> > Implementation of those was done currently only on the PCI bus. The
> > implementation takes the driver map and umap implementation as bypass
> to the VFIO mapping.
> > That is, in case of no specific map/unmap from the PCI driver, VFIO
> > mapping, if possible, will be used.
>
> This paragraph should be rewritten to better fit a commit log.
>
> >
> > Application use with those APIs is quite simple:
> > * allocate memory
> > * take a device, and query its rte_device.
> > * call the bus map function for this device.
>
> Is the device already configured with the existing mappings? Should the
> application stop it before attempting to map its allocated memory?
Am not following.
When the application wants to register new memory for DMA for this device it calls map; when it wants to unregister it calls unmap. Without an explicit call to the map function the memory cannot be used for DMA.
>
> >
> > Future work will deprecate the rte_vfio_dma_map and
> rte_vfio_dma_unmap
> > APIs, leaving the PCI device APIs as the preferred option for the user.
> >
> > Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
> > ---
> > drivers/bus/pci/pci_common.c | 78
> ++++++++++++++++++++++++++++
> > drivers/bus/pci/rte_bus_pci.h | 14 +++++
> > lib/librte_eal/common/eal_common_bus.c | 22 ++++++++
> > lib/librte_eal/common/include/rte_bus.h | 57 ++++++++++++++++++++
> > lib/librte_eal/rte_eal_version.map | 2 +
> > 5 files changed, 173 insertions(+)
> >
> > diff --git a/drivers/bus/pci/pci_common.c
> > b/drivers/bus/pci/pci_common.c index 6276e5d695..018080c48b 100644
> > --- a/drivers/bus/pci/pci_common.c
> > +++ b/drivers/bus/pci/pci_common.c
> > @@ -528,6 +528,82 @@ pci_unplug(struct rte_device *dev)
> > return ret;
> > }
> >
> > +/**
> > + * DMA Map memory segment to device. After a successful call the
> > +device
> > + * will be able to read/write from/to this segment.
> > + *
> > + * @param dev
> > + * Pointer to the PCI device.
> > + * @param addr
> > + * Starting virtual address of memory to be mapped.
> > + * @param iova
> > + * Starting IOVA address of memory to be mapped.
> > + * @param len
> > + * Length of memory segment being mapped.
> > + * @return
> > + * - 0 On success.
> > + * - Negative value and rte_errno is set otherwise.
> > + */
>
> This doc should be on the callback typedef, not their implementation.
> The rte_errno error spec should also be documented higher-up in the
> abstraction pile, on the bus callback I think. Everyone should follow the same
> error codes for applications to really be able to use any implementation
> generically.
OK.
>
> > +static int __rte_experimental
>
> The __rte_experimental is not necessary in compilation units themselves,
> only in the headers.
>
> In any case, it would only be the publicly available API that must be marked
> as such, so more the callback typedefs than their implementations.
OK
>
> > +pci_dma_map(struct rte_device *dev, void *addr, uint64_t iova, size_t
> > +len) {
> > + struct rte_pci_device *pdev = RTE_DEV_TO_PCI(dev);
> > +
> > + if (!pdev || !pdev->driver) {
>
> pdev cannot be null here, nor should its driver be.
So you say to rely on it and drop the check?
>
> > + rte_errno = EINVAL;
> > + return -rte_errno;
> > + }
> > + if (pdev->driver->map)
> > + return pdev->driver->map(pdev, addr, iova, len);
> > + /**
> > + * In case driver don't provides any specific mapping
> > + * try fallback to VFIO.
> > + */
> > + if (pdev->kdrv == RTE_KDRV_VFIO)
> > + return rte_vfio_container_dma_map(-1, (uintptr_t)addr,
> iova,
> > + len);
>
> Reiterating: RTE_VFIO_DEFAULT_CONTAINER_FD is more readable I think
> than
> -1 here.
>
> > + rte_errno = ENOTSUP;
> > + return -rte_errno;
> > +}
> > +
> > +/**
> > + * Un-map memory segment to device. After a successful call the
> > +device
> > + * will not be able to read/write from/to this segment.
> > + *
> > + * @param dev
> > + * Pointer to the PCI device.
> > + * @param addr
> > + * Starting virtual address of memory to be unmapped.
> > + * @param iova
> > + * Starting IOVA address of memory to be unmapped.
> > + * @param len
> > + * Length of memory segment being unmapped.
> > + * @return
> > + * - 0 On success.
> > + * - Negative value and rte_errno is set otherwise.
> > + */
> > +static int __rte_experimental
>
> Same as before for __rte_experimental and doc.
>
> > +pci_dma_unmap(struct rte_device *dev, void *addr, uint64_t iova,
> > +size_t len) {
> > + struct rte_pci_device *pdev = RTE_DEV_TO_PCI(dev);
> > +
> > + if (!pdev || !pdev->driver) {
> > + rte_errno = EINVAL;
> > + return -rte_errno;
> > + }
> > + if (pdev->driver->unmap)
> > + return pdev->driver->unmap(pdev, addr, iova, len);
> > + /**
> > + * In case driver don't provides any specific mapping
> > + * try fallback to VFIO.
> > + */
> > + if (pdev->kdrv == RTE_KDRV_VFIO)
> > + return rte_vfio_container_dma_unmap(-1, (uintptr_t)addr,
> iova,
> > + len);
> > + rte_errno = ENOTSUP;
> > + return -rte_errno;
> > +}
> > +
> > struct rte_pci_bus rte_pci_bus = {
> > .bus = {
> > .scan = rte_pci_scan,
> > @@ -536,6 +612,8 @@ struct rte_pci_bus rte_pci_bus = {
> > .plug = pci_plug,
> > .unplug = pci_unplug,
> > .parse = pci_parse,
> > + .map = pci_dma_map,
> > + .unmap = pci_dma_unmap,
> > .get_iommu_class = rte_pci_get_iommu_class,
> > .dev_iterate = rte_pci_dev_iterate,
> > .hot_unplug_handler = pci_hot_unplug_handler, diff --git
> > a/drivers/bus/pci/rte_bus_pci.h b/drivers/bus/pci/rte_bus_pci.h index
> > f0d6d81c00..00b2d412c7 100644
> > --- a/drivers/bus/pci/rte_bus_pci.h
> > +++ b/drivers/bus/pci/rte_bus_pci.h
> > @@ -114,6 +114,18 @@ typedef int (pci_probe_t)(struct rte_pci_driver
> > *, struct rte_pci_device *); typedef int (pci_remove_t)(struct
> > rte_pci_device *);
> >
> > /**
> > + * Driver-specific DMA mapping.
> > + */
> > +typedef int (pci_dma_map_t)(struct rte_pci_device *dev, void *addr,
> > + uint64_t iova, size_t len);
> > +
> > +/**
> > + * Driver-specific DMA unmapping.
> > + */
> > +typedef int (pci_dma_unmap_t)(struct rte_pci_device *dev, void *addr,
> > + uint64_t iova, size_t len);
> > +
> > +/**
> > * A structure describing a PCI driver.
> > */
> > struct rte_pci_driver {
> > @@ -122,6 +134,8 @@ struct rte_pci_driver {
> > struct rte_pci_bus *bus; /**< PCI bus reference. */
> > pci_probe_t *probe; /**< Device Probe function. */
> > pci_remove_t *remove; /**< Device Remove function. */
> > + pci_dma_map_t *map; /**< device dma map function. */
> > + pci_dma_unmap_t *unmap; /**< device dma unmap
> function. */
>
> I'd call both callbacks dma_map and dma_unmap. It's clearer and more
> consistent.
OK.
>
> > const struct rte_pci_id *id_table; /**< ID table, NULL terminated. */
> > uint32_t drv_flags; /**< Flags RTE_PCI_DRV_*. */
> > };
> > diff --git a/lib/librte_eal/common/eal_common_bus.c
> > b/lib/librte_eal/common/eal_common_bus.c
> > index c8f1901f0b..b7911d5ddd 100644
> > --- a/lib/librte_eal/common/eal_common_bus.c
> > +++ b/lib/librte_eal/common/eal_common_bus.c
> > @@ -285,3 +285,25 @@ rte_bus_sigbus_handler(const void *failure_addr)
> >
> > return ret;
> > }
> > +
> > +int __rte_experimental
> > +rte_bus_dma_map(struct rte_device *dev, void *addr, uint64_t iova,
> > + size_t len)
> > +{
> > + if (dev->bus->map == NULL || len == 0) {
> > + rte_errno = EINVAL;
> > + return -rte_errno;
> > + }
> > + return dev->bus->map(dev, addr, iova, len); }
> > +
> > +int __rte_experimental
> > +rte_bus_dma_unmap(struct rte_device *dev, void *addr, uint64_t iova,
> > + size_t len)
> > +{
> > + if (dev->bus->unmap == NULL || len == 0) {
> > + rte_errno = EINVAL;
> > + return -rte_errno;
> > + }
> > + return dev->bus->unmap(dev, addr, iova, len); }
>
> These functions should be called rte_dev_dma_{map,unmap} and be part of
> eal_common_dev.c instead.
Will move.
>
> > diff --git a/lib/librte_eal/common/include/rte_bus.h
> > b/lib/librte_eal/common/include/rte_bus.h
> > index 6be4b5cabe..90e4bf51b2 100644
> > --- a/lib/librte_eal/common/include/rte_bus.h
> > +++ b/lib/librte_eal/common/include/rte_bus.h
> > @@ -168,6 +168,48 @@ typedef int (*rte_bus_unplug_t)(struct rte_device
> > *dev); typedef int (*rte_bus_parse_t)(const char *name, void *addr);
> >
> > /**
> > + * Bus specific DMA map function.
> > + * After a successful call, the memory segment will be mapped to the
> > + * given device.
> > + *
> > + * @param dev
> > + * Device pointer.
> > + * @param addr
> > + * Virtual address to map.
> > + * @param iova
> > + * IOVA address to map.
> > + * @param len
> > + * Length of the memory segment being mapped.
> > + *
> > + * @return
> > + * 0 if mapping was successful.
> > + * Negative value and rte_errno is set otherwise.
> > + */
> > +typedef int (*rte_bus_map_t)(struct rte_device *dev, void *addr,
> > + uint64_t iova, size_t len);
> > +
> > +/**
> > + * Bus specific DMA unmap function.
> > + * After a successful call, the memory segment will no longer be
> > + * accessible by the given device.
> > + *
> > + * @param dev
> > + * Device pointer.
> > + * @param addr
> > + * Virtual address to unmap.
> > + * @param iova
> > + * IOVA address to unmap.
> > + * @param len
> > + * Length of the memory segment being mapped.
> > + *
> > + * @return
> > + * 0 if un-mapping was successful.
> > + * Negative value and rte_errno is set otherwise.
> > + */
> > +typedef int (*rte_bus_unmap_t)(struct rte_device *dev, void *addr,
> > + uint64_t iova, size_t len);
> > +
> > +/**
> > * Implement a specific hot-unplug handler, which is responsible for
> > * handle the failure when device be hot-unplugged. When the event of
> > * hot-unplug be detected, it could call this function to handle @@
> > -238,6 +280,8 @@ struct rte_bus {
> > rte_bus_plug_t plug; /**< Probe single device for drivers */
> > rte_bus_unplug_t unplug; /**< Remove single device from driver
> */
> > rte_bus_parse_t parse; /**< Parse a device name */
> > + rte_bus_map_t map; /**< DMA map for device in the bus */
> > + rte_bus_unmap_t unmap; /**< DMA unmap for device in the
> bus */
>
> Same as for the driver callbacks, dma_map and dma_unmap seem a better
> fit for the field names.
OK
>
> > struct rte_bus_conf conf; /**< Bus configuration */
> > rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu
> class */
> > rte_dev_iterate_t dev_iterate; /**< Device iterator. */ @@ -356,6
> > +400,19 @@ struct rte_bus *rte_bus_find_by_name(const char
> *busname);
> > enum rte_iova_mode rte_bus_get_iommu_class(void);
> >
> > /**
> > + * Wrapper to call the bus specific DMA map function.
> > + */
> > +int __rte_experimental
> > +rte_bus_dma_map(struct rte_device *dev, void *addr, uint64_t iova,
> > +size_t len);
> > +
> > +/**
> > + * Wrapper to call the bus specific DMA unmap function.
> > + */
> > +int __rte_experimental
> > +rte_bus_dma_unmap(struct rte_device *dev, void *addr, uint64_t iova,
> > + size_t len);
> > +
> > +/**
>
> Same as earlier -> these seem device-level functions, not bus-related.
> You won't map those addresses to all devices on the bus.
>
> > * Helper for Bus registration.
> > * The constructor has higher priority than PMD constructors.
> > */
> > diff --git a/lib/librte_eal/rte_eal_version.map
> > b/lib/librte_eal/rte_eal_version.map
> > index eb5f7b9cbd..23f3adb73a 100644
> > --- a/lib/librte_eal/rte_eal_version.map
> > +++ b/lib/librte_eal/rte_eal_version.map
> > @@ -364,4 +364,6 @@ EXPERIMENTAL {
> > rte_service_may_be_active;
> > rte_socket_count;
> > rte_socket_id_by_idx;
> > + rte_bus_dma_map;
> > + rte_bus_dma_unmap;
> > };
> > --
> > 2.12.0
> >
>
> --
> Gaëtan Rivet
> 6WIND
* Re: [dpdk-dev] [PATCH 3/6] bus: introduce DMA memory mapping for external memory
2019-02-13 19:07 ` Shahaf Shuler
@ 2019-02-14 14:00 ` Gaëtan Rivet
2019-02-17 6:23 ` Shahaf Shuler
0 siblings, 1 reply; 79+ messages in thread
From: Gaëtan Rivet @ 2019-02-14 14:00 UTC (permalink / raw)
To: Shahaf Shuler
Cc: anatoly.burakov, Yongseok Koh, Thomas Monjalon, ferruh.yigit,
nhorman, dev
On Wed, Feb 13, 2019 at 07:07:11PM +0000, Shahaf Shuler wrote:
> Wednesday, February 13, 2019 1:17 PM, Gaëtan Rivet:
> > Subject: Re: [PATCH 3/6] bus: introduce DMA memory mapping for external
> > memory
> >
> > On Wed, Feb 13, 2019 at 11:10:23AM +0200, Shahaf Shuler wrote:
> > > The DPDK APIs expose 3 different modes to work with memory used for
> > DMA:
> > >
> > > 1. Use the DPDK owned memory (backed by the DPDK provided
> > hugepages).
> > > This memory is allocated by the DPDK libraries, included in the DPDK
> > > memory system (memseg lists) and automatically DMA mapped by the
> > DPDK
> > > layers.
> > >
> > > 2. Use memory allocated by the user and register to the DPDK memory
> > > systems. This is also referred as external memory. Upon registration
> > > of the external memory, the DPDK layers will DMA map it to all needed
> > > devices.
> > >
> > > 3. Use memory allocated by the user and not registered to the DPDK
> > > memory system. This is for users who wants to have tight control on
> > > this memory. The user will need to explicitly call DMA map function in
> > > order to register such memory to the different devices.
> > >
> > > The scope of the patch focus on #3 above.
> > >
> > > Currently the only way to map external memory is through VFIO
> > > (rte_vfio_dma_map). While VFIO is common, there are other vendors
> > > which use different ways to map memory (e.g. Mellanox and NXP).
> > >
> >
> > How are those other vendors' devices mapped initially right now? Are they
> > using #2 scheme instead? Then the user will remap everything using #3?
>
> It is not a re-map, it is a completely different mode for the memory management.
> The first question to ask is "how the application wants to manage its memory" ?
> If it is either #1 or #2 above, no problem to make the mapping internal on the "other vendor devices" as they can register to the memory event callback which trigger every time new memory is added to the DPDK memory management system.
> For #3 the memory does not exists in the DPDK memory management system, and no memory events. Hence the application needs to explicitly call the dma MAP.
> The change on this patch is just to make it more generic than calling only VFIO.
>
Right! I mostly used #1 ports and never really thought about other kinds
of memory management or how they might follow a different logic.
Do you think this could be used with a lot of sequential
mappings/unmappings happening?
I'm thinking for example about a crypto app feeding crypto buffers;
being able to directly map the result instead of copying it within
buffers might be interesting. But then you'd have to unmap often.
- Is the unmap() simple from the app PoV?
- Must the mapping remain available for a long time?
- Does the app need to call tx_descriptor_status() a few times or
does dma_unmap() verify that the mapping is not in use before unmapping?
--
Gaëtan Rivet
6WIND
* Re: [dpdk-dev] [PATCH 3/6] bus: introduce DMA memory mapping for external memory
2019-02-14 14:00 ` Gaëtan Rivet
@ 2019-02-17 6:23 ` Shahaf Shuler
0 siblings, 0 replies; 79+ messages in thread
From: Shahaf Shuler @ 2019-02-17 6:23 UTC (permalink / raw)
To: Gaëtan Rivet
Cc: anatoly.burakov, Yongseok Koh, Thomas Monjalon, ferruh.yigit,
nhorman, dev
Thursday, February 14, 2019 4:01 PM, Gaëtan Rivet:
> Subject: Re: [PATCH 3/6] bus: introduce DMA memory mapping for external
> memory
>
> On Wed, Feb 13, 2019 at 07:07:11PM +0000, Shahaf Shuler wrote:
> > Wednesday, February 13, 2019 1:17 PM, Gaëtan Rivet:
> > > Subject: Re: [PATCH 3/6] bus: introduce DMA memory mapping for
> > > external memory
[...]
> > >
> > > How are those other vendors' devices mapped initially right now? Are
> > > they using #2 scheme instead? Then the user will remap everything using
> #3?
> >
> > It is not a re-map; it is a completely different mode of memory
> > management.
> > The first question to ask is "how does the application want to manage
> > its memory?"
> > If it is either #1 or #2 above, there is no problem making the mapping
> > internal on the "other vendor devices", as they can register for the
> > memory event callback, which triggers every time new memory is added to
> > the DPDK memory management system.
> > For #3 the memory does not exist in the DPDK memory management system,
> > so there are no memory events. Hence the application needs to call the
> > DMA map explicitly.
> > The change in this patch is just to make it more generic than calling
> > only VFIO.
> >
>
> Right! I mostly used #1 ports and never really thought about other kinds of
> memory management or how they might follow a different logic.
>
> Do you think this could be used with a lot of sequential mapping/unmappings
> happening?
It depends very much on how efficient the driver's mapping and unmapping are.
In most cases, mapping is a heavy operation.
>
> I'm thinking for example about a crypto app feeding crypto buffers, being
> able to directly map the result instead of copying it within buffers might be
> interesting. But then you'd have to unmap often.
>
> - Is the unmap() simple from the app PoV?
Yes, just call rte_bus_dma_unmap.
>
> - Must the mapping remain available for a long time?
It must remain as long as you need the device to access the memory. In your example, it should remain until the crypto device has finished writing the buffers.
>
> - Does the app need to call tx_descriptor_status() a few times or
> does dma_unmap() verify that the mapping is not in use before
> unmapping?
I think it is a matter of driver implementation.
In general, it is the application's responsibility to make sure the memory is no longer needed before unmapping, just as today one does not destroy a mempool that is still being used by some rxq. It can be done by any means the application has, not only tx_descriptor_status.
A driver can protect against a misbehaving application by warning and failing the call in such a case, but it is not a must.
>
> --
> Gaëtan Rivet
> 6WIND
^ permalink raw reply [flat|nested] 79+ messages in thread
* [dpdk-dev] [PATCH 4/6] net/mlx5: refactor external memory registration
2019-02-13 9:10 [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for external memory Shahaf Shuler
` (2 preceding siblings ...)
2019-02-13 9:10 ` [dpdk-dev] [PATCH 3/6] bus: introduce DMA memory mapping for external memory Shahaf Shuler
@ 2019-02-13 9:10 ` Shahaf Shuler
2019-02-13 9:10 ` [dpdk-dev] [PATCH 5/6] net/mlx5: support PCI device DMA map and unmap Shahaf Shuler
` (9 subsequent siblings)
13 siblings, 0 replies; 79+ messages in thread
From: Shahaf Shuler @ 2019-02-13 9:10 UTC (permalink / raw)
To: anatoly.burakov, yskoh, thomas, ferruh.yigit, nhorman, gaetan.rivet; +Cc: dev
Move the memory region creation to a separate function to
prepare the ground for reusing it in the PCI driver map and unmap
functions.
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
---
drivers/net/mlx5/mlx5_mr.c | 86 +++++++++++++++++++++++++++--------------
1 file changed, 57 insertions(+), 29 deletions(-)
diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index 442b2d2321..32be6a5445 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -1109,6 +1109,58 @@ mlx5_mr_flush_local_cache(struct mlx5_mr_ctrl *mr_ctrl)
}
/**
+ * Creates a memory region for external memory, that is, memory which is not
+ * part of the DPDK memory segments.
+ *
+ * @param dev
+ * Pointer to the ethernet device.
+ * @param addr
+ * Starting virtual address of memory.
+ * @param len
+ * Length of memory segment being mapped.
+ * @param socket_id
+ * Socket to allocate heap memory for the control structures.
+ *
+ * @return
+ * Pointer to MR structure on success, NULL otherwise.
+ */
+static struct mlx5_mr *
+mlx5_create_mr_ext(struct rte_eth_dev *dev, uintptr_t addr, size_t len,
+ int socket_id)
+{
+ struct priv *priv = dev->data->dev_private;
+ struct mlx5_mr *mr = NULL;
+
+ mr = rte_zmalloc_socket(NULL,
+ RTE_ALIGN_CEIL(sizeof(*mr),
+ RTE_CACHE_LINE_SIZE),
+ RTE_CACHE_LINE_SIZE, socket_id);
+ if (mr == NULL)
+ return NULL;
+ mr->ibv_mr = mlx5_glue->reg_mr(priv->pd, (void *)addr, len,
+ IBV_ACCESS_LOCAL_WRITE);
+ if (mr->ibv_mr == NULL) {
+ DRV_LOG(WARNING,
+ "port %u fail to create a verbs MR for address (%p)",
+ dev->data->port_id, (void *)addr);
+ rte_free(mr);
+ return NULL;
+ }
+ mr->msl = NULL; /* Mark it as external memory. */
+ mr->ms_bmp = NULL;
+ mr->ms_n = 1;
+ mr->ms_bmp_n = 1;
+ DRV_LOG(DEBUG,
+ "port %u MR CREATED (%p) for external memory %p:\n"
+ " [0x%" PRIxPTR ", 0x%" PRIxPTR "),"
+ " lkey=0x%x base_idx=%u ms_n=%u, ms_bmp_n=%u",
+ dev->data->port_id, (void *)mr, (void *)addr,
+ addr, addr + len, rte_cpu_to_be_32(mr->ibv_mr->lkey),
+ mr->ms_base_idx, mr->ms_n, mr->ms_bmp_n);
+ return mr;
+}
+
+/**
* Called during rte_mempool_mem_iter() by mlx5_mr_update_ext_mp().
*
* Externally allocated chunk is registered and a MR is created for the chunk.
@@ -1142,43 +1194,19 @@ mlx5_mr_update_ext_mp_cb(struct rte_mempool *mp, void *opaque,
rte_rwlock_read_unlock(&priv->mr.rwlock);
if (lkey != UINT32_MAX)
return;
- mr = rte_zmalloc_socket(NULL,
- RTE_ALIGN_CEIL(sizeof(*mr),
- RTE_CACHE_LINE_SIZE),
- RTE_CACHE_LINE_SIZE, mp->socket_id);
- if (mr == NULL) {
- DRV_LOG(WARNING,
- "port %u unable to allocate memory for a new MR of"
- " mempool (%s).",
- dev->data->port_id, mp->name);
- data->ret = -1;
- return;
- }
DRV_LOG(DEBUG, "port %u register MR for chunk #%d of mempool (%s)",
dev->data->port_id, mem_idx, mp->name);
- mr->ibv_mr = mlx5_glue->reg_mr(priv->pd, (void *)addr, len,
- IBV_ACCESS_LOCAL_WRITE);
- if (mr->ibv_mr == NULL) {
+ mr = mlx5_create_mr_ext(dev, addr, len, mp->socket_id);
+ if (!mr) {
DRV_LOG(WARNING,
- "port %u fail to create a verbs MR for address (%p)",
- dev->data->port_id, (void *)addr);
- rte_free(mr);
+ "port %u unable to allocate a new MR of"
+ " mempool (%s).",
+ dev->data->port_id, mp->name);
data->ret = -1;
return;
}
- mr->msl = NULL; /* Mark it is external memory. */
- mr->ms_bmp = NULL;
- mr->ms_n = 1;
- mr->ms_bmp_n = 1;
rte_rwlock_write_lock(&priv->mr.rwlock);
LIST_INSERT_HEAD(&priv->mr.mr_list, mr, mr);
- DRV_LOG(DEBUG,
- "port %u MR CREATED (%p) for external memory %p:\n"
- " [0x%" PRIxPTR ", 0x%" PRIxPTR "),"
- " lkey=0x%x base_idx=%u ms_n=%u, ms_bmp_n=%u",
- dev->data->port_id, (void *)mr, (void *)addr,
- addr, addr + len, rte_cpu_to_be_32(mr->ibv_mr->lkey),
- mr->ms_base_idx, mr->ms_n, mr->ms_bmp_n);
/* Insert to the global cache table. */
mr_insert_dev_cache(dev, mr);
rte_rwlock_write_unlock(&priv->mr.rwlock);
--
2.12.0
^ permalink raw reply [flat|nested] 79+ messages in thread
* [dpdk-dev] [PATCH 5/6] net/mlx5: support PCI device DMA map and unmap
2019-02-13 9:10 [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for external memory Shahaf Shuler
` (3 preceding siblings ...)
2019-02-13 9:10 ` [dpdk-dev] [PATCH 4/6] net/mlx5: refactor external memory registration Shahaf Shuler
@ 2019-02-13 9:10 ` Shahaf Shuler
2019-02-13 11:35 ` Gaëtan Rivet
2019-02-13 9:10 ` [dpdk-dev] [PATCH 6/6] doc: deprecate VFIO DMA map APIs Shahaf Shuler
` (8 subsequent siblings)
13 siblings, 1 reply; 79+ messages in thread
From: Shahaf Shuler @ 2019-02-13 9:10 UTC (permalink / raw)
To: anatoly.burakov, yskoh, thomas, ferruh.yigit, nhorman, gaetan.rivet; +Cc: dev
The implementation reuses the external memory registration work done by
commit[1].
Note about representors:
The current representor design will not work
with those map and unmap functions. The reason is that for representors
we have multiple IB devices sharing the same PCI function, so mapping will
happen only on one of the representors and not all of them.
While it is possible to implement such support, the IB representor
design is going to be changed during DPDK19.05. The new design will have
a single IB device for all representors, hence sharing of a single
memory region between all representors will be possible.
[1]
commit 7e43a32ee060
("net/mlx5: support externally allocated static memory")
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
---
drivers/net/mlx5/mlx5.c | 2 +
drivers/net/mlx5/mlx5_mr.c | 146 ++++++++++++++++++++++++++++++++++++++
drivers/net/mlx5/mlx5_rxtx.h | 5 ++
3 files changed, 153 insertions(+)
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index a913a5955f..7c91701713 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1626,6 +1626,8 @@ static struct rte_pci_driver mlx5_driver = {
.id_table = mlx5_pci_id_map,
.probe = mlx5_pci_probe,
.remove = mlx5_pci_remove,
+ .map = mlx5_dma_map,
+ .unmap = mlx5_dma_unmap,
.drv_flags = (RTE_PCI_DRV_INTR_LSC | RTE_PCI_DRV_INTR_RMV |
RTE_PCI_DRV_PROBE_AGAIN),
};
diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index 32be6a5445..7059181959 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -14,6 +14,7 @@
#include <rte_mempool.h>
#include <rte_malloc.h>
#include <rte_rwlock.h>
+#include <rte_bus_pci.h>
#include "mlx5.h"
#include "mlx5_mr.h"
@@ -1215,6 +1216,151 @@ mlx5_mr_update_ext_mp_cb(struct rte_mempool *mp, void *opaque,
}
/**
+ * Finds the first ethdev that matches the PCI device.
+ * Multiple ethdevs per PCI device exist only with representors.
+ * In such a case, it is enough to get only one of the ports, as they all
+ * share the same ibv context.
+ *
+ * @param pdev
+ * Pointer to the PCI device.
+ *
+ * @return
+ * Pointer to the ethdev if found, NULL otherwise.
+ */
+static struct rte_eth_dev *
+pci_dev_to_eth_dev(struct rte_pci_device *pdev)
+{
+ struct rte_eth_dev *dev;
+ const char *drv_name;
+ uint16_t port_id = 0;
+
+ /**
+ * We really need to iterate all eth devices regardless of
+ * their owner.
+ */
+ while (port_id < RTE_MAX_ETHPORTS) {
+ port_id = rte_eth_find_next(port_id);
+ if (port_id >= RTE_MAX_ETHPORTS)
+ break;
+ dev = &rte_eth_devices[port_id];
+ drv_name = dev->device->driver->name;
+ if (!strncmp(drv_name, MLX5_DRIVER_NAME,
+ sizeof(MLX5_DRIVER_NAME) + 1) &&
+ pdev == RTE_DEV_TO_PCI(dev->device)) {
+ /* found the PCI device. */
+ return dev;
+ }
+ }
+ return NULL;
+}
+
+/**
+ * DPDK callback to DMA map external memory to a PCI device.
+ *
+ * @param pdev
+ * Pointer to the PCI device.
+ * @param addr
+ * Starting virtual address of memory to be mapped.
+ * @param iova
+ * Starting IOVA address of memory to be mapped.
+ * @param len
+ * Length of memory segment being mapped.
+ *
+ * @return
+ * 0 on success, negative value on error.
+ */
+int
+mlx5_dma_map(struct rte_pci_device *pdev, void *addr,
+ uint64_t iova __rte_unused, size_t len)
+{
+ struct rte_eth_dev *dev;
+ struct mlx5_mr *mr;
+ struct priv *priv;
+
+ dev = pci_dev_to_eth_dev(pdev);
+ if (!dev) {
+ DRV_LOG(WARNING, "unable to find matching ethdev "
+ "to PCI device %p", (void *)pdev);
+ return -1;
+ }
+ priv = dev->data->dev_private;
+ mr = mlx5_create_mr_ext(dev, (uintptr_t)addr, len, SOCKET_ID_ANY);
+ if (!mr) {
+ DRV_LOG(WARNING,
+ "port %u unable to dma map", dev->data->port_id);
+ return -1;
+ }
+ rte_rwlock_write_lock(&priv->mr.rwlock);
+ LIST_INSERT_HEAD(&priv->mr.mr_list, mr, mr);
+ /* Insert to the global cache table. */
+ mr_insert_dev_cache(dev, mr);
+ rte_rwlock_write_unlock(&priv->mr.rwlock);
+ return 0;
+}
+
+/**
+ * DPDK callback to DMA unmap external memory from a PCI device.
+ *
+ * @param pdev
+ * Pointer to the PCI device.
+ * @param addr
+ * Starting virtual address of memory to be unmapped.
+ * @param iova
+ * Starting IOVA address of memory to be unmapped.
+ * @param len
+ * Length of memory segment being unmapped.
+ *
+ * @return
+ * 0 on success, negative value on error.
+ */
+int
+mlx5_dma_unmap(struct rte_pci_device *pdev, void *addr,
+ uint64_t iova __rte_unused, size_t len __rte_unused)
+{
+ struct rte_eth_dev *dev;
+ struct priv *priv;
+ struct mlx5_mr *mr;
+ struct mlx5_mr_cache entry;
+
+ dev = pci_dev_to_eth_dev(pdev);
+ if (!dev) {
+ DRV_LOG(WARNING, "unable to find matching ethdev "
+ "to PCI device %p", (void *)pdev);
+ return -1;
+ }
+ priv = dev->data->dev_private;
+ rte_rwlock_read_lock(&priv->mr.rwlock);
+ mr = mr_lookup_dev_list(dev, &entry, (uintptr_t)addr);
+ if (!mr) {
+ DRV_LOG(WARNING, "address 0x%" PRIxPTR " wasn't registered "
+ "to PCI device %p", (uintptr_t)addr,
+ (void *)pdev);
+ rte_rwlock_read_unlock(&priv->mr.rwlock);
+ return -1;
+ }
+ LIST_REMOVE(mr, mr);
+ LIST_INSERT_HEAD(&priv->mr.mr_free_list, mr, mr);
+ DEBUG("port %u remove MR(%p) from list", dev->data->port_id,
+ (void *)mr);
+ mr_rebuild_dev_cache(dev);
+ /*
+ * Flush local caches by propagating invalidation across cores.
+ * rte_smp_wmb() is enough to synchronize this event. If one of
+ * freed memsegs is seen by other core, that means the memseg
+ * has been allocated by allocator, which will come after this
+ * free call. Therefore, this store instruction (incrementing
+ * generation below) will be guaranteed to be seen by other core
+ * before the core sees the newly allocated memory.
+ */
+ ++priv->mr.dev_gen;
+ DEBUG("broadcasting local cache flush, gen=%d",
+ priv->mr.dev_gen);
+ rte_smp_wmb();
+ rte_rwlock_read_unlock(&priv->mr.rwlock);
+ return 0;
+}
+
+/**
* Register MR for entire memory chunks in a Mempool having externally allocated
* memory and fill in local cache.
*
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index c2529f96bc..f3f84dbac3 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -28,6 +28,7 @@
#include <rte_atomic.h>
#include <rte_spinlock.h>
#include <rte_io.h>
+#include <rte_bus_pci.h>
#include "mlx5_utils.h"
#include "mlx5.h"
@@ -367,6 +368,10 @@ uint32_t mlx5_rx_addr2mr_bh(struct mlx5_rxq_data *rxq, uintptr_t addr);
uint32_t mlx5_tx_mb2mr_bh(struct mlx5_txq_data *txq, struct rte_mbuf *mb);
uint32_t mlx5_tx_update_ext_mp(struct mlx5_txq_data *txq, uintptr_t addr,
struct rte_mempool *mp);
+int mlx5_dma_map(struct rte_pci_device *pdev, void *addr, uint64_t iova,
+ size_t len);
+int mlx5_dma_unmap(struct rte_pci_device *pdev, void *addr, uint64_t iova,
+ size_t len);
/**
* Provide safe 64bit store operation to mlx5 UAR region for both 32bit and
--
2.12.0
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: [dpdk-dev] [PATCH 5/6] net/mlx5: support PCI device DMA map and unmap
2019-02-13 9:10 ` [dpdk-dev] [PATCH 5/6] net/mlx5: support PCI device DMA map and unmap Shahaf Shuler
@ 2019-02-13 11:35 ` Gaëtan Rivet
2019-02-13 11:44 ` Gaëtan Rivet
0 siblings, 1 reply; 79+ messages in thread
From: Gaëtan Rivet @ 2019-02-13 11:35 UTC (permalink / raw)
To: Shahaf Shuler; +Cc: anatoly.burakov, yskoh, thomas, ferruh.yigit, nhorman, dev
On Wed, Feb 13, 2019 at 11:10:25AM +0200, Shahaf Shuler wrote:
> The implementation reuses the external memory registration work done by
> commit[1].
>
> Note about representors:
>
> The current representor design will not work
> with those map and unmap functions. The reason is that for representors
> we have multiple IB devices sharing the same PCI function, so mapping will
> happen only on one of the representors and not all of them.
>
> While it is possible to implement such support, the IB representor
> design is going to be changed during DPDK19.05. The new design will have
> a single IB device for all representors, hence sharing of a single
> memory region between all representors will be possible.
>
> [1]
> commit 7e43a32ee060
> ("net/mlx5: support externally allocated static memory")
>
> Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
> ---
> drivers/net/mlx5/mlx5.c | 2 +
> drivers/net/mlx5/mlx5_mr.c | 146 ++++++++++++++++++++++++++++++++++++++
> drivers/net/mlx5/mlx5_rxtx.h | 5 ++
> 3 files changed, 153 insertions(+)
>
> diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
> index a913a5955f..7c91701713 100644
> --- a/drivers/net/mlx5/mlx5.c
> +++ b/drivers/net/mlx5/mlx5.c
> @@ -1626,6 +1626,8 @@ static struct rte_pci_driver mlx5_driver = {
> .id_table = mlx5_pci_id_map,
> .probe = mlx5_pci_probe,
> .remove = mlx5_pci_remove,
> + .map = mlx5_dma_map,
> + .unmap = mlx5_dma_unmap,
> .drv_flags = (RTE_PCI_DRV_INTR_LSC | RTE_PCI_DRV_INTR_RMV |
> RTE_PCI_DRV_PROBE_AGAIN),
> };
> diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
> index 32be6a5445..7059181959 100644
> --- a/drivers/net/mlx5/mlx5_mr.c
> +++ b/drivers/net/mlx5/mlx5_mr.c
> @@ -14,6 +14,7 @@
> #include <rte_mempool.h>
> #include <rte_malloc.h>
> #include <rte_rwlock.h>
> +#include <rte_bus_pci.h>
>
> #include "mlx5.h"
> #include "mlx5_mr.h"
> @@ -1215,6 +1216,151 @@ mlx5_mr_update_ext_mp_cb(struct rte_mempool *mp, void *opaque,
> }
>
> /**
> + * Finds the first ethdev that match the pci device.
> + * The existence of multiple ethdev per pci device is only with representors.
> + * On such case, it is enough to get only one of the ports as they all share
> + * the same ibv context.
> + *
> + * @param pdev
> + * Pointer to the PCI device.
> + *
> + * @return
> + * Pointer to the ethdev if found, NULL otherwise.
> + */
> +static struct rte_eth_dev *
> +pci_dev_to_eth_dev(struct rte_pci_device *pdev)
> +{
> + struct rte_eth_dev *dev;
> + const char *drv_name;
> + uint16_t port_id = 0;
> +
> + /**
> + * We really need to iterate all eth devices regardless of
> + * their owner.
> + */
> + while (port_id < RTE_MAX_ETHPORTS) {
> + port_id = rte_eth_find_next(port_id);
> + if (port_id >= RTE_MAX_ETHPORTS)
> + break;
> + dev = &rte_eth_devices[port_id];
> + drv_name = dev->device->driver->name;
> + if (!strncmp(drv_name, MLX5_DRIVER_NAME,
> + sizeof(MLX5_DRIVER_NAME) + 1) &&
> + pdev == RTE_DEV_TO_PCI(dev->device)) {
> + /* found the PCI device. */
> + return dev;
> + }
> + }
> + return NULL;
> +}
Might I interest you in the new API?
{
struct rte_dev_iterator it;
struct rte_device *dev;
RTE_DEV_FOREACH(dev, "class=eth", &it)
if (dev == &pdev->device)
return it.class_device;
return NULL;
}
> +
> +/**
> + * DPDK callback to DMA map external memory to a PCI device.
> + *
> + * @param pdev
> + * Pointer to the PCI device.
> + * @param addr
> + * Starting virtual address of memory to be mapped.
> + * @param iova
> + * Starting IOVA address of memory to be mapped.
> + * @param len
> + * Length of memory segment being mapped.
> + *
> + * @return
> + * 0 on success, negative value on error.
> + */
> +int
> +mlx5_dma_map(struct rte_pci_device *pdev, void *addr,
> + uint64_t iova __rte_unused, size_t len)
> +{
> + struct rte_eth_dev *dev;
> + struct mlx5_mr *mr;
> + struct priv *priv;
> +
> + dev = pci_dev_to_eth_dev(pdev);
> + if (!dev) {
> + DRV_LOG(WARNING, "unable to find matching ethdev "
> + "to PCI device %p", (void *)pdev);
> + return -1;
> + }
> + priv = dev->data->dev_private;
> + mr = mlx5_create_mr_ext(dev, (uintptr_t)addr, len, SOCKET_ID_ANY);
> + if (!mr) {
> + DRV_LOG(WARNING,
> + "port %u unable to dma map", dev->data->port_id);
> + return -1;
> + }
> + rte_rwlock_write_lock(&priv->mr.rwlock);
> + LIST_INSERT_HEAD(&priv->mr.mr_list, mr, mr);
> + /* Insert to the global cache table. */
> + mr_insert_dev_cache(dev, mr);
> + rte_rwlock_write_unlock(&priv->mr.rwlock);
> + return 0;
> +}
> +
> +/**
> + * DPDK callback to DMA unmap external memory to a PCI device.
> + *
> + * @param pdev
> + * Pointer to the PCI device.
> + * @param addr
> + * Starting virtual address of memory to be unmapped.
> + * @param iova
> + * Starting IOVA address of memory to be unmapped.
> + * @param len
> + * Length of memory segment being unmapped.
> + *
> + * @return
> + * 0 on success, negative value on error.
> + */
> +int
> +mlx5_dma_unmap(struct rte_pci_device *pdev, void *addr,
> + uint64_t iova __rte_unused, size_t len __rte_unused)
> +{
> + struct rte_eth_dev *dev;
> + struct priv *priv;
> + struct mlx5_mr *mr;
> + struct mlx5_mr_cache entry;
> +
> + dev = pci_dev_to_eth_dev(pdev);
> + if (!dev) {
> + DRV_LOG(WARNING, "unable to find matching ethdev "
> + "to PCI device %p", (void *)pdev);
> + return -1;
> + }
> + priv = dev->data->dev_private;
> + rte_rwlock_read_lock(&priv->mr.rwlock);
> + mr = mr_lookup_dev_list(dev, &entry, (uintptr_t)addr);
> + if (!mr) {
> + DRV_LOG(WARNING, "address 0x%" PRIxPTR " wasn't registered "
> + "to PCI device %p", (uintptr_t)addr,
> + (void *)pdev);
> + rte_rwlock_read_unlock(&priv->mr.rwlock);
> + return -1;
> + }
> + LIST_REMOVE(mr, mr);
> + LIST_INSERT_HEAD(&priv->mr.mr_free_list, mr, mr);
> + DEBUG("port %u remove MR(%p) from list", dev->data->port_id,
> + (void *)mr);
> + mr_rebuild_dev_cache(dev);
> + /*
> + * Flush local caches by propagating invalidation across cores.
> + * rte_smp_wmb() is enough to synchronize this event. If one of
> + * freed memsegs is seen by other core, that means the memseg
> + * has been allocated by allocator, which will come after this
> + * free call. Therefore, this store instruction (incrementing
> + * generation below) will be guaranteed to be seen by other core
> + * before the core sees the newly allocated memory.
> + */
> + ++priv->mr.dev_gen;
> + DEBUG("broadcasting local cache flush, gen=%d",
> + priv->mr.dev_gen);
> + rte_smp_wmb();
> + rte_rwlock_read_unlock(&priv->mr.rwlock);
> + return 0;
> +}
> +
> +/**
> * Register MR for entire memory chunks in a Mempool having externally allocated
> * memory and fill in local cache.
> *
> diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
> index c2529f96bc..f3f84dbac3 100644
> --- a/drivers/net/mlx5/mlx5_rxtx.h
> +++ b/drivers/net/mlx5/mlx5_rxtx.h
> @@ -28,6 +28,7 @@
> #include <rte_atomic.h>
> #include <rte_spinlock.h>
> #include <rte_io.h>
> +#include <rte_bus_pci.h>
>
> #include "mlx5_utils.h"
> #include "mlx5.h"
> @@ -367,6 +368,10 @@ uint32_t mlx5_rx_addr2mr_bh(struct mlx5_rxq_data *rxq, uintptr_t addr);
> uint32_t mlx5_tx_mb2mr_bh(struct mlx5_txq_data *txq, struct rte_mbuf *mb);
> uint32_t mlx5_tx_update_ext_mp(struct mlx5_txq_data *txq, uintptr_t addr,
> struct rte_mempool *mp);
> +int mlx5_dma_map(struct rte_pci_device *pdev, void *addr, uint64_t iova,
> + size_t len);
> +int mlx5_dma_unmap(struct rte_pci_device *pdev, void *addr, uint64_t iova,
> + size_t len);
>
> /**
> * Provide safe 64bit store operation to mlx5 UAR region for both 32bit and
> --
> 2.12.0
>
--
Gaëtan Rivet
6WIND
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: [dpdk-dev] [PATCH 5/6] net/mlx5: support PCI device DMA map and unmap
2019-02-13 11:35 ` Gaëtan Rivet
@ 2019-02-13 11:44 ` Gaëtan Rivet
2019-02-13 19:11 ` Shahaf Shuler
0 siblings, 1 reply; 79+ messages in thread
From: Gaëtan Rivet @ 2019-02-13 11:44 UTC (permalink / raw)
To: Shahaf Shuler; +Cc: anatoly.burakov, yskoh, thomas, ferruh.yigit, nhorman, dev
On Wed, Feb 13, 2019 at 12:35:04PM +0100, Gaëtan Rivet wrote:
> On Wed, Feb 13, 2019 at 11:10:25AM +0200, Shahaf Shuler wrote:
> > The implementation reuses the external memory registration work done by
> > commit[1].
> >
> > Note about representors:
> >
> > The current representor design will not work
> > with those map and unmap functions. The reason is that for representors
> > we have multiple IB devices sharing the same PCI function, so mapping will
> > happen only on one of the representors and not all of them.
> >
> > While it is possible to implement such support, the IB representor
> > design is going to be changed during DPDK19.05. The new design will have
> > a single IB device for all representors, hence sharing of a single
> > memory region between all representors will be possible.
> >
> > [1]
> > commit 7e43a32ee060
> > ("net/mlx5: support externally allocated static memory")
> >
> > Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
> > ---
> > drivers/net/mlx5/mlx5.c | 2 +
> > drivers/net/mlx5/mlx5_mr.c | 146 ++++++++++++++++++++++++++++++++++++++
> > drivers/net/mlx5/mlx5_rxtx.h | 5 ++
> > 3 files changed, 153 insertions(+)
> >
> > diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
> > index a913a5955f..7c91701713 100644
> > --- a/drivers/net/mlx5/mlx5.c
> > +++ b/drivers/net/mlx5/mlx5.c
> > @@ -1626,6 +1626,8 @@ static struct rte_pci_driver mlx5_driver = {
> > .id_table = mlx5_pci_id_map,
> > .probe = mlx5_pci_probe,
> > .remove = mlx5_pci_remove,
> > + .map = mlx5_dma_map,
> > + .unmap = mlx5_dma_unmap,
> > .drv_flags = (RTE_PCI_DRV_INTR_LSC | RTE_PCI_DRV_INTR_RMV |
> > RTE_PCI_DRV_PROBE_AGAIN),
> > };
> > diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
> > index 32be6a5445..7059181959 100644
> > --- a/drivers/net/mlx5/mlx5_mr.c
> > +++ b/drivers/net/mlx5/mlx5_mr.c
> > @@ -14,6 +14,7 @@
> > #include <rte_mempool.h>
> > #include <rte_malloc.h>
> > #include <rte_rwlock.h>
> > +#include <rte_bus_pci.h>
> >
> > #include "mlx5.h"
> > #include "mlx5_mr.h"
> > @@ -1215,6 +1216,151 @@ mlx5_mr_update_ext_mp_cb(struct rte_mempool *mp, void *opaque,
> > }
> >
> > /**
> > + * Finds the first ethdev that match the pci device.
> > + * The existence of multiple ethdev per pci device is only with representors.
> > + * On such case, it is enough to get only one of the ports as they all share
> > + * the same ibv context.
> > + *
> > + * @param pdev
> > + * Pointer to the PCI device.
> > + *
> > + * @return
> > + * Pointer to the ethdev if found, NULL otherwise.
> > + */
> > +static struct rte_eth_dev *
> > +pci_dev_to_eth_dev(struct rte_pci_device *pdev)
> > +{
> > + struct rte_eth_dev *dev;
> > + const char *drv_name;
> > + uint16_t port_id = 0;
> > +
> > + /**
> > + * We really need to iterate all eth devices regardless of
> > + * their owner.
> > + */
> > + while (port_id < RTE_MAX_ETHPORTS) {
> > + port_id = rte_eth_find_next(port_id);
> > + if (port_id >= RTE_MAX_ETHPORTS)
> > + break;
> > + dev = &rte_eth_devices[port_id];
> > + drv_name = dev->device->driver->name;
> > + if (!strncmp(drv_name, MLX5_DRIVER_NAME,
> > + sizeof(MLX5_DRIVER_NAME) + 1) &&
> > + pdev == RTE_DEV_TO_PCI(dev->device)) {
> > + /* found the PCI device. */
> > + return dev;
> > + }
> > + }
> > + return NULL;
> > +}
>
> Might I interest you in the new API?
>
> {
> struct rte_dev_iterator it;
> struct rte_device *dev;
>
> RTE_DEV_FOREACH(dev, "class=eth", &it)
> if (dev == &pdev->device)
> return it.class_device;
> return NULL;
> }
>
On that note, this could be in the PCI bus instead?
> > +
> > +/**
> > + * DPDK callback to DMA map external memory to a PCI device.
> > + *
> > + * @param pdev
> > + * Pointer to the PCI device.
> > + * @param addr
> > + * Starting virtual address of memory to be mapped.
> > + * @param iova
> > + * Starting IOVA address of memory to be mapped.
> > + * @param len
> > + * Length of memory segment being mapped.
> > + *
> > + * @return
> > + * 0 on success, negative value on error.
> > + */
> > +int
> > +mlx5_dma_map(struct rte_pci_device *pdev, void *addr,
> > + uint64_t iova __rte_unused, size_t len)
> > +{
> > + struct rte_eth_dev *dev;
> > + struct mlx5_mr *mr;
> > + struct priv *priv;
> > +
> > + dev = pci_dev_to_eth_dev(pdev);
> > + if (!dev) {
> > + DRV_LOG(WARNING, "unable to find matching ethdev "
> > + "to PCI device %p", (void *)pdev);
> > + return -1;
> > + }
> > + priv = dev->data->dev_private;
> > + mr = mlx5_create_mr_ext(dev, (uintptr_t)addr, len, SOCKET_ID_ANY);
> > + if (!mr) {
> > + DRV_LOG(WARNING,
> > + "port %u unable to dma map", dev->data->port_id);
> > + return -1;
> > + }
> > + rte_rwlock_write_lock(&priv->mr.rwlock);
> > + LIST_INSERT_HEAD(&priv->mr.mr_list, mr, mr);
> > + /* Insert to the global cache table. */
> > + mr_insert_dev_cache(dev, mr);
> > + rte_rwlock_write_unlock(&priv->mr.rwlock);
> > + return 0;
> > +}
> > +
> > +/**
> > + * DPDK callback to DMA unmap external memory to a PCI device.
> > + *
> > + * @param pdev
> > + * Pointer to the PCI device.
> > + * @param addr
> > + * Starting virtual address of memory to be unmapped.
> > + * @param iova
> > + * Starting IOVA address of memory to be unmapped.
> > + * @param len
> > + * Length of memory segment being unmapped.
> > + *
> > + * @return
> > + * 0 on success, negative value on error.
> > + */
> > +int
> > +mlx5_dma_unmap(struct rte_pci_device *pdev, void *addr,
> > + uint64_t iova __rte_unused, size_t len __rte_unused)
> > +{
> > + struct rte_eth_dev *dev;
> > + struct priv *priv;
> > + struct mlx5_mr *mr;
> > + struct mlx5_mr_cache entry;
> > +
> > + dev = pci_dev_to_eth_dev(pdev);
> > + if (!dev) {
> > + DRV_LOG(WARNING, "unable to find matching ethdev "
> > + "to PCI device %p", (void *)pdev);
> > + return -1;
> > + }
> > + priv = dev->data->dev_private;
> > + rte_rwlock_read_lock(&priv->mr.rwlock);
> > + mr = mr_lookup_dev_list(dev, &entry, (uintptr_t)addr);
> > + if (!mr) {
> > + DRV_LOG(WARNING, "address 0x%" PRIxPTR " wasn't registered "
> > + "to PCI device %p", (uintptr_t)addr,
> > + (void *)pdev);
> > + rte_rwlock_read_unlock(&priv->mr.rwlock);
> > + return -1;
> > + }
> > + LIST_REMOVE(mr, mr);
> > + LIST_INSERT_HEAD(&priv->mr.mr_free_list, mr, mr);
> > + DEBUG("port %u remove MR(%p) from list", dev->data->port_id,
> > + (void *)mr);
> > + mr_rebuild_dev_cache(dev);
> > + /*
> > + * Flush local caches by propagating invalidation across cores.
> > + * rte_smp_wmb() is enough to synchronize this event. If one of
> > + * freed memsegs is seen by other core, that means the memseg
> > + * has been allocated by allocator, which will come after this
> > + * free call. Therefore, this store instruction (incrementing
> > + * generation below) will be guaranteed to be seen by other core
> > + * before the core sees the newly allocated memory.
> > + */
> > + ++priv->mr.dev_gen;
> > + DEBUG("broadcasting local cache flush, gen=%d",
> > + priv->mr.dev_gen);
> > + rte_smp_wmb();
> > + rte_rwlock_read_unlock(&priv->mr.rwlock);
> > + return 0;
> > +}
> > +
> > +/**
> > * Register MR for entire memory chunks in a Mempool having externally allocated
> > * memory and fill in local cache.
> > *
> > diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
> > index c2529f96bc..f3f84dbac3 100644
> > --- a/drivers/net/mlx5/mlx5_rxtx.h
> > +++ b/drivers/net/mlx5/mlx5_rxtx.h
> > @@ -28,6 +28,7 @@
> > #include <rte_atomic.h>
> > #include <rte_spinlock.h>
> > #include <rte_io.h>
> > +#include <rte_bus_pci.h>
> >
> > #include "mlx5_utils.h"
> > #include "mlx5.h"
> > @@ -367,6 +368,10 @@ uint32_t mlx5_rx_addr2mr_bh(struct mlx5_rxq_data *rxq, uintptr_t addr);
> > uint32_t mlx5_tx_mb2mr_bh(struct mlx5_txq_data *txq, struct rte_mbuf *mb);
> > uint32_t mlx5_tx_update_ext_mp(struct mlx5_txq_data *txq, uintptr_t addr,
> > struct rte_mempool *mp);
> > +int mlx5_dma_map(struct rte_pci_device *pdev, void *addr, uint64_t iova,
> > + size_t len);
> > +int mlx5_dma_unmap(struct rte_pci_device *pdev, void *addr, uint64_t iova,
> > + size_t len);
> >
> > /**
> > * Provide safe 64bit store operation to mlx5 UAR region for both 32bit and
> > --
> > 2.12.0
> >
>
> --
> Gaëtan Rivet
> 6WIND
--
Gaëtan Rivet
6WIND
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: [dpdk-dev] [PATCH 5/6] net/mlx5: support PCI device DMA map and unmap
2019-02-13 11:44 ` Gaëtan Rivet
@ 2019-02-13 19:11 ` Shahaf Shuler
2019-02-14 10:21 ` Gaëtan Rivet
0 siblings, 1 reply; 79+ messages in thread
From: Shahaf Shuler @ 2019-02-13 19:11 UTC (permalink / raw)
To: Gaëtan Rivet
Cc: anatoly.burakov, Yongseok Koh, Thomas Monjalon, ferruh.yigit,
nhorman, dev
Wednesday, February 13, 2019 1:44 PM, Gaëtan Rivet:
> Subject: Re: [PATCH 5/6] net/mlx5: support PCI device DMA map and unmap
>
> On Wed, Feb 13, 2019 at 12:35:04PM +0100, Gaëtan Rivet wrote:
> > On Wed, Feb 13, 2019 at 11:10:25AM +0200, Shahaf Shuler wrote:
> > > The implementation reuses the external memory registration work done
[..]
> > > +
> > > + /**
> > > + * We really need to iterate all eth devices regardless of
> > > + * their owner.
> > > + */
> > > + while (port_id < RTE_MAX_ETHPORTS) {
> > > + port_id = rte_eth_find_next(port_id);
> > > + if (port_id >= RTE_MAX_ETHPORTS)
> > > + break;
> > > + dev = &rte_eth_devices[port_id];
> > > + drv_name = dev->device->driver->name;
> > > + if (!strncmp(drv_name, MLX5_DRIVER_NAME,
> > > + sizeof(MLX5_DRIVER_NAME) + 1) &&
> > > + pdev == RTE_DEV_TO_PCI(dev->device)) {
> > > + /* found the PCI device. */
> > > + return dev;
> > > + }
> > > + }
> > > + return NULL;
> > > +}
> >
> > Might I interest you in the new API?
Good suggestion, I will have a look at it in depth.
> >
> > {
> > struct rte_dev_iterator it;
> > struct rte_device *dev;
> >
> > RTE_DEV_FOREACH(dev, "class=eth", &it)
> > if (dev == &pdev->device)
> > return it.class_device;
> > return NULL;
> > }
> >
>
> On that note, this could be in the PCI bus instead?
We can put it on the PCI bus, but that would mean the PCI bus is no longer device agnostic.
Currently, I couldn't find any reference to eth_dev on the PCI bus, besides a single macro which converts to a PCI device and doesn't really do type checks.
Having it in would mean the PCI bus needs to start distinguishing between ethdev, crypto dev and whatever other device types exist on its bus.
>
> > > +
> > > +/**
> > > + * DPDK callback to DMA map external memory to a PCI device.
> > > + *
> > > + * @param pdev
> > > + * Pointer to the PCI device.
> > > + * @param addr
> > > + * Starting virtual address of memory to be mapped.
> > > + * @param iova
> > > + * Starting IOVA address of memory to be mapped.
> > > + * @param len
> > > + * Length of memory segment being mapped.
> > > + *
> > > + * @return
> > > + * 0 on success, negative value on error.
> > > + */
> > > +int
> > > +mlx5_dma_map(struct rte_pci_device *pdev, void *addr,
> > > + uint64_t iova __rte_unused, size_t len) {
> > > + struct rte_eth_dev *dev;
> > > + struct mlx5_mr *mr;
> > > + struct priv *priv;
> > > +
> > > + dev = pci_dev_to_eth_dev(pdev);
> > > + if (!dev) {
> > > + DRV_LOG(WARNING, "unable to find matching ethdev "
> > > + "to PCI device %p", (void *)pdev);
> > > + return -1;
> > > + }
> > > + priv = dev->data->dev_private;
> > > + mr = mlx5_create_mr_ext(dev, (uintptr_t)addr, len,
> SOCKET_ID_ANY);
> > > + if (!mr) {
> > > + DRV_LOG(WARNING,
> > > + "port %u unable to dma map", dev->data->port_id);
> > > + return -1;
> > > + }
> > > + rte_rwlock_write_lock(&priv->mr.rwlock);
> > > + LIST_INSERT_HEAD(&priv->mr.mr_list, mr, mr);
> > > + /* Insert to the global cache table. */
> > > + mr_insert_dev_cache(dev, mr);
> > > + rte_rwlock_write_unlock(&priv->mr.rwlock);
> > > + return 0;
> > > +}
> > > +
> > > +/**
> > > + * DPDK callback to DMA unmap external memory to a PCI device.
> > > + *
> > > + * @param pdev
> > > + * Pointer to the PCI device.
> > > + * @param addr
> > > + * Starting virtual address of memory to be unmapped.
> > > + * @param iova
> > > + * Starting IOVA address of memory to be unmapped.
> > > + * @param len
> > > + * Length of memory segment being unmapped.
> > > + *
> > > + * @return
> > > + * 0 on success, negative value on error.
> > > + */
> > > +int
> > > +mlx5_dma_unmap(struct rte_pci_device *pdev, void *addr,
> > > + uint64_t iova __rte_unused, size_t len __rte_unused) {
> > > + struct rte_eth_dev *dev;
> > > + struct priv *priv;
> > > + struct mlx5_mr *mr;
> > > + struct mlx5_mr_cache entry;
> > > +
> > > + dev = pci_dev_to_eth_dev(pdev);
> > > + if (!dev) {
> > > + DRV_LOG(WARNING, "unable to find matching ethdev "
> > > + "to PCI device %p", (void *)pdev);
> > > + return -1;
> > > + }
> > > + priv = dev->data->dev_private;
> > > + rte_rwlock_read_lock(&priv->mr.rwlock);
> > > + mr = mr_lookup_dev_list(dev, &entry, (uintptr_t)addr);
> > > + if (!mr) {
> > > + DRV_LOG(WARNING, "address 0x%" PRIxPTR " wasn't
> registered "
> > > + "to PCI device %p", (uintptr_t)addr,
> > > + (void *)pdev);
> > > + rte_rwlock_read_unlock(&priv->mr.rwlock);
> > > + return -1;
> > > + }
> > > + LIST_REMOVE(mr, mr);
> > > + LIST_INSERT_HEAD(&priv->mr.mr_free_list, mr, mr);
> > > + DEBUG("port %u remove MR(%p) from list", dev->data->port_id,
> > > + (void *)mr);
> > > + mr_rebuild_dev_cache(dev);
> > > + /*
> > > + * Flush local caches by propagating invalidation across cores.
> > > + * rte_smp_wmb() is enough to synchronize this event. If one of
> > > + * freed memsegs is seen by other core, that means the memseg
> > > + * has been allocated by allocator, which will come after this
> > > + * free call. Therefore, this store instruction (incrementing
> > > + * generation below) will be guaranteed to be seen by other core
> > > + * before the core sees the newly allocated memory.
> > > + */
> > > + ++priv->mr.dev_gen;
> > > + DEBUG("broadcasting local cache flush, gen=%d",
> > > + priv->mr.dev_gen);
> > > + rte_smp_wmb();
> > > + rte_rwlock_read_unlock(&priv->mr.rwlock);
> > > + return 0;
> > > +}
> > > +
> > > +/**
> > > * Register MR for entire memory chunks in a Mempool having externally
> allocated
> > > * memory and fill in local cache.
> > > *
> > > diff --git a/drivers/net/mlx5/mlx5_rxtx.h
> > > b/drivers/net/mlx5/mlx5_rxtx.h index c2529f96bc..f3f84dbac3 100644
> > > --- a/drivers/net/mlx5/mlx5_rxtx.h
> > > +++ b/drivers/net/mlx5/mlx5_rxtx.h
> > > @@ -28,6 +28,7 @@
> > > #include <rte_atomic.h>
> > > #include <rte_spinlock.h>
> > > #include <rte_io.h>
> > > +#include <rte_bus_pci.h>
> > >
> > > #include "mlx5_utils.h"
> > > #include "mlx5.h"
> > > @@ -367,6 +368,10 @@ uint32_t mlx5_rx_addr2mr_bh(struct
> > > mlx5_rxq_data *rxq, uintptr_t addr); uint32_t
> > > mlx5_tx_mb2mr_bh(struct mlx5_txq_data *txq, struct rte_mbuf *mb);
> uint32_t mlx5_tx_update_ext_mp(struct mlx5_txq_data *txq, uintptr_t
> addr,
> > > struct rte_mempool *mp);
> > > +int mlx5_dma_map(struct rte_pci_device *pdev, void *addr, uint64_t
> iova,
> > > + size_t len);
> > > +int mlx5_dma_unmap(struct rte_pci_device *pdev, void *addr, uint64_t
> iova,
> > > + size_t len);
> > >
> > > /**
> > > * Provide safe 64bit store operation to mlx5 UAR region for both
> > > 32bit and
> > > --
> > > 2.12.0
> > >
> >
> > --
> > Gaëtan Rivet
> > 6WIND
>
> --
> Gaëtan Rivet
> 6WIND
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: [dpdk-dev] [PATCH 5/6] net/mlx5: support PCI device DMA map and unmap
2019-02-13 19:11 ` Shahaf Shuler
@ 2019-02-14 10:21 ` Gaëtan Rivet
2019-02-21 9:21 ` Shahaf Shuler
0 siblings, 1 reply; 79+ messages in thread
From: Gaëtan Rivet @ 2019-02-14 10:21 UTC (permalink / raw)
To: Shahaf Shuler
Cc: anatoly.burakov, Yongseok Koh, Thomas Monjalon, ferruh.yigit,
nhorman, dev
On Wed, Feb 13, 2019 at 07:11:35PM +0000, Shahaf Shuler wrote:
> Wednesday, February 13, 2019 1:44 PM, Gaëtan Rivet:
> > Subject: Re: [PATCH 5/6] net/mlx5: support PCI device DMA map and unmap
> >
> > On Wed, Feb 13, 2019 at 12:35:04PM +0100, Gaëtan Rivet wrote:
> > > On Wed, Feb 13, 2019 at 11:10:25AM +0200, Shahaf Shuler wrote:
> > > > The implementation reuses the external memory registration work done
>
> [..]
>
> > > > +
> > > > + /**
> > > > + * We really need to iterate all eth devices regardless of
> > > > + * their owner.
> > > > + */
> > > > + while (port_id < RTE_MAX_ETHPORTS) {
> > > > + port_id = rte_eth_find_next(port_id);
> > > > + if (port_id >= RTE_MAX_ETHPORTS)
> > > > + break;
> > > > + dev = &rte_eth_devices[port_id];
> > > > + drv_name = dev->device->driver->name;
> > > > + if (!strncmp(drv_name, MLX5_DRIVER_NAME,
> > > > + sizeof(MLX5_DRIVER_NAME) + 1) &&
> > > > + pdev == RTE_DEV_TO_PCI(dev->device)) {
> > > > + /* found the PCI device. */
> > > > + return dev;
> > > > + }
> > > > + }
> > > > + return NULL;
> > > > +}
> > >
> > > Might I interest you in the new API?
>
> Good suggestion, will have a look on it in depth.
>
> > >
> > > {
> > > struct rte_dev_iterator it;
> > > struct rte_device *dev;
> > >
> > > RTE_DEV_FOREACH(dev, "class=eth", &it)
> > > if (dev == &pdev->device)
> > > return it.class_device;
> > > return NULL;
> > > }
> > >
> >
> > On that note, this could be in the PCI bus instead?
>
> We can put it on the PCI bus, but it would mean the PCI bus will not be device agnostic.
> Currently, I couldn't find any reference to eth_dev on the PCI bus, besides a single macro which convert to pci device that doesn't really do type checks.
>
> Having it in, would mean the PCI will need start to distinguish between ethdev, crypto dev and what ever devices exists on its bus.
>
I think it's worth thinking about it.
It can stay class-agnostic:
void *
rte_pci_device_class(struct rte_pci_device *pdev, const char *class)
{
char devstr[15+strlen(class)];
struct rte_dev_iterator it;
struct rte_device *dev;
snprintf(devstr, sizeof(devstr), "bus=pci/class=%s", class);
RTE_DEV_FOREACH(dev, devstr, &it)
if (dev == &pdev->device)
return it.class_device;
return NULL;
}
(not a fan of the stack VLA but whatever.)
then:
eth_dev = rte_pci_device_class(pdev, "eth");
Whichever type of device could be returned. Only limit is that you have
to know beforehand what is the device type of the PCI device you are
querying about, but that's necessary anyway.
And if it was instead a crypto dev, it would return NULL.
--
Gaëtan Rivet
6WIND
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: [dpdk-dev] [PATCH 5/6] net/mlx5: support PCI device DMA map and unmap
2019-02-14 10:21 ` Gaëtan Rivet
@ 2019-02-21 9:21 ` Shahaf Shuler
0 siblings, 0 replies; 79+ messages in thread
From: Shahaf Shuler @ 2019-02-21 9:21 UTC (permalink / raw)
To: Gaëtan Rivet
Cc: anatoly.burakov, Yongseok Koh, Thomas Monjalon, ferruh.yigit,
nhorman, dev
Thursday, February 14, 2019 12:22 PM, Gaëtan Rivet:
> Subject: Re: [PATCH 5/6] net/mlx5: support PCI device DMA map and unmap
>
> On Wed, Feb 13, 2019 at 07:11:35PM +0000, Shahaf Shuler wrote:
> > Wednesday, February 13, 2019 1:44 PM, Gaëtan Rivet:
> > > Subject: Re: [PATCH 5/6] net/mlx5: support PCI device DMA map and
> > > unmap
> > >
> > > On Wed, Feb 13, 2019 at 12:35:04PM +0100, Gaëtan Rivet wrote:
> > > > On Wed, Feb 13, 2019 at 11:10:25AM +0200, Shahaf Shuler wrote:
> > > > > The implementation reuses the external memory registration work
> > > > > done
> >
> > [..]
> >
> > > > > +
> > > > > + /**
> > > > > + * We really need to iterate all eth devices regardless of
> > > > > + * their owner.
> > > > > + */
> > > > > + while (port_id < RTE_MAX_ETHPORTS) {
> > > > > + port_id = rte_eth_find_next(port_id);
> > > > > + if (port_id >= RTE_MAX_ETHPORTS)
> > > > > + break;
> > > > > + dev = &rte_eth_devices[port_id];
> > > > > + drv_name = dev->device->driver->name;
> > > > > + if (!strncmp(drv_name, MLX5_DRIVER_NAME,
> > > > > + sizeof(MLX5_DRIVER_NAME) + 1) &&
> > > > > + pdev == RTE_DEV_TO_PCI(dev->device)) {
> > > > > + /* found the PCI device. */
> > > > > + return dev;
> > > > > + }
> > > > > + }
> > > > > + return NULL;
> > > > > +}
> > > >
> > > > Might I interest you in the new API?
> >
> > Good suggestion, will have a look on it in depth.
> >
> > > >
> > > > {
> > > > struct rte_dev_iterator it;
> > > > struct rte_device *dev;
> > > >
> > > > RTE_DEV_FOREACH(dev, "class=eth", &it)
> > > > if (dev == &pdev->device)
> > > > return it.class_device;
> > > > return NULL;
> > > > }
> > > >
> > >
> > > On that note, this could be in the PCI bus instead?
Looking into it in more depth, it looks like ethdev is the only device class which registers its type.
So a generic iterator for every class on the PCI bus looks like overkill to me at this point.
I think I will take the above suggestion to replace the internal PMD code.
> >
> > We can put it on the PCI bus, but it would mean the PCI bus will not be
> device agnostic.
> > Currently, I couldn't find any reference to eth_dev on the PCI bus, besides
> a single macro which convert to pci device that doesn't really do type checks.
> >
> > Having it in, would mean the PCI will need start to distinguish between
> ethdev, crypto dev and what ever devices exists on its bus.
> >
>
> I think it's worth thinking about it.
> It can stay class-agnostic:
>
> void *
> rte_pci_device_class(struct rte_pci_device *pdev, const char *class)
> {
> char devstr[15+strlen(class)];
> struct rte_dev_iterator it;
> struct rte_device *dev;
>
> snprintf(devstr, sizeof(devstr), "bus=pci/class=%s", class);
> RTE_DEV_FOREACH(dev, devstr, &it)
> if (dev == &pdev->device)
> return it.class_device;
> return NULL;
> }
>
> (not a fan of the stack VLA but whatever.)
> then:
>
> eth_dev = rte_pci_device_class(pdev, "eth");
>
> Whichever type of device could be returned. Only limit is that you have to
> know beforehand what is the device type of the PCI device you are querying
> about, but that's necessary anyway.
>
> And if it was instead a crypto dev, it would return NULL.
>
> --
> Gaëtan Rivet
> 6WIND
^ permalink raw reply [flat|nested] 79+ messages in thread
* [dpdk-dev] [PATCH 6/6] doc: deprecate VFIO DMA map APIs
2019-02-13 9:10 [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for external memory Shahaf Shuler
` (4 preceding siblings ...)
2019-02-13 9:10 ` [dpdk-dev] [PATCH 5/6] net/mlx5: support PCI device DMA map and unmap Shahaf Shuler
@ 2019-02-13 9:10 ` Shahaf Shuler
2019-02-13 11:43 ` [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for external memory Alejandro Lucero
` (7 subsequent siblings)
13 siblings, 0 replies; 79+ messages in thread
From: Shahaf Shuler @ 2019-02-13 9:10 UTC (permalink / raw)
To: anatoly.burakov, yskoh, thomas, ferruh.yigit, nhorman, gaetan.rivet; +Cc: dev
As those have been replaced by the rte_bus_dma_map and rte_bus_dma_unmap
APIs.
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
---
doc/guides/prog_guide/env_abstraction_layer.rst | 2 +-
doc/guides/rel_notes/deprecation.rst | 4 ++++
lib/librte_eal/common/include/rte_vfio.h | 6 ++++--
3 files changed, 9 insertions(+), 3 deletions(-)
diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
index 929d76dba7..6a1ddf8b4a 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -282,7 +282,7 @@ The expected workflow is as follows:
- If IOVA table is not specified, IOVA addresses will be assumed to be
unavailable
- Other processes must attach to the memory area before they can use it
-* Perform DMA mapping with ``rte_vfio_dma_map`` if needed
+* Perform DMA mapping with ``rte_bus_dma_map`` if needed
* Use the memory area in your application
* If memory area is no longer needed, it can be unregistered
- If the area was mapped for DMA, unmapping must be performed before
diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 1b4fcb7e64..f7ae0d56fb 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -35,6 +35,10 @@ Deprecation Notices
+ ``rte_eal_devargs_type_count``
+* vfio: removal of ``rte_vfio_dma_map`` and ``rte_vfio_dma_unmap`` APIs which
+ have been replaced with ``rte_bus_dma_map`` and ``rte_bus_dma_unmap``
+ functions. The due date for the removal targets DPDK 19.08.
+
* pci: Several exposed functions are misnamed.
The following functions are deprecated starting from v17.11 and are replaced:
diff --git a/lib/librte_eal/common/include/rte_vfio.h b/lib/librte_eal/common/include/rte_vfio.h
index 2a6827012f..8dd2c5316d 100644
--- a/lib/librte_eal/common/include/rte_vfio.h
+++ b/lib/librte_eal/common/include/rte_vfio.h
@@ -188,6 +188,7 @@ int
rte_vfio_clear_group(int vfio_group_fd);
/**
+ * @deprecated
* Map memory region for use with VFIO.
*
* @note Require at least one device to be attached at the time of
@@ -208,11 +209,12 @@ rte_vfio_clear_group(int vfio_group_fd);
* 0 if success.
* -1 on error.
*/
-int
+int __rte_deprecated
rte_vfio_dma_map(uint64_t vaddr, uint64_t iova, uint64_t len);
/**
+ * @deprecated
* Unmap memory region from VFIO.
*
* @param vaddr
@@ -229,7 +231,7 @@ rte_vfio_dma_map(uint64_t vaddr, uint64_t iova, uint64_t len);
* -1 on error.
*/
-int
+int __rte_deprecated
rte_vfio_dma_unmap(uint64_t vaddr, uint64_t iova, uint64_t len);
/**
* Parse IOMMU group number for a device
--
2.12.0
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for external memory
2019-02-13 9:10 [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for external memory Shahaf Shuler
` (5 preceding siblings ...)
2019-02-13 9:10 ` [dpdk-dev] [PATCH 6/6] doc: deprecate VFIO DMA map APIs Shahaf Shuler
@ 2019-02-13 11:43 ` Alejandro Lucero
2019-02-13 19:24 ` Shahaf Shuler
2019-02-21 14:50 ` [dpdk-dev] [PATCH v2 " Shahaf Shuler
` (6 subsequent siblings)
13 siblings, 1 reply; 79+ messages in thread
From: Alejandro Lucero @ 2019-02-13 11:43 UTC (permalink / raw)
To: Shahaf Shuler
Cc: Burakov, Anatoly, Yongseok Koh, Thomas Monjalon, Ferruh Yigit,
nhorman, Gaetan Rivet, dev
On Wed, Feb 13, 2019 at 9:11 AM Shahaf Shuler <shahafs@mellanox.com> wrote:
> This series is in continue to RFC[1].
>
> The DPDK APIs expose 3 different modes to work with memory used for DMA:
>
> 1. Use the DPDK owned memory (backed by the DPDK provided hugepages).
> This memory is allocated by the DPDK libraries, included in the DPDK
> memory system (memseg lists) and automatically DMA mapped by the DPDK
> layers.
>
> 2. Use memory allocated by the user and register to the DPDK memory
> systems. This is also referred as external memory. Upon registration of
> the external memory, the DPDK layers will DMA map it to all needed
> devices.
>
> 3. Use memory allocated by the user and not registered to the DPDK memory
> system. This is for users who wants to have tight control on this
> memory. The user will need to explicitly call DMA map function in order
> to register such memory to the different devices.
>
> The scope of the patch focus on #3 above.
>
>
Why can not we have case 2 covering case 3?
> Currently the only way to map external memory is through VFIO
> (rte_vfio_dma_map). While VFIO is common, there are other vendors
> which use different ways to map memory (e.g. Mellanox and NXP).
>
>
As you say, VFIO is common, and when allowing DMAs programmed in user
space, the right thing to do. I'm assuming there is an IOMMU hardware and
this is what Mellanox and NXP rely on in some way or another.
Having each driver doing things in its own way will end up in a system that
is harder to validate. If there is IOMMU hardware, the same mechanism should
always be used, leaving the IOMMU hw-specific implementation to deal with
the details. If a NIC is IOMMU-able, that should not be supported by
specific vendor drivers but through a generic solution like VFIO, which will
validate a device with such a capability and perform the required actions
for that case. VFIO and IOMMU should be modified as needed to support
this requirement instead of leaving vendor drivers to implement their own
solutions.
In any case, I think this support should be in a different patchset than
the private user space mappings.
> The work in this patch moves the DMA mapping to vendor agnostic APIs.
> A new map and unmap ops were added to rte_bus structure. Implementation
> of those was done currently only on the PCI bus. The implementation takes
> the driver map and umap implementation as bypass to the VFIO mapping.
> That is, in case of no specific map/unmap from the PCI driver,
> VFIO mapping, if possible, will be used.
>
> Application use with those APIs is quite simple:
> * allocate memory
> * take a device, and query its rte_device.
> * call the bus map function for this device.
>
> Future work will deprecate the rte_vfio_dma_map and rte_vfio_dma_unmap
> APIs, leaving the PCI device APIs as the preferred option for the user.
>
> [1] https://patches.dpdk.org/patch/47796/
>
> Shahaf Shuler (6):
> vfio: allow DMA map of memory for the default vfio fd
> vfio: don't fail to DMA map if memory is already mapped
> bus: introduce DMA memory mapping for external memory
> net/mlx5: refactor external memory registration
> net/mlx5: support PCI device DMA map and unmap
> doc: deprecate VFIO DMA map APIs
>
> doc/guides/prog_guide/env_abstraction_layer.rst | 2 +-
> doc/guides/rel_notes/deprecation.rst | 4 +
> drivers/bus/pci/pci_common.c | 78 +++++++
> drivers/bus/pci/rte_bus_pci.h | 14 ++
> drivers/net/mlx5/mlx5.c | 2 +
> drivers/net/mlx5/mlx5_mr.c | 232 ++++++++++++++++---
> drivers/net/mlx5/mlx5_rxtx.h | 5 +
> lib/librte_eal/common/eal_common_bus.c | 22 ++
> lib/librte_eal/common/include/rte_bus.h | 57 +++++
> lib/librte_eal/common/include/rte_vfio.h | 12 +-
> lib/librte_eal/linuxapp/eal/eal_vfio.c | 26 ++-
> lib/librte_eal/rte_eal_version.map | 2 +
> 12 files changed, 418 insertions(+), 38 deletions(-)
>
> --
> 2.12.0
>
>
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for external memory
2019-02-13 11:43 ` [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for external memory Alejandro Lucero
@ 2019-02-13 19:24 ` Shahaf Shuler
2019-02-14 10:19 ` Burakov, Anatoly
2019-02-14 12:22 ` Alejandro Lucero
0 siblings, 2 replies; 79+ messages in thread
From: Shahaf Shuler @ 2019-02-13 19:24 UTC (permalink / raw)
To: Alejandro Lucero
Cc: Burakov, Anatoly, Yongseok Koh, Thomas Monjalon, Ferruh Yigit,
nhorman, Gaetan Rivet, dev
Wednesday, February 13, 2019 1:43 PM, Alejandro Lucero:
> Subject: Re: [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for
> external memory
>
> On Wed, Feb 13, 2019 at 9:11 AM Shahaf Shuler <shahafs@mellanox.com>
> wrote:
>
> > This series is in continue to RFC[1].
> >
> > The DPDK APIs expose 3 different modes to work with memory used for
> DMA:
> >
> > 1. Use the DPDK owned memory (backed by the DPDK provided
> hugepages).
> > This memory is allocated by the DPDK libraries, included in the DPDK
> > memory system (memseg lists) and automatically DMA mapped by the
> DPDK
> > layers.
> >
> > 2. Use memory allocated by the user and register to the DPDK memory
> > systems. This is also referred as external memory. Upon registration
> > of the external memory, the DPDK layers will DMA map it to all needed
> > devices.
> >
> > 3. Use memory allocated by the user and not registered to the DPDK
> > memory system. This is for users who wants to have tight control on
> > this memory. The user will need to explicitly call DMA map function in
> > order to register such memory to the different devices.
> >
> > The scope of the patch focus on #3 above.
> >
> >
> Why can not we have case 2 covering case 3?
Because it is not our choice, rather the DPDK application's.
We could disallow it and force applications to register their external memory with the DPDK memory management system. However, IMO that would be wrong.
The use case exists - some applications want to manage their memory themselves: without the extra overhead of rte_malloc, without creating a special socket to populate the memory, and without redundant API calls to rte_extmem_*.
Simply allocate a chunk of memory, DMA map it to the device and that's it.
>
>
> > Currently the only way to map external memory is through VFIO
> > (rte_vfio_dma_map). While VFIO is common, there are other vendors
> > which use different ways to map memory (e.g. Mellanox and NXP).
> >
> >
> As you say, VFIO is common, and when allowing DMAs programmed in user
> space, the right thing to do.
It is common indeed. Why is it the right thing to do?
> I'm assuming there is an IOMMU hardware and
> this is what Mellanox and NXP rely on in some way or another.
For Mellanox, the device works with virtual memory, not physical. If you think about it, it is more secure for user space applications. The Mellanox device has an internal memory translation unit between virtual and physical memory.
An IOMMU can be added on top of it, in case the host doesn't trust the device or the device is given to an untrusted entity like a VM.
>
> Having each driver doing things in their own way will end up in a harder to
> validate system.
Different vendors will have different HW implementations. We cannot force everybody to align on the IOMMU.
What we can do is ease the user's life and provide vendor-agnostic APIs which just provide the needed functionality - in our case, DMA map and unmap.
The user should not care whether it is the IOMMU, Mellanox memory registration through verbs or NXP special mapping.
The sysadmin should set/unset the IOMMU as a general means of protection, and this of course will also work with Mellanox devices.
> If there is an IOMMU hardware, same mechanism should be
> used always, leaving to the IOMMU hw specific implementation to deal with
> the details. If a NIC is IOMMU-able, that should not be supported by specific
> vendor drivers but through a generic solution like VFIO which will validate a
> device with such capability and to perform the required actions for that case.
> VFIO and IOMMU should be modified as needed for supporting this
> requirement instead of leaving vendor drivers to implement their own
> solution.
Again - I am against forcing every PCI device to use VFIO, and I don't think the IOMMU as a HW device should control other PCI devices.
I see nothing wrong with a device which also has extra memory translation capabilities and adds another level of security to the user application.
>
> In any case, I think this support should be in a different patchset than the
> private user space mappings.
>
>
>
> > The work in this patch moves the DMA mapping to vendor agnostic APIs.
> > A new map and unmap ops were added to rte_bus structure.
> > Implementation of those was done currently only on the PCI bus. The
> > implementation takes the driver map and umap implementation as bypass
> to the VFIO mapping.
> > That is, in case of no specific map/unmap from the PCI driver, VFIO
> > mapping, if possible, will be used.
> >
> > Application use with those APIs is quite simple:
> > * allocate memory
> > * take a device, and query its rte_device.
> > * call the bus map function for this device.
> >
> > Future work will deprecate the rte_vfio_dma_map and
> rte_vfio_dma_unmap
> > APIs, leaving the PCI device APIs as the preferred option for the user.
> >
> > [1] https://patches.dpdk.org/patch/47796/
> >
> > Shahaf Shuler (6):
> > vfio: allow DMA map of memory for the default vfio fd
> > vfio: don't fail to DMA map if memory is already mapped
> > bus: introduce DMA memory mapping for external memory
> > net/mlx5: refactor external memory registration
> > net/mlx5: support PCI device DMA map and unmap
> > doc: deprecate VFIO DMA map APIs
> >
> > doc/guides/prog_guide/env_abstraction_layer.rst | 2 +-
> > doc/guides/rel_notes/deprecation.rst | 4 +
> > drivers/bus/pci/pci_common.c | 78 +++++++
> > drivers/bus/pci/rte_bus_pci.h | 14 ++
> > drivers/net/mlx5/mlx5.c | 2 +
> > drivers/net/mlx5/mlx5_mr.c | 232 ++++++++++++++++---
> > drivers/net/mlx5/mlx5_rxtx.h | 5 +
> > lib/librte_eal/common/eal_common_bus.c | 22 ++
> > lib/librte_eal/common/include/rte_bus.h | 57 +++++
> > lib/librte_eal/common/include/rte_vfio.h | 12 +-
> > lib/librte_eal/linuxapp/eal/eal_vfio.c | 26 ++-
> > lib/librte_eal/rte_eal_version.map | 2 +
> > 12 files changed, 418 insertions(+), 38 deletions(-)
> >
> > --
> > 2.12.0
> >
> >
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for external memory
2019-02-13 19:24 ` Shahaf Shuler
@ 2019-02-14 10:19 ` Burakov, Anatoly
2019-02-14 13:28 ` Shahaf Shuler
2019-02-14 12:22 ` Alejandro Lucero
1 sibling, 1 reply; 79+ messages in thread
From: Burakov, Anatoly @ 2019-02-14 10:19 UTC (permalink / raw)
To: Shahaf Shuler, Alejandro Lucero
Cc: Yongseok Koh, Thomas Monjalon, Ferruh Yigit, nhorman, Gaetan Rivet, dev
On 13-Feb-19 7:24 PM, Shahaf Shuler wrote:
> Wednesday, February 13, 2019 1:43 PM, Alejandro Lucero:
>> Subject: Re: [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for
>> external memory
>>
>> On Wed, Feb 13, 2019 at 9:11 AM Shahaf Shuler <shahafs@mellanox.com>
>> wrote:
>>
>>> This series is in continue to RFC[1].
>>>
>>> The DPDK APIs expose 3 different modes to work with memory used for
>> DMA:
>>>
>>> 1. Use the DPDK owned memory (backed by the DPDK provided
>> hugepages).
>>> This memory is allocated by the DPDK libraries, included in the DPDK
>>> memory system (memseg lists) and automatically DMA mapped by the
>> DPDK
>>> layers.
>>>
>>> 2. Use memory allocated by the user and register to the DPDK memory
>>> systems. This is also referred as external memory. Upon registration
>>> of the external memory, the DPDK layers will DMA map it to all needed
>>> devices.
>>>
>>> 3. Use memory allocated by the user and not registered to the DPDK
>>> memory system. This is for users who wants to have tight control on
>>> this memory. The user will need to explicitly call DMA map function in
>>> order to register such memory to the different devices.
>>>
>>> The scope of the patch focus on #3 above.
>>>
>>>
>> Why can not we have case 2 covering case 3?
>
> Because it is not our choice rather the DPDK application.
> We could not allow it, and force the application to register their external memory to the DPDK memory management system. However IMO it will be wrong.
> The use case exists - some application wants to manage their memory by themselves. w/o the extra overhead of rte_malloc, without creating a special socket to populate the memory and without redundant API calls to rte_extmem_*.
>
> Simply allocate chunk of memory, DMA map it to device and that’s it.
Just a small note: while this sounds good on paper, I should point out
that at least *registering* the memory with DPDK is a necessity. You may
see rte_extmem_* calls as redundant (and I agree, to an extent), but we
don't advertise our PMDs' capabilities in a way that makes it easy to
determine whether a particular PMD will or will not work without
registering external memory within DPDK (i.e. does it use
rte_virt2memseg() internally, for example).
So, extmem register calls are a necessary evil in such a case, and IMO
should be called out as required for such an external memory usage scenario.
--
Thanks,
Anatoly
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for external memory
2019-02-14 10:19 ` Burakov, Anatoly
@ 2019-02-14 13:28 ` Shahaf Shuler
2019-02-14 16:19 ` Burakov, Anatoly
0 siblings, 1 reply; 79+ messages in thread
From: Shahaf Shuler @ 2019-02-14 13:28 UTC (permalink / raw)
To: Burakov, Anatoly, Alejandro Lucero
Cc: Yongseok Koh, Thomas Monjalon, Ferruh Yigit, nhorman, Gaetan Rivet, dev
Thursday, February 14, 2019 12:19 PM, Burakov, Anatoly:
> Subject: Re: [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for
> external memory
>
> On 13-Feb-19 7:24 PM, Shahaf Shuler wrote:
> > Wednesday, February 13, 2019 1:43 PM, Alejandro Lucero:
> >> Subject: Re: [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for
> >> external memory
> >>
> >> On Wed, Feb 13, 2019 at 9:11 AM Shahaf Shuler
> <shahafs@mellanox.com>
> >> wrote:
> >>
> >>> This series is in continue to RFC[1].
> >>>
> >>> The DPDK APIs expose 3 different modes to work with memory used for
> >> DMA:
> >>>
> >>> 1. Use the DPDK owned memory (backed by the DPDK provided
> >> hugepages).
> >>> This memory is allocated by the DPDK libraries, included in the DPDK
> >>> memory system (memseg lists) and automatically DMA mapped by the
> >> DPDK
> >>> layers.
> >>>
> >>> 2. Use memory allocated by the user and registered to the DPDK memory
> >>> system. This is also referred to as external memory. Upon registration
> >>> of the external memory, the DPDK layers will DMA map it to all
> >>> needed devices.
> >>>
> >>> 3. Use memory allocated by the user and not registered to the DPDK
> >>> memory system. This is for users who want to have tight control over
> >>> this memory. The user will need to explicitly call the DMA map function
> >>> in order to register such memory with the different devices.
> >>>
> >>> The scope of this patch series focuses on #3 above.
> >>>
> >>>
> >> Why can not we have case 2 covering case 3?
> >
> > Because it is not our choice but rather the DPDK application's.
> > We could disallow it, and force the application to register its external
> > memory to the DPDK memory management system. However IMO that would be
> > wrong.
> > The use case exists - some applications want to manage their memory
> > themselves: without the extra overhead of rte_malloc, without creating a
> > special socket to populate the memory and without redundant API calls to
> > rte_extmem_*.
> >
> > Simply allocate a chunk of memory, DMA map it to the device, and that's it.
>
> Just a small note: while this sounds good on paper, i should point out that at
> least *registering* the memory with DPDK is a necessity. You may see
> rte_extmem_* calls as redundant (and i agree, to an extent), but we don't
> advertise our PMD's capabilities in a way that makes it easy to determine
> whether a particular PMD will or will not work without registering external
> memory within DPDK (i.e. does it use
> rte_virt2memseg() internally, for example).
>
> So, extmem register calls are a necessary evil in such case, and IMO should
> be called out as required for such external memory usage scenario.
If we are going to force everyone to use extmem, then there is no need for this API - we can have the PMDs do the mapping when the memory is registered.
We can just drop the vfio_dma_map APIs and that's it.
>
> --
> Thanks,
> Anatoly
* Re: [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for external memory
2019-02-14 13:28 ` Shahaf Shuler
@ 2019-02-14 16:19 ` Burakov, Anatoly
2019-02-17 6:18 ` Shahaf Shuler
0 siblings, 1 reply; 79+ messages in thread
From: Burakov, Anatoly @ 2019-02-14 16:19 UTC (permalink / raw)
To: Shahaf Shuler, Alejandro Lucero
Cc: Yongseok Koh, Thomas Monjalon, Ferruh Yigit, nhorman, Gaetan Rivet, dev
On 14-Feb-19 1:28 PM, Shahaf Shuler wrote:
> Thursday, February 14, 2019 12:19 PM, Burakov, Anatoly:
>> Subject: Re: [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for
>> external memory
>>
>> On 13-Feb-19 7:24 PM, Shahaf Shuler wrote:
>>> Wednesday, February 13, 2019 1:43 PM, Alejandro Lucero:
>>>> Subject: Re: [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for
>>>> external memory
>>>>
>>>> On Wed, Feb 13, 2019 at 9:11 AM Shahaf Shuler
>> <shahafs@mellanox.com>
>>>> wrote:
>>>>
>>>>> This series is in continue to RFC[1].
>>>>>
>>>>> The DPDK APIs expose 3 different modes to work with memory used for
>>>> DMA:
>>>>>
>>>>> 1. Use the DPDK owned memory (backed by the DPDK provided
>>>> hugepages).
>>>>> This memory is allocated by the DPDK libraries, included in the DPDK
>>>>> memory system (memseg lists) and automatically DMA mapped by the
>>>> DPDK
>>>>> layers.
>>>>>
>>>>> 2. Use memory allocated by the user and register to the DPDK memory
>>>>> systems. This is also referred as external memory. Upon registration
>>>>> of the external memory, the DPDK layers will DMA map it to all
>>>>> needed devices.
>>>>>
>>>>> 3. Use memory allocated by the user and not registered to the DPDK
>>>>> memory system. This is for users who wants to have tight control on
>>>>> this memory. The user will need to explicitly call DMA map function
>>>>> in order to register such memory to the different devices.
>>>>>
>>>>> The scope of the patch focus on #3 above.
>>>>>
>>>>>
>>>> Why can not we have case 2 covering case 3?
>>>
>>> Because it is not our choice rather the DPDK application.
>>> We could not allow it, and force the application to register their external
>> memory to the DPDK memory management system. However IMO it will be
>> wrong.
>>> The use case exists - some application wants to manage their memory by
>> themselves. w/o the extra overhead of rte_malloc, without creating a special
>> socket to populate the memory and without redundant API calls to
>> rte_extmem_*.
>>>
>>> Simply allocate chunk of memory, DMA map it to device and that’s it.
>>
>> Just a small note: while this sounds good on paper, i should point out that at
>> least *registering* the memory with DPDK is a necessity. You may see
>> rte_extmem_* calls as redundant (and i agree, to an extent), but we don't
>> advertise our PMD's capabilities in a way that makes it easy to determine
>> whether a particular PMD will or will not work without registering external
>> memory within DPDK (i.e. does it use
>> rte_virt2memseg() internally, for example).
>>
>> So, extmem register calls are a necessary evil in such case, and IMO should
>> be called out as required for such external memory usage scenario.
>
> If we are going to force all to use the extmem, then there is no need w/ this API. we can have the PMDs to register when the memory is registered.
> We can just drop the vfio_dma_map APIs and that's it.
>
Well, whether we need it or not is not really my call, but what I can
say is that using extmem_register is _necessary_ if you're going to use
the PMDs. You're right, we could just map memory for DMA at register
time - that would save one API call to get the memory working. It makes
it a bit weird semantically, but I think we can live with that :)
--
Thanks,
Anatoly
* Re: [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for external memory
2019-02-14 16:19 ` Burakov, Anatoly
@ 2019-02-17 6:18 ` Shahaf Shuler
2019-02-18 12:21 ` Burakov, Anatoly
0 siblings, 1 reply; 79+ messages in thread
From: Shahaf Shuler @ 2019-02-17 6:18 UTC (permalink / raw)
To: Burakov, Anatoly, Alejandro Lucero
Cc: Yongseok Koh, Thomas Monjalon, Ferruh Yigit, nhorman, Gaetan Rivet, dev
Thursday, February 14, 2019 6:20 PM, Burakov, Anatoly:
> Subject: Re: [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for
> external memory
>
> On 14-Feb-19 1:28 PM, Shahaf Shuler wrote:
> > Thursday, February 14, 2019 12:19 PM, Burakov, Anatoly:
> >> Subject: Re: [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for
> >> external memory
> >>
> >> On 13-Feb-19 7:24 PM, Shahaf Shuler wrote:
> >>> Wednesday, February 13, 2019 1:43 PM, Alejandro Lucero:
> >>>> Subject: Re: [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping
> >>>> for external memory
> >>>>
[...]
> >
> > If we are going to force all to use the extmem, then there is no need w/
> this API. we can have the PMDs to register when the memory is registered.
> > We can just drop the vfio_dma_map APIs and that's it.
> >
>
> Well, whether we needed it or not is not really my call, but what i can say is
> that using extmem_register is _necessary_ if you're going to use the PMD's.
> You're right, we could just map memory for DMA at register time - that
> would save one API call to get the memory working. It makes it a bit weird
> semantically, but i think we can live with that :)
This was not my suggestion 😊. I don't think the register API should do the mapping as well.
My thoughts were on one of two options:
1. Have the series I propose here, and enable the user to work with memory managed outside of DPDK. Either force the user to call rte_extmem_register before the mapping, or let devices which need the memory to also be registered in the DPDK system fail the mapping.
2. Not provide such an option to applications, and force applications to populate a socket with external memory.
I vote for #1.
>
> --
> Thanks,
> Anatoly
* Re: [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for external memory
2019-02-17 6:18 ` Shahaf Shuler
@ 2019-02-18 12:21 ` Burakov, Anatoly
0 siblings, 0 replies; 79+ messages in thread
From: Burakov, Anatoly @ 2019-02-18 12:21 UTC (permalink / raw)
To: Shahaf Shuler, Alejandro Lucero
Cc: Yongseok Koh, Thomas Monjalon, Ferruh Yigit, nhorman, Gaetan Rivet, dev
On 17-Feb-19 6:18 AM, Shahaf Shuler wrote:
> Thursday, February 14, 2019 6:20 PM, Burakov, Anatoly:
>> Subject: Re: [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for
>> external memory
>>
>> On 14-Feb-19 1:28 PM, Shahaf Shuler wrote:
>>> Thursday, February 14, 2019 12:19 PM, Burakov, Anatoly:
>>>> Subject: Re: [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for
>>>> external memory
>>>>
>>>> On 13-Feb-19 7:24 PM, Shahaf Shuler wrote:
>>>>> Wednesday, February 13, 2019 1:43 PM, Alejandro Lucero:
>>>>>> Subject: Re: [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping
>>>>>> for external memory
>>>>>>
>
> [...]
>
>>>
>>> If we are going to force all to use the extmem, then there is no need w/
>> this API. we can have the PMDs to register when the memory is registered.
>>> We can just drop the vfio_dma_map APIs and that's it.
>>>
>>
>> Well, whether we needed it or not is not really my call, but what i can say is
>> that using extmem_register is _necessary_ if you're going to use the PMD's.
>> You're right, we could just map memory for DMA at register time - that
>> would save one API call to get the memory working. It makes it a bit weird
>> semantically, but i think we can live with that :)
>
> This was not my suggestion 😊. I don't think the register API should do the mapping as well.
> My thoughts were on one of the two options:
> 1. have the series I propose here, and enable the user to work with memory managed outside of DPDK. Either force the user to call rte_extmem_register before the mapping or devices which needs memory to be also registered in the DPDK system can fail the mapping.
> 2. not providing such option to application, and forcing applications to populate a socket w/ external memory.
>
> I vote for #1.
I too think #1 is better - we want this to be a valid use case. Allowing
such usage in the first place is already gracious enough - all we ask in
return is one extra API call, to politely let DPDK know that this memory
exists and is going to be used for DMA :)
Also, having the memory registered will allow us to refuse the mapping
if it cannot be found in DPDK's maps - if rte_virt2memseg() returns NULL,
that means extmem_register was not called. I.e. we can _enforce_ usage
of extmem_register, which I believe is a good thing for usability.
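A hypothetical sketch of that enforcement (not code from the series): the map path can reject addresses DPDK has never seen, because rte_virt2memseg() returns NULL for memory that was neither allocated by DPDK nor registered via rte_extmem_register(). The helper name and its placement are assumptions, and this will not build outside a DPDK tree.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#include <rte_errno.h>
#include <rte_memory.h>   /* rte_virt2memseg() */
#include <rte_vfio.h>     /* rte_vfio_container_dma_map() */

/* Refuse to DMA map memory that was not first made known to DPDK via
 * rte_extmem_register(). */
static int dma_map_checked(void *addr, uint64_t iova, size_t len)
{
	if (rte_virt2memseg(addr, NULL) == NULL) {
		rte_errno = EINVAL;   /* caller skipped rte_extmem_register() */
		return -1;
	}
	/* container_fd == -1: default VFIO container, per patch 1. */
	return rte_vfio_container_dma_map(-1, (uint64_t)addr, iova, len);
}
```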
>
>>
>> --
>> Thanks,
>> Anatoly
--
Thanks,
Anatoly
* Re: [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for external memory
2019-02-13 19:24 ` Shahaf Shuler
2019-02-14 10:19 ` Burakov, Anatoly
@ 2019-02-14 12:22 ` Alejandro Lucero
2019-02-14 12:27 ` Alejandro Lucero
2019-02-14 13:41 ` Shahaf Shuler
1 sibling, 2 replies; 79+ messages in thread
From: Alejandro Lucero @ 2019-02-14 12:22 UTC (permalink / raw)
To: Shahaf Shuler
Cc: Burakov, Anatoly, Yongseok Koh, Thomas Monjalon, Ferruh Yigit,
nhorman, Gaetan Rivet, dev
On Wed, Feb 13, 2019 at 7:24 PM Shahaf Shuler <shahafs@mellanox.com> wrote:
> Wednesday, February 13, 2019 1:43 PM, Alejandro Lucero:
> > Subject: Re: [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for
> > external memory
> >
> > On Wed, Feb 13, 2019 at 9:11 AM Shahaf Shuler <shahafs@mellanox.com>
> > wrote:
> >
> > > This series is in continue to RFC[1].
> > >
> > > The DPDK APIs expose 3 different modes to work with memory used for
> > DMA:
> > >
> > > 1. Use the DPDK owned memory (backed by the DPDK provided
> > hugepages).
> > > This memory is allocated by the DPDK libraries, included in the DPDK
> > > memory system (memseg lists) and automatically DMA mapped by the
> > DPDK
> > > layers.
> > >
> > > 2. Use memory allocated by the user and register to the DPDK memory
> > > systems. This is also referred as external memory. Upon registration
> > > of the external memory, the DPDK layers will DMA map it to all needed
> > > devices.
> > >
> > > 3. Use memory allocated by the user and not registered to the DPDK
> > > memory system. This is for users who wants to have tight control on
> > > this memory. The user will need to explicitly call DMA map function in
> > > order to register such memory to the different devices.
> > >
> > > The scope of the patch focus on #3 above.
> > >
> > >
> > Why can not we have case 2 covering case 3?
>
> Because it is not our choice but rather the DPDK application's.
> We could disallow it, and force the application to register its
> external memory to the DPDK memory management system. However IMO that
> would be wrong.
> The use case exists - some applications want to manage their memory
> themselves: without the extra overhead of rte_malloc, without creating a
> special socket to populate the memory and without redundant API calls to
> rte_extmem_*.
>
> Simply allocate a chunk of memory, DMA map it to the device, and that's it.
>
>
Usability is a strong point, but only up to some extent. DPDK is all about
performance, and adding options the user can choose from will add pressure
and complexity when it comes to keeping that performance. Your proposal makes
sense from a user's point of view, but will it avoid modifying things in the
DPDK core to support this case broadly in the future? Multiprocess will be
hard to get, if not impossible, without adding more complexity, and although
you likely do not expect that use case to require multiprocess support, once
we have DPDK apps using this model, sooner or later those companies with
products based on such an option will demand broad support. I can foresee
that not just multiprocess support will require changes in the future.

This reminds me of the case of obtaining real time: the more complexity, the
less determinism can be obtained. It is not impossible, simply far more
complex. Pure real-time operating systems can add new functionality, but it
is hard to do so properly without jeopardising the main goal.
General-purpose operating systems can try to improve determinism, but only up
to some extent and with significant complexity costs. DPDK is the real-time
operating system in this comparison.
> >
> >
> > > Currently the only way to map external memory is through VFIO
> > > (rte_vfio_dma_map). While VFIO is common, there are other vendors
> > > which use different ways to map memory (e.g. Mellanox and NXP).
> > >
> > >
> > As you say, VFIO is common, and when allowing DMAs programmed in user
> > space, the right thing to do.
>
> It is common indeed. Why is it the right thing to do?
>
>
Compared with UIO, for sure. VFIO does have the right view of the system in
terms of which devices can properly be isolated. Can you confirm that a
specific implementation by a vendor can ensure the same behaviour? If so, do
you have duplicated code then? If the answer is that you are using VFIO data,
why not use VFIO as the interface and add the required connection between
VFIO and the drivers?

What about mapping validation? Is the driver doing that part or relying on
kernel code? Or is it just assuming the mapping is safe?
> > I'm assuming there is IOMMU hardware and
> > this is what Mellanox and NXP rely on in some way or another.
>
> For Mellanox, the device works with virtual memory, not physical. If you
> think of it, it is more secure for user space applications. The Mellanox
> device has an internal memory translation unit between virtual memory and
> physical memory.
> An IOMMU can be added on top of it, in case the host doesn't trust the
> device or the device is given to an untrusted entity like a VM.
>
>
Any current NIC or device will work with virtual addresses if an IOMMU is in
place, no matter if the device is IOMMU-aware or not. Any vendor with
that capability in their devices should follow generic paths and a common
interface, with the vendor drivers being the executors. The drivers know how
to tell the device, but they should be told what to tell, and not by the
user but by the kernel.

I think reading your comment "in case the host doesn't trust the device"
makes it easier to understand what you are trying to obtain, and at the same
time makes my concerns not a problem at all. This is a case for DPDK being
used in certain scenarios where the full system is trusted, which I think is
a completely rightful option. My only concern then is the complexity it could
imply sooner or later, although I have to admit it is not a strong one :-)
> >
> > Having each driver doing things in their own way will end up in a harder
> to
> > validate system.
>
> Different vendors will have different HW implementations. We cannot force
> everybody to align on the IOMMU.
> What we can do is ease the user's life and provide vendor-agnostic APIs
> which just provide the needed functionality. In our case, DMA map and unmap.
> The user should not care whether it is IOMMU, Mellanox memory registration
> through verbs, or NXP special mapping.
>
> The sys admin should set/unset the IOMMU as a general means of protection.
> And this of course will also work with Mellanox devices.
>
> > If there is IOMMU hardware, the same mechanism should always be
> > used, leaving the IOMMU hw-specific implementation to deal with
> > the details. If a NIC is IOMMU-able, that should not be supported by
> > specific vendor drivers but through a generic solution like VFIO, which
> > will validate a device with such capability and perform the required
> > actions for that case.
> > VFIO and IOMMU should be modified as needed to support this
> > requirement instead of leaving vendor drivers to implement their own
> > solution.
>
> Again - I am against forcing every PCI device to use VFIO, and I don't
> think the IOMMU as a HW device should control other PCI devices.
> I see nothing wrong with a device which also has extra capabilities of
> memory translation, and adds another level of security to the user
> application.
>
>
In a system with untrusted components using the device, a generic way of
properly configuring the system with the right protections should be used
instead of relying on a specific vendor implementation.
> >
> > In any case, I think this support should be in a different patchset than
> the
> > private user space mappings.
> >
> >
>
> > > The work in this patch series moves the DMA mapping to vendor-agnostic
> > > APIs. New map and unmap ops were added to the rte_bus structure.
> > > Implementation of those was currently done only on the PCI bus. The
> > > implementation takes the driver map and unmap implementation as a
> > > bypass to the VFIO mapping.
> > > That is, in case of no specific map/unmap from the PCI driver, VFIO
> > > mapping, if possible, will be used.
> > >
> > > Application use with those APIs is quite simple:
> > > * allocate memory
> > > * take a device, and query its rte_device.
> > > * call the bus map function for this device.
> > >
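The three application steps quoted above could look roughly like this under the v1 design, where the map op hangs off rte_bus. The `dma_map` op name follows the cover letter's description, but its exact signature is an assumption at this point in the discussion, and the sketch will not build outside a DPDK tree.

```c
#include <stddef.h>
#include <stdint.h>

#include <rte_bus.h>   /* struct rte_device, struct rte_bus */

/* Map an externally allocated buffer for DMA by the given device, going
 * through its bus's map op (which, per the cover letter, falls back to
 * VFIO when the PMD has no specific implementation). */
static int dma_map_for_device(struct rte_device *dev, void *addr, size_t len)
{
	if (dev->bus == NULL || dev->bus->dma_map == NULL)
		return -1;   /* bus does not implement DMA mapping */

	/* IOVA-as-VA assumed for simplicity. */
	return dev->bus->dma_map(dev, addr, (uint64_t)addr, len);
}
```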
> > > Future work will deprecate the rte_vfio_dma_map and
> > rte_vfio_dma_unmap
> > > APIs, leaving the PCI device APIs as the preferred option for the user.
> > >
> > > [1]
> > >
> > > https://patches.dpdk.org/patch/47796/
> > >
> > > Shahaf Shuler (6):
> > > vfio: allow DMA map of memory for the default vfio fd
> > > vfio: don't fail to DMA map if memory is already mapped
> > > bus: introduce DMA memory mapping for external memory
> > > net/mlx5: refactor external memory registration
> > > net/mlx5: support PCI device DMA map and unmap
> > > doc: deprecate VFIO DMA map APIs
> > >
> > > doc/guides/prog_guide/env_abstraction_layer.rst | 2 +-
> > > doc/guides/rel_notes/deprecation.rst | 4 +
> > > drivers/bus/pci/pci_common.c | 78 +++++++
> > > drivers/bus/pci/rte_bus_pci.h | 14 ++
> > > drivers/net/mlx5/mlx5.c | 2 +
> > > drivers/net/mlx5/mlx5_mr.c | 232
> ++++++++++++++++---
> > > drivers/net/mlx5/mlx5_rxtx.h | 5 +
> > > lib/librte_eal/common/eal_common_bus.c | 22 ++
> > > lib/librte_eal/common/include/rte_bus.h | 57 +++++
> > > lib/librte_eal/common/include/rte_vfio.h | 12 +-
> > > lib/librte_eal/linuxapp/eal/eal_vfio.c | 26 ++-
> > > lib/librte_eal/rte_eal_version.map | 2 +
> > > 12 files changed, 418 insertions(+), 38 deletions(-)
> > >
> > > --
> > > 2.12.0
> > >
> > >
>
* Re: [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for external memory
2019-02-14 12:22 ` Alejandro Lucero
@ 2019-02-14 12:27 ` Alejandro Lucero
2019-02-14 13:41 ` Shahaf Shuler
1 sibling, 0 replies; 79+ messages in thread
From: Alejandro Lucero @ 2019-02-14 12:27 UTC (permalink / raw)
To: Shahaf Shuler
Cc: Burakov, Anatoly, Yongseok Koh, Thomas Monjalon, Ferruh Yigit,
nhorman, Gaetan Rivet, dev
On Thu, Feb 14, 2019 at 12:22 PM Alejandro Lucero <
alejandro.lucero@netronome.com> wrote:
>
>
> On Wed, Feb 13, 2019 at 7:24 PM Shahaf Shuler <shahafs@mellanox.com>
> wrote:
>
>> Wednesday, February 13, 2019 1:43 PM, Alejandro Lucero:
>> > Subject: Re: [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for
>> > external memory
>> >
>> > On Wed, Feb 13, 2019 at 9:11 AM Shahaf Shuler <shahafs@mellanox.com>
>> > wrote:
>> >
>> > > This series is in continue to RFC[1].
>> > >
>> > > The DPDK APIs expose 3 different modes to work with memory used for
>> > DMA:
>> > >
>> > > 1. Use the DPDK owned memory (backed by the DPDK provided
>> > hugepages).
>> > > This memory is allocated by the DPDK libraries, included in the DPDK
>> > > memory system (memseg lists) and automatically DMA mapped by the
>> > DPDK
>> > > layers.
>> > >
>> > > 2. Use memory allocated by the user and register to the DPDK memory
>> > > systems. This is also referred as external memory. Upon registration
>> > > of the external memory, the DPDK layers will DMA map it to all needed
>> > > devices.
>> > >
>> > > 3. Use memory allocated by the user and not registered to the DPDK
>> > > memory system. This is for users who wants to have tight control on
>> > > this memory. The user will need to explicitly call DMA map function in
>> > > order to register such memory to the different devices.
>> > >
>> > > The scope of the patch focus on #3 above.
>> > >
>> > >
>> > Why can not we have case 2 covering case 3?
>>
>> Because it is not our choice rather the DPDK application.
>> We could not allow it, and force the application to register their
>> external memory to the DPDK memory management system. However IMO it will
>> be wrong.
>> The use case exists - some application wants to manage their memory by
>> themselves. w/o the extra overhead of rte_malloc, without creating a
>> special socket to populate the memory and without redundant API calls to
>> rte_extmem_*.
>>
>> Simply allocate chunk of memory, DMA map it to device and that’s it.
>>
>>
> Usability is a strong point, but up to some extent. DPDK is all about
> performance, and adding options the user can choose from will add pressure
> and complexity for keeping the performance. Your proposal makes sense from
> an user point of view, but will it avoid to modify things in the DPDK core
> for supporting this case broadly in the future? Multiprocess will be hard
> to get, if not impossible, without adding more complexity, and although you
> likely do not expect that use case requiring multiprocess support, once we
> have DPDK apps using this model, sooner or later those companies with
> products based on such option will demand broadly support. I can foresee
> not just multiprocess support will require changes in the future.
>
> This reminds me the case of obtaining real time: the more complexity the
> less determinism can be obtained. It is not impossible, simply it is far
> more complex. Pure real time operating systems can add new functionalities,
> but it is hard to do it properly without jeopardising the main goal.
> Generic purpose operating systems can try to improve determinism, but up to
> some extent and with important complexity costs. DPDK is the real time
> operating system in this comparison.
>
>
>> >
>> >
>> > > Currently the only way to map external memory is through VFIO
>> > > (rte_vfio_dma_map). While VFIO is common, there are other vendors
>> > > which use different ways to map memory (e.g. Mellanox and NXP).
>> > >
>> > >
>> > As you say, VFIO is common, and when allowing DMAs programmed in user
>> > space, the right thing to do.
>>
>> It is common indeed. Why it the right thing to do?
>>
>>
> Compared with UIO, for sure. VFIO does have the right view of the system
> in terms of which devices can properly be isolated. Can you confirm a
> specific implementation by a vendor can ensure same behaviour? If so, do
> you have duplicated code then? if the answer is your are using VFIO data,
> why not to use VFIO as the interface and add the required connection
> between VFIO and drivers?
>
> What about mapping validation? is the driver doing that part or relying on
> kernel code? or is it just assuming the mapping is safe?
>
>
>> I'm assuming there is an IOMMU hardware and
>> > this is what Mellanox and NXP rely on in some way or another.
>>
>> For Mellanox, the device works with virtual memory, not physical. If you
>> think of it, it is more secure for user space application. Mellanox device
>> has internal memory translation unit between virtual memory and physical
>> memory.
>> IOMMU can be added on top of it, in case the host doesn't trust the
>> device or the device is given to untrusted entity like VM.
>>
>>
> Any current NIC or device will work with virtual addresses if IOMMU is in
> place, not matter if the device is IOMMU-aware or not. Any vendor, with
> that capability in their devices, should follow generic paths and a common
> interface with the vendor drivers being the executors. The drivers know how
> to tell the device, but they should be told what to tell and not by the
> user but by the kernel.
>
> I think reading your comment "in case the host doesn't trust the device"
> makes easier to understand what you try to obtain, and at the same time
> makes my concerns not a problem at all. This is a case for DPDK being used
> in certain scenarios where the full system is trusted, what I think is a
> completely rightful option. My only concern then is the complexity it could
> imply sooner or later, although I have to admit it is not a strong one :-)
>
>
I forgot to mention the problem of leaving that option open in not fully
trusted systems. I do not know how it could be avoided, maybe some checks
in EAL initialization, but maybe it is not possible at all. Anyway, I
think this is worth discussing further.
> >
>> > Having each driver doing things in their own way will end up in a
>> harder to
>> > validate system.
>>
>> Different vendors will have different HW implementations. We cannot force
>> everybody to align the IOMMU.
>> What we can do, is to ease the user life and provide vendor agnostic APIs
>> which just provide the needed functionality. On our case DMA map and unmap.
>> The user should not care if its IOMMU, Mellanox memory registration
>> through verbs or NXP special mapping.
>>
>> The sys admin should set/unset the IOMMU as a general mean of protection.
>> And this of course will work also w/ Mellanox devices.
>>
>> If there is an IOMMU hardware, same mechanism should be
>> > used always, leaving to the IOMMU hw specific implementation to deal
>> with
>> > the details. If a NIC is IOMMU-able, that should not be supported by
>> specific
>> > vendor drivers but through a generic solution like VFIO which will
>> validate a
>> > device with such capability and to perform the required actions for
>> that case.
>> > VFIO and IOMMU should be modified as needed for supporting this
>> > requirement instead of leaving vendor drivers to implement their own
>> > solution.
>>
>> Again - I am against of forcing every PCI device to use VFIO, and I don't
>> think IOMMU as a HW device should control other PCI devices.
>> I see nothing wrong with device which also has extra capabilities of
>> memory translation, and adds another level of security to the user
>> application.
>>
>>
> In a system with untrusted components using the device, a generic way of
> properly configure the system with the right protections should be used
> instead of relying on specific vendor implementation.
>
>
>> >
>> > In any case, I think this support should be in a different patchset
>> than the
>> > private user space mappings.
>> >
>> >
>
> >
>> > > The work in this patch moves the DMA mapping to vendor agnostic APIs.
>> > > A new map and unmap ops were added to rte_bus structure.
>> > > Implementation of those was done currently only on the PCI bus. The
>> > > implementation takes the driver map and umap implementation as bypass
>> > to the VFIO mapping.
>> > > That is, in case of no specific map/unmap from the PCI driver, VFIO
>> > > mapping, if possible, will be used.
>> > >
>> > > Application use with those APIs is quite simple:
>> > > * allocate memory
>> > > * take a device, and query its rte_device.
>> > > * call the bus map function for this device.
>> > >
>> > > Future work will deprecate the rte_vfio_dma_map and
>> > rte_vfio_dma_unmap
>> > > APIs, leaving the PCI device APIs as the preferred option for the
>> user.
>> > >
>> > > [1]
>> > >
>> > > https://patches.dpdk.org/patch/47796/
>> > >
>> > > Shahaf Shuler (6):
>> > > vfio: allow DMA map of memory for the default vfio fd
>> > > vfio: don't fail to DMA map if memory is already mapped
>> > > bus: introduce DMA memory mapping for external memory
>> > > net/mlx5: refactor external memory registration
>> > > net/mlx5: support PCI device DMA map and unmap
>> > > doc: deprecate VFIO DMA map APIs
>> > >
>> > > doc/guides/prog_guide/env_abstraction_layer.rst | 2 +-
>> > > doc/guides/rel_notes/deprecation.rst | 4 +
>> > > drivers/bus/pci/pci_common.c | 78 +++++++
>> > > drivers/bus/pci/rte_bus_pci.h | 14 ++
>> > > drivers/net/mlx5/mlx5.c | 2 +
>> > > drivers/net/mlx5/mlx5_mr.c | 232
>> ++++++++++++++++---
>> > > drivers/net/mlx5/mlx5_rxtx.h | 5 +
>> > > lib/librte_eal/common/eal_common_bus.c | 22 ++
>> > > lib/librte_eal/common/include/rte_bus.h | 57 +++++
>> > > lib/librte_eal/common/include/rte_vfio.h | 12 +-
>> > > lib/librte_eal/linuxapp/eal/eal_vfio.c | 26 ++-
>> > > lib/librte_eal/rte_eal_version.map | 2 +
>> > > 12 files changed, 418 insertions(+), 38 deletions(-)
>> > >
>> > > --
>> > > 2.12.0
>> > >
>> > >
>>
>
* Re: [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for external memory
2019-02-14 12:22 ` Alejandro Lucero
2019-02-14 12:27 ` Alejandro Lucero
@ 2019-02-14 13:41 ` Shahaf Shuler
2019-02-14 16:43 ` Burakov, Anatoly
1 sibling, 1 reply; 79+ messages in thread
From: Shahaf Shuler @ 2019-02-14 13:41 UTC (permalink / raw)
To: Alejandro Lucero
Cc: Burakov, Anatoly, Yongseok Koh, Thomas Monjalon, Ferruh Yigit,
nhorman, Gaetan Rivet, dev
Thursday, February 14, 2019 2:22 PM, Alejandro Lucero:
>On Wed, Feb 13, 2019 at 7:24 PM Shahaf Shuler <shahafs@mellanox.com> wrote:
>Wednesday, February 13, 2019 1:43 PM, Alejandro Lucero:
>> Subject: Re: [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for
>> external memory
>>
>> On Wed, Feb 13, 2019 at 9:11 AM Shahaf Shuler <shahafs@mellanox.com>
>> wrote:
>>
>> > This series is a continuation of RFC[1].
>> >
>> > The DPDK APIs expose 3 different modes to work with memory used for
>> DMA:
>> >
>> > 1. Use the DPDK owned memory (backed by the DPDK provided
>> hugepages).
>> > This memory is allocated by the DPDK libraries, included in the DPDK
>> > memory system (memseg lists) and automatically DMA mapped by the
>> DPDK
>> > layers.
>> >
>> > 2. Use memory allocated by the user and register it to the DPDK memory
>> > system. This is also referred to as external memory. Upon registration
>> > of the external memory, the DPDK layers will DMA map it to all needed
>> > devices.
>> >
>> > 3. Use memory allocated by the user and not registered to the DPDK
>> > memory system. This is for users who want tight control over this
>> > memory. The user will need to explicitly call the DMA map function in
>> > order to register such memory with the different devices.
>> >
>> > The scope of this patch series focuses on #3 above.
>> >
>> >
>> Why can not we have case 2 covering case 3?
>
>Because it is not our choice, rather the DPDK application's.
>We could disallow it, and force applications to register their external memory to the DPDK memory management system. However, IMO that would be wrong.
>The use case exists - some applications want to manage their memory by themselves, without the extra overhead of rte_malloc, without creating a special socket to populate the memory, and without redundant API calls to rte_extmem_*.
>
>Simply allocate chunk of memory, DMA map it to device and that’s it.
>
>Usability is a strong point, but up to some extent. DPDK is all about performance, and adding options the user can choose from will add pressure and complexity for keeping performance. Your proposal makes sense from a user point of view, but will it avoid modifying things in the DPDK core for supporting this case broadly in the future? Multiprocess will be hard to get, if not impossible, without adding more complexity, and although you likely do not expect that use case requiring multiprocess support, once we have DPDK apps using this model, sooner or later those companies with products based on such an option will demand broad support. I can foresee that not just multiprocess support will require changes in the future.
>
>This reminds me the case of obtaining real time: the more complexity the less determinism can be obtained. It is not impossible, simply it is far more complex. Pure real time operating systems can add new functionalities, but it is hard to do it properly without jeopardising the main goal. Generic purpose operating systems can try to improve determinism, but up to some extent and with important complexity costs. DPDK is the real time operating system in this comparison.
It makes some sense.
As I wrote to Anatoly, I am not against forcing the user to work only with DPDK-registered memory.
We may cause some overhead to the application, but it will make things less complex. We just need to agree on it, and remove backdoors like vfio_dma_map (which, BTW, is currently being used by applications - check out VPP).
>
>>
>>
>> > Currently the only way to map external memory is through VFIO
>> > (rte_vfio_dma_map). While VFIO is common, there are other vendors
>> > which use different ways to map memory (e.g. Mellanox and NXP).
>> >
>> >
>> As you say, VFIO is common, and when allowing DMAs programmed in user
>> space, the right thing to do.
>
>It is common indeed. Why is it the right thing to do?
>
>Compared with UIO, for sure. VFIO does have the right view of the system in terms of which devices can properly be isolated. Can you confirm a specific implementation by a vendor can ensure the same behaviour? If so, do you then have duplicated code? If the answer is that you are using VFIO data, why not use VFIO as the interface and add the required connection between VFIO and drivers?
>
>What about mapping validation? is the driver doing that part or relying on kernel code? or is it just assuming the mapping is safe?
Mapping validation is done by the Mellanox kernel module; the kernel is trusted.
>
> I'm assuming there is an IOMMU hardware and
>> this is what Mellanox and NXP rely on in some way or another.
>
>For Mellanox, the device works with virtual memory, not physical. If you think of it, it is more secure for user space application. Mellanox device has internal memory translation unit between virtual memory and physical memory.
>IOMMU can be added on top of it, in case the host doesn't trust the device or the device is given to untrusted entity like VM.
>
>Any current NIC or device will work with virtual addresses if IOMMU is in place, no matter if the device is IOMMU-aware or not.
Not sure what you mean here. For example, Intel devices work with VFIO and use IOVAs to provide buffers to the NIC; hence protection between multiple processes is the application's responsibility, or requires a new VFIO container.
For devices which work with virtual addresses, sharing a device between multiple processes is simple and secure.
Any vendor, with that capability in their devices, should follow generic paths and a common interface with the vendor drivers being the executors. The drivers know how to tell the device, but they should be told what to tell and not by the user but by the kernel.
>
>I think reading your comment "in case the host doesn't trust the device" makes easier to understand what you try to obtain, and at the same time makes my concerns not a problem at all. This is a case for DPDK being used in certain scenarios where the full system is trusted, what I think is a completely rightful option. My only concern then is the complexity it could imply sooner or later, although I have to admit it is not a strong one :-)
>
>>
>> Having each driver doing things in their own way will end up in a harder to
>> validate system.
>
>Different vendors will have different HW implementations. We cannot force everybody to align with the IOMMU.
>What we can do is ease the user's life and provide vendor-agnostic APIs which just provide the needed functionality. In our case, DMA map and unmap.
>The user should not care whether it is IOMMU, Mellanox memory registration through verbs, or NXP special mapping.
>
>The sys admin should set/unset the IOMMU as a general mean of protection. And this of course will work also w/ Mellanox devices.
>
>If there is an IOMMU hardware, same mechanism should be
>> used always, leaving to the IOMMU hw specific implementation to deal with
>> the details. If a NIC is IOMMU-able, that should not be supported by specific
>> vendor drivers but through a generic solution like VFIO which will validate a
>> device with such capability and to perform the required actions for that case.
>> VFIO and IOMMU should be modified as needed for supporting this
>> requirement instead of leaving vendor drivers to implement their own
>> solution.
>
>Again - I am against forcing every PCI device to use VFIO, and I don't think IOMMU as a HW device should control other PCI devices.
>I see nothing wrong with device which also has extra capabilities of memory translation, and adds another level of security to the user application.
>
>In a system with untrusted components using the device, a generic way of properly configure the system with the right protections should be used instead of relying on specific vendor implementation.
>
>>
>> In any case, I think this support should be in a different patchset than the
>> private user space mappings.
>>
>>
>>
>> > The work in this patch moves the DMA mapping to vendor-agnostic APIs.
>> > New map and unmap ops were added to the rte_bus structure.
>> > Implementation of those was currently done only on the PCI bus. The
>> > implementation takes the driver map and unmap implementation as a bypass
>> to the VFIO mapping.
>> > That is, in case of no specific map/unmap from the PCI driver, VFIO
>> > mapping, if possible, will be used.
>> >
>
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for external memory
2019-02-14 13:41 ` Shahaf Shuler
@ 2019-02-14 16:43 ` Burakov, Anatoly
0 siblings, 0 replies; 79+ messages in thread
From: Burakov, Anatoly @ 2019-02-14 16:43 UTC (permalink / raw)
To: Shahaf Shuler, Alejandro Lucero
Cc: Yongseok Koh, Thomas Monjalon, Ferruh Yigit, nhorman, Gaetan Rivet, dev
On 14-Feb-19 1:41 PM, Shahaf Shuler wrote:
> Thursday, February 14, 2019 2:22 PM, Alejandro Lucero:
>
> >Any current NIC or device will work with virtual addresses if IOMMU is
> in place, no matter if the device is IOMMU-aware or not.
>
> Not sure what you mean here. For example, Intel devices work with VFIO and
> use IOVAs to provide buffers to the NIC; hence protection between multiple
> processes is the application's responsibility, or requires a new VFIO container.
>
As far as VFIO is concerned, "multiprocess protection" is not a thing,
because the device cannot be used twice in the first place - each usage
is strictly limited to one VFIO container. We just sidestep this
"limitation" by sharing container/device file descriptors with multiple
processes via IPC.
So while it's technically true that multiprocess protection is
"application responsibility" because we can pass around fd's, it's still
protected by the kernel. IOVA mappings are per-container, so the same
IOVA range can be mapped twice (thrice...), as long as it's for a
different set of devices, in effect making them virtual addresses.
--
Thanks,
Anatoly
^ permalink raw reply [flat|nested] 79+ messages in thread
* [dpdk-dev] [PATCH v2 0/6] introduce DMA memory mapping for external memory
2019-02-13 9:10 [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for external memory Shahaf Shuler
` (6 preceding siblings ...)
2019-02-13 11:43 ` [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for external memory Alejandro Lucero
@ 2019-02-21 14:50 ` Shahaf Shuler
2019-03-05 13:59 ` [dpdk-dev] [PATCH v3 " Shahaf Shuler
` (6 more replies)
2019-02-21 14:50 ` [dpdk-dev] [PATCH v2 1/6] vfio: allow DMA map of memory for the default vfio fd Shahaf Shuler
` (5 subsequent siblings)
13 siblings, 7 replies; 79+ messages in thread
From: Shahaf Shuler @ 2019-02-21 14:50 UTC (permalink / raw)
To: anatoly.burakov, yskoh, thomas, ferruh.yigit, nhorman, gaetan.rivet; +Cc: dev
The DPDK APIs expose 3 different modes to work with memory used for DMA:
1. Use the DPDK owned memory (backed by the DPDK provided hugepages).
This memory is allocated by the DPDK libraries, included in the DPDK
memory system (memseg lists) and automatically DMA mapped by the DPDK
layers.
2. Use memory allocated by the user and register it to the DPDK memory
system. Upon registration of the memory, the DPDK layers will DMA map it
to all needed devices. After registration, allocation of this memory
will be done with the rte_*malloc APIs.
3. Use memory allocated by the user and not registered to the DPDK memory
system. This is for users who want tight control over this
memory (e.g. avoid the rte_malloc header).
The user should allocate the memory, register it through the
rte_extmem_register API, and call the DMA map function in order to
register such memory with the different devices.
The scope of this patch series focuses on #3 above.
Currently the only way to map external memory is through VFIO
(rte_vfio_dma_map). While VFIO is common, there are other vendors
which use different ways to map memory (e.g. Mellanox and NXP).
The work in this patch moves the DMA mapping to vendor agnostic APIs.
Device level DMA map and unmap APIs were added. Implementation of those
APIs was done currently only for PCI devices.
For PCI bus devices, the pci driver can expose its own map and unmap
functions to be used for the mapping. In case the driver doesn't provide
any, the memory will be mapped, if possible, to IOMMU through VFIO APIs.
Application usage with those APIs is quite simple:
* allocate memory
* call rte_extmem_register on the memory chunk.
* take a device, and query its rte_device.
* call the device specific mapping function for this device.
Future work will deprecate the rte_vfio_dma_map and rte_vfio_dma_unmap
APIs, leaving the rte device APIs as the preferred option for the user.
On v2:
- Added warn in release notes about the API change in vfio.
- Moved function doc to prototype declaration.
- Used dma_map and dma_unmap instead of map and unmap.
- Used RTE_VFIO_DEFAULT_CONTAINER_FD instead of -1 fixed value.
- Moved bus function to eal_common_dev.c and renamed them properly.
- Changed eth device iterator to use RTE_DEV_FOREACH.
- Enforced memory is registered with rte_extmem_* prior to mapping.
- Used EEXIST as the only possible return value from type1 vfio IOMMU mapping.
[1] https://patches.dpdk.org/patch/47796/
Shahaf Shuler (6):
vfio: allow DMA map of memory for the default vfio fd
vfio: don't fail to DMA map if memory is already mapped
bus: introduce device level DMA memory mapping
net/mlx5: refactor external memory registration
net/mlx5: support PCI device DMA map and unmap
doc: deprecate VFIO DMA map APIs
doc/guides/prog_guide/env_abstraction_layer.rst | 2 +-
doc/guides/rel_notes/deprecation.rst | 4 +
doc/guides/rel_notes/release_19_05.rst | 3 +
drivers/bus/pci/pci_common.c | 48 ++++
drivers/bus/pci/rte_bus_pci.h | 40 ++++
drivers/net/mlx5/mlx5.c | 2 +
drivers/net/mlx5/mlx5_mr.c | 221 ++++++++++++++++---
drivers/net/mlx5/mlx5_rxtx.h | 5 +
lib/librte_eal/common/eal_common_dev.c | 29 +++
lib/librte_eal/common/include/rte_bus.h | 44 ++++
lib/librte_eal/common/include/rte_dev.h | 43 ++++
lib/librte_eal/common/include/rte_vfio.h | 14 +-
lib/librte_eal/linuxapp/eal/eal_vfio.c | 22 +-
lib/librte_eal/rte_eal_version.map | 2 +
14 files changed, 441 insertions(+), 38 deletions(-)
--
2.12.0
^ permalink raw reply [flat|nested] 79+ messages in thread
* [dpdk-dev] [PATCH v3 0/6] introduce DMA memory mapping for external memory
2019-02-21 14:50 ` [dpdk-dev] [PATCH v2 " Shahaf Shuler
@ 2019-03-05 13:59 ` Shahaf Shuler
2019-03-10 8:27 ` [dpdk-dev] [PATCH v4 " Shahaf Shuler
2019-03-05 13:59 ` [dpdk-dev] [PATCH v3 1/6] vfio: allow DMA map of memory for the default vfio fd Shahaf Shuler
` (5 subsequent siblings)
6 siblings, 1 reply; 79+ messages in thread
From: Shahaf Shuler @ 2019-03-05 13:59 UTC (permalink / raw)
To: anatoly.burakov, yskoh, thomas, ferruh.yigit, nhorman, gaetan.rivet; +Cc: dev
The DPDK APIs expose 3 different modes to work with memory used for DMA:
1. Use the DPDK owned memory (backed by the DPDK provided hugepages).
This memory is allocated by the DPDK libraries, included in the DPDK
memory system (memseg lists) and automatically DMA mapped by the DPDK
layers.
2. Use memory allocated by the user and register it to the DPDK memory
system. Upon registration of the memory, the DPDK layers will DMA map it
to all needed devices. After registration, allocation of this memory
will be done with the rte_*malloc APIs.
3. Use memory allocated by the user and not registered to the DPDK memory
system. This is for users who want tight control over this
memory (e.g. avoid the rte_malloc header).
The user should allocate the memory, register it through the
rte_extmem_register API, and call the DMA map function in order to
register such memory with the different devices.
The scope of this patch series focuses on #3 above.
Currently the only way to map external memory is through VFIO
(rte_vfio_dma_map). While VFIO is common, there are other vendors
which use different ways to map memory (e.g. Mellanox and NXP).
The work in this patch moves the DMA mapping to vendor agnostic APIs.
Device level DMA map and unmap APIs were added. Implementation of those
APIs was done currently only for PCI devices.
For PCI bus devices, the pci driver can expose its own map and unmap
functions to be used for the mapping. In case the driver doesn't provide
any, the memory will be mapped, if possible, to IOMMU through VFIO APIs.
Application usage with those APIs is quite simple:
* allocate memory
* call rte_extmem_register on the memory chunk.
* take a device, and query its rte_device.
* call the device specific mapping function for this device.
Future work will deprecate the rte_vfio_dma_map and rte_vfio_dma_unmap
APIs, leaving the rte device APIs as the preferred option for the user.
On v3:
- Fixed compilation issue on freebsd.
- Fixed forgotten rte_bus_dma_map to rte_dev_dma_map.
- Removed __rte_deprecated from vfio function till the time the rte_dev_dma_map
will be non-experimental.
- Changed error return value to always be -1, with proper errno.
- Used rte_mem_virt2memseg_list instead of rte_mem_virt2memseg to verify
memory is registered.
- Added above check also on dma_unmap calls.
- Added note in the API the memory must be registered in advance.
- Added debug log to report the case memory mapping to vfio was skipped.
On v2:
- Added warn in release notes about the API change in vfio.
- Moved function doc to prototype declaration.
- Used dma_map and dma_unmap instead of map and unmap.
- Used RTE_VFIO_DEFAULT_CONTAINER_FD instead of -1 fixed value.
- Moved bus function to eal_common_dev.c and renamed them properly.
- Changed eth device iterator to use RTE_DEV_FOREACH.
- Enforced memory is registered with rte_extmem_* prior to mapping.
- Used EEXIST as the only possible return value from type1 vfio IOMMU mapping.
[1] https://patches.dpdk.org/patch/47796/
Shahaf Shuler (6):
vfio: allow DMA map of memory for the default vfio fd
vfio: don't fail to DMA map if memory is already mapped
bus: introduce device level DMA memory mapping
net/mlx5: refactor external memory registration
net/mlx5: support PCI device DMA map and unmap
doc: deprecation notice for VFIO DMA map APIs
doc/guides/prog_guide/env_abstraction_layer.rst | 2 +-
doc/guides/rel_notes/deprecation.rst | 4 +
doc/guides/rel_notes/release_19_05.rst | 3 +
drivers/bus/pci/pci_common.c | 48 ++++
drivers/bus/pci/rte_bus_pci.h | 40 ++++
drivers/net/mlx5/mlx5.c | 2 +
drivers/net/mlx5/mlx5_mr.c | 225 ++++++++++++++++---
drivers/net/mlx5/mlx5_rxtx.h | 5 +
lib/librte_eal/common/eal_common_dev.c | 34 +++
lib/librte_eal/common/include/rte_bus.h | 44 ++++
lib/librte_eal/common/include/rte_dev.h | 47 ++++
lib/librte_eal/common/include/rte_vfio.h | 8 +-
lib/librte_eal/linuxapp/eal/eal_vfio.c | 42 +++-
lib/librte_eal/rte_eal_version.map | 2 +
14 files changed, 468 insertions(+), 38 deletions(-)
--
2.12.0
^ permalink raw reply [flat|nested] 79+ messages in thread
* [dpdk-dev] [PATCH v4 0/6] introduce DMA memory mapping for external memory
2019-03-05 13:59 ` [dpdk-dev] [PATCH v3 " Shahaf Shuler
@ 2019-03-10 8:27 ` Shahaf Shuler
2019-03-10 8:27 ` [dpdk-dev] [PATCH v4 1/6] vfio: allow DMA map of memory for the default vfio fd Shahaf Shuler
` (7 more replies)
0 siblings, 8 replies; 79+ messages in thread
From: Shahaf Shuler @ 2019-03-10 8:27 UTC (permalink / raw)
To: anatoly.burakov, yskoh, thomas, ferruh.yigit, nhorman, gaetan.rivet; +Cc: dev
The DPDK APIs expose 3 different modes to work with memory used for DMA:
1. Use the DPDK owned memory (backed by the DPDK provided hugepages).
This memory is allocated by the DPDK libraries, included in the DPDK
memory system (memseg lists) and automatically DMA mapped by the DPDK
layers.
2. Use memory allocated by the user and register it to the DPDK memory
system. Upon registration of the memory, the DPDK layers will DMA map it
to all needed devices. After registration, allocation of this memory
will be done with the rte_*malloc APIs.
3. Use memory allocated by the user and not registered to the DPDK memory
system. This is for users who want tight control over this
memory (e.g. avoid the rte_malloc header).
The user should allocate the memory, register it through the
rte_extmem_register API, and call the DMA map function in order to
register such memory with the different devices.
The scope of this patch series focuses on #3 above.
Currently the only way to map external memory is through VFIO
(rte_vfio_dma_map). While VFIO is common, there are other vendors
which use different ways to map memory (e.g. Mellanox and NXP).
The work in this patch moves the DMA mapping to vendor agnostic APIs.
Device level DMA map and unmap APIs were added. Implementation of those
APIs was done currently only for PCI devices.
For PCI bus devices, the pci driver can expose its own map and unmap
functions to be used for the mapping. In case the driver doesn't provide
any, the memory will be mapped, if possible, to IOMMU through VFIO APIs.
Application usage with those APIs is quite simple:
* allocate memory
* call rte_extmem_register on the memory chunk.
* take a device, and query its rte_device.
* call the device specific mapping function for this device.
Future work will deprecate the rte_vfio_dma_map and rte_vfio_dma_unmap
APIs, leaving the rte device APIs as the preferred option for the user.
On v4:
- Changed rte_dev_dma_map errno to ENOTSUP in case bus doesn't support
DMA map API.
On v3:
- Fixed compilation issue on freebsd.
- Fixed forgotten rte_bus_dma_map to rte_dev_dma_map.
- Removed __rte_deprecated from vfio function till the time the rte_dev_dma_map
will be non-experimental.
- Changed error return value to always be -1, with proper errno.
- Used rte_mem_virt2memseg_list instead of rte_mem_virt2memseg to verify
memory is registered.
- Added above check also on dma_unmap calls.
- Added note in the API the memory must be registered in advance.
- Added debug log to report the case memory mapping to vfio was skipped.
On v2:
- Added warn in release notes about the API change in vfio.
- Moved function doc to prototype declaration.
- Used dma_map and dma_unmap instead of map and unmap.
- Used RTE_VFIO_DEFAULT_CONTAINER_FD instead of -1 fixed value.
- Moved bus function to eal_common_dev.c and renamed them properly.
- Changed eth device iterator to use RTE_DEV_FOREACH.
- Enforced memory is registered with rte_extmem_* prior to mapping.
- Used EEXIST as the only possible return value from type1 vfio IOMMU mapping.
[1] https://patches.dpdk.org/patch/47796/
Shahaf Shuler (6):
vfio: allow DMA map of memory for the default vfio fd
vfio: don't fail to DMA map if memory is already mapped
bus: introduce device level DMA memory mapping
net/mlx5: refactor external memory registration
net/mlx5: support PCI device DMA map and unmap
doc: deprecation notice for VFIO DMA map APIs
doc/guides/prog_guide/env_abstraction_layer.rst | 2 +-
doc/guides/rel_notes/deprecation.rst | 4 +
doc/guides/rel_notes/release_19_05.rst | 3 +
drivers/bus/pci/pci_common.c | 48 ++++
drivers/bus/pci/rte_bus_pci.h | 40 ++++
drivers/net/mlx5/mlx5.c | 2 +
drivers/net/mlx5/mlx5_mr.c | 225 ++++++++++++++++---
drivers/net/mlx5/mlx5_rxtx.h | 5 +
lib/librte_eal/common/eal_common_dev.c | 34 +++
lib/librte_eal/common/include/rte_bus.h | 44 ++++
lib/librte_eal/common/include/rte_dev.h | 47 ++++
lib/librte_eal/common/include/rte_vfio.h | 8 +-
lib/librte_eal/linuxapp/eal/eal_vfio.c | 42 +++-
lib/librte_eal/rte_eal_version.map | 2 +
14 files changed, 468 insertions(+), 38 deletions(-)
--
2.12.0
^ permalink raw reply [flat|nested] 79+ messages in thread
* [dpdk-dev] [PATCH v4 1/6] vfio: allow DMA map of memory for the default vfio fd
2019-03-10 8:27 ` [dpdk-dev] [PATCH v4 " Shahaf Shuler
@ 2019-03-10 8:27 ` Shahaf Shuler
2019-03-30 0:23 ` Thomas Monjalon
2019-03-10 8:27 ` [dpdk-dev] [PATCH v4 2/6] vfio: don't fail to DMA map if memory is already mapped Shahaf Shuler
` (6 subsequent siblings)
7 siblings, 1 reply; 79+ messages in thread
From: Shahaf Shuler @ 2019-03-10 8:27 UTC (permalink / raw)
To: anatoly.burakov, yskoh, thomas, ferruh.yigit, nhorman, gaetan.rivet; +Cc: dev
Enable users the option to call rte_vfio_dma_map with request to map
to the default vfio fd.
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
doc/guides/rel_notes/release_19_05.rst | 3 +++
lib/librte_eal/common/include/rte_vfio.h | 8 ++++++--
lib/librte_eal/linuxapp/eal/eal_vfio.c | 10 ++++++++--
3 files changed, 17 insertions(+), 4 deletions(-)
diff --git a/doc/guides/rel_notes/release_19_05.rst b/doc/guides/rel_notes/release_19_05.rst
index 61a2c73837..aa77b24bf5 100644
--- a/doc/guides/rel_notes/release_19_05.rst
+++ b/doc/guides/rel_notes/release_19_05.rst
@@ -136,6 +136,9 @@ ABI Changes
Also, make sure to start the actual text at the margin.
=========================================================
+* vfio: Functions ``rte_vfio_container_dma_map`` and
+ ``rte_vfio_container_dma_unmap`` have been extended with an option to
+ request mapping or un-mapping to the default vfio container fd.
Shared Library Versions
-----------------------
diff --git a/lib/librte_eal/common/include/rte_vfio.h b/lib/librte_eal/common/include/rte_vfio.h
index cae96fab90..cdfbedc1f9 100644
--- a/lib/librte_eal/common/include/rte_vfio.h
+++ b/lib/librte_eal/common/include/rte_vfio.h
@@ -80,6 +80,8 @@ struct vfio_device_info;
#endif /* VFIO_PRESENT */
+#define RTE_VFIO_DEFAULT_CONTAINER_FD (-1)
+
/**
* Setup vfio_cfg for the device identified by its address.
* It discovers the configured I/O MMU groups or sets a new one for the device.
@@ -347,7 +349,8 @@ rte_vfio_container_group_unbind(int container_fd, int iommu_group_num);
* Perform DMA mapping for devices in a container.
*
* @param container_fd
- * the specified container fd
+ * the specified container fd. Use RTE_VFIO_DEFAULT_CONTAINER_FD to
+ * use the default container.
*
* @param vaddr
* Starting virtual address of memory to be mapped.
@@ -370,7 +373,8 @@ rte_vfio_container_dma_map(int container_fd, uint64_t vaddr,
* Perform DMA unmapping for devices in a container.
*
* @param container_fd
- * the specified container fd
+ * the specified container fd. Use RTE_VFIO_DEFAULT_CONTAINER_FD to
+ * use the default container.
*
* @param vaddr
* Starting virtual address of memory to be unmapped.
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index c821e83826..9adbda8bb7 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -1897,7 +1897,10 @@ rte_vfio_container_dma_map(int container_fd, uint64_t vaddr, uint64_t iova,
return -1;
}
- vfio_cfg = get_vfio_cfg_by_container_fd(container_fd);
+ if (container_fd == RTE_VFIO_DEFAULT_CONTAINER_FD)
+ vfio_cfg = default_vfio_cfg;
+ else
+ vfio_cfg = get_vfio_cfg_by_container_fd(container_fd);
if (vfio_cfg == NULL) {
RTE_LOG(ERR, EAL, "Invalid container fd\n");
return -1;
@@ -1917,7 +1920,10 @@ rte_vfio_container_dma_unmap(int container_fd, uint64_t vaddr, uint64_t iova,
return -1;
}
- vfio_cfg = get_vfio_cfg_by_container_fd(container_fd);
+ if (container_fd == RTE_VFIO_DEFAULT_CONTAINER_FD)
+ vfio_cfg = default_vfio_cfg;
+ else
+ vfio_cfg = get_vfio_cfg_by_container_fd(container_fd);
if (vfio_cfg == NULL) {
RTE_LOG(ERR, EAL, "Invalid container fd\n");
return -1;
--
2.12.0
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: [dpdk-dev] [PATCH v4 1/6] vfio: allow DMA map of memory for the default vfio fd
2019-03-10 8:27 ` [dpdk-dev] [PATCH v4 1/6] vfio: allow DMA map of memory for the default vfio fd Shahaf Shuler
@ 2019-03-30 0:23 ` Thomas Monjalon
2019-03-30 0:23 ` Thomas Monjalon
2019-03-30 14:29 ` Thomas Monjalon
0 siblings, 2 replies; 79+ messages in thread
From: Thomas Monjalon @ 2019-03-30 0:23 UTC (permalink / raw)
To: Shahaf Shuler
Cc: dev, anatoly.burakov, yskoh, ferruh.yigit, nhorman, gaetan.rivet
10/03/2019 09:27, Shahaf Shuler:
> Enable users the option to call rte_vfio_dma_map with request to map
> to the default vfio fd.
>
> Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
> --- a/doc/guides/rel_notes/release_19_05.rst
> +++ b/doc/guides/rel_notes/release_19_05.rst
> @@ -136,6 +136,9 @@ ABI Changes
> +* vfio: Functions ``rte_vfio_container_dma_map`` and
> + ``rte_vfio_container_dma_unmap`` have been extended with an option to
> + request mapping or un-mapping to the default vfio container fd.
Isn't it an API change rather than ABI?
It is adding -1 as a special value, I think it is not breaking
previous interface, and does not require a notification in the release notes.
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: [dpdk-dev] [PATCH v4 1/6] vfio: allow DMA map of memory for the default vfio fd
2019-03-30 0:23 ` Thomas Monjalon
2019-03-30 0:23 ` Thomas Monjalon
@ 2019-03-30 14:29 ` Thomas Monjalon
2019-03-30 14:29 ` Thomas Monjalon
1 sibling, 1 reply; 79+ messages in thread
From: Thomas Monjalon @ 2019-03-30 14:29 UTC (permalink / raw)
To: Shahaf Shuler
Cc: dev, anatoly.burakov, yskoh, ferruh.yigit, nhorman, gaetan.rivet
30/03/2019 01:23, Thomas Monjalon:
> 10/03/2019 09:27, Shahaf Shuler:
> > Enable users the option to call rte_vfio_dma_map with request to map
> > to the default vfio fd.
> >
> > Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
> > Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
> > ---
> > --- a/doc/guides/rel_notes/release_19_05.rst
> > +++ b/doc/guides/rel_notes/release_19_05.rst
> > @@ -136,6 +136,9 @@ ABI Changes
> > +* vfio: Functions ``rte_vfio_container_dma_map`` and
> > + ``rte_vfio_container_dma_unmap`` have been extended with an option to
> > + request mapping or un-mapping to the default vfio container fd.
>
> Isn't it an API change rather than ABI?
> It is adding -1 as a special value, I think it is not breaking
> previous interface, and does not require a notification in the release notes.
I will move it to the "API changes" section.
^ permalink raw reply [flat|nested] 79+ messages in thread
* [dpdk-dev] [PATCH v4 2/6] vfio: don't fail to DMA map if memory is already mapped
2019-03-10 8:27 ` [dpdk-dev] [PATCH v4 " Shahaf Shuler
2019-03-10 8:27 ` [dpdk-dev] [PATCH v4 1/6] vfio: allow DMA map of memory for the default vfio fd Shahaf Shuler
@ 2019-03-10 8:27 ` Shahaf Shuler
2019-03-10 8:28 ` [dpdk-dev] [PATCH v4 3/6] bus: introduce device level DMA memory mapping Shahaf Shuler
` (5 subsequent siblings)
7 siblings, 0 replies; 79+ messages in thread
From: Shahaf Shuler @ 2019-03-10 8:27 UTC (permalink / raw)
To: anatoly.burakov, yskoh, thomas, ferruh.yigit, nhorman, gaetan.rivet; +Cc: dev
Currently the VFIO DMA map function fails when the same memory
segment is mapped twice.
This is too strict; mapping the same memory twice is not an error.
Instead, use the kernel return value to detect such a state and have the
DMA function return successfully.
For type1 mapping the kernel driver returns EEXIST.
For spapr mapping, EBUSY is returned since kernel 4.10.
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
lib/librte_eal/linuxapp/eal/eal_vfio.c | 32 +++++++++++++++++++++++++----
1 file changed, 28 insertions(+), 4 deletions(-)
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index 9adbda8bb7..d0a0f9c16f 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -1264,9 +1264,21 @@ vfio_type1_dma_mem_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova,
ret = ioctl(vfio_container_fd, VFIO_IOMMU_MAP_DMA, &dma_map);
if (ret) {
- RTE_LOG(ERR, EAL, " cannot set up DMA remapping, error %i (%s)\n",
- errno, strerror(errno));
+ /**
+ * In case the mapping was already done EEXIST will be
+ * returned from kernel.
+ */
+ if (errno == EEXIST) {
+ RTE_LOG(DEBUG, EAL,
+ " Memory segment is already mapped,"
+ " skipping\n");
+ } else {
+ RTE_LOG(ERR, EAL,
+ " cannot set up DMA remapping,"
+ " error %i (%s)\n",
+ errno, strerror(errno));
return -1;
+ }
}
} else {
memset(&dma_unmap, 0, sizeof(dma_unmap));
@@ -1325,9 +1337,21 @@ vfio_spapr_dma_do_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova,
ret = ioctl(vfio_container_fd, VFIO_IOMMU_MAP_DMA, &dma_map);
if (ret) {
- RTE_LOG(ERR, EAL, " cannot set up DMA remapping, error %i (%s)\n",
- errno, strerror(errno));
+ /**
+ * In case the mapping was already done EBUSY will be
+ * returned from kernel.
+ */
+ if (errno == EBUSY) {
+ RTE_LOG(DEBUG, EAL,
+ " Memory segment is already mapped,"
+ " skipping\n");
+ } else {
+ RTE_LOG(ERR, EAL,
+ " cannot set up DMA remapping,"
+ " error %i (%s)\n", errno,
+ strerror(errno));
return -1;
+ }
}
} else {
--
2.12.0
* [dpdk-dev] [PATCH v4 3/6] bus: introduce device level DMA memory mapping
2019-03-10 8:27 ` [dpdk-dev] [PATCH v4 " Shahaf Shuler
2019-03-10 8:27 ` [dpdk-dev] [PATCH v4 1/6] vfio: allow DMA map of memory for the default vfio fd Shahaf Shuler
2019-03-10 8:27 ` [dpdk-dev] [PATCH v4 2/6] vfio: don't fail to DMA map if memory is already mapped Shahaf Shuler
@ 2019-03-10 8:28 ` Shahaf Shuler
2019-03-11 10:19 ` Burakov, Anatoly
2019-03-13 9:56 ` Thomas Monjalon
2019-03-10 8:28 ` [dpdk-dev] [PATCH v4 4/6] net/mlx5: refactor external memory registration Shahaf Shuler
` (4 subsequent siblings)
7 siblings, 2 replies; 79+ messages in thread
From: Shahaf Shuler @ 2019-03-10 8:28 UTC (permalink / raw)
To: anatoly.burakov, yskoh, thomas, ferruh.yigit, nhorman, gaetan.rivet; +Cc: dev
The DPDK APIs expose 3 different modes to work with memory used for DMA:
1. Use the DPDK owned memory (backed by the DPDK provided hugepages).
This memory is allocated by the DPDK libraries, included in the DPDK
memory system (memseg lists) and automatically DMA mapped by the DPDK
layers.
2. Use memory allocated by the user and registered to the DPDK memory
system. Upon registration, the DPDK layers will DMA map it
to all needed devices. After registration, allocation of this memory
will be done with the rte_*malloc APIs.
3. Use memory allocated by the user and not registered to the DPDK memory
system. This is for users who want tight control over this
memory (e.g. to avoid the rte_malloc header).
The user should allocate the memory, register it through the
rte_extmem_register API, and call a DMA map function in order to register
such memory to the different devices.
The scope of this patch is case #3 above.
Currently the only way to map external memory is through VFIO
(rte_vfio_dma_map). While VFIO is common, there are other vendors
which use different ways to map memory (e.g. Mellanox and NXP).
The work in this patch moves the DMA mapping to vendor-agnostic APIs.
Device level DMA map and unmap APIs were added; those
APIs are currently implemented only for PCI devices.
For PCI bus devices, the PCI driver can expose its own map and unmap
functions to be used for the mapping. In case the driver doesn't provide
any, the memory will be mapped, if possible, to the IOMMU through the VFIO APIs.
Application usage with those APIs is quite simple:
* allocate memory
* call rte_extmem_register on the memory chunk.
* take a device, and query its rte_device.
* call the device specific mapping function for this device.
Future work will deprecate the rte_vfio_dma_map and rte_vfio_dma_unmap
APIs, leaving the rte device APIs as the preferred option for the user.
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
---
drivers/bus/pci/pci_common.c | 48 ++++++++++++++++++++++++++++
drivers/bus/pci/rte_bus_pci.h | 40 +++++++++++++++++++++++
lib/librte_eal/common/eal_common_dev.c | 34 ++++++++++++++++++++
lib/librte_eal/common/include/rte_bus.h | 44 +++++++++++++++++++++++++
lib/librte_eal/common/include/rte_dev.h | 47 +++++++++++++++++++++++++++
lib/librte_eal/rte_eal_version.map | 2 ++
6 files changed, 215 insertions(+)
diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index 6276e5d695..704b9d71af 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -528,6 +528,52 @@ pci_unplug(struct rte_device *dev)
return ret;
}
+static int
+pci_dma_map(struct rte_device *dev, void *addr, uint64_t iova, size_t len)
+{
+ struct rte_pci_device *pdev = RTE_DEV_TO_PCI(dev);
+
+ if (!pdev || !pdev->driver) {
+ rte_errno = EINVAL;
+ return -1;
+ }
+ if (pdev->driver->dma_map)
+ return pdev->driver->dma_map(pdev, addr, iova, len);
+ /*
+ * In case the driver doesn't provide any specific
+ * mapping, try to fall back to VFIO.
+ */
+ if (pdev->kdrv == RTE_KDRV_VFIO)
+ return rte_vfio_container_dma_map
+ (RTE_VFIO_DEFAULT_CONTAINER_FD, (uintptr_t)addr,
+ iova, len);
+ rte_errno = ENOTSUP;
+ return -1;
+}
+
+static int
+pci_dma_unmap(struct rte_device *dev, void *addr, uint64_t iova, size_t len)
+{
+ struct rte_pci_device *pdev = RTE_DEV_TO_PCI(dev);
+
+ if (!pdev || !pdev->driver) {
+ rte_errno = EINVAL;
+ return -1;
+ }
+ if (pdev->driver->dma_unmap)
+ return pdev->driver->dma_unmap(pdev, addr, iova, len);
+ /*
+ * In case the driver doesn't provide any specific
+ * mapping, try to fall back to VFIO.
+ */
+ if (pdev->kdrv == RTE_KDRV_VFIO)
+ return rte_vfio_container_dma_unmap
+ (RTE_VFIO_DEFAULT_CONTAINER_FD, (uintptr_t)addr,
+ iova, len);
+ rte_errno = ENOTSUP;
+ return -1;
+}
+
struct rte_pci_bus rte_pci_bus = {
.bus = {
.scan = rte_pci_scan,
@@ -536,6 +582,8 @@ struct rte_pci_bus rte_pci_bus = {
.plug = pci_plug,
.unplug = pci_unplug,
.parse = pci_parse,
+ .dma_map = pci_dma_map,
+ .dma_unmap = pci_dma_unmap,
.get_iommu_class = rte_pci_get_iommu_class,
.dev_iterate = rte_pci_dev_iterate,
.hot_unplug_handler = pci_hot_unplug_handler,
diff --git a/drivers/bus/pci/rte_bus_pci.h b/drivers/bus/pci/rte_bus_pci.h
index f0d6d81c00..06e004cd3f 100644
--- a/drivers/bus/pci/rte_bus_pci.h
+++ b/drivers/bus/pci/rte_bus_pci.h
@@ -114,6 +114,44 @@ typedef int (pci_probe_t)(struct rte_pci_driver *, struct rte_pci_device *);
typedef int (pci_remove_t)(struct rte_pci_device *);
/**
+ * Driver-specific DMA mapping. After a successful call the device
+ * will be able to read/write from/to this segment.
+ *
+ * @param dev
+ * Pointer to the PCI device.
+ * @param addr
+ * Starting virtual address of memory to be mapped.
+ * @param iova
+ * Starting IOVA address of memory to be mapped.
+ * @param len
+ * Length of memory segment being mapped.
+ * @return
+ * - 0 On success.
+ * - Negative value and rte_errno is set otherwise.
+ */
+typedef int (pci_dma_map_t)(struct rte_pci_device *dev, void *addr,
+ uint64_t iova, size_t len);
+
+/**
+ * Driver-specific DMA un-mapping. After a successful call the device
+ * will not be able to read/write from/to this segment.
+ *
+ * @param dev
+ * Pointer to the PCI device.
+ * @param addr
+ * Starting virtual address of memory to be unmapped.
+ * @param iova
+ * Starting IOVA address of memory to be unmapped.
+ * @param len
+ * Length of memory segment being unmapped.
+ * @return
+ * - 0 On success.
+ * - Negative value and rte_errno is set otherwise.
+ */
+typedef int (pci_dma_unmap_t)(struct rte_pci_device *dev, void *addr,
+ uint64_t iova, size_t len);
+
+/**
* A structure describing a PCI driver.
*/
struct rte_pci_driver {
@@ -122,6 +160,8 @@ struct rte_pci_driver {
struct rte_pci_bus *bus; /**< PCI bus reference. */
pci_probe_t *probe; /**< Device Probe function. */
pci_remove_t *remove; /**< Device Remove function. */
+ pci_dma_map_t *dma_map; /**< device dma map function. */
+ pci_dma_unmap_t *dma_unmap; /**< device dma unmap function. */
const struct rte_pci_id *id_table; /**< ID table, NULL terminated. */
uint32_t drv_flags; /**< Flags RTE_PCI_DRV_*. */
};
diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c
index fd7f5ca7d5..0ec42d8289 100644
--- a/lib/librte_eal/common/eal_common_dev.c
+++ b/lib/librte_eal/common/eal_common_dev.c
@@ -756,3 +756,37 @@ rte_dev_iterator_next(struct rte_dev_iterator *it)
free(cls_str);
return it->device;
}
+
+int
+rte_dev_dma_map(struct rte_device *dev, void *addr, uint64_t iova,
+ size_t len)
+{
+ if (dev->bus->dma_map == NULL || len == 0) {
+ rte_errno = ENOTSUP;
+ return -1;
+ }
+ /* Memory must be registered through rte_extmem_* APIs */
+ if (rte_mem_virt2memseg_list(addr) == NULL) {
+ rte_errno = EINVAL;
+ return -1;
+ }
+
+ return dev->bus->dma_map(dev, addr, iova, len);
+}
+
+int
+rte_dev_dma_unmap(struct rte_device *dev, void *addr, uint64_t iova,
+ size_t len)
+{
+ if (dev->bus->dma_unmap == NULL || len == 0) {
+ rte_errno = ENOTSUP;
+ return -1;
+ }
+ /* Memory must be registered through rte_extmem_* APIs */
+ if (rte_mem_virt2memseg_list(addr) == NULL) {
+ rte_errno = EINVAL;
+ return -1;
+ }
+
+ return dev->bus->dma_unmap(dev, addr, iova, len);
+}
diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index 6be4b5cabe..4faf2d20a0 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -168,6 +168,48 @@ typedef int (*rte_bus_unplug_t)(struct rte_device *dev);
typedef int (*rte_bus_parse_t)(const char *name, void *addr);
/**
+ * Device level DMA map function.
+ * After a successful call, the memory segment will be mapped to the
+ * given device.
+ *
+ * @param dev
+ * Device pointer.
+ * @param addr
+ * Virtual address to map.
+ * @param iova
+ * IOVA address to map.
+ * @param len
+ * Length of the memory segment being mapped.
+ *
+ * @return
+ * 0 if mapping was successful.
+ * Negative value and rte_errno is set otherwise.
+ */
+typedef int (*rte_dev_dma_map_t)(struct rte_device *dev, void *addr,
+ uint64_t iova, size_t len);
+
+/**
+ * Device level DMA unmap function.
+ * After a successful call, the memory segment will no longer be
+ * accessible by the given device.
+ *
+ * @param dev
+ * Device pointer.
+ * @param addr
+ * Virtual address to unmap.
+ * @param iova
+ * IOVA address to unmap.
+ * @param len
+ * Length of the memory segment being unmapped.
+ *
+ * @return
+ * 0 if un-mapping was successful.
+ * Negative value and rte_errno is set otherwise.
+ */
+typedef int (*rte_dev_dma_unmap_t)(struct rte_device *dev, void *addr,
+ uint64_t iova, size_t len);
+
+/**
* Implement a specific hot-unplug handler, which is responsible for
* handle the failure when device be hot-unplugged. When the event of
* hot-unplug be detected, it could call this function to handle
@@ -238,6 +280,8 @@ struct rte_bus {
rte_bus_plug_t plug; /**< Probe single device for drivers */
rte_bus_unplug_t unplug; /**< Remove single device from driver */
rte_bus_parse_t parse; /**< Parse a device name */
+ rte_dev_dma_map_t dma_map; /**< DMA map for device in the bus */
+ rte_dev_dma_unmap_t dma_unmap; /**< DMA unmap for device in the bus */
struct rte_bus_conf conf; /**< Bus configuration */
rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
rte_dev_iterate_t dev_iterate; /**< Device iterator. */
diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
index 3cad4bce57..0d5e25b500 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -463,4 +463,51 @@ rte_dev_hotplug_handle_enable(void);
int __rte_experimental
rte_dev_hotplug_handle_disable(void);
+/**
+ * Device level DMA map function.
+ * After a successful call, the memory segment will be mapped to the
+ * given device.
+ *
+ * @note: Memory must be registered in advance using rte_extmem_* APIs.
+ *
+ * @param dev
+ * Device pointer.
+ * @param addr
+ * Virtual address to map.
+ * @param iova
+ * IOVA address to map.
+ * @param len
+ * Length of the memory segment being mapped.
+ *
+ * @return
+ * 0 if mapping was successful.
+ * Negative value and rte_errno is set otherwise.
+ */
+int __rte_experimental
+rte_dev_dma_map(struct rte_device *dev, void *addr, uint64_t iova, size_t len);
+
+/**
+ * Device level DMA unmap function.
+ * After a successful call, the memory segment will no longer be
+ * accessible by the given device.
+ *
+ * @note: Memory must be registered in advance using rte_extmem_* APIs.
+ *
+ * @param dev
+ * Device pointer.
+ * @param addr
+ * Virtual address to unmap.
+ * @param iova
+ * IOVA address to unmap.
+ * @param len
+ * Length of the memory segment being unmapped.
+ *
+ * @return
+ * 0 if un-mapping was successful.
+ * Negative value and rte_errno is set otherwise.
+ */
+int __rte_experimental
+rte_dev_dma_unmap(struct rte_device *dev, void *addr, uint64_t iova,
+ size_t len);
+
#endif /* _RTE_DEV_H_ */
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index eb5f7b9cbd..264aa050fa 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -277,6 +277,8 @@ EXPERIMENTAL {
rte_class_unregister;
rte_ctrl_thread_create;
rte_delay_us_sleep;
+ rte_dev_dma_map;
+ rte_dev_dma_unmap;
rte_dev_event_callback_process;
rte_dev_event_callback_register;
rte_dev_event_callback_unregister;
--
2.12.0
* Re: [dpdk-dev] [PATCH v4 3/6] bus: introduce device level DMA memory mapping
2019-03-10 8:28 ` [dpdk-dev] [PATCH v4 3/6] bus: introduce device level DMA memory mapping Shahaf Shuler
@ 2019-03-11 10:19 ` Burakov, Anatoly
2019-03-13 9:56 ` Thomas Monjalon
1 sibling, 0 replies; 79+ messages in thread
From: Burakov, Anatoly @ 2019-03-11 10:19 UTC (permalink / raw)
To: Shahaf Shuler, yskoh, thomas, ferruh.yigit, nhorman, gaetan.rivet; +Cc: dev
On 10-Mar-19 8:28 AM, Shahaf Shuler wrote:
> The DPDK APIs expose 3 different modes to work with memory used for DMA:
>
> 1. Use the DPDK owned memory (backed by the DPDK provided hugepages).
> This memory is allocated by the DPDK libraries, included in the DPDK
> memory system (memseg lists) and automatically DMA mapped by the DPDK
> layers.
>
> 2. Use memory allocated by the user and register to the DPDK memory
> systems. Upon registration of memory, the DPDK layers will DMA map it
> to all needed devices. After registration, allocation of this memory
> will be done with rte_*malloc APIs.
>
> 3. Use memory allocated by the user and not registered to the DPDK memory
> system. This is for users who wants to have tight control on this
> memory (e.g. avoid the rte_malloc header).
> The user should create a memory, register it through rte_extmem_register
> API, and call DMA map function in order to register such memory to
> the different devices.
>
> The scope of the patch focus on #3 above.
>
> Currently the only way to map external memory is through VFIO
> (rte_vfio_dma_map). While VFIO is common, there are other vendors
> which use different ways to map memory (e.g. Mellanox and NXP).
>
> The work in this patch moves the DMA mapping to vendor agnostic APIs.
> Device level DMA map and unmap APIs were added. Implementation of those
> APIs was done currently only for PCI devices.
>
> For PCI bus devices, the pci driver can expose its own map and unmap
> functions to be used for the mapping. In case the driver doesn't provide
> any, the memory will be mapped, if possible, to IOMMU through VFIO APIs.
>
> Application usage with those APIs is quite simple:
> * allocate memory
> * call rte_extmem_register on the memory chunk.
> * take a device, and query its rte_device.
> * call the device specific mapping function for this device.
>
> Future work will deprecate the rte_vfio_dma_map and rte_vfio_dma_unmap
> APIs, leaving the rte device APIs as the preferred option for the user.
>
> Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
> ---
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
--
Thanks,
Anatoly
* Re: [dpdk-dev] [PATCH v4 3/6] bus: introduce device level DMA memory mapping
2019-03-10 8:28 ` [dpdk-dev] [PATCH v4 3/6] bus: introduce device level DMA memory mapping Shahaf Shuler
2019-03-11 10:19 ` Burakov, Anatoly
@ 2019-03-13 9:56 ` Thomas Monjalon
2019-03-13 11:12 ` Shahaf Shuler
1 sibling, 1 reply; 79+ messages in thread
From: Thomas Monjalon @ 2019-03-13 9:56 UTC (permalink / raw)
To: Shahaf Shuler, anatoly.burakov
Cc: dev, yskoh, ferruh.yigit, nhorman, gaetan.rivet
10/03/2019 09:28, Shahaf Shuler:
> For PCI bus devices, the pci driver can expose its own map and unmap
> functions to be used for the mapping. In case the driver doesn't provide
> any, the memory will be mapped, if possible, to IOMMU through VFIO APIs.
>
> Application usage with those APIs is quite simple:
> * allocate memory
> * call rte_extmem_register on the memory chunk.
> * take a device, and query its rte_device.
> * call the device specific mapping function for this device.
Should we make it documented somewhere?
> +/**
> + * Device level DMA map function.
> + * After a successful call, the memory segment will be mapped to the
> + * given device.
> + *
> + * @note: Memory must be registered in advance using rte_extmem_* APIs.
Could we make more explicit that this function is part of
the "external memory API"?
* Re: [dpdk-dev] [PATCH v4 3/6] bus: introduce device level DMA memory mapping
2019-03-13 9:56 ` Thomas Monjalon
@ 2019-03-13 11:12 ` Shahaf Shuler
2019-03-13 11:19 ` Thomas Monjalon
2019-03-30 14:36 ` Thomas Monjalon
0 siblings, 2 replies; 79+ messages in thread
From: Shahaf Shuler @ 2019-03-13 11:12 UTC (permalink / raw)
To: Thomas Monjalon, anatoly.burakov
Cc: dev, Yongseok Koh, ferruh.yigit, nhorman, gaetan.rivet
> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Wednesday, March 13, 2019 11:56 AM
> To: Shahaf Shuler <shahafs@mellanox.com>; anatoly.burakov@intel.com
> Cc: dev@dpdk.org; Yongseok Koh <yskoh@mellanox.com>;
> ferruh.yigit@intel.com; nhorman@tuxdriver.com; gaetan.rivet@6wind.com
> Subject: Re: [dpdk-dev] [PATCH v4 3/6] bus: introduce device level DMA
> memory mapping
>
> 10/03/2019 09:28, Shahaf Shuler:
> > For PCI bus devices, the pci driver can expose its own map and unmap
> > functions to be used for the mapping. In case the driver doesn't
> > provide any, the memory will be mapped, if possible, to IOMMU through
> VFIO APIs.
> >
> > Application usage with those APIs is quite simple:
> > * allocate memory
> > * call rte_extmem_register on the memory chunk.
> > * take a device, and query its rte_device.
> > * call the device specific mapping function for this device.
>
> Should we make it documented somewhere?
The full flow for working with external memory is documented in doc/guides/prog_guide/env_abstraction_layer.rst, subchapter "Support for Externally Allocated Memory".
The last commit in the series updates it with the right API to use.
>
> > +/**
> > + * Device level DMA map function.
> > + * After a successful call, the memory segment will be mapped to the
> > + * given device.
> > + *
> > + * @note: Memory must be registered in advance using rte_extmem_*
> APIs.
>
> Could we make more explicit that this function is part of the "external
> memory API"?
How do you suggest?
This function belongs to rte_dev therefore the rte_dev prefix. better rte_dev_extmem_dma_map ?
>
>
[1]
https://patches.dpdk.org/patch/51018/
* Re: [dpdk-dev] [PATCH v4 3/6] bus: introduce device level DMA memory mapping
2019-03-13 11:12 ` Shahaf Shuler
@ 2019-03-13 11:19 ` Thomas Monjalon
2019-03-13 11:47 ` Burakov, Anatoly
2019-03-30 14:36 ` Thomas Monjalon
1 sibling, 1 reply; 79+ messages in thread
From: Thomas Monjalon @ 2019-03-13 11:19 UTC (permalink / raw)
To: Shahaf Shuler
Cc: anatoly.burakov, dev, Yongseok Koh, ferruh.yigit, nhorman, gaetan.rivet
13/03/2019 12:12, Shahaf Shuler:
> From: Thomas Monjalon <thomas@monjalon.net>
> > > +/**
> > > + * Device level DMA map function.
> > > + * After a successful call, the memory segment will be mapped to the
> > > + * given device.
> > > + *
> > > + * @note: Memory must be registered in advance using rte_extmem_*
> > APIs.
> >
> > Could we make more explicit that this function is part of the "external
> > memory API"?
>
> How do you suggest?
There could be an explicit comment.
> This function belongs to rte_dev therefore the rte_dev prefix. better rte_dev_extmem_dma_map ?
Not sure about the prefix. Anatoly?
* Re: [dpdk-dev] [PATCH v4 3/6] bus: introduce device level DMA memory mapping
2019-03-13 11:19 ` Thomas Monjalon
@ 2019-03-13 11:47 ` Burakov, Anatoly
0 siblings, 0 replies; 79+ messages in thread
From: Burakov, Anatoly @ 2019-03-13 11:47 UTC (permalink / raw)
To: Thomas Monjalon, Shahaf Shuler
Cc: dev, Yongseok Koh, ferruh.yigit, nhorman, gaetan.rivet
On 13-Mar-19 11:19 AM, Thomas Monjalon wrote:
> 13/03/2019 12:12, Shahaf Shuler:
>> From: Thomas Monjalon <thomas@monjalon.net>
>>>> +/**
>>>> + * Device level DMA map function.
>>>> + * After a successful call, the memory segment will be mapped to the
>>>> + * given device.
>>>> + *
>>>> + * @note: Memory must be registered in advance using rte_extmem_*
>>> APIs.
>>>
>>> Could we make more explicit that this function is part of the "external
>>> memory API"?
>>
>> How do you suggest?
>
> There could be an explicit comment.
>
>> This function belongs to rte_dev therefore the rte_dev prefix. better rte_dev_extmem_dma_map ?
>
> Not sure about the prefix. Anatoly?
>
IMO this is a dev API. The fact that its purpose is to use it with
extmem is coincidental.
--
Thanks,
Anatoly
* Re: [dpdk-dev] [PATCH v4 3/6] bus: introduce device level DMA memory mapping
2019-03-13 11:12 ` Shahaf Shuler
2019-03-13 11:19 ` Thomas Monjalon
@ 2019-03-30 14:36 ` Thomas Monjalon
2019-03-30 14:36 ` Thomas Monjalon
1 sibling, 1 reply; 79+ messages in thread
From: Thomas Monjalon @ 2019-03-30 14:36 UTC (permalink / raw)
To: Shahaf Shuler
Cc: dev, anatoly.burakov, Yongseok Koh, ferruh.yigit, nhorman, gaetan.rivet
13/03/2019 12:12, Shahaf Shuler:
> From: Thomas Monjalon <thomas@monjalon.net>
> > 10/03/2019 09:28, Shahaf Shuler:
> > > For PCI bus devices, the pci driver can expose its own map and unmap
> > > functions to be used for the mapping. In case the driver doesn't
> > > provide any, the memory will be mapped, if possible, to IOMMU through
> > VFIO APIs.
> > >
> > > Application usage with those APIs is quite simple:
> > > * allocate memory
> > > * call rte_extmem_register on the memory chunk.
> > > * take a device, and query its rte_device.
> > > * call the device specific mapping function for this device.
> >
> > Should we make it documented somewhere?
>
> The full flow to work w/ external memory is documented at doc/guides/prog_guide/env_abstraction_layer.rst , Subchapter "Support for Externally Allocated Memory.
> The last commit in series update the right API to use.
OK, then I will move this doc update in this patch.
* [dpdk-dev] [PATCH v4 4/6] net/mlx5: refactor external memory registration
2019-03-10 8:27 ` [dpdk-dev] [PATCH v4 " Shahaf Shuler
` (2 preceding siblings ...)
2019-03-10 8:28 ` [dpdk-dev] [PATCH v4 3/6] bus: introduce device level DMA memory mapping Shahaf Shuler
@ 2019-03-10 8:28 ` Shahaf Shuler
2019-03-10 8:28 ` [dpdk-dev] [PATCH v4 5/6] net/mlx5: support PCI device DMA map and unmap Shahaf Shuler
` (3 subsequent siblings)
7 siblings, 0 replies; 79+ messages in thread
From: Shahaf Shuler @ 2019-03-10 8:28 UTC (permalink / raw)
To: anatoly.burakov, yskoh, thomas, ferruh.yigit, nhorman, gaetan.rivet; +Cc: dev
Move the memory region creation to a separate function to
prepare the ground for the reuse of it on the PCI driver map and unmap
functions.
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
---
drivers/net/mlx5/mlx5_mr.c | 86 +++++++++++++++++++++++++++--------------
1 file changed, 57 insertions(+), 29 deletions(-)
diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index 700d83d1bc..43ee9c961b 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -1109,6 +1109,58 @@ mlx5_mr_flush_local_cache(struct mlx5_mr_ctrl *mr_ctrl)
}
/**
+ * Creates a memory region for external memory, that is memory which is not
+ * part of the DPDK memory segments.
+ *
+ * @param dev
+ * Pointer to the ethernet device.
+ * @param addr
+ * Starting virtual address of memory.
+ * @param len
+ * Length of memory segment being mapped.
+ * @param socket_id
+ * Socket to allocate heap memory for the control structures.
+ *
+ * @return
+ * Pointer to MR structure on success, NULL otherwise.
+ */
+static struct mlx5_mr *
+mlx5_create_mr_ext(struct rte_eth_dev *dev, uintptr_t addr, size_t len,
+ int socket_id)
+{
+ struct mlx5_priv *priv = dev->data->dev_private;
+ struct mlx5_mr *mr = NULL;
+
+ mr = rte_zmalloc_socket(NULL,
+ RTE_ALIGN_CEIL(sizeof(*mr),
+ RTE_CACHE_LINE_SIZE),
+ RTE_CACHE_LINE_SIZE, socket_id);
+ if (mr == NULL)
+ return NULL;
+ mr->ibv_mr = mlx5_glue->reg_mr(priv->pd, (void *)addr, len,
+ IBV_ACCESS_LOCAL_WRITE);
+ if (mr->ibv_mr == NULL) {
+ DRV_LOG(WARNING,
+ "port %u fail to create a verbs MR for address (%p)",
+ dev->data->port_id, (void *)addr);
+ rte_free(mr);
+ return NULL;
+ }
+ mr->msl = NULL; /* Mark it is external memory. */
+ mr->ms_bmp = NULL;
+ mr->ms_n = 1;
+ mr->ms_bmp_n = 1;
+ DRV_LOG(DEBUG,
+ "port %u MR CREATED (%p) for external memory %p:\n"
+ " [0x%" PRIxPTR ", 0x%" PRIxPTR "),"
+ " lkey=0x%x base_idx=%u ms_n=%u, ms_bmp_n=%u",
+ dev->data->port_id, (void *)mr, (void *)addr,
+ addr, addr + len, rte_cpu_to_be_32(mr->ibv_mr->lkey),
+ mr->ms_base_idx, mr->ms_n, mr->ms_bmp_n);
+ return mr;
+}
+
+/**
* Called during rte_mempool_mem_iter() by mlx5_mr_update_ext_mp().
*
* Externally allocated chunk is registered and a MR is created for the chunk.
@@ -1142,43 +1194,19 @@ mlx5_mr_update_ext_mp_cb(struct rte_mempool *mp, void *opaque,
rte_rwlock_read_unlock(&priv->mr.rwlock);
if (lkey != UINT32_MAX)
return;
- mr = rte_zmalloc_socket(NULL,
- RTE_ALIGN_CEIL(sizeof(*mr),
- RTE_CACHE_LINE_SIZE),
- RTE_CACHE_LINE_SIZE, mp->socket_id);
- if (mr == NULL) {
- DRV_LOG(WARNING,
- "port %u unable to allocate memory for a new MR of"
- " mempool (%s).",
- dev->data->port_id, mp->name);
- data->ret = -1;
- return;
- }
DRV_LOG(DEBUG, "port %u register MR for chunk #%d of mempool (%s)",
dev->data->port_id, mem_idx, mp->name);
- mr->ibv_mr = mlx5_glue->reg_mr(priv->pd, (void *)addr, len,
- IBV_ACCESS_LOCAL_WRITE);
- if (mr->ibv_mr == NULL) {
+ mr = mlx5_create_mr_ext(dev, addr, len, mp->socket_id);
+ if (!mr) {
DRV_LOG(WARNING,
- "port %u fail to create a verbs MR for address (%p)",
- dev->data->port_id, (void *)addr);
- rte_free(mr);
+ "port %u unable to allocate a new MR of"
+ " mempool (%s).",
+ dev->data->port_id, mp->name);
data->ret = -1;
return;
}
- mr->msl = NULL; /* Mark it is external memory. */
- mr->ms_bmp = NULL;
- mr->ms_n = 1;
- mr->ms_bmp_n = 1;
rte_rwlock_write_lock(&priv->mr.rwlock);
LIST_INSERT_HEAD(&priv->mr.mr_list, mr, mr);
- DRV_LOG(DEBUG,
- "port %u MR CREATED (%p) for external memory %p:\n"
- " [0x%" PRIxPTR ", 0x%" PRIxPTR "),"
- " lkey=0x%x base_idx=%u ms_n=%u, ms_bmp_n=%u",
- dev->data->port_id, (void *)mr, (void *)addr,
- addr, addr + len, rte_cpu_to_be_32(mr->ibv_mr->lkey),
- mr->ms_base_idx, mr->ms_n, mr->ms_bmp_n);
/* Insert to the global cache table. */
mr_insert_dev_cache(dev, mr);
rte_rwlock_write_unlock(&priv->mr.rwlock);
--
2.12.0
* [dpdk-dev] [PATCH v4 5/6] net/mlx5: support PCI device DMA map and unmap
2019-03-10 8:27 ` [dpdk-dev] [PATCH v4 " Shahaf Shuler
` (3 preceding siblings ...)
2019-03-10 8:28 ` [dpdk-dev] [PATCH v4 4/6] net/mlx5: refactor external memory registration Shahaf Shuler
@ 2019-03-10 8:28 ` Shahaf Shuler
2019-03-10 8:28 ` [dpdk-dev] [PATCH v4 6/6] doc: deprecation notice for VFIO DMA map APIs Shahaf Shuler
` (2 subsequent siblings)
7 siblings, 0 replies; 79+ messages in thread
From: Shahaf Shuler @ 2019-03-10 8:28 UTC (permalink / raw)
To: anatoly.burakov, yskoh, thomas, ferruh.yigit, nhorman, gaetan.rivet; +Cc: dev
The implementation reuses the external memory registration work done by
commit [1].
Note about representors:
The current representor design will not work
with those map and unmap functions. The reason is that for representors
we have multiple IB devices sharing the same PCI function, so mapping will
happen only on one of the representors and not on all of them.
While it is possible to implement such support, the IB representor
design is going to be changed during DPDK 19.05. The new design will have
a single IB device for all representors, hence sharing a single
memory region between all representors will be possible.
[1]
commit 7e43a32ee060
("net/mlx5: support externally allocated static memory")
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
---
drivers/net/mlx5/mlx5.c | 2 +
drivers/net/mlx5/mlx5_mr.c | 139 ++++++++++++++++++++++++++++++++++++++
drivers/net/mlx5/mlx5_rxtx.h | 5 ++
3 files changed, 146 insertions(+)
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index ae4b71695e..8141bda3fb 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1632,6 +1632,8 @@ static struct rte_pci_driver mlx5_driver = {
.id_table = mlx5_pci_id_map,
.probe = mlx5_pci_probe,
.remove = mlx5_pci_remove,
+ .dma_map = mlx5_dma_map,
+ .dma_unmap = mlx5_dma_unmap,
.drv_flags = (RTE_PCI_DRV_INTR_LSC | RTE_PCI_DRV_INTR_RMV |
RTE_PCI_DRV_PROBE_AGAIN),
};
diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index 43ee9c961b..21f8b5e045 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -14,6 +14,7 @@
#include <rte_mempool.h>
#include <rte_malloc.h>
#include <rte_rwlock.h>
+#include <rte_bus_pci.h>
#include "mlx5.h"
#include "mlx5_mr.h"
@@ -1215,6 +1216,144 @@ mlx5_mr_update_ext_mp_cb(struct rte_mempool *mp, void *opaque,
}
/**
+ * Find the first ethdev that matches the PCI device.
+ * Multiple ethdevs per PCI device exist only with representors.
+ * In that case, it is enough to get only one of the ports, as they all
+ * share the same ibv context.
+ *
+ * @param pdev
+ * Pointer to the PCI device.
+ *
+ * @return
+ * Pointer to the ethdev if found, NULL otherwise.
+ */
+static struct rte_eth_dev *
+pci_dev_to_eth_dev(struct rte_pci_device *pdev)
+{
+ struct rte_dev_iterator it;
+ struct rte_device *dev;
+
+ /**
+ * We really need to iterate all devices regardless of
+ * their owner.
+ */
+ RTE_DEV_FOREACH(dev, "class=eth", &it)
+ if (dev == &pdev->device)
+ return it.class_device;
+ return NULL;
+}
+
+/**
+ * DPDK callback to DMA map external memory to a PCI device.
+ *
+ * @param pdev
+ * Pointer to the PCI device.
+ * @param addr
+ * Starting virtual address of memory to be mapped.
+ * @param iova
+ * Starting IOVA address of memory to be mapped.
+ * @param len
+ * Length of memory segment being mapped.
+ *
+ * @return
+ * 0 on success, negative value on error.
+ */
+int
+mlx5_dma_map(struct rte_pci_device *pdev, void *addr,
+ uint64_t iova __rte_unused, size_t len)
+{
+ struct rte_eth_dev *dev;
+ struct mlx5_mr *mr;
+ struct mlx5_priv *priv;
+
+ dev = pci_dev_to_eth_dev(pdev);
+ if (!dev) {
+ DRV_LOG(WARNING, "unable to find matching ethdev "
+ "to PCI device %p", (void *)pdev);
+ rte_errno = ENODEV;
+ return -1;
+ }
+ priv = dev->data->dev_private;
+ mr = mlx5_create_mr_ext(dev, (uintptr_t)addr, len, SOCKET_ID_ANY);
+ if (!mr) {
+ DRV_LOG(WARNING,
+ "port %u unable to dma map", dev->data->port_id);
+ rte_errno = EINVAL;
+ return -1;
+ }
+ rte_rwlock_write_lock(&priv->mr.rwlock);
+ LIST_INSERT_HEAD(&priv->mr.mr_list, mr, mr);
+ /* Insert to the global cache table. */
+ mr_insert_dev_cache(dev, mr);
+ rte_rwlock_write_unlock(&priv->mr.rwlock);
+ return 0;
+}
+
+/**
+ * DPDK callback to DMA unmap external memory from a PCI device.
+ *
+ * @param pdev
+ * Pointer to the PCI device.
+ * @param addr
+ * Starting virtual address of memory to be unmapped.
+ * @param iova
+ * Starting IOVA address of memory to be unmapped.
+ * @param len
+ * Length of memory segment being unmapped.
+ *
+ * @return
+ * 0 on success, negative value on error.
+ */
+int
+mlx5_dma_unmap(struct rte_pci_device *pdev, void *addr,
+ uint64_t iova __rte_unused, size_t len __rte_unused)
+{
+ struct rte_eth_dev *dev;
+ struct mlx5_priv *priv;
+ struct mlx5_mr *mr;
+ struct mlx5_mr_cache entry;
+
+ dev = pci_dev_to_eth_dev(pdev);
+ if (!dev) {
+ DRV_LOG(WARNING, "unable to find matching ethdev "
+ "to PCI device %p", (void *)pdev);
+ rte_errno = ENODEV;
+ return -1;
+ }
+ priv = dev->data->dev_private;
+ rte_rwlock_read_lock(&priv->mr.rwlock);
+ mr = mr_lookup_dev_list(dev, &entry, (uintptr_t)addr);
+ if (!mr) {
+ rte_rwlock_read_unlock(&priv->mr.rwlock);
+ DRV_LOG(WARNING, "address 0x%" PRIxPTR " wasn't registered "
+ "to PCI device %p", (uintptr_t)addr,
+ (void *)pdev);
+ rte_errno = EINVAL;
+ return -1;
+ }
+ LIST_REMOVE(mr, mr);
+ LIST_INSERT_HEAD(&priv->mr.mr_free_list, mr, mr);
+ DEBUG("port %u remove MR(%p) from list", dev->data->port_id,
+ (void *)mr);
+ mr_rebuild_dev_cache(dev);
+ /*
+ * Flush local caches by propagating invalidation across cores.
+ * rte_smp_wmb() is enough to synchronize this event. If one of
+ * freed memsegs is seen by other core, that means the memseg
+ * has been allocated by allocator, which will come after this
+ * free call. Therefore, this store instruction (incrementing
+ * generation below) will be guaranteed to be seen by other core
+ * before the core sees the newly allocated memory.
+ */
+ ++priv->mr.dev_gen;
+ DEBUG("broadcasting local cache flush, gen=%d",
+ priv->mr.dev_gen);
+ rte_smp_wmb();
+ rte_rwlock_read_unlock(&priv->mr.rwlock);
+ return 0;
+}
+
+/**
* Register MR for entire memory chunks in a Mempool having externally allocated
* memory and fill in local cache.
*
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 53115dde3d..ced9945888 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -28,6 +28,7 @@
#include <rte_atomic.h>
#include <rte_spinlock.h>
#include <rte_io.h>
+#include <rte_bus_pci.h>
#include "mlx5_utils.h"
#include "mlx5.h"
@@ -367,6 +368,10 @@ uint32_t mlx5_rx_addr2mr_bh(struct mlx5_rxq_data *rxq, uintptr_t addr);
uint32_t mlx5_tx_mb2mr_bh(struct mlx5_txq_data *txq, struct rte_mbuf *mb);
uint32_t mlx5_tx_update_ext_mp(struct mlx5_txq_data *txq, uintptr_t addr,
struct rte_mempool *mp);
+int mlx5_dma_map(struct rte_pci_device *pdev, void *addr, uint64_t iova,
+ size_t len);
+int mlx5_dma_unmap(struct rte_pci_device *pdev, void *addr, uint64_t iova,
+ size_t len);
/**
* Provide safe 64bit store operation to mlx5 UAR region for both 32bit and
--
2.12.0
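The unmap path above invalidates the per-core MR caches by bumping a device-wide generation counter and issuing a write barrier; data-path cores compare their cached generation against the device one and drop stale local entries on mismatch. The following is a minimal, self-contained sketch of that pattern using C11 atomics instead of rte_smp_wmb(); all names except dev_gen are illustrative, not the real mlx5 ones.

```c
#include <assert.h>
#include <stdatomic.h>

/* Device-wide state: the generation counter bumped on every global
 * cache rebuild (mirrors priv->mr.dev_gen in the patch above). */
struct dev_cache {
	_Atomic int dev_gen;
};

/* Per-core state: a local cache tagged with the generation it was
 * built against. "entries" stands in for cached translations. */
struct core_cache {
	int cur_gen;
	int entries;
};

/* Control path: broadcast a cache flush by moving the generation on.
 * The release ordering plays the role of the write barrier. */
static void cache_flush_broadcast(struct dev_cache *dc)
{
	atomic_fetch_add_explicit(&dc->dev_gen, 1, memory_order_release);
}

/* Data path: if the device generation moved on, the local cache is
 * stale; flush it and report a miss so the caller takes the slow path. */
static int core_cache_lookup(struct core_cache *cc, struct dev_cache *dc)
{
	int gen = atomic_load_explicit(&dc->dev_gen, memory_order_acquire);

	if (cc->cur_gen != gen) {
		cc->entries = 0;	/* drop stale translations */
		cc->cur_gen = gen;
		return 0;		/* miss */
	}
	return cc->entries;
}
```

This is why the unmap code can take only the read lock while modifying the list of free MRs: correctness on the data path rests on the generation check, not on the lock.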
* [dpdk-dev] [PATCH v4 6/6] doc: deprecation notice for VFIO DMA map APIs
2019-03-10 8:27 ` [dpdk-dev] [PATCH v4 " Shahaf Shuler
` (4 preceding siblings ...)
2019-03-10 8:28 ` [dpdk-dev] [PATCH v4 5/6] net/mlx5: support PCI device DMA map and unmap Shahaf Shuler
@ 2019-03-10 8:28 ` Shahaf Shuler
2019-03-11 10:20 ` Burakov, Anatoly
2019-10-01 15:20 ` David Marchand
2019-03-11 9:27 ` [dpdk-dev] [PATCH v4 0/6] introduce DMA memory mapping for external memory Gaëtan Rivet
2019-03-30 14:40 ` Thomas Monjalon
7 siblings, 2 replies; 79+ messages in thread
From: Shahaf Shuler @ 2019-03-10 8:28 UTC (permalink / raw)
To: anatoly.burakov, yskoh, thomas, ferruh.yigit, nhorman, gaetan.rivet; +Cc: dev
As those should be replaced by rte_dev_dma_map and rte_dev_dma_unmap
APIs.
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
---
doc/guides/prog_guide/env_abstraction_layer.rst | 2 +-
doc/guides/rel_notes/deprecation.rst | 4 ++++
2 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
index 929d76dba7..ec2fe65523 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -282,7 +282,7 @@ The expected workflow is as follows:
- If IOVA table is not specified, IOVA addresses will be assumed to be
unavailable
- Other processes must attach to the memory area before they can use it
-* Perform DMA mapping with ``rte_vfio_dma_map`` if needed
+* Perform DMA mapping with ``rte_dev_dma_map`` if needed
* Use the memory area in your application
* If memory area is no longer needed, it can be unregistered
- If the area was mapped for DMA, unmapping must be performed before
diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 1b4fcb7e64..48ec4fee88 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -35,6 +35,10 @@ Deprecation Notices
+ ``rte_eal_devargs_type_count``
+* vfio: removal of ``rte_vfio_dma_map`` and ``rte_vfio_dma_unmap`` APIs which
+ have been replaced with ``rte_dev_dma_map`` and ``rte_dev_dma_unmap``
+ functions. The due date for the removal targets DPDK 20.02.
+
* pci: Several exposed functions are misnamed.
The following functions are deprecated starting from v17.11 and are replaced:
--
2.12.0
* Re: [dpdk-dev] [PATCH v4 6/6] doc: deprecation notice for VFIO DMA map APIs
2019-03-10 8:28 ` [dpdk-dev] [PATCH v4 6/6] doc: deprecation notice for VFIO DMA map APIs Shahaf Shuler
@ 2019-03-11 10:20 ` Burakov, Anatoly
2019-03-11 17:35 ` Rami Rosen
2019-10-01 15:20 ` David Marchand
1 sibling, 1 reply; 79+ messages in thread
From: Burakov, Anatoly @ 2019-03-11 10:20 UTC (permalink / raw)
To: Shahaf Shuler, yskoh, thomas, ferruh.yigit, nhorman, gaetan.rivet; +Cc: dev
On 10-Mar-19 8:28 AM, Shahaf Shuler wrote:
> As those should be replaced by rte_dev_dma_map and rte_dev_dma_unmap
> APIs.
>
> Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
> ---
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
--
Thanks,
Anatoly
* Re: [dpdk-dev] [PATCH v4 6/6] doc: deprecation notice for VFIO DMA map APIs
2019-03-11 10:20 ` Burakov, Anatoly
@ 2019-03-11 17:35 ` Rami Rosen
0 siblings, 0 replies; 79+ messages in thread
From: Rami Rosen @ 2019-03-11 17:35 UTC (permalink / raw)
To: Burakov, Anatoly
Cc: Shahaf Shuler, yskoh, thomas, ferruh.yigit, nhorman, gaetan.rivet, dev
> On 10-Mar-19 8:28 AM, Shahaf Shuler wrote:
> > As those should be replaced by rte_dev_dma_map and rte_dev_dma_unmap
> > APIs.
> >
> > Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
> > --
>
Acked-by: Rami Rosen <ramirose@gmail.com>
* Re: [dpdk-dev] [PATCH v4 6/6] doc: deprecation notice for VFIO DMA map APIs
2019-03-10 8:28 ` [dpdk-dev] [PATCH v4 6/6] doc: deprecation notice for VFIO DMA map APIs Shahaf Shuler
2019-03-11 10:20 ` Burakov, Anatoly
@ 2019-10-01 15:20 ` David Marchand
2019-10-02 4:53 ` Shahaf Shuler
1 sibling, 1 reply; 79+ messages in thread
From: David Marchand @ 2019-10-01 15:20 UTC (permalink / raw)
To: Shahaf Shuler, anatoly.burakov, yskoh, thomas, ferruh.yigit,
nhorman, gaetan.rivet
Cc: dev
Hello Shahaf,
On 10/03/2019 09:28, Shahaf Shuler wrote:
> As those should be replaced by rte_dev_dma_map and rte_dev_dma_unmap
> APIs.
>
> Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
> ---
> doc/guides/prog_guide/env_abstraction_layer.rst | 2 +-
> doc/guides/rel_notes/deprecation.rst | 4 ++++
> 2 files changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
> index 929d76dba7..ec2fe65523 100644
> --- a/doc/guides/prog_guide/env_abstraction_layer.rst
> +++ b/doc/guides/prog_guide/env_abstraction_layer.rst
> @@ -282,7 +282,7 @@ The expected workflow is as follows:
> - If IOVA table is not specified, IOVA addresses will be assumed to be
> unavailable
> - Other processes must attach to the memory area before they can use it
> -* Perform DMA mapping with ``rte_vfio_dma_map`` if needed
> +* Perform DMA mapping with ``rte_dev_dma_map`` if needed
> * Use the memory area in your application
> * If memory area is no longer needed, it can be unregistered
> - If the area was mapped for DMA, unmapping must be performed before
> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> index 1b4fcb7e64..48ec4fee88 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -35,6 +35,10 @@ Deprecation Notices
>
> + ``rte_eal_devargs_type_count``
>
> +* vfio: removal of ``rte_vfio_dma_map`` and ``rte_vfio_dma_unmap`` APIs which
> + have been replaced with ``rte_dev_dma_map`` and ``rte_dev_dma_unmap``
> + functions. The due date for the removal targets DPDK 20.02.
> +
> * pci: Several exposed functions are misnamed.
> The following functions are deprecated starting from v17.11 and are replaced:
>
>
With the ABI freeze that is going to happen in 19.11, this can't happen
in 20.02.
What would work best from your pov?
I can't see any in-tree user of rte_vfio_dma_*map, do you know of users
of this api?
Thanks.
--
David Marchand
* Re: [dpdk-dev] [PATCH v4 6/6] doc: deprecation notice for VFIO DMA map APIs
2019-10-01 15:20 ` David Marchand
@ 2019-10-02 4:53 ` Shahaf Shuler
2019-10-02 7:51 ` David Marchand
0 siblings, 1 reply; 79+ messages in thread
From: Shahaf Shuler @ 2019-10-02 4:53 UTC (permalink / raw)
To: David Marchand, anatoly.burakov, Yongseok Koh, Thomas Monjalon,
ferruh.yigit, nhorman, gaetan.rivet
Cc: dev
Hi David,
Tuesday, October 1, 2019 6:20 PM, David Marchand:
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v4 6/6] doc: deprecation notice for VFIO
> DMA map APIs
>
> Hello Shahaf,
>
> On 10/03/2019 09:28, Shahaf Shuler wrote:
> > As those should be replaced by rte_dev_dma_map and
> rte_dev_dma_unmap
> > APIs.
> >
> > Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
> > ---
> > doc/guides/prog_guide/env_abstraction_layer.rst | 2 +-
> > doc/guides/rel_notes/deprecation.rst | 4 ++++
> > 2 files changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst
> > b/doc/guides/prog_guide/env_abstraction_layer.rst
> > index 929d76dba7..ec2fe65523 100644
> > --- a/doc/guides/prog_guide/env_abstraction_layer.rst
> > +++ b/doc/guides/prog_guide/env_abstraction_layer.rst
> > @@ -282,7 +282,7 @@ The expected workflow is as follows:
> > - If IOVA table is not specified, IOVA addresses will be assumed to be
> > unavailable
> > - Other processes must attach to the memory area before they can
> > use it
> > -* Perform DMA mapping with ``rte_vfio_dma_map`` if needed
> > +* Perform DMA mapping with ``rte_dev_dma_map`` if needed
> > * Use the memory area in your application
> > * If memory area is no longer needed, it can be unregistered
> > - If the area was mapped for DMA, unmapping must be performed
> > before diff --git a/doc/guides/rel_notes/deprecation.rst
> > b/doc/guides/rel_notes/deprecation.rst
> > index 1b4fcb7e64..48ec4fee88 100644
> > --- a/doc/guides/rel_notes/deprecation.rst
> > +++ b/doc/guides/rel_notes/deprecation.rst
> > @@ -35,6 +35,10 @@ Deprecation Notices
> >
> > + ``rte_eal_devargs_type_count``
> >
> > +* vfio: removal of ``rte_vfio_dma_map`` and ``rte_vfio_dma_unmap``
> > +APIs which
> > + have been replaced with ``rte_dev_dma_map`` and
> > +``rte_dev_dma_unmap``
> > + functions. The due date for the removal targets DPDK 20.02.
> > +
> > * pci: Several exposed functions are misnamed.
> > The following functions are deprecated starting from v17.11 and are
> replaced:
> >
> >
>
> With the ABI freeze that is going to happen in 19.11, this can't happen in
> 20.02.
>
> What would work best from your pov?
I have no objection (and would even prefer) to removing them in 19.11.
At the time I sent the deprecation notice I was asked to give applications more time to adapt.
>
> I can't see any in-tree user of rte_vfio_dma_*map, do you know of users of
> this api?
There is one - VPP. It doesn't use the DPDK memory subsystem at all; rather, it uses its own allocated memory and maps it all, wrongly, with the above APIs.
If all agree, we can remove those now.
>
>
> Thanks.
>
> --
> David Marchand
* Re: [dpdk-dev] [PATCH v4 6/6] doc: deprecation notice for VFIO DMA map APIs
2019-10-02 4:53 ` Shahaf Shuler
@ 2019-10-02 7:51 ` David Marchand
0 siblings, 0 replies; 79+ messages in thread
From: David Marchand @ 2019-10-02 7:51 UTC (permalink / raw)
To: Shahaf Shuler, Ray Kinsella
Cc: anatoly.burakov, Thomas Monjalon, ferruh.yigit, nhorman,
gaetan.rivet, dev
On Wed, Oct 2, 2019 at 6:53 AM Shahaf Shuler <shahafs@mellanox.com> wrote:
>
> Hi David,
>
> Tuesday, October 1, 2019 6:20 PM, David Marchand:
> > Cc: dev@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v4 6/6] doc: deprecation notice for VFIO
> > DMA map APIs
> >
> > Hello Shahaf,
> >
> > On 10/03/2019 09:28, Shahaf Shuler wrote:
> > > As those should be replaced by rte_dev_dma_map and
> > rte_dev_dma_unmap
> > > APIs.
> > >
> > > Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
> > > ---
> > > doc/guides/prog_guide/env_abstraction_layer.rst | 2 +-
> > > doc/guides/rel_notes/deprecation.rst | 4 ++++
> > > 2 files changed, 5 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst
> > > b/doc/guides/prog_guide/env_abstraction_layer.rst
> > > index 929d76dba7..ec2fe65523 100644
> > > --- a/doc/guides/prog_guide/env_abstraction_layer.rst
> > > +++ b/doc/guides/prog_guide/env_abstraction_layer.rst
> > > @@ -282,7 +282,7 @@ The expected workflow is as follows:
> > > - If IOVA table is not specified, IOVA addresses will be assumed to be
> > > unavailable
> > > - Other processes must attach to the memory area before they can
> > > use it
> > > -* Perform DMA mapping with ``rte_vfio_dma_map`` if needed
> > > +* Perform DMA mapping with ``rte_dev_dma_map`` if needed
> > > * Use the memory area in your application
> > > * If memory area is no longer needed, it can be unregistered
> > > - If the area was mapped for DMA, unmapping must be performed
> > > before diff --git a/doc/guides/rel_notes/deprecation.rst
> > > b/doc/guides/rel_notes/deprecation.rst
> > > index 1b4fcb7e64..48ec4fee88 100644
> > > --- a/doc/guides/rel_notes/deprecation.rst
> > > +++ b/doc/guides/rel_notes/deprecation.rst
> > > @@ -35,6 +35,10 @@ Deprecation Notices
> > >
> > > + ``rte_eal_devargs_type_count``
> > >
> > > +* vfio: removal of ``rte_vfio_dma_map`` and ``rte_vfio_dma_unmap``
> > > +APIs which
> > > + have been replaced with ``rte_dev_dma_map`` and
> > > +``rte_dev_dma_unmap``
> > > + functions. The due date for the removal targets DPDK 20.02.
> > > +
> > > * pci: Several exposed functions are misnamed.
> > > The following functions are deprecated starting from v17.11 and are
> > replaced:
> > >
> > >
> >
> > With the ABI freeze that is going to happen in 19.11, this can't happen in
> > 20.02.
> >
> > What would work best from your pov?
>
> I have no object (even prefer) to remove them at 19.11.
> At the time I sent the deprecation I was requested to provide more time for application to adopt.
>
> >
> > I can't see any in-tree user of rte_vfio_dma_*map, do you know of users of
> > this api?
>
> There is one - VPP. It doesn't use the DPDK memory subsystem at all; rather, it uses its own allocated memory and maps it all, wrongly, with the above APIs.
Thanks Shahaf.
I cannot see anyone involved with VPP copied on this thread.
It would have been great to involve them at the time.
Ray, can you reply on this topic (replacement of rte_vfio_dma_map) ?
Or could you serve as a gateway/copy the vpp guys?
Thanks.
--
David Marchand
* Re: [dpdk-dev] [PATCH v4 0/6] introduce DMA memory mapping for external memory
2019-03-10 8:27 ` [dpdk-dev] [PATCH v4 " Shahaf Shuler
` (5 preceding siblings ...)
2019-03-10 8:28 ` [dpdk-dev] [PATCH v4 6/6] doc: deprecation notice for VFIO DMA map APIs Shahaf Shuler
@ 2019-03-11 9:27 ` Gaëtan Rivet
2019-03-30 14:40 ` Thomas Monjalon
7 siblings, 0 replies; 79+ messages in thread
From: Gaëtan Rivet @ 2019-03-11 9:27 UTC (permalink / raw)
To: Shahaf Shuler; +Cc: anatoly.burakov, yskoh, thomas, ferruh.yigit, nhorman, dev
Hello Shahaf,
Thanks for taking my remarks into account. You can add my acked-by to
the series if you need it (not really for the mlx5 PMD, but you get the idea).
BR,
On Sun, Mar 10, 2019 at 10:27:57AM +0200, Shahaf Shuler wrote:
> The DPDK APIs expose 3 different modes to work with memory used for DMA:
>
> 1. Use the DPDK owned memory (backed by the DPDK provided hugepages).
> This memory is allocated by the DPDK libraries, included in the DPDK
> memory system (memseg lists) and automatically DMA mapped by the DPDK
> layers.
>
> 2. Use memory allocated by the user and register to the DPDK memory
> systems. Upon registration of memory, the DPDK layers will DMA map it
> to all needed devices. After registration, allocation of this memory
> will be done with rte_*malloc APIs.
>
> 3. Use memory allocated by the user and not registered to the DPDK memory
> system. This is for users who want to have tight control over this
> memory (e.g. avoid the rte_malloc header).
> The user should create a memory, register it through rte_extmem_register
> API, and call DMA map function in order to register such memory to
> the different devices.
>
> The scope of the patch focus on #3 above.
>
> Currently the only way to map external memory is through VFIO
> (rte_vfio_dma_map). While VFIO is common, there are other vendors
> which use different ways to map memory (e.g. Mellanox and NXP).
>
> The work in this patch moves the DMA mapping to vendor agnostic APIs.
> Device level DMA map and unmap APIs were added. Implementation of those
> APIs was done currently only for PCI devices.
>
> For PCI bus devices, the pci driver can expose its own map and unmap
> functions to be used for the mapping. In case the driver doesn't provide
> any, the memory will be mapped, if possible, to IOMMU through VFIO APIs.
>
> Application usage with those APIs is quite simple:
> * allocate memory
> * call rte_extmem_register on the memory chunk.
> * take a device, and query its rte_device.
> * call the device specific mapping function for this device.
>
> Future work will deprecate the rte_vfio_dma_map and rte_vfio_dma_unmap
> APIs, leaving the rte device APIs as the preferred option for the user.
>
> On v4:
> - Changed rte_dev_dma_map errno to ENOTSUP in case bus doesn't support
> DMA map API.
>
> On v3:
> - Fixed compilation issue on freebsd.
> - Fixed forgotten rte_bus_dma_map to rte_dev_dma_map.
> - Removed __rte_deprecated from vfio function till the time the rte_dev_dma_map
> will be non-experimental.
> - Changed error return value to always be -1, with proper errno.
> - Used rte_mem_virt2memseg_list instead of rte_mem_virt2memseg to verify
> memory is registered.
> - Added above check also on dma_unmap calls.
> - Added note in the API the memory must be registered in advance.
> - Added debug log to report the case memory mapping to vfio was skipped.
>
> On v2:
> - Added warn in release notes about the API change in vfio.
> - Moved function doc to prototype declaration.
> - Used dma_map and dma_unmap instead of map and unmap.
> - Used RTE_VFIO_DEFAULT_CONTAINER_FD instead of -1 fixed value.
> - Moved bus function to eal_common_dev.c and renamed them properly.
> - Changed eth device iterator to use RTE_DEV_FOREACH.
> - Enforced memory is registered with rte_extmem_* prior to mapping.
> - Used EEXIST as the only possible return value from type1 vfio IOMMU mapping.
>
> [1] https://patches.dpdk.org/patch/47796/
>
> Shahaf Shuler (6):
> vfio: allow DMA map of memory for the default vfio fd
> vfio: don't fail to DMA map if memory is already mapped
> bus: introduce device level DMA memory mapping
> net/mlx5: refactor external memory registration
> net/mlx5: support PCI device DMA map and unmap
> doc: deprecation notice for VFIO DMA map APIs
>
> doc/guides/prog_guide/env_abstraction_layer.rst | 2 +-
> doc/guides/rel_notes/deprecation.rst | 4 +
> doc/guides/rel_notes/release_19_05.rst | 3 +
> drivers/bus/pci/pci_common.c | 48 ++++
> drivers/bus/pci/rte_bus_pci.h | 40 ++++
> drivers/net/mlx5/mlx5.c | 2 +
> drivers/net/mlx5/mlx5_mr.c | 225 ++++++++++++++++---
> drivers/net/mlx5/mlx5_rxtx.h | 5 +
> lib/librte_eal/common/eal_common_dev.c | 34 +++
> lib/librte_eal/common/include/rte_bus.h | 44 ++++
> lib/librte_eal/common/include/rte_dev.h | 47 ++++
> lib/librte_eal/common/include/rte_vfio.h | 8 +-
> lib/librte_eal/linuxapp/eal/eal_vfio.c | 42 +++-
> lib/librte_eal/rte_eal_version.map | 2 +
> 14 files changed, 468 insertions(+), 38 deletions(-)
>
> --
> 2.12.0
>
--
Gaëtan Rivet
6WIND
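The four-step application flow in the cover letter above (allocate, rte_extmem_register, query the rte_device, call the device-level DMA map) can be sketched as follows. The DPDK calls are replaced with stubs with simplified signatures so the example is self-contained; in a real application they would be the actual rte_extmem_register() and rte_dev_dma_map() calls, whose full prototypes differ (rte_extmem_register also takes an IOVA table and page size).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Opaque in this sketch; a real application obtains it from the
 * probed device (e.g. via rte_eth_devices[port].device). */
struct rte_device;

/* Stubs standing in for the real EAL APIs (simplified signatures). */
static int rte_extmem_register_stub(void *va, size_t len)
{
	(void)va; (void)len;
	return 0;	/* pretend registration succeeded */
}

static int rte_dev_dma_map_stub(struct rte_device *dev, void *va,
				uint64_t iova, size_t len)
{
	(void)dev; (void)va; (void)iova; (void)len;
	return 0;	/* pretend the bus/driver mapped the region */
}

/* Expected workflow for externally allocated memory (mode #3 in the
 * cover letter): the chunk must be registered with the DPDK memory
 * subsystem before it may be DMA mapped to a device. */
static int map_external_memory(struct rte_device *dev, void *va,
			       uint64_t iova, size_t len)
{
	if (rte_extmem_register_stub(va, len) != 0)
		return -1;	/* registration is a hard prerequisite */
	if (rte_dev_dma_map_stub(dev, va, iova, len) != 0)
		return -1;
	return 0;
}
```

Note the ordering: the v2 changelog above states that mapping is refused unless the memory was first registered with rte_extmem_*, which is exactly the invariant this helper encodes.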
* Re: [dpdk-dev] [PATCH v4 0/6] introduce DMA memory mapping for external memory
2019-03-10 8:27 ` [dpdk-dev] [PATCH v4 " Shahaf Shuler
` (6 preceding siblings ...)
2019-03-11 9:27 ` [dpdk-dev] [PATCH v4 0/6] introduce DMA memory mapping for external memory Gaëtan Rivet
@ 2019-03-30 14:40 ` Thomas Monjalon
2019-03-30 14:40 ` Thomas Monjalon
7 siblings, 1 reply; 79+ messages in thread
From: Thomas Monjalon @ 2019-03-30 14:40 UTC (permalink / raw)
To: Shahaf Shuler
Cc: dev, anatoly.burakov, yskoh, ferruh.yigit, nhorman, gaetan.rivet
10/03/2019 09:27, Shahaf Shuler:
> The DPDK APIs expose 3 different modes to work with memory used for DMA:
>
> 1. Use the DPDK owned memory (backed by the DPDK provided hugepages).
> This memory is allocated by the DPDK libraries, included in the DPDK
> memory system (memseg lists) and automatically DMA mapped by the DPDK
> layers.
>
> 2. Use memory allocated by the user and register to the DPDK memory
> systems. Upon registration of memory, the DPDK layers will DMA map it
> to all needed devices. After registration, allocation of this memory
> will be done with rte_*malloc APIs.
>
> 3. Use memory allocated by the user and not registered to the DPDK memory
> system. This is for users who want to have tight control over this
> memory (e.g. avoid the rte_malloc header).
> The user should create a memory, register it through rte_extmem_register
> API, and call DMA map function in order to register such memory to
> the different devices.
>
> The scope of the patch focus on #3 above.
>
> Currently the only way to map external memory is through VFIO
> (rte_vfio_dma_map). While VFIO is common, there are other vendors
> which use different ways to map memory (e.g. Mellanox and NXP).
>
> The work in this patch moves the DMA mapping to vendor agnostic APIs.
> Device level DMA map and unmap APIs were added. Implementation of those
> APIs was done currently only for PCI devices.
>
> For PCI bus devices, the pci driver can expose its own map and unmap
> functions to be used for the mapping. In case the driver doesn't provide
> any, the memory will be mapped, if possible, to IOMMU through VFIO APIs.
>
> Application usage with those APIs is quite simple:
> * allocate memory
> * call rte_extmem_register on the memory chunk.
> * take a device, and query its rte_device.
> * call the device specific mapping function for this device.
>
> Future work will deprecate the rte_vfio_dma_map and rte_vfio_dma_unmap
> APIs, leaving the rte device APIs as the preferred option for the user.
>
> Shahaf Shuler (6):
> vfio: allow DMA map of memory for the default vfio fd
> vfio: don't fail to DMA map if memory is already mapped
> bus: introduce device level DMA memory mapping
> net/mlx5: refactor external memory registration
> net/mlx5: support PCI device DMA map and unmap
> doc: deprecation notice for VFIO DMA map APIs
Applied, thanks
* [dpdk-dev] [PATCH v3 1/6] vfio: allow DMA map of memory for the default vfio fd
2019-02-21 14:50 ` [dpdk-dev] [PATCH v2 " Shahaf Shuler
2019-03-05 13:59 ` [dpdk-dev] [PATCH v3 " Shahaf Shuler
@ 2019-03-05 13:59 ` Shahaf Shuler
2019-03-05 13:59 ` [dpdk-dev] [PATCH v3 2/6] vfio: don't fail to DMA map if memory is already mapped Shahaf Shuler
` (4 subsequent siblings)
6 siblings, 0 replies; 79+ messages in thread
From: Shahaf Shuler @ 2019-03-05 13:59 UTC (permalink / raw)
To: anatoly.burakov, yskoh, thomas, ferruh.yigit, nhorman, gaetan.rivet; +Cc: dev
Enable users the option to call rte_vfio_dma_map with request to map
to the default vfio fd.
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
doc/guides/rel_notes/release_19_05.rst | 3 +++
lib/librte_eal/common/include/rte_vfio.h | 8 ++++++--
lib/librte_eal/linuxapp/eal/eal_vfio.c | 10 ++++++++--
3 files changed, 17 insertions(+), 4 deletions(-)
diff --git a/doc/guides/rel_notes/release_19_05.rst b/doc/guides/rel_notes/release_19_05.rst
index 4a3e2a7f31..b02753bbc4 100644
--- a/doc/guides/rel_notes/release_19_05.rst
+++ b/doc/guides/rel_notes/release_19_05.rst
@@ -122,6 +122,9 @@ ABI Changes
Also, make sure to start the actual text at the margin.
=========================================================
+* vfio: Functions ``rte_vfio_container_dma_map`` and
+ ``rte_vfio_container_dma_unmap`` have been extended with an option to
+ request mapping or un-mapping to the default vfio container fd.
Shared Library Versions
-----------------------
diff --git a/lib/librte_eal/common/include/rte_vfio.h b/lib/librte_eal/common/include/rte_vfio.h
index cae96fab90..cdfbedc1f9 100644
--- a/lib/librte_eal/common/include/rte_vfio.h
+++ b/lib/librte_eal/common/include/rte_vfio.h
@@ -80,6 +80,8 @@ struct vfio_device_info;
#endif /* VFIO_PRESENT */
+#define RTE_VFIO_DEFAULT_CONTAINER_FD (-1)
+
/**
* Setup vfio_cfg for the device identified by its address.
* It discovers the configured I/O MMU groups or sets a new one for the device.
@@ -347,7 +349,8 @@ rte_vfio_container_group_unbind(int container_fd, int iommu_group_num);
* Perform DMA mapping for devices in a container.
*
* @param container_fd
- * the specified container fd
+ * the specified container fd. Use RTE_VFIO_DEFAULT_CONTAINER_FD to
+ * use the default container.
*
* @param vaddr
* Starting virtual address of memory to be mapped.
@@ -370,7 +373,8 @@ rte_vfio_container_dma_map(int container_fd, uint64_t vaddr,
* Perform DMA unmapping for devices in a container.
*
* @param container_fd
- * the specified container fd
+ * the specified container fd. Use RTE_VFIO_DEFAULT_CONTAINER_FD to
+ * use the default container.
*
* @param vaddr
* Starting virtual address of memory to be unmapped.
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index c821e83826..9adbda8bb7 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -1897,7 +1897,10 @@ rte_vfio_container_dma_map(int container_fd, uint64_t vaddr, uint64_t iova,
return -1;
}
- vfio_cfg = get_vfio_cfg_by_container_fd(container_fd);
+ if (container_fd == RTE_VFIO_DEFAULT_CONTAINER_FD)
+ vfio_cfg = default_vfio_cfg;
+ else
+ vfio_cfg = get_vfio_cfg_by_container_fd(container_fd);
if (vfio_cfg == NULL) {
RTE_LOG(ERR, EAL, "Invalid container fd\n");
return -1;
@@ -1917,7 +1920,10 @@ rte_vfio_container_dma_unmap(int container_fd, uint64_t vaddr, uint64_t iova,
return -1;
}
- vfio_cfg = get_vfio_cfg_by_container_fd(container_fd);
+ if (container_fd == RTE_VFIO_DEFAULT_CONTAINER_FD)
+ vfio_cfg = default_vfio_cfg;
+ else
+ vfio_cfg = get_vfio_cfg_by_container_fd(container_fd);
if (vfio_cfg == NULL) {
RTE_LOG(ERR, EAL, "Invalid container fd\n");
return -1;
--
2.12.0
^ permalink raw reply [flat|nested] 79+ messages in thread
* [dpdk-dev] [PATCH v3 2/6] vfio: don't fail to DMA map if memory is already mapped
2019-02-21 14:50 ` [dpdk-dev] [PATCH v2 " Shahaf Shuler
2019-03-05 13:59 ` [dpdk-dev] [PATCH v3 " Shahaf Shuler
2019-03-05 13:59 ` [dpdk-dev] [PATCH v3 1/6] vfio: allow DMA map of memory for the default vfio fd Shahaf Shuler
@ 2019-03-05 13:59 ` Shahaf Shuler
2019-03-05 13:59 ` [dpdk-dev] [PATCH v3 3/6] bus: introduce device level DMA memory mapping Shahaf Shuler
` (3 subsequent siblings)
6 siblings, 0 replies; 79+ messages in thread
From: Shahaf Shuler @ 2019-03-05 13:59 UTC (permalink / raw)
To: anatoly.burakov, yskoh, thomas, ferruh.yigit, nhorman, gaetan.rivet; +Cc: dev
Currently the vfio DMA map function will fail if the same memory
segment is mapped twice.
This is too strict, as mapping the same memory twice is not an error.
Instead, use the kernel return value to detect such a state and have the
DMA function return success.
For type1 mappings the kernel driver returns EEXIST.
For spapr mappings EBUSY is returned since kernel 4.10.
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
lib/librte_eal/linuxapp/eal/eal_vfio.c | 32 +++++++++++++++++++++++++----
1 file changed, 28 insertions(+), 4 deletions(-)
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index 9adbda8bb7..d0a0f9c16f 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -1264,9 +1264,21 @@ vfio_type1_dma_mem_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova,
ret = ioctl(vfio_container_fd, VFIO_IOMMU_MAP_DMA, &dma_map);
if (ret) {
- RTE_LOG(ERR, EAL, " cannot set up DMA remapping, error %i (%s)\n",
- errno, strerror(errno));
+ /**
+ * In case the mapping was already done EEXIST will be
+ * returned from kernel.
+ */
+ if (errno == EEXIST) {
+ RTE_LOG(DEBUG, EAL,
+ " Memory segment is already mapped,"
+ " skipping\n");
+ } else {
+ RTE_LOG(ERR, EAL,
+ " cannot set up DMA remapping,"
+ " error %i (%s)\n",
+ errno, strerror(errno));
return -1;
+ }
}
} else {
memset(&dma_unmap, 0, sizeof(dma_unmap));
@@ -1325,9 +1337,21 @@ vfio_spapr_dma_do_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova,
ret = ioctl(vfio_container_fd, VFIO_IOMMU_MAP_DMA, &dma_map);
if (ret) {
- RTE_LOG(ERR, EAL, " cannot set up DMA remapping, error %i (%s)\n",
- errno, strerror(errno));
+ /**
+ * In case the mapping was already done EBUSY will be
+ * returned from kernel.
+ */
+ if (errno == EBUSY) {
+ RTE_LOG(DEBUG, EAL,
+ " Memory segment is already mapped,"
+ " skipping\n");
+ } else {
+ RTE_LOG(ERR, EAL,
+ " cannot set up DMA remapping,"
+ " error %i (%s)\n", errno,
+ strerror(errno));
return -1;
+ }
}
} else {
--
2.12.0
* [dpdk-dev] [PATCH v3 3/6] bus: introduce device level DMA memory mapping
2019-02-21 14:50 ` [dpdk-dev] [PATCH v2 " Shahaf Shuler
` (2 preceding siblings ...)
2019-03-05 13:59 ` [dpdk-dev] [PATCH v3 2/6] vfio: don't fail to DMA map if memory is already mapped Shahaf Shuler
@ 2019-03-05 13:59 ` Shahaf Shuler
2019-03-05 16:35 ` Burakov, Anatoly
2019-03-05 13:59 ` [dpdk-dev] [PATCH v3 4/6] net/mlx5: refactor external memory registration Shahaf Shuler
` (2 subsequent siblings)
6 siblings, 1 reply; 79+ messages in thread
From: Shahaf Shuler @ 2019-03-05 13:59 UTC (permalink / raw)
To: anatoly.burakov, yskoh, thomas, ferruh.yigit, nhorman, gaetan.rivet; +Cc: dev
The DPDK APIs expose 3 different modes to work with memory used for DMA:
1. Use the DPDK owned memory (backed by the DPDK provided hugepages).
This memory is allocated by the DPDK libraries, included in the DPDK
memory system (memseg lists) and automatically DMA mapped by the DPDK
layers.
2. Use memory allocated by the user and registered to the DPDK memory
system. Upon registration of the memory, the DPDK layers will DMA map it
to all needed devices. After registration, allocation of this memory
will be done with rte_*malloc APIs.
3. Use memory allocated by the user and not registered to the DPDK memory
system. This is for users who want tight control over this
memory (e.g. avoid the rte_malloc header).
The user should allocate the memory, register it through the
rte_extmem_register API, and call the DMA map function in order to register
such memory to the different devices.
The scope of this patch focuses on #3 above.
Currently the only way to map external memory is through VFIO
(rte_vfio_dma_map). While VFIO is common, there are other vendors
which use different ways to map memory (e.g. Mellanox and NXP).
The work in this patch moves the DMA mapping to vendor agnostic APIs.
Device level DMA map and unmap APIs were added. Implementation of those
APIs was done currently only for PCI devices.
For PCI bus devices, the pci driver can expose its own map and unmap
functions to be used for the mapping. In case the driver doesn't provide
any, the memory will be mapped, if possible, to IOMMU through VFIO APIs.
Application usage with those APIs is quite simple:
* allocate memory
* call rte_extmem_register on the memory chunk.
* take a device, and query its rte_device.
* call the device specific mapping function for this device.
Future work will deprecate the rte_vfio_dma_map and rte_vfio_dma_unmap
APIs, leaving the rte device APIs as the preferred option for the user.
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
---
drivers/bus/pci/pci_common.c | 48 ++++++++++++++++++++++++++++
drivers/bus/pci/rte_bus_pci.h | 40 +++++++++++++++++++++++
lib/librte_eal/common/eal_common_dev.c | 34 ++++++++++++++++++++
lib/librte_eal/common/include/rte_bus.h | 44 +++++++++++++++++++++++++
lib/librte_eal/common/include/rte_dev.h | 47 +++++++++++++++++++++++++++
lib/librte_eal/rte_eal_version.map | 2 ++
6 files changed, 215 insertions(+)
diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index 6276e5d695..704b9d71af 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -528,6 +528,52 @@ pci_unplug(struct rte_device *dev)
return ret;
}
+static int
+pci_dma_map(struct rte_device *dev, void *addr, uint64_t iova, size_t len)
+{
+ struct rte_pci_device *pdev = RTE_DEV_TO_PCI(dev);
+
+ if (!pdev || !pdev->driver) {
+ rte_errno = EINVAL;
+ return -1;
+ }
+ if (pdev->driver->dma_map)
+ return pdev->driver->dma_map(pdev, addr, iova, len);
+ /**
+ * In case the driver doesn't provide any specific mapping,
+ * try to fall back to VFIO.
+ */
+ if (pdev->kdrv == RTE_KDRV_VFIO)
+ return rte_vfio_container_dma_map
+ (RTE_VFIO_DEFAULT_CONTAINER_FD, (uintptr_t)addr,
+ iova, len);
+ rte_errno = ENOTSUP;
+ return -1;
+}
+
+static int
+pci_dma_unmap(struct rte_device *dev, void *addr, uint64_t iova, size_t len)
+{
+ struct rte_pci_device *pdev = RTE_DEV_TO_PCI(dev);
+
+ if (!pdev || !pdev->driver) {
+ rte_errno = EINVAL;
+ return -1;
+ }
+ if (pdev->driver->dma_unmap)
+ return pdev->driver->dma_unmap(pdev, addr, iova, len);
+ /**
+ * In case the driver doesn't provide any specific mapping,
+ * try to fall back to VFIO.
+ */
+ if (pdev->kdrv == RTE_KDRV_VFIO)
+ return rte_vfio_container_dma_unmap
+ (RTE_VFIO_DEFAULT_CONTAINER_FD, (uintptr_t)addr,
+ iova, len);
+ rte_errno = ENOTSUP;
+ return -1;
+}
+
struct rte_pci_bus rte_pci_bus = {
.bus = {
.scan = rte_pci_scan,
@@ -536,6 +582,8 @@ struct rte_pci_bus rte_pci_bus = {
.plug = pci_plug,
.unplug = pci_unplug,
.parse = pci_parse,
+ .dma_map = pci_dma_map,
+ .dma_unmap = pci_dma_unmap,
.get_iommu_class = rte_pci_get_iommu_class,
.dev_iterate = rte_pci_dev_iterate,
.hot_unplug_handler = pci_hot_unplug_handler,
diff --git a/drivers/bus/pci/rte_bus_pci.h b/drivers/bus/pci/rte_bus_pci.h
index f0d6d81c00..06e004cd3f 100644
--- a/drivers/bus/pci/rte_bus_pci.h
+++ b/drivers/bus/pci/rte_bus_pci.h
@@ -114,6 +114,44 @@ typedef int (pci_probe_t)(struct rte_pci_driver *, struct rte_pci_device *);
typedef int (pci_remove_t)(struct rte_pci_device *);
/**
+ * Driver-specific DMA mapping. After a successful call the device
+ * will be able to read/write from/to this segment.
+ *
+ * @param dev
+ * Pointer to the PCI device.
+ * @param addr
+ * Starting virtual address of memory to be mapped.
+ * @param iova
+ * Starting IOVA address of memory to be mapped.
+ * @param len
+ * Length of memory segment being mapped.
+ * @return
+ * - 0 On success.
+ * - Negative value and rte_errno is set otherwise.
+ */
+typedef int (pci_dma_map_t)(struct rte_pci_device *dev, void *addr,
+ uint64_t iova, size_t len);
+
+/**
+ * Driver-specific DMA un-mapping. After a successful call the device
+ * will not be able to read/write from/to this segment.
+ *
+ * @param dev
+ * Pointer to the PCI device.
+ * @param addr
+ * Starting virtual address of memory to be unmapped.
+ * @param iova
+ * Starting IOVA address of memory to be unmapped.
+ * @param len
+ * Length of memory segment being unmapped.
+ * @return
+ * - 0 On success.
+ * - Negative value and rte_errno is set otherwise.
+ */
+typedef int (pci_dma_unmap_t)(struct rte_pci_device *dev, void *addr,
+ uint64_t iova, size_t len);
+
+/**
* A structure describing a PCI driver.
*/
struct rte_pci_driver {
@@ -122,6 +160,8 @@ struct rte_pci_driver {
struct rte_pci_bus *bus; /**< PCI bus reference. */
pci_probe_t *probe; /**< Device Probe function. */
pci_remove_t *remove; /**< Device Remove function. */
+ pci_dma_map_t *dma_map; /**< device dma map function. */
+ pci_dma_unmap_t *dma_unmap; /**< device dma unmap function. */
const struct rte_pci_id *id_table; /**< ID table, NULL terminated. */
uint32_t drv_flags; /**< Flags RTE_PCI_DRV_*. */
};
diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c
index fd7f5ca7d5..08303b2f53 100644
--- a/lib/librte_eal/common/eal_common_dev.c
+++ b/lib/librte_eal/common/eal_common_dev.c
@@ -756,3 +756,37 @@ rte_dev_iterator_next(struct rte_dev_iterator *it)
free(cls_str);
return it->device;
}
+
+int
+rte_dev_dma_map(struct rte_device *dev, void *addr, uint64_t iova,
+ size_t len)
+{
+ if (dev->bus->dma_map == NULL || len == 0) {
+ rte_errno = EINVAL;
+ return -1;
+ }
+ /* Memory must be registered through rte_extmem_* APIs */
+ if (rte_mem_virt2memseg_list(addr) == NULL) {
+ rte_errno = EINVAL;
+ return -1;
+ }
+
+ return dev->bus->dma_map(dev, addr, iova, len);
+}
+
+int
+rte_dev_dma_unmap(struct rte_device *dev, void *addr, uint64_t iova,
+ size_t len)
+{
+ if (dev->bus->dma_unmap == NULL || len == 0) {
+ rte_errno = EINVAL;
+ return -1;
+ }
+ /* Memory must be registered through rte_extmem_* APIs */
+ if (rte_mem_virt2memseg_list(addr) == NULL) {
+ rte_errno = EINVAL;
+ return -1;
+ }
+
+ return dev->bus->dma_unmap(dev, addr, iova, len);
+}
diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index 6be4b5cabe..4faf2d20a0 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -168,6 +168,48 @@ typedef int (*rte_bus_unplug_t)(struct rte_device *dev);
typedef int (*rte_bus_parse_t)(const char *name, void *addr);
/**
+ * Device level DMA map function.
+ * After a successful call, the memory segment will be mapped to the
+ * given device.
+ *
+ * @param dev
+ * Device pointer.
+ * @param addr
+ * Virtual address to map.
+ * @param iova
+ * IOVA address to map.
+ * @param len
+ * Length of the memory segment being mapped.
+ *
+ * @return
+ * 0 if mapping was successful.
+ * Negative value and rte_errno is set otherwise.
+ */
+typedef int (*rte_dev_dma_map_t)(struct rte_device *dev, void *addr,
+ uint64_t iova, size_t len);
+
+/**
+ * Device level DMA unmap function.
+ * After a successful call, the memory segment will no longer be
+ * accessible by the given device.
+ *
+ * @param dev
+ * Device pointer.
+ * @param addr
+ * Virtual address to unmap.
+ * @param iova
+ * IOVA address to unmap.
+ * @param len
+ * Length of the memory segment being mapped.
+ *
+ * @return
+ * 0 if un-mapping was successful.
+ * Negative value and rte_errno is set otherwise.
+ */
+typedef int (*rte_dev_dma_unmap_t)(struct rte_device *dev, void *addr,
+ uint64_t iova, size_t len);
+
+/**
* Implement a specific hot-unplug handler, which is responsible for
* handle the failure when device be hot-unplugged. When the event of
* hot-unplug be detected, it could call this function to handle
@@ -238,6 +280,8 @@ struct rte_bus {
rte_bus_plug_t plug; /**< Probe single device for drivers */
rte_bus_unplug_t unplug; /**< Remove single device from driver */
rte_bus_parse_t parse; /**< Parse a device name */
+ rte_dev_dma_map_t dma_map; /**< DMA map for device in the bus */
+ rte_dev_dma_unmap_t dma_unmap; /**< DMA unmap for device in the bus */
struct rte_bus_conf conf; /**< Bus configuration */
rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
rte_dev_iterate_t dev_iterate; /**< Device iterator. */
diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
index 3cad4bce57..0d5e25b500 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -463,4 +463,51 @@ rte_dev_hotplug_handle_enable(void);
int __rte_experimental
rte_dev_hotplug_handle_disable(void);
+/**
+ * Device level DMA map function.
+ * After a successful call, the memory segment will be mapped to the
+ * given device.
+ *
+ * @note: Memory must be registered in advance using rte_extmem_* APIs.
+ *
+ * @param dev
+ * Device pointer.
+ * @param addr
+ * Virtual address to map.
+ * @param iova
+ * IOVA address to map.
+ * @param len
+ * Length of the memory segment being mapped.
+ *
+ * @return
+ * 0 if mapping was successful.
+ * Negative value and rte_errno is set otherwise.
+ */
+int __rte_experimental
+rte_dev_dma_map(struct rte_device *dev, void *addr, uint64_t iova, size_t len);
+
+/**
+ * Device level DMA unmap function.
+ * After a successful call, the memory segment will no longer be
+ * accessible by the given device.
+ *
+ * @note: Memory must be registered in advance using rte_extmem_* APIs.
+ *
+ * @param dev
+ * Device pointer.
+ * @param addr
+ * Virtual address to unmap.
+ * @param iova
+ * IOVA address to unmap.
+ * @param len
+ * Length of the memory segment being mapped.
+ *
+ * @return
+ * 0 if un-mapping was successful.
+ * Negative value and rte_errno is set otherwise.
+ */
+int __rte_experimental
+rte_dev_dma_unmap(struct rte_device *dev, void *addr, uint64_t iova,
+ size_t len);
+
#endif /* _RTE_DEV_H_ */
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index eb5f7b9cbd..264aa050fa 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -277,6 +277,8 @@ EXPERIMENTAL {
rte_class_unregister;
rte_ctrl_thread_create;
rte_delay_us_sleep;
+ rte_dev_dma_map;
+ rte_dev_dma_unmap;
rte_dev_event_callback_process;
rte_dev_event_callback_register;
rte_dev_event_callback_unregister;
--
2.12.0
* Re: [dpdk-dev] [PATCH v3 3/6] bus: introduce device level DMA memory mapping
2019-03-05 13:59 ` [dpdk-dev] [PATCH v3 3/6] bus: introduce device level DMA memory mapping Shahaf Shuler
@ 2019-03-05 16:35 ` Burakov, Anatoly
0 siblings, 0 replies; 79+ messages in thread
From: Burakov, Anatoly @ 2019-03-05 16:35 UTC (permalink / raw)
To: Shahaf Shuler, yskoh, thomas, ferruh.yigit, nhorman, gaetan.rivet; +Cc: dev
On 05-Mar-19 1:59 PM, Shahaf Shuler wrote:
> The DPDK APIs expose 3 different modes to work with memory used for DMA:
>
> 1. Use the DPDK owned memory (backed by the DPDK provided hugepages).
> This memory is allocated by the DPDK libraries, included in the DPDK
> memory system (memseg lists) and automatically DMA mapped by the DPDK
> layers.
>
> 2. Use memory allocated by the user and registered to the DPDK memory
> system. Upon registration of the memory, the DPDK layers will DMA map it
> to all needed devices. After registration, allocation of this memory
> will be done with rte_*malloc APIs.
>
> 3. Use memory allocated by the user and not registered to the DPDK memory
> system. This is for users who want tight control over this
> memory (e.g. avoid the rte_malloc header).
> The user should allocate the memory, register it through the
> rte_extmem_register API, and call the DMA map function in order to register
> such memory to the different devices.
>
> The scope of this patch focuses on #3 above.
>
> Currently the only way to map external memory is through VFIO
> (rte_vfio_dma_map). While VFIO is common, there are other vendors
> which use different ways to map memory (e.g. Mellanox and NXP).
>
> The work in this patch moves the DMA mapping to vendor agnostic APIs.
> Device level DMA map and unmap APIs were added. Implementation of those
> APIs was done currently only for PCI devices.
>
> For PCI bus devices, the pci driver can expose its own map and unmap
> functions to be used for the mapping. In case the driver doesn't provide
> any, the memory will be mapped, if possible, to IOMMU through VFIO APIs.
>
> Application usage with those APIs is quite simple:
> * allocate memory
> * call rte_extmem_register on the memory chunk.
> * take a device, and query its rte_device.
> * call the device specific mapping function for this device.
>
> Future work will deprecate the rte_vfio_dma_map and rte_vfio_dma_unmap
> APIs, leaving the rte device APIs as the preferred option for the user.
>
> Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
> ---
<snip>
> diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c
> index fd7f5ca7d5..08303b2f53 100644
> --- a/lib/librte_eal/common/eal_common_dev.c
> +++ b/lib/librte_eal/common/eal_common_dev.c
> @@ -756,3 +756,37 @@ rte_dev_iterator_next(struct rte_dev_iterator *it)
> free(cls_str);
> return it->device;
> }
> +
> +int
> +rte_dev_dma_map(struct rte_device *dev, void *addr, uint64_t iova,
> + size_t len)
> +{
> + if (dev->bus->dma_map == NULL || len == 0) {
> + rte_errno = EINVAL;
> + return -1;
Here and below as well - please correct me if I'm wrong, but should
not having dma_map defined for a bus really be an EINVAL error? EINVAL is
reserved for invalid arguments, so it seems to me that ENOTSUP would be
a more appropriate error code. The user should be able to differentiate
between a mapping that failed due to invalid data, and one that failed
because the device's bus does not support the DMA remapping API at all.
--
Thanks,
Anatoly
* [dpdk-dev] [PATCH v3 4/6] net/mlx5: refactor external memory registration
2019-02-21 14:50 ` [dpdk-dev] [PATCH v2 " Shahaf Shuler
` (3 preceding siblings ...)
2019-03-05 13:59 ` [dpdk-dev] [PATCH v3 3/6] bus: introduce device level DMA memory mapping Shahaf Shuler
@ 2019-03-05 13:59 ` Shahaf Shuler
2019-03-05 13:59 ` [dpdk-dev] [PATCH v3 5/6] net/mlx5: support PCI device DMA map and unmap Shahaf Shuler
2019-03-05 13:59 ` [dpdk-dev] [PATCH v3 6/6] doc: deprecation notice for VFIO DMA map APIs Shahaf Shuler
6 siblings, 0 replies; 79+ messages in thread
From: Shahaf Shuler @ 2019-03-05 13:59 UTC (permalink / raw)
To: anatoly.burakov, yskoh, thomas, ferruh.yigit, nhorman, gaetan.rivet; +Cc: dev
Move the memory region creation to a separate function to
prepare the ground for the reuse of it on the PCI driver map and unmap
functions.
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
---
drivers/net/mlx5/mlx5_mr.c | 86 +++++++++++++++++++++++++++--------------
1 file changed, 57 insertions(+), 29 deletions(-)
diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index 700d83d1bc..43ee9c961b 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -1109,6 +1109,58 @@ mlx5_mr_flush_local_cache(struct mlx5_mr_ctrl *mr_ctrl)
}
/**
+ * Creates a memory region for external memory, that is memory which is not
+ * part of the DPDK memory segments.
+ *
+ * @param dev
+ * Pointer to the ethernet device.
+ * @param addr
+ * Starting virtual address of memory.
+ * @param len
+ * Length of memory segment being mapped.
+ * @param socket_id
+ * Socket to allocate heap memory for the control structures.
+ *
+ * @return
+ * Pointer to MR structure on success, NULL otherwise.
+ */
+static struct mlx5_mr *
+mlx5_create_mr_ext(struct rte_eth_dev *dev, uintptr_t addr, size_t len,
+ int socket_id)
+{
+ struct mlx5_priv *priv = dev->data->dev_private;
+ struct mlx5_mr *mr = NULL;
+
+ mr = rte_zmalloc_socket(NULL,
+ RTE_ALIGN_CEIL(sizeof(*mr),
+ RTE_CACHE_LINE_SIZE),
+ RTE_CACHE_LINE_SIZE, socket_id);
+ if (mr == NULL)
+ return NULL;
+ mr->ibv_mr = mlx5_glue->reg_mr(priv->pd, (void *)addr, len,
+ IBV_ACCESS_LOCAL_WRITE);
+ if (mr->ibv_mr == NULL) {
+ DRV_LOG(WARNING,
+ "port %u fail to create a verbs MR for address (%p)",
+ dev->data->port_id, (void *)addr);
+ rte_free(mr);
+ return NULL;
+ }
+ mr->msl = NULL; /* Mark it is external memory. */
+ mr->ms_bmp = NULL;
+ mr->ms_n = 1;
+ mr->ms_bmp_n = 1;
+ DRV_LOG(DEBUG,
+ "port %u MR CREATED (%p) for external memory %p:\n"
+ " [0x%" PRIxPTR ", 0x%" PRIxPTR "),"
+ " lkey=0x%x base_idx=%u ms_n=%u, ms_bmp_n=%u",
+ dev->data->port_id, (void *)mr, (void *)addr,
+ addr, addr + len, rte_cpu_to_be_32(mr->ibv_mr->lkey),
+ mr->ms_base_idx, mr->ms_n, mr->ms_bmp_n);
+ return mr;
+}
+
+/**
* Called during rte_mempool_mem_iter() by mlx5_mr_update_ext_mp().
*
* Externally allocated chunk is registered and a MR is created for the chunk.
@@ -1142,43 +1194,19 @@ mlx5_mr_update_ext_mp_cb(struct rte_mempool *mp, void *opaque,
rte_rwlock_read_unlock(&priv->mr.rwlock);
if (lkey != UINT32_MAX)
return;
- mr = rte_zmalloc_socket(NULL,
- RTE_ALIGN_CEIL(sizeof(*mr),
- RTE_CACHE_LINE_SIZE),
- RTE_CACHE_LINE_SIZE, mp->socket_id);
- if (mr == NULL) {
- DRV_LOG(WARNING,
- "port %u unable to allocate memory for a new MR of"
- " mempool (%s).",
- dev->data->port_id, mp->name);
- data->ret = -1;
- return;
- }
DRV_LOG(DEBUG, "port %u register MR for chunk #%d of mempool (%s)",
dev->data->port_id, mem_idx, mp->name);
- mr->ibv_mr = mlx5_glue->reg_mr(priv->pd, (void *)addr, len,
- IBV_ACCESS_LOCAL_WRITE);
- if (mr->ibv_mr == NULL) {
+ mr = mlx5_create_mr_ext(dev, addr, len, mp->socket_id);
+ if (!mr) {
DRV_LOG(WARNING,
- "port %u fail to create a verbs MR for address (%p)",
- dev->data->port_id, (void *)addr);
- rte_free(mr);
+ "port %u unable to allocate a new MR of"
+ " mempool (%s).",
+ dev->data->port_id, mp->name);
data->ret = -1;
return;
}
- mr->msl = NULL; /* Mark it is external memory. */
- mr->ms_bmp = NULL;
- mr->ms_n = 1;
- mr->ms_bmp_n = 1;
rte_rwlock_write_lock(&priv->mr.rwlock);
LIST_INSERT_HEAD(&priv->mr.mr_list, mr, mr);
- DRV_LOG(DEBUG,
- "port %u MR CREATED (%p) for external memory %p:\n"
- " [0x%" PRIxPTR ", 0x%" PRIxPTR "),"
- " lkey=0x%x base_idx=%u ms_n=%u, ms_bmp_n=%u",
- dev->data->port_id, (void *)mr, (void *)addr,
- addr, addr + len, rte_cpu_to_be_32(mr->ibv_mr->lkey),
- mr->ms_base_idx, mr->ms_n, mr->ms_bmp_n);
/* Insert to the global cache table. */
mr_insert_dev_cache(dev, mr);
rte_rwlock_write_unlock(&priv->mr.rwlock);
--
2.12.0
* [dpdk-dev] [PATCH v3 5/6] net/mlx5: support PCI device DMA map and unmap
2019-02-21 14:50 ` [dpdk-dev] [PATCH v2 " Shahaf Shuler
` (4 preceding siblings ...)
2019-03-05 13:59 ` [dpdk-dev] [PATCH v3 4/6] net/mlx5: refactor external memory registration Shahaf Shuler
@ 2019-03-05 13:59 ` Shahaf Shuler
2019-03-05 13:59 ` [dpdk-dev] [PATCH v3 6/6] doc: deprecation notice for VFIO DMA map APIs Shahaf Shuler
6 siblings, 0 replies; 79+ messages in thread
From: Shahaf Shuler @ 2019-03-05 13:59 UTC (permalink / raw)
To: anatoly.burakov, yskoh, thomas, ferruh.yigit, nhorman, gaetan.rivet; +Cc: dev
The implementation reuses the external memory registration work done by
commit[1].
Note about representors:
The current representor design will not work
with those map and unmap functions. The reason is that for representors
we have multiple IB devices sharing the same PCI function, so mapping will
happen only on one of the representors and not on all of them.
While it is possible to implement such support, the IB representor
design is going to be changed during DPDK19.05. The new design will have
a single IB device for all representors, hence sharing of a single
memory region between all representors will be possible.
[1]
commit 7e43a32ee060
("net/mlx5: support externally allocated static memory")
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
---
drivers/net/mlx5/mlx5.c | 2 +
drivers/net/mlx5/mlx5_mr.c | 139 ++++++++++++++++++++++++++++++++++++++
drivers/net/mlx5/mlx5_rxtx.h | 5 ++
3 files changed, 146 insertions(+)
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 9706e351aa..ab98aec8a2 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1630,6 +1630,8 @@ static struct rte_pci_driver mlx5_driver = {
.id_table = mlx5_pci_id_map,
.probe = mlx5_pci_probe,
.remove = mlx5_pci_remove,
+ .dma_map = mlx5_dma_map,
+ .dma_unmap = mlx5_dma_unmap,
.drv_flags = (RTE_PCI_DRV_INTR_LSC | RTE_PCI_DRV_INTR_RMV |
RTE_PCI_DRV_PROBE_AGAIN),
};
diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index 43ee9c961b..21f8b5e045 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -14,6 +14,7 @@
#include <rte_mempool.h>
#include <rte_malloc.h>
#include <rte_rwlock.h>
+#include <rte_bus_pci.h>
#include "mlx5.h"
#include "mlx5_mr.h"
@@ -1215,6 +1216,144 @@ mlx5_mr_update_ext_mp_cb(struct rte_mempool *mp, void *opaque,
}
/**
+ * Finds the first ethdev that matches the pci device.
+ * Multiple ethdevs per pci device exist only with representors.
+ * In such a case, it is enough to get only one of the ports as they all share
+ * the same ibv context.
+ *
+ * @param pdev
+ * Pointer to the PCI device.
+ *
+ * @return
+ * Pointer to the ethdev if found, NULL otherwise.
+ */
+static struct rte_eth_dev *
+pci_dev_to_eth_dev(struct rte_pci_device *pdev)
+{
+ struct rte_dev_iterator it;
+ struct rte_device *dev;
+
+ /**
+ * We really need to iterate all devices regardless of
+ * their owner.
+ */
+ RTE_DEV_FOREACH(dev, "class=eth", &it)
+ if (dev == &pdev->device)
+ return it.class_device;
+ return NULL;
+}
+
+/**
+ * DPDK callback to DMA map external memory to a PCI device.
+ *
+ * @param pdev
+ * Pointer to the PCI device.
+ * @param addr
+ * Starting virtual address of memory to be mapped.
+ * @param iova
+ * Starting IOVA address of memory to be mapped.
+ * @param len
+ * Length of memory segment being mapped.
+ *
+ * @return
+ * 0 on success, negative value on error.
+ */
+int
+mlx5_dma_map(struct rte_pci_device *pdev, void *addr,
+ uint64_t iova __rte_unused, size_t len)
+{
+ struct rte_eth_dev *dev;
+ struct mlx5_mr *mr;
+ struct mlx5_priv *priv;
+
+ dev = pci_dev_to_eth_dev(pdev);
+ if (!dev) {
+ DRV_LOG(WARNING, "unable to find matching ethdev "
+ "to PCI device %p", (void *)pdev);
+ rte_errno = ENODEV;
+ return -1;
+ }
+ priv = dev->data->dev_private;
+ mr = mlx5_create_mr_ext(dev, (uintptr_t)addr, len, SOCKET_ID_ANY);
+ if (!mr) {
+ DRV_LOG(WARNING,
+ "port %u unable to dma map", dev->data->port_id);
+ rte_errno = EINVAL;
+ return -1;
+ }
+ rte_rwlock_write_lock(&priv->mr.rwlock);
+ LIST_INSERT_HEAD(&priv->mr.mr_list, mr, mr);
+ /* Insert to the global cache table. */
+ mr_insert_dev_cache(dev, mr);
+ rte_rwlock_write_unlock(&priv->mr.rwlock);
+ return 0;
+}
+
+/**
+ * DPDK callback to DMA unmap external memory to a PCI device.
+ *
+ * @param pdev
+ * Pointer to the PCI device.
+ * @param addr
+ * Starting virtual address of memory to be unmapped.
+ * @param iova
+ * Starting IOVA address of memory to be unmapped.
+ * @param len
+ * Length of memory segment being unmapped.
+ *
+ * @return
+ * 0 on success, negative value on error.
+ */
+int
+mlx5_dma_unmap(struct rte_pci_device *pdev, void *addr,
+ uint64_t iova __rte_unused, size_t len __rte_unused)
+{
+ struct rte_eth_dev *dev;
+ struct mlx5_priv *priv;
+ struct mlx5_mr *mr;
+ struct mlx5_mr_cache entry;
+
+ dev = pci_dev_to_eth_dev(pdev);
+ if (!dev) {
+ DRV_LOG(WARNING, "unable to find matching ethdev "
+ "to PCI device %p", (void *)pdev);
+ rte_errno = ENODEV;
+ return -1;
+ }
+ priv = dev->data->dev_private;
+ rte_rwlock_read_lock(&priv->mr.rwlock);
+ mr = mr_lookup_dev_list(dev, &entry, (uintptr_t)addr);
+ if (!mr) {
+ rte_rwlock_read_unlock(&priv->mr.rwlock);
+ DRV_LOG(WARNING, "address 0x%" PRIxPTR " wasn't registered "
+ "to PCI device %p", (uintptr_t)addr,
+ (void *)pdev);
+ rte_errno = EINVAL;
+ return -1;
+ }
+ LIST_REMOVE(mr, mr);
+ LIST_INSERT_HEAD(&priv->mr.mr_free_list, mr, mr);
+ DEBUG("port %u remove MR(%p) from list", dev->data->port_id,
+ (void *)mr);
+ mr_rebuild_dev_cache(dev);
+ /*
+ * Flush local caches by propagating invalidation across cores.
+ * rte_smp_wmb() is enough to synchronize this event. If one of
+ * freed memsegs is seen by other core, that means the memseg
+ * has been allocated by allocator, which will come after this
+ * free call. Therefore, this store instruction (incrementing
+ * generation below) will be guaranteed to be seen by other core
+ * before the core sees the newly allocated memory.
+ */
+ ++priv->mr.dev_gen;
+ DEBUG("broadcasting local cache flush, gen=%d",
+ priv->mr.dev_gen);
+ rte_smp_wmb();
+ rte_rwlock_read_unlock(&priv->mr.rwlock);
+ return 0;
+}
+
+/**
* Register MR for entire memory chunks in a Mempool having externally allocated
* memory and fill in local cache.
*
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index be464e8705..dcf044488e 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -28,6 +28,7 @@
#include <rte_atomic.h>
#include <rte_spinlock.h>
#include <rte_io.h>
+#include <rte_bus_pci.h>
#include "mlx5_utils.h"
#include "mlx5.h"
@@ -367,6 +368,10 @@ uint32_t mlx5_rx_addr2mr_bh(struct mlx5_rxq_data *rxq, uintptr_t addr);
uint32_t mlx5_tx_mb2mr_bh(struct mlx5_txq_data *txq, struct rte_mbuf *mb);
uint32_t mlx5_tx_update_ext_mp(struct mlx5_txq_data *txq, uintptr_t addr,
struct rte_mempool *mp);
+int mlx5_dma_map(struct rte_pci_device *pdev, void *addr, uint64_t iova,
+ size_t len);
+int mlx5_dma_unmap(struct rte_pci_device *pdev, void *addr, uint64_t iova,
+ size_t len);
/**
* Provide safe 64bit store operation to mlx5 UAR region for both 32bit and
--
2.12.0
* [dpdk-dev] [PATCH v3 6/6] doc: deprecation notice for VFIO DMA map APIs
2019-02-21 14:50 ` [dpdk-dev] [PATCH v2 " Shahaf Shuler
` (5 preceding siblings ...)
2019-03-05 13:59 ` [dpdk-dev] [PATCH v3 5/6] net/mlx5: support PCI device DMA map and unmap Shahaf Shuler
@ 2019-03-05 13:59 ` Shahaf Shuler
6 siblings, 0 replies; 79+ messages in thread
From: Shahaf Shuler @ 2019-03-05 13:59 UTC (permalink / raw)
To: anatoly.burakov, yskoh, thomas, ferruh.yigit, nhorman, gaetan.rivet; +Cc: dev
The rte_vfio_dma_map and rte_vfio_dma_unmap APIs should be replaced by
the rte_dev_dma_map and rte_dev_dma_unmap APIs.
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
---
doc/guides/prog_guide/env_abstraction_layer.rst | 2 +-
doc/guides/rel_notes/deprecation.rst | 4 ++++
2 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
index 929d76dba7..ec2fe65523 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -282,7 +282,7 @@ The expected workflow is as follows:
- If IOVA table is not specified, IOVA addresses will be assumed to be
unavailable
- Other processes must attach to the memory area before they can use it
-* Perform DMA mapping with ``rte_vfio_dma_map`` if needed
+* Perform DMA mapping with ``rte_dev_dma_map`` if needed
* Use the memory area in your application
* If memory area is no longer needed, it can be unregistered
- If the area was mapped for DMA, unmapping must be performed before
diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 1b4fcb7e64..48ec4fee88 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -35,6 +35,10 @@ Deprecation Notices
+ ``rte_eal_devargs_type_count``
+* vfio: removal of ``rte_vfio_dma_map`` and ``rte_vfio_dma_unmap`` APIs which
+ have been replaced with ``rte_dev_dma_map`` and ``rte_dev_dma_unmap``
+ functions. The due date for the removal targets DPDK 20.02.
+
* pci: Several exposed functions are misnamed.
The following functions are deprecated starting from v17.11 and are replaced:
--
2.12.0
* [dpdk-dev] [PATCH v2 1/6] vfio: allow DMA map of memory for the default vfio fd
2019-02-13 9:10 [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for external memory Shahaf Shuler
` (7 preceding siblings ...)
2019-02-21 14:50 ` [dpdk-dev] [PATCH v2 " Shahaf Shuler
@ 2019-02-21 14:50 ` Shahaf Shuler
2019-02-28 11:56 ` Burakov, Anatoly
2019-02-21 14:50 ` [dpdk-dev] [PATCH v2 2/6] vfio: don't fail to DMA map if memory is already mapped Shahaf Shuler
` (4 subsequent siblings)
13 siblings, 1 reply; 79+ messages in thread
From: Shahaf Shuler @ 2019-02-21 14:50 UTC (permalink / raw)
To: anatoly.burakov, yskoh, thomas, ferruh.yigit, nhorman, gaetan.rivet; +Cc: dev
Enable users to call rte_vfio_dma_map with a request to map
to the default vfio fd.
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
---
doc/guides/rel_notes/release_19_05.rst | 3 +++
lib/librte_eal/common/include/rte_vfio.h | 8 ++++++--
lib/librte_eal/linuxapp/eal/eal_vfio.c | 10 ++++++++--
3 files changed, 17 insertions(+), 4 deletions(-)
diff --git a/doc/guides/rel_notes/release_19_05.rst b/doc/guides/rel_notes/release_19_05.rst
index 2b0f60d3d8..2c682e36cf 100644
--- a/doc/guides/rel_notes/release_19_05.rst
+++ b/doc/guides/rel_notes/release_19_05.rst
@@ -110,6 +110,9 @@ ABI Changes
Also, make sure to start the actual text at the margin.
=========================================================
+* vfio: Functions ``rte_vfio_container_dma_map`` and
+ ``rte_vfio_container_dma_unmap`` have been extended with an option to
+ request mapping or un-mapping to the default vfio container fd.
Shared Library Versions
-----------------------
diff --git a/lib/librte_eal/common/include/rte_vfio.h b/lib/librte_eal/common/include/rte_vfio.h
index cae96fab90..54a0df5726 100644
--- a/lib/librte_eal/common/include/rte_vfio.h
+++ b/lib/librte_eal/common/include/rte_vfio.h
@@ -73,6 +73,8 @@ struct vfio_info_cap_header {
#define RTE_VFIO_CAP_MSIX_MAPPABLE 3
#endif
+#define RTE_VFIO_DEFAULT_CONTAINER_FD (-1)
+
#else /* not VFIO_PRESENT */
/* we don't need an actual definition, only pointer is used */
@@ -347,7 +349,8 @@ rte_vfio_container_group_unbind(int container_fd, int iommu_group_num);
* Perform DMA mapping for devices in a container.
*
* @param container_fd
- * the specified container fd
+ * the specified container fd. Use RTE_VFIO_DEFAULT_CONTAINER_FD to
+ * use the default container.
*
* @param vaddr
* Starting virtual address of memory to be mapped.
@@ -370,7 +373,8 @@ rte_vfio_container_dma_map(int container_fd, uint64_t vaddr,
* Perform DMA unmapping for devices in a container.
*
* @param container_fd
- * the specified container fd
+ * the specified container fd. Use RTE_VFIO_DEFAULT_CONTAINER_FD to
+ * use the default container.
*
* @param vaddr
* Starting virtual address of memory to be unmapped.
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index c821e83826..9adbda8bb7 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -1897,7 +1897,10 @@ rte_vfio_container_dma_map(int container_fd, uint64_t vaddr, uint64_t iova,
return -1;
}
- vfio_cfg = get_vfio_cfg_by_container_fd(container_fd);
+ if (container_fd == RTE_VFIO_DEFAULT_CONTAINER_FD)
+ vfio_cfg = default_vfio_cfg;
+ else
+ vfio_cfg = get_vfio_cfg_by_container_fd(container_fd);
if (vfio_cfg == NULL) {
RTE_LOG(ERR, EAL, "Invalid container fd\n");
return -1;
@@ -1917,7 +1920,10 @@ rte_vfio_container_dma_unmap(int container_fd, uint64_t vaddr, uint64_t iova,
return -1;
}
- vfio_cfg = get_vfio_cfg_by_container_fd(container_fd);
+ if (container_fd == RTE_VFIO_DEFAULT_CONTAINER_FD)
+ vfio_cfg = default_vfio_cfg;
+ else
+ vfio_cfg = get_vfio_cfg_by_container_fd(container_fd);
if (vfio_cfg == NULL) {
RTE_LOG(ERR, EAL, "Invalid container fd\n");
return -1;
--
2.12.0
* Re: [dpdk-dev] [PATCH v2 1/6] vfio: allow DMA map of memory for the default vfio fd
2019-02-21 14:50 ` [dpdk-dev] [PATCH v2 1/6] vfio: allow DMA map of memory for the default vfio fd Shahaf Shuler
@ 2019-02-28 11:56 ` Burakov, Anatoly
0 siblings, 0 replies; 79+ messages in thread
From: Burakov, Anatoly @ 2019-02-28 11:56 UTC (permalink / raw)
To: Shahaf Shuler, yskoh, thomas, ferruh.yigit, nhorman, gaetan.rivet; +Cc: dev
On 21-Feb-19 2:50 PM, Shahaf Shuler wrote:
> Enable users the option to call rte_vfio_dma_map with request to map
> to the default vfio fd.
>
> Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
> ---
LGTM
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
--
Thanks,
Anatoly
* [dpdk-dev] [PATCH v2 2/6] vfio: don't fail to DMA map if memory is already mapped
2019-02-13 9:10 [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for external memory Shahaf Shuler
` (8 preceding siblings ...)
2019-02-21 14:50 ` [dpdk-dev] [PATCH v2 1/6] vfio: allow DMA map of memory for the default vfio fd Shahaf Shuler
@ 2019-02-21 14:50 ` Shahaf Shuler
2019-02-28 11:58 ` Burakov, Anatoly
2019-02-21 14:50 ` [dpdk-dev] [PATCH v2 3/6] bus: introduce device level DMA memory mapping Shahaf Shuler
` (3 subsequent siblings)
13 siblings, 1 reply; 79+ messages in thread
From: Shahaf Shuler @ 2019-02-21 14:50 UTC (permalink / raw)
To: anatoly.burakov, yskoh, thomas, ferruh.yigit, nhorman, gaetan.rivet; +Cc: dev
Currently the vfio DMA map function fails in case the same memory
segment is mapped twice.
This is too strict, as mapping the same memory twice is not an error.
Instead, use the kernel return value to detect such a state and have the
DMA function return success.
For type1 mapping the kernel driver returns EEXIST.
For spapr mapping EBUSY is returned since kernel 4.10.
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
---
lib/librte_eal/linuxapp/eal/eal_vfio.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index 9adbda8bb7..9e8ad399f5 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -1263,7 +1263,11 @@ vfio_type1_dma_mem_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova,
VFIO_DMA_MAP_FLAG_WRITE;
ret = ioctl(vfio_container_fd, VFIO_IOMMU_MAP_DMA, &dma_map);
- if (ret) {
+ /**
+ * In case the mapping was already done, EEXIST will be
+ * returned by the kernel.
+ */
+ if (ret && (errno != EEXIST)) {
RTE_LOG(ERR, EAL, " cannot set up DMA remapping, error %i (%s)\n",
errno, strerror(errno));
return -1;
@@ -1324,7 +1328,11 @@ vfio_spapr_dma_do_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova,
VFIO_DMA_MAP_FLAG_WRITE;
ret = ioctl(vfio_container_fd, VFIO_IOMMU_MAP_DMA, &dma_map);
- if (ret) {
+ /**
+ * In case the mapping was already done, EBUSY will be
+ * returned by the kernel.
+ */
+ if (ret && (errno != EBUSY)) {
RTE_LOG(ERR, EAL, " cannot set up DMA remapping, error %i (%s)\n",
errno, strerror(errno));
return -1;
--
2.12.0
* Re: [dpdk-dev] [PATCH v2 2/6] vfio: don't fail to DMA map if memory is already mapped
2019-02-21 14:50 ` [dpdk-dev] [PATCH v2 2/6] vfio: don't fail to DMA map if memory is already mapped Shahaf Shuler
@ 2019-02-28 11:58 ` Burakov, Anatoly
0 siblings, 0 replies; 79+ messages in thread
From: Burakov, Anatoly @ 2019-02-28 11:58 UTC (permalink / raw)
To: Shahaf Shuler, yskoh, thomas, ferruh.yigit, nhorman, gaetan.rivet; +Cc: dev
On 21-Feb-19 2:50 PM, Shahaf Shuler wrote:
> Currently vfio DMA map function will fail in case the same memory
> segment is mapped twice.
>
> This is too strict, as this is not an error to map the same memory
> twice.
>
> Instead, use the kernel return value to detect such state and have the
> DMA function to return as successful.
>
> For type1 mapping the kernel driver returns EEXISTS.
> For spapr mapping EBUSY is returned since kernel 4.10.
>
> Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
> ---
> lib/librte_eal/linuxapp/eal/eal_vfio.c | 12 ++++++++++--
> 1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> index 9adbda8bb7..9e8ad399f5 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> @@ -1263,7 +1263,11 @@ vfio_type1_dma_mem_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova,
> VFIO_DMA_MAP_FLAG_WRITE;
>
> ret = ioctl(vfio_container_fd, VFIO_IOMMU_MAP_DMA, &dma_map);
> - if (ret) {
> + /**
> + * In case the mapping was already done EEXIST will be
> + * returned from kernel.
> + */
> + if (ret && (errno != EEXIST)) {
> RTE_LOG(ERR, EAL, " cannot set up DMA remapping, error %i (%s)\n",
> errno, strerror(errno));
> return -1;
> @@ -1324,7 +1328,11 @@ vfio_spapr_dma_do_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova,
> VFIO_DMA_MAP_FLAG_WRITE;
>
> ret = ioctl(vfio_container_fd, VFIO_IOMMU_MAP_DMA, &dma_map);
> - if (ret) {
> + /**
> + * In case the mapping was already done EBUSY will be
> + * returned from kernel.
> + */
> + if (ret && (errno != EBUSY)) {
> RTE_LOG(ERR, EAL, " cannot set up DMA remapping, error %i (%s)\n",
> errno, strerror(errno));
> return -1;
>
Nitpicking, but maybe it would be good to throw a debug output about not
attempting to map because the area is already mapped. Silently ignoring
the error won't help with debugging, should any issues arise :)
Otherwise LGTM
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
--
Thanks,
Anatoly
* [dpdk-dev] [PATCH v2 3/6] bus: introduce device level DMA memory mapping
2019-02-13 9:10 [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for external memory Shahaf Shuler
` (9 preceding siblings ...)
2019-02-21 14:50 ` [dpdk-dev] [PATCH v2 2/6] vfio: don't fail to DMA map if memory is already mapped Shahaf Shuler
@ 2019-02-21 14:50 ` Shahaf Shuler
2019-02-28 12:14 ` Burakov, Anatoly
2019-02-21 14:50 ` [dpdk-dev] [PATCH v2 4/6] net/mlx5: refactor external memory registration Shahaf Shuler
` (2 subsequent siblings)
13 siblings, 1 reply; 79+ messages in thread
From: Shahaf Shuler @ 2019-02-21 14:50 UTC (permalink / raw)
To: anatoly.burakov, yskoh, thomas, ferruh.yigit, nhorman, gaetan.rivet; +Cc: dev
The DPDK APIs expose 3 different modes to work with memory used for DMA:
1. Use the DPDK owned memory (backed by the DPDK provided hugepages).
This memory is allocated by the DPDK libraries, included in the DPDK
memory system (memseg lists) and automatically DMA mapped by the DPDK
layers.
2. Use memory allocated by the user and registered to the DPDK memory
system. Upon registration of the memory, the DPDK layers will DMA map it
to all needed devices. After registration, allocation of this memory
is done with the rte_*malloc APIs.
3. Use memory allocated by the user and not registered to the DPDK memory
system. This is for users who want tight control over this
memory (e.g. avoiding the rte_malloc header).
The user should allocate the memory, register it through the
rte_extmem_register API, and call a DMA map function in order to register
such memory to the different devices.
The scope of this patch focuses on #3 above.
Currently the only way to map external memory is through VFIO
(rte_vfio_dma_map). While VFIO is common, there are other vendors
which use different ways to map memory (e.g. Mellanox and NXP).
The work in this patch moves the DMA mapping to vendor agnostic APIs.
Device level DMA map and unmap APIs were added. Implementation of those
APIs was done currently only for PCI devices.
For PCI bus devices, the pci driver can expose its own map and unmap
functions to be used for the mapping. In case the driver doesn't provide
any, the memory will be mapped, if possible, to IOMMU through VFIO APIs.
Application usage with those APIs is quite simple:
* allocate memory
* call rte_extmem_register on the memory chunk.
* take a device, and query its rte_device.
* call the device specific mapping function for this device.
Future work will deprecate the rte_vfio_dma_map and rte_vfio_dma_unmap
APIs, leaving the rte device APIs as the preferred option for the user.
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
---
drivers/bus/pci/pci_common.c | 48 ++++++++++++++++++++++++++++
drivers/bus/pci/rte_bus_pci.h | 40 +++++++++++++++++++++++
lib/librte_eal/common/eal_common_dev.c | 29 +++++++++++++++++
lib/librte_eal/common/include/rte_bus.h | 44 +++++++++++++++++++++++++
lib/librte_eal/common/include/rte_dev.h | 43 +++++++++++++++++++++++++
lib/librte_eal/rte_eal_version.map | 2 ++
6 files changed, 206 insertions(+)
diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index 6276e5d695..7013c53a7b 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -528,6 +528,52 @@ pci_unplug(struct rte_device *dev)
return ret;
}
+static int
+pci_dma_map(struct rte_device *dev, void *addr, uint64_t iova, size_t len)
+{
+ struct rte_pci_device *pdev = RTE_DEV_TO_PCI(dev);
+
+ if (!pdev || !pdev->driver) {
+ rte_errno = EINVAL;
+ return -rte_errno;
+ }
+ if (pdev->driver->dma_map)
+ return pdev->driver->dma_map(pdev, addr, iova, len);
+ /**
+ * In case the driver doesn't provide any specific mapping,
+ * try to fall back to VFIO.
+ */
+ if (pdev->kdrv == RTE_KDRV_VFIO)
+ return rte_vfio_container_dma_map
+ (RTE_VFIO_DEFAULT_CONTAINER_FD, (uintptr_t)addr,
+ iova, len);
+ rte_errno = ENOTSUP;
+ return -rte_errno;
+}
+
+static int
+pci_dma_unmap(struct rte_device *dev, void *addr, uint64_t iova, size_t len)
+{
+ struct rte_pci_device *pdev = RTE_DEV_TO_PCI(dev);
+
+ if (!pdev || !pdev->driver) {
+ rte_errno = EINVAL;
+ return -rte_errno;
+ }
+ if (pdev->driver->dma_unmap)
+ return pdev->driver->dma_unmap(pdev, addr, iova, len);
+ /**
+ * In case the driver doesn't provide any specific mapping,
+ * try to fall back to VFIO.
+ */
+ if (pdev->kdrv == RTE_KDRV_VFIO)
+ return rte_vfio_container_dma_unmap
+ (RTE_VFIO_DEFAULT_CONTAINER_FD, (uintptr_t)addr,
+ iova, len);
+ rte_errno = ENOTSUP;
+ return -rte_errno;
+}
+
struct rte_pci_bus rte_pci_bus = {
.bus = {
.scan = rte_pci_scan,
@@ -536,6 +582,8 @@ struct rte_pci_bus rte_pci_bus = {
.plug = pci_plug,
.unplug = pci_unplug,
.parse = pci_parse,
+ .dma_map = pci_dma_map,
+ .dma_unmap = pci_dma_unmap,
.get_iommu_class = rte_pci_get_iommu_class,
.dev_iterate = rte_pci_dev_iterate,
.hot_unplug_handler = pci_hot_unplug_handler,
diff --git a/drivers/bus/pci/rte_bus_pci.h b/drivers/bus/pci/rte_bus_pci.h
index f0d6d81c00..06e004cd3f 100644
--- a/drivers/bus/pci/rte_bus_pci.h
+++ b/drivers/bus/pci/rte_bus_pci.h
@@ -114,6 +114,44 @@ typedef int (pci_probe_t)(struct rte_pci_driver *, struct rte_pci_device *);
typedef int (pci_remove_t)(struct rte_pci_device *);
/**
+ * Driver-specific DMA mapping. After a successful call the device
+ * will be able to read/write from/to this segment.
+ *
+ * @param dev
+ * Pointer to the PCI device.
+ * @param addr
+ * Starting virtual address of memory to be mapped.
+ * @param iova
+ * Starting IOVA address of memory to be mapped.
+ * @param len
+ * Length of memory segment being mapped.
+ * @return
+ * - 0 On success.
+ * - Negative value and rte_errno is set otherwise.
+ */
+typedef int (pci_dma_map_t)(struct rte_pci_device *dev, void *addr,
+ uint64_t iova, size_t len);
+
+/**
+ * Driver-specific DMA un-mapping. After a successful call the device
+ * will not be able to read/write from/to this segment.
+ *
+ * @param dev
+ * Pointer to the PCI device.
+ * @param addr
+ * Starting virtual address of memory to be unmapped.
+ * @param iova
+ * Starting IOVA address of memory to be unmapped.
+ * @param len
+ * Length of memory segment being unmapped.
+ * @return
+ * - 0 On success.
+ * - Negative value and rte_errno is set otherwise.
+ */
+typedef int (pci_dma_unmap_t)(struct rte_pci_device *dev, void *addr,
+ uint64_t iova, size_t len);
+
+/**
* A structure describing a PCI driver.
*/
struct rte_pci_driver {
@@ -122,6 +160,8 @@ struct rte_pci_driver {
struct rte_pci_bus *bus; /**< PCI bus reference. */
pci_probe_t *probe; /**< Device Probe function. */
pci_remove_t *remove; /**< Device Remove function. */
+ pci_dma_map_t *dma_map; /**< device dma map function. */
+ pci_dma_unmap_t *dma_unmap; /**< device dma unmap function. */
const struct rte_pci_id *id_table; /**< ID table, NULL terminated. */
uint32_t drv_flags; /**< Flags RTE_PCI_DRV_*. */
};
diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c
index fd7f5ca7d5..ab4dbc9499 100644
--- a/lib/librte_eal/common/eal_common_dev.c
+++ b/lib/librte_eal/common/eal_common_dev.c
@@ -756,3 +756,32 @@ rte_dev_iterator_next(struct rte_dev_iterator *it)
free(cls_str);
return it->device;
}
+
+int
+rte_dev_dma_map(struct rte_device *dev, void *addr, uint64_t iova,
+ size_t len)
+{
+ if (dev->bus->dma_map == NULL || len == 0) {
+ rte_errno = EINVAL;
+ return -rte_errno;
+ }
+ /* Memory must be registered through rte_extmem_* APIs */
+ if (rte_mem_virt2memseg(addr, NULL) == NULL) {
+ rte_errno = EINVAL;
+ return -rte_errno;
+ }
+
+ return dev->bus->dma_map(dev, addr, iova, len);
+}
+
+int
+rte_dev_dma_unmap(struct rte_device *dev, void *addr, uint64_t iova,
+ size_t len)
+{
+ if (dev->bus->dma_unmap == NULL || len == 0) {
+ rte_errno = EINVAL;
+ return -rte_errno;
+ }
+
+ return dev->bus->dma_unmap(dev, addr, iova, len);
+}
diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index 6be4b5cabe..4faf2d20a0 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -168,6 +168,48 @@ typedef int (*rte_bus_unplug_t)(struct rte_device *dev);
typedef int (*rte_bus_parse_t)(const char *name, void *addr);
/**
+ * Device level DMA map function.
+ * After a successful call, the memory segment will be mapped to the
+ * given device.
+ *
+ * @param dev
+ * Device pointer.
+ * @param addr
+ * Virtual address to map.
+ * @param iova
+ * IOVA address to map.
+ * @param len
+ * Length of the memory segment being mapped.
+ *
+ * @return
+ * 0 if mapping was successful.
+ * Negative value and rte_errno is set otherwise.
+ */
+typedef int (*rte_dev_dma_map_t)(struct rte_device *dev, void *addr,
+ uint64_t iova, size_t len);
+
+/**
+ * Device level DMA unmap function.
+ * After a successful call, the memory segment will no longer be
+ * accessible by the given device.
+ *
+ * @param dev
+ * Device pointer.
+ * @param addr
+ * Virtual address to unmap.
+ * @param iova
+ * IOVA address to unmap.
+ * @param len
+ * Length of the memory segment being mapped.
+ *
+ * @return
+ * 0 if un-mapping was successful.
+ * Negative value and rte_errno is set otherwise.
+ */
+typedef int (*rte_dev_dma_unmap_t)(struct rte_device *dev, void *addr,
+ uint64_t iova, size_t len);
+
+/**
* Implement a specific hot-unplug handler, which is responsible for
* handle the failure when device be hot-unplugged. When the event of
* hot-unplug be detected, it could call this function to handle
@@ -238,6 +280,8 @@ struct rte_bus {
rte_bus_plug_t plug; /**< Probe single device for drivers */
rte_bus_unplug_t unplug; /**< Remove single device from driver */
rte_bus_parse_t parse; /**< Parse a device name */
+ rte_dev_dma_map_t dma_map; /**< DMA map for device in the bus */
+ rte_dev_dma_unmap_t dma_unmap; /**< DMA unmap for device in the bus */
struct rte_bus_conf conf; /**< Bus configuration */
rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
rte_dev_iterate_t dev_iterate; /**< Device iterator. */
diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
index a9724dc918..fd39c5fd86 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -515,4 +515,47 @@ rte_dev_hotplug_handle_enable(void);
int __rte_experimental
rte_dev_hotplug_handle_disable(void);
+/**
+ * Device level DMA map function.
+ * After a successful call, the memory segment will be mapped to the
+ * given device.
+ *
+ * @param dev
+ * Device pointer.
+ * @param addr
+ * Virtual address to map.
+ * @param iova
+ * IOVA address to map.
+ * @param len
+ * Length of the memory segment being mapped.
+ *
+ * @return
+ * 0 if mapping was successful.
+ * Negative value and rte_errno is set otherwise.
+ */
+int __rte_experimental
+rte_dev_dma_map(struct rte_device *dev, void *addr, uint64_t iova, size_t len);
+
+/**
+ * Device level DMA unmap function.
+ * After a successful call, the memory segment will no longer be
+ * accessible by the given device.
+ *
+ * @param dev
+ * Device pointer.
+ * @param addr
+ * Virtual address to unmap.
+ * @param iova
+ * IOVA address to unmap.
+ * @param len
+ * Length of the memory segment being mapped.
+ *
+ * @return
+ * 0 if un-mapping was successful.
+ * Negative value and rte_errno is set otherwise.
+ */
+int __rte_experimental
+rte_dev_dma_unmap(struct rte_device *dev, void *addr, uint64_t iova,
+ size_t len);
+
#endif /* _RTE_DEV_H_ */
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index eb5f7b9cbd..264aa050fa 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -277,6 +277,8 @@ EXPERIMENTAL {
rte_class_unregister;
rte_ctrl_thread_create;
rte_delay_us_sleep;
+ rte_dev_dma_map;
+ rte_dev_dma_unmap;
rte_dev_event_callback_process;
rte_dev_event_callback_register;
rte_dev_event_callback_unregister;
--
2.12.0
* Re: [dpdk-dev] [PATCH v2 3/6] bus: introduce device level DMA memory mapping
2019-02-21 14:50 ` [dpdk-dev] [PATCH v2 3/6] bus: introduce device level DMA memory mapping Shahaf Shuler
@ 2019-02-28 12:14 ` Burakov, Anatoly
2019-02-28 14:41 ` Burakov, Anatoly
0 siblings, 1 reply; 79+ messages in thread
From: Burakov, Anatoly @ 2019-02-28 12:14 UTC (permalink / raw)
To: Shahaf Shuler, yskoh, thomas, ferruh.yigit, nhorman, gaetan.rivet; +Cc: dev
On 21-Feb-19 2:50 PM, Shahaf Shuler wrote:
> The DPDK APIs expose 3 different modes to work with memory used for DMA:
>
> 1. Use the DPDK owned memory (backed by the DPDK provided hugepages).
> This memory is allocated by the DPDK libraries, included in the DPDK
> memory system (memseg lists) and automatically DMA mapped by the DPDK
> layers.
>
> 2. Use memory allocated by the user and register to the DPDK memory
> systems. Upon registration of memory, the DPDK layers will DMA map it
> to all needed devices. After registration, allocation of this memory
> will be done with rte_*malloc APIs.
>
> 3. Use memory allocated by the user and not registered to the DPDK memory
> system. This is for users who wants to have tight control on this
> memory (e.g. avoid the rte_malloc header).
> The user should create a memory, register it through rte_extmem_register
> API, and call DMA map function in order to register such memory to
> the different devices.
>
> The scope of the patch focus on #3 above.
>
> Currently the only way to map external memory is through VFIO
> (rte_vfio_dma_map). While VFIO is common, there are other vendors
> which use different ways to map memory (e.g. Mellanox and NXP).
>
> The work in this patch moves the DMA mapping to vendor agnostic APIs.
> Device level DMA map and unmap APIs were added. Implementation of those
> APIs was done currently only for PCI devices.
>
> For PCI bus devices, the pci driver can expose its own map and unmap
> functions to be used for the mapping. In case the driver doesn't provide
> any, the memory will be mapped, if possible, to IOMMU through VFIO APIs.
>
> Application usage with those APIs is quite simple:
> * allocate memory
> * call rte_extmem_register on the memory chunk.
> * take a device, and query its rte_device.
> * call the device specific mapping function for this device.
>
> Future work will deprecate the rte_vfio_dma_map and rte_vfio_dma_unmap
> APIs, leaving the rte device APIs as the preferred option for the user.
>
> Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
> ---
<snip>
> +
> + if (!pdev || !pdev->driver) {
> + rte_errno = EINVAL;
> + return -rte_errno;
> + }
We could put a check in here to see if the memory has been registered
with DPDK. Just call rte_mem_virt2memseg_list(addr) - if it returns
NULL, the memory wasn't registered, so you can throw an error. Not sure
of appropriate errno in that case - ENODEV? EINVAL?
> + if (pdev->driver->dma_map)
> + return pdev->driver->dma_map(pdev, addr, iova, len);
> + /**
> + * In case driver don't provides any specific mapping
> + * try fallback to VFIO.
> + */
> + if (pdev->kdrv == RTE_KDRV_VFIO)
> + return rte_vfio_container_dma_map
> + (RTE_VFIO_DEFAULT_CONTAINER_FD, (uintptr_t)addr,
> + iova, len);
<snip>
> +rte_dev_dma_map(struct rte_device *dev, void *addr, uint64_t iova,
> + size_t len)
> +{
> + if (dev->bus->dma_map == NULL || len == 0) {
> + rte_errno = EINVAL;
> + return -rte_errno;
> + }
> + /* Memory must be registered through rte_extmem_* APIs */
> + if (rte_mem_virt2memseg(addr, NULL) == NULL) {
No need to call rte_mem_virt2memseg - rte_mem_virt2memseg_list will do.
> + rte_errno = EINVAL;
> + return -rte_errno;
> + }
> +
> + return dev->bus->dma_map(dev, addr, iova, len);
> +}
> +
> +int
> +rte_dev_dma_unmap(struct rte_device *dev, void *addr, uint64_t iova,
> + size_t len)
> +{
> + if (dev->bus->dma_unmap == NULL || len == 0) {
> + rte_errno = EINVAL;
> + return -rte_errno;
> + }
I think attempting to unmap a memory region that isn't registered should
be an error, so rte_mem_virt2memseg_list call should be here too.
> +
> + return dev->bus->dma_unmap(dev, addr, iova, len);
> +}
> diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
> index 6be4b5cabe..4faf2d20a0 100644
> --- a/lib/librte_eal/common/include/rte_bus.h
> +++ b/lib/librte_eal/common/include/rte_bus.h
> @@ -168,6 +168,48 @@ typedef int (*rte_bus_unplug_t)(struct rte_device *dev);
> typedef int (*rte_bus_parse_t)(const char *name, void *addr);
<snip>
> --- a/lib/librte_eal/common/include/rte_dev.h
> +++ b/lib/librte_eal/common/include/rte_dev.h
> @@ -515,4 +515,47 @@ rte_dev_hotplug_handle_enable(void);
> int __rte_experimental
> rte_dev_hotplug_handle_disable(void);
>
> +/**
> + * Device level DMA map function.
> + * After a successful call, the memory segment will be mapped to the
> + * given device.
here and in unmap:
@note please register memory first
?
> + *
> + * @param dev
> + * Device pointer.
> + * @param addr
> + * Virtual address to map.
> + * @param iova
> + * IOVA address to map.
> + * @param len
> + * Length of the memory segment being mapped.
> + *
> + * @return
> + * 0 if mapping was successful.
> + * Negative value and rte_errno is set otherwise.
Here and in other similar places: why are we setting rte_errno *and*
returning -rte_errno? Wouldn't returning -1 be enough?
--
Thanks,
Anatoly
* Re: [dpdk-dev] [PATCH v2 3/6] bus: introduce device level DMA memory mapping
2019-02-28 12:14 ` Burakov, Anatoly
@ 2019-02-28 14:41 ` Burakov, Anatoly
0 siblings, 0 replies; 79+ messages in thread
From: Burakov, Anatoly @ 2019-02-28 14:41 UTC (permalink / raw)
To: Shahaf Shuler, yskoh, thomas, ferruh.yigit, nhorman, gaetan.rivet; +Cc: dev
On 28-Feb-19 12:14 PM, Burakov, Anatoly wrote:
> On 21-Feb-19 2:50 PM, Shahaf Shuler wrote:
>> The DPDK APIs expose 3 different modes to work with memory used for DMA:
>>
>> 1. Use the DPDK owned memory (backed by the DPDK provided hugepages).
>> This memory is allocated by the DPDK libraries, included in the DPDK
>> memory system (memseg lists) and automatically DMA mapped by the DPDK
>> layers.
>>
>> 2. Use memory allocated by the user and register to the DPDK memory
>> systems. Upon registration of memory, the DPDK layers will DMA map it
>> to all needed devices. After registration, allocation of this memory
>> will be done with rte_*malloc APIs.
>>
>> 3. Use memory allocated by the user and not registered to the DPDK memory
>> system. This is for users who wants to have tight control on this
>> memory (e.g. avoid the rte_malloc header).
>> The user should create a memory, register it through rte_extmem_register
>> API, and call DMA map function in order to register such memory to
>> the different devices.
>>
>> The scope of the patch focus on #3 above.
>>
>> Currently the only way to map external memory is through VFIO
>> (rte_vfio_dma_map). While VFIO is common, there are other vendors
>> which use different ways to map memory (e.g. Mellanox and NXP).
>>
>> The work in this patch moves the DMA mapping to vendor agnostic APIs.
>> Device level DMA map and unmap APIs were added. Implementation of those
>> APIs was done currently only for PCI devices.
>>
>> For PCI bus devices, the pci driver can expose its own map and unmap
>> functions to be used for the mapping. In case the driver doesn't provide
>> any, the memory will be mapped, if possible, to IOMMU through VFIO APIs.
>>
>> Application usage with those APIs is quite simple:
>> * allocate memory
>> * call rte_extmem_register on the memory chunk.
>> * take a device, and query its rte_device.
>> * call the device specific mapping function for this device.
>>
>> Future work will deprecate the rte_vfio_dma_map and rte_vfio_dma_unmap
>> APIs, leaving the rte device APIs as the preferred option for the user.
>>
>> Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
>> ---
>
> <snip>
>
>> +
>> + if (!pdev || !pdev->driver) {
>> + rte_errno = EINVAL;
>> + return -rte_errno;
>> + }
>
> We could put a check in here to see if the memory has been registered
> with DPDK. Just call rte_mem_virt2memseg_list(addr) - if it returns
> NULL, the memory wasn't registered, so you can throw an error. Not sure
> of appropriate errno in that case - ENODEV? EINVAL?
Apologies - i meant to delete that, but hit one ctrl+Z too many :(
--
Thanks,
Anatoly
^ permalink raw reply [flat|nested] 79+ messages in thread
* [dpdk-dev] [PATCH v2 4/6] net/mlx5: refactor external memory registration
2019-02-13 9:10 [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for external memory Shahaf Shuler
` (10 preceding siblings ...)
2019-02-21 14:50 ` [dpdk-dev] [PATCH v2 3/6] bus: introduce device level DMA memory mapping Shahaf Shuler
@ 2019-02-21 14:50 ` Shahaf Shuler
2019-02-21 14:50 ` [dpdk-dev] [PATCH v2 5/6] net/mlx5: support PCI device DMA map and unmap Shahaf Shuler
2019-02-21 14:50 ` [dpdk-dev] [PATCH v2 6/6] doc: deprecate VFIO DMA map APIs Shahaf Shuler
13 siblings, 0 replies; 79+ messages in thread
From: Shahaf Shuler @ 2019-02-21 14:50 UTC (permalink / raw)
To: anatoly.burakov, yskoh, thomas, ferruh.yigit, nhorman, gaetan.rivet; +Cc: dev
Move the memory region creation to a separate function to
prepare the ground for reusing it in the PCI driver map and unmap
functions.
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
---
drivers/net/mlx5/mlx5_mr.c | 86 +++++++++++++++++++++++++++--------------
1 file changed, 57 insertions(+), 29 deletions(-)
diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index 442b2d2321..32be6a5445 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -1109,6 +1109,58 @@ mlx5_mr_flush_local_cache(struct mlx5_mr_ctrl *mr_ctrl)
}
/**
+ * Creates a memory region for external memory, that is, memory which is not
+ * part of the DPDK memory segments.
+ *
+ * @param dev
+ * Pointer to the ethernet device.
+ * @param addr
+ * Starting virtual address of memory.
+ * @param len
+ * Length of memory segment being mapped.
+ * @param socket_id
+ * Socket to allocate heap memory for the control structures.
+ *
+ * @return
+ * Pointer to MR structure on success, NULL otherwise.
+ */
+static struct mlx5_mr *
+mlx5_create_mr_ext(struct rte_eth_dev *dev, uintptr_t addr, size_t len,
+ int socket_id)
+{
+ struct priv *priv = dev->data->dev_private;
+ struct mlx5_mr *mr = NULL;
+
+ mr = rte_zmalloc_socket(NULL,
+ RTE_ALIGN_CEIL(sizeof(*mr),
+ RTE_CACHE_LINE_SIZE),
+ RTE_CACHE_LINE_SIZE, socket_id);
+ if (mr == NULL)
+ return NULL;
+ mr->ibv_mr = mlx5_glue->reg_mr(priv->pd, (void *)addr, len,
+ IBV_ACCESS_LOCAL_WRITE);
+ if (mr->ibv_mr == NULL) {
+ DRV_LOG(WARNING,
+ "port %u fail to create a verbs MR for address (%p)",
+ dev->data->port_id, (void *)addr);
+ rte_free(mr);
+ return NULL;
+ }
+ mr->msl = NULL; /* Mark it is external memory. */
+ mr->ms_bmp = NULL;
+ mr->ms_n = 1;
+ mr->ms_bmp_n = 1;
+ DRV_LOG(DEBUG,
+ "port %u MR CREATED (%p) for external memory %p:\n"
+ " [0x%" PRIxPTR ", 0x%" PRIxPTR "),"
+ " lkey=0x%x base_idx=%u ms_n=%u, ms_bmp_n=%u",
+ dev->data->port_id, (void *)mr, (void *)addr,
+ addr, addr + len, rte_cpu_to_be_32(mr->ibv_mr->lkey),
+ mr->ms_base_idx, mr->ms_n, mr->ms_bmp_n);
+ return mr;
+}
+
+/**
* Called during rte_mempool_mem_iter() by mlx5_mr_update_ext_mp().
*
* Externally allocated chunk is registered and a MR is created for the chunk.
@@ -1142,43 +1194,19 @@ mlx5_mr_update_ext_mp_cb(struct rte_mempool *mp, void *opaque,
rte_rwlock_read_unlock(&priv->mr.rwlock);
if (lkey != UINT32_MAX)
return;
- mr = rte_zmalloc_socket(NULL,
- RTE_ALIGN_CEIL(sizeof(*mr),
- RTE_CACHE_LINE_SIZE),
- RTE_CACHE_LINE_SIZE, mp->socket_id);
- if (mr == NULL) {
- DRV_LOG(WARNING,
- "port %u unable to allocate memory for a new MR of"
- " mempool (%s).",
- dev->data->port_id, mp->name);
- data->ret = -1;
- return;
- }
DRV_LOG(DEBUG, "port %u register MR for chunk #%d of mempool (%s)",
dev->data->port_id, mem_idx, mp->name);
- mr->ibv_mr = mlx5_glue->reg_mr(priv->pd, (void *)addr, len,
- IBV_ACCESS_LOCAL_WRITE);
- if (mr->ibv_mr == NULL) {
+ mr = mlx5_create_mr_ext(dev, addr, len, mp->socket_id);
+ if (!mr) {
DRV_LOG(WARNING,
- "port %u fail to create a verbs MR for address (%p)",
- dev->data->port_id, (void *)addr);
- rte_free(mr);
+ "port %u unable to allocate a new MR of"
+ " mempool (%s).",
+ dev->data->port_id, mp->name);
data->ret = -1;
return;
}
- mr->msl = NULL; /* Mark it is external memory. */
- mr->ms_bmp = NULL;
- mr->ms_n = 1;
- mr->ms_bmp_n = 1;
rte_rwlock_write_lock(&priv->mr.rwlock);
LIST_INSERT_HEAD(&priv->mr.mr_list, mr, mr);
- DRV_LOG(DEBUG,
- "port %u MR CREATED (%p) for external memory %p:\n"
- " [0x%" PRIxPTR ", 0x%" PRIxPTR "),"
- " lkey=0x%x base_idx=%u ms_n=%u, ms_bmp_n=%u",
- dev->data->port_id, (void *)mr, (void *)addr,
- addr, addr + len, rte_cpu_to_be_32(mr->ibv_mr->lkey),
- mr->ms_base_idx, mr->ms_n, mr->ms_bmp_n);
/* Insert to the global cache table. */
mr_insert_dev_cache(dev, mr);
rte_rwlock_write_unlock(&priv->mr.rwlock);
--
2.12.0
^ permalink raw reply [flat|nested] 79+ messages in thread
* [dpdk-dev] [PATCH v2 5/6] net/mlx5: support PCI device DMA map and unmap
2019-02-13 9:10 [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for external memory Shahaf Shuler
` (11 preceding siblings ...)
2019-02-21 14:50 ` [dpdk-dev] [PATCH v2 4/6] net/mlx5: refactor external memory registration Shahaf Shuler
@ 2019-02-21 14:50 ` Shahaf Shuler
2019-02-21 14:50 ` [dpdk-dev] [PATCH v2 6/6] doc: deprecate VFIO DMA map APIs Shahaf Shuler
13 siblings, 0 replies; 79+ messages in thread
From: Shahaf Shuler @ 2019-02-21 14:50 UTC (permalink / raw)
To: anatoly.burakov, yskoh, thomas, ferruh.yigit, nhorman, gaetan.rivet; +Cc: dev
The implementation reuses the external memory registration work done in
commit [1].
Note about representors:
The current representor design will not work
with those map and unmap functions. The reason is that for representors
we have multiple IB devices sharing the same PCI function, so the mapping
will happen on only one of the representors and not on all of them.
While it is possible to implement such support, the IB representor
design is going to be changed during DPDK19.05. The new design will have
a single IB device for all representors, hence sharing of a single
memory region between all representors will be possible.
[1]
commit 7e43a32ee060
("net/mlx5: support externally allocated static memory")
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
---
drivers/net/mlx5/mlx5.c | 2 +
drivers/net/mlx5/mlx5_mr.c | 135 ++++++++++++++++++++++++++++++++++++++
drivers/net/mlx5/mlx5_rxtx.h | 5 ++
3 files changed, 142 insertions(+)
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index a913a5955f..ff17899cb3 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1626,6 +1626,8 @@ static struct rte_pci_driver mlx5_driver = {
.id_table = mlx5_pci_id_map,
.probe = mlx5_pci_probe,
.remove = mlx5_pci_remove,
+ .dma_map = mlx5_dma_map,
+ .dma_unmap = mlx5_dma_unmap,
.drv_flags = (RTE_PCI_DRV_INTR_LSC | RTE_PCI_DRV_INTR_RMV |
RTE_PCI_DRV_PROBE_AGAIN),
};
diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index 32be6a5445..4047a02bc0 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -14,6 +14,7 @@
#include <rte_mempool.h>
#include <rte_malloc.h>
#include <rte_rwlock.h>
+#include <rte_bus_pci.h>
#include "mlx5.h"
#include "mlx5_mr.h"
@@ -1215,6 +1216,140 @@ mlx5_mr_update_ext_mp_cb(struct rte_mempool *mp, void *opaque,
}
/**
+ * Finds the first ethdev that matches the PCI device.
+ * Multiple ethdevs per PCI device exist only with representors.
+ * In such a case, it is enough to get only one of the ports as they all
+ * share the same ibv context.
+ *
+ * @param pdev
+ * Pointer to the PCI device.
+ *
+ * @return
+ * Pointer to the ethdev if found, NULL otherwise.
+ */
+static struct rte_eth_dev *
+pci_dev_to_eth_dev(struct rte_pci_device *pdev)
+{
+ struct rte_dev_iterator it;
+ struct rte_device *dev;
+
+ /**
+ * We really need to iterate all devices regardless of
+ * their owner.
+ */
+ RTE_DEV_FOREACH(dev, "class=eth", &it)
+ if (dev == &pdev->device)
+ return it.class_device;
+ return NULL;
+}
+
+/**
+ * DPDK callback to DMA map external memory to a PCI device.
+ *
+ * @param pdev
+ * Pointer to the PCI device.
+ * @param addr
+ * Starting virtual address of memory to be mapped.
+ * @param iova
+ * Starting IOVA address of memory to be mapped.
+ * @param len
+ * Length of memory segment being mapped.
+ *
+ * @return
+ * 0 on success, negative value on error.
+ */
+int
+mlx5_dma_map(struct rte_pci_device *pdev, void *addr,
+ uint64_t iova __rte_unused, size_t len)
+{
+ struct rte_eth_dev *dev;
+ struct mlx5_mr *mr;
+ struct priv *priv;
+
+ dev = pci_dev_to_eth_dev(pdev);
+ if (!dev) {
+ DRV_LOG(WARNING, "unable to find matching ethdev "
+ "to PCI device %p", (void *)pdev);
+ return -1;
+ }
+ priv = dev->data->dev_private;
+ mr = mlx5_create_mr_ext(dev, (uintptr_t)addr, len, SOCKET_ID_ANY);
+ if (!mr) {
+ DRV_LOG(WARNING,
+ "port %u unable to dma map", dev->data->port_id);
+ return -1;
+ }
+ rte_rwlock_write_lock(&priv->mr.rwlock);
+ LIST_INSERT_HEAD(&priv->mr.mr_list, mr, mr);
+ /* Insert to the global cache table. */
+ mr_insert_dev_cache(dev, mr);
+ rte_rwlock_write_unlock(&priv->mr.rwlock);
+ return 0;
+}
+
+/**
+ * DPDK callback to DMA unmap external memory from a PCI device.
+ *
+ * @param pdev
+ * Pointer to the PCI device.
+ * @param addr
+ * Starting virtual address of memory to be unmapped.
+ * @param iova
+ * Starting IOVA address of memory to be unmapped.
+ * @param len
+ * Length of memory segment being unmapped.
+ *
+ * @return
+ * 0 on success, negative value on error.
+ */
+int
+mlx5_dma_unmap(struct rte_pci_device *pdev, void *addr,
+ uint64_t iova __rte_unused, size_t len __rte_unused)
+{
+ struct rte_eth_dev *dev;
+ struct priv *priv;
+ struct mlx5_mr *mr;
+ struct mlx5_mr_cache entry;
+
+ dev = pci_dev_to_eth_dev(pdev);
+ if (!dev) {
+ DRV_LOG(WARNING, "unable to find matching ethdev "
+ "to PCI device %p", (void *)pdev);
+ return -1;
+ }
+ priv = dev->data->dev_private;
+ rte_rwlock_read_lock(&priv->mr.rwlock);
+ mr = mr_lookup_dev_list(dev, &entry, (uintptr_t)addr);
+ if (!mr) {
+ DRV_LOG(WARNING, "address 0x%" PRIxPTR " wasn't registered "
+ "to PCI device %p", (uintptr_t)addr,
+ (void *)pdev);
+ rte_rwlock_read_unlock(&priv->mr.rwlock);
+ return -1;
+ }
+ LIST_REMOVE(mr, mr);
+ LIST_INSERT_HEAD(&priv->mr.mr_free_list, mr, mr);
+ DEBUG("port %u remove MR(%p) from list", dev->data->port_id,
+ (void *)mr);
+ mr_rebuild_dev_cache(dev);
+ /*
+ * Flush local caches by propagating invalidation across cores.
+ * rte_smp_wmb() is enough to synchronize this event. If one of
+ * freed memsegs is seen by other core, that means the memseg
+ * has been allocated by allocator, which will come after this
+ * free call. Therefore, this store instruction (incrementing
+ * generation below) will be guaranteed to be seen by other core
+ * before the core sees the newly allocated memory.
+ */
+ ++priv->mr.dev_gen;
+ DEBUG("broadcasting local cache flush, gen=%d",
+ priv->mr.dev_gen);
+ rte_smp_wmb();
+ rte_rwlock_read_unlock(&priv->mr.rwlock);
+ return 0;
+}
+
+/**
* Register MR for entire memory chunks in a Mempool having externally allocated
* memory and fill in local cache.
*
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index c2529f96bc..f3f84dbac3 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -28,6 +28,7 @@
#include <rte_atomic.h>
#include <rte_spinlock.h>
#include <rte_io.h>
+#include <rte_bus_pci.h>
#include "mlx5_utils.h"
#include "mlx5.h"
@@ -367,6 +368,10 @@ uint32_t mlx5_rx_addr2mr_bh(struct mlx5_rxq_data *rxq, uintptr_t addr);
uint32_t mlx5_tx_mb2mr_bh(struct mlx5_txq_data *txq, struct rte_mbuf *mb);
uint32_t mlx5_tx_update_ext_mp(struct mlx5_txq_data *txq, uintptr_t addr,
struct rte_mempool *mp);
+int mlx5_dma_map(struct rte_pci_device *pdev, void *addr, uint64_t iova,
+ size_t len);
+int mlx5_dma_unmap(struct rte_pci_device *pdev, void *addr, uint64_t iova,
+ size_t len);
/**
* Provide safe 64bit store operation to mlx5 UAR region for both 32bit and
--
2.12.0
^ permalink raw reply [flat|nested] 79+ messages in thread
* [dpdk-dev] [PATCH v2 6/6] doc: deprecate VFIO DMA map APIs
2019-02-13 9:10 [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for external memory Shahaf Shuler
` (12 preceding siblings ...)
2019-02-21 14:50 ` [dpdk-dev] [PATCH v2 5/6] net/mlx5: support PCI device DMA map and unmap Shahaf Shuler
@ 2019-02-21 14:50 ` Shahaf Shuler
2019-02-21 15:50 ` David Marchand
13 siblings, 1 reply; 79+ messages in thread
From: Shahaf Shuler @ 2019-02-21 14:50 UTC (permalink / raw)
To: anatoly.burakov, yskoh, thomas, ferruh.yigit, nhorman, gaetan.rivet; +Cc: dev
As those have been replaced by the rte_bus_dma_map and rte_bus_dma_unmap
APIs.
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
---
doc/guides/prog_guide/env_abstraction_layer.rst | 2 +-
doc/guides/rel_notes/deprecation.rst | 4 ++++
lib/librte_eal/common/include/rte_vfio.h | 6 ++++--
3 files changed, 9 insertions(+), 3 deletions(-)
diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
index 929d76dba7..6a1ddf8b4a 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -282,7 +282,7 @@ The expected workflow is as follows:
- If IOVA table is not specified, IOVA addresses will be assumed to be
unavailable
- Other processes must attach to the memory area before they can use it
-* Perform DMA mapping with ``rte_vfio_dma_map`` if needed
+* Perform DMA mapping with ``rte_bus_dma_map`` if needed
* Use the memory area in your application
* If memory area is no longer needed, it can be unregistered
- If the area was mapped for DMA, unmapping must be performed before
diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 1b4fcb7e64..f7ae0d56fb 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -35,6 +35,10 @@ Deprecation Notices
+ ``rte_eal_devargs_type_count``
+* vfio: removal of ``rte_vfio_dma_map`` and ``rte_vfio_dma_unmap`` APIs which
+ have been replaced with ``rte_bus_dma_map`` and ``rte_bus_dma_unmap``
+ functions. The removal is targeted for DPDK 19.08.
+
* pci: Several exposed functions are misnamed.
The following functions are deprecated starting from v17.11 and are replaced:
diff --git a/lib/librte_eal/common/include/rte_vfio.h b/lib/librte_eal/common/include/rte_vfio.h
index 54a0df5726..df139edea2 100644
--- a/lib/librte_eal/common/include/rte_vfio.h
+++ b/lib/librte_eal/common/include/rte_vfio.h
@@ -190,6 +190,7 @@ int
rte_vfio_clear_group(int vfio_group_fd);
/**
+ * @deprecated
* Map memory region for use with VFIO.
*
* @note Require at least one device to be attached at the time of
@@ -210,11 +211,12 @@ rte_vfio_clear_group(int vfio_group_fd);
* 0 if success.
* -1 on error.
*/
-int
+int __rte_deprecated
rte_vfio_dma_map(uint64_t vaddr, uint64_t iova, uint64_t len);
/**
+ * @deprecated
* Unmap memory region from VFIO.
*
* @param vaddr
@@ -231,7 +233,7 @@ rte_vfio_dma_map(uint64_t vaddr, uint64_t iova, uint64_t len);
* -1 on error.
*/
-int
+int __rte_deprecated
rte_vfio_dma_unmap(uint64_t vaddr, uint64_t iova, uint64_t len);
/**
* Parse IOMMU group number for a device
--
2.12.0
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: [dpdk-dev] [PATCH v2 6/6] doc: deprecate VFIO DMA map APIs
2019-02-21 14:50 ` [dpdk-dev] [PATCH v2 6/6] doc: deprecate VFIO DMA map APIs Shahaf Shuler
@ 2019-02-21 15:50 ` David Marchand
0 siblings, 0 replies; 79+ messages in thread
From: David Marchand @ 2019-02-21 15:50 UTC (permalink / raw)
To: Shahaf Shuler
Cc: Burakov, Anatoly, Yongseok Koh, Thomas Monjalon, Yigit, Ferruh,
Neil Horman, Gaetan Rivet, dev, Kevin Traynor
On Thu, Feb 21, 2019 at 3:51 PM Shahaf Shuler <shahafs@mellanox.com> wrote:
> As those have been replaced by the rte_bus_dma_map and rte_bus_dma_unmap
> APIs.
>
> Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
> ---
> doc/guides/prog_guide/env_abstraction_layer.rst | 2 +-
> doc/guides/rel_notes/deprecation.rst | 4 ++++
> lib/librte_eal/common/include/rte_vfio.h | 6 ++++--
> 3 files changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst
> b/doc/guides/prog_guide/env_abstraction_layer.rst
> index 929d76dba7..6a1ddf8b4a 100644
> --- a/doc/guides/prog_guide/env_abstraction_layer.rst
> +++ b/doc/guides/prog_guide/env_abstraction_layer.rst
> @@ -282,7 +282,7 @@ The expected workflow is as follows:
> - If IOVA table is not specified, IOVA addresses will be assumed to be
> unavailable
> - Other processes must attach to the memory area before they can use
> it
> -* Perform DMA mapping with ``rte_vfio_dma_map`` if needed
> +* Perform DMA mapping with ``rte_bus_dma_map`` if needed
> * Use the memory area in your application
> * If memory area is no longer needed, it can be unregistered
> - If the area was mapped for DMA, unmapping must be performed before
> diff --git a/doc/guides/rel_notes/deprecation.rst
> b/doc/guides/rel_notes/deprecation.rst
> index 1b4fcb7e64..f7ae0d56fb 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -35,6 +35,10 @@ Deprecation Notices
>
> + ``rte_eal_devargs_type_count``
>
> +* vfio: removal of ``rte_vfio_dma_map`` and ``rte_vfio_dma_unmap`` APIs
> which
> + have been replaced with ``rte_bus_dma_map`` and ``rte_bus_dma_unmap``
> + functions. The removal is targeted for DPDK 19.08.
> +
> * pci: Several exposed functions are misnamed.
> The following functions are deprecated starting from v17.11 and are
> replaced:
>
> diff --git a/lib/librte_eal/common/include/rte_vfio.h
> b/lib/librte_eal/common/include/rte_vfio.h
> index 54a0df5726..df139edea2 100644
> --- a/lib/librte_eal/common/include/rte_vfio.h
> +++ b/lib/librte_eal/common/include/rte_vfio.h
> @@ -190,6 +190,7 @@ int
> rte_vfio_clear_group(int vfio_group_fd);
>
> /**
> + * @deprecated
> * Map memory region for use with VFIO.
> *
> * @note Require at least one device to be attached at the time of
> @@ -210,11 +211,12 @@ rte_vfio_clear_group(int vfio_group_fd);
> * 0 if success.
> * -1 on error.
> */
> -int
> +int __rte_deprecated
> rte_vfio_dma_map(uint64_t vaddr, uint64_t iova, uint64_t len);
>
>
> /**
> + * @deprecated
> * Unmap memory region from VFIO.
> *
> * @param vaddr
> @@ -231,7 +233,7 @@ rte_vfio_dma_map(uint64_t vaddr, uint64_t iova,
> uint64_t len);
> * -1 on error.
> */
>
> -int
> +int __rte_deprecated
> rte_vfio_dma_unmap(uint64_t vaddr, uint64_t iova, uint64_t len);
> /**
> * Parse IOMMU group number for a device
> --
> 2.12.0
>
>
I don't know users of such APIs, but you can't mark rte_vfio_dma_map as
deprecated now.
There is no stable alternative, rte_bus_dma_map() is experimental.
This has been discussed and some patches are in progress about it.
Last version: http://patchwork.dpdk.org/patch/50040/
--
David Marchand
^ permalink raw reply [flat|nested] 79+ messages in thread