* [dpdk-stable] [PATCH 2/2] vfio: fix partial DMA unmapping for VFIO type1 [not found] <20201012081106.10610-1-ndabilpuram@marvell.com> @ 2020-10-12 8:11 ` Nithin Dabilpuram 2020-10-14 15:07 ` Burakov, Anatoly [not found] ` <20201105090423.11954-1-ndabilpuram@marvell.com> ` (6 subsequent siblings) 7 siblings, 1 reply; 43+ messages in thread From: Nithin Dabilpuram @ 2020-10-12 8:11 UTC (permalink / raw) To: anatoly.burakov; +Cc: jerinj, dev, Nithin Dabilpuram, stable Partial unmapping is not supported for VFIO IOMMU type1 by kernel. Though kernel gives return as zero, the unmapped size returned will not be same as expected. So check for returned unmap size and return error. For case of DMA map/unmap triggered by heap allocations, maintain granularity of memseg page size so that heap expansion and contraction does not have this issue. For user requested DMA map/unmap disallow partial unmapping for VFIO type1. Fixes: 73a639085938 ("vfio: allow to map other memory regions") Cc: anatoly.burakov@intel.com Cc: stable@dpdk.org Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com> --- lib/librte_eal/linux/eal_vfio.c | 34 ++++++++++++++++++++++++++++------ lib/librte_eal/linux/eal_vfio.h | 1 + 2 files changed, 29 insertions(+), 6 deletions(-) diff --git a/lib/librte_eal/linux/eal_vfio.c b/lib/librte_eal/linux/eal_vfio.c index d26e164..ef95259 100644 --- a/lib/librte_eal/linux/eal_vfio.c +++ b/lib/librte_eal/linux/eal_vfio.c @@ -69,6 +69,7 @@ static const struct vfio_iommu_type iommu_types[] = { { .type_id = RTE_VFIO_TYPE1, .name = "Type 1", + .partial_unmap = false, .dma_map_func = &vfio_type1_dma_map, .dma_user_map_func = &vfio_type1_dma_mem_map }, @@ -76,6 +77,7 @@ static const struct vfio_iommu_type iommu_types[] = { { .type_id = RTE_VFIO_SPAPR, .name = "sPAPR", + .partial_unmap = true, .dma_map_func = &vfio_spapr_dma_map, .dma_user_map_func = &vfio_spapr_dma_mem_map }, @@ -83,6 +85,7 @@ static const struct vfio_iommu_type iommu_types[] = { { .type_id = 
RTE_VFIO_NOIOMMU, .name = "No-IOMMU", + .partial_unmap = true, .dma_map_func = &vfio_noiommu_dma_map, .dma_user_map_func = &vfio_noiommu_dma_mem_map }, @@ -525,12 +528,19 @@ vfio_mem_event_callback(enum rte_mem_event type, const void *addr, size_t len, /* for IOVA as VA mode, no need to care for IOVA addresses */ if (rte_eal_iova_mode() == RTE_IOVA_VA && msl->external == 0) { uint64_t vfio_va = (uint64_t)(uintptr_t)addr; - if (type == RTE_MEM_EVENT_ALLOC) - vfio_dma_mem_map(default_vfio_cfg, vfio_va, vfio_va, - len, 1); - else - vfio_dma_mem_map(default_vfio_cfg, vfio_va, vfio_va, - len, 0); + uint64_t page_sz = msl->page_sz; + + /* Maintain granularity of DMA map/unmap to memseg size */ + for (; cur_len < len; cur_len += page_sz) { + if (type == RTE_MEM_EVENT_ALLOC) + vfio_dma_mem_map(default_vfio_cfg, vfio_va, + vfio_va, page_sz, 1); + else + vfio_dma_mem_map(default_vfio_cfg, vfio_va, + vfio_va, page_sz, 0); + vfio_va += page_sz; + } + return; } @@ -1383,6 +1393,12 @@ vfio_type1_dma_mem_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova, RTE_LOG(ERR, EAL, " cannot clear DMA remapping, error %i (%s)\n", errno, strerror(errno)); return -1; + } else if (dma_unmap.size != len) { + RTE_LOG(ERR, EAL, " unexpected size %"PRIu64" of DMA " + "remapping cleared instead of %"PRIu64"\n", + (uint64_t)dma_unmap.size, len); + rte_errno = EIO; + return -1; } } @@ -1853,6 +1869,12 @@ container_dma_unmap(struct vfio_config *vfio_cfg, uint64_t vaddr, uint64_t iova, /* we're partially unmapping a previously mapped region, so we * need to split entry into two. 
*/ + if (!vfio_cfg->vfio_iommu_type->partial_unmap) { + RTE_LOG(DEBUG, EAL, "DMA partial unmap unsupported\n"); + rte_errno = ENOTSUP; + ret = -1; + goto out; + } if (user_mem_maps->n_maps == VFIO_MAX_USER_MEM_MAPS) { RTE_LOG(ERR, EAL, "Not enough space to store partial mapping\n"); rte_errno = ENOMEM; diff --git a/lib/librte_eal/linux/eal_vfio.h b/lib/librte_eal/linux/eal_vfio.h index cb2d35f..6ebaca6 100644 --- a/lib/librte_eal/linux/eal_vfio.h +++ b/lib/librte_eal/linux/eal_vfio.h @@ -113,6 +113,7 @@ typedef int (*vfio_dma_user_func_t)(int fd, uint64_t vaddr, uint64_t iova, struct vfio_iommu_type { int type_id; const char *name; + bool partial_unmap; vfio_dma_user_func_t dma_user_map_func; vfio_dma_func_t dma_map_func; }; -- 2.8.4 ^ permalink raw reply [flat|nested] 43+ messages in thread
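The heart of the type1 fix above is treating a short unmap as a failure even when VFIO_IOMMU_UNMAP_DMA itself returns 0. A minimal, self-contained sketch of that check (plain C that models the ioctl result rather than calling VFIO; the struct and helper names here are illustrative, not DPDK API):

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Model of a VFIO_IOMMU_UNMAP_DMA outcome: on IOMMU type1 the ioctl can
 * succeed (ret == 0) yet report fewer unmapped bytes than requested when
 * the request would split an existing mapping. */
struct unmap_result {
	int ret;           /* ioctl() return value */
	uint64_t unmapped; /* dma_unmap.size reported back by the kernel */
};

/* Illustrative helper mirroring the patch's logic: succeed only if the
 * kernel unmapped exactly the requested length; otherwise report EIO,
 * as the patch does via rte_errno. Returns 0 on success, -1 on failure. */
static int check_unmap(const struct unmap_result *res, uint64_t len, int *err)
{
	if (res->ret != 0) {
		*err = EINVAL; /* the ioctl itself failed */
		return -1;
	}
	if (res->unmapped != len) {
		*err = EIO; /* silent partial unmap on type1 */
		return -1;
	}
	*err = 0;
	return 0;
}
```

A caller that checks only the ioctl return value misses the partial-unmap case entirely, which is exactly the gap the patch closes.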
* Re: [dpdk-stable] [PATCH 2/2] vfio: fix partial DMA unmapping for VFIO type1 2020-10-12 8:11 ` [dpdk-stable] [PATCH 2/2] vfio: fix partial DMA unmapping for VFIO type1 Nithin Dabilpuram @ 2020-10-14 15:07 ` Burakov, Anatoly 2020-10-15 6:09 ` [dpdk-stable] [EXT] " Nithin Dabilpuram 0 siblings, 1 reply; 43+ messages in thread From: Burakov, Anatoly @ 2020-10-14 15:07 UTC (permalink / raw) To: Nithin Dabilpuram; +Cc: jerinj, dev, stable On 12-Oct-20 9:11 AM, Nithin Dabilpuram wrote: > Partial unmapping is not supported for VFIO IOMMU type1 > by kernel. Though kernel gives return as zero, the unmapped size > returned will not be same as expected. So check for > returned unmap size and return error. > > For case of DMA map/unmap triggered by heap allocations, > maintain granularity of memseg page size so that heap > expansion and contraction does not have this issue. This is quite unfortunate, because there was a different bug that had to do with kernel having a very limited number of mappings available [1], as a result of which the page concatenation code was added. It should therefore be documented that the dma_entry_limit parameter should be adjusted should the user run out of the DMA entries. [1] https://lore.kernel.org/lkml/155414977872.12780.13728555131525362206.stgit@gimli.home/T/ > > For user requested DMA map/unmap disallow partial unmapping > for VFIO type1. 
> > Fixes: 73a639085938 ("vfio: allow to map other memory regions") > Cc: anatoly.burakov@intel.com > Cc: stable@dpdk.org > > Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com> > --- > lib/librte_eal/linux/eal_vfio.c | 34 ++++++++++++++++++++++++++++------ > lib/librte_eal/linux/eal_vfio.h | 1 + > 2 files changed, 29 insertions(+), 6 deletions(-) > > diff --git a/lib/librte_eal/linux/eal_vfio.c b/lib/librte_eal/linux/eal_vfio.c > index d26e164..ef95259 100644 > --- a/lib/librte_eal/linux/eal_vfio.c > +++ b/lib/librte_eal/linux/eal_vfio.c > @@ -69,6 +69,7 @@ static const struct vfio_iommu_type iommu_types[] = { > { > .type_id = RTE_VFIO_TYPE1, > .name = "Type 1", > + .partial_unmap = false, > .dma_map_func = &vfio_type1_dma_map, > .dma_user_map_func = &vfio_type1_dma_mem_map > }, > @@ -76,6 +77,7 @@ static const struct vfio_iommu_type iommu_types[] = { > { > .type_id = RTE_VFIO_SPAPR, > .name = "sPAPR", > + .partial_unmap = true, > .dma_map_func = &vfio_spapr_dma_map, > .dma_user_map_func = &vfio_spapr_dma_mem_map > }, > @@ -83,6 +85,7 @@ static const struct vfio_iommu_type iommu_types[] = { > { > .type_id = RTE_VFIO_NOIOMMU, > .name = "No-IOMMU", > + .partial_unmap = true, > .dma_map_func = &vfio_noiommu_dma_map, > .dma_user_map_func = &vfio_noiommu_dma_mem_map > }, > @@ -525,12 +528,19 @@ vfio_mem_event_callback(enum rte_mem_event type, const void *addr, size_t len, > /* for IOVA as VA mode, no need to care for IOVA addresses */ > if (rte_eal_iova_mode() == RTE_IOVA_VA && msl->external == 0) { > uint64_t vfio_va = (uint64_t)(uintptr_t)addr; > - if (type == RTE_MEM_EVENT_ALLOC) > - vfio_dma_mem_map(default_vfio_cfg, vfio_va, vfio_va, > - len, 1); > - else > - vfio_dma_mem_map(default_vfio_cfg, vfio_va, vfio_va, > - len, 0); > + uint64_t page_sz = msl->page_sz; > + > + /* Maintain granularity of DMA map/unmap to memseg size */ > + for (; cur_len < len; cur_len += page_sz) { > + if (type == RTE_MEM_EVENT_ALLOC) > + vfio_dma_mem_map(default_vfio_cfg, vfio_va, 
> + vfio_va, page_sz, 1); > + else > + vfio_dma_mem_map(default_vfio_cfg, vfio_va, > + vfio_va, page_sz, 0); > + vfio_va += page_sz; > + } > + You'd also have to revert d1c7c0cdf7bac5eb40d3a2a690453aefeee5887b because currently the PA path will opportunistically concatenate contiguous segments into a single mapping too. > return; > } > > @@ -1383,6 +1393,12 @@ vfio_type1_dma_mem_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova, > RTE_LOG(ERR, EAL, " cannot clear DMA remapping, error %i (%s)\n", > errno, strerror(errno)); > return -1; > + } else if (dma_unmap.size != len) { > + RTE_LOG(ERR, EAL, " unexpected size %"PRIu64" of DMA " > + "remapping cleared instead of %"PRIu64"\n", > + (uint64_t)dma_unmap.size, len); > + rte_errno = EIO; > + return -1; > } > } > > @@ -1853,6 +1869,12 @@ container_dma_unmap(struct vfio_config *vfio_cfg, uint64_t vaddr, uint64_t iova, > /* we're partially unmapping a previously mapped region, so we > * need to split entry into two. > */ > + if (!vfio_cfg->vfio_iommu_type->partial_unmap) { > + RTE_LOG(DEBUG, EAL, "DMA partial unmap unsupported\n"); > + rte_errno = ENOTSUP; > + ret = -1; > + goto out; > + } How would we ever arrive here if we never do more than 1 page worth of memory anyway? I don't think this is needed. > if (user_mem_maps->n_maps == VFIO_MAX_USER_MEM_MAPS) { > RTE_LOG(ERR, EAL, "Not enough space to store partial mapping\n"); > rte_errno = ENOMEM; > diff --git a/lib/librte_eal/linux/eal_vfio.h b/lib/librte_eal/linux/eal_vfio.h > index cb2d35f..6ebaca6 100644 > --- a/lib/librte_eal/linux/eal_vfio.h > +++ b/lib/librte_eal/linux/eal_vfio.h > @@ -113,6 +113,7 @@ typedef int (*vfio_dma_user_func_t)(int fd, uint64_t vaddr, uint64_t iova, > struct vfio_iommu_type { > int type_id; > const char *name; > + bool partial_unmap; > vfio_dma_user_func_t dma_user_map_func; > vfio_dma_func_t dma_map_func; > }; > -- Thanks, Anatoly ^ permalink raw reply [flat|nested] 43+ messages in thread
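The per-page loop under discussion trades mapping count for unmap flexibility: one DMA entry per memseg page means a later contraction can always unmap exactly the pages it covers, at the cost of consuming more of the kernel's dma_entry_limit. A self-contained sketch of that loop (the callback stands in for vfio_dma_mem_map(); all names here are illustrative, not the actual EAL code):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*dma_op_t)(uint64_t va, uint64_t len, int do_map, void *ctx);

/* Issue one map/unmap operation per page of [va, va + len) instead of a
 * single operation covering the whole region, as the patch does for heap
 * alloc/free events. */
static void dma_op_per_page(uint64_t va, uint64_t len, uint64_t page_sz,
			    int do_map, dma_op_t op, void *ctx)
{
	uint64_t cur_len;

	for (cur_len = 0; cur_len < len; cur_len += page_sz) {
		op(va, page_sz, do_map, ctx);
		va += page_sz;
	}
}

/* Small demonstration callback that just counts invocations. */
static void count_op(uint64_t va, uint64_t len, int do_map, void *ctx)
{
	(void)va; (void)len; (void)do_map;
	(*(int *)ctx)++;
}
```

Mapping a 4-page region this way creates four kernel DMA entries where the concatenating code created one, which is why the vfio_iommu_type1 dma_entry_limit module parameter may need raising, as noted above.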
* Re: [dpdk-stable] [EXT] Re: [PATCH 2/2] vfio: fix partial DMA unmapping for VFIO type1 2020-10-14 15:07 ` Burakov, Anatoly @ 2020-10-15 6:09 ` Nithin Dabilpuram 2020-10-15 10:00 ` Burakov, Anatoly 0 siblings, 1 reply; 43+ messages in thread From: Nithin Dabilpuram @ 2020-10-15 6:09 UTC (permalink / raw) To: Burakov, Anatoly; +Cc: jerinj, dev, stable On Wed, Oct 14, 2020 at 04:07:10PM +0100, Burakov, Anatoly wrote: > External Email > > ---------------------------------------------------------------------- > On 12-Oct-20 9:11 AM, Nithin Dabilpuram wrote: > > Partial unmapping is not supported for VFIO IOMMU type1 > > by kernel. Though kernel gives return as zero, the unmapped size > > returned will not be same as expected. So check for > > returned unmap size and return error. > > > > For case of DMA map/unmap triggered by heap allocations, > > maintain granularity of memseg page size so that heap > > expansion and contraction does not have this issue. > > This is quite unfortunate, because there was a different bug that had to do > with kernel having a very limited number of mappings available [1], as a > result of which the page concatenation code was added. > > It should therefore be documented that the dma_entry_limit parameter should > be adjusted should the user run out of the DMA entries. > > [1] https://lore.kernel.org/lkml/155414977872.12780.13728555131525362206.stgit@gimli.home/T/ Ack, I'll document it in guides/linux_gsg/linux_drivers.rst in vfio section. > > > > > For user requested DMA map/unmap disallow partial unmapping > > for VFIO type1.
> > > > Fixes: 73a639085938 ("vfio: allow to map other memory regions") > > Cc: anatoly.burakov@intel.com > > Cc: stable@dpdk.org > > > > Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com> > > --- > > lib/librte_eal/linux/eal_vfio.c | 34 ++++++++++++++++++++++++++++------ > > lib/librte_eal/linux/eal_vfio.h | 1 + > > 2 files changed, 29 insertions(+), 6 deletions(-) > > > > diff --git a/lib/librte_eal/linux/eal_vfio.c b/lib/librte_eal/linux/eal_vfio.c > > index d26e164..ef95259 100644 > > --- a/lib/librte_eal/linux/eal_vfio.c > > +++ b/lib/librte_eal/linux/eal_vfio.c > > @@ -69,6 +69,7 @@ static const struct vfio_iommu_type iommu_types[] = { > > { > > .type_id = RTE_VFIO_TYPE1, > > .name = "Type 1", > > + .partial_unmap = false, > > .dma_map_func = &vfio_type1_dma_map, > > .dma_user_map_func = &vfio_type1_dma_mem_map > > }, > > @@ -76,6 +77,7 @@ static const struct vfio_iommu_type iommu_types[] = { > > { > > .type_id = RTE_VFIO_SPAPR, > > .name = "sPAPR", > > + .partial_unmap = true, > > .dma_map_func = &vfio_spapr_dma_map, > > .dma_user_map_func = &vfio_spapr_dma_mem_map > > }, > > @@ -83,6 +85,7 @@ static const struct vfio_iommu_type iommu_types[] = { > > { > > .type_id = RTE_VFIO_NOIOMMU, > > .name = "No-IOMMU", > > + .partial_unmap = true, > > .dma_map_func = &vfio_noiommu_dma_map, > > .dma_user_map_func = &vfio_noiommu_dma_mem_map > > }, > > @@ -525,12 +528,19 @@ vfio_mem_event_callback(enum rte_mem_event type, const void *addr, size_t len, > > /* for IOVA as VA mode, no need to care for IOVA addresses */ > > if (rte_eal_iova_mode() == RTE_IOVA_VA && msl->external == 0) { > > uint64_t vfio_va = (uint64_t)(uintptr_t)addr; > > - if (type == RTE_MEM_EVENT_ALLOC) > > - vfio_dma_mem_map(default_vfio_cfg, vfio_va, vfio_va, > > - len, 1); > > - else > > - vfio_dma_mem_map(default_vfio_cfg, vfio_va, vfio_va, > > - len, 0); > > + uint64_t page_sz = msl->page_sz; > > + > > + /* Maintain granularity of DMA map/unmap to memseg size */ > > + for (; cur_len < len; 
cur_len += page_sz) { > > + if (type == RTE_MEM_EVENT_ALLOC) > > + vfio_dma_mem_map(default_vfio_cfg, vfio_va, > > + vfio_va, page_sz, 1); > > + else > > + vfio_dma_mem_map(default_vfio_cfg, vfio_va, > > + vfio_va, page_sz, 0); > > + vfio_va += page_sz; > > + } > > + > > You'd also have to revert d1c7c0cdf7bac5eb40d3a2a690453aefeee5887b because > currently the PA path will opportunistically concatenate contiguous > segments into a single mapping too. Ack, I'll change it even for IOVA as PA mode. I missed that. > > > return; > > } > > @@ -1383,6 +1393,12 @@ vfio_type1_dma_mem_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova, > > RTE_LOG(ERR, EAL, " cannot clear DMA remapping, error %i (%s)\n", > > errno, strerror(errno)); > > return -1; > > + } else if (dma_unmap.size != len) { > > + RTE_LOG(ERR, EAL, " unexpected size %"PRIu64" of DMA " > > + "remapping cleared instead of %"PRIu64"\n", > > + (uint64_t)dma_unmap.size, len); > > + rte_errno = EIO; > > + return -1; > > } > > } > > @@ -1853,6 +1869,12 @@ container_dma_unmap(struct vfio_config *vfio_cfg, uint64_t vaddr, uint64_t iova, > > /* we're partially unmapping a previously mapped region, so we > > * need to split entry into two. > > */ > > + if (!vfio_cfg->vfio_iommu_type->partial_unmap) { > > + RTE_LOG(DEBUG, EAL, "DMA partial unmap unsupported\n"); > > + rte_errno = ENOTSUP; > > + ret = -1; > > + goto out; > > + } > > How would we ever arrive here if we never do more than 1 page worth of > memory anyway? I don't think this is needed. container_dma_unmap() is called by the user via rte_vfio_container_dma_unmap() and when he maps we don't split it as we don't know about his memory. So if he maps multiple pages and tries to unmap partially, then we should fail.
> > > if (user_mem_maps->n_maps == VFIO_MAX_USER_MEM_MAPS) { > > RTE_LOG(ERR, EAL, "Not enough space to store partial mapping\n"); > > rte_errno = ENOMEM; > > diff --git a/lib/librte_eal/linux/eal_vfio.h b/lib/librte_eal/linux/eal_vfio.h > > index cb2d35f..6ebaca6 100644 > > --- a/lib/librte_eal/linux/eal_vfio.h > > +++ b/lib/librte_eal/linux/eal_vfio.h > > @@ -113,6 +113,7 @@ typedef int (*vfio_dma_user_func_t)(int fd, uint64_t vaddr, uint64_t iova, > > struct vfio_iommu_type { > > int type_id; > > const char *name; > > + bool partial_unmap; > > vfio_dma_user_func_t dma_user_map_func; > > vfio_dma_func_t dma_map_func; > > }; > > > > > -- > Thanks, > Anatoly ^ permalink raw reply [flat|nested] 43+ messages in thread
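The point being made above: user mappings created via rte_vfio_container_dma_map() are stored as single entries of unknown page size, so on type1 an unmap request can only be honoured if it matches a stored entry exactly. A self-contained model of that validation (hypothetical struct and function for illustration, not the actual EAL code):

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical model of a stored user DMA mapping, standing in for the
 * entries the EAL keeps for rte_vfio_container_dma_map() requests. */
struct user_map {
	uint64_t iova;
	uint64_t len;
};

/* On IOMMU type1 the kernel cannot split an existing mapping, so an
 * unmap request against a stored entry is only valid if it matches the
 * entry exactly; a strict sub-range would need a split and is refused. */
static int validate_unmap_type1(const struct user_map *m,
				uint64_t iova, uint64_t len)
{
	if (iova == m->iova && len == m->len)
		return 0;        /* exact match: fine */
	if (iova >= m->iova && iova + len <= m->iova + m->len)
		return -ENOTSUP; /* partial unmap: unsupported on type1 */
	return -ENOENT;          /* does not correspond to this entry */
}
```

An exact-match request succeeds; a strict sub-range would require the kernel to split the mapping, which type1 cannot do, hence the ENOTSUP in the patch.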
* Re: [dpdk-stable] [EXT] Re: [PATCH 2/2] vfio: fix partial DMA unmapping for VFIO type1 2020-10-15 6:09 ` [dpdk-stable] [EXT] " Nithin Dabilpuram @ 2020-10-15 10:00 ` Burakov, Anatoly 2020-10-15 11:38 ` Nithin Dabilpuram ` (2 more replies) 0 siblings, 3 replies; 43+ messages in thread From: Burakov, Anatoly @ 2020-10-15 10:00 UTC (permalink / raw) To: Nithin Dabilpuram; +Cc: jerinj, dev, stable On 15-Oct-20 7:09 AM, Nithin Dabilpuram wrote: > On Wed, Oct 14, 2020 at 04:07:10PM +0100, Burakov, Anatoly wrote: >> External Email >> >> ---------------------------------------------------------------------- >> On 12-Oct-20 9:11 AM, Nithin Dabilpuram wrote: >>> Partial unmapping is not supported for VFIO IOMMU type1 >>> by kernel. Though kernel gives return as zero, the unmapped size >>> returned will not be same as expected. So check for >>> returned unmap size and return error. >>> >>> For case of DMA map/unmap triggered by heap allocations, >>> maintain granularity of memseg page size so that heap >>> expansion and contraction does not have this issue. >> >> This is quite unfortunate, because there was a different bug that had to do >> with kernel having a very limited number of mappings available [1], as a >> result of which the page concatenation code was added. >> >> It should therefore be documented that the dma_entry_limit parameter should >> be adjusted should the user run out of the DMA entries. 
>> >> [1] https://lore.kernel.org/lkml/155414977872.12780.13728555131525362206.stgit@gimli.home/T/ <snip> >>> RTE_LOG(ERR, EAL, " cannot clear DMA remapping, error %i (%s)\n", >>> errno, strerror(errno)); >>> return -1; >>> + } else if (dma_unmap.size != len) { >>> + RTE_LOG(ERR, EAL, " unexpected size %"PRIu64" of DMA " >>> + "remapping cleared instead of %"PRIu64"\n", >>> + (uint64_t)dma_unmap.size, len); >>> + rte_errno = EIO; >>> + return -1; >>> } >>> } >>> @@ -1853,6 +1869,12 @@ container_dma_unmap(struct vfio_config *vfio_cfg, uint64_t vaddr, uint64_t iova, >>> /* we're partially unmapping a previously mapped region, so we >>> * need to split entry into two. >>> */ >>> + if (!vfio_cfg->vfio_iommu_type->partial_unmap) { >>> + RTE_LOG(DEBUG, EAL, "DMA partial unmap unsupported\n"); >>> + rte_errno = ENOTSUP; >>> + ret = -1; >>> + goto out; >>> + } >> >> How would we ever arrive here if we never do more than 1 page worth of >> memory anyway? I don't think this is needed. > > container_dma_unmap() is called by user via rte_vfio_container_dma_unmap() > and when he maps we don't split it as we don't about his memory. > So if he maps multiple pages and tries to unmap partially, then we should fail. Should we map it in page granularity then, instead of adding this discrepancy between EAL and user mapping? I.e. instead of adding a workaround, how about we just do the same thing for user mem mappings? -- Thanks, Anatoly ^ permalink raw reply [flat|nested] 43+ messages in thread
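For contrast, on an IOMMU that does support partial unmap (sPAPR in this patch), the EAL splits the stored entry into up to two remainders around the unmapped range. That split arithmetic can be sketched in isolation (illustrative names and no VFIO calls; the unmap range is assumed to lie within the entry, as the surrounding code checks):

```c
#include <assert.h>
#include <stdint.h>

struct range { uint64_t start, len; };

/* Returns how many remainder ranges survive (0, 1 or 2) after removing
 * [u_start, u_start + u_len) from the entry [e_start, e_start + e_len).
 * Unmapping the middle of an entry leaves a head and a tail remainder. */
static int split_entry(uint64_t e_start, uint64_t e_len,
		       uint64_t u_start, uint64_t u_len,
		       struct range out[2])
{
	int n = 0;

	if (u_start > e_start) { /* head remainder before the unmap */
		out[n].start = e_start;
		out[n].len = u_start - e_start;
		n++;
	}
	if (u_start + u_len < e_start + e_len) { /* tail remainder after it */
		out[n].start = u_start + u_len;
		out[n].len = (e_start + e_len) - (u_start + u_len);
		n++;
	}
	return n;
}
```

This is the "split entry into two" path guarded by the new partial_unmap flag: on type1 it is unreachable, since the kernel would not have honoured the partial unmap in the first place.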
* Re: [dpdk-stable] [EXT] Re: [PATCH 2/2] vfio: fix partial DMA unmapping for VFIO type1 2020-10-15 10:00 ` Burakov, Anatoly @ 2020-10-15 11:38 ` Nithin Dabilpuram 2020-10-15 11:50 ` Nithin Dabilpuram 2020-10-15 11:57 ` [dpdk-stable] [dpdk-dev] " Nithin Dabilpuram 2 siblings, 0 replies; 43+ messages in thread From: Nithin Dabilpuram @ 2020-10-15 11:38 UTC (permalink / raw) To: Burakov, Anatoly; +Cc: jerinj, dev, stable On Thu, Oct 15, 2020 at 11:00:59AM +0100, Burakov, Anatoly wrote: > On 15-Oct-20 7:09 AM, Nithin Dabilpuram wrote: > > On Wed, Oct 14, 2020 at 04:07:10PM +0100, Burakov, Anatoly wrote: > > > External Email > > > > > > ---------------------------------------------------------------------- > > > On 12-Oct-20 9:11 AM, Nithin Dabilpuram wrote: > > > > Partial unmapping is not supported for VFIO IOMMU type1 > > > > by kernel. Though kernel gives return as zero, the unmapped size > > > > returned will not be same as expected. So check for > > > > returned unmap size and return error. > > > > > > > > For case of DMA map/unmap triggered by heap allocations, > > > > maintain granularity of memseg page size so that heap > > > > expansion and contraction does not have this issue. > > > > > > This is quite unfortunate, because there was a different bug that had to do > > > with kernel having a very limited number of mappings available [1], as a > > > result of which the page concatenation code was added. > > > > > > It should therefore be documented that the dma_entry_limit parameter should > > > be adjusted should the user run out of the DMA entries. 
> > > > > > [1] https://urldefense.proofpoint.com/v2/url?u=https-3A__lore.kernel.org_lkml_155414977872.12780.13728555131525362206.stgit-40gimli.home_T_&d=DwICaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=FZ_tPCbgFOh18zwRPO9H0yDx8VW38vuapifdDfc8SFQ&m=3GMg-634_cdUCY4WpQPwjzZ_S4ckuMHOnt2FxyyjXMk&s=TJLzppkaDS95VGyRHX2hzflQfb9XLK0OiOszSXoeXKk&e= > > <snip> > > > > > RTE_LOG(ERR, EAL, " cannot clear DMA remapping, error %i (%s)\n", > > > > errno, strerror(errno)); > > > > return -1; > > > > + } else if (dma_unmap.size != len) { > > > > + RTE_LOG(ERR, EAL, " unexpected size %"PRIu64" of DMA " > > > > + "remapping cleared instead of %"PRIu64"\n", > > > > + (uint64_t)dma_unmap.size, len); > > > > + rte_errno = EIO; > > > > + return -1; > > > > } > > > > } > > > > @@ -1853,6 +1869,12 @@ container_dma_unmap(struct vfio_config *vfio_cfg, uint64_t vaddr, uint64_t iova, > > > > /* we're partially unmapping a previously mapped region, so we > > > > * need to split entry into two. > > > > */ > > > > + if (!vfio_cfg->vfio_iommu_type->partial_unmap) { > > > > + RTE_LOG(DEBUG, EAL, "DMA partial unmap unsupported\n"); > > > > + rte_errno = ENOTSUP; > > > > + ret = -1; > > > > + goto out; > > > > + } > > > > > > How would we ever arrive here if we never do more than 1 page worth of > > > memory anyway? I don't think this is needed. > > > > container_dma_unmap() is called by user via rte_vfio_container_dma_unmap() > > and when he maps we don't split it as we don't about his memory. > > So if he maps multiple pages and tries to unmap partially, then we should fail. > > Should we map it in page granularity then, instead of adding this > discrepancy between EAL and user mapping? I.e. instead of adding a > workaround, how about we just do the same thing for user mem mappings? In heap mapping's we map and unmap it at huge page granularity as we will always maintain that. But here I think we don't know if user's allocation is huge page or collection of system pages. 
The only thing we can do here is map it at system page granularity, which could waste entries if, say, he really is working with hugepages. Isn't it? > > -- > Thanks, > Anatoly ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [dpdk-stable] [EXT] Re: [PATCH 2/2] vfio: fix partial DMA unmapping for VFIO type1 2020-10-15 10:00 ` Burakov, Anatoly 2020-10-15 11:38 ` Nithin Dabilpuram @ 2020-10-15 11:50 ` Nithin Dabilpuram 2020-10-15 11:57 ` [dpdk-stable] [dpdk-dev] " Nithin Dabilpuram 2 siblings, 0 replies; 43+ messages in thread From: Nithin Dabilpuram @ 2020-10-15 11:50 UTC (permalink / raw) To: Burakov, Anatoly; +Cc: jerinj, dev, stable On Thu, Oct 15, 2020 at 11:00:59AM +0100, Burakov, Anatoly wrote: > On 15-Oct-20 7:09 AM, Nithin Dabilpuram wrote: > > On Wed, Oct 14, 2020 at 04:07:10PM +0100, Burakov, Anatoly wrote: > > > External Email > > > > > > ---------------------------------------------------------------------- > > > On 12-Oct-20 9:11 AM, Nithin Dabilpuram wrote: > > > > Partial unmapping is not supported for VFIO IOMMU type1 > > > > by kernel. Though kernel gives return as zero, the unmapped size > > > > returned will not be same as expected. So check for > > > > returned unmap size and return error. > > > > > > > > For case of DMA map/unmap triggered by heap allocations, > > > > maintain granularity of memseg page size so that heap > > > > expansion and contraction does not have this issue. > > > > > > This is quite unfortunate, because there was a different bug that had to do > > > with kernel having a very limited number of mappings available [1], as a > > > result of which the page concatenation code was added. > > > > > > It should therefore be documented that the dma_entry_limit parameter should > > > be adjusted should the user run out of the DMA entries. 
> > > > > > [1] https://urldefense.proofpoint.com/v2/url?u=https-3A__lore.kernel.org_lkml_155414977872.12780.13728555131525362206.stgit-40gimli.home_T_&d=DwICaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=FZ_tPCbgFOh18zwRPO9H0yDx8VW38vuapifdDfc8SFQ&m=3GMg-634_cdUCY4WpQPwjzZ_S4ckuMHOnt2FxyyjXMk&s=TJLzppkaDS95VGyRHX2hzflQfb9XLK0OiOszSXoeXKk&e= > > <snip> > > > > > RTE_LOG(ERR, EAL, " cannot clear DMA remapping, error %i (%s)\n", > > > > errno, strerror(errno)); > > > > return -1; > > > > + } else if (dma_unmap.size != len) { > > > > + RTE_LOG(ERR, EAL, " unexpected size %"PRIu64" of DMA " > > > > + "remapping cleared instead of %"PRIu64"\n", > > > > + (uint64_t)dma_unmap.size, len); > > > > + rte_errno = EIO; > > > > + return -1; > > > > } > > > > } > > > > @@ -1853,6 +1869,12 @@ container_dma_unmap(struct vfio_config *vfio_cfg, uint64_t vaddr, uint64_t iova, > > > > /* we're partially unmapping a previously mapped region, so we > > > > * need to split entry into two. > > > > */ > > > > + if (!vfio_cfg->vfio_iommu_type->partial_unmap) { > > > > + RTE_LOG(DEBUG, EAL, "DMA partial unmap unsupported\n"); > > > > + rte_errno = ENOTSUP; > > > > + ret = -1; > > > > + goto out; > > > > + } > > > > > > How would we ever arrive here if we never do more than 1 page worth of > > > memory anyway? I don't think this is needed. > > > > container_dma_unmap() is called by user via rte_vfio_container_dma_unmap() > > and when he maps we don't split it as we don't about his memory. > > So if he maps multiple pages and tries to unmap partially, then we should fail. > > Should we map it in page granularity then, instead of adding this > discrepancy between EAL and user mapping? I.e. instead of adding a > workaround, how about we just do the same thing for user mem mappings? In heap mapping's we map and unmap it at huge page granularity as we will always maintain that. But here I think we don't know if user's allocation is huge page or collection of system pages. 
The only thing we can do here is map it at system page granularity, which could waste entries if, say, he really is working with hugepages. Isn't it? > > -- > Thanks, > Anatoly ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [dpdk-stable] [dpdk-dev] [EXT] Re: [PATCH 2/2] vfio: fix partial DMA unmapping for VFIO type1 2020-10-15 10:00 ` Burakov, Anatoly 2020-10-15 11:38 ` Nithin Dabilpuram 2020-10-15 11:50 ` Nithin Dabilpuram @ 2020-10-15 11:57 ` Nithin Dabilpuram 2020-10-15 15:10 ` Burakov, Anatoly 2 siblings, 1 reply; 43+ messages in thread From: Nithin Dabilpuram @ 2020-10-15 11:57 UTC (permalink / raw) To: Burakov, Anatoly; +Cc: Nithin Dabilpuram, Jerin Jacob, dev, stable On Thu, Oct 15, 2020 at 3:31 PM Burakov, Anatoly <anatoly.burakov@intel.com> wrote: > > On 15-Oct-20 7:09 AM, Nithin Dabilpuram wrote: > > On Wed, Oct 14, 2020 at 04:07:10PM +0100, Burakov, Anatoly wrote: > >> External Email > >> > >> ---------------------------------------------------------------------- > >> On 12-Oct-20 9:11 AM, Nithin Dabilpuram wrote: > >>> Partial unmapping is not supported for VFIO IOMMU type1 > >>> by kernel. Though kernel gives return as zero, the unmapped size > >>> returned will not be same as expected. So check for > >>> returned unmap size and return error. > >>> > >>> For case of DMA map/unmap triggered by heap allocations, > >>> maintain granularity of memseg page size so that heap > >>> expansion and contraction does not have this issue. > >> > >> This is quite unfortunate, because there was a different bug that had to do > >> with kernel having a very limited number of mappings available [1], as a > >> result of which the page concatenation code was added. > >> > >> It should therefore be documented that the dma_entry_limit parameter should > >> be adjusted should the user run out of the DMA entries. 
> >> > >> [1] https://urldefense.proofpoint.com/v2/url?u=https-3A__lore.kernel.org_lkml_155414977872.12780.13728555131525362206.stgit-40gimli.home_T_&d=DwICaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=FZ_tPCbgFOh18zwRPO9H0yDx8VW38vuapifdDfc8SFQ&m=3GMg-634_cdUCY4WpQPwjzZ_S4ckuMHOnt2FxyyjXMk&s=TJLzppkaDS95VGyRHX2hzflQfb9XLK0OiOszSXoeXKk&e= > > <snip> > > >>> RTE_LOG(ERR, EAL, " cannot clear DMA remapping, error %i (%s)\n", > >>> errno, strerror(errno)); > >>> return -1; > >>> + } else if (dma_unmap.size != len) { > >>> + RTE_LOG(ERR, EAL, " unexpected size %"PRIu64" of DMA " > >>> + "remapping cleared instead of %"PRIu64"\n", > >>> + (uint64_t)dma_unmap.size, len); > >>> + rte_errno = EIO; > >>> + return -1; > >>> } > >>> } > >>> @@ -1853,6 +1869,12 @@ container_dma_unmap(struct vfio_config *vfio_cfg, uint64_t vaddr, uint64_t iova, > >>> /* we're partially unmapping a previously mapped region, so we > >>> * need to split entry into two. > >>> */ > >>> + if (!vfio_cfg->vfio_iommu_type->partial_unmap) { > >>> + RTE_LOG(DEBUG, EAL, "DMA partial unmap unsupported\n"); > >>> + rte_errno = ENOTSUP; > >>> + ret = -1; > >>> + goto out; > >>> + } > >> > >> How would we ever arrive here if we never do more than 1 page worth of > >> memory anyway? I don't think this is needed. > > > > container_dma_unmap() is called by user via rte_vfio_container_dma_unmap() > > and when he maps we don't split it as we don't about his memory. > > So if he maps multiple pages and tries to unmap partially, then we should fail. > > Should we map it in page granularity then, instead of adding this > discrepancy between EAL and user mapping? I.e. instead of adding a > workaround, how about we just do the same thing for user mem mappings? > In heap mapping's we map and unmap it at huge page granularity as we will always maintain that. But here I think we don't know if user's allocation is huge page or collection of system pages. 
The only thing we can do here is map it at system page granularity, which could waste entries if, say, he really is working with hugepages. Isn't it? > > -- > Thanks, > Anatoly ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [dpdk-stable] [dpdk-dev] [EXT] Re: [PATCH 2/2] vfio: fix partial DMA unmapping for VFIO type1 2020-10-15 11:57 ` [dpdk-stable] [dpdk-dev] " Nithin Dabilpuram @ 2020-10-15 15:10 ` Burakov, Anatoly 2020-10-16 7:10 ` Nithin Dabilpuram 0 siblings, 1 reply; 43+ messages in thread From: Burakov, Anatoly @ 2020-10-15 15:10 UTC (permalink / raw) To: Nithin Dabilpuram; +Cc: Nithin Dabilpuram, Jerin Jacob, dev, stable On 15-Oct-20 12:57 PM, Nithin Dabilpuram wrote: > On Thu, Oct 15, 2020 at 3:31 PM Burakov, Anatoly > <anatoly.burakov@intel.com> wrote: >> >> On 15-Oct-20 7:09 AM, Nithin Dabilpuram wrote: >>> On Wed, Oct 14, 2020 at 04:07:10PM +0100, Burakov, Anatoly wrote: >>>> External Email >>>> >>>> ---------------------------------------------------------------------- >>>> On 12-Oct-20 9:11 AM, Nithin Dabilpuram wrote: >>>>> Partial unmapping is not supported for VFIO IOMMU type1 >>>>> by kernel. Though kernel gives return as zero, the unmapped size >>>>> returned will not be same as expected. So check for >>>>> returned unmap size and return error. >>>>> >>>>> For case of DMA map/unmap triggered by heap allocations, >>>>> maintain granularity of memseg page size so that heap >>>>> expansion and contraction does not have this issue. >>>> >>>> This is quite unfortunate, because there was a different bug that had to do >>>> with kernel having a very limited number of mappings available [1], as a >>>> result of which the page concatenation code was added. >>>> >>>> It should therefore be documented that the dma_entry_limit parameter should >>>> be adjusted should the user run out of the DMA entries. 
>>>> >>>> [1] https://urldefense.proofpoint.com/v2/url?u=https-3A__lore.kernel.org_lkml_155414977872.12780.13728555131525362206.stgit-40gimli.home_T_&d=DwICaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=FZ_tPCbgFOh18zwRPO9H0yDx8VW38vuapifdDfc8SFQ&m=3GMg-634_cdUCY4WpQPwjzZ_S4ckuMHOnt2FxyyjXMk&s=TJLzppkaDS95VGyRHX2hzflQfb9XLK0OiOszSXoeXKk&e= >> >> <snip> >> >>>>> RTE_LOG(ERR, EAL, " cannot clear DMA remapping, error %i (%s)\n", >>>>> errno, strerror(errno)); >>>>> return -1; >>>>> + } else if (dma_unmap.size != len) { >>>>> + RTE_LOG(ERR, EAL, " unexpected size %"PRIu64" of DMA " >>>>> + "remapping cleared instead of %"PRIu64"\n", >>>>> + (uint64_t)dma_unmap.size, len); >>>>> + rte_errno = EIO; >>>>> + return -1; >>>>> } >>>>> } >>>>> @@ -1853,6 +1869,12 @@ container_dma_unmap(struct vfio_config *vfio_cfg, uint64_t vaddr, uint64_t iova, >>>>> /* we're partially unmapping a previously mapped region, so we >>>>> * need to split entry into two. >>>>> */ >>>>> + if (!vfio_cfg->vfio_iommu_type->partial_unmap) { >>>>> + RTE_LOG(DEBUG, EAL, "DMA partial unmap unsupported\n"); >>>>> + rte_errno = ENOTSUP; >>>>> + ret = -1; >>>>> + goto out; >>>>> + } >>>> >>>> How would we ever arrive here if we never do more than 1 page worth of >>>> memory anyway? I don't think this is needed. >>> >>> container_dma_unmap() is called by user via rte_vfio_container_dma_unmap() >>> and when he maps we don't split it as we don't about his memory. >>> So if he maps multiple pages and tries to unmap partially, then we should fail. >> >> Should we map it in page granularity then, instead of adding this >> discrepancy between EAL and user mapping? I.e. instead of adding a >> workaround, how about we just do the same thing for user mem mappings? >> > In heap mapping's we map and unmap it at huge page granularity as we will always > maintain that. > > But here I think we don't know if user's allocation is huge page or > collection of system > pages. 
Only thing we can do here is map it at system page granularity which > could waste entries if he say really is working with hugepages. Isn't ? > Yeah we do. The API mandates the pages granularity, and it will check against page size and number of IOVA entries, so yes, we do enforce the fact that the IOVA addresses supplied by the user have to be page addresses. -- Thanks, Anatoly ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [dpdk-stable] [dpdk-dev] [EXT] Re: [PATCH 2/2] vfio: fix partial DMA unmapping for VFIO type1 2020-10-15 15:10 ` Burakov, Anatoly @ 2020-10-16 7:10 ` Nithin Dabilpuram 2020-10-17 16:14 ` Burakov, Anatoly 0 siblings, 1 reply; 43+ messages in thread From: Nithin Dabilpuram @ 2020-10-16 7:10 UTC (permalink / raw) To: Burakov, Anatoly; +Cc: Jerin Jacob, dev, stable On Thu, Oct 15, 2020 at 04:10:31PM +0100, Burakov, Anatoly wrote: > On 15-Oct-20 12:57 PM, Nithin Dabilpuram wrote: > > On Thu, Oct 15, 2020 at 3:31 PM Burakov, Anatoly > > <anatoly.burakov@intel.com> wrote: > > > > > > On 15-Oct-20 7:09 AM, Nithin Dabilpuram wrote: > > > > On Wed, Oct 14, 2020 at 04:07:10PM +0100, Burakov, Anatoly wrote: > > > > > External Email > > > > > > > > > > ---------------------------------------------------------------------- > > > > > On 12-Oct-20 9:11 AM, Nithin Dabilpuram wrote: > > > > > > Partial unmapping is not supported for VFIO IOMMU type1 > > > > > > by kernel. Though kernel gives return as zero, the unmapped size > > > > > > returned will not be same as expected. So check for > > > > > > returned unmap size and return error. > > > > > > > > > > > > For case of DMA map/unmap triggered by heap allocations, > > > > > > maintain granularity of memseg page size so that heap > > > > > > expansion and contraction does not have this issue. > > > > > > > > > > This is quite unfortunate, because there was a different bug that had to do > > > > > with kernel having a very limited number of mappings available [1], as a > > > > > result of which the page concatenation code was added. > > > > > > > > > > It should therefore be documented that the dma_entry_limit parameter should > > > > > be adjusted should the user run out of the DMA entries. 
> > > > > > > > > > [1] https://urldefense.proofpoint.com/v2/url?u=https-3A__lore.kernel.org_lkml_155414977872.12780.13728555131525362206.stgit-40gimli.home_T_&d=DwICaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=FZ_tPCbgFOh18zwRPO9H0yDx8VW38vuapifdDfc8SFQ&m=3GMg-634_cdUCY4WpQPwjzZ_S4ckuMHOnt2FxyyjXMk&s=TJLzppkaDS95VGyRHX2hzflQfb9XLK0OiOszSXoeXKk&e= > > > > > > <snip> > > > > > > > > > RTE_LOG(ERR, EAL, " cannot clear DMA remapping, error %i (%s)\n", > > > > > > errno, strerror(errno)); > > > > > > return -1; > > > > > > + } else if (dma_unmap.size != len) { > > > > > > + RTE_LOG(ERR, EAL, " unexpected size %"PRIu64" of DMA " > > > > > > + "remapping cleared instead of %"PRIu64"\n", > > > > > > + (uint64_t)dma_unmap.size, len); > > > > > > + rte_errno = EIO; > > > > > > + return -1; > > > > > > } > > > > > > } > > > > > > @@ -1853,6 +1869,12 @@ container_dma_unmap(struct vfio_config *vfio_cfg, uint64_t vaddr, uint64_t iova, > > > > > > /* we're partially unmapping a previously mapped region, so we > > > > > > * need to split entry into two. > > > > > > */ > > > > > > + if (!vfio_cfg->vfio_iommu_type->partial_unmap) { > > > > > > + RTE_LOG(DEBUG, EAL, "DMA partial unmap unsupported\n"); > > > > > > + rte_errno = ENOTSUP; > > > > > > + ret = -1; > > > > > > + goto out; > > > > > > + } > > > > > > > > > > How would we ever arrive here if we never do more than 1 page worth of > > > > > memory anyway? I don't think this is needed. > > > > > > > > container_dma_unmap() is called by user via rte_vfio_container_dma_unmap() > > > > and when he maps we don't split it as we don't about his memory. > > > > So if he maps multiple pages and tries to unmap partially, then we should fail. > > > > > > Should we map it in page granularity then, instead of adding this > > > discrepancy between EAL and user mapping? I.e. instead of adding a > > > workaround, how about we just do the same thing for user mem mappings? 
> > > > > In heap mapping's we map and unmap it at huge page granularity as we will always > > maintain that. > > > > But here I think we don't know if user's allocation is huge page or > > collection of system > > pages. Only thing we can do here is map it at system page granularity which > > could waste entries if he say really is working with hugepages. Isn't ? > > > > Yeah we do. The API mandates the pages granularity, and it will check > against page size and number of IOVA entries, so yes, we do enforce the fact > that the IOVA addresses supplied by the user have to be page addresses. If I see rte_vfio_container_dma_map(), there is no mention of Huge page size user is providing or we computing. He can call rte_vfio_container_dma_map() with 1GB huge page or 4K system page. Am I missing something ? > > -- > Thanks, > Anatoly ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [dpdk-stable] [dpdk-dev] [EXT] Re: [PATCH 2/2] vfio: fix partial DMA unmapping for VFIO type1 2020-10-16 7:10 ` Nithin Dabilpuram @ 2020-10-17 16:14 ` Burakov, Anatoly 2020-10-19 9:43 ` Nithin Dabilpuram 0 siblings, 1 reply; 43+ messages in thread From: Burakov, Anatoly @ 2020-10-17 16:14 UTC (permalink / raw) To: Nithin Dabilpuram; +Cc: Jerin Jacob, dev, stable On 16-Oct-20 8:10 AM, Nithin Dabilpuram wrote: > On Thu, Oct 15, 2020 at 04:10:31PM +0100, Burakov, Anatoly wrote: >> On 15-Oct-20 12:57 PM, Nithin Dabilpuram wrote: >>> On Thu, Oct 15, 2020 at 3:31 PM Burakov, Anatoly >>> <anatoly.burakov@intel.com> wrote: >>>> >>>> On 15-Oct-20 7:09 AM, Nithin Dabilpuram wrote: >>>>> On Wed, Oct 14, 2020 at 04:07:10PM +0100, Burakov, Anatoly wrote: >>>>>> External Email >>>>>> >>>>>> ---------------------------------------------------------------------- >>>>>> On 12-Oct-20 9:11 AM, Nithin Dabilpuram wrote: >>>>>>> Partial unmapping is not supported for VFIO IOMMU type1 >>>>>>> by kernel. Though kernel gives return as zero, the unmapped size >>>>>>> returned will not be same as expected. So check for >>>>>>> returned unmap size and return error. >>>>>>> >>>>>>> For case of DMA map/unmap triggered by heap allocations, >>>>>>> maintain granularity of memseg page size so that heap >>>>>>> expansion and contraction does not have this issue. >>>>>> >>>>>> This is quite unfortunate, because there was a different bug that had to do >>>>>> with kernel having a very limited number of mappings available [1], as a >>>>>> result of which the page concatenation code was added. >>>>>> >>>>>> It should therefore be documented that the dma_entry_limit parameter should >>>>>> be adjusted should the user run out of the DMA entries. 
>>>>>> >>>>>> [1] https://urldefense.proofpoint.com/v2/url?u=https-3A__lore.kernel.org_lkml_155414977872.12780.13728555131525362206.stgit-40gimli.home_T_&d=DwICaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=FZ_tPCbgFOh18zwRPO9H0yDx8VW38vuapifdDfc8SFQ&m=3GMg-634_cdUCY4WpQPwjzZ_S4ckuMHOnt2FxyyjXMk&s=TJLzppkaDS95VGyRHX2hzflQfb9XLK0OiOszSXoeXKk&e= >>>> >>>> <snip> >>>> >>>>>>> RTE_LOG(ERR, EAL, " cannot clear DMA remapping, error %i (%s)\n", >>>>>>> errno, strerror(errno)); >>>>>>> return -1; >>>>>>> + } else if (dma_unmap.size != len) { >>>>>>> + RTE_LOG(ERR, EAL, " unexpected size %"PRIu64" of DMA " >>>>>>> + "remapping cleared instead of %"PRIu64"\n", >>>>>>> + (uint64_t)dma_unmap.size, len); >>>>>>> + rte_errno = EIO; >>>>>>> + return -1; >>>>>>> } >>>>>>> } >>>>>>> @@ -1853,6 +1869,12 @@ container_dma_unmap(struct vfio_config *vfio_cfg, uint64_t vaddr, uint64_t iova, >>>>>>> /* we're partially unmapping a previously mapped region, so we >>>>>>> * need to split entry into two. >>>>>>> */ >>>>>>> + if (!vfio_cfg->vfio_iommu_type->partial_unmap) { >>>>>>> + RTE_LOG(DEBUG, EAL, "DMA partial unmap unsupported\n"); >>>>>>> + rte_errno = ENOTSUP; >>>>>>> + ret = -1; >>>>>>> + goto out; >>>>>>> + } >>>>>> >>>>>> How would we ever arrive here if we never do more than 1 page worth of >>>>>> memory anyway? I don't think this is needed. >>>>> >>>>> container_dma_unmap() is called by user via rte_vfio_container_dma_unmap() >>>>> and when he maps we don't split it as we don't about his memory. >>>>> So if he maps multiple pages and tries to unmap partially, then we should fail. >>>> >>>> Should we map it in page granularity then, instead of adding this >>>> discrepancy between EAL and user mapping? I.e. instead of adding a >>>> workaround, how about we just do the same thing for user mem mappings? >>>> >>> In heap mapping's we map and unmap it at huge page granularity as we will always >>> maintain that. 
>>> >>> But here I think we don't know if user's allocation is huge page or >>> collection of system >>> pages. Only thing we can do here is map it at system page granularity which >>> could waste entries if he say really is working with hugepages. Isn't ? >>> >> >> Yeah we do. The API mandates the pages granularity, and it will check >> against page size and number of IOVA entries, so yes, we do enforce the fact >> that the IOVA addresses supplied by the user have to be page addresses. > > If I see rte_vfio_container_dma_map(), there is no mention of Huge page size > user is providing or we computing. He can call rte_vfio_container_dma_map() > with 1GB huge page or 4K system page. > > Am I missing something ? Are you suggesting that a DMA mapping for hugepage-backed memory will be made at system page size granularity? E.g. will a 1GB page-backed segment be mapped for DMA as a contiguous 4K-based block? >> >> -- >> Thanks, >> Anatoly -- Thanks, Anatoly ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [dpdk-stable] [dpdk-dev] [EXT] Re: [PATCH 2/2] vfio: fix partial DMA unmapping for VFIO type1 2020-10-17 16:14 ` Burakov, Anatoly @ 2020-10-19 9:43 ` Nithin Dabilpuram 2020-10-22 12:13 ` Nithin Dabilpuram 0 siblings, 1 reply; 43+ messages in thread From: Nithin Dabilpuram @ 2020-10-19 9:43 UTC (permalink / raw) To: Burakov, Anatoly; +Cc: Jerin Jacob, dev, stable On Sat, Oct 17, 2020 at 05:14:55PM +0100, Burakov, Anatoly wrote: > On 16-Oct-20 8:10 AM, Nithin Dabilpuram wrote: > > On Thu, Oct 15, 2020 at 04:10:31PM +0100, Burakov, Anatoly wrote: > > > On 15-Oct-20 12:57 PM, Nithin Dabilpuram wrote: > > > > On Thu, Oct 15, 2020 at 3:31 PM Burakov, Anatoly > > > > <anatoly.burakov@intel.com> wrote: > > > > > > > > > > On 15-Oct-20 7:09 AM, Nithin Dabilpuram wrote: > > > > > > On Wed, Oct 14, 2020 at 04:07:10PM +0100, Burakov, Anatoly wrote: > > > > > > > External Email > > > > > > > > > > > > > > ---------------------------------------------------------------------- > > > > > > > On 12-Oct-20 9:11 AM, Nithin Dabilpuram wrote: > > > > > > > > Partial unmapping is not supported for VFIO IOMMU type1 > > > > > > > > by kernel. Though kernel gives return as zero, the unmapped size > > > > > > > > returned will not be same as expected. So check for > > > > > > > > returned unmap size and return error. > > > > > > > > > > > > > > > > For case of DMA map/unmap triggered by heap allocations, > > > > > > > > maintain granularity of memseg page size so that heap > > > > > > > > expansion and contraction does not have this issue. > > > > > > > > > > > > > > This is quite unfortunate, because there was a different bug that had to do > > > > > > > with kernel having a very limited number of mappings available [1], as a > > > > > > > result of which the page concatenation code was added. > > > > > > > > > > > > > > It should therefore be documented that the dma_entry_limit parameter should > > > > > > > be adjusted should the user run out of the DMA entries. 
> > > > > > > > > > > > > > [1] https://urldefense.proofpoint.com/v2/url?u=https-3A__lore.kernel.org_lkml_155414977872.12780.13728555131525362206.stgit-40gimli.home_T_&d=DwICaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=FZ_tPCbgFOh18zwRPO9H0yDx8VW38vuapifdDfc8SFQ&m=3GMg-634_cdUCY4WpQPwjzZ_S4ckuMHOnt2FxyyjXMk&s=TJLzppkaDS95VGyRHX2hzflQfb9XLK0OiOszSXoeXKk&e= > > > > > > > > > > <snip> > > > > > > > > > > > > > RTE_LOG(ERR, EAL, " cannot clear DMA remapping, error %i (%s)\n", > > > > > > > > errno, strerror(errno)); > > > > > > > > return -1; > > > > > > > > + } else if (dma_unmap.size != len) { > > > > > > > > + RTE_LOG(ERR, EAL, " unexpected size %"PRIu64" of DMA " > > > > > > > > + "remapping cleared instead of %"PRIu64"\n", > > > > > > > > + (uint64_t)dma_unmap.size, len); > > > > > > > > + rte_errno = EIO; > > > > > > > > + return -1; > > > > > > > > } > > > > > > > > } > > > > > > > > @@ -1853,6 +1869,12 @@ container_dma_unmap(struct vfio_config *vfio_cfg, uint64_t vaddr, uint64_t iova, > > > > > > > > /* we're partially unmapping a previously mapped region, so we > > > > > > > > * need to split entry into two. > > > > > > > > */ > > > > > > > > + if (!vfio_cfg->vfio_iommu_type->partial_unmap) { > > > > > > > > + RTE_LOG(DEBUG, EAL, "DMA partial unmap unsupported\n"); > > > > > > > > + rte_errno = ENOTSUP; > > > > > > > > + ret = -1; > > > > > > > > + goto out; > > > > > > > > + } > > > > > > > > > > > > > > How would we ever arrive here if we never do more than 1 page worth of > > > > > > > memory anyway? I don't think this is needed. > > > > > > > > > > > > container_dma_unmap() is called by user via rte_vfio_container_dma_unmap() > > > > > > and when he maps we don't split it as we don't about his memory. > > > > > > So if he maps multiple pages and tries to unmap partially, then we should fail. > > > > > > > > > > Should we map it in page granularity then, instead of adding this > > > > > discrepancy between EAL and user mapping? I.e. 
instead of adding a > > > > > workaround, how about we just do the same thing for user mem mappings? > > > > > > > > > In heap mapping's we map and unmap it at huge page granularity as we will always > > > > maintain that. > > > > > > > > But here I think we don't know if user's allocation is huge page or > > > > collection of system > > > > pages. Only thing we can do here is map it at system page granularity which > > > > could waste entries if he say really is working with hugepages. Isn't ? > > > > > > > > > > Yeah we do. The API mandates the pages granularity, and it will check > > > against page size and number of IOVA entries, so yes, we do enforce the fact > > > that the IOVA addresses supplied by the user have to be page addresses. > > > > If I see rte_vfio_container_dma_map(), there is no mention of Huge page size > > user is providing or we computing. He can call rte_vfio_container_dma_map() > > with 1GB huge page or 4K system page. > > > > Am I missing something ? > > Are you suggesting that a DMA mapping for hugepage-backed memory will be > made at system page size granularity? E.g. will a 1GB page-backed segment be > mapped for DMA as a contiguous 4K-based block? I'm not suggesting anything. My only thought is how to solve below problem. Say application does the following. #1 Allocate 1GB memory from huge page or some external mem. #2 Do rte_vfio_container_dma_map(RTE_VFIO_DEFAULT_CONTAINER_FD, mem, mem, 1GB) In linux/eal_vfio.c, we map it is as single VFIO DMA entry of 1 GB as we don't know where this memory is coming from or backed by what. #3 After a while call rte_vfio_container_dma_unmap(RTE_VFIO_DEFAULT_CONTAINER_FD, mem+4KB, mem+4KB, 4KB) Though rte_vfio_container_dma_unmap() supports #3 by splitting entry as shown below, In VFIO type1 iommu, #3 cannot be supported by current kernel interface. So how can we allow #3 ? 
static int
container_dma_unmap(struct vfio_config *vfio_cfg, uint64_t vaddr, uint64_t iova,
		uint64_t len)
{
	struct user_mem_map *map, *new_map = NULL;
	struct user_mem_maps *user_mem_maps;
	int ret = 0;

	user_mem_maps = &vfio_cfg->mem_maps;
	rte_spinlock_recursive_lock(&user_mem_maps->lock);

	/* find our mapping */
	map = find_user_mem_map(user_mem_maps, vaddr, iova, len);
	if (!map) {
		RTE_LOG(ERR, EAL, "Couldn't find previously mapped region\n");
		rte_errno = EINVAL;
		ret = -1;
		goto out;
	}
	if (map->addr != vaddr || map->iova != iova || map->len != len) {
		/* we're partially unmapping a previously mapped region, so we
		 * need to split entry into two.
		 */

> > > > > > > -- > > > Thanks, > > > Anatoly > > > -- > Thanks, > Anatoly ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [dpdk-stable] [dpdk-dev] [EXT] Re: [PATCH 2/2] vfio: fix partial DMA unmapping for VFIO type1 2020-10-19 9:43 ` Nithin Dabilpuram @ 2020-10-22 12:13 ` Nithin Dabilpuram 2020-10-28 13:04 ` Burakov, Anatoly 0 siblings, 1 reply; 43+ messages in thread From: Nithin Dabilpuram @ 2020-10-22 12:13 UTC (permalink / raw) To: Burakov, Anatoly; +Cc: Jerin Jacob, dev, stable Ping. On Mon, Oct 19, 2020 at 03:13:15PM +0530, Nithin Dabilpuram wrote: > On Sat, Oct 17, 2020 at 05:14:55PM +0100, Burakov, Anatoly wrote: > > On 16-Oct-20 8:10 AM, Nithin Dabilpuram wrote: > > > On Thu, Oct 15, 2020 at 04:10:31PM +0100, Burakov, Anatoly wrote: > > > > On 15-Oct-20 12:57 PM, Nithin Dabilpuram wrote: > > > > > On Thu, Oct 15, 2020 at 3:31 PM Burakov, Anatoly > > > > > <anatoly.burakov@intel.com> wrote: > > > > > > > > > > > > On 15-Oct-20 7:09 AM, Nithin Dabilpuram wrote: > > > > > > > On Wed, Oct 14, 2020 at 04:07:10PM +0100, Burakov, Anatoly wrote: > > > > > > > > External Email > > > > > > > > > > > > > > > > ---------------------------------------------------------------------- > > > > > > > > On 12-Oct-20 9:11 AM, Nithin Dabilpuram wrote: > > > > > > > > > Partial unmapping is not supported for VFIO IOMMU type1 > > > > > > > > > by kernel. Though kernel gives return as zero, the unmapped size > > > > > > > > > returned will not be same as expected. So check for > > > > > > > > > returned unmap size and return error. > > > > > > > > > > > > > > > > > > For case of DMA map/unmap triggered by heap allocations, > > > > > > > > > maintain granularity of memseg page size so that heap > > > > > > > > > expansion and contraction does not have this issue. > > > > > > > > > > > > > > > > This is quite unfortunate, because there was a different bug that had to do > > > > > > > > with kernel having a very limited number of mappings available [1], as a > > > > > > > > result of which the page concatenation code was added. 
> > > > > > > > > > > > > > > > It should therefore be documented that the dma_entry_limit parameter should > > > > > > > > be adjusted should the user run out of the DMA entries. > > > > > > > > > > > > > > > > [1] https://urldefense.proofpoint.com/v2/url?u=https-3A__lore.kernel.org_lkml_155414977872.12780.13728555131525362206.stgit-40gimli.home_T_&d=DwICaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=FZ_tPCbgFOh18zwRPO9H0yDx8VW38vuapifdDfc8SFQ&m=3GMg-634_cdUCY4WpQPwjzZ_S4ckuMHOnt2FxyyjXMk&s=TJLzppkaDS95VGyRHX2hzflQfb9XLK0OiOszSXoeXKk&e= > > > > > > > > > > > > <snip> > > > > > > > > > > > > > > > RTE_LOG(ERR, EAL, " cannot clear DMA remapping, error %i (%s)\n", > > > > > > > > > errno, strerror(errno)); > > > > > > > > > return -1; > > > > > > > > > + } else if (dma_unmap.size != len) { > > > > > > > > > + RTE_LOG(ERR, EAL, " unexpected size %"PRIu64" of DMA " > > > > > > > > > + "remapping cleared instead of %"PRIu64"\n", > > > > > > > > > + (uint64_t)dma_unmap.size, len); > > > > > > > > > + rte_errno = EIO; > > > > > > > > > + return -1; > > > > > > > > > } > > > > > > > > > } > > > > > > > > > @@ -1853,6 +1869,12 @@ container_dma_unmap(struct vfio_config *vfio_cfg, uint64_t vaddr, uint64_t iova, > > > > > > > > > /* we're partially unmapping a previously mapped region, so we > > > > > > > > > * need to split entry into two. > > > > > > > > > */ > > > > > > > > > + if (!vfio_cfg->vfio_iommu_type->partial_unmap) { > > > > > > > > > + RTE_LOG(DEBUG, EAL, "DMA partial unmap unsupported\n"); > > > > > > > > > + rte_errno = ENOTSUP; > > > > > > > > > + ret = -1; > > > > > > > > > + goto out; > > > > > > > > > + } > > > > > > > > > > > > > > > > How would we ever arrive here if we never do more than 1 page worth of > > > > > > > > memory anyway? I don't think this is needed. > > > > > > > > > > > > > > container_dma_unmap() is called by user via rte_vfio_container_dma_unmap() > > > > > > > and when he maps we don't split it as we don't about his memory. 
> > > > > > > So if he maps multiple pages and tries to unmap partially, then we should fail. > > > > > > > > > > > > Should we map it in page granularity then, instead of adding this > > > > > > discrepancy between EAL and user mapping? I.e. instead of adding a > > > > > > workaround, how about we just do the same thing for user mem mappings? > > > > > > > > > > > In heap mapping's we map and unmap it at huge page granularity as we will always > > > > > maintain that. > > > > > > > > > > But here I think we don't know if user's allocation is huge page or > > > > > collection of system > > > > > pages. Only thing we can do here is map it at system page granularity which > > > > > could waste entries if he say really is working with hugepages. Isn't ? > > > > > > > > > > > > > Yeah we do. The API mandates the pages granularity, and it will check > > > > against page size and number of IOVA entries, so yes, we do enforce the fact > > > > that the IOVA addresses supplied by the user have to be page addresses. > > > > > > If I see rte_vfio_container_dma_map(), there is no mention of Huge page size > > > user is providing or we computing. He can call rte_vfio_container_dma_map() > > > with 1GB huge page or 4K system page. > > > > > > Am I missing something ? > > > > Are you suggesting that a DMA mapping for hugepage-backed memory will be > > made at system page size granularity? E.g. will a 1GB page-backed segment be > > mapped for DMA as a contiguous 4K-based block? > > I'm not suggesting anything. My only thought is how to solve below problem. > Say application does the following. > > #1 Allocate 1GB memory from huge page or some external mem. > #2 Do rte_vfio_container_dma_map(RTE_VFIO_DEFAULT_CONTAINER_FD, mem, mem, 1GB) > In linux/eal_vfio.c, we map it is as single VFIO DMA entry of 1 GB as we > don't know where this memory is coming from or backed by what. 
> #3 After a while call rte_vfio_container_dma_unmap(RTE_VFIO_DEFAULT_CONTAINER_FD, mem+4KB, mem+4KB, 4KB) > > Though rte_vfio_container_dma_unmap() supports #3 by splitting entry as shown below, > In VFIO type1 iommu, #3 cannot be supported by current kernel interface. So how > can we allow #3 ? > > > static int > container_dma_unmap(struct vfio_config *vfio_cfg, uint64_t vaddr, uint64_t iova, > uint64_t len) > { > struct user_mem_map *map, *new_map = NULL; > struct user_mem_maps *user_mem_maps; > int ret = 0; > > user_mem_maps = &vfio_cfg->mem_maps; > rte_spinlock_recursive_lock(&user_mem_maps->lock); > > /* find our mapping */ > map = find_user_mem_map(user_mem_maps, vaddr, iova, len); > if (!map) { > RTE_LOG(ERR, EAL, "Couldn't find previously mapped region\n"); > rte_errno = EINVAL; > ret = -1; > goto out; > } > if (map->addr != vaddr || map->iova != iova || map->len != len) { > /* we're partially unmapping a previously mapped region, so we > * need to split entry into two. > */ > > > > > > > > > > > > -- > > > > Thanks, > > > > Anatoly > > > > > > -- > > Thanks, > > Anatoly ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [dpdk-stable] [dpdk-dev] [EXT] Re: [PATCH 2/2] vfio: fix partial DMA unmapping for VFIO type1 2020-10-22 12:13 ` Nithin Dabilpuram @ 2020-10-28 13:04 ` Burakov, Anatoly 2020-10-28 14:17 ` Nithin Dabilpuram 0 siblings, 1 reply; 43+ messages in thread From: Burakov, Anatoly @ 2020-10-28 13:04 UTC (permalink / raw) To: Nithin Dabilpuram; +Cc: Jerin Jacob, dev, stable On 22-Oct-20 1:13 PM, Nithin Dabilpuram wrote: > Ping. > > On Mon, Oct 19, 2020 at 03:13:15PM +0530, Nithin Dabilpuram wrote: >> On Sat, Oct 17, 2020 at 05:14:55PM +0100, Burakov, Anatoly wrote: >>> On 16-Oct-20 8:10 AM, Nithin Dabilpuram wrote: >>>> On Thu, Oct 15, 2020 at 04:10:31PM +0100, Burakov, Anatoly wrote: >>>>> On 15-Oct-20 12:57 PM, Nithin Dabilpuram wrote: >>>>>> On Thu, Oct 15, 2020 at 3:31 PM Burakov, Anatoly >>>>>> <anatoly.burakov@intel.com> wrote: >>>>>>> >>>>>>> On 15-Oct-20 7:09 AM, Nithin Dabilpuram wrote: >>>>>>>> On Wed, Oct 14, 2020 at 04:07:10PM +0100, Burakov, Anatoly wrote: >>>>>>>>> External Email >>>>>>>>> >>>>>>>>> ---------------------------------------------------------------------- >>>>>>>>> On 12-Oct-20 9:11 AM, Nithin Dabilpuram wrote: >>>>>>>>>> Partial unmapping is not supported for VFIO IOMMU type1 >>>>>>>>>> by kernel. Though kernel gives return as zero, the unmapped size >>>>>>>>>> returned will not be same as expected. So check for >>>>>>>>>> returned unmap size and return error. >>>>>>>>>> >>>>>>>>>> For case of DMA map/unmap triggered by heap allocations, >>>>>>>>>> maintain granularity of memseg page size so that heap >>>>>>>>>> expansion and contraction does not have this issue. >>>>>>>>> >>>>>>>>> This is quite unfortunate, because there was a different bug that had to do >>>>>>>>> with kernel having a very limited number of mappings available [1], as a >>>>>>>>> result of which the page concatenation code was added. 
>>>>>>>>> >>>>>>>>> It should therefore be documented that the dma_entry_limit parameter should >>>>>>>>> be adjusted should the user run out of the DMA entries. >>>>>>>>> >>>>>>>>> [1] https://urldefense.proofpoint.com/v2/url?u=https-3A__lore.kernel.org_lkml_155414977872.12780.13728555131525362206.stgit-40gimli.home_T_&d=DwICaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=FZ_tPCbgFOh18zwRPO9H0yDx8VW38vuapifdDfc8SFQ&m=3GMg-634_cdUCY4WpQPwjzZ_S4ckuMHOnt2FxyyjXMk&s=TJLzppkaDS95VGyRHX2hzflQfb9XLK0OiOszSXoeXKk&e= >>>>>>> >>>>>>> <snip> >>>>>>> >>>>>>>>>> RTE_LOG(ERR, EAL, " cannot clear DMA remapping, error %i (%s)\n", >>>>>>>>>> errno, strerror(errno)); >>>>>>>>>> return -1; >>>>>>>>>> + } else if (dma_unmap.size != len) { >>>>>>>>>> + RTE_LOG(ERR, EAL, " unexpected size %"PRIu64" of DMA " >>>>>>>>>> + "remapping cleared instead of %"PRIu64"\n", >>>>>>>>>> + (uint64_t)dma_unmap.size, len); >>>>>>>>>> + rte_errno = EIO; >>>>>>>>>> + return -1; >>>>>>>>>> } >>>>>>>>>> } >>>>>>>>>> @@ -1853,6 +1869,12 @@ container_dma_unmap(struct vfio_config *vfio_cfg, uint64_t vaddr, uint64_t iova, >>>>>>>>>> /* we're partially unmapping a previously mapped region, so we >>>>>>>>>> * need to split entry into two. >>>>>>>>>> */ >>>>>>>>>> + if (!vfio_cfg->vfio_iommu_type->partial_unmap) { >>>>>>>>>> + RTE_LOG(DEBUG, EAL, "DMA partial unmap unsupported\n"); >>>>>>>>>> + rte_errno = ENOTSUP; >>>>>>>>>> + ret = -1; >>>>>>>>>> + goto out; >>>>>>>>>> + } >>>>>>>>> >>>>>>>>> How would we ever arrive here if we never do more than 1 page worth of >>>>>>>>> memory anyway? I don't think this is needed. >>>>>>>> >>>>>>>> container_dma_unmap() is called by user via rte_vfio_container_dma_unmap() >>>>>>>> and when he maps we don't split it as we don't about his memory. >>>>>>>> So if he maps multiple pages and tries to unmap partially, then we should fail. >>>>>>> >>>>>>> Should we map it in page granularity then, instead of adding this >>>>>>> discrepancy between EAL and user mapping? I.e. 
instead of adding a >>>>>>> workaround, how about we just do the same thing for user mem mappings? >>>>>>> >>>>>> In heap mapping's we map and unmap it at huge page granularity as we will always >>>>>> maintain that. >>>>>> >>>>>> But here I think we don't know if user's allocation is huge page or >>>>>> collection of system >>>>>> pages. Only thing we can do here is map it at system page granularity which >>>>>> could waste entries if he say really is working with hugepages. Isn't ? >>>>>> >>>>> >>>>> Yeah we do. The API mandates the pages granularity, and it will check >>>>> against page size and number of IOVA entries, so yes, we do enforce the fact >>>>> that the IOVA addresses supplied by the user have to be page addresses. >>>> >>>> If I see rte_vfio_container_dma_map(), there is no mention of Huge page size >>>> user is providing or we computing. He can call rte_vfio_container_dma_map() >>>> with 1GB huge page or 4K system page. >>>> >>>> Am I missing something ? >>> >>> Are you suggesting that a DMA mapping for hugepage-backed memory will be >>> made at system page size granularity? E.g. will a 1GB page-backed segment be >>> mapped for DMA as a contiguous 4K-based block? >> >> I'm not suggesting anything. My only thought is how to solve below problem. >> Say application does the following. >> >> #1 Allocate 1GB memory from huge page or some external mem. >> #2 Do rte_vfio_container_dma_map(RTE_VFIO_DEFAULT_CONTAINER_FD, mem, mem, 1GB) >> In linux/eal_vfio.c, we map it is as single VFIO DMA entry of 1 GB as we >> don't know where this memory is coming from or backed by what. >> #3 After a while call rte_vfio_container_dma_unmap(RTE_VFIO_DEFAULT_CONTAINER_FD, mem+4KB, mem+4KB, 4KB) >> >> Though rte_vfio_container_dma_unmap() supports #3 by splitting entry as shown below, >> In VFIO type1 iommu, #3 cannot be supported by current kernel interface. So how >> can we allow #3 ? 
>>
>>
>> static int
>> container_dma_unmap(struct vfio_config *vfio_cfg, uint64_t vaddr, uint64_t iova,
>> 		uint64_t len)
>> {
>> 	struct user_mem_map *map, *new_map = NULL;
>> 	struct user_mem_maps *user_mem_maps;
>> 	int ret = 0;
>>
>> 	user_mem_maps = &vfio_cfg->mem_maps;
>> 	rte_spinlock_recursive_lock(&user_mem_maps->lock);
>>
>> 	/* find our mapping */
>> 	map = find_user_mem_map(user_mem_maps, vaddr, iova, len);
>> 	if (!map) {
>> 		RTE_LOG(ERR, EAL, "Couldn't find previously mapped region\n");
>> 		rte_errno = EINVAL;
>> 		ret = -1;
>> 		goto out;
>> 	}
>> 	if (map->addr != vaddr || map->iova != iova || map->len != len) {
>> 		/* we're partially unmapping a previously mapped region, so we
>> 		 * need to split entry into two.
>> 		 */

Hi, Apologies, I was on vacation. Yes, I can see the problem now. Does VFIO even support non-system page sizes? Like, if I allocated a 1GB page, would I be able to map *this page* for DMA, as opposed to the first 4K of this page? I suspect that the mapping doesn't support page sizes other than the system page size. -- Thanks, Anatoly ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [dpdk-stable] [dpdk-dev] [EXT] Re: [PATCH 2/2] vfio: fix partial DMA unmapping for VFIO type1 2020-10-28 13:04 ` Burakov, Anatoly @ 2020-10-28 14:17 ` Nithin Dabilpuram 2020-10-28 16:07 ` Burakov, Anatoly 0 siblings, 1 reply; 43+ messages in thread From: Nithin Dabilpuram @ 2020-10-28 14:17 UTC (permalink / raw) To: Burakov, Anatoly; +Cc: Jerin Jacob, dev, stable On Wed, Oct 28, 2020 at 01:04:26PM +0000, Burakov, Anatoly wrote: > On 22-Oct-20 1:13 PM, Nithin Dabilpuram wrote: > > Ping. > > > > On Mon, Oct 19, 2020 at 03:13:15PM +0530, Nithin Dabilpuram wrote: > > > On Sat, Oct 17, 2020 at 05:14:55PM +0100, Burakov, Anatoly wrote: > > > > On 16-Oct-20 8:10 AM, Nithin Dabilpuram wrote: > > > > > On Thu, Oct 15, 2020 at 04:10:31PM +0100, Burakov, Anatoly wrote: > > > > > > On 15-Oct-20 12:57 PM, Nithin Dabilpuram wrote: > > > > > > > On Thu, Oct 15, 2020 at 3:31 PM Burakov, Anatoly > > > > > > > <anatoly.burakov@intel.com> wrote: > > > > > > > > > > > > > > > > On 15-Oct-20 7:09 AM, Nithin Dabilpuram wrote: > > > > > > > > > On Wed, Oct 14, 2020 at 04:07:10PM +0100, Burakov, Anatoly wrote: > > > > > > > > > > External Email > > > > > > > > > > > > > > > > > > > > ---------------------------------------------------------------------- > > > > > > > > > > On 12-Oct-20 9:11 AM, Nithin Dabilpuram wrote: > > > > > > > > > > > Partial unmapping is not supported for VFIO IOMMU type1 > > > > > > > > > > > by kernel. Though kernel gives return as zero, the unmapped size > > > > > > > > > > > returned will not be same as expected. So check for > > > > > > > > > > > returned unmap size and return error. > > > > > > > > > > > > > > > > > > > > > > For case of DMA map/unmap triggered by heap allocations, > > > > > > > > > > > maintain granularity of memseg page size so that heap > > > > > > > > > > > expansion and contraction does not have this issue. 
> > > > > > > > > > > > > > > > > > > > This is quite unfortunate, because there was a different bug that had to do > > > > > > > > > > with kernel having a very limited number of mappings available [1], as a > > > > > > > > > > result of which the page concatenation code was added. > > > > > > > > > > > > > > > > > > > > It should therefore be documented that the dma_entry_limit parameter should > > > > > > > > > > be adjusted should the user run out of the DMA entries. > > > > > > > > > > > > > > > > > > > > [1] https://urldefense.proofpoint.com/v2/url?u=https-3A__lore.kernel.org_lkml_155414977872.12780.13728555131525362206.stgit-40gimli.home_T_&d=DwICaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=FZ_tPCbgFOh18zwRPO9H0yDx8VW38vuapifdDfc8SFQ&m=3GMg-634_cdUCY4WpQPwjzZ_S4ckuMHOnt2FxyyjXMk&s=TJLzppkaDS95VGyRHX2hzflQfb9XLK0OiOszSXoeXKk&e= > > > > > > > > > > > > > > > > <snip> > > > > > > > > > > > > > > > > > > > RTE_LOG(ERR, EAL, " cannot clear DMA remapping, error %i (%s)\n", > > > > > > > > > > > errno, strerror(errno)); > > > > > > > > > > > return -1; > > > > > > > > > > > + } else if (dma_unmap.size != len) { > > > > > > > > > > > + RTE_LOG(ERR, EAL, " unexpected size %"PRIu64" of DMA " > > > > > > > > > > > + "remapping cleared instead of %"PRIu64"\n", > > > > > > > > > > > + (uint64_t)dma_unmap.size, len); > > > > > > > > > > > + rte_errno = EIO; > > > > > > > > > > > + return -1; > > > > > > > > > > > } > > > > > > > > > > > } > > > > > > > > > > > @@ -1853,6 +1869,12 @@ container_dma_unmap(struct vfio_config *vfio_cfg, uint64_t vaddr, uint64_t iova, > > > > > > > > > > > /* we're partially unmapping a previously mapped region, so we > > > > > > > > > > > * need to split entry into two. 
> > > > > > > > > > > */ > > > > > > > > > > > + if (!vfio_cfg->vfio_iommu_type->partial_unmap) { > > > > > > > > > > > + RTE_LOG(DEBUG, EAL, "DMA partial unmap unsupported\n"); > > > > > > > > > > > + rte_errno = ENOTSUP; > > > > > > > > > > > + ret = -1; > > > > > > > > > > > + goto out; > > > > > > > > > > > + } > > > > > > > > > > > > > > > > > > > > How would we ever arrive here if we never do more than 1 page worth of > > > > > > > > > > memory anyway? I don't think this is needed. > > > > > > > > > > > > > > > > > > container_dma_unmap() is called by user via rte_vfio_container_dma_unmap() > > > > > > > > > and when he maps we don't split it as we don't about his memory. > > > > > > > > > So if he maps multiple pages and tries to unmap partially, then we should fail. > > > > > > > > > > > > > > > > Should we map it in page granularity then, instead of adding this > > > > > > > > discrepancy between EAL and user mapping? I.e. instead of adding a > > > > > > > > workaround, how about we just do the same thing for user mem mappings? > > > > > > > > > > > > > > > In heap mapping's we map and unmap it at huge page granularity as we will always > > > > > > > maintain that. > > > > > > > > > > > > > > But here I think we don't know if user's allocation is huge page or > > > > > > > collection of system > > > > > > > pages. Only thing we can do here is map it at system page granularity which > > > > > > > could waste entries if he say really is working with hugepages. Isn't ? > > > > > > > > > > > > > > > > > > > Yeah we do. The API mandates the pages granularity, and it will check > > > > > > against page size and number of IOVA entries, so yes, we do enforce the fact > > > > > > that the IOVA addresses supplied by the user have to be page addresses. > > > > > > > > > > If I see rte_vfio_container_dma_map(), there is no mention of Huge page size > > > > > user is providing or we computing. 
He can call rte_vfio_container_dma_map() > > > > > with 1GB huge page or 4K system page. > > > > > > > > > > Am I missing something ? > > > > > > > > Are you suggesting that a DMA mapping for hugepage-backed memory will be > > > > made at system page size granularity? E.g. will a 1GB page-backed segment be > > > > mapped for DMA as a contiguous 4K-based block? > > > > > > I'm not suggesting anything. My only thought is how to solve below problem. > > > Say application does the following. > > > > > > #1 Allocate 1GB memory from huge page or some external mem. > > > #2 Do rte_vfio_container_dma_map(RTE_VFIO_DEFAULT_CONTAINER_FD, mem, mem, 1GB) > > > In linux/eal_vfio.c, we map it is as single VFIO DMA entry of 1 GB as we > > > don't know where this memory is coming from or backed by what. > > > #3 After a while call rte_vfio_container_dma_unmap(RTE_VFIO_DEFAULT_CONTAINER_FD, mem+4KB, mem+4KB, 4KB) > > > Though rte_vfio_container_dma_unmap() supports #3 by splitting entry as shown below, > > > In VFIO type1 iommu, #3 cannot be supported by current kernel interface. So how > > > can we allow #3 ? > > > > > > > > > static int > > > container_dma_unmap(struct vfio_config *vfio_cfg, uint64_t vaddr, uint64_t iova, > > > uint64_t len) > > > { > > > struct user_mem_map *map, *new_map = NULL; > > > struct user_mem_maps *user_mem_maps; > > > int ret = 0; > > > > > > user_mem_maps = &vfio_cfg->mem_maps; > > > rte_spinlock_recursive_lock(&user_mem_maps->lock); > > > > > > /* find our mapping */ > > > map = find_user_mem_map(user_mem_maps, vaddr, iova, len); > > > if (!map) { > > > RTE_LOG(ERR, EAL, "Couldn't find previously mapped region\n"); > > > rte_errno = EINVAL; > > > ret = -1; > > > goto out; > > > } > > > if (map->addr != vaddr || map->iova != iova || map->len != len) { > > > /* we're partially unmapping a previously mapped region, so we > > > * need to split entry into two. > > > */ > > Hi, > > Apologies, i was on vacation. > > Yes, I can see the problem now. 
> Does VFIO even support non-system page sizes? Like, if i allocated a
> 1GB page, would i be able to map *this page* for DMA, as opposed to
> first 4K of this page? I suspect that the mapping doesn't support page
> sizes other than the system page size.

It does support mapping any multiple of the system page size; see
vfio_pin_map_dma() in vfio/vfio_iommu_type1.c. The kernel documentation
(driver-api/vfio.rst) does not mention any such restriction either, even
in its example. My test case also passes, which confirms the behavior.

>
> --
> Thanks,
> Anatoly

^ permalink raw reply	[flat|nested] 43+ messages in thread
* Re: [dpdk-stable] [dpdk-dev] [EXT] Re: [PATCH 2/2] vfio: fix partial DMA unmapping for VFIO type1 2020-10-28 14:17 ` Nithin Dabilpuram @ 2020-10-28 16:07 ` Burakov, Anatoly 2020-10-28 16:31 ` Nithin Dabilpuram 0 siblings, 1 reply; 43+ messages in thread From: Burakov, Anatoly @ 2020-10-28 16:07 UTC (permalink / raw) To: Nithin Dabilpuram; +Cc: Jerin Jacob, dev, stable On 28-Oct-20 2:17 PM, Nithin Dabilpuram wrote: > On Wed, Oct 28, 2020 at 01:04:26PM +0000, Burakov, Anatoly wrote: >> On 22-Oct-20 1:13 PM, Nithin Dabilpuram wrote: >>> Ping. >>> >>> On Mon, Oct 19, 2020 at 03:13:15PM +0530, Nithin Dabilpuram wrote: >>>> On Sat, Oct 17, 2020 at 05:14:55PM +0100, Burakov, Anatoly wrote: >>>>> On 16-Oct-20 8:10 AM, Nithin Dabilpuram wrote: >>>>>> On Thu, Oct 15, 2020 at 04:10:31PM +0100, Burakov, Anatoly wrote: >>>>>>> On 15-Oct-20 12:57 PM, Nithin Dabilpuram wrote: >>>>>>>> On Thu, Oct 15, 2020 at 3:31 PM Burakov, Anatoly >>>>>>>> <anatoly.burakov@intel.com> wrote: >>>>>>>>> >>>>>>>>> On 15-Oct-20 7:09 AM, Nithin Dabilpuram wrote: >>>>>>>>>> On Wed, Oct 14, 2020 at 04:07:10PM +0100, Burakov, Anatoly wrote: >>>>>>>>>>> External Email >>>>>>>>>>> >>>>>>>>>>> ---------------------------------------------------------------------- >>>>>>>>>>> On 12-Oct-20 9:11 AM, Nithin Dabilpuram wrote: >>>>>>>>>>>> Partial unmapping is not supported for VFIO IOMMU type1 >>>>>>>>>>>> by kernel. Though kernel gives return as zero, the unmapped size >>>>>>>>>>>> returned will not be same as expected. So check for >>>>>>>>>>>> returned unmap size and return error. >>>>>>>>>>>> >>>>>>>>>>>> For case of DMA map/unmap triggered by heap allocations, >>>>>>>>>>>> maintain granularity of memseg page size so that heap >>>>>>>>>>>> expansion and contraction does not have this issue. 
>>>>>>>>>>> >>>>>>>>>>> This is quite unfortunate, because there was a different bug that had to do >>>>>>>>>>> with kernel having a very limited number of mappings available [1], as a >>>>>>>>>>> result of which the page concatenation code was added. >>>>>>>>>>> >>>>>>>>>>> It should therefore be documented that the dma_entry_limit parameter should >>>>>>>>>>> be adjusted should the user run out of the DMA entries. >>>>>>>>>>> >>>>>>>>>>> [1] https://urldefense.proofpoint.com/v2/url?u=https-3A__lore.kernel.org_lkml_155414977872.12780.13728555131525362206.stgit-40gimli.home_T_&d=DwICaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=FZ_tPCbgFOh18zwRPO9H0yDx8VW38vuapifdDfc8SFQ&m=3GMg-634_cdUCY4WpQPwjzZ_S4ckuMHOnt2FxyyjXMk&s=TJLzppkaDS95VGyRHX2hzflQfb9XLK0OiOszSXoeXKk&e= >>>>>>>>> >>>>>>>>> <snip> >>>>>>>>> >>>>>>>>>>>> RTE_LOG(ERR, EAL, " cannot clear DMA remapping, error %i (%s)\n", >>>>>>>>>>>> errno, strerror(errno)); >>>>>>>>>>>> return -1; >>>>>>>>>>>> + } else if (dma_unmap.size != len) { >>>>>>>>>>>> + RTE_LOG(ERR, EAL, " unexpected size %"PRIu64" of DMA " >>>>>>>>>>>> + "remapping cleared instead of %"PRIu64"\n", >>>>>>>>>>>> + (uint64_t)dma_unmap.size, len); >>>>>>>>>>>> + rte_errno = EIO; >>>>>>>>>>>> + return -1; >>>>>>>>>>>> } >>>>>>>>>>>> } >>>>>>>>>>>> @@ -1853,6 +1869,12 @@ container_dma_unmap(struct vfio_config *vfio_cfg, uint64_t vaddr, uint64_t iova, >>>>>>>>>>>> /* we're partially unmapping a previously mapped region, so we >>>>>>>>>>>> * need to split entry into two. >>>>>>>>>>>> */ >>>>>>>>>>>> + if (!vfio_cfg->vfio_iommu_type->partial_unmap) { >>>>>>>>>>>> + RTE_LOG(DEBUG, EAL, "DMA partial unmap unsupported\n"); >>>>>>>>>>>> + rte_errno = ENOTSUP; >>>>>>>>>>>> + ret = -1; >>>>>>>>>>>> + goto out; >>>>>>>>>>>> + } >>>>>>>>>>> >>>>>>>>>>> How would we ever arrive here if we never do more than 1 page worth of >>>>>>>>>>> memory anyway? I don't think this is needed. 
>>>>>>>>>> >>>>>>>>>> container_dma_unmap() is called by user via rte_vfio_container_dma_unmap() >>>>>>>>>> and when he maps we don't split it as we don't about his memory. >>>>>>>>>> So if he maps multiple pages and tries to unmap partially, then we should fail. >>>>>>>>> >>>>>>>>> Should we map it in page granularity then, instead of adding this >>>>>>>>> discrepancy between EAL and user mapping? I.e. instead of adding a >>>>>>>>> workaround, how about we just do the same thing for user mem mappings? >>>>>>>>> >>>>>>>> In heap mapping's we map and unmap it at huge page granularity as we will always >>>>>>>> maintain that. >>>>>>>> >>>>>>>> But here I think we don't know if user's allocation is huge page or >>>>>>>> collection of system >>>>>>>> pages. Only thing we can do here is map it at system page granularity which >>>>>>>> could waste entries if he say really is working with hugepages. Isn't ? >>>>>>>> >>>>>>> >>>>>>> Yeah we do. The API mandates the pages granularity, and it will check >>>>>>> against page size and number of IOVA entries, so yes, we do enforce the fact >>>>>>> that the IOVA addresses supplied by the user have to be page addresses. >>>>>> >>>>>> If I see rte_vfio_container_dma_map(), there is no mention of Huge page size >>>>>> user is providing or we computing. He can call rte_vfio_container_dma_map() >>>>>> with 1GB huge page or 4K system page. >>>>>> >>>>>> Am I missing something ? >>>>> >>>>> Are you suggesting that a DMA mapping for hugepage-backed memory will be >>>>> made at system page size granularity? E.g. will a 1GB page-backed segment be >>>>> mapped for DMA as a contiguous 4K-based block? >>>> >>>> I'm not suggesting anything. My only thought is how to solve below problem. >>>> Say application does the following. >>>> >>>> #1 Allocate 1GB memory from huge page or some external mem. 
>>>> #2 Do rte_vfio_container_dma_map(RTE_VFIO_DEFAULT_CONTAINER_FD, mem, mem, 1GB) >>>> In linux/eal_vfio.c, we map it is as single VFIO DMA entry of 1 GB as we >>>> don't know where this memory is coming from or backed by what. >>>> #3 After a while call rte_vfio_container_dma_unmap(RTE_VFIO_DEFAULT_CONTAINER_FD, mem+4KB, mem+4KB, 4KB) >>>> Though rte_vfio_container_dma_unmap() supports #3 by splitting entry as shown below, >>>> In VFIO type1 iommu, #3 cannot be supported by current kernel interface. So how >>>> can we allow #3 ? >>>> >>>> >>>> static int >>>> container_dma_unmap(struct vfio_config *vfio_cfg, uint64_t vaddr, uint64_t iova, >>>> uint64_t len) >>>> { >>>> struct user_mem_map *map, *new_map = NULL; >>>> struct user_mem_maps *user_mem_maps; >>>> int ret = 0; >>>> >>>> user_mem_maps = &vfio_cfg->mem_maps; >>>> rte_spinlock_recursive_lock(&user_mem_maps->lock); >>>> >>>> /* find our mapping */ >>>> map = find_user_mem_map(user_mem_maps, vaddr, iova, len); >>>> if (!map) { >>>> RTE_LOG(ERR, EAL, "Couldn't find previously mapped region\n"); >>>> rte_errno = EINVAL; >>>> ret = -1; >>>> goto out; >>>> } >>>> if (map->addr != vaddr || map->iova != iova || map->len != len) { >>>> /* we're partially unmapping a previously mapped region, so we >>>> * need to split entry into two. >>>> */ >> >> Hi, >> >> Apologies, i was on vacation. >> >> Yes, I can see the problem now. Does VFIO even support non-system page >> sizes? Like, if i allocated a 1GB page, would i be able to map *this page* >> for DMA, as opposed to first 4K of this page? I suspect that the mapping >> doesn't support page sizes other than the system page size. > > It does support mapping any multiple of system page size. > See vfio/vfio_iommu_type1.c vfio_pin_map_dma(). Also > ./driver-api/vfio.rst doesn't mention any such restrictions even in its > example. > > Also my test case is passing so that confirms the behavior. 
Can we perhaps make it so that the API mandates mapping/unmapping the
same chunks? That would be the easiest solution here.

>
>
>>
>> --
>> Thanks,
>> Anatoly

--
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 43+ messages in thread
* Re: [dpdk-stable] [dpdk-dev] [EXT] Re: [PATCH 2/2] vfio: fix partial DMA unmapping for VFIO type1 2020-10-28 16:07 ` Burakov, Anatoly @ 2020-10-28 16:31 ` Nithin Dabilpuram 0 siblings, 0 replies; 43+ messages in thread From: Nithin Dabilpuram @ 2020-10-28 16:31 UTC (permalink / raw) To: Burakov, Anatoly; +Cc: Jerin Jacob, dev, stable On Wed, Oct 28, 2020 at 04:07:17PM +0000, Burakov, Anatoly wrote: > On 28-Oct-20 2:17 PM, Nithin Dabilpuram wrote: > > On Wed, Oct 28, 2020 at 01:04:26PM +0000, Burakov, Anatoly wrote: > > > On 22-Oct-20 1:13 PM, Nithin Dabilpuram wrote: > > > > Ping. > > > > > > > > On Mon, Oct 19, 2020 at 03:13:15PM +0530, Nithin Dabilpuram wrote: > > > > > On Sat, Oct 17, 2020 at 05:14:55PM +0100, Burakov, Anatoly wrote: > > > > > > On 16-Oct-20 8:10 AM, Nithin Dabilpuram wrote: > > > > > > > On Thu, Oct 15, 2020 at 04:10:31PM +0100, Burakov, Anatoly wrote: > > > > > > > > On 15-Oct-20 12:57 PM, Nithin Dabilpuram wrote: > > > > > > > > > On Thu, Oct 15, 2020 at 3:31 PM Burakov, Anatoly > > > > > > > > > <anatoly.burakov@intel.com> wrote: > > > > > > > > > > > > > > > > > > > > On 15-Oct-20 7:09 AM, Nithin Dabilpuram wrote: > > > > > > > > > > > On Wed, Oct 14, 2020 at 04:07:10PM +0100, Burakov, Anatoly wrote: > > > > > > > > > > > > External Email > > > > > > > > > > > > > > > > > > > > > > > > ---------------------------------------------------------------------- > > > > > > > > > > > > On 12-Oct-20 9:11 AM, Nithin Dabilpuram wrote: > > > > > > > > > > > > > Partial unmapping is not supported for VFIO IOMMU type1 > > > > > > > > > > > > > by kernel. Though kernel gives return as zero, the unmapped size > > > > > > > > > > > > > returned will not be same as expected. So check for > > > > > > > > > > > > > returned unmap size and return error. 
> > > > > > > > > > > > > > > > > > > > > > > > > > For case of DMA map/unmap triggered by heap allocations, > > > > > > > > > > > > > maintain granularity of memseg page size so that heap > > > > > > > > > > > > > expansion and contraction does not have this issue. > > > > > > > > > > > > > > > > > > > > > > > > This is quite unfortunate, because there was a different bug that had to do > > > > > > > > > > > > with kernel having a very limited number of mappings available [1], as a > > > > > > > > > > > > result of which the page concatenation code was added. > > > > > > > > > > > > > > > > > > > > > > > > It should therefore be documented that the dma_entry_limit parameter should > > > > > > > > > > > > be adjusted should the user run out of the DMA entries. > > > > > > > > > > > > > > > > > > > > > > > > [1] https://urldefense.proofpoint.com/v2/url?u=https-3A__lore.kernel.org_lkml_155414977872.12780.13728555131525362206.stgit-40gimli.home_T_&d=DwICaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=FZ_tPCbgFOh18zwRPO9H0yDx8VW38vuapifdDfc8SFQ&m=3GMg-634_cdUCY4WpQPwjzZ_S4ckuMHOnt2FxyyjXMk&s=TJLzppkaDS95VGyRHX2hzflQfb9XLK0OiOszSXoeXKk&e= > > > > > > > > > > > > > > > > > > > > <snip> > > > > > > > > > > > > > > > > > > > > > > > RTE_LOG(ERR, EAL, " cannot clear DMA remapping, error %i (%s)\n", > > > > > > > > > > > > > errno, strerror(errno)); > > > > > > > > > > > > > return -1; > > > > > > > > > > > > > + } else if (dma_unmap.size != len) { > > > > > > > > > > > > > + RTE_LOG(ERR, EAL, " unexpected size %"PRIu64" of DMA " > > > > > > > > > > > > > + "remapping cleared instead of %"PRIu64"\n", > > > > > > > > > > > > > + (uint64_t)dma_unmap.size, len); > > > > > > > > > > > > > + rte_errno = EIO; > > > > > > > > > > > > > + return -1; > > > > > > > > > > > > > } > > > > > > > > > > > > > } > > > > > > > > > > > > > @@ -1853,6 +1869,12 @@ container_dma_unmap(struct vfio_config *vfio_cfg, uint64_t vaddr, uint64_t iova, > > > > > > > > > > > > > /* we're partially unmapping a previously 
mapped region, so we > > > > > > > > > > > > > * need to split entry into two. > > > > > > > > > > > > > */ > > > > > > > > > > > > > + if (!vfio_cfg->vfio_iommu_type->partial_unmap) { > > > > > > > > > > > > > + RTE_LOG(DEBUG, EAL, "DMA partial unmap unsupported\n"); > > > > > > > > > > > > > + rte_errno = ENOTSUP; > > > > > > > > > > > > > + ret = -1; > > > > > > > > > > > > > + goto out; > > > > > > > > > > > > > + } > > > > > > > > > > > > > > > > > > > > > > > > How would we ever arrive here if we never do more than 1 page worth of > > > > > > > > > > > > memory anyway? I don't think this is needed. > > > > > > > > > > > > > > > > > > > > > > container_dma_unmap() is called by user via rte_vfio_container_dma_unmap() > > > > > > > > > > > and when he maps we don't split it as we don't about his memory. > > > > > > > > > > > So if he maps multiple pages and tries to unmap partially, then we should fail. > > > > > > > > > > > > > > > > > > > > Should we map it in page granularity then, instead of adding this > > > > > > > > > > discrepancy between EAL and user mapping? I.e. instead of adding a > > > > > > > > > > workaround, how about we just do the same thing for user mem mappings? > > > > > > > > > > > > > > > > > > > In heap mapping's we map and unmap it at huge page granularity as we will always > > > > > > > > > maintain that. > > > > > > > > > > > > > > > > > > But here I think we don't know if user's allocation is huge page or > > > > > > > > > collection of system > > > > > > > > > pages. Only thing we can do here is map it at system page granularity which > > > > > > > > > could waste entries if he say really is working with hugepages. Isn't ? > > > > > > > > > > > > > > > > > > > > > > > > > Yeah we do. The API mandates the pages granularity, and it will check > > > > > > > > against page size and number of IOVA entries, so yes, we do enforce the fact > > > > > > > > that the IOVA addresses supplied by the user have to be page addresses. 
> > > > > > > > > > > > > > If I see rte_vfio_container_dma_map(), there is no mention of Huge page size > > > > > > > user is providing or we computing. He can call rte_vfio_container_dma_map() > > > > > > > with 1GB huge page or 4K system page. > > > > > > > > > > > > > > Am I missing something ? > > > > > > > > > > > > Are you suggesting that a DMA mapping for hugepage-backed memory will be > > > > > > made at system page size granularity? E.g. will a 1GB page-backed segment be > > > > > > mapped for DMA as a contiguous 4K-based block? > > > > > > > > > > I'm not suggesting anything. My only thought is how to solve below problem. > > > > > Say application does the following. > > > > > > > > > > #1 Allocate 1GB memory from huge page or some external mem. > > > > > #2 Do rte_vfio_container_dma_map(RTE_VFIO_DEFAULT_CONTAINER_FD, mem, mem, 1GB) > > > > > In linux/eal_vfio.c, we map it is as single VFIO DMA entry of 1 GB as we > > > > > don't know where this memory is coming from or backed by what. > > > > > #3 After a while call rte_vfio_container_dma_unmap(RTE_VFIO_DEFAULT_CONTAINER_FD, mem+4KB, mem+4KB, 4KB) > > > > > Though rte_vfio_container_dma_unmap() supports #3 by splitting entry as shown below, > > > > > In VFIO type1 iommu, #3 cannot be supported by current kernel interface. So how > > > > > can we allow #3 ? 
> > > > > > > > > > > > > > > static int > > > > > container_dma_unmap(struct vfio_config *vfio_cfg, uint64_t vaddr, uint64_t iova, > > > > > uint64_t len) > > > > > { > > > > > struct user_mem_map *map, *new_map = NULL; > > > > > struct user_mem_maps *user_mem_maps; > > > > > int ret = 0; > > > > > > > > > > user_mem_maps = &vfio_cfg->mem_maps; > > > > > rte_spinlock_recursive_lock(&user_mem_maps->lock); > > > > > > > > > > /* find our mapping */ > > > > > map = find_user_mem_map(user_mem_maps, vaddr, iova, len); > > > > > if (!map) { > > > > > RTE_LOG(ERR, EAL, "Couldn't find previously mapped region\n"); > > > > > rte_errno = EINVAL; > > > > > ret = -1; > > > > > goto out; > > > > > } > > > > > if (map->addr != vaddr || map->iova != iova || map->len != len) { > > > > > /* we're partially unmapping a previously mapped region, so we > > > > > * need to split entry into two. > > > > > */ > > > > > > Hi, > > > > > > Apologies, i was on vacation. > > > > > > Yes, I can see the problem now. Does VFIO even support non-system page > > > sizes? Like, if i allocated a 1GB page, would i be able to map *this page* > > > for DMA, as opposed to first 4K of this page? I suspect that the mapping > > > doesn't support page sizes other than the system page size. > > > > It does support mapping any multiple of system page size. > > See vfio/vfio_iommu_type1.c vfio_pin_map_dma(). Also > > ./driver-api/vfio.rst doesn't mention any such restrictions even in its > > example. > > > > Also my test case is passing so that confirms the behavior. > > Can we perhaps make it so that the API mandates mapping/unmapping the same > chunks? That would be the easiest solution here. Ack, I was already doing that for type1 IOMMU with my above patch. I didn't change the behavior for sPAPR or no-iommu mode. > > > > > > > > > > > -- > > > Thanks, > > > Anatoly > > > -- > Thanks, > Anatoly ^ permalink raw reply [flat|nested] 43+ messages in thread
* [dpdk-stable] [PATCH v2 1/3] vfio: revert changes for map contiguous areas in one go [not found] ` <20201105090423.11954-1-ndabilpuram@marvell.com> @ 2020-11-05 9:04 ` Nithin Dabilpuram 2020-11-05 9:04 ` [dpdk-stable] [PATCH v2 2/3] vfio: fix DMA mapping granularity for type1 iova as va Nithin Dabilpuram 1 sibling, 0 replies; 43+ messages in thread From: Nithin Dabilpuram @ 2020-11-05 9:04 UTC (permalink / raw) To: anatoly.burakov; +Cc: jerinj, dev, Nithin Dabilpuram, stable In order to save DMA entries limited by kernel both for externel memory and hugepage memory, an attempt was made to map physically contiguous memory in one go. This cannot be done as VFIO IOMMU type1 does not support partially unmapping a previously mapped memory region while Heap can request for multi page mapping and partial unmapping. Hence for going back to old method of mapping/unmapping at memseg granularity, this commit reverts commit d1c7c0cdf7ba ("vfio: map contiguous areas in one go") Also add documentation on what module parameter needs to be used to increase the per-container dma map limit for VFIO. Fixes: d1c7c0cdf7ba ("vfio: map contiguous areas in one go") Cc: anatoly.burakov@intel.com Cc: stable@dpdk.org Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com> --- doc/guides/linux_gsg/linux_drivers.rst | 10 ++++++ lib/librte_eal/linux/eal_vfio.c | 59 +++++----------------------------- 2 files changed, 18 insertions(+), 51 deletions(-) diff --git a/doc/guides/linux_gsg/linux_drivers.rst b/doc/guides/linux_gsg/linux_drivers.rst index 080b449..bb43ab2 100644 --- a/doc/guides/linux_gsg/linux_drivers.rst +++ b/doc/guides/linux_gsg/linux_drivers.rst @@ -67,6 +67,16 @@ Note that in order to use VFIO, your kernel must support it. VFIO kernel modules have been included in the Linux kernel since version 3.6.0 and are usually present by default, however please consult your distributions documentation to make sure that is the case. 
+For DMA mapping of either external memory or hugepages, VFIO interface is used. +VFIO does not support partial unmap of once mapped memory. Hence DPDK's memory is +mapped in hugepage granularity or system page granularity. Number of DMA +mappings is limited by kernel with user locked memory limit of a process(rlimit) +for system/hugepage memory. Another per-container overall limit applicable both +for external memory and system memory was added in kernel 5.1 defined by +VFIO module parameter ``dma_entry_limit`` with a default value of 64K. +When application is out of DMA entries, these limits need to be adjusted to +increase the allowed limit. + The ``vfio-pci`` module since Linux version 5.7 supports the creation of virtual functions. After the PF is bound to vfio-pci module, the user can create the VFs by sysfs interface, and these VFs are bound to vfio-pci module automatically. diff --git a/lib/librte_eal/linux/eal_vfio.c b/lib/librte_eal/linux/eal_vfio.c index 380f2f4..dbefcba 100644 --- a/lib/librte_eal/linux/eal_vfio.c +++ b/lib/librte_eal/linux/eal_vfio.c @@ -516,11 +516,9 @@ static void vfio_mem_event_callback(enum rte_mem_event type, const void *addr, size_t len, void *arg __rte_unused) { - rte_iova_t iova_start, iova_expected; struct rte_memseg_list *msl; struct rte_memseg *ms; size_t cur_len = 0; - uint64_t va_start; msl = rte_mem_virt2memseg_list(addr); @@ -549,63 +547,22 @@ vfio_mem_event_callback(enum rte_mem_event type, const void *addr, size_t len, #endif /* memsegs are contiguous in memory */ ms = rte_mem_virt2memseg(addr, msl); - - /* - * This memory is not guaranteed to be contiguous, but it still could - * be, or it could have some small contiguous chunks. Since the number - * of VFIO mappings is limited, and VFIO appears to not concatenate - * adjacent mappings, we have to do this ourselves. - * - * So, find contiguous chunks, then map them. 
- */ - va_start = ms->addr_64; - iova_start = iova_expected = ms->iova; while (cur_len < len) { - bool new_contig_area = ms->iova != iova_expected; - bool last_seg = (len - cur_len) == ms->len; - bool skip_last = false; - - /* only do mappings when current contiguous area ends */ - if (new_contig_area) { - if (type == RTE_MEM_EVENT_ALLOC) - vfio_dma_mem_map(default_vfio_cfg, va_start, - iova_start, - iova_expected - iova_start, 1); - else - vfio_dma_mem_map(default_vfio_cfg, va_start, - iova_start, - iova_expected - iova_start, 0); - va_start = ms->addr_64; - iova_start = ms->iova; - } /* some memory segments may have invalid IOVA */ if (ms->iova == RTE_BAD_IOVA) { RTE_LOG(DEBUG, EAL, "Memory segment at %p has bad IOVA, skipping\n", ms->addr); - skip_last = true; + goto next; } - iova_expected = ms->iova + ms->len; + if (type == RTE_MEM_EVENT_ALLOC) + vfio_dma_mem_map(default_vfio_cfg, ms->addr_64, + ms->iova, ms->len, 1); + else + vfio_dma_mem_map(default_vfio_cfg, ms->addr_64, + ms->iova, ms->len, 0); +next: cur_len += ms->len; ++ms; - - /* - * don't count previous segment, and don't attempt to - * dereference a potentially invalid pointer. - */ - if (skip_last && !last_seg) { - iova_expected = iova_start = ms->iova; - va_start = ms->addr_64; - } else if (!skip_last && last_seg) { - /* this is the last segment and we're not skipping */ - if (type == RTE_MEM_EVENT_ALLOC) - vfio_dma_mem_map(default_vfio_cfg, va_start, - iova_start, - iova_expected - iova_start, 1); - else - vfio_dma_mem_map(default_vfio_cfg, va_start, - iova_start, - iova_expected - iova_start, 0); - } } #ifdef RTE_ARCH_PPC_64 cur_len = 0; -- 2.8.4 ^ permalink raw reply [flat|nested] 43+ messages in thread
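[Editor's note] The ``dma_entry_limit`` knob referenced in the documentation above is a ``vfio_iommu_type1`` module parameter (added in kernel 5.1, default 65535). A typical way to inspect and raise it, with an illustrative value:

```shell
# Check the current per-container DMA entry limit (default 65535)
cat /sys/module/vfio_iommu_type1/parameters/dma_entry_limit

# Raise it at module load time...
modprobe vfio_iommu_type1 dma_entry_limit=131072

# ...or persistently via modprobe.d
echo "options vfio_iommu_type1 dma_entry_limit=131072" | \
    sudo tee /etc/modprobe.d/vfio-dma-limit.conf
```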
* [dpdk-stable] [PATCH v2 2/3] vfio: fix DMA mapping granularity for type1 iova as va [not found] ` <20201105090423.11954-1-ndabilpuram@marvell.com> 2020-11-05 9:04 ` [dpdk-stable] [PATCH v2 1/3] vfio: revert changes for map contiguous areas in one go Nithin Dabilpuram @ 2020-11-05 9:04 ` Nithin Dabilpuram 2020-11-10 14:04 ` Burakov, Anatoly 2020-11-10 14:17 ` [dpdk-stable] " Burakov, Anatoly 1 sibling, 2 replies; 43+ messages in thread From: Nithin Dabilpuram @ 2020-11-05 9:04 UTC (permalink / raw) To: anatoly.burakov; +Cc: jerinj, dev, Nithin Dabilpuram, stable Partial unmapping is not supported for VFIO IOMMU type1 by kernel. Though kernel gives return as zero, the unmapped size returned will not be same as expected. So check for returned unmap size and return error. For IOVA as PA, DMA mapping is already at memseg size granularity. Do the same even for IOVA as VA mode as DMA map/unmap triggered by heap allocations, maintain granularity of memseg page size so that heap expansion and contraction does not have this issue. For user requested DMA map/unmap disallow partial unmapping for VFIO type1. 
Fixes: 73a639085938 ("vfio: allow to map other memory regions") Cc: anatoly.burakov@intel.com Cc: stable@dpdk.org Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com> --- lib/librte_eal/linux/eal_vfio.c | 34 ++++++++++++++++++++++++++++------ lib/librte_eal/linux/eal_vfio.h | 1 + 2 files changed, 29 insertions(+), 6 deletions(-) diff --git a/lib/librte_eal/linux/eal_vfio.c b/lib/librte_eal/linux/eal_vfio.c index dbefcba..b4f9c33 100644 --- a/lib/librte_eal/linux/eal_vfio.c +++ b/lib/librte_eal/linux/eal_vfio.c @@ -69,6 +69,7 @@ static const struct vfio_iommu_type iommu_types[] = { { .type_id = RTE_VFIO_TYPE1, .name = "Type 1", + .partial_unmap = false, .dma_map_func = &vfio_type1_dma_map, .dma_user_map_func = &vfio_type1_dma_mem_map }, @@ -76,6 +77,7 @@ static const struct vfio_iommu_type iommu_types[] = { { .type_id = RTE_VFIO_SPAPR, .name = "sPAPR", + .partial_unmap = true, .dma_map_func = &vfio_spapr_dma_map, .dma_user_map_func = &vfio_spapr_dma_mem_map }, @@ -83,6 +85,7 @@ static const struct vfio_iommu_type iommu_types[] = { { .type_id = RTE_VFIO_NOIOMMU, .name = "No-IOMMU", + .partial_unmap = true, .dma_map_func = &vfio_noiommu_dma_map, .dma_user_map_func = &vfio_noiommu_dma_mem_map }, @@ -525,12 +528,19 @@ vfio_mem_event_callback(enum rte_mem_event type, const void *addr, size_t len, /* for IOVA as VA mode, no need to care for IOVA addresses */ if (rte_eal_iova_mode() == RTE_IOVA_VA && msl->external == 0) { uint64_t vfio_va = (uint64_t)(uintptr_t)addr; - if (type == RTE_MEM_EVENT_ALLOC) - vfio_dma_mem_map(default_vfio_cfg, vfio_va, vfio_va, - len, 1); - else - vfio_dma_mem_map(default_vfio_cfg, vfio_va, vfio_va, - len, 0); + uint64_t page_sz = msl->page_sz; + + /* Maintain granularity of DMA map/unmap to memseg size */ + for (; cur_len < len; cur_len += page_sz) { + if (type == RTE_MEM_EVENT_ALLOC) + vfio_dma_mem_map(default_vfio_cfg, vfio_va, + vfio_va, page_sz, 1); + else + vfio_dma_mem_map(default_vfio_cfg, vfio_va, + vfio_va, page_sz, 0); + vfio_va 
+= page_sz; + } + return; } @@ -1369,6 +1379,12 @@ vfio_type1_dma_mem_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova, RTE_LOG(ERR, EAL, " cannot clear DMA remapping, error %i (%s)\n", errno, strerror(errno)); return -1; + } else if (dma_unmap.size != len) { + RTE_LOG(ERR, EAL, " unexpected size %"PRIu64" of DMA " + "remapping cleared instead of %"PRIu64"\n", + (uint64_t)dma_unmap.size, len); + rte_errno = EIO; + return -1; } } @@ -1839,6 +1855,12 @@ container_dma_unmap(struct vfio_config *vfio_cfg, uint64_t vaddr, uint64_t iova, /* we're partially unmapping a previously mapped region, so we * need to split entry into two. */ + if (!vfio_cfg->vfio_iommu_type->partial_unmap) { + RTE_LOG(DEBUG, EAL, "DMA partial unmap unsupported\n"); + rte_errno = ENOTSUP; + ret = -1; + goto out; + } if (user_mem_maps->n_maps == VFIO_MAX_USER_MEM_MAPS) { RTE_LOG(ERR, EAL, "Not enough space to store partial mapping\n"); rte_errno = ENOMEM; diff --git a/lib/librte_eal/linux/eal_vfio.h b/lib/librte_eal/linux/eal_vfio.h index cb2d35f..6ebaca6 100644 --- a/lib/librte_eal/linux/eal_vfio.h +++ b/lib/librte_eal/linux/eal_vfio.h @@ -113,6 +113,7 @@ typedef int (*vfio_dma_user_func_t)(int fd, uint64_t vaddr, uint64_t iova, struct vfio_iommu_type { int type_id; const char *name; + bool partial_unmap; vfio_dma_user_func_t dma_user_map_func; vfio_dma_func_t dma_map_func; }; -- 2.8.4 ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [dpdk-stable] [PATCH v2 2/3] vfio: fix DMA mapping granularity for type1 iova as va 2020-11-05 9:04 ` [dpdk-stable] [PATCH v2 2/3] vfio: fix DMA mapping granularity for type1 iova as va Nithin Dabilpuram @ 2020-11-10 14:04 ` Burakov, Anatoly 2020-11-10 14:22 ` [dpdk-stable] [dpdk-dev] " Burakov, Anatoly 2020-11-10 14:17 ` [dpdk-stable] " Burakov, Anatoly 1 sibling, 1 reply; 43+ messages in thread From: Burakov, Anatoly @ 2020-11-10 14:04 UTC (permalink / raw) To: Nithin Dabilpuram; +Cc: jerinj, dev, stable On 05-Nov-20 9:04 AM, Nithin Dabilpuram wrote: > Partial unmapping is not supported for VFIO IOMMU type1 > by kernel. Though kernel gives return as zero, the unmapped size > returned will not be same as expected. So check for > returned unmap size and return error. > > For IOVA as PA, DMA mapping is already at memseg size > granularity. Do the same even for IOVA as VA mode as > DMA map/unmap triggered by heap allocations, > maintain granularity of memseg page size so that heap > expansion and contraction does not have this issue. > > For user requested DMA map/unmap disallow partial unmapping > for VFIO type1. > > Fixes: 73a639085938 ("vfio: allow to map other memory regions") > Cc: anatoly.burakov@intel.com > Cc: stable@dpdk.org > > Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com> > --- Maybe i just didn't have enough coffee today, but i still don't see why this "partial unmap" thing exists. We are already mapping the addresses page-by-page, so surely "partial" unmaps can't even exist in the first place? -- Thanks, Anatoly ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [dpdk-stable] [dpdk-dev] [PATCH v2 2/3] vfio: fix DMA mapping granularity for type1 iova as va 2020-11-10 14:04 ` Burakov, Anatoly @ 2020-11-10 14:22 ` Burakov, Anatoly 0 siblings, 0 replies; 43+ messages in thread From: Burakov, Anatoly @ 2020-11-10 14:22 UTC (permalink / raw) To: Nithin Dabilpuram; +Cc: jerinj, dev, stable On 10-Nov-20 2:04 PM, Burakov, Anatoly wrote: > On 05-Nov-20 9:04 AM, Nithin Dabilpuram wrote: >> Partial unmapping is not supported for VFIO IOMMU type1 >> by kernel. Though kernel gives return as zero, the unmapped size >> returned will not be same as expected. So check for >> returned unmap size and return error. >> >> For IOVA as PA, DMA mapping is already at memseg size >> granularity. Do the same even for IOVA as VA mode as >> DMA map/unmap triggered by heap allocations, >> maintain granularity of memseg page size so that heap >> expansion and contraction does not have this issue. >> >> For user requested DMA map/unmap disallow partial unmapping >> for VFIO type1. >> >> Fixes: 73a639085938 ("vfio: allow to map other memory regions") >> Cc: anatoly.burakov@intel.com >> Cc: stable@dpdk.org >> >> Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com> >> --- > > Maybe i just didn't have enough coffee today, but i still don't see why > this "partial unmap" thing exists. Oh, right, this is for *user* mapped memory. Disregard this email. > > We are already mapping the addresses page-by-page, so surely "partial" > unmaps can't even exist in the first place? > -- Thanks, Anatoly ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [dpdk-stable] [PATCH v2 2/3] vfio: fix DMA mapping granularity for type1 iova as va 2020-11-05 9:04 ` [dpdk-stable] [PATCH v2 2/3] vfio: fix DMA mapping granularity for type1 iova as va Nithin Dabilpuram 2020-11-10 14:04 ` Burakov, Anatoly @ 2020-11-10 14:17 ` Burakov, Anatoly 2020-11-11 5:08 ` [dpdk-stable] [dpdk-dev] " Nithin Dabilpuram 1 sibling, 1 reply; 43+ messages in thread From: Burakov, Anatoly @ 2020-11-10 14:17 UTC (permalink / raw) To: Nithin Dabilpuram; +Cc: jerinj, dev, stable On 05-Nov-20 9:04 AM, Nithin Dabilpuram wrote: > Partial unmapping is not supported for VFIO IOMMU type1 > by kernel. Though kernel gives return as zero, the unmapped size > returned will not be same as expected. So check for > returned unmap size and return error. > > For IOVA as PA, DMA mapping is already at memseg size > granularity. Do the same even for IOVA as VA mode as > DMA map/unmap triggered by heap allocations, > maintain granularity of memseg page size so that heap > expansion and contraction does not have this issue. > > For user requested DMA map/unmap disallow partial unmapping > for VFIO type1. 
> > Fixes: 73a639085938 ("vfio: allow to map other memory regions") > Cc: anatoly.burakov@intel.com > Cc: stable@dpdk.org > > Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com> > --- <snip> > @@ -525,12 +528,19 @@ vfio_mem_event_callback(enum rte_mem_event type, const void *addr, size_t len, > /* for IOVA as VA mode, no need to care for IOVA addresses */ > if (rte_eal_iova_mode() == RTE_IOVA_VA && msl->external == 0) { > uint64_t vfio_va = (uint64_t)(uintptr_t)addr; > - if (type == RTE_MEM_EVENT_ALLOC) > - vfio_dma_mem_map(default_vfio_cfg, vfio_va, vfio_va, > - len, 1); > - else > - vfio_dma_mem_map(default_vfio_cfg, vfio_va, vfio_va, > - len, 0); > + uint64_t page_sz = msl->page_sz; > + > + /* Maintain granularity of DMA map/unmap to memseg size */ > + for (; cur_len < len; cur_len += page_sz) { > + if (type == RTE_MEM_EVENT_ALLOC) > + vfio_dma_mem_map(default_vfio_cfg, vfio_va, > + vfio_va, page_sz, 1); > + else > + vfio_dma_mem_map(default_vfio_cfg, vfio_va, > + vfio_va, page_sz, 0); I think you're mapping the same address here, over and over. Perhaps you meant `vfio_va + cur_len` for the mapping addresses? -- Thanks, Anatoly ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [dpdk-stable] [dpdk-dev] [PATCH v2 2/3] vfio: fix DMA mapping granularity for type1 iova as va 2020-11-10 14:17 ` [dpdk-stable] " Burakov, Anatoly @ 2020-11-11 5:08 ` Nithin Dabilpuram 2020-11-11 10:00 ` Burakov, Anatoly 0 siblings, 1 reply; 43+ messages in thread From: Nithin Dabilpuram @ 2020-11-11 5:08 UTC (permalink / raw) To: Burakov, Anatoly; +Cc: jerinj, dev, stable On Tue, Nov 10, 2020 at 02:17:39PM +0000, Burakov, Anatoly wrote: > On 05-Nov-20 9:04 AM, Nithin Dabilpuram wrote: > > Partial unmapping is not supported for VFIO IOMMU type1 > > by kernel. Though kernel gives return as zero, the unmapped size > > returned will not be same as expected. So check for > > returned unmap size and return error. > > > > For IOVA as PA, DMA mapping is already at memseg size > > granularity. Do the same even for IOVA as VA mode as > > DMA map/unmap triggered by heap allocations, > > maintain granularity of memseg page size so that heap > > expansion and contraction does not have this issue. > > > > For user requested DMA map/unmap disallow partial unmapping > > for VFIO type1. 
> > > > Fixes: 73a639085938 ("vfio: allow to map other memory regions") > > Cc: anatoly.burakov@intel.com > > Cc: stable@dpdk.org > > > > Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com> > > --- > > <snip> > > > @@ -525,12 +528,19 @@ vfio_mem_event_callback(enum rte_mem_event type, const void *addr, size_t len, > > /* for IOVA as VA mode, no need to care for IOVA addresses */ > > if (rte_eal_iova_mode() == RTE_IOVA_VA && msl->external == 0) { > > uint64_t vfio_va = (uint64_t)(uintptr_t)addr; > > - if (type == RTE_MEM_EVENT_ALLOC) > > - vfio_dma_mem_map(default_vfio_cfg, vfio_va, vfio_va, > > - len, 1); > > - else > > - vfio_dma_mem_map(default_vfio_cfg, vfio_va, vfio_va, > > - len, 0); > > + uint64_t page_sz = msl->page_sz; > > + > > + /* Maintain granularity of DMA map/unmap to memseg size */ > > + for (; cur_len < len; cur_len += page_sz) { > > + if (type == RTE_MEM_EVENT_ALLOC) > > + vfio_dma_mem_map(default_vfio_cfg, vfio_va, > > + vfio_va, page_sz, 1); > > + else > > + vfio_dma_mem_map(default_vfio_cfg, vfio_va, > > + vfio_va, page_sz, 0); > > I think you're mapping the same address here, over and over. Perhaps you > meant `vfio_va + cur_len` for the mapping addresses? There is a 'vfio_va += page_sz;' in next line right ? > > -- > Thanks, > Anatoly ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [dpdk-stable] [dpdk-dev] [PATCH v2 2/3] vfio: fix DMA mapping granularity for type1 iova as va 2020-11-11 5:08 ` [dpdk-stable] [dpdk-dev] " Nithin Dabilpuram @ 2020-11-11 10:00 ` Burakov, Anatoly 0 siblings, 0 replies; 43+ messages in thread From: Burakov, Anatoly @ 2020-11-11 10:00 UTC (permalink / raw) To: Nithin Dabilpuram; +Cc: jerinj, dev, stable On 11-Nov-20 5:08 AM, Nithin Dabilpuram wrote: > On Tue, Nov 10, 2020 at 02:17:39PM +0000, Burakov, Anatoly wrote: >> On 05-Nov-20 9:04 AM, Nithin Dabilpuram wrote: >>> Partial unmapping is not supported for VFIO IOMMU type1 >>> by kernel. Though kernel gives return as zero, the unmapped size >>> returned will not be same as expected. So check for >>> returned unmap size and return error. >>> >>> For IOVA as PA, DMA mapping is already at memseg size >>> granularity. Do the same even for IOVA as VA mode as >>> DMA map/unmap triggered by heap allocations, >>> maintain granularity of memseg page size so that heap >>> expansion and contraction does not have this issue. >>> >>> For user requested DMA map/unmap disallow partial unmapping >>> for VFIO type1. 
>>> >>> Fixes: 73a639085938 ("vfio: allow to map other memory regions") >>> Cc: anatoly.burakov@intel.com >>> Cc: stable@dpdk.org >>> >>> Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com> >>> --- >> >> <snip> >> >>> @@ -525,12 +528,19 @@ vfio_mem_event_callback(enum rte_mem_event type, const void *addr, size_t len, >>> /* for IOVA as VA mode, no need to care for IOVA addresses */ >>> if (rte_eal_iova_mode() == RTE_IOVA_VA && msl->external == 0) { >>> uint64_t vfio_va = (uint64_t)(uintptr_t)addr; >>> - if (type == RTE_MEM_EVENT_ALLOC) >>> - vfio_dma_mem_map(default_vfio_cfg, vfio_va, vfio_va, >>> - len, 1); >>> - else >>> - vfio_dma_mem_map(default_vfio_cfg, vfio_va, vfio_va, >>> - len, 0); >>> + uint64_t page_sz = msl->page_sz; >>> + >>> + /* Maintain granularity of DMA map/unmap to memseg size */ >>> + for (; cur_len < len; cur_len += page_sz) { >>> + if (type == RTE_MEM_EVENT_ALLOC) >>> + vfio_dma_mem_map(default_vfio_cfg, vfio_va, >>> + vfio_va, page_sz, 1); >>> + else >>> + vfio_dma_mem_map(default_vfio_cfg, vfio_va, >>> + vfio_va, page_sz, 0); >> >> I think you're mapping the same address here, over and over. Perhaps you >> meant `vfio_va + cur_len` for the mapping addresses? > > There is a 'vfio_va += page_sz;' in next line right ? >> >> -- >> Thanks, >> Anatoly Oh, right, my apologies. I did need more coffee :D Acked-by: Anatoly Burakov <anatoly.burakov@intel.com> -- Thanks, Anatoly ^ permalink raw reply [flat|nested] 43+ messages in thread
[parent not found: <20201201193302.28131-1-ndabilpuram@marvell.com>]
* [dpdk-stable] [PATCH v3 1/4] vfio: revert changes for map contiguous areas in one go [not found] ` <20201201193302.28131-1-ndabilpuram@marvell.com> @ 2020-12-01 19:32 ` Nithin Dabilpuram 2020-12-01 19:33 ` [dpdk-stable] [PATCH v3 2/4] vfio: fix DMA mapping granularity for type1 IOVA as VA Nithin Dabilpuram 1 sibling, 0 replies; 43+ messages in thread From: Nithin Dabilpuram @ 2020-12-01 19:32 UTC (permalink / raw) To: anatoly.burakov, David Christensen, david.marchand Cc: jerinj, dev, Nithin Dabilpuram, stable In order to save DMA entries limited by kernel both for externel memory and hugepage memory, an attempt was made to map physically contiguous memory in one go. This cannot be done as VFIO IOMMU type1 does not support partially unmapping a previously mapped memory region while Heap can request for multi page mapping and partial unmapping. Hence for going back to old method of mapping/unmapping at memseg granularity, this commit reverts commit d1c7c0cdf7ba ("vfio: map contiguous areas in one go") Also add documentation on what module parameter needs to be used to increase the per-container dma map limit for VFIO. Fixes: d1c7c0cdf7ba ("vfio: map contiguous areas in one go") Cc: anatoly.burakov@intel.com Cc: stable@dpdk.org Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com> --- doc/guides/linux_gsg/linux_drivers.rst | 10 ++++++ lib/librte_eal/linux/eal_vfio.c | 59 +++++----------------------------- 2 files changed, 18 insertions(+), 51 deletions(-) diff --git a/doc/guides/linux_gsg/linux_drivers.rst b/doc/guides/linux_gsg/linux_drivers.rst index 90635a4..9a662a7 100644 --- a/doc/guides/linux_gsg/linux_drivers.rst +++ b/doc/guides/linux_gsg/linux_drivers.rst @@ -25,6 +25,16 @@ To make use of VFIO, the ``vfio-pci`` module must be loaded: VFIO kernel is usually present by default in all distributions, however please consult your distributions documentation to make sure that is the case. 
+For DMA mapping of either external memory or hugepages, VFIO interface is used. +VFIO does not support partial unmap of once mapped memory. Hence DPDK's memory is +mapped in hugepage granularity or system page granularity. Number of DMA +mappings is limited by kernel with user locked memory limit of a process(rlimit) +for system/hugepage memory. Another per-container overall limit applicable both +for external memory and system memory was added in kernel 5.1 defined by +VFIO module parameter ``dma_entry_limit`` with a default value of 64K. +When application is out of DMA entries, these limits need to be adjusted to +increase the allowed limit. + Since Linux version 5.7, the ``vfio-pci`` module supports the creation of virtual functions. After the PF is bound to ``vfio-pci`` module, diff --git a/lib/librte_eal/linux/eal_vfio.c b/lib/librte_eal/linux/eal_vfio.c index 0500824..64b134d 100644 --- a/lib/librte_eal/linux/eal_vfio.c +++ b/lib/librte_eal/linux/eal_vfio.c @@ -517,11 +517,9 @@ static void vfio_mem_event_callback(enum rte_mem_event type, const void *addr, size_t len, void *arg __rte_unused) { - rte_iova_t iova_start, iova_expected; struct rte_memseg_list *msl; struct rte_memseg *ms; size_t cur_len = 0; - uint64_t va_start; msl = rte_mem_virt2memseg_list(addr); @@ -539,63 +537,22 @@ vfio_mem_event_callback(enum rte_mem_event type, const void *addr, size_t len, /* memsegs are contiguous in memory */ ms = rte_mem_virt2memseg(addr, msl); - - /* - * This memory is not guaranteed to be contiguous, but it still could - * be, or it could have some small contiguous chunks. Since the number - * of VFIO mappings is limited, and VFIO appears to not concatenate - * adjacent mappings, we have to do this ourselves. - * - * So, find contiguous chunks, then map them. 
- */ - va_start = ms->addr_64; - iova_start = iova_expected = ms->iova; while (cur_len < len) { - bool new_contig_area = ms->iova != iova_expected; - bool last_seg = (len - cur_len) == ms->len; - bool skip_last = false; - - /* only do mappings when current contiguous area ends */ - if (new_contig_area) { - if (type == RTE_MEM_EVENT_ALLOC) - vfio_dma_mem_map(default_vfio_cfg, va_start, - iova_start, - iova_expected - iova_start, 1); - else - vfio_dma_mem_map(default_vfio_cfg, va_start, - iova_start, - iova_expected - iova_start, 0); - va_start = ms->addr_64; - iova_start = ms->iova; - } /* some memory segments may have invalid IOVA */ if (ms->iova == RTE_BAD_IOVA) { RTE_LOG(DEBUG, EAL, "Memory segment at %p has bad IOVA, skipping\n", ms->addr); - skip_last = true; + goto next; } - iova_expected = ms->iova + ms->len; + if (type == RTE_MEM_EVENT_ALLOC) + vfio_dma_mem_map(default_vfio_cfg, ms->addr_64, + ms->iova, ms->len, 1); + else + vfio_dma_mem_map(default_vfio_cfg, ms->addr_64, + ms->iova, ms->len, 0); +next: cur_len += ms->len; ++ms; - - /* - * don't count previous segment, and don't attempt to - * dereference a potentially invalid pointer. - */ - if (skip_last && !last_seg) { - iova_expected = iova_start = ms->iova; - va_start = ms->addr_64; - } else if (!skip_last && last_seg) { - /* this is the last segment and we're not skipping */ - if (type == RTE_MEM_EVENT_ALLOC) - vfio_dma_mem_map(default_vfio_cfg, va_start, - iova_start, - iova_expected - iova_start, 1); - else - vfio_dma_mem_map(default_vfio_cfg, va_start, - iova_start, - iova_expected - iova_start, 0); - } } } -- 2.8.4 ^ permalink raw reply [flat|nested] 43+ messages in thread
* [dpdk-stable] [PATCH v3 2/4] vfio: fix DMA mapping granularity for type1 IOVA as VA [not found] ` <20201201193302.28131-1-ndabilpuram@marvell.com> 2020-12-01 19:32 ` [dpdk-stable] [PATCH v3 1/4] vfio: revert changes for map contiguous areas in one go Nithin Dabilpuram @ 2020-12-01 19:33 ` Nithin Dabilpuram 1 sibling, 0 replies; 43+ messages in thread From: Nithin Dabilpuram @ 2020-12-01 19:33 UTC (permalink / raw) To: anatoly.burakov, David Christensen, david.marchand Cc: jerinj, dev, Nithin Dabilpuram, stable Partial unmapping is not supported for VFIO IOMMU type1 by kernel. Though kernel gives return as zero, the unmapped size returned will not be same as expected. So check for returned unmap size and return error. For IOVA as PA, DMA mapping is already at memseg size granularity. Do the same even for IOVA as VA mode as DMA map/unmap triggered by heap allocations, maintain granularity of memseg page size so that heap expansion and contraction does not have this issue. For user requested DMA map/unmap disallow partial unmapping for VFIO type1. 
Fixes: 73a639085938 ("vfio: allow to map other memory regions") Cc: anatoly.burakov@intel.com Cc: stable@dpdk.org Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com> --- lib/librte_eal/linux/eal_vfio.c | 34 ++++++++++++++++++++++++++++------ lib/librte_eal/linux/eal_vfio.h | 1 + 2 files changed, 29 insertions(+), 6 deletions(-) diff --git a/lib/librte_eal/linux/eal_vfio.c b/lib/librte_eal/linux/eal_vfio.c index 64b134d..b15b758 100644 --- a/lib/librte_eal/linux/eal_vfio.c +++ b/lib/librte_eal/linux/eal_vfio.c @@ -70,6 +70,7 @@ static const struct vfio_iommu_type iommu_types[] = { { .type_id = RTE_VFIO_TYPE1, .name = "Type 1", + .partial_unmap = false, .dma_map_func = &vfio_type1_dma_map, .dma_user_map_func = &vfio_type1_dma_mem_map }, @@ -77,6 +78,7 @@ static const struct vfio_iommu_type iommu_types[] = { { .type_id = RTE_VFIO_SPAPR, .name = "sPAPR", + .partial_unmap = true, .dma_map_func = &vfio_spapr_dma_map, .dma_user_map_func = &vfio_spapr_dma_mem_map }, @@ -84,6 +86,7 @@ static const struct vfio_iommu_type iommu_types[] = { { .type_id = RTE_VFIO_NOIOMMU, .name = "No-IOMMU", + .partial_unmap = true, .dma_map_func = &vfio_noiommu_dma_map, .dma_user_map_func = &vfio_noiommu_dma_mem_map }, @@ -526,12 +529,19 @@ vfio_mem_event_callback(enum rte_mem_event type, const void *addr, size_t len, /* for IOVA as VA mode, no need to care for IOVA addresses */ if (rte_eal_iova_mode() == RTE_IOVA_VA && msl->external == 0) { uint64_t vfio_va = (uint64_t)(uintptr_t)addr; - if (type == RTE_MEM_EVENT_ALLOC) - vfio_dma_mem_map(default_vfio_cfg, vfio_va, vfio_va, - len, 1); - else - vfio_dma_mem_map(default_vfio_cfg, vfio_va, vfio_va, - len, 0); + uint64_t page_sz = msl->page_sz; + + /* Maintain granularity of DMA map/unmap to memseg size */ + for (; cur_len < len; cur_len += page_sz) { + if (type == RTE_MEM_EVENT_ALLOC) + vfio_dma_mem_map(default_vfio_cfg, vfio_va, + vfio_va, page_sz, 1); + else + 
vfio_dma_mem_map(default_vfio_cfg, vfio_va, + vfio_va, page_sz, 0); + vfio_va += page_sz; + } + return; } @@ -1348,6 +1358,12 @@ vfio_type1_dma_mem_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova, RTE_LOG(ERR, EAL, " cannot clear DMA remapping, error %i (%s)\n", errno, strerror(errno)); return -1; + } else if (dma_unmap.size != len) { + RTE_LOG(ERR, EAL, " unexpected size %"PRIu64" of DMA " + "remapping cleared instead of %"PRIu64"\n", + (uint64_t)dma_unmap.size, len); + rte_errno = EIO; + return -1; } } @@ -1823,6 +1839,12 @@ container_dma_unmap(struct vfio_config *vfio_cfg, uint64_t vaddr, uint64_t iova, /* we're partially unmapping a previously mapped region, so we * need to split entry into two. */ + if (!vfio_cfg->vfio_iommu_type->partial_unmap) { + RTE_LOG(DEBUG, EAL, "DMA partial unmap unsupported\n"); + rte_errno = ENOTSUP; + ret = -1; + goto out; + } if (user_mem_maps->n_maps == VFIO_MAX_USER_MEM_MAPS) { RTE_LOG(ERR, EAL, "Not enough space to store partial mapping\n"); rte_errno = ENOMEM; diff --git a/lib/librte_eal/linux/eal_vfio.h b/lib/librte_eal/linux/eal_vfio.h index cb2d35f..6ebaca6 100644 --- a/lib/librte_eal/linux/eal_vfio.h +++ b/lib/librte_eal/linux/eal_vfio.h @@ -113,6 +113,7 @@ typedef int (*vfio_dma_user_func_t)(int fd, uint64_t vaddr, uint64_t iova, struct vfio_iommu_type { int type_id; const char *name; + bool partial_unmap; vfio_dma_user_func_t dma_user_map_func; vfio_dma_func_t dma_map_func; }; -- 2.8.4 ^ permalink raw reply [flat|nested] 43+ messages in thread
[parent not found: <20201202054647.3449-1-ndabilpuram@marvell.com>]
* [dpdk-stable] [PATCH v4 1/4] vfio: revert changes for map contiguous areas in one go [not found] ` <20201202054647.3449-1-ndabilpuram@marvell.com> @ 2020-12-02 5:46 ` Nithin Dabilpuram 2020-12-02 18:36 ` David Christensen 2020-12-02 5:46 ` [dpdk-stable] [PATCH v4 2/4] vfio: fix DMA mapping granularity for type1 IOVA as VA Nithin Dabilpuram 1 sibling, 1 reply; 43+ messages in thread From: Nithin Dabilpuram @ 2020-12-02 5:46 UTC (permalink / raw) To: anatoly.burakov, David Christensen, david.marchand Cc: jerinj, dev, Nithin Dabilpuram, stable In order to save DMA entries limited by kernel both for externel memory and hugepage memory, an attempt was made to map physically contiguous memory in one go. This cannot be done as VFIO IOMMU type1 does not support partially unmapping a previously mapped memory region while Heap can request for multi page mapping and partial unmapping. Hence for going back to old method of mapping/unmapping at memseg granularity, this commit reverts commit d1c7c0cdf7ba ("vfio: map contiguous areas in one go") Also add documentation on what module parameter needs to be used to increase the per-container dma map limit for VFIO. 
Fixes: d1c7c0cdf7ba ("vfio: map contiguous areas in one go") Cc: anatoly.burakov@intel.com Cc: stable@dpdk.org Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com> --- doc/guides/linux_gsg/linux_drivers.rst | 10 ++++++ lib/librte_eal/linux/eal_vfio.c | 59 +++++----------------------------- 2 files changed, 18 insertions(+), 51 deletions(-) diff --git a/doc/guides/linux_gsg/linux_drivers.rst b/doc/guides/linux_gsg/linux_drivers.rst index 90635a4..9a662a7 100644 --- a/doc/guides/linux_gsg/linux_drivers.rst +++ b/doc/guides/linux_gsg/linux_drivers.rst @@ -25,6 +25,16 @@ To make use of VFIO, the ``vfio-pci`` module must be loaded: VFIO kernel is usually present by default in all distributions, however please consult your distributions documentation to make sure that is the case. +For DMA mapping of either external memory or hugepages, VFIO interface is used. +VFIO does not support partial unmap of once mapped memory. Hence DPDK's memory is +mapped in hugepage granularity or system page granularity. Number of DMA +mappings is limited by kernel with user locked memory limit of a process(rlimit) +for system/hugepage memory. Another per-container overall limit applicable both +for external memory and system memory was added in kernel 5.1 defined by +VFIO module parameter ``dma_entry_limit`` with a default value of 64K. +When application is out of DMA entries, these limits need to be adjusted to +increase the allowed limit. + Since Linux version 5.7, the ``vfio-pci`` module supports the creation of virtual functions. 
After the PF is bound to ``vfio-pci`` module, diff --git a/lib/librte_eal/linux/eal_vfio.c b/lib/librte_eal/linux/eal_vfio.c index 0500824..64b134d 100644 --- a/lib/librte_eal/linux/eal_vfio.c +++ b/lib/librte_eal/linux/eal_vfio.c @@ -517,11 +517,9 @@ static void vfio_mem_event_callback(enum rte_mem_event type, const void *addr, size_t len, void *arg __rte_unused) { - rte_iova_t iova_start, iova_expected; struct rte_memseg_list *msl; struct rte_memseg *ms; size_t cur_len = 0; - uint64_t va_start; msl = rte_mem_virt2memseg_list(addr); @@ -539,63 +537,22 @@ vfio_mem_event_callback(enum rte_mem_event type, const void *addr, size_t len, /* memsegs are contiguous in memory */ ms = rte_mem_virt2memseg(addr, msl); - - /* - * This memory is not guaranteed to be contiguous, but it still could - * be, or it could have some small contiguous chunks. Since the number - * of VFIO mappings is limited, and VFIO appears to not concatenate - * adjacent mappings, we have to do this ourselves. - * - * So, find contiguous chunks, then map them. 
- */ - va_start = ms->addr_64; - iova_start = iova_expected = ms->iova; while (cur_len < len) { - bool new_contig_area = ms->iova != iova_expected; - bool last_seg = (len - cur_len) == ms->len; - bool skip_last = false; - - /* only do mappings when current contiguous area ends */ - if (new_contig_area) { - if (type == RTE_MEM_EVENT_ALLOC) - vfio_dma_mem_map(default_vfio_cfg, va_start, - iova_start, - iova_expected - iova_start, 1); - else - vfio_dma_mem_map(default_vfio_cfg, va_start, - iova_start, - iova_expected - iova_start, 0); - va_start = ms->addr_64; - iova_start = ms->iova; - } /* some memory segments may have invalid IOVA */ if (ms->iova == RTE_BAD_IOVA) { RTE_LOG(DEBUG, EAL, "Memory segment at %p has bad IOVA, skipping\n", ms->addr); - skip_last = true; + goto next; } - iova_expected = ms->iova + ms->len; + if (type == RTE_MEM_EVENT_ALLOC) + vfio_dma_mem_map(default_vfio_cfg, ms->addr_64, + ms->iova, ms->len, 1); + else + vfio_dma_mem_map(default_vfio_cfg, ms->addr_64, + ms->iova, ms->len, 0); +next: cur_len += ms->len; ++ms; - - /* - * don't count previous segment, and don't attempt to - * dereference a potentially invalid pointer. - */ - if (skip_last && !last_seg) { - iova_expected = iova_start = ms->iova; - va_start = ms->addr_64; - } else if (!skip_last && last_seg) { - /* this is the last segment and we're not skipping */ - if (type == RTE_MEM_EVENT_ALLOC) - vfio_dma_mem_map(default_vfio_cfg, va_start, - iova_start, - iova_expected - iova_start, 1); - else - vfio_dma_mem_map(default_vfio_cfg, va_start, - iova_start, - iova_expected - iova_start, 0); - } } } -- 2.8.4 ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [dpdk-stable] [PATCH v4 1/4] vfio: revert changes for map contiguous areas in one go 2020-12-02 5:46 ` [dpdk-stable] [PATCH v4 1/4] vfio: revert changes for map contiguous areas in one go Nithin Dabilpuram @ 2020-12-02 18:36 ` David Christensen 0 siblings, 0 replies; 43+ messages in thread From: David Christensen @ 2020-12-02 18:36 UTC (permalink / raw) To: Nithin Dabilpuram, anatoly.burakov, david.marchand; +Cc: jerinj, dev, stable On 12/1/20 9:46 PM, Nithin Dabilpuram wrote: > In order to save DMA entries limited by kernel both for externel > memory and hugepage memory, an attempt was made to map physically > contiguous memory in one go. This cannot be done as VFIO IOMMU type1 > does not support partially unmapping a previously mapped memory > region while Heap can request for multi page mapping and > partial unmapping. > Hence for going back to old method of mapping/unmapping at > memseg granularity, this commit reverts > commit d1c7c0cdf7ba ("vfio: map contiguous areas in one go") > > Also add documentation on what module parameter needs to be used > to increase the per-container dma map limit for VFIO. 
> > Fixes: d1c7c0cdf7ba ("vfio: map contiguous areas in one go") > Cc: anatoly.burakov@intel.com > Cc: stable@dpdk.org > > Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com> > Acked-by: Anatoly Burakov <anatoly.burakov@intel.com> > --- > doc/guides/linux_gsg/linux_drivers.rst | 10 ++++++ > lib/librte_eal/linux/eal_vfio.c | 59 +++++----------------------------- > 2 files changed, 18 insertions(+), 51 deletions(-) > > diff --git a/doc/guides/linux_gsg/linux_drivers.rst b/doc/guides/linux_gsg/linux_drivers.rst > index 90635a4..9a662a7 100644 > --- a/doc/guides/linux_gsg/linux_drivers.rst > +++ b/doc/guides/linux_gsg/linux_drivers.rst > @@ -25,6 +25,16 @@ To make use of VFIO, the ``vfio-pci`` module must be loaded: > VFIO kernel is usually present by default in all distributions, > however please consult your distributions documentation to make sure that is the case. > > +For DMA mapping of either external memory or hugepages, VFIO interface is used. > +VFIO does not support partial unmap of once mapped memory. Hence DPDK's memory is > +mapped in hugepage granularity or system page granularity. Number of DMA > +mappings is limited by kernel with user locked memory limit of a process(rlimit) > +for system/hugepage memory. Another per-container overall limit applicable both > +for external memory and system memory was added in kernel 5.1 defined by > +VFIO module parameter ``dma_entry_limit`` with a default value of 64K. > +When application is out of DMA entries, these limits need to be adjusted to > +increase the allowed limit. > + > Since Linux version 5.7, > the ``vfio-pci`` module supports the creation of virtual functions. 
> After the PF is bound to ``vfio-pci`` module, > diff --git a/lib/librte_eal/linux/eal_vfio.c b/lib/librte_eal/linux/eal_vfio.c > index 0500824..64b134d 100644 > --- a/lib/librte_eal/linux/eal_vfio.c > +++ b/lib/librte_eal/linux/eal_vfio.c > @@ -517,11 +517,9 @@ static void > vfio_mem_event_callback(enum rte_mem_event type, const void *addr, size_t len, > void *arg __rte_unused) > { > - rte_iova_t iova_start, iova_expected; > struct rte_memseg_list *msl; > struct rte_memseg *ms; > size_t cur_len = 0; > - uint64_t va_start; > > msl = rte_mem_virt2memseg_list(addr); > > @@ -539,63 +537,22 @@ vfio_mem_event_callback(enum rte_mem_event type, const void *addr, size_t len, > > /* memsegs are contiguous in memory */ > ms = rte_mem_virt2memseg(addr, msl); > - > - /* > - * This memory is not guaranteed to be contiguous, but it still could > - * be, or it could have some small contiguous chunks. Since the number > - * of VFIO mappings is limited, and VFIO appears to not concatenate > - * adjacent mappings, we have to do this ourselves. > - * > - * So, find contiguous chunks, then map them. 
> - */ > - va_start = ms->addr_64; > - iova_start = iova_expected = ms->iova; > while (cur_len < len) { > - bool new_contig_area = ms->iova != iova_expected; > - bool last_seg = (len - cur_len) == ms->len; > - bool skip_last = false; > - > - /* only do mappings when current contiguous area ends */ > - if (new_contig_area) { > - if (type == RTE_MEM_EVENT_ALLOC) > - vfio_dma_mem_map(default_vfio_cfg, va_start, > - iova_start, > - iova_expected - iova_start, 1); > - else > - vfio_dma_mem_map(default_vfio_cfg, va_start, > - iova_start, > - iova_expected - iova_start, 0); > - va_start = ms->addr_64; > - iova_start = ms->iova; > - } > /* some memory segments may have invalid IOVA */ > if (ms->iova == RTE_BAD_IOVA) { > RTE_LOG(DEBUG, EAL, "Memory segment at %p has bad IOVA, skipping\n", > ms->addr); > - skip_last = true; > + goto next; > } > - iova_expected = ms->iova + ms->len; > + if (type == RTE_MEM_EVENT_ALLOC) > + vfio_dma_mem_map(default_vfio_cfg, ms->addr_64, > + ms->iova, ms->len, 1); > + else > + vfio_dma_mem_map(default_vfio_cfg, ms->addr_64, > + ms->iova, ms->len, 0); > +next: > cur_len += ms->len; > ++ms; > - > - /* > - * don't count previous segment, and don't attempt to > - * dereference a potentially invalid pointer. > - */ > - if (skip_last && !last_seg) { > - iova_expected = iova_start = ms->iova; > - va_start = ms->addr_64; > - } else if (!skip_last && last_seg) { > - /* this is the last segment and we're not skipping */ > - if (type == RTE_MEM_EVENT_ALLOC) > - vfio_dma_mem_map(default_vfio_cfg, va_start, > - iova_start, > - iova_expected - iova_start, 1); > - else > - vfio_dma_mem_map(default_vfio_cfg, va_start, > - iova_start, > - iova_expected - iova_start, 0); > - } > } > } > Acked-by: David Christensen <drc@linux.vnet.ibm.com> ^ permalink raw reply [flat|nested] 43+ messages in thread
* [dpdk-stable] [PATCH v4 2/4] vfio: fix DMA mapping granularity for type1 IOVA as VA [not found] ` <20201202054647.3449-1-ndabilpuram@marvell.com> 2020-12-02 5:46 ` [dpdk-stable] [PATCH v4 1/4] vfio: revert changes for map contiguous areas in one go Nithin Dabilpuram @ 2020-12-02 5:46 ` Nithin Dabilpuram 2020-12-02 18:38 ` David Christensen 1 sibling, 1 reply; 43+ messages in thread From: Nithin Dabilpuram @ 2020-12-02 5:46 UTC (permalink / raw) To: anatoly.burakov, David Christensen, david.marchand Cc: jerinj, dev, Nithin Dabilpuram, stable Partial unmapping is not supported for VFIO IOMMU type1 by the kernel. Though the kernel returns zero, the unmapped size returned will not be the same as requested. So check the returned unmap size and return an error. For IOVA as PA, DMA mapping is already at memseg size granularity. Do the same even for IOVA as VA mode: since DMA map/unmap is triggered by heap allocations, maintain memseg page size granularity so that heap expansion and contraction do not hit this issue. For user-requested DMA map/unmap, disallow partial unmapping for VFIO type1.
Fixes: 73a639085938 ("vfio: allow to map other memory regions") Cc: anatoly.burakov@intel.com Cc: stable@dpdk.org Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com> --- lib/librte_eal/linux/eal_vfio.c | 34 ++++++++++++++++++++++++++++------ lib/librte_eal/linux/eal_vfio.h | 1 + 2 files changed, 29 insertions(+), 6 deletions(-) diff --git a/lib/librte_eal/linux/eal_vfio.c b/lib/librte_eal/linux/eal_vfio.c index 64b134d..b15b758 100644 --- a/lib/librte_eal/linux/eal_vfio.c +++ b/lib/librte_eal/linux/eal_vfio.c @@ -70,6 +70,7 @@ static const struct vfio_iommu_type iommu_types[] = { { .type_id = RTE_VFIO_TYPE1, .name = "Type 1", + .partial_unmap = false, .dma_map_func = &vfio_type1_dma_map, .dma_user_map_func = &vfio_type1_dma_mem_map }, @@ -77,6 +78,7 @@ static const struct vfio_iommu_type iommu_types[] = { { .type_id = RTE_VFIO_SPAPR, .name = "sPAPR", + .partial_unmap = true, .dma_map_func = &vfio_spapr_dma_map, .dma_user_map_func = &vfio_spapr_dma_mem_map }, @@ -84,6 +86,7 @@ static const struct vfio_iommu_type iommu_types[] = { { .type_id = RTE_VFIO_NOIOMMU, .name = "No-IOMMU", + .partial_unmap = true, .dma_map_func = &vfio_noiommu_dma_map, .dma_user_map_func = &vfio_noiommu_dma_mem_map }, @@ -526,12 +529,19 @@ vfio_mem_event_callback(enum rte_mem_event type, const void *addr, size_t len, /* for IOVA as VA mode, no need to care for IOVA addresses */ if (rte_eal_iova_mode() == RTE_IOVA_VA && msl->external == 0) { uint64_t vfio_va = (uint64_t)(uintptr_t)addr; - if (type == RTE_MEM_EVENT_ALLOC) - vfio_dma_mem_map(default_vfio_cfg, vfio_va, vfio_va, - len, 1); - else - vfio_dma_mem_map(default_vfio_cfg, vfio_va, vfio_va, - len, 0); + uint64_t page_sz = msl->page_sz; + + /* Maintain granularity of DMA map/unmap to memseg size */ + for (; cur_len < len; cur_len += page_sz) { + if (type == RTE_MEM_EVENT_ALLOC) + vfio_dma_mem_map(default_vfio_cfg, vfio_va, + vfio_va, page_sz, 1); + else + 
vfio_dma_mem_map(default_vfio_cfg, vfio_va, + vfio_va, page_sz, 0); + vfio_va += page_sz; + } + return; } @@ -1348,6 +1358,12 @@ vfio_type1_dma_mem_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova, RTE_LOG(ERR, EAL, " cannot clear DMA remapping, error %i (%s)\n", errno, strerror(errno)); return -1; + } else if (dma_unmap.size != len) { + RTE_LOG(ERR, EAL, " unexpected size %"PRIu64" of DMA " + "remapping cleared instead of %"PRIu64"\n", + (uint64_t)dma_unmap.size, len); + rte_errno = EIO; + return -1; } } @@ -1823,6 +1839,12 @@ container_dma_unmap(struct vfio_config *vfio_cfg, uint64_t vaddr, uint64_t iova, /* we're partially unmapping a previously mapped region, so we * need to split entry into two. */ + if (!vfio_cfg->vfio_iommu_type->partial_unmap) { + RTE_LOG(DEBUG, EAL, "DMA partial unmap unsupported\n"); + rte_errno = ENOTSUP; + ret = -1; + goto out; + } if (user_mem_maps->n_maps == VFIO_MAX_USER_MEM_MAPS) { RTE_LOG(ERR, EAL, "Not enough space to store partial mapping\n"); rte_errno = ENOMEM; diff --git a/lib/librte_eal/linux/eal_vfio.h b/lib/librte_eal/linux/eal_vfio.h index cb2d35f..6ebaca6 100644 --- a/lib/librte_eal/linux/eal_vfio.h +++ b/lib/librte_eal/linux/eal_vfio.h @@ -113,6 +113,7 @@ typedef int (*vfio_dma_user_func_t)(int fd, uint64_t vaddr, uint64_t iova, struct vfio_iommu_type { int type_id; const char *name; + bool partial_unmap; vfio_dma_user_func_t dma_user_map_func; vfio_dma_func_t dma_map_func; }; -- 2.8.4 ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [dpdk-stable] [PATCH v4 2/4] vfio: fix DMA mapping granularity for type1 IOVA as VA 2020-12-02 5:46 ` [dpdk-stable] [PATCH v4 2/4] vfio: fix DMA mapping granularity for type1 IOVA as VA Nithin Dabilpuram @ 2020-12-02 18:38 ` David Christensen 0 siblings, 0 replies; 43+ messages in thread From: David Christensen @ 2020-12-02 18:38 UTC (permalink / raw) To: Nithin Dabilpuram, anatoly.burakov, david.marchand; +Cc: jerinj, dev, stable On 12/1/20 9:46 PM, Nithin Dabilpuram wrote: > Partial unmapping is not supported for VFIO IOMMU type1 > by kernel. Though kernel gives return as zero, the unmapped size > returned will not be same as expected. So check for > returned unmap size and return error. > > For IOVA as PA, DMA mapping is already at memseg size > granularity. Do the same even for IOVA as VA mode as > DMA map/unmap triggered by heap allocations, > maintain granularity of memseg page size so that heap > expansion and contraction does not have this issue. > > For user requested DMA map/unmap disallow partial unmapping > for VFIO type1. 
> > Fixes: 73a639085938 ("vfio: allow to map other memory regions") > Cc: anatoly.burakov@intel.com > Cc: stable@dpdk.org > > Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com> > Acked-by: Anatoly Burakov <anatoly.burakov@intel.com> > --- > lib/librte_eal/linux/eal_vfio.c | 34 ++++++++++++++++++++++++++++------ > lib/librte_eal/linux/eal_vfio.h | 1 + > 2 files changed, 29 insertions(+), 6 deletions(-) > > diff --git a/lib/librte_eal/linux/eal_vfio.c b/lib/librte_eal/linux/eal_vfio.c > index 64b134d..b15b758 100644 > --- a/lib/librte_eal/linux/eal_vfio.c > +++ b/lib/librte_eal/linux/eal_vfio.c > @@ -70,6 +70,7 @@ static const struct vfio_iommu_type iommu_types[] = { > { > .type_id = RTE_VFIO_TYPE1, > .name = "Type 1", > + .partial_unmap = false, > .dma_map_func = &vfio_type1_dma_map, > .dma_user_map_func = &vfio_type1_dma_mem_map > }, > @@ -77,6 +78,7 @@ static const struct vfio_iommu_type iommu_types[] = { > { > .type_id = RTE_VFIO_SPAPR, > .name = "sPAPR", > + .partial_unmap = true, > .dma_map_func = &vfio_spapr_dma_map, > .dma_user_map_func = &vfio_spapr_dma_mem_map > }, > @@ -84,6 +86,7 @@ static const struct vfio_iommu_type iommu_types[] = { > { > .type_id = RTE_VFIO_NOIOMMU, > .name = "No-IOMMU", > + .partial_unmap = true, > .dma_map_func = &vfio_noiommu_dma_map, > .dma_user_map_func = &vfio_noiommu_dma_mem_map > }, > @@ -526,12 +529,19 @@ vfio_mem_event_callback(enum rte_mem_event type, const void *addr, size_t len, > /* for IOVA as VA mode, no need to care for IOVA addresses */ > if (rte_eal_iova_mode() == RTE_IOVA_VA && msl->external == 0) { > uint64_t vfio_va = (uint64_t)(uintptr_t)addr; > - if (type == RTE_MEM_EVENT_ALLOC) > - vfio_dma_mem_map(default_vfio_cfg, vfio_va, vfio_va, > - len, 1); > - else > - vfio_dma_mem_map(default_vfio_cfg, vfio_va, vfio_va, > - len, 0); > + uint64_t page_sz = msl->page_sz; > + > + /* Maintain granularity of DMA map/unmap to memseg size */ > + for (; cur_len < len; cur_len += page_sz) { > + if (type == 
RTE_MEM_EVENT_ALLOC) > + vfio_dma_mem_map(default_vfio_cfg, vfio_va, > + vfio_va, page_sz, 1); > + else > + vfio_dma_mem_map(default_vfio_cfg, vfio_va, > + vfio_va, page_sz, 0); > + vfio_va += page_sz; > + } > + > return; > } > > @@ -1348,6 +1358,12 @@ vfio_type1_dma_mem_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova, > RTE_LOG(ERR, EAL, " cannot clear DMA remapping, error %i (%s)\n", > errno, strerror(errno)); > return -1; > + } else if (dma_unmap.size != len) { > + RTE_LOG(ERR, EAL, " unexpected size %"PRIu64" of DMA " > + "remapping cleared instead of %"PRIu64"\n", > + (uint64_t)dma_unmap.size, len); > + rte_errno = EIO; > + return -1; > } > } > > @@ -1823,6 +1839,12 @@ container_dma_unmap(struct vfio_config *vfio_cfg, uint64_t vaddr, uint64_t iova, > /* we're partially unmapping a previously mapped region, so we > * need to split entry into two. > */ > + if (!vfio_cfg->vfio_iommu_type->partial_unmap) { > + RTE_LOG(DEBUG, EAL, "DMA partial unmap unsupported\n"); > + rte_errno = ENOTSUP; > + ret = -1; > + goto out; > + } > if (user_mem_maps->n_maps == VFIO_MAX_USER_MEM_MAPS) { > RTE_LOG(ERR, EAL, "Not enough space to store partial mapping\n"); > rte_errno = ENOMEM; > diff --git a/lib/librte_eal/linux/eal_vfio.h b/lib/librte_eal/linux/eal_vfio.h > index cb2d35f..6ebaca6 100644 > --- a/lib/librte_eal/linux/eal_vfio.h > +++ b/lib/librte_eal/linux/eal_vfio.h > @@ -113,6 +113,7 @@ typedef int (*vfio_dma_user_func_t)(int fd, uint64_t vaddr, uint64_t iova, > struct vfio_iommu_type { > int type_id; > const char *name; > + bool partial_unmap; > vfio_dma_user_func_t dma_user_map_func; > vfio_dma_func_t dma_map_func; > }; > Acked-by: David Christensen <drc@linux.vnet.ibm.com> ^ permalink raw reply [flat|nested] 43+ messages in thread
[parent not found: <20201214081935.23577-1-ndabilpuram@marvell.com>]
* [dpdk-stable] [PATCH v5 1/4] vfio: revert changes for map contiguous areas in one go [not found] ` <20201214081935.23577-1-ndabilpuram@marvell.com> @ 2020-12-14 8:19 ` Nithin Dabilpuram 2020-12-14 8:19 ` [dpdk-stable] [PATCH v5 2/4] vfio: fix DMA mapping granularity for type1 IOVA as VA Nithin Dabilpuram 1 sibling, 0 replies; 43+ messages in thread From: Nithin Dabilpuram @ 2020-12-14 8:19 UTC (permalink / raw) To: anatoly.burakov, David Christensen, david.marchand Cc: jerinj, dev, Nithin Dabilpuram, stable In order to save DMA entries, which are limited by the kernel both for external memory and hugepage memory, an attempt was made to map physically contiguous memory in one go. This cannot be done, as VFIO IOMMU type1 does not support partially unmapping a previously mapped memory region, while the heap can request multi-page mapping and partial unmapping. Hence, to go back to the old method of mapping/unmapping at memseg granularity, this commit reverts commit d1c7c0cdf7ba ("vfio: map contiguous areas in one go"). Also add documentation on which module parameter needs to be used to increase the per-container DMA map limit for VFIO.
Fixes: d1c7c0cdf7ba ("vfio: map contiguous areas in one go") Cc: anatoly.burakov@intel.com Cc: stable@dpdk.org Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: David Christensen <drc@linux.vnet.ibm.com> --- doc/guides/linux_gsg/linux_drivers.rst | 10 ++++++ lib/librte_eal/linux/eal_vfio.c | 59 +++++----------------------------- 2 files changed, 18 insertions(+), 51 deletions(-) diff --git a/doc/guides/linux_gsg/linux_drivers.rst b/doc/guides/linux_gsg/linux_drivers.rst index 90635a4..9a662a7 100644 --- a/doc/guides/linux_gsg/linux_drivers.rst +++ b/doc/guides/linux_gsg/linux_drivers.rst @@ -25,6 +25,16 @@ To make use of VFIO, the ``vfio-pci`` module must be loaded: VFIO kernel is usually present by default in all distributions, however please consult your distributions documentation to make sure that is the case. +For DMA mapping of either external memory or hugepages, VFIO interface is used. +VFIO does not support partial unmap of once mapped memory. Hence DPDK's memory is +mapped in hugepage granularity or system page granularity. Number of DMA +mappings is limited by kernel with user locked memory limit of a process(rlimit) +for system/hugepage memory. Another per-container overall limit applicable both +for external memory and system memory was added in kernel 5.1 defined by +VFIO module parameter ``dma_entry_limit`` with a default value of 64K. +When application is out of DMA entries, these limits need to be adjusted to +increase the allowed limit. + Since Linux version 5.7, the ``vfio-pci`` module supports the creation of virtual functions. 
After the PF is bound to ``vfio-pci`` module, diff --git a/lib/librte_eal/linux/eal_vfio.c b/lib/librte_eal/linux/eal_vfio.c index 0500824..64b134d 100644 --- a/lib/librte_eal/linux/eal_vfio.c +++ b/lib/librte_eal/linux/eal_vfio.c @@ -517,11 +517,9 @@ static void vfio_mem_event_callback(enum rte_mem_event type, const void *addr, size_t len, void *arg __rte_unused) { - rte_iova_t iova_start, iova_expected; struct rte_memseg_list *msl; struct rte_memseg *ms; size_t cur_len = 0; - uint64_t va_start; msl = rte_mem_virt2memseg_list(addr); @@ -539,63 +537,22 @@ vfio_mem_event_callback(enum rte_mem_event type, const void *addr, size_t len, /* memsegs are contiguous in memory */ ms = rte_mem_virt2memseg(addr, msl); - - /* - * This memory is not guaranteed to be contiguous, but it still could - * be, or it could have some small contiguous chunks. Since the number - * of VFIO mappings is limited, and VFIO appears to not concatenate - * adjacent mappings, we have to do this ourselves. - * - * So, find contiguous chunks, then map them. 
- */ - va_start = ms->addr_64; - iova_start = iova_expected = ms->iova; while (cur_len < len) { - bool new_contig_area = ms->iova != iova_expected; - bool last_seg = (len - cur_len) == ms->len; - bool skip_last = false; - - /* only do mappings when current contiguous area ends */ - if (new_contig_area) { - if (type == RTE_MEM_EVENT_ALLOC) - vfio_dma_mem_map(default_vfio_cfg, va_start, - iova_start, - iova_expected - iova_start, 1); - else - vfio_dma_mem_map(default_vfio_cfg, va_start, - iova_start, - iova_expected - iova_start, 0); - va_start = ms->addr_64; - iova_start = ms->iova; - } /* some memory segments may have invalid IOVA */ if (ms->iova == RTE_BAD_IOVA) { RTE_LOG(DEBUG, EAL, "Memory segment at %p has bad IOVA, skipping\n", ms->addr); - skip_last = true; + goto next; } - iova_expected = ms->iova + ms->len; + if (type == RTE_MEM_EVENT_ALLOC) + vfio_dma_mem_map(default_vfio_cfg, ms->addr_64, + ms->iova, ms->len, 1); + else + vfio_dma_mem_map(default_vfio_cfg, ms->addr_64, + ms->iova, ms->len, 0); +next: cur_len += ms->len; ++ms; - - /* - * don't count previous segment, and don't attempt to - * dereference a potentially invalid pointer. - */ - if (skip_last && !last_seg) { - iova_expected = iova_start = ms->iova; - va_start = ms->addr_64; - } else if (!skip_last && last_seg) { - /* this is the last segment and we're not skipping */ - if (type == RTE_MEM_EVENT_ALLOC) - vfio_dma_mem_map(default_vfio_cfg, va_start, - iova_start, - iova_expected - iova_start, 1); - else - vfio_dma_mem_map(default_vfio_cfg, va_start, - iova_start, - iova_expected - iova_start, 0); - } } } -- 2.8.4 ^ permalink raw reply [flat|nested] 43+ messages in thread
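The documentation added above points at the `dma_entry_limit` parameter of the `vfio_iommu_type1` kernel module (available since kernel 5.1, default 64K entries per container). A hedged example of raising it; the parameter name comes from the kernel, but the chosen value is arbitrary and the sysfs path assumes the parameter is writable on your kernel:

```shell
# Load (or re-load) the type1 IOMMU backend with a larger
# per-container DMA entry limit (default is 65535, kernel >= 5.1).
modprobe -r vfio_iommu_type1 2>/dev/null || true
modprobe vfio_iommu_type1 dma_entry_limit=131072

# If the module is already in use and the parameter is writable on
# your kernel, it can also be adjusted via sysfs:
echo 131072 > /sys/module/vfio_iommu_type1/parameters/dma_entry_limit
```

The per-process locked-memory rlimit mentioned in the same paragraph is adjusted separately (e.g. via `ulimit -l` or `/etc/security/limits.conf`).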
* [dpdk-stable] [PATCH v5 2/4] vfio: fix DMA mapping granularity for type1 IOVA as VA [not found] ` <20201214081935.23577-1-ndabilpuram@marvell.com> 2020-12-14 8:19 ` [dpdk-stable] [PATCH v5 1/4] vfio: revert changes for map contiguous areas in one go Nithin Dabilpuram @ 2020-12-14 8:19 ` Nithin Dabilpuram 1 sibling, 0 replies; 43+ messages in thread From: Nithin Dabilpuram @ 2020-12-14 8:19 UTC (permalink / raw) To: anatoly.burakov, David Christensen, david.marchand Cc: jerinj, dev, Nithin Dabilpuram, stable Partial unmapping is not supported for VFIO IOMMU type1 by the kernel. Though the kernel returns zero, the unmapped size returned will not be the same as requested. So check the returned unmap size and return an error. For IOVA as PA, DMA mapping is already at memseg size granularity. Do the same even for IOVA as VA mode: since DMA map/unmap is triggered by heap allocations, maintain memseg page size granularity so that heap expansion and contraction do not hit this issue. For user-requested DMA map/unmap, disallow partial unmapping for VFIO type1.
Fixes: 73a639085938 ("vfio: allow to map other memory regions") Cc: anatoly.burakov@intel.com Cc: stable@dpdk.org Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: David Christensen <drc@linux.vnet.ibm.com> --- lib/librte_eal/linux/eal_vfio.c | 34 ++++++++++++++++++++++++++++------ lib/librte_eal/linux/eal_vfio.h | 1 + 2 files changed, 29 insertions(+), 6 deletions(-) diff --git a/lib/librte_eal/linux/eal_vfio.c b/lib/librte_eal/linux/eal_vfio.c index 64b134d..b15b758 100644 --- a/lib/librte_eal/linux/eal_vfio.c +++ b/lib/librte_eal/linux/eal_vfio.c @@ -70,6 +70,7 @@ static const struct vfio_iommu_type iommu_types[] = { { .type_id = RTE_VFIO_TYPE1, .name = "Type 1", + .partial_unmap = false, .dma_map_func = &vfio_type1_dma_map, .dma_user_map_func = &vfio_type1_dma_mem_map }, @@ -77,6 +78,7 @@ static const struct vfio_iommu_type iommu_types[] = { { .type_id = RTE_VFIO_SPAPR, .name = "sPAPR", + .partial_unmap = true, .dma_map_func = &vfio_spapr_dma_map, .dma_user_map_func = &vfio_spapr_dma_mem_map }, @@ -84,6 +86,7 @@ static const struct vfio_iommu_type iommu_types[] = { { .type_id = RTE_VFIO_NOIOMMU, .name = "No-IOMMU", + .partial_unmap = true, .dma_map_func = &vfio_noiommu_dma_map, .dma_user_map_func = &vfio_noiommu_dma_mem_map }, @@ -526,12 +529,19 @@ vfio_mem_event_callback(enum rte_mem_event type, const void *addr, size_t len, /* for IOVA as VA mode, no need to care for IOVA addresses */ if (rte_eal_iova_mode() == RTE_IOVA_VA && msl->external == 0) { uint64_t vfio_va = (uint64_t)(uintptr_t)addr; - if (type == RTE_MEM_EVENT_ALLOC) - vfio_dma_mem_map(default_vfio_cfg, vfio_va, vfio_va, - len, 1); - else - vfio_dma_mem_map(default_vfio_cfg, vfio_va, vfio_va, - len, 0); + uint64_t page_sz = msl->page_sz; + + /* Maintain granularity of DMA map/unmap to memseg size */ + for (; cur_len < len; cur_len += page_sz) { + if (type == RTE_MEM_EVENT_ALLOC) + vfio_dma_mem_map(default_vfio_cfg, vfio_va, + 
vfio_va, page_sz, 1); + else + vfio_dma_mem_map(default_vfio_cfg, vfio_va, + vfio_va, page_sz, 0); + vfio_va += page_sz; + } + return; } @@ -1348,6 +1358,12 @@ vfio_type1_dma_mem_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova, RTE_LOG(ERR, EAL, " cannot clear DMA remapping, error %i (%s)\n", errno, strerror(errno)); return -1; + } else if (dma_unmap.size != len) { + RTE_LOG(ERR, EAL, " unexpected size %"PRIu64" of DMA " + "remapping cleared instead of %"PRIu64"\n", + (uint64_t)dma_unmap.size, len); + rte_errno = EIO; + return -1; } } @@ -1823,6 +1839,12 @@ container_dma_unmap(struct vfio_config *vfio_cfg, uint64_t vaddr, uint64_t iova, /* we're partially unmapping a previously mapped region, so we * need to split entry into two. */ + if (!vfio_cfg->vfio_iommu_type->partial_unmap) { + RTE_LOG(DEBUG, EAL, "DMA partial unmap unsupported\n"); + rte_errno = ENOTSUP; + ret = -1; + goto out; + } if (user_mem_maps->n_maps == VFIO_MAX_USER_MEM_MAPS) { RTE_LOG(ERR, EAL, "Not enough space to store partial mapping\n"); rte_errno = ENOMEM; diff --git a/lib/librte_eal/linux/eal_vfio.h b/lib/librte_eal/linux/eal_vfio.h index cb2d35f..6ebaca6 100644 --- a/lib/librte_eal/linux/eal_vfio.h +++ b/lib/librte_eal/linux/eal_vfio.h @@ -113,6 +113,7 @@ typedef int (*vfio_dma_user_func_t)(int fd, uint64_t vaddr, uint64_t iova, struct vfio_iommu_type { int type_id; const char *name; + bool partial_unmap; vfio_dma_user_func_t dma_user_map_func; vfio_dma_func_t dma_map_func; }; -- 2.8.4 ^ permalink raw reply [flat|nested] 43+ messages in thread
[parent not found: <20201217190604.29803-1-ndabilpuram@marvell.com>]
* [dpdk-stable] [PATCH v6 1/4] vfio: revert changes for map contiguous areas in one go [not found] ` <20201217190604.29803-1-ndabilpuram@marvell.com> @ 2020-12-17 19:06 ` Nithin Dabilpuram 2020-12-17 19:06 ` [dpdk-stable] [PATCH v6 2/4] vfio: fix DMA mapping granularity for type1 IOVA as VA Nithin Dabilpuram 1 sibling, 0 replies; 43+ messages in thread From: Nithin Dabilpuram @ 2020-12-17 19:06 UTC (permalink / raw) To: anatoly.burakov, David Christensen, david.marchand Cc: jerinj, dev, Nithin Dabilpuram, stable In order to save DMA entries, which are limited by the kernel both for external memory and hugepage memory, an attempt was made to map physically contiguous memory in one go. This cannot be done, as VFIO IOMMU type1 does not support partially unmapping a previously mapped memory region, while the heap can request multi-page mapping and partial unmapping. Hence, to go back to the old method of mapping/unmapping at memseg granularity, this commit reverts commit d1c7c0cdf7ba ("vfio: map contiguous areas in one go"). Also add documentation on which module parameter needs to be used to increase the per-container DMA map limit for VFIO.
Fixes: d1c7c0cdf7ba ("vfio: map contiguous areas in one go") Cc: anatoly.burakov@intel.com Cc: stable@dpdk.org Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: David Christensen <drc@linux.vnet.ibm.com> --- doc/guides/linux_gsg/linux_drivers.rst | 10 ++++++ lib/librte_eal/linux/eal_vfio.c | 59 +++++----------------------------- 2 files changed, 18 insertions(+), 51 deletions(-) diff --git a/doc/guides/linux_gsg/linux_drivers.rst b/doc/guides/linux_gsg/linux_drivers.rst index 90635a4..9a662a7 100644 --- a/doc/guides/linux_gsg/linux_drivers.rst +++ b/doc/guides/linux_gsg/linux_drivers.rst @@ -25,6 +25,16 @@ To make use of VFIO, the ``vfio-pci`` module must be loaded: VFIO kernel is usually present by default in all distributions, however please consult your distributions documentation to make sure that is the case. +For DMA mapping of either external memory or hugepages, VFIO interface is used. +VFIO does not support partial unmap of once mapped memory. Hence DPDK's memory is +mapped in hugepage granularity or system page granularity. Number of DMA +mappings is limited by kernel with user locked memory limit of a process(rlimit) +for system/hugepage memory. Another per-container overall limit applicable both +for external memory and system memory was added in kernel 5.1 defined by +VFIO module parameter ``dma_entry_limit`` with a default value of 64K. +When application is out of DMA entries, these limits need to be adjusted to +increase the allowed limit. + Since Linux version 5.7, the ``vfio-pci`` module supports the creation of virtual functions. 
After the PF is bound to ``vfio-pci`` module, diff --git a/lib/librte_eal/linux/eal_vfio.c b/lib/librte_eal/linux/eal_vfio.c index 0500824..64b134d 100644 --- a/lib/librte_eal/linux/eal_vfio.c +++ b/lib/librte_eal/linux/eal_vfio.c @@ -517,11 +517,9 @@ static void vfio_mem_event_callback(enum rte_mem_event type, const void *addr, size_t len, void *arg __rte_unused) { - rte_iova_t iova_start, iova_expected; struct rte_memseg_list *msl; struct rte_memseg *ms; size_t cur_len = 0; - uint64_t va_start; msl = rte_mem_virt2memseg_list(addr); @@ -539,63 +537,22 @@ vfio_mem_event_callback(enum rte_mem_event type, const void *addr, size_t len, /* memsegs are contiguous in memory */ ms = rte_mem_virt2memseg(addr, msl); - - /* - * This memory is not guaranteed to be contiguous, but it still could - * be, or it could have some small contiguous chunks. Since the number - * of VFIO mappings is limited, and VFIO appears to not concatenate - * adjacent mappings, we have to do this ourselves. - * - * So, find contiguous chunks, then map them. 
- */ - va_start = ms->addr_64; - iova_start = iova_expected = ms->iova; while (cur_len < len) { - bool new_contig_area = ms->iova != iova_expected; - bool last_seg = (len - cur_len) == ms->len; - bool skip_last = false; - - /* only do mappings when current contiguous area ends */ - if (new_contig_area) { - if (type == RTE_MEM_EVENT_ALLOC) - vfio_dma_mem_map(default_vfio_cfg, va_start, - iova_start, - iova_expected - iova_start, 1); - else - vfio_dma_mem_map(default_vfio_cfg, va_start, - iova_start, - iova_expected - iova_start, 0); - va_start = ms->addr_64; - iova_start = ms->iova; - } /* some memory segments may have invalid IOVA */ if (ms->iova == RTE_BAD_IOVA) { RTE_LOG(DEBUG, EAL, "Memory segment at %p has bad IOVA, skipping\n", ms->addr); - skip_last = true; + goto next; } - iova_expected = ms->iova + ms->len; + if (type == RTE_MEM_EVENT_ALLOC) + vfio_dma_mem_map(default_vfio_cfg, ms->addr_64, + ms->iova, ms->len, 1); + else + vfio_dma_mem_map(default_vfio_cfg, ms->addr_64, + ms->iova, ms->len, 0); +next: cur_len += ms->len; ++ms; - - /* - * don't count previous segment, and don't attempt to - * dereference a potentially invalid pointer. - */ - if (skip_last && !last_seg) { - iova_expected = iova_start = ms->iova; - va_start = ms->addr_64; - } else if (!skip_last && last_seg) { - /* this is the last segment and we're not skipping */ - if (type == RTE_MEM_EVENT_ALLOC) - vfio_dma_mem_map(default_vfio_cfg, va_start, - iova_start, - iova_expected - iova_start, 1); - else - vfio_dma_mem_map(default_vfio_cfg, va_start, - iova_start, - iova_expected - iova_start, 0); - } } } -- 2.8.4 ^ permalink raw reply [flat|nested] 43+ messages in thread
* [dpdk-stable] [PATCH v6 2/4] vfio: fix DMA mapping granularity for type1 IOVA as VA [not found] ` <20201217190604.29803-1-ndabilpuram@marvell.com> 2020-12-17 19:06 ` [dpdk-stable] [PATCH v6 1/4] vfio: revert changes for map contiguous areas in one go Nithin Dabilpuram @ 2020-12-17 19:06 ` Nithin Dabilpuram 1 sibling, 0 replies; 43+ messages in thread From: Nithin Dabilpuram @ 2020-12-17 19:06 UTC (permalink / raw) To: anatoly.burakov, David Christensen, david.marchand Cc: jerinj, dev, Nithin Dabilpuram, stable Partial unmapping is not supported for VFIO IOMMU type1 by the kernel. Though the kernel returns zero, the unmapped size returned will not be the same as requested. So check the returned unmap size and return an error. For IOVA as PA, DMA mapping is already at memseg size granularity. Do the same even for IOVA as VA mode: since DMA map/unmap is triggered by heap allocations, maintain memseg page size granularity so that heap expansion and contraction do not hit this issue. For user-requested DMA map/unmap, disallow partial unmapping for VFIO type1.
Fixes: 73a639085938 ("vfio: allow to map other memory regions") Cc: anatoly.burakov@intel.com Cc: stable@dpdk.org Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: David Christensen <drc@linux.vnet.ibm.com> --- lib/librte_eal/linux/eal_vfio.c | 34 ++++++++++++++++++++++++++++------ lib/librte_eal/linux/eal_vfio.h | 1 + 2 files changed, 29 insertions(+), 6 deletions(-) diff --git a/lib/librte_eal/linux/eal_vfio.c b/lib/librte_eal/linux/eal_vfio.c index 64b134d..b15b758 100644 --- a/lib/librte_eal/linux/eal_vfio.c +++ b/lib/librte_eal/linux/eal_vfio.c @@ -70,6 +70,7 @@ static const struct vfio_iommu_type iommu_types[] = { { .type_id = RTE_VFIO_TYPE1, .name = "Type 1", + .partial_unmap = false, .dma_map_func = &vfio_type1_dma_map, .dma_user_map_func = &vfio_type1_dma_mem_map }, @@ -77,6 +78,7 @@ static const struct vfio_iommu_type iommu_types[] = { { .type_id = RTE_VFIO_SPAPR, .name = "sPAPR", + .partial_unmap = true, .dma_map_func = &vfio_spapr_dma_map, .dma_user_map_func = &vfio_spapr_dma_mem_map }, @@ -84,6 +86,7 @@ static const struct vfio_iommu_type iommu_types[] = { { .type_id = RTE_VFIO_NOIOMMU, .name = "No-IOMMU", + .partial_unmap = true, .dma_map_func = &vfio_noiommu_dma_map, .dma_user_map_func = &vfio_noiommu_dma_mem_map }, @@ -526,12 +529,19 @@ vfio_mem_event_callback(enum rte_mem_event type, const void *addr, size_t len, /* for IOVA as VA mode, no need to care for IOVA addresses */ if (rte_eal_iova_mode() == RTE_IOVA_VA && msl->external == 0) { uint64_t vfio_va = (uint64_t)(uintptr_t)addr; - if (type == RTE_MEM_EVENT_ALLOC) - vfio_dma_mem_map(default_vfio_cfg, vfio_va, vfio_va, - len, 1); - else - vfio_dma_mem_map(default_vfio_cfg, vfio_va, vfio_va, - len, 0); + uint64_t page_sz = msl->page_sz; + + /* Maintain granularity of DMA map/unmap to memseg size */ + for (; cur_len < len; cur_len += page_sz) { + if (type == RTE_MEM_EVENT_ALLOC) + vfio_dma_mem_map(default_vfio_cfg, vfio_va, + 
vfio_va, page_sz, 1); + else + vfio_dma_mem_map(default_vfio_cfg, vfio_va, + vfio_va, page_sz, 0); + vfio_va += page_sz; + } + return; } @@ -1348,6 +1358,12 @@ vfio_type1_dma_mem_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova, RTE_LOG(ERR, EAL, " cannot clear DMA remapping, error %i (%s)\n", errno, strerror(errno)); return -1; + } else if (dma_unmap.size != len) { + RTE_LOG(ERR, EAL, " unexpected size %"PRIu64" of DMA " + "remapping cleared instead of %"PRIu64"\n", + (uint64_t)dma_unmap.size, len); + rte_errno = EIO; + return -1; } } @@ -1823,6 +1839,12 @@ container_dma_unmap(struct vfio_config *vfio_cfg, uint64_t vaddr, uint64_t iova, /* we're partially unmapping a previously mapped region, so we * need to split entry into two. */ + if (!vfio_cfg->vfio_iommu_type->partial_unmap) { + RTE_LOG(DEBUG, EAL, "DMA partial unmap unsupported\n"); + rte_errno = ENOTSUP; + ret = -1; + goto out; + } if (user_mem_maps->n_maps == VFIO_MAX_USER_MEM_MAPS) { RTE_LOG(ERR, EAL, "Not enough space to store partial mapping\n"); rte_errno = ENOMEM; diff --git a/lib/librte_eal/linux/eal_vfio.h b/lib/librte_eal/linux/eal_vfio.h index cb2d35f..6ebaca6 100644 --- a/lib/librte_eal/linux/eal_vfio.h +++ b/lib/librte_eal/linux/eal_vfio.h @@ -113,6 +113,7 @@ typedef int (*vfio_dma_user_func_t)(int fd, uint64_t vaddr, uint64_t iova, struct vfio_iommu_type { int type_id; const char *name; + bool partial_unmap; vfio_dma_user_func_t dma_user_map_func; vfio_dma_func_t dma_map_func; }; -- 2.8.4 ^ permalink raw reply [flat|nested] 43+ messages in thread
[parent not found: <20210112173923.30320-1-ndabilpuram@marvell.com>]
* [dpdk-stable] [PATCH v7 1/3] vfio: revert changes for map contiguous areas in one go [not found] ` <20210112173923.30320-1-ndabilpuram@marvell.com> @ 2021-01-12 17:39 ` Nithin Dabilpuram 2021-01-12 17:39 ` [dpdk-stable] [PATCH v7 2/3] vfio: fix DMA mapping granularity for type1 IOVA as VA Nithin Dabilpuram 1 sibling, 0 replies; 43+ messages in thread From: Nithin Dabilpuram @ 2021-01-12 17:39 UTC (permalink / raw) To: anatoly.burakov, David Christensen, david.marchand Cc: jerinj, dev, Nithin Dabilpuram, stable In order to save DMA entries, which are limited by the kernel both for external memory and hugepage memory, an attempt was made to map physically contiguous memory in one go. This cannot be done, as VFIO IOMMU type1 does not support partially unmapping a previously mapped memory region, while the heap can request multi-page mapping and partial unmapping. Hence, to go back to the old method of mapping/unmapping at memseg granularity, this commit reverts commit d1c7c0cdf7ba ("vfio: map contiguous areas in one go"). Also add documentation on which module parameter needs to be used to increase the per-container DMA map limit for VFIO.
Fixes: d1c7c0cdf7ba ("vfio: map contiguous areas in one go") Cc: anatoly.burakov@intel.com Cc: stable@dpdk.org Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: David Christensen <drc@linux.vnet.ibm.com> --- doc/guides/linux_gsg/linux_drivers.rst | 10 ++++++ lib/librte_eal/linux/eal_vfio.c | 59 +++++----------------------------- 2 files changed, 18 insertions(+), 51 deletions(-) diff --git a/doc/guides/linux_gsg/linux_drivers.rst b/doc/guides/linux_gsg/linux_drivers.rst index 90635a4..9a662a7 100644 --- a/doc/guides/linux_gsg/linux_drivers.rst +++ b/doc/guides/linux_gsg/linux_drivers.rst @@ -25,6 +25,16 @@ To make use of VFIO, the ``vfio-pci`` module must be loaded: VFIO kernel is usually present by default in all distributions, however please consult your distributions documentation to make sure that is the case. +For DMA mapping of either external memory or hugepages, VFIO interface is used. +VFIO does not support partial unmap of once mapped memory. Hence DPDK's memory is +mapped in hugepage granularity or system page granularity. Number of DMA +mappings is limited by kernel with user locked memory limit of a process(rlimit) +for system/hugepage memory. Another per-container overall limit applicable both +for external memory and system memory was added in kernel 5.1 defined by +VFIO module parameter ``dma_entry_limit`` with a default value of 64K. +When application is out of DMA entries, these limits need to be adjusted to +increase the allowed limit. + Since Linux version 5.7, the ``vfio-pci`` module supports the creation of virtual functions. 
After the PF is bound to ``vfio-pci`` module, diff --git a/lib/librte_eal/linux/eal_vfio.c b/lib/librte_eal/linux/eal_vfio.c index 0500824..64b134d 100644 --- a/lib/librte_eal/linux/eal_vfio.c +++ b/lib/librte_eal/linux/eal_vfio.c @@ -517,11 +517,9 @@ static void vfio_mem_event_callback(enum rte_mem_event type, const void *addr, size_t len, void *arg __rte_unused) { - rte_iova_t iova_start, iova_expected; struct rte_memseg_list *msl; struct rte_memseg *ms; size_t cur_len = 0; - uint64_t va_start; msl = rte_mem_virt2memseg_list(addr); @@ -539,63 +537,22 @@ vfio_mem_event_callback(enum rte_mem_event type, const void *addr, size_t len, /* memsegs are contiguous in memory */ ms = rte_mem_virt2memseg(addr, msl); - - /* - * This memory is not guaranteed to be contiguous, but it still could - * be, or it could have some small contiguous chunks. Since the number - * of VFIO mappings is limited, and VFIO appears to not concatenate - * adjacent mappings, we have to do this ourselves. - * - * So, find contiguous chunks, then map them. 
- */ - va_start = ms->addr_64; - iova_start = iova_expected = ms->iova; while (cur_len < len) { - bool new_contig_area = ms->iova != iova_expected; - bool last_seg = (len - cur_len) == ms->len; - bool skip_last = false; - - /* only do mappings when current contiguous area ends */ - if (new_contig_area) { - if (type == RTE_MEM_EVENT_ALLOC) - vfio_dma_mem_map(default_vfio_cfg, va_start, - iova_start, - iova_expected - iova_start, 1); - else - vfio_dma_mem_map(default_vfio_cfg, va_start, - iova_start, - iova_expected - iova_start, 0); - va_start = ms->addr_64; - iova_start = ms->iova; - } /* some memory segments may have invalid IOVA */ if (ms->iova == RTE_BAD_IOVA) { RTE_LOG(DEBUG, EAL, "Memory segment at %p has bad IOVA, skipping\n", ms->addr); - skip_last = true; + goto next; } - iova_expected = ms->iova + ms->len; + if (type == RTE_MEM_EVENT_ALLOC) + vfio_dma_mem_map(default_vfio_cfg, ms->addr_64, + ms->iova, ms->len, 1); + else + vfio_dma_mem_map(default_vfio_cfg, ms->addr_64, + ms->iova, ms->len, 0); +next: cur_len += ms->len; ++ms; - - /* - * don't count previous segment, and don't attempt to - * dereference a potentially invalid pointer. - */ - if (skip_last && !last_seg) { - iova_expected = iova_start = ms->iova; - va_start = ms->addr_64; - } else if (!skip_last && last_seg) { - /* this is the last segment and we're not skipping */ - if (type == RTE_MEM_EVENT_ALLOC) - vfio_dma_mem_map(default_vfio_cfg, va_start, - iova_start, - iova_expected - iova_start, 1); - else - vfio_dma_mem_map(default_vfio_cfg, va_start, - iova_start, - iova_expected - iova_start, 0); - } } } -- 2.8.4 ^ permalink raw reply [flat|nested] 43+ messages in thread
* [dpdk-stable] [PATCH v7 2/3] vfio: fix DMA mapping granularity for type1 IOVA as VA [not found] ` <20210112173923.30320-1-ndabilpuram@marvell.com> 2021-01-12 17:39 ` [dpdk-stable] [PATCH v7 1/3] vfio: revert changes for map contiguous areas in one go Nithin Dabilpuram @ 2021-01-12 17:39 ` Nithin Dabilpuram 1 sibling, 0 replies; 43+ messages in thread From: Nithin Dabilpuram @ 2021-01-12 17:39 UTC (permalink / raw) To: anatoly.burakov, David Christensen, david.marchand Cc: jerinj, dev, Nithin Dabilpuram, stable Partial unmapping is not supported for VFIO IOMMU type1 by the kernel. Though the kernel returns zero, the unmapped size reported will not be the same as expected. So check the returned unmap size and return an error. For IOVA as PA, DMA mapping is already at memseg size granularity. Do the same for IOVA as VA mode: for DMA map/unmap triggered by heap allocations, maintain memseg page size granularity so that heap expansion and contraction do not hit this issue. For user-requested DMA map/unmap, disallow partial unmapping for VFIO type1. 
Fixes: 73a639085938 ("vfio: allow to map other memory regions") Cc: anatoly.burakov@intel.com Cc: stable@dpdk.org Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: David Christensen <drc@linux.vnet.ibm.com> --- lib/librte_eal/linux/eal_vfio.c | 34 ++++++++++++++++++++++++++++------ lib/librte_eal/linux/eal_vfio.h | 1 + 2 files changed, 29 insertions(+), 6 deletions(-) diff --git a/lib/librte_eal/linux/eal_vfio.c b/lib/librte_eal/linux/eal_vfio.c index 64b134d..b15b758 100644 --- a/lib/librte_eal/linux/eal_vfio.c +++ b/lib/librte_eal/linux/eal_vfio.c @@ -70,6 +70,7 @@ static const struct vfio_iommu_type iommu_types[] = { { .type_id = RTE_VFIO_TYPE1, .name = "Type 1", + .partial_unmap = false, .dma_map_func = &vfio_type1_dma_map, .dma_user_map_func = &vfio_type1_dma_mem_map }, @@ -77,6 +78,7 @@ static const struct vfio_iommu_type iommu_types[] = { { .type_id = RTE_VFIO_SPAPR, .name = "sPAPR", + .partial_unmap = true, .dma_map_func = &vfio_spapr_dma_map, .dma_user_map_func = &vfio_spapr_dma_mem_map }, @@ -84,6 +86,7 @@ static const struct vfio_iommu_type iommu_types[] = { { .type_id = RTE_VFIO_NOIOMMU, .name = "No-IOMMU", + .partial_unmap = true, .dma_map_func = &vfio_noiommu_dma_map, .dma_user_map_func = &vfio_noiommu_dma_mem_map }, @@ -526,12 +529,19 @@ vfio_mem_event_callback(enum rte_mem_event type, const void *addr, size_t len, /* for IOVA as VA mode, no need to care for IOVA addresses */ if (rte_eal_iova_mode() == RTE_IOVA_VA && msl->external == 0) { uint64_t vfio_va = (uint64_t)(uintptr_t)addr; - if (type == RTE_MEM_EVENT_ALLOC) - vfio_dma_mem_map(default_vfio_cfg, vfio_va, vfio_va, - len, 1); - else - vfio_dma_mem_map(default_vfio_cfg, vfio_va, vfio_va, - len, 0); + uint64_t page_sz = msl->page_sz; + + /* Maintain granularity of DMA map/unmap to memseg size */ + for (; cur_len < len; cur_len += page_sz) { + if (type == RTE_MEM_EVENT_ALLOC) + vfio_dma_mem_map(default_vfio_cfg, vfio_va, + 
vfio_va, page_sz, 1); + else + vfio_dma_mem_map(default_vfio_cfg, vfio_va, + vfio_va, page_sz, 0); + vfio_va += page_sz; + } + return; } @@ -1348,6 +1358,12 @@ vfio_type1_dma_mem_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova, RTE_LOG(ERR, EAL, " cannot clear DMA remapping, error %i (%s)\n", errno, strerror(errno)); return -1; + } else if (dma_unmap.size != len) { + RTE_LOG(ERR, EAL, " unexpected size %"PRIu64" of DMA " + "remapping cleared instead of %"PRIu64"\n", + (uint64_t)dma_unmap.size, len); + rte_errno = EIO; + return -1; } } @@ -1823,6 +1839,12 @@ container_dma_unmap(struct vfio_config *vfio_cfg, uint64_t vaddr, uint64_t iova, /* we're partially unmapping a previously mapped region, so we * need to split entry into two. */ + if (!vfio_cfg->vfio_iommu_type->partial_unmap) { + RTE_LOG(DEBUG, EAL, "DMA partial unmap unsupported\n"); + rte_errno = ENOTSUP; + ret = -1; + goto out; + } if (user_mem_maps->n_maps == VFIO_MAX_USER_MEM_MAPS) { RTE_LOG(ERR, EAL, "Not enough space to store partial mapping\n"); rte_errno = ENOMEM; diff --git a/lib/librte_eal/linux/eal_vfio.h b/lib/librte_eal/linux/eal_vfio.h index cb2d35f..6ebaca6 100644 --- a/lib/librte_eal/linux/eal_vfio.h +++ b/lib/librte_eal/linux/eal_vfio.h @@ -113,6 +113,7 @@ typedef int (*vfio_dma_user_func_t)(int fd, uint64_t vaddr, uint64_t iova, struct vfio_iommu_type { int type_id; const char *name; + bool partial_unmap; vfio_dma_user_func_t dma_user_map_func; vfio_dma_func_t dma_map_func; }; -- 2.8.4 ^ permalink raw reply [flat|nested] 43+ messages in thread
* [dpdk-stable] [PATCH v8 1/3] vfio: revert changes for map contiguous areas in one go [not found] ` <20210115073243.7025-1-ndabilpuram@marvell.com> @ 2021-01-15 7:32 ` Nithin Dabilpuram 2021-03-05 7:50 ` David Marchand 2021-01-15 7:32 ` [dpdk-stable] [PATCH v8 2/3] vfio: fix DMA mapping granularity for type1 IOVA as VA Nithin Dabilpuram 2021-01-15 7:32 ` [dpdk-stable] [PATCH v8 3/3] test: change external memory test to use system page sz Nithin Dabilpuram 2 siblings, 1 reply; 43+ messages in thread From: Nithin Dabilpuram @ 2021-01-15 7:32 UTC (permalink / raw) To: anatoly.burakov, David Christensen, david.marchand Cc: jerinj, dev, Nithin Dabilpuram, stable In order to save DMA entries, which the kernel limits both for external memory and hugepage memory, an attempt was made to map physically contiguous memory in one go. This cannot be done, as VFIO IOMMU type1 does not support partially unmapping a previously mapped memory region, while the heap can request multi-page mapping and partial unmapping. Hence, to go back to the old method of mapping/unmapping at memseg granularity, this commit reverts commit d1c7c0cdf7ba ("vfio: map contiguous areas in one go"). Also add documentation on which module parameter needs to be used to increase the per-container DMA map limit for VFIO. 
Fixes: d1c7c0cdf7ba ("vfio: map contiguous areas in one go") Cc: anatoly.burakov@intel.com Cc: stable@dpdk.org Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: David Christensen <drc@linux.vnet.ibm.com> --- doc/guides/linux_gsg/linux_drivers.rst | 10 ++++++ lib/librte_eal/linux/eal_vfio.c | 59 +++++----------------------------- 2 files changed, 18 insertions(+), 51 deletions(-) diff --git a/doc/guides/linux_gsg/linux_drivers.rst b/doc/guides/linux_gsg/linux_drivers.rst index 90635a4..9a662a7 100644 --- a/doc/guides/linux_gsg/linux_drivers.rst +++ b/doc/guides/linux_gsg/linux_drivers.rst @@ -25,6 +25,16 @@ To make use of VFIO, the ``vfio-pci`` module must be loaded: VFIO kernel support is usually present by default in all distributions, however please consult your distribution's documentation to make sure that is the case. +For DMA mapping of either external memory or hugepages, the VFIO interface is used. +VFIO does not support partially unmapping memory that was mapped earlier. Hence DPDK's memory is +mapped at hugepage granularity or system page granularity. The number of DMA +mappings is limited by the kernel through the user locked-memory limit of a process (rlimit) +for system/hugepage memory. Another per-container overall limit, applicable both +to external memory and system memory, was added in kernel 5.1 and is defined by the +VFIO module parameter ``dma_entry_limit`` with a default value of 64K. +When an application runs out of DMA entries, these limits need to be adjusted to +increase the allowed limit. + Since Linux version 5.7, the ``vfio-pci`` module supports the creation of virtual functions. 
After the PF is bound to ``vfio-pci`` module, diff --git a/lib/librte_eal/linux/eal_vfio.c b/lib/librte_eal/linux/eal_vfio.c index 0500824..64b134d 100644 --- a/lib/librte_eal/linux/eal_vfio.c +++ b/lib/librte_eal/linux/eal_vfio.c @@ -517,11 +517,9 @@ static void vfio_mem_event_callback(enum rte_mem_event type, const void *addr, size_t len, void *arg __rte_unused) { - rte_iova_t iova_start, iova_expected; struct rte_memseg_list *msl; struct rte_memseg *ms; size_t cur_len = 0; - uint64_t va_start; msl = rte_mem_virt2memseg_list(addr); @@ -539,63 +537,22 @@ vfio_mem_event_callback(enum rte_mem_event type, const void *addr, size_t len, /* memsegs are contiguous in memory */ ms = rte_mem_virt2memseg(addr, msl); - - /* - * This memory is not guaranteed to be contiguous, but it still could - * be, or it could have some small contiguous chunks. Since the number - * of VFIO mappings is limited, and VFIO appears to not concatenate - * adjacent mappings, we have to do this ourselves. - * - * So, find contiguous chunks, then map them. 
- */ - va_start = ms->addr_64; - iova_start = iova_expected = ms->iova; while (cur_len < len) { - bool new_contig_area = ms->iova != iova_expected; - bool last_seg = (len - cur_len) == ms->len; - bool skip_last = false; - - /* only do mappings when current contiguous area ends */ - if (new_contig_area) { - if (type == RTE_MEM_EVENT_ALLOC) - vfio_dma_mem_map(default_vfio_cfg, va_start, - iova_start, - iova_expected - iova_start, 1); - else - vfio_dma_mem_map(default_vfio_cfg, va_start, - iova_start, - iova_expected - iova_start, 0); - va_start = ms->addr_64; - iova_start = ms->iova; - } /* some memory segments may have invalid IOVA */ if (ms->iova == RTE_BAD_IOVA) { RTE_LOG(DEBUG, EAL, "Memory segment at %p has bad IOVA, skipping\n", ms->addr); - skip_last = true; + goto next; } - iova_expected = ms->iova + ms->len; + if (type == RTE_MEM_EVENT_ALLOC) + vfio_dma_mem_map(default_vfio_cfg, ms->addr_64, + ms->iova, ms->len, 1); + else + vfio_dma_mem_map(default_vfio_cfg, ms->addr_64, + ms->iova, ms->len, 0); +next: cur_len += ms->len; ++ms; - - /* - * don't count previous segment, and don't attempt to - * dereference a potentially invalid pointer. - */ - if (skip_last && !last_seg) { - iova_expected = iova_start = ms->iova; - va_start = ms->addr_64; - } else if (!skip_last && last_seg) { - /* this is the last segment and we're not skipping */ - if (type == RTE_MEM_EVENT_ALLOC) - vfio_dma_mem_map(default_vfio_cfg, va_start, - iova_start, - iova_expected - iova_start, 1); - else - vfio_dma_mem_map(default_vfio_cfg, va_start, - iova_start, - iova_expected - iova_start, 0); - } } } -- 2.8.4 ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [dpdk-stable] [PATCH v8 1/3] vfio: revert changes for map contiguous areas in one go 2021-01-15 7:32 ` [dpdk-stable] [PATCH v8 1/3] vfio: revert changes for map contiguous areas in one go Nithin Dabilpuram @ 2021-03-05 7:50 ` David Marchand 2021-03-05 13:54 ` Burakov, Anatoly 0 siblings, 1 reply; 43+ messages in thread From: David Marchand @ 2021-03-05 7:50 UTC (permalink / raw) To: Nithin Dabilpuram Cc: Burakov, Anatoly, David Christensen, Jerin Jacob Kollanukkaran, dev, dpdk stable On Fri, Jan 15, 2021 at 8:33 AM Nithin Dabilpuram <ndabilpuram@marvell.com> wrote: > > In order to save DMA entries limited by kernel both for external > memory and hugepage memory, an attempt was made to map physically > contiguous memory in one go. This cannot be done as VFIO IOMMU type1 > does not support partially unmapping a previously mapped memory > region while Heap can request for multi page mapping and > partial unmapping. > Hence for going back to old method of mapping/unmapping at > memseg granularity, this commit reverts > commit d1c7c0cdf7ba ("vfio: map contiguous areas in one go") > > Also add documentation on what module parameter needs to be used > to increase the per-container dma map limit for VFIO. > > Fixes: d1c7c0cdf7ba ("vfio: map contiguous areas in one go") > Cc: anatoly.burakov@intel.com > Cc: stable@dpdk.org > > Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com> > Acked-by: Anatoly Burakov <anatoly.burakov@intel.com> > Acked-by: David Christensen <drc@linux.vnet.ibm.com> There is a regression reported in bz: https://bugs.dpdk.org/show_bug.cgi?id=649 I assigned it to Anatoly for now. Nithin, can you have a look too? Thanks. -- David Marchand ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [dpdk-stable] [PATCH v8 1/3] vfio: revert changes for map contiguous areas in one go 2021-03-05 7:50 ` David Marchand @ 2021-03-05 13:54 ` Burakov, Anatoly 2021-03-05 15:50 ` [dpdk-stable] [dpdk-dev] " Nithin Dabilpuram 0 siblings, 1 reply; 43+ messages in thread From: Burakov, Anatoly @ 2021-03-05 13:54 UTC (permalink / raw) To: David Marchand, Nithin Dabilpuram Cc: David Christensen, Jerin Jacob Kollanukkaran, dev, dpdk stable On 05-Mar-21 7:50 AM, David Marchand wrote: > On Fri, Jan 15, 2021 at 8:33 AM Nithin Dabilpuram > <ndabilpuram@marvell.com> wrote: >> >> In order to save DMA entries limited by kernel both for externel >> memory and hugepage memory, an attempt was made to map physically >> contiguous memory in one go. This cannot be done as VFIO IOMMU type1 >> does not support partially unmapping a previously mapped memory >> region while Heap can request for multi page mapping and >> partial unmapping. >> Hence for going back to old method of mapping/unmapping at >> memseg granularity, this commit reverts >> commit d1c7c0cdf7ba ("vfio: map contiguous areas in one go") >> >> Also add documentation on what module parameter needs to be used >> to increase the per-container dma map limit for VFIO. >> >> Fixes: d1c7c0cdf7ba ("vfio: map contiguous areas in one go") >> Cc: anatoly.burakov@intel.com >> Cc: stable@dpdk.org >> >> Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com> >> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com> >> Acked-by: David Christensen <drc@linux.vnet.ibm.com> > > There is a regression reported in bz: https://bugs.dpdk.org/show_bug.cgi?id=649 > > I assigned it to Anatoly for now. > Nithin, can you have a loo too? > > Thanks. > > I've responded on the bug tracker as well, but to repeat it here: this is not a regression, this is intended behavior. We cannot do anything about this. -- Thanks, Anatoly ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [dpdk-stable] [dpdk-dev] [PATCH v8 1/3] vfio: revert changes for map contiguous areas in one go 2021-03-05 13:54 ` Burakov, Anatoly @ 2021-03-05 15:50 ` Nithin Dabilpuram 2021-04-01 11:27 ` Burakov, Anatoly 0 siblings, 1 reply; 43+ messages in thread From: Nithin Dabilpuram @ 2021-03-05 15:50 UTC (permalink / raw) To: Burakov, Anatoly Cc: David Marchand, David Christensen, Jerin Jacob Kollanukkaran, dev, dpdk stable On Fri, Mar 05, 2021 at 01:54:34PM +0000, Burakov, Anatoly wrote: > On 05-Mar-21 7:50 AM, David Marchand wrote: > > On Fri, Jan 15, 2021 at 8:33 AM Nithin Dabilpuram > > <ndabilpuram@marvell.com> wrote: > > > > > > In order to save DMA entries limited by kernel both for externel > > > memory and hugepage memory, an attempt was made to map physically > > > contiguous memory in one go. This cannot be done as VFIO IOMMU type1 > > > does not support partially unmapping a previously mapped memory > > > region while Heap can request for multi page mapping and > > > partial unmapping. > > > Hence for going back to old method of mapping/unmapping at > > > memseg granularity, this commit reverts > > > commit d1c7c0cdf7ba ("vfio: map contiguous areas in one go") > > > > > > Also add documentation on what module parameter needs to be used > > > to increase the per-container dma map limit for VFIO. > > > > > > Fixes: d1c7c0cdf7ba ("vfio: map contiguous areas in one go") > > > Cc: anatoly.burakov@intel.com > > > Cc: stable@dpdk.org > > > > > > Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com> > > > Acked-by: Anatoly Burakov <anatoly.burakov@intel.com> > > > Acked-by: David Christensen <drc@linux.vnet.ibm.com> > > > > There is a regression reported in bz: https://bugs.dpdk.org/show_bug.cgi?id=649 > > > > I assigned it to Anatoly for now. > > Nithin, can you have a loo too? > > > > Thanks. > > > > > > I've responded on the bug tracker as well, but to repeat it here: this is > not a regression, this is intended behavior. 
> We cannot do anything about this. To add: for the test case to pass, either the limits have to be increased, or "--mp-alloc=xmemhuge" has to be used instead of "--mp-alloc=xmem" (which forces the system page size), or the total mbuf count has to be reduced to lower the page count. > > -- > Thanks, > Anatoly ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [dpdk-stable] [dpdk-dev] [PATCH v8 1/3] vfio: revert changes for map contiguous areas in one go 2021-03-05 15:50 ` [dpdk-stable] [dpdk-dev] " Nithin Dabilpuram @ 2021-04-01 11:27 ` Burakov, Anatoly 0 siblings, 0 replies; 43+ messages in thread From: Burakov, Anatoly @ 2021-04-01 11:27 UTC (permalink / raw) To: Nithin Dabilpuram Cc: David Marchand, David Christensen, Jerin Jacob Kollanukkaran, dev, dpdk stable On 05-Mar-21 3:50 PM, Nithin Dabilpuram wrote: > On Fri, Mar 05, 2021 at 01:54:34PM +0000, Burakov, Anatoly wrote: >> On 05-Mar-21 7:50 AM, David Marchand wrote: >>> On Fri, Jan 15, 2021 at 8:33 AM Nithin Dabilpuram >>> <ndabilpuram@marvell.com> wrote: >>>> >>>> In order to save DMA entries limited by kernel both for externel >>>> memory and hugepage memory, an attempt was made to map physically >>>> contiguous memory in one go. This cannot be done as VFIO IOMMU type1 >>>> does not support partially unmapping a previously mapped memory >>>> region while Heap can request for multi page mapping and >>>> partial unmapping. >>>> Hence for going back to old method of mapping/unmapping at >>>> memseg granularity, this commit reverts >>>> commit d1c7c0cdf7ba ("vfio: map contiguous areas in one go") >>>> >>>> Also add documentation on what module parameter needs to be used >>>> to increase the per-container dma map limit for VFIO. >>>> >>>> Fixes: d1c7c0cdf7ba ("vfio: map contiguous areas in one go") >>>> Cc: anatoly.burakov@intel.com >>>> Cc: stable@dpdk.org >>>> >>>> Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com> >>>> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com> >>>> Acked-by: David Christensen <drc@linux.vnet.ibm.com> >>> >>> There is a regression reported in bz: https://bugs.dpdk.org/show_bug.cgi?id=649 >>> >>> I assigned it to Anatoly for now. >>> Nithin, can you have a loo too? >>> >>> Thanks. >>> >>> >> >> I've responded on the bug tracker as well, but to repeat it here: this is >> not a regression, this is intended behavior. 
>> We cannot do anything about this. > > To add: for the test case to pass, either the limits have to be increased, or "--mp-alloc=xmemhuge" > has to be used instead of "--mp-alloc=xmem" (which forces the system page size), or the total mbuf count > has to be reduced to lower the page count. > Technically, one is not a replacement for the other, so the correct way to handle it is to increase the limits, not to use xmemhuge. -- Thanks, Anatoly ^ permalink raw reply [flat|nested] 43+ messages in thread
* [dpdk-stable] [PATCH v8 2/3] vfio: fix DMA mapping granularity for type1 IOVA as VA [not found] ` <20210115073243.7025-1-ndabilpuram@marvell.com> 2021-01-15 7:32 ` [dpdk-stable] [PATCH v8 1/3] vfio: revert changes for map contiguous areas in one go Nithin Dabilpuram @ 2021-01-15 7:32 ` Nithin Dabilpuram 2021-01-15 7:32 ` [dpdk-stable] [PATCH v8 3/3] test: change external memory test to use system page sz Nithin Dabilpuram 2 siblings, 0 replies; 43+ messages in thread From: Nithin Dabilpuram @ 2021-01-15 7:32 UTC (permalink / raw) To: anatoly.burakov, David Christensen, david.marchand Cc: jerinj, dev, Nithin Dabilpuram, stable Partial unmapping is not supported for VFIO IOMMU type1 by the kernel. Though the kernel returns zero, the unmapped size reported will not be the same as expected. So check the returned unmap size and return an error. For IOVA as PA, DMA mapping is already at memseg size granularity. Do the same for IOVA as VA mode: for DMA map/unmap triggered by heap allocations, maintain memseg page size granularity so that heap expansion and contraction do not hit this issue. For user-requested DMA map/unmap, disallow partial unmapping for VFIO type1. 
Fixes: 73a639085938 ("vfio: allow to map other memory regions") Cc: anatoly.burakov@intel.com Cc: stable@dpdk.org Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: David Christensen <drc@linux.vnet.ibm.com> --- lib/librte_eal/linux/eal_vfio.c | 34 ++++++++++++++++++++++++++++------ lib/librte_eal/linux/eal_vfio.h | 1 + 2 files changed, 29 insertions(+), 6 deletions(-) diff --git a/lib/librte_eal/linux/eal_vfio.c b/lib/librte_eal/linux/eal_vfio.c index 64b134d..b15b758 100644 --- a/lib/librte_eal/linux/eal_vfio.c +++ b/lib/librte_eal/linux/eal_vfio.c @@ -70,6 +70,7 @@ static const struct vfio_iommu_type iommu_types[] = { { .type_id = RTE_VFIO_TYPE1, .name = "Type 1", + .partial_unmap = false, .dma_map_func = &vfio_type1_dma_map, .dma_user_map_func = &vfio_type1_dma_mem_map }, @@ -77,6 +78,7 @@ static const struct vfio_iommu_type iommu_types[] = { { .type_id = RTE_VFIO_SPAPR, .name = "sPAPR", + .partial_unmap = true, .dma_map_func = &vfio_spapr_dma_map, .dma_user_map_func = &vfio_spapr_dma_mem_map }, @@ -84,6 +86,7 @@ static const struct vfio_iommu_type iommu_types[] = { { .type_id = RTE_VFIO_NOIOMMU, .name = "No-IOMMU", + .partial_unmap = true, .dma_map_func = &vfio_noiommu_dma_map, .dma_user_map_func = &vfio_noiommu_dma_mem_map }, @@ -526,12 +529,19 @@ vfio_mem_event_callback(enum rte_mem_event type, const void *addr, size_t len, /* for IOVA as VA mode, no need to care for IOVA addresses */ if (rte_eal_iova_mode() == RTE_IOVA_VA && msl->external == 0) { uint64_t vfio_va = (uint64_t)(uintptr_t)addr; - if (type == RTE_MEM_EVENT_ALLOC) - vfio_dma_mem_map(default_vfio_cfg, vfio_va, vfio_va, - len, 1); - else - vfio_dma_mem_map(default_vfio_cfg, vfio_va, vfio_va, - len, 0); + uint64_t page_sz = msl->page_sz; + + /* Maintain granularity of DMA map/unmap to memseg size */ + for (; cur_len < len; cur_len += page_sz) { + if (type == RTE_MEM_EVENT_ALLOC) + vfio_dma_mem_map(default_vfio_cfg, vfio_va, + 
vfio_va, page_sz, 1); + else + vfio_dma_mem_map(default_vfio_cfg, vfio_va, + vfio_va, page_sz, 0); + vfio_va += page_sz; + } + return; } @@ -1348,6 +1358,12 @@ vfio_type1_dma_mem_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova, RTE_LOG(ERR, EAL, " cannot clear DMA remapping, error %i (%s)\n", errno, strerror(errno)); return -1; + } else if (dma_unmap.size != len) { + RTE_LOG(ERR, EAL, " unexpected size %"PRIu64" of DMA " + "remapping cleared instead of %"PRIu64"\n", + (uint64_t)dma_unmap.size, len); + rte_errno = EIO; + return -1; } } @@ -1823,6 +1839,12 @@ container_dma_unmap(struct vfio_config *vfio_cfg, uint64_t vaddr, uint64_t iova, /* we're partially unmapping a previously mapped region, so we * need to split entry into two. */ + if (!vfio_cfg->vfio_iommu_type->partial_unmap) { + RTE_LOG(DEBUG, EAL, "DMA partial unmap unsupported\n"); + rte_errno = ENOTSUP; + ret = -1; + goto out; + } if (user_mem_maps->n_maps == VFIO_MAX_USER_MEM_MAPS) { RTE_LOG(ERR, EAL, "Not enough space to store partial mapping\n"); rte_errno = ENOMEM; diff --git a/lib/librte_eal/linux/eal_vfio.h b/lib/librte_eal/linux/eal_vfio.h index cb2d35f..6ebaca6 100644 --- a/lib/librte_eal/linux/eal_vfio.h +++ b/lib/librte_eal/linux/eal_vfio.h @@ -113,6 +113,7 @@ typedef int (*vfio_dma_user_func_t)(int fd, uint64_t vaddr, uint64_t iova, struct vfio_iommu_type { int type_id; const char *name; + bool partial_unmap; vfio_dma_user_func_t dma_user_map_func; vfio_dma_func_t dma_map_func; }; -- 2.8.4 ^ permalink raw reply [flat|nested] 43+ messages in thread
* [dpdk-stable] [PATCH v8 3/3] test: change external memory test to use system page sz [not found] ` <20210115073243.7025-1-ndabilpuram@marvell.com> 2021-01-15 7:32 ` [dpdk-stable] [PATCH v8 1/3] vfio: revert changes for map contiguous areas in one go Nithin Dabilpuram 2021-01-15 7:32 ` [dpdk-stable] [PATCH v8 2/3] vfio: fix DMA mapping granularity for type1 IOVA as VA Nithin Dabilpuram @ 2021-01-15 7:32 ` Nithin Dabilpuram 2021-02-11 11:21 ` Burakov, Anatoly 2 siblings, 1 reply; 43+ messages in thread From: Nithin Dabilpuram @ 2021-01-15 7:32 UTC (permalink / raw) To: anatoly.burakov, David Christensen, david.marchand Cc: jerinj, dev, Nithin Dabilpuram, stable Currently the external memory test uses a 4K page size. VFIO DMA mapping works only at system page granularity. Earlier this worked because all the contiguous mappings were coalesced and mapped in one go, which ended up forming a much larger mapping. Now that VFIO DMA mappings, both in IOVA as VA and IOVA as PA mode, are done at memseg list granularity, we need to use the system page size. 
Fixes: b270daa43b3d ("test: support external memory") Cc: anatoly.burakov@intel.com Cc: stable@dpdk.org Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com> --- app/test/test_external_mem.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/app/test/test_external_mem.c b/app/test/test_external_mem.c index 7eb81f6..5edf88b 100644 --- a/app/test/test_external_mem.c +++ b/app/test/test_external_mem.c @@ -13,6 +13,7 @@ #include <rte_common.h> #include <rte_debug.h> #include <rte_eal.h> +#include <rte_eal_paging.h> #include <rte_errno.h> #include <rte_malloc.h> #include <rte_ring.h> @@ -532,8 +533,8 @@ test_extmem_basic(void *addr, size_t len, size_t pgsz, rte_iova_t *iova, static int test_external_mem(void) { + size_t pgsz = rte_mem_page_size(); size_t len = EXTERNAL_MEM_SZ; - size_t pgsz = RTE_PGSIZE_4K; rte_iova_t iova[len / pgsz]; void *addr; int ret, n_pages; -- 2.8.4 ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [dpdk-stable] [PATCH v8 3/3] test: change external memory test to use system page sz 2021-01-15 7:32 ` [dpdk-stable] [PATCH v8 3/3] test: change external memory test to use system page sz Nithin Dabilpuram @ 2021-02-11 11:21 ` Burakov, Anatoly 0 siblings, 0 replies; 43+ messages in thread From: Burakov, Anatoly @ 2021-02-11 11:21 UTC (permalink / raw) To: Nithin Dabilpuram, David Christensen, david.marchand; +Cc: jerinj, dev, stable On 15-Jan-21 7:32 AM, Nithin Dabilpuram wrote: > Currently external memory test uses 4K page size. > VFIO DMA mapping works only with system page granularity. > > Earlier it was working because all the contiguous mappings > were coalesced and mapped in one-go which ended up becoming > a lot bigger page. Now that VFIO DMA mappings both in IOVA as VA > and IOVA as PA mode, are being done at memseg list granularity, > we need to use system page size. > > Fixes: b270daa43b3d ("test: support external memory") > Cc: anatoly.burakov@intel.com > Cc: stable@dpdk.org > > Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com> > --- Acked-by: Anatoly Burakov <anatoly.burakov@intel.com> -- Thanks, Anatoly ^ permalink raw reply [flat|nested] 43+ messages in thread
end of thread, other threads:[~2021-04-01 11:27 UTC | newest] Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- [not found] <20201012081106.10610-1-ndabilpuram@marvell.com> 2020-10-12 8:11 ` [dpdk-stable] [PATCH 2/2] vfio: fix partial DMA unmapping for VFIO type1 Nithin Dabilpuram 2020-10-14 15:07 ` Burakov, Anatoly 2020-10-15 6:09 ` [dpdk-stable] [EXT] " Nithin Dabilpuram 2020-10-15 10:00 ` Burakov, Anatoly 2020-10-15 11:38 ` Nithin Dabilpuram 2020-10-15 11:50 ` Nithin Dabilpuram 2020-10-15 11:57 ` [dpdk-stable] [dpdk-dev] " Nithin Dabilpuram 2020-10-15 15:10 ` Burakov, Anatoly 2020-10-16 7:10 ` Nithin Dabilpuram 2020-10-17 16:14 ` Burakov, Anatoly 2020-10-19 9:43 ` Nithin Dabilpuram 2020-10-22 12:13 ` Nithin Dabilpuram 2020-10-28 13:04 ` Burakov, Anatoly 2020-10-28 14:17 ` Nithin Dabilpuram 2020-10-28 16:07 ` Burakov, Anatoly 2020-10-28 16:31 ` Nithin Dabilpuram [not found] ` <20201105090423.11954-1-ndabilpuram@marvell.com> 2020-11-05 9:04 ` [dpdk-stable] [PATCH v2 1/3] vfio: revert changes for map contiguous areas in one go Nithin Dabilpuram 2020-11-05 9:04 ` [dpdk-stable] [PATCH v2 2/3] vfio: fix DMA mapping granularity for type1 iova as va Nithin Dabilpuram 2020-11-10 14:04 ` Burakov, Anatoly 2020-11-10 14:22 ` [dpdk-stable] [dpdk-dev] " Burakov, Anatoly 2020-11-10 14:17 ` [dpdk-stable] " Burakov, Anatoly 2020-11-11 5:08 ` [dpdk-stable] [dpdk-dev] " Nithin Dabilpuram 2020-11-11 10:00 ` Burakov, Anatoly [not found] ` <20201201193302.28131-1-ndabilpuram@marvell.com> 2020-12-01 19:32 ` [dpdk-stable] [PATCH v3 1/4] vfio: revert changes for map contiguous areas in one go Nithin Dabilpuram 2020-12-01 19:33 ` [dpdk-stable] [PATCH v3 2/4] vfio: fix DMA mapping granularity for type1 IOVA as VA Nithin Dabilpuram [not found] ` <20201202054647.3449-1-ndabilpuram@marvell.com> 2020-12-02 5:46 ` [dpdk-stable] [PATCH v4 1/4] vfio: revert changes for map contiguous areas in one go Nithin Dabilpuram 
2020-12-02 18:36 ` David Christensen 2020-12-02 5:46 ` [dpdk-stable] [PATCH v4 2/4] vfio: fix DMA mapping granularity for type1 IOVA as VA Nithin Dabilpuram 2020-12-02 18:38 ` David Christensen [not found] ` <20201214081935.23577-1-ndabilpuram@marvell.com> 2020-12-14 8:19 ` [dpdk-stable] [PATCH v5 1/4] vfio: revert changes for map contiguous areas in one go Nithin Dabilpuram 2020-12-14 8:19 ` [dpdk-stable] [PATCH v5 2/4] vfio: fix DMA mapping granularity for type1 IOVA as VA Nithin Dabilpuram [not found] ` <20201217190604.29803-1-ndabilpuram@marvell.com> 2020-12-17 19:06 ` [dpdk-stable] [PATCH v6 1/4] vfio: revert changes for map contiguous areas in one go Nithin Dabilpuram 2020-12-17 19:06 ` [dpdk-stable] [PATCH v6 2/4] vfio: fix DMA mapping granularity for type1 IOVA as VA Nithin Dabilpuram [not found] ` <20210112173923.30320-1-ndabilpuram@marvell.com> 2021-01-12 17:39 ` [dpdk-stable] [PATCH v7 1/3] vfio: revert changes for map contiguous areas in one go Nithin Dabilpuram 2021-01-12 17:39 ` [dpdk-stable] [PATCH v7 2/3] vfio: fix DMA mapping granularity for type1 IOVA as VA Nithin Dabilpuram [not found] ` <20210115073243.7025-1-ndabilpuram@marvell.com> 2021-01-15 7:32 ` [dpdk-stable] [PATCH v8 1/3] vfio: revert changes for map contiguous areas in one go Nithin Dabilpuram 2021-03-05 7:50 ` David Marchand 2021-03-05 13:54 ` Burakov, Anatoly 2021-03-05 15:50 ` [dpdk-stable] [dpdk-dev] " Nithin Dabilpuram 2021-04-01 11:27 ` Burakov, Anatoly 2021-01-15 7:32 ` [dpdk-stable] [PATCH v8 2/3] vfio: fix DMA mapping granularity for type1 IOVA as VA Nithin Dabilpuram 2021-01-15 7:32 ` [dpdk-stable] [PATCH v8 3/3] test: change external memory test to use system page sz Nithin Dabilpuram 2021-02-11 11:21 ` Burakov, Anatoly