DPDK patches and discussions
* [dpdk-dev] [PATCH v3] vfio: fix sPAPR IOMMU DMA window size
@ 2017-08-08  8:41 Jonas Pfefferle
  2017-08-08  9:15 ` Burakov, Anatoly
  0 siblings, 1 reply; 5+ messages in thread
From: Jonas Pfefferle @ 2017-08-08  8:41 UTC (permalink / raw)
  To: anatoly.burakov; +Cc: dev, aik, Jonas Pfefferle

The DMA window size needs to be big enough to span all memory segments'
physical addresses. We do not need multiple levels of IOMMU tables, as a
single level already spans ~70TB of physical memory with 16MB hugepages.

Signed-off-by: Jonas Pfefferle <jpf@zurich.ibm.com>
---
v2:
* roundup to next power 2 function without loop.

v3:
* Replace roundup_next_pow2 with rte_align64pow2

 lib/librte_eal/linuxapp/eal/eal_vfio.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index 946df7e..550c41c 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -759,10 +759,12 @@ vfio_spapr_dma_map(int vfio_container_fd)
 		return -1;
 	}
 
-	/* calculate window size based on number of hugepages configured */
-	create.window_size = rte_eal_get_physmem_size();
+	/* physical pages are sorted descending, i.e. ms[0].phys_addr is max */
+	/* create DMA window from 0 to max(phys_addr + len) */
+	/* sPAPR requires the window size to be a power of 2 */
+	create.window_size = rte_align64pow2(ms[0].phys_addr + ms[0].len);
 	create.page_shift = __builtin_ctzll(ms->hugepage_sz);
-	create.levels = 2;
+	create.levels = 1;
 
 	ret = ioctl(vfio_container_fd, VFIO_IOMMU_SPAPR_TCE_CREATE, &create);
 	if (ret) {
@@ -771,6 +773,11 @@ vfio_spapr_dma_map(int vfio_container_fd)
 		return -1;
 	}
 
+	if (create.start_addr != 0) {
+		RTE_LOG(ERR, EAL, "  DMA window start address != 0\n");
+		return -1;
+	}
+
 	/* map all DPDK segments for DMA. use 1:1 PA to IOVA mapping */
 	for (i = 0; i < RTE_MAX_MEMSEG; i++) {
 		struct vfio_iommu_type1_dma_map dma_map;
-- 
2.7.4
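
For readers unfamiliar with the sPAPR window parameters, here is a small
standalone sketch of how window_size and page_shift above are derived
(illustration only, not part of the patch; the local align64pow2() mirrors
what rte_align64pow2() does, and the example addresses are made up):

#include <inttypes.h>
#include <stdio.h>

/* Same rounding as rte_align64pow2(): next power of 2 >= v. */
static uint64_t align64pow2(uint64_t v)
{
	v--;
	v |= v >> 1;  v |= v >> 2;  v |= v >> 4;
	v |= v >> 8;  v |= v >> 16; v |= v >> 32;
	return v + 1;
}

int main(void)
{
	/* made-up example: highest memseg ends at 640GB, 16MB hugepages */
	uint64_t max_phys_end = 0xa000000000ULL;
	uint64_t hugepage_sz = 16ULL << 20;

	uint64_t window_size = align64pow2(max_phys_end); /* -> 1TB window */
	int page_shift = __builtin_ctzll(hugepage_sz);    /* -> 24 */

	printf("window_size=0x%" PRIx64 " page_shift=%d\n",
	       window_size, page_shift);
	return 0;
}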


* Re: [dpdk-dev] [PATCH v3] vfio: fix sPAPR IOMMU DMA window size
  2017-08-08  8:41 [dpdk-dev] [PATCH v3] vfio: fix sPAPR IOMMU DMA window size Jonas Pfefferle
@ 2017-08-08  9:15 ` Burakov, Anatoly
  2017-08-08  9:29   ` Jonas Pfefferle1
  0 siblings, 1 reply; 5+ messages in thread
From: Burakov, Anatoly @ 2017-08-08  9:15 UTC (permalink / raw)
  To: Jonas Pfefferle; +Cc: dev, aik

> From: Jonas Pfefferle [mailto:jpf@zurich.ibm.com]
> Sent: Tuesday, August 8, 2017 9:41 AM
> To: Burakov, Anatoly <anatoly.burakov@intel.com>
> Cc: dev@dpdk.org; aik@ozlabs.ru; Jonas Pfefferle <jpf@zurich.ibm.com>
> Subject: [PATCH v3] vfio: fix sPAPR IOMMU DMA window size
> 
> DMA window size needs to be big enough to span all memory segment's
> physical addresses. We do not need multiple levels of IOMMU tables
> as we already span ~70TB of physical memory with 16MB hugepages.
> 
> Signed-off-by: Jonas Pfefferle <jpf@zurich.ibm.com>
> ---
> v2:
> * roundup to next power 2 function without loop.
> 
> v3:
> * Replace roundup_next_pow2 with rte_align64pow2
> 
>  lib/librte_eal/linuxapp/eal/eal_vfio.c | 13 ++++++++++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c
> b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> index 946df7e..550c41c 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> @@ -759,10 +759,12 @@ vfio_spapr_dma_map(int vfio_container_fd)
>  		return -1;
>  	}
> 
> -	/* calculate window size based on number of hugepages configured
> */
> -	create.window_size = rte_eal_get_physmem_size();
> +	/* physicaly pages are sorted descending i.e. ms[0].phys_addr is max
> */

Do we always expect that to be the case in the future? Maybe it would be safer to walk the memsegs list.

Thanks,
Anatoly

> +	/* create DMA window from 0 to max(phys_addr + len) */
> +	/* sPAPR requires window size to be a power of 2 */
> +	create.window_size = rte_align64pow2(ms[0].phys_addr +
> ms[0].len);
>  	create.page_shift = __builtin_ctzll(ms->hugepage_sz);
> -	create.levels = 2;
> +	create.levels = 1;
> 
>  	ret = ioctl(vfio_container_fd, VFIO_IOMMU_SPAPR_TCE_CREATE,
> &create);
>  	if (ret) {
> @@ -771,6 +773,11 @@ vfio_spapr_dma_map(int vfio_container_fd)
>  		return -1;
>  	}
> 
> +	if (create.start_addr != 0) {
> +		RTE_LOG(ERR, EAL, "  DMA window start address != 0\n");
> +		return -1;
> +	}
> +
>  	/* map all DPDK segments for DMA. use 1:1 PA to IOVA mapping */
>  	for (i = 0; i < RTE_MAX_MEMSEG; i++) {
>  		struct vfio_iommu_type1_dma_map dma_map;
> --
> 2.7.4


* Re: [dpdk-dev] [PATCH v3] vfio: fix sPAPR IOMMU DMA window size
  2017-08-08  9:15 ` Burakov, Anatoly
@ 2017-08-08  9:29   ` Jonas Pfefferle1
  2017-08-08  9:43     ` Burakov, Anatoly
  0 siblings, 1 reply; 5+ messages in thread
From: Jonas Pfefferle1 @ 2017-08-08  9:29 UTC (permalink / raw)
  To: Burakov, Anatoly; +Cc: aik, dev

"Burakov, Anatoly" <anatoly.burakov@intel.com> wrote on 08/08/2017 11:15:24
AM:

> From: "Burakov, Anatoly" <anatoly.burakov@intel.com>
> To: Jonas Pfefferle <jpf@zurich.ibm.com>
> Cc: "dev@dpdk.org" <dev@dpdk.org>, "aik@ozlabs.ru" <aik@ozlabs.ru>
> Date: 08/08/2017 11:18 AM
> Subject: RE: [PATCH v3] vfio: fix sPAPR IOMMU DMA window size
>
> From: Jonas Pfefferle [mailto:jpf@zurich.ibm.com]
> > Sent: Tuesday, August 8, 2017 9:41 AM
> > To: Burakov, Anatoly <anatoly.burakov@intel.com>
> > Cc: dev@dpdk.org; aik@ozlabs.ru; Jonas Pfefferle <jpf@zurich.ibm.com>
> > Subject: [PATCH v3] vfio: fix sPAPR IOMMU DMA window size
> >
> > DMA window size needs to be big enough to span all memory segment's
> > physical addresses. We do not need multiple levels of IOMMU tables
> > as we already span ~70TB of physical memory with 16MB hugepages.
> >
> > Signed-off-by: Jonas Pfefferle <jpf@zurich.ibm.com>
> > ---
> > v2:
> > * roundup to next power 2 function without loop.
> >
> > v3:
> > * Replace roundup_next_pow2 with rte_align64pow2
> >
> >  lib/librte_eal/linuxapp/eal/eal_vfio.c | 13 ++++++++++---
> >  1 file changed, 10 insertions(+), 3 deletions(-)
> >
> > diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c
> > b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> > index 946df7e..550c41c 100644
> > --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
> > +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> > @@ -759,10 +759,12 @@ vfio_spapr_dma_map(int vfio_container_fd)
> >        return -1;
> >     }
> >
> > -   /* calculate window size based on number of hugepages configured
> > */
> > -   create.window_size = rte_eal_get_physmem_size();
> > +   /* physicaly pages are sorted descending i.e. ms[0].phys_addr is
max
> > */
>
> Do we always expect that to be the case in the future? Maybe it
> would be safer to walk the memsegs list.
>
> Thanks,
> Anatoly

I had this loop in before but removed it in favor of simplicity.
If we believe that the ordering is going to change in the future,
I'm happy to bring back the loop. Is there other code relying on
the fact that the memsegs are sorted by their physical addresses?

>
> > +   /* create DMA window from 0 to max(phys_addr + len) */
> > +   /* sPAPR requires window size to be a power of 2 */
> > +   create.window_size = rte_align64pow2(ms[0].phys_addr +
> > ms[0].len);
> >     create.page_shift = __builtin_ctzll(ms->hugepage_sz);
> > -   create.levels = 2;
> > +   create.levels = 1;
> >
> >     ret = ioctl(vfio_container_fd, VFIO_IOMMU_SPAPR_TCE_CREATE,
> > &create);
> >     if (ret) {
> > @@ -771,6 +773,11 @@ vfio_spapr_dma_map(int vfio_container_fd)
> >        return -1;
> >     }
> >
> > +   if (create.start_addr != 0) {
> > +      RTE_LOG(ERR, EAL, "  DMA window start address != 0\n");
> > +      return -1;
> > +   }
> > +
> >     /* map all DPDK segments for DMA. use 1:1 PA to IOVA mapping */
> >     for (i = 0; i < RTE_MAX_MEMSEG; i++) {
> >        struct vfio_iommu_type1_dma_map dma_map;
> > --
> > 2.7.4
>


* Re: [dpdk-dev] [PATCH v3] vfio: fix sPAPR IOMMU DMA window size
  2017-08-08  9:29   ` Jonas Pfefferle1
@ 2017-08-08  9:43     ` Burakov, Anatoly
  2017-08-08 11:01       ` Jonas Pfefferle1
  0 siblings, 1 reply; 5+ messages in thread
From: Burakov, Anatoly @ 2017-08-08  9:43 UTC (permalink / raw)
  To: Jonas Pfefferle1; +Cc: aik, dev

> From: Jonas Pfefferle1 [mailto:JPF@zurich.ibm.com]
> Sent: Tuesday, August 8, 2017 10:30 AM
> To: Burakov, Anatoly <anatoly.burakov@intel.com>
> Cc: aik@ozlabs.ru; dev@dpdk.org
> Subject: RE: [PATCH v3] vfio: fix sPAPR IOMMU DMA window size
> 
> "Burakov, Anatoly" <anatoly.burakov@intel.com> wrote on 08/08/2017
> 11:15:24 AM:
> 
> > From: "Burakov, Anatoly" <anatoly.burakov@intel.com>
> > To: Jonas Pfefferle <jpf@zurich.ibm.com>
> > Cc: "dev@dpdk.org" <dev@dpdk.org>, "aik@ozlabs.ru" <aik@ozlabs.ru>
> > Date: 08/08/2017 11:18 AM
> > Subject: RE: [PATCH v3] vfio: fix sPAPR IOMMU DMA window size
> >
> > From: Jonas Pfefferle [mailto:jpf@zurich.ibm.com]
> > > Sent: Tuesday, August 8, 2017 9:41 AM
> > > To: Burakov, Anatoly <anatoly.burakov@intel.com>
> > > Cc: dev@dpdk.org; aik@ozlabs.ru; Jonas Pfefferle <jpf@zurich.ibm.com>
> > > Subject: [PATCH v3] vfio: fix sPAPR IOMMU DMA window size
> > >
> > > DMA window size needs to be big enough to span all memory segment's
> > > physical addresses. We do not need multiple levels of IOMMU tables
> > > as we already span ~70TB of physical memory with 16MB hugepages.
> > >
> > > Signed-off-by: Jonas Pfefferle <jpf@zurich.ibm.com>
> > > ---
> > > v2:
> > > * roundup to next power 2 function without loop.
> > >
> > > v3:
> > > * Replace roundup_next_pow2 with rte_align64pow2
> > >
> > >  lib/librte_eal/linuxapp/eal/eal_vfio.c | 13 ++++++++++---
> > >  1 file changed, 10 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c
> > > b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> > > index 946df7e..550c41c 100644
> > > --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
> > > +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> > > @@ -759,10 +759,12 @@ vfio_spapr_dma_map(int vfio_container_fd)
> > >        return -1;
> > >     }
> > >
> > > -   /* calculate window size based on number of hugepages configured
> > > */
> > > -   create.window_size = rte_eal_get_physmem_size();
> > > +   /* physicaly pages are sorted descending i.e. ms[0].phys_addr is max
> > > */
> >
> > Do we always expect that to be the case in the future? Maybe it
> > would be safer to walk the memsegs list.
> >
> > Thanks,
> > Anatoly
> 
> I had this loop in before but removed it in favor of simplicity.
> If we believe that the ordering is going to change in the future
> I'm happy to bring back the loop. Is there other code which is
> relying on the fact that the memsegs are sorted by their physical
> addresses?

I don't think there is. In any case, making assumptions about the particulars of memseg organization is not a very good practice.

I seem to recall us doing similar things in other places, so maybe down the line we could introduce a new API (or internal-only) function to get the memseg with the min/max address. For now, I think a loop will do.

> 
> >
> > > +   /* create DMA window from 0 to max(phys_addr + len) */
> > > +   /* sPAPR requires window size to be a power of 2 */
> > > +   create.window_size = rte_align64pow2(ms[0].phys_addr +
> > > ms[0].len);
> > >     create.page_shift = __builtin_ctzll(ms->hugepage_sz);
> > > -   create.levels = 2;
> > > +   create.levels = 1;
> > >
> > >     ret = ioctl(vfio_container_fd, VFIO_IOMMU_SPAPR_TCE_CREATE,
> > > &create);
> > >     if (ret) {
> > > @@ -771,6 +773,11 @@ vfio_spapr_dma_map(int vfio_container_fd)
> > >        return -1;
> > >     }
> > >
> > > +   if (create.start_addr != 0) {
> > > +      RTE_LOG(ERR, EAL, "  DMA window start address != 0\n");
> > > +      return -1;
> > > +   }
> > > +
> > >     /* map all DPDK segments for DMA. use 1:1 PA to IOVA mapping */
> > >     for (i = 0; i < RTE_MAX_MEMSEG; i++) {
> > >        struct vfio_iommu_type1_dma_map dma_map;
> > > --
> > > 2.7.4
> >
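
To illustrate that idea, a hypothetical sketch of such a helper (the name
memseg_max_phys_end() and its existence are invented here for illustration,
no such function exists in DPDK; it only reuses rte_eal_get_physmem_layout(),
which the VFIO code already calls):

#include <stdint.h>
#include <rte_config.h>
#include <rte_memory.h>

/* Hypothetical helper: highest physical end address over all memsegs. */
static uint64_t
memseg_max_phys_end(void)
{
	const struct rte_memseg *ms = rte_eal_get_physmem_layout();
	uint64_t max_end = 0;
	int i;

	for (i = 0; i < RTE_MAX_MEMSEG; i++) {
		if (ms[i].addr == NULL)
			break;
		if (ms[i].phys_addr + ms[i].len > max_end)
			max_end = ms[i].phys_addr + ms[i].len;
	}
	return max_end;
}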


* Re: [dpdk-dev] [PATCH v3] vfio: fix sPAPR IOMMU DMA window size
  2017-08-08  9:43     ` Burakov, Anatoly
@ 2017-08-08 11:01       ` Jonas Pfefferle1
  0 siblings, 0 replies; 5+ messages in thread
From: Jonas Pfefferle1 @ 2017-08-08 11:01 UTC (permalink / raw)
  To: Burakov, Anatoly; +Cc: aik, dev


"Burakov, Anatoly" <anatoly.burakov@intel.com> wrote on 08/08/2017 11:43:43
AM:

> From: "Burakov, Anatoly" <anatoly.burakov@intel.com>
> To: Jonas Pfefferle1 <JPF@zurich.ibm.com>
> Cc: "aik@ozlabs.ru" <aik@ozlabs.ru>, "dev@dpdk.org" <dev@dpdk.org>
> Date: 08/08/2017 11:43 AM
> Subject: RE: [PATCH v3] vfio: fix sPAPR IOMMU DMA window size
>
> > From: Jonas Pfefferle1 [mailto:JPF@zurich.ibm.com]
> > Sent: Tuesday, August 8, 2017 10:30 AM
> > To: Burakov, Anatoly <anatoly.burakov@intel.com>
> > Cc: aik@ozlabs.ru; dev@dpdk.org
> > Subject: RE: [PATCH v3] vfio: fix sPAPR IOMMU DMA window size
> >
> > "Burakov, Anatoly" <anatoly.burakov@intel.com> wrote on 08/08/2017
> > 11:15:24 AM:
> >
> > > From: "Burakov, Anatoly" <anatoly.burakov@intel.com>
> > > To: Jonas Pfefferle <jpf@zurich.ibm.com>
> > > Cc: "dev@dpdk.org" <dev@dpdk.org>, "aik@ozlabs.ru" <aik@ozlabs.ru>
> > > Date: 08/08/2017 11:18 AM
> > > Subject: RE: [PATCH v3] vfio: fix sPAPR IOMMU DMA window size
> > >
> > > From: Jonas Pfefferle [mailto:jpf@zurich.ibm.com]
> > > > Sent: Tuesday, August 8, 2017 9:41 AM
> > > > To: Burakov, Anatoly <anatoly.burakov@intel.com>
> > > > Cc: dev@dpdk.org; aik@ozlabs.ru; Jonas Pfefferle
<jpf@zurich.ibm.com>
> > > > Subject: [PATCH v3] vfio: fix sPAPR IOMMU DMA window size
> > > >
> > > > DMA window size needs to be big enough to span all memory segment's
> > > > physical addresses. We do not need multiple levels of IOMMU tables
> > > > as we already span ~70TB of physical memory with 16MB hugepages.
> > > >
> > > > Signed-off-by: Jonas Pfefferle <jpf@zurich.ibm.com>
> > > > ---
> > > > v2:
> > > > * roundup to next power 2 function without loop.
> > > >
> > > > v3:
> > > > * Replace roundup_next_pow2 with rte_align64pow2
> > > >
> > > >  lib/librte_eal/linuxapp/eal/eal_vfio.c | 13 ++++++++++---
> > > >  1 file changed, 10 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c
> > > > b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> > > > index 946df7e..550c41c 100644
> > > > --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
> > > > +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> > > > @@ -759,10 +759,12 @@ vfio_spapr_dma_map(int vfio_container_fd)
> > > >        return -1;
> > > >     }
> > > >
> > > > -   /* calculate window size based on number of hugepages
configured
> > > > */
> > > > -   create.window_size = rte_eal_get_physmem_size();
> > > > +   /* physicaly pages are sorted descending i.e. ms[0].phys_addr
is max
> > > > */
> > >
> > > Do we always expect that to be the case in the future? Maybe it
> > > would be safer to walk the memsegs list.
> > >
> > > Thanks,
> > > Anatoly
> >
> > I had this loop in before but removed it in favor of simplicity.
> > If we believe that the ordering is going to change in the future
> > I'm happy to bring back the loop. Is there other code which is
> > relying on the fact that the memsegs are sorted by their physical
> > addresses?
>
> I don't think there is. In any case, I think making assumptions
> about particulars of memseg organization is not a very good practice.
>
> I seem to recall us doing similar things in other places, so maybe
> down the line we could introduce a new API (or internal-only)
> function to get a memseg with min/max address. For now I think a
> loop will do.

Ok. Makes sense to me. Let me resubmit a new version with the loop.
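
For illustration, a rough sketch of how the sizing could look with the loop
(a guess at the shape of the change, not the actual v4 patch; it reuses the
ms array and the counter i already present in vfio_spapr_dma_map()):

	uint64_t max_phys_end = 0;

	for (i = 0; i < RTE_MAX_MEMSEG; i++) {
		if (ms[i].addr == NULL)
			break;
		if (ms[i].phys_addr + ms[i].len > max_phys_end)
			max_phys_end = ms[i].phys_addr + ms[i].len;
	}

	/* sPAPR needs a power-of-2 window covering all physical addresses */
	create.window_size = rte_align64pow2(max_phys_end);
	create.page_shift = __builtin_ctzll(ms->hugepage_sz);
	create.levels = 1;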

>
> >
> > >
> > > > +   /* create DMA window from 0 to max(phys_addr + len) */
> > > > +   /* sPAPR requires window size to be a power of 2 */
> > > > +   create.window_size = rte_align64pow2(ms[0].phys_addr +
> > > > ms[0].len);
> > > >     create.page_shift = __builtin_ctzll(ms->hugepage_sz);
> > > > -   create.levels = 2;
> > > > +   create.levels = 1;
> > > >
> > > >     ret = ioctl(vfio_container_fd, VFIO_IOMMU_SPAPR_TCE_CREATE,
> > > > &create);
> > > >     if (ret) {
> > > > @@ -771,6 +773,11 @@ vfio_spapr_dma_map(int vfio_container_fd)
> > > >        return -1;
> > > >     }
> > > >
> > > > +   if (create.start_addr != 0) {
> > > > +      RTE_LOG(ERR, EAL, "  DMA window start address != 0\n");
> > > > +      return -1;
> > > > +   }
> > > > +
> > > >     /* map all DPDK segments for DMA. use 1:1 PA to IOVA mapping */
> > > >     for (i = 0; i < RTE_MAX_MEMSEG; i++) {
> > > >        struct vfio_iommu_type1_dma_map dma_map;
> > > > --
> > > > 2.7.4
> > >
>


Thread overview: 5+ messages
2017-08-08  8:41 [dpdk-dev] [PATCH v3] vfio: fix sPAPR IOMMU DMA window size Jonas Pfefferle
2017-08-08  9:15 ` Burakov, Anatoly
2017-08-08  9:29   ` Jonas Pfefferle1
2017-08-08  9:43     ` Burakov, Anatoly
2017-08-08 11:01       ` Jonas Pfefferle1
