From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Burakov, Anatoly"
To: Jonas Pfefferle1
Cc: "aik@ozlabs.ru", "dev@dpdk.org"
Date: Tue, 8 Aug 2017 09:43:43 +0000
Subject: Re: [dpdk-dev] [PATCH v3] vfio: fix sPAPR IOMMU DMA window size
Thread-Topic: [PATCH v3] vfio: fix sPAPR IOMMU DMA window size
References: <1502181667-17949-1-git-send-email-jpf@zurich.ibm.com>
List-Id: DPDK patches and discussions
> From: Jonas Pfefferle1 [mailto:JPF@zurich.ibm.com]
> Sent: Tuesday, August 8, 2017 10:30 AM
> To: Burakov, Anatoly
> Cc: aik@ozlabs.ru; dev@dpdk.org
> Subject: RE: [PATCH v3] vfio: fix sPAPR IOMMU DMA window size
>
> "Burakov, Anatoly" wrote on 08/08/2017 11:15:24 AM:
>
> > From: "Burakov, Anatoly"
> > To: Jonas Pfefferle
> > Cc: "dev@dpdk.org", "aik@ozlabs.ru"
> > Date: 08/08/2017 11:18 AM
> > Subject: RE: [PATCH v3] vfio: fix sPAPR IOMMU DMA window size
> >
> > > From: Jonas Pfefferle [mailto:jpf@zurich.ibm.com]
> > > Sent: Tuesday, August 8, 2017 9:41 AM
> > > To: Burakov, Anatoly
> > > Cc: dev@dpdk.org; aik@ozlabs.ru; Jonas Pfefferle
> > > Subject: [PATCH v3] vfio: fix sPAPR IOMMU DMA window size
> > >
> > > The DMA window size needs to be big enough to span all memory
> > > segments' physical addresses. We do not need multiple levels of
> > > IOMMU tables, as we already span ~70TB of physical memory with
> > > 16MB hugepages.
> > >
> > > Signed-off-by: Jonas Pfefferle
> > > ---
> > > v2:
> > > * round up to next power of 2 without a loop.
> > >
> > > v3:
> > > * Replace roundup_next_pow2 with rte_align64pow2
> > >
> > >  lib/librte_eal/linuxapp/eal/eal_vfio.c | 13 ++++++++++---
> > >  1 file changed, 10 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c
> > > b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> > > index 946df7e..550c41c 100644
> > > --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
> > > +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> > > @@ -759,10 +759,12 @@ vfio_spapr_dma_map(int vfio_container_fd)
> > >         return -1;
> > >     }
> > >
> > > -   /* calculate window size based on number of hugepages configured */
> > > -   create.window_size = rte_eal_get_physmem_size();
> > > +   /* physical pages are sorted descending, i.e.
> > > ms[0].phys_addr is max */
> >
> > Do we always expect that to be the case in the future? Maybe it
> > would be safer to walk the memsegs list.
> >
> > Thanks,
> > Anatoly
>
> I had this loop in before but removed it in favor of simplicity.
> If we believe that the ordering is going to change in the future
> I'm happy to bring back the loop. Is there other code which is
> relying on the fact that the memsegs are sorted by their physical
> addresses?

I don't think there is. In any case, I think making assumptions about particulars of memseg organization is not a very good practice.

I seem to recall us doing similar things in other places, so maybe down the line we could introduce a new API (or internal-only) function to get a memseg with min/max address. For now I think a loop will do.

>
> >
> > > +   /* create DMA window from 0 to max(phys_addr + len) */
> > > +   /* sPAPR requires window size to be a power of 2 */
> > > +   create.window_size = rte_align64pow2(ms[0].phys_addr +
> > > ms[0].len);
> > >     create.page_shift = __builtin_ctzll(ms->hugepage_sz);
> > > -   create.levels = 2;
> > > +   create.levels = 1;
> > >
> > >     ret = ioctl(vfio_container_fd, VFIO_IOMMU_SPAPR_TCE_CREATE,
> > > &create);
> > >     if (ret) {
> > > @@ -771,6 +773,11 @@ vfio_spapr_dma_map(int vfio_container_fd)
> > >         return -1;
> > >     }
> > >
> > > +   if (create.start_addr != 0) {
> > > +      RTE_LOG(ERR, EAL, "  DMA window start address != 0\n");
> > > +      return -1;
> > > +   }
> > > +
> > >     /* map all DPDK segments for DMA. use 1:1 PA to IOVA mapping */
> > >     for (i = 0; i < RTE_MAX_MEMSEG; i++) {
> > >        struct vfio_iommu_type1_dma_map dma_map;
> > > --
> > > 2.7.4
> >
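For reference, the loop suggested above (walking all memsegs for the highest end address instead of assuming ms[0] is the topmost segment) could look roughly like the sketch below. This is a standalone illustration, not the actual DPDK code: `struct memseg` is a simplified stand-in for `struct rte_memseg`, and `align64pow2` mirrors what `rte_align64pow2` does (round up to the next power of two by smearing the high bit, no loop).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for DPDK's struct rte_memseg. */
struct memseg {
    uint64_t phys_addr;
    uint64_t len;
};

/* Round v up to the next power of two, as rte_align64pow2 does:
 * subtract 1, propagate the highest set bit into all lower bits,
 * then add 1. Already-aligned values are returned unchanged. */
static uint64_t
align64pow2(uint64_t v)
{
    v--;
    v |= v >> 1;
    v |= v >> 2;
    v |= v >> 4;
    v |= v >> 8;
    v |= v >> 16;
    v |= v >> 32;
    return v + 1;
}

/* Walk every segment rather than assuming ms[0] has the highest
 * physical address; the sPAPR DMA window must cover the largest
 * (phys_addr + len) and be a power of two in size. */
static uint64_t
spapr_window_size(const struct memseg *ms, size_t n)
{
    uint64_t max_end = 0;

    for (size_t i = 0; i < n; i++) {
        uint64_t end = ms[i].phys_addr + ms[i].len;

        if (ms[i].len != 0 && end > max_end)
            max_end = end;
    }
    return align64pow2(max_end);
}
```

This stays correct regardless of how the memseg list happens to be ordered, which is the point of the discussion above.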