From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from dpdk.org (dpdk.org [92.243.14.124]) by dpdk.space (Postfix) with ESMTP id 13BD8A00E6 for ; Thu, 13 Jun 2019 04:30:56 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 9C6441CDF7; Thu, 13 Jun 2019 04:30:55 +0200 (CEST) Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by dpdk.org (Postfix) with ESMTP id 25FC21CDF4 for ; Thu, 13 Jun 2019 04:30:52 +0200 (CEST) Received: from pps.filterd (m0098396.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.27/8.16.0.27) with SMTP id x5D2RFN0037357 for ; Wed, 12 Jun 2019 22:30:52 -0400 Received: from smtp.notes.na.collabserv.com (smtp.notes.na.collabserv.com [192.155.248.81]) by mx0a-001b2d01.pphosted.com with ESMTP id 2t3dejgjdw-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT) for ; Wed, 12 Jun 2019 22:30:52 -0400 Received: from localhost by smtp.notes.na.collabserv.com with smtp.notes.na.collabserv.com ESMTP for from ; Thu, 13 Jun 2019 02:30:51 -0000 Received: from us1a3-smtp02.a3.dal06.isc4sb.com (10.106.154.159) by smtp.notes.na.collabserv.com (10.106.227.88) with smtp.notes.na.collabserv.com ESMTP; Thu, 13 Jun 2019 02:30:48 -0000 Received: from us1a3-mail157.a3.dal06.isc4sb.com ([10.146.71.179]) by us1a3-smtp02.a3.dal06.isc4sb.com with ESMTP id 2019061302304839-1256584 ; Thu, 13 Jun 2019 02:30:48 +0000 In-Reply-To: <20190613022239.6946-1-tyos@jp.ibm.com> From: "Takeshi T Yoshimura" To: dev@dpdk.org Cc: "David Christensen" , "Pradeep Satyanarayana" , aconole@redhat.com Date: Thu, 13 Jun 2019 02:30:47 +0000 MIME-Version: 1.0 Sensitivity: Importance: Normal X-Priority: 3 (Normal) References: <20190613022239.6946-1-tyos@jp.ibm.com>, <20190612063313.2729-1-tyos@jp.ibm.com> X-Mailer: IBM iNotes ($HaikuForm 1054) | IBM Domino Build SCN1812108_20180501T0841_FP55 May 22, 2019 at 11:09 X-LLNOutbound: False X-Disclaimed: 21555 X-TNEFEvaluated: 1 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=UTF-8 x-cbid: 19061302-7093-0000-0000-00000BACC092 X-IBM-SpamModules-Scores: BY=0; FL=0; FP=0; FZ=0; HX=0; KW=0; PH=0; SC=0.425523; ST=0; TS=0; UL=0; ISC=; MB=0.025824 X-IBM-SpamModules-Versions: BY=3.00011253; HX=3.00000242; KW=3.00000007; PH=3.00000004; SC=3.00000286; SDB=6.01217170; UDB=6.00640037; IPR=6.00998282; BA=6.00006334; NDR=6.00000001; ZLA=6.00000005; ZF=6.00000009; ZB=6.00000000; ZP=6.00000000; ZH=6.00000000; ZU=6.00000002; MB=3.00027289; XFM=3.00000015; UTC=2019-06-13 02:30:50 X-IBM-AV-DETECTION: SAVI=unsuspicious REMOTE=unsuspicious XFE=unused X-IBM-AV-VERSION: SAVI=2019-06-12 20:27:53 - 6.00010041 x-cbparentid: 19061302-7094-0000-0000-000072D2DD41 Message-Id: X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:, , definitions=2019-06-13_01:, , signatures=0 X-Proofpoint-Spam-Reason: safe Subject: Re: [dpdk-dev] [PATCH] vfio: fix expanding DMA area in ppc64le X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" >=E5=AE=9B=E5=85=88: dev@dpdk.org >=E9=80=81=E4=BF=A1=E5=85=83: Takeshi Yoshimura >=E6=97=A5=E4=BB=98: 2019/06/13 11:22AM >Cc: drc@ibm.com, pradeep@us.ibm.com, Takeshi Yoshimura > >=E4=BB=B6=E5=90=8D: [EXTERNAL] [PATCH] vfio: fix expanding DMA area in ppc= 64le > >In ppc64le, expanding DMA areas always fail because we cannot remove >a DMA window. As a result, we cannot allocate more than one memseg in >ppc64le. This is because vfio=5Fspapr=5Fdma=5Fmem=5Fmap() doesn't unmap all >the mapped DMA before removing the window. This patch fixes this >incorrect behavior. > >I added a global variable to track current window size since we do >not have better ways to get exact size of it than doing so. sPAPR >IOMMU seems not to provide any ways to get window size with ioctl >interfaces. rte=5Fmemseg=5Fwalk*() is currently used to calculate window >size, but it walks memsegs that are marked as used, not mapped. So, >we need to determine if a given memseg is mapped or not, otherwise >the ioctl reports errors due to attempting to unregister memory >addresses that are not registered. The global variable is excluded >in non-ppc64le binaries. > >Similar problems happen in user maps. We need to avoid attempting to >unmap the address that is given as the function's parameter. The >compaction of user maps prevents us from passing correct length for >unmapping DMA at the window recreation. So, I removed it in ppc64le. > >I also fixed the order of ioctl for unregister and unmap. The ioctl >for unregister sometimes report device busy errors due to the >existence of mapped area. > >Signed-off-by: Takeshi Yoshimura >--- > lib/librte=5Feal/linux/eal/eal=5Fvfio.c | 154 >+++++++++++++++++++--------- > 1 file changed, 103 insertions(+), 51 deletions(-) > >diff --git a/lib/librte=5Feal/linux/eal/eal=5Fvfio.c >b/lib/librte=5Feal/linux/eal/eal=5Fvfio.c >index f16c5c3c0..c1b275b56 100644 >--- a/lib/librte=5Feal/linux/eal/eal=5Fvfio.c >+++ b/lib/librte=5Feal/linux/eal/eal=5Fvfio.c >@@ -93,6 +93,7 @@ is=5Fnull=5Fmap(const struct user=5Fmem=5Fmap *map) > return map->addr =3D=3D 0 && map->iova =3D=3D 0 && map->len =3D=3D 0; > } >=20 >+#ifndef RTE=5FARCH=5FPPC=5F64 > /* we may need to merge user mem maps together in case of user >mapping/unmapping > * chunks of memory, so we'll need a comparator function to sort >segments. > */ >@@ -126,6 +127,7 @@ user=5Fmem=5Fmap=5Fcmp(const void *a, const void *b) >=20 > return 0; > } >+#endif >=20 > /* adjust user map entry. this may result in shortening of existing >map, or in > * splitting existing map in two pieces. >@@ -162,6 +164,7 @@ adjust=5Fmap(struct user=5Fmem=5Fmap *src, struct >user=5Fmem=5Fmap *end, > } > } >=20 >+#ifndef RTE=5FARCH=5FPPC=5F64 > /* try merging two maps into one, return 1 if succeeded */ > static int > merge=5Fmap(struct user=5Fmem=5Fmap *left, struct user=5Fmem=5Fmap *right) >@@ -177,6 +180,7 @@ merge=5Fmap(struct user=5Fmem=5Fmap *left, struct >user=5Fmem=5Fmap *right) >=20 > return 1; > } >+#endif >=20 > static struct user=5Fmem=5Fmap * > find=5Fuser=5Fmem=5Fmap(struct user=5Fmem=5Fmaps *user=5Fmem=5Fmaps, uint= 64=5Ft >addr, >@@ -211,6 +215,16 @@ find=5Fuser=5Fmem=5Fmap(struct user=5Fmem=5Fmaps >*user=5Fmem=5Fmaps, uint64=5Ft addr, > return NULL; > } >=20 >+#ifdef RTE=5FARCH=5FPPC=5F64 >+/* Recreation of DMA window requires unregistering DMA memory. >+ * Compaction confuses the logic and causes false reports in the >recreation. >+ * For now, we do not compact user maps in ppc64le. >+ */ >+static void >+compact=5Fuser=5Fmaps(=5F=5Frte=5Funused struct user=5Fmem=5Fmaps *user= =5Fmem=5Fmaps) >+{ >+} >+#else > /* this will sort all user maps, and merge/compact any adjacent maps >*/ > static void > compact=5Fuser=5Fmaps(struct user=5Fmem=5Fmaps *user=5Fmem=5Fmaps) >@@ -256,6 +270,7 @@ compact=5Fuser=5Fmaps(struct user=5Fmem=5Fmaps >*user=5Fmem=5Fmaps) > user=5Fmem=5Fmaps->n=5Fmaps =3D cur=5Fidx; > } > } >+#endif >=20 > static int > vfio=5Fopen=5Fgroup=5Ffd(int iommu=5Fgroup=5Fnum) >@@ -1306,6 +1321,7 @@ vfio=5Ftype1=5Fdma=5Fmap(int vfio=5Fcontainer=5Ffd) > return rte=5Fmemseg=5Fwalk(type1=5Fmap, &vfio=5Fcontainer=5Ffd); > } >=20 >+#ifdef RTE=5FARCH=5FPPC=5F64 > static int > vfio=5Fspapr=5Fdma=5Fdo=5Fmap(int vfio=5Fcontainer=5Ffd, uint64=5Ft vaddr, >uint64=5Ft iova, > uint64=5Ft len, int do=5Fmap) >@@ -1357,14 +1373,6 @@ vfio=5Fspapr=5Fdma=5Fdo=5Fmap(int vfio=5Fcontainer= =5Ffd, >uint64=5Ft vaddr, uint64=5Ft iova, > } >=20 > } else { >- ret =3D ioctl(vfio=5Fcontainer=5Ffd, >- VFIO=5FIOMMU=5FSPAPR=5FUNREGISTER=5FMEMORY, ®); >- if (ret) { >- RTE=5FLOG(ERR, EAL, " cannot unregister vaddr for IOMMU, error %i >(%s)\n", >- errno, strerror(errno)); >- return -1; >- } >- > memset(&dma=5Funmap, 0, sizeof(dma=5Funmap)); > dma=5Funmap.argsz =3D sizeof(struct vfio=5Fiommu=5Ftype1=5Fdma=5Funmap); > dma=5Funmap.size =3D len; >@@ -1377,24 +1385,50 @@ vfio=5Fspapr=5Fdma=5Fdo=5Fmap(int vfio=5Fcontainer= =5Ffd, >uint64=5Ft vaddr, uint64=5Ft iova, > errno, strerror(errno)); > return -1; > } >+ >+ ret =3D ioctl(vfio=5Fcontainer=5Ffd, >+ VFIO=5FIOMMU=5FSPAPR=5FUNREGISTER=5FMEMORY, ®); >+ if (ret) { >+ RTE=5FLOG(ERR, EAL, " cannot unregister vaddr for IOMMU, error %i >(%s)\n", >+ errno, strerror(errno)); >+ return -1; >+ } > } >=20 > return 0; > } >=20 >+struct spapr=5Fremap=5Fwalk=5Fparam { >+ int vfio=5Fcontainer=5Ffd; >+ uint64=5Ft window=5Fsize; >+}; >+ > static int > vfio=5Fspapr=5Fmap=5Fwalk(const struct rte=5Fmemseg=5Flist *msl, > const struct rte=5Fmemseg *ms, void *arg) > { >- int *vfio=5Fcontainer=5Ffd =3D arg; >+ struct spapr=5Fremap=5Fwalk=5Fparam *p =3D arg; >=20 >- if (msl->external) >+ if (msl->external || ms->iova >=3D p->window=5Fsize) > return 0; >=20 >- return vfio=5Fspapr=5Fdma=5Fdo=5Fmap(*vfio=5Fcontainer=5Ffd, ms->addr=5F= 64, >ms->iova, >+ return vfio=5Fspapr=5Fdma=5Fdo=5Fmap(p->vfio=5Fcontainer=5Ffd, ms->addr= =5F64, >ms->iova, > ms->len, 1); > } >=20 >+static int >+vfio=5Fspapr=5Funmap=5Fwalk(const struct rte=5Fmemseg=5Flist *msl, >+ const struct rte=5Fmemseg *ms, void *arg) >+{ >+ struct spapr=5Fremap=5Fwalk=5Fparam *p =3D arg; >+ >+ if (msl->external || ms->iova >=3D p->window=5Fsize) >+ return 0; >+ >+ return vfio=5Fspapr=5Fdma=5Fdo=5Fmap(p->vfio=5Fcontainer=5Ffd, ms->addr= =5F64, >ms->iova, >+ ms->len, 0); >+} >+ > struct spapr=5Fwalk=5Fparam { > uint64=5Ft window=5Fsize; > uint64=5Ft hugepage=5Fsz; >@@ -1481,14 +1515,13 @@ vfio=5Fspapr=5Fcreate=5Fnew=5Fdma=5Fwindow(int >vfio=5Fcontainer=5Ffd, > return 0; > } >=20 >+static struct vfio=5Fiommu=5Fspapr=5Ftce=5Fcreate prev=5Fcreate; >+ > static int > vfio=5Fspapr=5Fdma=5Fmem=5Fmap(int vfio=5Fcontainer=5Ffd, uint64=5Ft vadd= r, >uint64=5Ft iova, > uint64=5Ft len, int do=5Fmap) > { >- struct spapr=5Fwalk=5Fparam param; >- struct vfio=5Fiommu=5Fspapr=5Ftce=5Fcreate create =3D { >- .argsz =3D sizeof(create), >- }; >+ struct vfio=5Fiommu=5Fspapr=5Ftce=5Fcreate create; > struct vfio=5Fconfig *vfio=5Fcfg; > struct user=5Fmem=5Fmaps *user=5Fmem=5Fmaps; > int i, ret =3D 0; >@@ -1502,43 +1535,59 @@ vfio=5Fspapr=5Fdma=5Fmem=5Fmap(int vfio=5Fcontaine= r=5Ffd, >uint64=5Ft vaddr, uint64=5Ft iova, > user=5Fmem=5Fmaps =3D &vfio=5Fcfg->mem=5Fmaps; > rte=5Fspinlock=5Frecursive=5Flock(&user=5Fmem=5Fmaps->lock); >=20 >- /* check if window size needs to be adjusted */ >- memset(¶m, 0, sizeof(param)); >- >- /* we're inside a callback so use thread-unsafe version */ >- if (rte=5Fmemseg=5Fwalk=5Fthread=5Funsafe(vfio=5Fspapr=5Fwindow=5Fsize= =5Fwalk, >- ¶m) < 0) { >- RTE=5FLOG(ERR, EAL, "Could not get window size\n"); >- ret =3D -1; >- goto out; >- } >+ memcpy(&create, &prev=5Fcreate, sizeof(create)); >=20 > /* also check user maps */ > for (i =3D 0; i < user=5Fmem=5Fmaps->n=5Fmaps; i++) { >- uint64=5Ft max =3D user=5Fmem=5Fmaps->maps[i].iova + >- user=5Fmem=5Fmaps->maps[i].len; >- create.window=5Fsize =3D RTE=5FMAX(create.window=5Fsize, max); >+ struct user=5Fmem=5Fmap *map =3D &user=5Fmem=5Fmaps->maps[i]; >+ >+ if (vaddr =3D=3D map->addr && len =3D=3D map->len) >+ continue; >+ create.window=5Fsize =3D RTE=5FMAX(create.window=5Fsize, map->iova + >map->len); > } >=20 > /* sPAPR requires window size to be a power of 2 */ >- create.window=5Fsize =3D rte=5Falign64pow2(param.window=5Fsize); >- create.page=5Fshift =3D =5F=5Fbuiltin=5Fctzll(param.hugepage=5Fsz); >- create.levels =3D 1; >+ create.window=5Fsize =3D rte=5Falign64pow2(create.window=5Fsize); >=20 > if (do=5Fmap) { >- void *addr; > /* re-create window and remap the entire memory */ >- if (iova > create.window=5Fsize) { >+ if (iova + len > create.window=5Fsize) { >+ struct spapr=5Fremap=5Fwalk=5Fparam param =3D { >+ .vfio=5Fcontainer=5Ffd =3D vfio=5Fcontainer=5Ffd, >+ .window=5Fsize =3D create.window=5Fsize, >+ }; >+ >+ /* we're inside a callback, so use thread-unsafe version >+ */ >+ rte=5Fmemseg=5Fwalk=5Fthread=5Funsafe(vfio=5Fspapr=5Funmap=5Fwalk, >+ ¶m); >+ /* destruct all user maps */ >+ for (i =3D 0; i < user=5Fmem=5Fmaps->n=5Fmaps; i++) { >+ struct user=5Fmem=5Fmap *map =3D >+ &user=5Fmem=5Fmaps->maps[i]; >+ if (vaddr =3D=3D map->addr && len =3D=3D map->len) >+ continue; >+ if (vfio=5Fspapr=5Fdma=5Fdo=5Fmap(vfio=5Fcontainer=5Ffd, >+ map->addr, map->iova, map->len, >+ 0)) { >+ RTE=5FLOG(ERR, EAL, "Could not destruct user DMA maps\n"); >+ ret =3D -1; >+ goto out; >+ } >+ } >+ >+ create.window=5Fsize =3D rte=5Falign64pow2(iova + len); > if (vfio=5Fspapr=5Fcreate=5Fnew=5Fdma=5Fwindow(vfio=5Fcontainer=5Ffd, > &create) < 0) { > RTE=5FLOG(ERR, EAL, "Could not create new DMA window\n"); > ret =3D -1; > goto out; > } >+ memcpy(&prev=5Fcreate, &create, sizeof(create)); > /* we're inside a callback, so use thread-unsafe version > */ > if (rte=5Fmemseg=5Fwalk=5Fthread=5Funsafe(vfio=5Fspapr=5Fmap=5Fwalk, >- &vfio=5Fcontainer=5Ffd) < 0) { >+ ¶m) < 0) { > RTE=5FLOG(ERR, EAL, "Could not recreate DMA maps\n"); > ret =3D -1; > goto out; >@@ -1547,6 +1596,8 @@ vfio=5Fspapr=5Fdma=5Fmem=5Fmap(int vfio=5Fcontainer= =5Ffd, >uint64=5Ft vaddr, uint64=5Ft iova, > for (i =3D 0; i < user=5Fmem=5Fmaps->n=5Fmaps; i++) { > struct user=5Fmem=5Fmap *map =3D > &user=5Fmem=5Fmaps->maps[i]; >+ if (vaddr =3D=3D map->addr && len =3D=3D map->len) >+ continue; > if (vfio=5Fspapr=5Fdma=5Fdo=5Fmap(vfio=5Fcontainer=5Ffd, > map->addr, map->iova, map->len, > 1)) { >@@ -1556,23 +1607,8 @@ vfio=5Fspapr=5Fdma=5Fmem=5Fmap(int vfio=5Fcontainer= =5Ffd, >uint64=5Ft vaddr, uint64=5Ft iova, > } > } > } >- >- /* now that we've remapped all of the memory that was present >- * before, map the segment that we were requested to map. >- * >- * however, if we were called by the callback, the memory we >- * were called with was already in the memseg list, so previous >- * mapping should've mapped that segment already. >- * >- * virt2memseg=5Flist is a relatively cheap check, so use that. if >- * memory is within any memseg list, it's a memseg, so it's >- * already mapped. >- */ >- addr =3D (void *)(uintptr=5Ft)vaddr; >- if (rte=5Fmem=5Fvirt2memseg=5Flist(addr) =3D=3D NULL && >- vfio=5Fspapr=5Fdma=5Fdo=5Fmap(vfio=5Fcontainer=5Ffd, >- vaddr, iova, len, 1) < 0) { >- RTE=5FLOG(ERR, EAL, "Could not map segment\n"); >+ if (vfio=5Fspapr=5Fdma=5Fdo=5Fmap(vfio=5Fcontainer=5Ffd, vaddr, iova, l= en, 1)) >{ >+ RTE=5FLOG(ERR, EAL, "Failed to map DMA\n"); > ret =3D -1; > goto out; > } >@@ -1613,6 +1649,7 @@ vfio=5Fspapr=5Fdma=5Fmap(int vfio=5Fcontainer=5Ffd) > RTE=5FLOG(ERR, EAL, "Could not create new DMA window\n"); > return -1; > } >+ memcpy(&prev=5Fcreate, &create, sizeof(create)); >=20 > /* map all DPDK segments for DMA. use 1:1 PA to IOVA mapping */ > if (rte=5Fmemseg=5Fwalk(vfio=5Fspapr=5Fmap=5Fwalk, &vfio=5Fcontainer=5Ff= d) < 0) >@@ -1620,6 +1657,21 @@ vfio=5Fspapr=5Fdma=5Fmap(int vfio=5Fcontainer=5Ffd) >=20 > return 0; > } >+#else >+static int >+vfio=5Fspapr=5Fdma=5Fmem=5Fmap(int =5F=5Frte=5Funused vfio=5Fcontainer=5F= fd, >+ uint64=5Ft =5F=5Frte=5Funused vaddr, >+ uint64=5Ft =5F=5Frte=5Funused iova, uint64=5Ft =5F=5Frte=5Funused len, >+ int =5F=5Frte=5Funused do=5Fmap) >+{ >+ return 0; >+} >+static int >+vfio=5Fspapr=5Fdma=5Fmap(int =5F=5Frte=5Funused vfio=5Fcontainer=5Ffd) >+{ >+ return 0; >+} >+#endif >=20 > static int > vfio=5Fnoiommu=5Fdma=5Fmap(int =5F=5Frte=5Funused vfio=5Fcontainer=5Ffd) >--=20 >2.17.1 > > Added CC: aconole@redhat.com I updated the patch not to break builds in x86.