From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0a-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by dpdk.org (Postfix) with ESMTP id 8330C7CFD for ; Wed, 20 Sep 2017 16:38:53 +0200 (CEST) Received: from pps.filterd (m0098416.ppops.net [127.0.0.1]) by mx0b-001b2d01.pphosted.com (8.16.0.21/8.16.0.21) with SMTP id v8KEb6ZO051857 for ; Wed, 20 Sep 2017 10:38:52 -0400 Received: from smtp.notes.na.collabserv.com (smtp.notes.na.collabserv.com [192.155.248.93]) by mx0b-001b2d01.pphosted.com with ESMTP id 2d3t218y6g-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT) for ; Wed, 20 Sep 2017 10:38:52 -0400 Received: from localhost by smtp.notes.na.collabserv.com with smtp.notes.na.collabserv.com ESMTP for from ; Wed, 20 Sep 2017 14:38:52 -0000 Received: from us1a3-smtp03.a3.dal06.isc4sb.com (10.106.154.98) by smtp.notes.na.collabserv.com (10.106.227.39) with smtp.notes.na.collabserv.com ESMTP; Wed, 20 Sep 2017 14:38:50 -0000 Received: from us1a3-mail173.a3.dal06.isc4sb.com ([10.146.71.126]) by us1a3-smtp03.a3.dal06.isc4sb.com with ESMTP id 2017092014384879-621429 ; Wed, 20 Sep 2017 14:38:48 +0000 MIME-Version: 1.0 In-Reply-To: <8c14c677-6341-6b3c-aa20-36a9297b1f94@intel.com> To: "Burakov, Anatoly" Cc: dev@dpdk.org From: "Jonas Pfefferle1" Date: Wed, 20 Sep 2017 16:38:47 +0200 References: <1502969759-11036-1-git-send-email-jpf@zurich.ibm.com> <8c14c677-6341-6b3c-aa20-36a9297b1f94@intel.com> X-KeepSent: EB5631D7:4021F9FC-C12581A1:004D3661; type=4; name=$KeepSent X-Mailer: IBM Notes Release 9.0.1 October 14, 2013 X-LLNOutbound: False X-Disclaimed: 34099 X-TNEFEvaluated: 1 x-cbid: 17092014-1799-0000-0000-000002E0CF7D X-IBM-SpamModules-Scores: BY=0.034398; FL=0; FP=0; FZ=0; HX=0; KW=0; PH=0; SC=0.4332; ST=0; TS=0; UL=0; ISC=; MB=0.086162 X-IBM-SpamModules-Versions: BY=3.00007768; HX=3.00000241; KW=3.00000007; PH=3.00000004; SC=3.00000230; SDB=6.00919787; UDB=6.00462122; IPR=6.00700013; BA=6.00005599; NDR=6.00000001; ZLA=6.00000005; ZF=6.00000009; ZB=6.00000000; ZP=6.00000000; ZH=6.00000000; ZU=6.00000002; MB=3.00017221; XFM=3.00000015; UTC=2017-09-20 14:38:51 X-IBM-AV-DETECTION: SAVI=unsuspicious REMOTE=unsuspicious XFE=unused X-IBM-AV-VERSION: SAVI=2017-09-20 10:14:44 - 6.00007356 x-cbparentid: 17092014-1800-0000-0000-0000D5961920 Message-Id: X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:, , definitions=2017-09-20_04:, , signatures=0 X-Proofpoint-Spam-Reason: safe Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.15 Subject: Re: [dpdk-dev] [PATCH] vfio: refactor PCI BAR mapping X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 Sep 2017 14:38:53 -0000 Hi Anatoly, "Burakov, Anatoly" wrote on 09/19/2017 01:40:51 PM: > From: "Burakov, Anatoly" > To: Jonas Pfefferle , dev@dpdk.org > Date: 09/19/2017 01:41 PM > Subject: Re: [dpdk-dev] [PATCH] vfio: refactor PCI BAR mapping > > Hi Jonas, > > On 17-Aug-17 12:35 PM, Jonas Pfefferle wrote: > > Split pci=5Fvfio=5Fmap=5Fresource for primary and secondary processes. > > Save all relevant mapping data in primary process to allow > > the secondary process to perform mappings. > > > > Signed-off-by: Jonas Pfefferle > > --- > > lib/librte=5Feal/common/include/rte=5Fpci.h | 7 + > > lib/librte=5Feal/linuxapp/eal/eal=5Fpci=5Fvfio.c | 447 +++++++++++++++ > ++------------ > > 2 files changed, 271 insertions(+), 183 deletions(-) > > > > diff --git a/lib/librte=5Feal/common/include/rte=5Fpci.h b/lib/ > librte=5Feal/common/include/rte=5Fpci.h > > index 8b12339..0821af9 100644 > > --- a/lib/librte=5Feal/common/include/rte=5Fpci.h > > +++ b/lib/librte=5Feal/common/include/rte=5Fpci.h > > @@ -214,6 +214,12 @@ struct pci=5Fmap { > > uint64=5Ft phaddr; > > }; > > > > +struct pci=5Fmsix=5Ftable { > > + int bar=5Findex; > > + uint32=5Ft offset; > > + uint32=5Ft size; > > +}; > > + > > /** > > * A structure describing a mapped PCI resource. > > * For multi-process we need to reproduce all PCI mappings in secondary > > @@ -226,6 +232,7 @@ struct mapped=5Fpci=5Fresource { > > char path[PATH=5FMAX]; > > int nb=5Fmaps; > > struct pci=5Fmap maps[PCI=5FMAX=5FRESOURCE]; > > + struct pci=5Fmsix=5Ftable msix=5Ftable; > > }; > > > > /** mapped pci device list */ > > diff --git a/lib/librte=5Feal/linuxapp/eal/eal=5Fpci=5Fvfio.c b/lib/ > librte=5Feal/linuxapp/eal/eal=5Fpci=5Fvfio.c > > index aa9d96e..f37552a 100644 > > --- a/lib/librte=5Feal/linuxapp/eal/eal=5Fpci=5Fvfio.c > > +++ b/lib/librte=5Feal/linuxapp/eal/eal=5Fpci=5Fvfio.c > > @@ -88,8 +88,7 @@ pci=5Fvfio=5Fwrite=5Fconfig(const struct > rte=5Fintr=5Fhandle *intr=5Fhandle, > > > > /* get PCI BAR number where MSI-X interrupts are */ > > static int > > -pci=5Fvfio=5Fget=5Fmsix=5Fbar(int fd, int *msix=5Fbar, uint32=5Ft *msix=5Ftable=5Foffset, > > - uint32=5Ft *msix=5Ftable=5Fsize) > > +pci=5Fvfio=5Fget=5Fmsix=5Fbar(int fd, struct pci=5Fmsix=5Ftable *msix= =5Ftable) > > { > > int ret; > > uint32=5Ft reg; > > @@ -161,9 +160,10 @@ pci=5Fvfio=5Fget=5Fmsix=5Fbar(int fd, int *msix=5F= bar, > uint32=5Ft *msix=5Ftable=5Foffset, > > return -1; > > } > > > > - *msix=5Fbar =3D reg & RTE=5FPCI=5FMSIX=5FTABLE=5FBIR; > > - *msix=5Ftable=5Foffset =3D reg & RTE=5FPCI=5FMSIX=5FTABLE=5FO= FFSET; > > - *msix=5Ftable=5Fsize =3D 16 * (1 + (flags & RTE=5FPCI=5FMSIX=5FFLAGS=5FQSIZE)); > > + msix=5Ftable->bar=5Findex =3D reg & RTE=5FPCI=5FMSIX=5FTABLE= =5FBIR; > > + msix=5Ftable->offset =3D reg & RTE=5FPCI=5FMSIX=5FTABLE=5FOFF= SET; > > + msix=5Ftable->size =3D > > + 16 * (1 + (flags & RTE=5FPCI=5FMSIX=5FFLAGS=5FQSIZE)); > > > > return 0; > > } > > @@ -300,25 +300,152 @@ pci=5Fvfio=5Fsetup=5Finterrupts(struct > rte=5Fpci=5Fdevice *dev, int vfio=5Fdev=5Ffd) > > return -1; > > } > > > > -/* > > - * map the PCI resources of a PCI device in virtual memory (VFIO version). > > - * primary and secondary processes follow almost exactly the same path > > - */ > > -int > > -pci=5Fvfio=5Fmap=5Fresource(struct rte=5Fpci=5Fdevice *dev) > > +static int > > +pci=5Fvfio=5Fis=5Fioport=5Fbar(int vfio=5Fdev=5Ffd, int bar=5Findex) > > +{ > > + uint32=5Ft ioport=5Fbar; > > + int ret; > > + > > + ret =3D pread64(vfio=5Fdev=5Ffd, &ioport=5Fbar, sizeof(ioport=5Fbar= ), > > + VFIO=5FGET=5FREGION=5FADDR(VFIO=5FPCI=5FCONFIG=5FREGION=5FI= NDEX) > > + + PCI=5FBASE=5FADDRESS=5F0 + bar=5Findex*4); > > + if (ret !=3D sizeof(ioport=5Fbar)) { > > + RTE=5FLOG(ERR, EAL, "Cannot read command (%x) from config space!\n", > > + PCI=5FBASE=5FADDRESS=5F0 + bar=5Findex*4); > > + return -1; > > + } > > + > > + if (ioport=5Fbar & PCI=5FBASE=5FADDRESS=5FSPACE=5FIO) { > > + RTE=5FLOG(INFO, EAL, "Ignore mapping IO port bar(%d) addr: %x\n", > > + bar=5Findex, ioport=5Fbar); > > This log message should probably go to the "continue" portion of the > calling code, it looks out of place here. Agree. I will move it. > > > + return 1; > > + } > > + return 0; > > +} > > + > > +static int > > +pci=5Fvfio=5Fsetup=5Fdevice(struct rte=5Fpci=5Fdevice *dev, int vfio= =5Fdev=5Ffd) > > +{ > > + if (pci=5Fvfio=5Fsetup=5Finterrupts(dev, vfio=5Fdev=5Ffd) !=3D 0) { > > + RTE=5FLOG(ERR, EAL, "Error setting up interrupts!\n"); > > + return -1; > > + } > > + > > + /* set bus mastering for the device */ > > + if (pci=5Fvfio=5Fset=5Fbus=5Fmaster(vfio=5Fdev=5Ffd, true)) { > > + RTE=5FLOG(ERR, EAL, "Cannot set up bus mastering!\n"); > > + return -1; > > + } > > + > > + /* Reset the device */ > > + ioctl(vfio=5Fdev=5Ffd, VFIO=5FDEVICE=5FRESET); > > + > > + return 0; > > +} > > + > > +static int > > +pci=5Fvfio=5Fmmap=5Fbar(int vfio=5Fdev=5Ffd, struct mapped=5Fpci=5Fres= ource *vfio=5Fres, > > + int bar=5Findex, int additional=5Fflags) > > +{ > > + struct memreg { > > + unsigned long offset, size; > > + } memreg[2] =3D {}; > > + void *bar=5Faddr; > > + struct pci=5Fmsix=5Ftable *msix=5Ftable =3D &vfio=5Fres->msix=5Ftab= le; > > + struct pci=5Fmap *bar =3D &vfio=5Fres->maps[bar=5Findex]; > > + > > + if (bar->size =3D=3D 0) > > + /* Skip this BAR */ > > + return 0; > > + > > + if (msix=5Ftable->bar=5Findex =3D=3D bar=5Findex) { > > + /* > > + * VFIO will not let us map the MSI-X table, > > + * but we can map around it. > > + */ > > + uint32=5Ft table=5Fstart =3D msix=5Ftable->offset; > > + uint32=5Ft table=5Fend =3D table=5Fstart + msix=5Ftable->size; > > + table=5Fend =3D (table=5Fend + ~PAGE=5FMASK) & PAGE=5FMASK; > > + table=5Fstart &=3D PAGE=5FMASK; > > + > > + if (table=5Fstart =3D=3D 0 && table=5Fend >=3D bar->size) { > > + /* Cannot map this BAR */ > > + RTE=5FLOG(DEBUG, EAL, "Skipping BAR%d\n", bar=5Findex); > > + bar->size =3D 0; > > + bar->addr =3D 0; > > + return 0; > > + } > > + > > + memreg[0].offset =3D bar->offset; > > + memreg[0].size =3D table=5Fstart; > > + memreg[1].offset =3D bar->offset + table=5Fend; > > + memreg[1].size =3D bar->size - table=5Fend; > > + > > + RTE=5FLOG(DEBUG, EAL, > > + "Trying to map BAR%d that contains the MSI-X " > > + "table. Trying offsets: " > > + "0x%04lx:0x%04lx, 0x%04lx:0x%04lx\n", bar=5Findex, > > + memreg[0].offset, memreg[0].size, > > + memreg[1].offset, memreg[1].size); > > + } > > I believe you forgot the "else" part. memreg is, by default, initialized > to zeroes, and if bar=5Findex is not equal to MSI-X bar index, memreg does > not get filled with any values, and therefore all of the following > checks for memreg.size etc. will return false and you'll end up with > failed BAR mappings. > > Confirmed with testing: > > EAL: PCI device 0000:08:00.0 on NUMA socket 0 > EAL: probe driver: 8086:10fb net=5Fixgbe > EAL: using IOMMU type 1 (Type 1) > EAL: Failed to map pci BAR0 > EAL: 0000:08:00.0 mapping BAR0 failed: Success > EAL: Requested device 0000:08:00.0 cannot be used You are correct. Not sure how this happened...I remember testing this successfully on x86 and POWER. I will create a new version with the fixes. The whole reason I started this refactoring effort is to allow (in a later patch) to mmap the MSI-X table if the kernel allows it (e.g. via this patch https://lkml.org/lkml/2017/8/7/98). The problem on POWER is that the default page size is 64K, i.e. you will not be able to map around the MSI-X table which makes most of the NVMe devices (at least to my knowledge) unusable with SPDK on POWER. > > -- > Thanks, > Anatoly > Thanks, Jonas