On Fri, Jul 10, 2020 at 6:08 AM Thomas Monjalon <thomas@monjalon.net> wrote:
10/07/2020 11:54, David Marchand:
> On Wed, Jul 8, 2020 at 11:26 AM <alvinx.zhang@intel.com> wrote:
> > From: Alvin Zhang <alvinx.zhang@intel.com>
> >
> > When mapping a PCI BAR containing an MSI-X table, some devices do not
> > need to actually map this BAR or only need to map part of them, which
> > may cause the mapping to fail. Now some checks are added and a non-NULL
> > initial value is set to the variable to avoid this situation.

Note: this regression would not have happened if we had some CI tests
for simple device probing.
Please let's invest more in CI.

Are you referring to adding tests to specifically check these conditions, or would this have been caught just from the continued expansion of testing on real hardware / NICs, or both? It seems like the issue is caused by a combination of hardware behaviors and "broken code".  My point is, without having some of those behaviors in the CI, we might still not have caught this issue, even with probing checks.  Of course, more checks are always a good thing.  

> > Fixes: 2fd3567e5425 ("pci: use OS generic memory mapping functions")
> > Cc: talshn@mellanox.com

No he was not Cc in the thread. Same for Anatoly.
Adding more people in Cc...

> > Signed-off-by: Alvin Zhang <alvinx.zhang@intel.com>
> > ---
> > --- a/drivers/bus/pci/linux/pci_vfio.c
> > +++ b/drivers/bus/pci/linux/pci_vfio.c
> > @@ -547,6 +547,14 @@
> >                         bar_index,
> >                         memreg[0].offset, memreg[0].size,
> >                         memreg[1].offset, memreg[1].size);
> > +
> > +               if (memreg[0].size == 0 && memreg[1].size == 0) {
> > +                       /* No need to map this BAR */
> > +                       RTE_LOG(DEBUG, EAL, "Skipping BAR%d\n", bar_index);
> > +                       bar->size = 0;
> > +                       bar->addr = 0;
> > +                       return 0;
> > +               }
>
> We already have a check on bar size == 0.
> Why would we have this condition?
> Broken hw?
>
>
> >         } else {
> >                 memreg[0].offset = bar->offset;
> >                 memreg[0].size = bar->size;
> > @@ -556,7 +564,9 @@
> >         bar_addr = mmap(bar->addr, bar->size, 0, MAP_PRIVATE |
> >                         MAP_ANONYMOUS | additional_flags, -1, 0);
> >         if (bar_addr != MAP_FAILED) {
> > -               void *map_addr = NULL;
> > +               /* Set non NULL initial value for in case of no PCI mapping */
> > +               void *map_addr = bar_addr;
> > +
>
> It took me some time to understand this code...
> Anyway, we have a regression in the librte_pci.
> This is where the fix should be.

Yes, I am going to send a fix.

> We can cleanup this code later.

Yes please, this function isn't understandable and lack of comments.
Anatoly please?




--
Lincoln Lavoie
Senior Engineer, Broadband Technologies
21 Madbury Rd., Ste. 100, Durham, NH 03824
+1-603-674-2755 (m)