From: Tiwei Bie <tiwei.bie@intel.com>
To: "Wiles, Keith" <keith.wiles@intel.com>
Cc: dpdk-dev <dev@dpdk.org>,
"Liang, Cunming" <cunming.liang@intel.com>,
"Richardson, Bruce" <bruce.richardson@intel.com>,
"alejandro.lucero@netronome.com" <alejandro.lucero@netronome.com>
Subject: Re: [dpdk-dev] [RFC 3/3] bus/pci: add mdev support
Date: Thu, 4 Apr 2019 12:19:25 +0800 [thread overview]
Message-ID: <20190404041925.GA26931@dpdk-tbie.sh.intel.com> (raw)
Message-ID: <20190404041925.x1ucNUFFu2cGqpbbJLRK9Rip0RogMjvaKnRBl0MDUwg@z> (raw)
In-Reply-To: <055CA897-B4BA-446F-BE9B-620DAEAB02EC@intel.com>
On Wed, Apr 03, 2019 at 10:13:25PM +0800, Wiles, Keith wrote:
> Some minor nits.
>
> > On Apr 3, 2019, at 2:18 AM, Tiwei Bie <tiwei.bie@intel.com> wrote:
> >
> > This patch adds the mdev support in PCI bus driver. A mdev
> > driver is introduced to probe the mdev devices whose device
> > API is "vfio-pci" on the mdev bus.
> >
> > PS. There are some hacks in this patch for now.
> >
> > Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > ---
> > drivers/bus/pci/Makefile | 3 +
> > drivers/bus/pci/linux/Makefile | 4 +
> > drivers/bus/pci/linux/pci_vfio.c | 35 ++-
> > drivers/bus/pci/linux/pci_vfio_mdev.c | 305 ++++++++++++++++++++++++++
> > drivers/bus/pci/meson.build | 4 +-
> > drivers/bus/pci/pci_common.c | 17 +-
> > drivers/bus/pci/private.h | 9 +
> > drivers/bus/pci/rte_bus_pci.h | 11 +-
> > 8 files changed, 370 insertions(+), 18 deletions(-)
> > create mode 100644 drivers/bus/pci/linux/pci_vfio_mdev.c
> >
> > diff --git a/drivers/bus/pci/Makefile b/drivers/bus/pci/Makefile
> > index de53ce1bf..085ec9066 100644
> > --- a/drivers/bus/pci/Makefile
> > +++ b/drivers/bus/pci/Makefile
> > @@ -27,6 +27,9 @@ CFLAGS += -DALLOW_EXPERIMENTAL_API
>
> This define is enabled in 50-70 Makefiles, we can leave this here, but we should refactor this to a common place in the future.
> >
> > LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring
> > LDLIBS += -lrte_ethdev -lrte_pci -lrte_kvargs
> > +ifeq ($(CONFIG_RTE_LIBRTE_MDEV_BUS),y)
> > +LDLIBS += -lrte_bus_mdev
> > +endif
>
> See comment below.
> >
> > include $(RTE_SDK)/drivers/bus/pci/$(SYSTEM)/Makefile
> > SRCS-$(CONFIG_RTE_LIBRTE_PCI_BUS) := $(addprefix $(SYSTEM)/,$(SRCS))
> > diff --git a/drivers/bus/pci/linux/Makefile b/drivers/bus/pci/linux/Makefile
> > index 90404468b..88bbc2390 100644
> > --- a/drivers/bus/pci/linux/Makefile
> > +++ b/drivers/bus/pci/linux/Makefile
> > @@ -4,3 +4,7 @@
> > SRCS += pci.c
> > SRCS += pci_uio.c
> > SRCS += pci_vfio.c
> > +
> > +ifeq ($(CONFIG_RTE_LIBRTE_MDEV_BUS),y)
> > + SRCS += pci_vfio_mdev.c
> > +endif
>
> Do we need a configuration option for MDEV?
> Can it be enabled for all builds or reuse a current configuration if only for some OS or arch?
I think it's possible.
>
[...]
> > +static int
> > +get_pci_id(const char *sysfs_base, const char *dev_addr,
> > + struct rte_pci_id *pci_id)
> > +{
> > + int ret = 0;
> > + int iommu_group_num;
> > + int vfio_group_fd;
> > + int vfio_dev_fd;
> > + int container;
> > + int class;
> > + char name[PATH_MAX];
> > + struct vfio_group_status group_status = {
> > + .argsz = sizeof(group_status) };
> > +
> > + container = open("/dev/vfio/vfio", O_RDWR);
>
> Should this one use the VFIO_CONTAINER_PATH define in rte_vfio.h?
> The define is gated by VFIO_PRESENT in that header.
Yeah! And all the code in this file should be gated
by VFIO_PRESENT as well.
> > + if (container < 0) {
> > + RTE_LOG(WARNING, EAL, "Failed to open VFIO container\n");
> > + ret = -1;
> > + goto out;
> > + }
> > +
> > + if (ioctl(container, VFIO_GET_API_VERSION) != VFIO_API_VERSION) {
> > + /* Unknown API version */
> > + RTE_LOG(WARNING, EAL, "Unknown VFIO API version\n");
> > + ret = -1;
> > + goto close_container;
> > + }
> > +
> > + if (rte_vfio_get_group_num(sysfs_base, dev_addr,
> > + &iommu_group_num) <= 0) {
> > + RTE_LOG(WARNING, EAL, "%s not managed by VFIO driver\n",
> > + dev_addr);
> > + ret = -1;
> > + goto close_container;
> > + }
> > +
> > + snprintf(name, sizeof(name), "/dev/vfio/%d", iommu_group_num);
>
> We should be testing the return value from snprintf, but it is not done anyplace else in the code?
> We need to look at fixing this in a different patch, but not here.
> > +
> > + vfio_group_fd = open(name, O_RDWR);
> > + if (vfio_group_fd < 0) {
> > + ret = -1;
> > + goto close_container;
> > + }
[...]
> > + /* class_id */
> > + if (pread64(vfio_dev_fd, &class, sizeof(uint32_t),
> > + VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
> > + PCI_CLASS_REVISION) != sizeof(uint32_t)) {
> > + RTE_LOG(ERR, EAL, "Cannot read ClassID from PCI config space\n”);
>
> These should possible be DEBUG messages, but ERR is ok I guess. To me filling up the log with a bunch of messages when it is also flagged and log at a higher layer to many log messages. It would require us to look and make a cleaner
> > + ret = -1;
> > + goto close_device;
> > + }
> > + pci_id->class_id = class >> 8;
> > +
> > +close_device:
> > + if (close(vfio_dev_fd) < 0) {
> > + RTE_LOG(INFO, EAL, "Error when closing VFIO device for %s\n",
> > + dev_addr);
>
> These should be ERR, DEBUG or WARN not INFO IMO or no log message at all.
You are right. I missed this one.
> > + ret = -1;
> > + }
> > +
> > +close_group:
> > + if (close(vfio_group_fd) < 0) {
> > + RTE_LOG(INFO, EAL, "Error when closing VFIO group for %s\n",
> > + dev_addr);
> > + ret = -1;
> > + }
> > +
> > +close_container:
> > + if (close(container) < 0) {
> > + RTE_LOG(INFO, EAL, "Error when closing VFIO container\n");
> > + ret = -1;
> > + }
> > +
> > +out:
>
> Jumping to 4 different exit points makes this function complex, would it not be better to have one error exit point and test if the fds need to be closed
> e.g.
> if (pread64(...)) {
> RTE_LOG(ERR, EAL, “Error message”);
> goto err_exit;
> }
>
> return 0;
> err_exit:
> if (vfio_dev_fd && close(vfio_dev_fd) < 0) {
>
> }
> if (…) {
> }
> return -1;
>
> This should eliminate the variable ret and reduce the lines of code.
Thanks for the suggestion. I can do that.
> > + return ret;
> > +}
> > +
> > +static int vfio_pci_probe(struct rte_mdev_driver *mdev_drv __rte_unused,
> > + struct rte_mdev_device *mdev_dev)
> > +{
> > + char name[RTE_UUID_STRLEN];
> > + struct rte_pci_device *dev;
> > + struct rte_bus *bus;
> > + int ret;
> > +
> > + bus = rte_bus_find_by_name("pci");
> > + if (bus == NULL) {
> > + RTE_LOG(ERR, EAL, "Cannot find bus pci\n");
> > + return -ENOENT;
> > + }
> > +
> > + if (bus->plug == NULL) {
> > + RTE_LOG(ERR, EAL, "Function plug not supported by bus (%s)\n",
> > + bus->name);
> > + return -ENOTSUP;
> > + }
> > +
> > + dev = malloc(sizeof(*dev));
> > + if (dev == NULL)
> > + return -ENOMEM;
>
> If going to add error logs for the above tests, why does this one not get one?
> Should we just remove them and check in the calling function instead? Then convert these to DEBUG logs or remove them.
Thanks for the suggestion. Will improve the logs.
> > +
> > + memset(dev, 0, sizeof(*dev));
> > + dev->device.bus = &rte_pci_bus.bus;
> > + rte_uuid_unparse(mdev_dev->addr, name, sizeof(name));
> > +
> > + if (get_pci_id(rte_mdev_get_sysfs_path(), name, &dev->id)) {
> > + free(dev);
> > + return -1;
> > + }
> > +
> > + snprintf(dev->name, sizeof(dev->name), "%s", name);
>
> This should be strlcpy()
> > + dev->device.name = dev->name;
> > + dev->kdrv = RTE_KDRV_VFIO;
> > + dev->use_uuid = 1;
> > + rte_uuid_copy(dev->uuid, mdev_dev->addr);
> > +
> > + // TODO: dev->device.devargs, etc
> > +
> > + memset(&dev->addr, -1, sizeof(dev->addr)); // XXX: TODO
>
> I have seen in the past that TODO or FIXME is not something that should be in the code. The TODO items should be removed and tracked outside the code if needed to be done later.
Sorry for the confusion. There are some quick hacks in this RFC
(especially in this function). I highlighted them with XXX or TODO.
I didn't get rid of them for now, because there are different
possible ways to add the mdev support in DPDK, and this RFC is just
to demonstrate one possible way that we can do and to hear people's
thoughts/opinions.
PS. All the hacks (including comments starting with //) in this RFC
are temporary. They will be fixed or removed in the formal patch.
> > +
> > + /* device is valid, add to the list (sorted) */
> > + if (TAILQ_EMPTY(&rte_pci_bus.device_list)) {
> > + rte_pci_add_device(dev);
> > + } else {
> > + struct rte_pci_device *dev2;
> > + int ret;
> > +
> > + TAILQ_FOREACH(dev2, &rte_pci_bus.device_list, next) {
> > + // XXX
>
> What does this comment mean? remove it or explain it.
It's to indicate that there is a quick hack here. It won't
exist in the formal patch.
> > + ret = rte_pci_addr_cmp(&dev->addr, &dev2->addr);
> > + if (ret == 0)
> > + ret = strncmp(dev->name, dev2->name,
> > + sizeof(dev->name));
> > + if (ret > 0)
> > + continue;
> > + if (ret < 0) {
> > + rte_pci_insert_device(dev2, dev);
> > + goto plug;
> > + }
> > + /* already registered */
> > + free(dev);
> > + return 0;
> > + }
> > +
> > + rte_pci_add_device(dev);
> > + }
> > +
> > +plug:
> > + ret = bus->plug(&dev->device);
> > + if (ret != 0) {
> > + rte_pci_remove_device(dev);
> > + free(dev);
> > + } else {
> > + mdev_dev->private = dev;
> > + }
>
> The coding guide states we remove {} around single line statements.
> > + return ret;
> > +}
> > +
> > +static int vfio_pci_remove(struct rte_mdev_device *mdev_dev)
> > +{
> > + struct rte_pci_device *dev = mdev_dev->private;
> > + struct rte_bus *bus;
> > + int ret;
> > +
> > + if (dev == NULL)
> > + return 0;
> > +
> > + bus = rte_bus_find_by_name("pci");
> > + if (bus == NULL) {
> > + RTE_LOG(ERR, EAL, "Cannot find bus pci\n");
> > + return -ENOENT;
> > + }
> > +
> > + if (bus->unplug == NULL) {
> > + RTE_LOG(ERR, EAL, "Function unplug not supported by bus (%s)\n",
> > + bus->name);
> > + return -ENOTSUP;
> > + }
> > +
> > + ret = bus->unplug(&dev->device);
> > + if (ret == 0)
> > + mdev_dev->private = NULL;
> > +
> > + return ret;
> > +}
> > +
> > +static struct rte_mdev_driver vfio_pci_drv = {
> > + .dev_api = RTE_MDEV_DEV_API_VFIO_PCI,
> > + .probe = vfio_pci_probe,
> > + .remove = vfio_pci_remove
> > +};
> > +
> > +RTE_MDEV_REGISTER_DRIVER(mdev_vfio_pci, vfio_pci_drv);
> > diff --git a/drivers/bus/pci/meson.build b/drivers/bus/pci/meson.build
> > index a3140ff97..c3e884657 100644
> > --- a/drivers/bus/pci/meson.build
> > +++ b/drivers/bus/pci/meson.build
> > @@ -11,8 +11,10 @@ sources = files('pci_common.c',
> > if host_machine.system() == 'linux'
> > sources += files('linux/pci.c',
> > 'linux/pci_uio.c',
> > - 'linux/pci_vfio.c')
> > + 'linux/pci_vfio.c',
> > + 'linux/pci_vfio_mdev.c’)
>
> If you need the RTE_LIBRTE_MDEV define then pci_vfio_mdev.c needs to be built conditionally?
> > includes += include_directories('linux')
> > + deps += ['bus_mdev’]
>
> If this was added form dev then is too should be conditional.
Yeah. There should be a check of dpdk_conf.has('...')
> > else
> > sources += files('bsd/pci.c')
> > includes += include_directories('bsd')
> > diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
> > index 704b9d71a..6b47333e6 100644
> > --- a/drivers/bus/pci/pci_common.c
> > +++ b/drivers/bus/pci/pci_common.c
> > @@ -124,21 +124,17 @@ rte_pci_probe_one_driver(struct rte_pci_driver *dr,
> > {
> > int ret;
> > bool already_probed;
> > - struct rte_pci_addr *loc;
> >
> > if ((dr == NULL) || (dev == NULL))
> > return -EINVAL;
> >
> > - loc = &dev->addr;
> > -
> > /* The device is not blacklisted; Check if driver supports it */
> > if (!rte_pci_match(dr, dev))
> > /* Match of device and driver failed */
> > return 1;
> >
> > - RTE_LOG(INFO, EAL, "PCI device "PCI_PRI_FMT" on NUMA socket %i\n",
> > - loc->domain, loc->bus, loc->devid, loc->function,
> > - dev->device.numa_node);
> > + RTE_LOG(INFO, EAL, "PCI device %s on NUMA socket %i\n",
> > + dev->name, dev->device.numa_node);
> >
> > /* no initialization when blacklisted, return without error */
> > if (dev->device.devargs != NULL &&
> > @@ -208,7 +204,6 @@ rte_pci_probe_one_driver(struct rte_pci_driver *dr,
> > static int
> > rte_pci_detach_dev(struct rte_pci_device *dev)
> > {
> > - struct rte_pci_addr *loc;
> > struct rte_pci_driver *dr;
> > int ret = 0;
> >
> > @@ -216,11 +211,9 @@ rte_pci_detach_dev(struct rte_pci_device *dev)
> > return -EINVAL;
> >
> > dr = dev->driver;
> > - loc = &dev->addr;
> >
> > - RTE_LOG(DEBUG, EAL, "PCI device "PCI_PRI_FMT" on NUMA socket %i\n",
> > - loc->domain, loc->bus, loc->devid,
> > - loc->function, dev->device.numa_node);
> > + RTE_LOG(DEBUG, EAL, "PCI device %s on NUMA socket %i\n",
> > + dev->name, dev->device.numa_node);
> >
> > RTE_LOG(DEBUG, EAL, " remove driver: %x:%x %s\n", dev->id.vendor_id,
> > dev->id.device_id, dr->driver.name);
> > @@ -387,7 +380,7 @@ rte_pci_insert_device(struct rte_pci_device *exist_pci_dev,
> > }
> >
> > /* Remove a device from PCI bus */
> > -static void
> > +void
> > rte_pci_remove_device(struct rte_pci_device *pci_dev)
>
> Have not looked yet, but did this function get added to the version.map file?
> Does converting a function to public function require experimental tag too, maybe not?
This is just to make it a global function declared in private.h,
so that we can call it from other C files inside PCI bus.
> > {
> > TAILQ_REMOVE(&rte_pci_bus.device_list, pci_dev, next);
> > diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
> > index 13c3324bb..d5815ee44 100644
> > --- a/drivers/bus/pci/private.h
> > +++ b/drivers/bus/pci/private.h
> > @@ -67,6 +67,15 @@ void rte_pci_add_device(struct rte_pci_device *pci_dev);
> > void rte_pci_insert_device(struct rte_pci_device *exist_pci_dev,
> > struct rte_pci_device *new_pci_dev);
> >
> > +/**
> > + * Remove a PCI device from the PCI Bus.
> > + *
> > + * @param pci_dev
> > + * PCI device to remove
> > + * @return void
> > + */
> > +void rte_pci_remove_device(struct rte_pci_device *pci_dev);
> > +
> > /**
> > * Update a pci device object by asking the kernel for the latest information.
> > *
> > diff --git a/drivers/bus/pci/rte_bus_pci.h b/drivers/bus/pci/rte_bus_pci.h
> > index 06e004cd3..465a44935 100644
> > --- a/drivers/bus/pci/rte_bus_pci.h
> > +++ b/drivers/bus/pci/rte_bus_pci.h
> > @@ -51,6 +51,13 @@ TAILQ_HEAD(rte_pci_driver_list, rte_pci_driver);
> >
> > struct rte_devargs;
> >
> > +/* It's RTE_UUID_STRLEN, which is bigger than PCI_PRI_STR_SIZE. */
> > +#define RTE_PCI_NAME_LEN (36 + 1)
> > +
> > +// XXX: we can't include rte_uuid.h directly due to the conflicts
> > +// introduced by stdbool.h
> > +typedef unsigned char rte_uuid_t[16];
>
> Does this need to have a the string ‘XXX’ in the comment? Note maybe a better word.
OK.
Thanks for the reviews/suggestions! Do appreciate it!
Regards,
Tiwei
next prev parent reply other threads:[~2019-04-04 4:19 UTC|newest]
Thread overview: 42+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-04-03 7:18 [dpdk-dev] [RFC 0/3] Add mdev (Mediated device) support in DPDK Tiwei Bie
2019-04-03 7:18 ` Tiwei Bie
2019-04-03 7:18 ` [dpdk-dev] [RFC 1/3] eal: add a helper for reading string from sysfs Tiwei Bie
2019-04-03 7:18 ` Tiwei Bie
2019-04-03 7:18 ` [dpdk-dev] [RFC 2/3] bus/mdev: add mdev bus support Tiwei Bie
2019-04-03 7:18 ` Tiwei Bie
2019-04-03 7:18 ` [dpdk-dev] [RFC 3/3] bus/pci: add mdev support Tiwei Bie
2019-04-03 7:18 ` Tiwei Bie
2019-04-03 14:13 ` Wiles, Keith
2019-04-03 14:13 ` Wiles, Keith
2019-04-04 4:19 ` Tiwei Bie [this message]
2019-04-04 4:19 ` Tiwei Bie
2019-04-08 8:44 ` [dpdk-dev] [RFC 0/3] Add mdev (Mediated device) support in DPDK Alejandro Lucero
2019-04-08 8:44 ` Alejandro Lucero
2019-04-08 9:36 ` Tiwei Bie
2019-04-08 9:36 ` Tiwei Bie
2019-04-10 10:02 ` Francois Ozog
2019-04-10 10:02 ` Francois Ozog
2023-07-03 23:54 ` Stephen Hemminger
2019-07-15 7:52 ` [dpdk-dev] [RFC v2 0/5] " Tiwei Bie
2019-07-15 7:52 ` [dpdk-dev] [RFC v2 1/5] bus/pci: introduce an internal representation of PCI device Tiwei Bie
2019-07-15 7:52 ` [dpdk-dev] [RFC v2 2/5] bus/pci: avoid depending on private value in kernel source Tiwei Bie
2019-07-15 7:52 ` [dpdk-dev] [RFC v2 3/5] bus/pci: introduce helper for MMIO read and write Tiwei Bie
2019-07-15 7:52 ` [dpdk-dev] [RFC v2 4/5] eal: add a helper for reading string from sysfs Tiwei Bie
2019-07-15 7:52 ` [dpdk-dev] [RFC v2 5/5] bus/pci: add mdev support Tiwei Bie
2021-06-01 3:06 ` [dpdk-dev] [RFC v3 0/6] Add mdev (Mediated device) support in DPDK Chenbo Xia
2021-06-01 3:06 ` [dpdk-dev] [RFC v3 1/6] bus/pci: introduce an internal representation of PCI device Chenbo Xia
2021-06-01 3:06 ` [dpdk-dev] [RFC v3 2/6] bus/pci: avoid depending on private value in kernel source Chenbo Xia
2021-06-01 3:06 ` [dpdk-dev] [RFC v3 3/6] bus/pci: introduce helper for MMIO read and write Chenbo Xia
2021-06-01 3:06 ` [dpdk-dev] [RFC v3 4/6] eal: add a helper for reading string from sysfs Chenbo Xia
2021-06-01 5:37 ` Stephen Hemminger
2021-06-08 5:47 ` Xia, Chenbo
2021-06-01 5:39 ` Stephen Hemminger
2021-06-08 5:48 ` Xia, Chenbo
2021-06-11 7:19 ` Thomas Monjalon
2021-06-01 3:06 ` [dpdk-dev] [RFC v3 5/6] bus/pci: add mdev support Chenbo Xia
2021-06-01 3:06 ` [dpdk-dev] [RFC v3 6/6] bus/pci: add sparse mmap support for mediated PCI devices Chenbo Xia
2021-06-11 7:15 ` [dpdk-dev] [RFC v3 0/6] Add mdev (Mediated device) support in DPDK Thomas Monjalon
2021-06-15 2:49 ` Xia, Chenbo
2021-06-15 7:48 ` Thomas Monjalon
2021-06-15 10:44 ` Xia, Chenbo
2021-06-15 11:57 ` Jason Gunthorpe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190404041925.GA26931@dpdk-tbie.sh.intel.com \
--to=tiwei.bie@intel.com \
--cc=alejandro.lucero@netronome.com \
--cc=bruce.richardson@intel.com \
--cc=cunming.liang@intel.com \
--cc=dev@dpdk.org \
--cc=keith.wiles@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).