From: Thomas Monjalon <thomas@monjalon.net>
To: "Xia, Chenbo" <chenbo.xia@intel.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>,
"Liang, Cunming" <cunming.liang@intel.com>,
"Wu, Jingjing" <jingjing.wu@intel.com>,
"Burakov, Anatoly" <anatoly.burakov@intel.com>,
"Yigit, Ferruh" <ferruh.yigit@intel.com>,
"mdr@ashroe.eu" <mdr@ashroe.eu>,
"nhorman@tuxdriver.com" <nhorman@tuxdriver.com>,
"Richardson, Bruce" <bruce.richardson@intel.com>,
"david.marchand@redhat.com" <david.marchand@redhat.com>,
"stephen@networkplumber.org" <stephen@networkplumber.org>,
"Ananyev, Konstantin" <konstantin.ananyev@intel.com>,
jgg@nvidia.com, parav@nvidia.com, xuemingl@nvidia.com
Subject: Re: [dpdk-dev] [RFC v3 0/6] Add mdev (Mediated device) support in DPDK
Date: Tue, 15 Jun 2021 09:48:24 +0200 [thread overview]
Message-ID: <50744230.0ZSezZt4d8@thomas> (raw)
In-Reply-To: <MN2PR11MB406381B94BB6305F54A6B7B69C309@MN2PR11MB4063.namprd11.prod.outlook.com>
15/06/2021 04:49, Xia, Chenbo:
> From: Thomas Monjalon <thomas@monjalon.net>
> > 01/06/2021 05:06, Chenbo Xia:
> > > Hi everyone,
> > >
> > > This is a draft implementation of the mdev (Mediated device [1])
> > > support in the DPDK PCI bus driver. Mdev is a way to virtualize devices
> > > in the Linux kernel. Based on the device-api (mdev_type/device_api),
> > > there could be different types of mdev devices (e.g. vfio-pci).
> >
> > Please could you illustrate with a usage of mdev in DPDK?
> > What does it enable which is not possible today?
>
> The main purpose is for DPDK to drive mdev-based devices, which is not
> possible today.
>
> I'll take PCI devices as an example. Currently DPDK can only drive devices
> on the physical PCI bus under /sys/bus/pci, which is how the kernel exposes
> PCI devices to applications.
>
> But there are also PCI devices that use vfio-mdev as a software framework to
> expose mediated devices to applications under /sys/bus/mdev. A device can
> choose this way of virtualizing itself so that multiple applications can
> share one physical device. For example, Intel Scalable IOV technology uses
> vfio-mdev as the software framework for Scalable IOV enabled devices (and
> Intel net/crypto/raw devices support this technology). For those mdev-based
> devices, DPDK needs support at the bus layer to scan/plug/probe/... them,
> which is the main effort of this patchset. There are also other devices
> using the vfio-mdev framework; AFAIK, Nvidia's GPU was the first to use mdev,
> and Intel's GPU virtualization also uses it.
Yes, mdev was designed for virtualization, I think.
The use of mdev for Scalable IOV without virtualization
may be seen as an abuse by Linux maintainers,
as they currently seem to prefer the auxiliary bus (which is a real bus).
Mellanox got push back when trying to use mdev for the same purpose
(Scalable Function, also called Sub-Function) in the kernel.
The Linux community decided to use the auxiliary bus instead.
Any other feedback on the choice of mdev vs aux?
Is there any kernel code supporting this mdev model for Intel devices?
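
(For readers less familiar with mdev: below is a minimal sketch, not taken
from this patchset, of how mediated devices appear in sysfs and how a scan
could read their device_api. It assumes the standard Linux mdev layout,
where each device directory under /sys/bus/mdev/devices is named by UUID
and contains an mdev_type link with a device_api file.)

/* Illustrative only: list mediated devices and print their device_api
 * (e.g. "vfio-pci") from the standard mdev sysfs layout. */
#include <stdio.h>
#include <string.h>
#include <dirent.h>

#define MDEV_DEVICES_PATH "/sys/bus/mdev/devices"

static void scan_mdev_devices(void)
{
	DIR *dir = opendir(MDEV_DEVICES_PATH);
	struct dirent *e;
	char path[512], api[64];
	FILE *f;

	if (dir == NULL)
		return; /* no mdev support or no devices */

	while ((e = readdir(dir)) != NULL) {
		if (e->d_name[0] == '.')
			continue;
		/* each entry is a device directory named by its UUID */
		snprintf(path, sizeof(path),
			 MDEV_DEVICES_PATH "/%s/mdev_type/device_api",
			 e->d_name);
		f = fopen(path, "r");
		if (f == NULL)
			continue;
		if (fgets(api, sizeof(api), f) != NULL) {
			api[strcspn(api, "\n")] = '\0';
			printf("mdev %s: device_api=%s\n", e->d_name, api);
		}
		fclose(f);
	}
	closedir(dir);
}
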
> > > In this patchset, the PCI bus driver is extended to support scanning
> > > and probing the mdev devices whose device-api is "vfio-pci".
> > >
> > >              +---------+
> > >              | PCI bus |
> > >              +----+----+
> > >                   |
> > >  +--------+-------+-------+--------+
> > >  |        |       |       |
> > >  Physical PCI devices ...   Mediated PCI devices ...
> > >
> > > The first four patches in this patchset are mainly preparation for mdev
> > > bus support. The last two patches are the key implementation of the mdev bus.
> > >
> > > The implementation of mdev bus in DPDK has several options:
> > >
> > > 1: Embed mdev bus in current pci bus
> > >
> > > This patchset takes this option as an example. Mdev has several
> > > device types: pci/platform/amba/ccw/ap. DPDK currently only cares
> > > about PCI devices among all the mdev device types, so we could embed
> > > the mdev bus into the current PCI bus. Then the PCI bus with mdev
> > > support will scan/plug/unplug/... not only normal PCI devices but
> > > also mediated PCI devices.
> >
> > I think it is a different bus.
> > It would be cleaner to not touch the PCI bus.
> > Having a separate bus will allow an easy way to identify a device
> > with the new generic devargs syntax, for example:
> > bus=mdev,uuid=XXX
> > or more complex:
> > bus=mdev,uuid=XXX/class=crypto/driver=qat,foo=bar
>
> OK. Agreed that it is cleaner not to touch the PCI bus. And there may also be
> a 'type=pci', as mdev has several types in its definition (pci/ap/platform/ccw/...).
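
Just to make that concrete, here is a small sketch of how such a devargs
string could be passed to the existing EAL hotplug API once an mdev bus
exists. The bus=mdev syntax is only a proposal at this point, and the UUID
below is a made-up placeholder.

#include <stdio.h>
#include <rte_dev.h>

/* Hypothetical usage: attach a mediated device by UUID through the
 * generic devargs syntax discussed above. rte_dev_probe() is the
 * existing EAL hotplug entry point; the "bus=mdev" bus does not exist yet. */
static int attach_mdev_example(void)
{
	const char *devargs =
		"bus=mdev,uuid=83b8f4f2-509f-382f-3c1e-e6bfe0fa1001";
	int ret = rte_dev_probe(devargs);

	if (ret < 0)
		printf("probe of %s failed: %d\n", devargs, ret);
	return ret;
}
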
>
> > > 2: A new mdev bus that scans mediated PCI devices and probes an mdev
> > > driver to plug PCI devices into the PCI bus
> > >
> > > If we take this option, a new mdev bus will be implemented to scan
> > > mediated PCI devices, and a new mdev driver for PCI devices will be
> > > implemented in the PCI bus to plug mediated PCI devices into the PCI bus.
> > >
> > > Our RFC v1 takes this option:
> > > http://patchwork.dpdk.org/project/dpdk/cover/20190403071844.21126-1-tiwei.bie@intel.com/
> > >
> > > Note that for either option 1 or 2, device drivers do not see the
> > > implementation difference but only use structs/functions exposed by
> > > the PCI bus. Mediated PCI devices differ from normal PCI devices in
> > > two ways: 1. Mediated PCI devices use a UUID as address, while normal
> > > ones use a BDF. 2. Mediated PCI devices may have capabilities that
> > > normal PCI devices do not have. For example, mediated PCI devices
> > > could have regions with sparse mmap capability, which allows a region
> > > to have multiple mmap areas. Another example is that mediated PCI
> > > devices may have regions (or parts of regions) that are not mmapped
> > > but still need to be accessed. The above differences will change the
> > > current ABI (i.e., struct rte_pci_device). Please check the 5th and
> > > 6th patches for details.
> > >
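
To illustrate the sparse mmap point above: VFIO can describe a region as a
set of separately mmap'able areas. The sketch below is purely illustrative;
the struct names are not the ones used in the 5th/6th patches.

#include <stdint.h>

/* Illustrative only: one mmap'able chunk inside a device region,
 * similar in spirit to VFIO's sparse mmap capability. */
struct mdev_sparse_area {
	uint64_t offset;  /* offset of the area within the region */
	uint64_t size;    /* size of the mmap'able area */
	void    *addr;    /* where this area got mapped, or NULL */
};

/* A mediated device region may be only partially mmap'able:
 * nb_areas mmap'ed chunks, with the rest accessed through read/write
 * on the VFIO device fd instead of direct load/store. */
struct mdev_region {
	uint64_t size;                   /* total region size */
	uint32_t nb_areas;               /* 0 means no sparse mmap */
	struct mdev_sparse_area *areas;  /* array of nb_areas entries */
};
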
> > > 3: A brand new mdev bus that does everything
> > >
> > > This option will implement a new and standalone mdev bus. It does not
> > > need any changes in the current PCI bus, but only some shared code
> > > (the Linux VFIO part) from the PCI bus. Drivers of devices that support
> > > mdev will register themselves as mdev drivers and no longer rely on the
> > > PCI bus. This option, IMHO, will make the code clean. The only potential
> > > problem is code duplication, which could be solved by making the Linux
> > > VFIO code of the PCI bus common and shared.
> >
> > Yes I prefer this third option.
> > We can find an elegant way of sharing some VFIO code between buses.
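
To give an idea of what the driver-facing side of a standalone mdev bus
might look like, here is a rough sketch. All names are hypothetical, simply
modeled on the existing rte_pci_driver pattern, not a committed API.

#include <rte_dev.h>
#include <rte_uuid.h>

/* Hypothetical: a mediated device as seen by a standalone mdev bus,
 * addressed by UUID instead of a PCI BDF. */
struct rte_mdev_device {
	struct rte_device device;  /* inherit the generic device */
	rte_uuid_t uuid;           /* mdev address */
	char device_api[32];       /* e.g. "vfio-pci" */
};

struct rte_mdev_driver;

typedef int (*rte_mdev_probe_t)(struct rte_mdev_driver *drv,
				struct rte_mdev_device *dev);
typedef int (*rte_mdev_remove_t)(struct rte_mdev_device *dev);

/* Hypothetical: what a device driver would register with the mdev bus
 * instead of (or in addition to) an rte_pci_driver. */
struct rte_mdev_driver {
	struct rte_driver driver;   /* inherit the generic driver */
	rte_mdev_probe_t probe;
	rte_mdev_remove_t remove;
	const char *supported_api;  /* e.g. "vfio-pci" */
};
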
>
> Yes, I have not thought about the details of the code sharing but will try to make
> it elegant.
Great, thanks.
Thread overview: 42+ messages
2019-04-03 7:18 [dpdk-dev] [RFC 0/3] " Tiwei Bie
2019-04-03 7:18 ` [dpdk-dev] [RFC 1/3] eal: add a helper for reading string from sysfs Tiwei Bie
2019-04-03 7:18 ` [dpdk-dev] [RFC 2/3] bus/mdev: add mdev bus support Tiwei Bie
2019-04-03 7:18 ` [dpdk-dev] [RFC 3/3] bus/pci: add mdev support Tiwei Bie
2019-04-03 14:13 ` Wiles, Keith
2019-04-04 4:19 ` Tiwei Bie
2019-04-08 8:44 ` [dpdk-dev] [RFC 0/3] Add mdev (Mediated device) support in DPDK Alejandro Lucero
2019-04-08 9:36 ` Tiwei Bie
2019-04-10 10:02 ` Francois Ozog
2023-07-03 23:54 ` Stephen Hemminger
2019-07-15 7:52 ` [dpdk-dev] [RFC v2 0/5] " Tiwei Bie
2019-07-15 7:52 ` [dpdk-dev] [RFC v2 1/5] bus/pci: introduce an internal representation of PCI device Tiwei Bie
2019-07-15 7:52 ` [dpdk-dev] [RFC v2 2/5] bus/pci: avoid depending on private value in kernel source Tiwei Bie
2019-07-15 7:52 ` [dpdk-dev] [RFC v2 3/5] bus/pci: introduce helper for MMIO read and write Tiwei Bie
2019-07-15 7:52 ` [dpdk-dev] [RFC v2 4/5] eal: add a helper for reading string from sysfs Tiwei Bie
2019-07-15 7:52 ` [dpdk-dev] [RFC v2 5/5] bus/pci: add mdev support Tiwei Bie
2021-06-01 3:06 ` [dpdk-dev] [RFC v3 0/6] Add mdev (Mediated device) support in DPDK Chenbo Xia
2021-06-01 3:06 ` [dpdk-dev] [RFC v3 1/6] bus/pci: introduce an internal representation of PCI device Chenbo Xia
2021-06-01 3:06 ` [dpdk-dev] [RFC v3 2/6] bus/pci: avoid depending on private value in kernel source Chenbo Xia
2021-06-01 3:06 ` [dpdk-dev] [RFC v3 3/6] bus/pci: introduce helper for MMIO read and write Chenbo Xia
2021-06-01 3:06 ` [dpdk-dev] [RFC v3 4/6] eal: add a helper for reading string from sysfs Chenbo Xia
2021-06-01 5:37 ` Stephen Hemminger
2021-06-08 5:47 ` Xia, Chenbo
2021-06-01 5:39 ` Stephen Hemminger
2021-06-08 5:48 ` Xia, Chenbo
2021-06-11 7:19 ` Thomas Monjalon
2021-06-01 3:06 ` [dpdk-dev] [RFC v3 5/6] bus/pci: add mdev support Chenbo Xia
2021-06-01 3:06 ` [dpdk-dev] [RFC v3 6/6] bus/pci: add sparse mmap support for mediated PCI devices Chenbo Xia
2021-06-11 7:15 ` [dpdk-dev] [RFC v3 0/6] Add mdev (Mediated device) support in DPDK Thomas Monjalon
2021-06-15 2:49 ` Xia, Chenbo
2021-06-15 7:48 ` Thomas Monjalon [this message]
2021-06-15 10:44 ` Xia, Chenbo
2021-06-15 11:57 ` Jason Gunthorpe