* [dpdk-dev] Proposal for a big eal / ethdev cleanup @ 2016-01-14 10:38 David Marchand 2016-01-14 11:46 ` Jan Viktorin ` (2 more replies) 0 siblings, 3 replies; 9+ messages in thread From: David Marchand @ 2016-01-14 10:38 UTC (permalink / raw) To: dev; +Cc: Jan Viktorin Hello all, Here is a proposal of a big cleanup in ethdev (cryptodev would have to follow) and eal structures. This is something I wanted to do for quite some time and the arrival of a new bus makes me think we need it. This is an alternative to what Jan proposed [1]. ABI is most likely broken with this, but I think this discussion can come later. First some context on how dpdk is initialized at the moment : Let's imagine a system with one ixgbe pci device and take some part of ixgbe driver as an example. static struct eth_driver rte_ixgbe_pmd = { .pci_drv = { .name = "rte_ixgbe_pmd", .id_table = pci_id_ixgbe_map, .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC | RTE_PCI_DRV_DETACHABLE, }, .eth_dev_init = eth_ixgbe_dev_init, .eth_dev_uninit = eth_ixgbe_dev_uninit, .dev_private_size = sizeof(struct ixgbe_adapter), }; static int rte_ixgbe_pmd_init(const char *name __rte_unused, const char *params __rte_unused) { PMD_INIT_FUNC_TRACE(); rte_eth_driver_register(&rte_ixgbe_pmd); return 0; } static struct rte_driver rte_ixgbe_driver = { .type = PMD_PDEV, .init = rte_ixgbe_pmd_init, }; PMD_REGISTER_DRIVER(rte_ixgbe_driver) DPDK initialisation goes as follows (focusing on ixgbe driver): PMD_REGISTER_DRIVER(rte_ixgbe_driver) which adds it to dev_driver_list rte_eal_init() -> rte_eal_pci_init() -> rte_eal_pci_scan() which fills pci_device_list -> rte_eal_dev_init() -> for each rte_driver (first vdev, then pdev), call driver->init() so here rte_ixgbe_pmd_init(NULL, NULL) -> rte_eth_driver_register(&rte_ixgbe_pmd); -> fills rte_ixgbe_pmd.pci_drv.devinit = rte_eth_dev_init -> call rte_eal_pci_register() which adds it to pci_driver_list -> rte_eal_pci_probe() -> for each rte_pci_device found in rte_eal_pci_scan(), and for all rte_pci_driver registered, call devinit(dr, dev), so here rte_eth_dev_init(dr, dev) -> creates a new eth_dev (which is a rte_eth_dev), then adds reference to passed dev pointer (which is a rte_pci_device), to passed dr pointer (which is a rte_pci_driver cast as a eth_driver) -> call eth_drv->eth_dev_init(eth_dev) so here eth_ixgbe_dev_init(eth_dev) -> fills other parts of eth_dev -> rte_eth_copy_pci_info(eth_dev, pci_dev) By the way, when invoking ethdev init, the pci-specific stuff is only handled in functions called from the drivers themselves, which already know that they are dealing with pci resources. Later in the life of the application, ethdev api is called for hotplug. int rte_eth_dev_attach(const char *devargs, uint8_t *port_id); A devargs is used to identify a vdev/pdev driver and call it to create a new rte_eth_dev. Current code goes as far as parsing devargs to understand if this is a pci device or a vdev. This part should be moved to eal since this is where all the "bus" logic is. So now, what I had in mind is something like this. It is far from perfect and I certainly did some shortcuts in my reasoning. Generic device/driver - introduce a rte_device structure, - a rte_device has a name, that identifies it in a unique way across all buses, maybe something like pci:0000:00:01.0, and for vdev, vdev:name - a rte_device references a rte_driver, - a rte_device references devargs - a rte_device embeds a intr_handle - rte_device objects are created by "buses" - a function to retrieve rte_device objects based on name is added - current rte_driver does not need to know about the pmd_type (pdev/vdev), this is only a way to order drivers init in eal, we could use the rte_driver names to order them or maybe remove this ordering - init and uninit functions are changed to take a pointer to a rte_device rte_device and rte_driver structures are at the "bus" level. Those are the basic structures we will build the other objects on. Impact on PCI device/driver - rte_pci_device is modified to embed a rte_device (embedding makes it possible later to cast the rte_device and get the rte_pci_device in pci specific functions) - no need for a rte_pci_driver reference in rte_pci_device, since we have the rte_device driver - rte_pci_driver is modified to embed a rte_driver - no more devinit and devuninit functions in rte_pci_driver, they can be moved as init / uninit functions in rte_driver - pci scan code creates rte_pci_device objects, associates them to rte_pci_driver, fills the driver field of the rte_driver then pass them to rte_driver init function. rte_pci_device and rte_pci_driver are specific implementation of rte_device and rte_driver. There are there to maintain pci private methods, create upper layer objects (ethdev / crypto) etc.. Impact on vdev - introduce a rte_vdev_driver structure - a rte_vdev_driver embeds a rte_driver - a rte_vdev_driver has a priv size for vdev objects creation - no need for a vdev device object, this is specific to vdev drivers - eal init code associates devargs to vdev drivers, creates a rte_device object (using the priv size), fills the driver field then pass them to rte_driver init function. Impact on ethdev - a rte_eth_dev object references a rte_device in place of rte_pci_device - no more information about a driver in rte_eth_dev, this is the rte_device that has a reference to its rte_driver - eth_driver can be removed, it is only a wrapper of a rte_pci_driver at the moment, maybe some init function wrappers can stay in ethdev with dev_private_size to be handled in the rte_driver - rte_eth_dev objects are created by rte_drivers during probe (pci scan, vdev init, hotplug) - ethdev ops are populated by rte_drivers Impact on hotplug - hotplug moves from ethdev to eal - a notification framework is added to ethdev when hotplugging - eal uses the name (remember the pci:0000:00:01.0 / vdev:name example) in devargs to identify the right bus (pci / vdev) - pci scan code is reused to create a rte_pci_device object etc... - vdev init code is reused We end up with something like this. An arrow means that the structure contains a pointer to an object of the other struct. I only wrote the fields I mentioned in this mail, for pci device a lot of other fields are omitted. - for ethdev on top of pci devices +------------------+ +-------------------------------+ | | | | | rte_pci_device | | rte_pci_driver | | | | | +-------------+ | +--------------+ | | +---------------------------+ | | | | | | | | | | | | rte_eth_dev +---> rte_device +-----> rte_driver | | | | | | char name[] | | | | char name[] | | +-------------+ | | | | | | int init(rte_device *) | | | +--------------+ | | | int uninit(rte_device *) | | | | | | | | +------------------+ | +---------------------------+ | | | +-------------------------------+ - for ethdev on top of vdev devices +------------------+ +-------------------------------+ | | | | | drv specific | | rte_vdev_driver | | | | | +-------------+ | +--------------+ | | +---------------------------+ | | | | | | | | | | | | rte_eth_dev +---> rte_device +-----> rte_driver | | | | | | char name[] | | | | char name[] | | +-------------+ | | | | | | int init(rte_device *) | | | +--------------+ | | | int uninit(rte_device *) | | | | | | | | +------------------+ | +---------------------------+ | | | | int priv_size | | | +-------------------------------+ Thanks for reading. Comments ? -- David Marchand [1] http://dpdk.org/ml/archives/dev/2016-January/030975.html ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [dpdk-dev] Proposal for a big eal / ethdev cleanup 2016-01-14 10:38 [dpdk-dev] Proposal for a big eal / ethdev cleanup David Marchand @ 2016-01-14 11:46 ` Jan Viktorin 2016-01-16 15:53 ` David Marchand 2016-01-18 14:54 ` Declan Doherty [not found] ` <20160118155834.04cb31f2@pcviktorin.fit.vutbr.cz> 2 siblings, 1 reply; 9+ messages in thread From: Jan Viktorin @ 2016-01-14 11:46 UTC (permalink / raw) To: David Marchand; +Cc: dev Hello David, nice to see that the things are moving... On Thu, 14 Jan 2016 11:38:16 +0100 David Marchand <david.marchand@6wind.com> wrote: > Hello all, > > Here is a proposal of a big cleanup in ethdev (cryptodev would have to > follow) and eal structures. > This is something I wanted to do for quite some time and the arrival of > a new bus makes me think we need it. > > This is an alternative to what Jan proposed [1]. As I need to extend DPDK by a non-PCI bus system, I would prefer any such working solution :). By [1], you probably mean: [1] http://comments.gmane.org/gmane.comp.networking.dpdk.devel/30973 (I didn't find it in the e-mail.) > > ABI is most likely broken with this, but I think this discussion can come later. I was trying in my initial approach to stay as much API and ABI backwards compatible as possible to be acceptable into upstream. As just a few people have shown their interest in these changes, I consider the ABI compatibility very important. I can see, that your approach does not care too much... Otherwise, it looks good to me. It is closer to the Linux drivers infra, so it seems to be clearer then the current one. > > > First some context on how dpdk is initialized at the moment : > > Let's imagine a system with one ixgbe pci device and take some > part of ixgbe driver as an example. > > static struct eth_driver rte_ixgbe_pmd = { > .pci_drv = { > .name = "rte_ixgbe_pmd", > .id_table = pci_id_ixgbe_map, > .drv_flags = RTE_PCI_DRV_NEED_MAPPING | > RTE_PCI_DRV_INTR_LSC | RTE_PCI_DRV_DETACHABLE, > }, > .eth_dev_init = eth_ixgbe_dev_init, > .eth_dev_uninit = eth_ixgbe_dev_uninit, > .dev_private_size = sizeof(struct ixgbe_adapter), > }; Note, that the biggest issue here is that the eth_driver has no way to distinguish among PCI or other subsystem. There is no field that helps the generic ethdev code (librte_ether) to decide what bus we are on (and it needs to know it in the current DPDK). Another point is that the ethdev infra should not know about the underlying bus infra. The question is whether we do a big clean up (extract PCI-bus code out) or a small clean up (give the ethdev infra a hint which bus system it deals with). > > static int > rte_ixgbe_pmd_init(const char *name __rte_unused, const char *params > __rte_unused) > { > PMD_INIT_FUNC_TRACE(); > rte_eth_driver_register(&rte_ixgbe_pmd); > return 0; > } > > static struct rte_driver rte_ixgbe_driver = { > .type = PMD_PDEV, > .init = rte_ixgbe_pmd_init, > }; > > PMD_REGISTER_DRIVER(rte_ixgbe_driver) > > > DPDK initialisation goes as follows (focusing on ixgbe driver): > > PMD_REGISTER_DRIVER(rte_ixgbe_driver) which adds it to dev_driver_list > > rte_eal_init() > -> rte_eal_pci_init() > -> rte_eal_pci_scan() which fills pci_device_list > > -> rte_eal_dev_init() > -> for each rte_driver (first vdev, then pdev), call driver->init() > so here rte_ixgbe_pmd_init(NULL, NULL) > -> rte_eth_driver_register(&rte_ixgbe_pmd); > -> fills rte_ixgbe_pmd.pci_drv.devinit = rte_eth_dev_init > -> call rte_eal_pci_register() which adds it to pci_driver_list > > -> rte_eal_pci_probe() > -> for each rte_pci_device found in rte_eal_pci_scan(), and for all > rte_pci_driver registered, call devinit(dr, dev), > so here rte_eth_dev_init(dr, dev) > -> creates a new eth_dev (which is a rte_eth_dev), then adds > reference to passed dev pointer (which is a rte_pci_device), to > passed dr pointer (which is a rte_pci_driver cast as a eth_driver) > -> call eth_drv->eth_dev_init(eth_dev) > so here eth_ixgbe_dev_init(eth_dev) > -> fills other parts of eth_dev > -> rte_eth_copy_pci_info(eth_dev, pci_dev) > > > By the way, when invoking ethdev init, the pci-specific stuff is only > handled in functions called from the drivers themselves, which already > know that they are dealing with pci resources. This is an important note as it allows to (almost) avoid touching the current drivers. > > > Later in the life of the application, ethdev api is called for hotplug. > > int rte_eth_dev_attach(const char *devargs, uint8_t *port_id); > > A devargs is used to identify a vdev/pdev driver and call it to create a > new rte_eth_dev. > Current code goes as far as parsing devargs to understand if this is a > pci device or a vdev. > This part should be moved to eal since this is where all the "bus" logic > is. Parsing of devargs is quite wierd - I mean whitelisting and blacklisting. At the moment, it guesses whether the given argument is a PCI device identification or an arbitrary string (vdev). It is not easy to extend this reliably. OK, I can see you are addressing this below. > > > > So now, what I had in mind is something like this. > It is far from perfect and I certainly did some shortcuts in my > reasoning. > > > Generic device/driver > > - introduce a rte_device structure, > - a rte_device has a name, that identifies it in a unique way across > all buses, maybe something like pci:0000:00:01.0, and for vdev, > vdev:name Having a prefix is good but would this break the user API? Is the name exposed to users? > - a rte_device references a rte_driver, > - a rte_device references devargs > - a rte_device embeds a intr_handle > - rte_device objects are created by "buses" > - a function to retrieve rte_device objects based on name is added > > - current rte_driver does not need to know about the pmd_type > (pdev/vdev), this is only a way to order drivers init in eal, we could > use the rte_driver names to order them or maybe remove this ordering What is the main reason to have pdev/vdev distinction here? After I register the driver, does eal really need to know the pmd_type? Moreover, there is no way how to pass arguments to pdevs, only to vdevs. This was shortly disscussed in [2] for the szedata2 PMD. [2] http://dpdk.org/ml/archives/dev/2015-October/026285.html > - init and uninit functions are changed to take a pointer to a > rte_device > > rte_device and rte_driver structures are at the "bus" level. > Those are the basic structures we will build the other objects on. > > > Impact on PCI device/driver > > - rte_pci_device is modified to embed a rte_device (embedding makes it > possible later to cast the rte_device and get the rte_pci_device in pci > specific functions) > - no need for a rte_pci_driver reference in rte_pci_device, since we > have the rte_device driver I've noticed that DPDK does not use any kind of the container_of macro like the Linux Kernel does. Instead, some considerations like the rte_pci_driver MUST be placed at the beginning of the eth_driver (if I am not mistaken). Here, it should be considered to do it another way. > > - rte_pci_driver is modified to embed a rte_driver > - no more devinit and devuninit functions in rte_pci_driver, they can > be moved as init / uninit functions in rte_driver > > - pci scan code creates rte_pci_device objects, associates them to > rte_pci_driver, fills the driver field of the rte_driver then pass > them to rte_driver init function. > > rte_pci_device and rte_pci_driver are specific implementation of > rte_device and rte_driver. > There are there to maintain pci private methods, create upper layer > objects (ethdev / crypto) etc.. > > > Impact on vdev > > - introduce a rte_vdev_driver structure > - a rte_vdev_driver embeds a rte_driver > - a rte_vdev_driver has a priv size for vdev objects creation > > - no need for a vdev device object, this is specific to vdev drivers > > - eal init code associates devargs to vdev drivers, creates a > rte_device object (using the priv size), fills the driver field then > pass them to rte_driver init function. > > > Impact on ethdev > > - a rte_eth_dev object references a rte_device in place of > rte_pci_device > - no more information about a driver in rte_eth_dev, this is the > rte_device that has a reference to its rte_driver > - eth_driver can be removed, it is only a wrapper of a rte_pci_driver > at the moment, maybe some init function wrappers can stay in ethdev > with dev_private_size to be handled in the rte_driver > - rte_eth_dev objects are created by rte_drivers during probe (pci > scan, vdev init, hotplug) > - ethdev ops are populated by rte_drivers > > > Impact on hotplug > > - hotplug moves from ethdev to eal > - a notification framework is added to ethdev when hotplugging This seems to be a good idea. I was suprised to find such code in librte_ether... > > - eal uses the name (remember the pci:0000:00:01.0 / vdev:name > example) in devargs to identify the right bus (pci / vdev) > - pci scan code is reused to create a rte_pci_device object etc... > - vdev init code is reused > > > We end up with something like this. > An arrow means that the structure contains a pointer to an object of the > other struct. > I only wrote the fields I mentioned in this mail, for pci device a lot > of other fields are omitted. > > - for ethdev on top of pci devices > > +------------------+ +-------------------------------+ > | | | | > | rte_pci_device | | rte_pci_driver | > | | | | > +-------------+ | +--------------+ | | +---------------------------+ | > | | | | | | | | | | > | rte_eth_dev +---> rte_device +-----> rte_driver | | > | | | | char name[] | | | | char name[] | | > +-------------+ | | | | | | int init(rte_device *) | | > | +--------------+ | | | int uninit(rte_device *) | | > | | | | | | > +------------------+ | +---------------------------+ | > | | > +-------------------------------+ > > - for ethdev on top of vdev devices > > +------------------+ +-------------------------------+ > | | | | > | drv specific | | rte_vdev_driver | > | | | | > +-------------+ | +--------------+ | | +---------------------------+ | > | | | | | | | | | | > | rte_eth_dev +---> rte_device +-----> rte_driver | | > | | | | char name[] | | | | char name[] | | > +-------------+ | | | | | | int init(rte_device *) | | > | +--------------+ | | | int uninit(rte_device *) | | > | | | | | | > +------------------+ | +---------------------------+ | > | | > | int priv_size | > | | > +-------------------------------+ > > > Thanks for reading. > Comments ? > > -- Jan Viktorin E-mail: Viktorin@RehiveTech.com System Architect Web: www.RehiveTech.com RehiveTech Brno, Czech Republic ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [dpdk-dev] Proposal for a big eal / ethdev cleanup 2016-01-14 11:46 ` Jan Viktorin @ 2016-01-16 15:53 ` David Marchand 2016-01-18 15:49 ` Thomas Monjalon 0 siblings, 1 reply; 9+ messages in thread From: David Marchand @ 2016-01-16 15:53 UTC (permalink / raw) To: Jan Viktorin; +Cc: dev Hello Jan, On Thu, Jan 14, 2016 at 12:46 PM, Jan Viktorin <viktorin@rehivetech.com> wrote: > On Thu, 14 Jan 2016 11:38:16 +0100 > David Marchand <david.marchand@6wind.com> wrote: > >> Hello all, >> >> Here is a proposal of a big cleanup in ethdev (cryptodev would have to >> follow) and eal structures. >> This is something I wanted to do for quite some time and the arrival of >> a new bus makes me think we need it. >> >> This is an alternative to what Jan proposed [1]. > > As I need to extend DPDK by a non-PCI bus system, I would prefer any such > working solution :). By [1], you probably mean: > > [1] http://comments.gmane.org/gmane.comp.networking.dpdk.devel/30973 > > (I didn't find it in the e-mail.) Thought I put the reference after my signature. Anyway, yes, I was referring to your thread. >> ABI is most likely broken with this, but I think this discussion can come later. > > I was trying in my initial approach to stay as much API and ABI backwards > compatible as possible to be acceptable into upstream. As just a few > people have shown their interest in these changes, I consider the ABI > compatibility very important. > > I can see, that your approach does not care too much... Otherwise, it looks > good to me. It is closer to the Linux drivers infra, so it seems to be clearer > then the current one. I did this on purpose. >From my point of view, we will have an API/ABI breakage in this code at one point. So I sent this mail to show where I'd like us to go, because this looks saner on the mid/long term. >> First some context on how dpdk is initialized at the moment : >> >> Let's imagine a system with one ixgbe pci device and take some >> part of ixgbe driver as an example. >> >> static struct eth_driver rte_ixgbe_pmd = { >> .pci_drv = { >> .name = "rte_ixgbe_pmd", >> .id_table = pci_id_ixgbe_map, >> .drv_flags = RTE_PCI_DRV_NEED_MAPPING | >> RTE_PCI_DRV_INTR_LSC | RTE_PCI_DRV_DETACHABLE, >> }, >> .eth_dev_init = eth_ixgbe_dev_init, >> .eth_dev_uninit = eth_ixgbe_dev_uninit, >> .dev_private_size = sizeof(struct ixgbe_adapter), >> }; > > Note, that the biggest issue here is that the eth_driver has no way to > distinguish among PCI or other subsystem. There is no field that helps > the generic ethdev code (librte_ether) to decide what bus we are on > (and it needs to know it in the current DPDK). > > Another point is that the ethdev infra should not know about the > underlying bus infra. The question is whether we do a big clean > up (extract PCI-bus code out) or a small clean up (give the ethdev > infra a hint which bus system it deals with). Yes, and I think these two choices are reflected by our two respective proposals :-) >> So now, what I had in mind is something like this. >> It is far from perfect and I certainly did some shortcuts in my >> reasoning. >> >> >> Generic device/driver >> >> - introduce a rte_device structure, >> - a rte_device has a name, that identifies it in a unique way across >> all buses, maybe something like pci:0000:00:01.0, and for vdev, >> vdev:name > > Having a prefix is good but would this break the user API? Is the > name exposed to users? Maybe define new apis, and wrap the old one in it ? Not too sure, I need to experiment. >> - current rte_driver does not need to know about the pmd_type >> (pdev/vdev), this is only a way to order drivers init in eal, we could >> use the rte_driver names to order them or maybe remove this ordering > > What is the main reason to have pdev/vdev distinction here? After I > register the driver, does eal really need to know the pmd_type? Already did some cleanup in the past because of pmd_type : http://dpdk.org/browse/dpdk/commit/?id=78aecefed955917753bfb6f44ae970dde4c652d0 http://dpdk.org/browse/dpdk/commit/?id=6bc2415c3387ae72f2ce3677f0e3540e734583d5 For me, there is no reason but the init ordering (vdevs come before pdevs), and I am pretty sure we don't need this for ethdev. > Moreover, there is no way how to pass arguments to pdevs, only to > vdevs. This was shortly disscussed in [2] for the szedata2 PMD. > > [2] http://dpdk.org/ml/archives/dev/2015-October/026285.html Shorty discussed with Thomas, yes. But after some experiments, it appears that you can pass devargs after a whitelist option at init (--pci-whitelist xxxx:xx:xx.x,whataniceoptionwehavehere=1). This does not work for hotplug. This is undocumented and this looks like a workaround, so I would prefer we come up with a better api for 2.3. >> Impact on PCI device/driver >> >> - rte_pci_device is modified to embed a rte_device (embedding makes it >> possible later to cast the rte_device and get the rte_pci_device in pci >> specific functions) >> - no need for a rte_pci_driver reference in rte_pci_device, since we >> have the rte_device driver > > I've noticed that DPDK does not use any kind of the container_of macro > like the Linux Kernel does. Instead, some considerations like the > rte_pci_driver MUST be placed at the beginning of the eth_driver (if I > am not mistaken). Here, it should be considered to do it another way. Ah, yes, we could use this. Just did not think of using it :-) Regards, -- David Marchand ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [dpdk-dev] Proposal for a big eal / ethdev cleanup 2016-01-16 15:53 ` David Marchand @ 2016-01-18 15:49 ` Thomas Monjalon 0 siblings, 0 replies; 9+ messages in thread From: Thomas Monjalon @ 2016-01-18 15:49 UTC (permalink / raw) To: David Marchand; +Cc: dev, Jan Viktorin 2016-01-16 16:53, David Marchand: > On Thu, Jan 14, 2016 at 12:46 PM, Jan Viktorin <viktorin@rehivetech.com> wrote: > > On Thu, 14 Jan 2016 11:38:16 +0100 > > David Marchand <david.marchand@6wind.com> wrote: > >> Here is a proposal of a big cleanup in ethdev (cryptodev would have to > >> follow) and eal structures. [...] > >> ABI is most likely broken with this, but I think this discussion can come later. > > > > I was trying in my initial approach to stay as much API and ABI backwards > > compatible as possible to be acceptable into upstream. As just a few > > people have shown their interest in these changes, I consider the ABI > > compatibility very important. > > > > I can see, that your approach does not care too much... Otherwise, it looks > > good to me. It is closer to the Linux drivers infra, so it seems to be clearer > > then the current one. > > I did this on purpose. > From my point of view, we will have an API/ABI breakage in this code > at one point. > So I sent this mail to show where I'd like us to go, because this > looks saner on the mid/long term. [...] > > Another point is that the ethdev infra should not know about the > > underlying bus infra. The question is whether we do a big clean > > up (extract PCI-bus code out) or a small clean up (give the ethdev > > infra a hint which bus system it deals with). > > Yes, and I think these two choices are reflected by our two respective > proposals :-) Regarding the API/ABI breaks and how the big the changes must be, I'd say it is nice to have some changes without breaking the user interfaces. Though sometimes there is a great value to refactor things and break them. In such case, it is better to do the big changes at once so the breaking would impact only one version instead of breaking the same API again and again. About this proposal, I vote for: +1 [...] > > Moreover, there is no way how to pass arguments to pdevs, only to > > vdevs. This was shortly disscussed in [2] for the szedata2 PMD. > > > > [2] http://dpdk.org/ml/archives/dev/2015-October/026285.html > > Shorty discussed with Thomas, yes. > But after some experiments, it appears that you can pass devargs after > a whitelist option at init (--pci-whitelist > xxxx:xx:xx.x,whataniceoptionwehavehere=1). > This does not work for hotplug. > This is undocumented and this looks like a workaround, so I would > prefer we come up with a better api for 2.3. Yes we need also to redefine the command line syntax to have a generic way of passing per-device parameters to the drivers. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [dpdk-dev] Proposal for a big eal / ethdev cleanup 2016-01-14 10:38 [dpdk-dev] Proposal for a big eal / ethdev cleanup David Marchand 2016-01-14 11:46 ` Jan Viktorin @ 2016-01-18 14:54 ` Declan Doherty 2016-01-18 15:45 ` David Marchand [not found] ` <20160118155834.04cb31f2@pcviktorin.fit.vutbr.cz> 2 siblings, 1 reply; 9+ messages in thread From: Declan Doherty @ 2016-01-18 14:54 UTC (permalink / raw) To: David Marchand, dev; +Cc: Jan Viktorin On 14/01/16 10:38, David Marchand wrote: > Hello all, > > Here is a proposal of a big cleanup in ethdev (cryptodev would have to > follow) and eal structures. > This is something I wanted to do for quite some time and the arrival of > a new bus makes me think we need it. > > This is an alternative to what Jan proposed [1]. > > ABI is most likely broken with this, but I think this discussion can come later. > > > First some context on how dpdk is initialized at the moment : > > Let's imagine a system with one ixgbe pci device and take some > part of ixgbe driver as an example. > > static struct eth_driver rte_ixgbe_pmd = { > .pci_drv = { > .name = "rte_ixgbe_pmd", > .id_table = pci_id_ixgbe_map, > .drv_flags = RTE_PCI_DRV_NEED_MAPPING | > RTE_PCI_DRV_INTR_LSC | RTE_PCI_DRV_DETACHABLE, > }, > .eth_dev_init = eth_ixgbe_dev_init, > .eth_dev_uninit = eth_ixgbe_dev_uninit, > .dev_private_size = sizeof(struct ixgbe_adapter), > }; > > static int > rte_ixgbe_pmd_init(const char *name __rte_unused, const char *params > __rte_unused) > { > PMD_INIT_FUNC_TRACE(); > rte_eth_driver_register(&rte_ixgbe_pmd); > return 0; > } > > static struct rte_driver rte_ixgbe_driver = { > .type = PMD_PDEV, > .init = rte_ixgbe_pmd_init, > }; > > PMD_REGISTER_DRIVER(rte_ixgbe_driver) > > > DPDK initialisation goes as follows (focusing on ixgbe driver): > > PMD_REGISTER_DRIVER(rte_ixgbe_driver) which adds it to dev_driver_list > > rte_eal_init() > -> rte_eal_pci_init() > -> rte_eal_pci_scan() which fills pci_device_list > > -> rte_eal_dev_init() > -> for each rte_driver (first vdev, then pdev), call driver->init() > so here rte_ixgbe_pmd_init(NULL, NULL) > -> rte_eth_driver_register(&rte_ixgbe_pmd); > -> fills rte_ixgbe_pmd.pci_drv.devinit = rte_eth_dev_init > -> call rte_eal_pci_register() which adds it to pci_driver_list > > -> rte_eal_pci_probe() > -> for each rte_pci_device found in rte_eal_pci_scan(), and for all > rte_pci_driver registered, call devinit(dr, dev), > so here rte_eth_dev_init(dr, dev) > -> creates a new eth_dev (which is a rte_eth_dev), then adds > reference to passed dev pointer (which is a rte_pci_device), to > passed dr pointer (which is a rte_pci_driver cast as a eth_driver) > -> call eth_drv->eth_dev_init(eth_dev) > so here eth_ixgbe_dev_init(eth_dev) > -> fills other parts of eth_dev > -> rte_eth_copy_pci_info(eth_dev, pci_dev) > > > By the way, when invoking ethdev init, the pci-specific stuff is only > handled in functions called from the drivers themselves, which already > know that they are dealing with pci resources. > > > Later in the life of the application, ethdev api is called for hotplug. > > int rte_eth_dev_attach(const char *devargs, uint8_t *port_id); > > A devargs is used to identify a vdev/pdev driver and call it to create a > new rte_eth_dev. > Current code goes as far as parsing devargs to understand if this is a > pci device or a vdev. > This part should be moved to eal since this is where all the "bus" logic > is. > > > > So now, what I had in mind is something like this. > It is far from perfect and I certainly did some shortcuts in my > reasoning. > > > Generic device/driver > > - introduce a rte_device structure, > - a rte_device has a name, that identifies it in a unique way across > all buses, maybe something like pci:0000:00:01.0, and for vdev, > vdev:name > - a rte_device references a rte_driver, > - a rte_device references devargs > - a rte_device embeds a intr_handle > - rte_device objects are created by "buses" > - a function to retrieve rte_device objects based on name is added > > - current rte_driver does not need to know about the pmd_type > (pdev/vdev), this is only a way to order drivers init in eal, we could > use the rte_driver names to order them or maybe remove this ordering > - init and uninit functions are changed to take a pointer to a > rte_device > > rte_device and rte_driver structures are at the "bus" level. > Those are the basic structures we will build the other objects on. > > > Impact on PCI device/driver > > - rte_pci_device is modified to embed a rte_device (embedding makes it > possible later to cast the rte_device and get the rte_pci_device in pci > specific functions) > - no need for a rte_pci_driver reference in rte_pci_device, since we > have the rte_device driver > > - rte_pci_driver is modified to embed a rte_driver > - no more devinit and devuninit functions in rte_pci_driver, they can > be moved as init / uninit functions in rte_driver > > - pci scan code creates rte_pci_device objects, associates them to > rte_pci_driver, fills the driver field of the rte_driver then pass > them to rte_driver init function. > > rte_pci_device and rte_pci_driver are specific implementation of > rte_device and rte_driver. > There are there to maintain pci private methods, create upper layer > objects (ethdev / crypto) etc.. > > > Impact on vdev > > - introduce a rte_vdev_driver structure > - a rte_vdev_driver embeds a rte_driver > - a rte_vdev_driver has a priv size for vdev objects creation > > - no need for a vdev device object, this is specific to vdev drivers > > - eal init code associates devargs to vdev drivers, creates a > rte_device object (using the priv size), fills the driver field then > pass them to rte_driver init function. > > > Impact on ethdev > > - a rte_eth_dev object references a rte_device in place of > rte_pci_device > - no more information about a driver in rte_eth_dev, this is the > rte_device that has a reference to its rte_driver > - eth_driver can be removed, it is only a wrapper of a rte_pci_driver > at the moment, maybe some init function wrappers can stay in ethdev > with dev_private_size to be handled in the rte_driver > - rte_eth_dev objects are created by rte_drivers during probe (pci > scan, vdev init, hotplug) > - ethdev ops are populated by rte_drivers > > > Impact on hotplug > > - hotplug moves from ethdev to eal > - a notification framework is added to ethdev when hotplugging > > - eal uses the name (remember the pci:0000:00:01.0 / vdev:name > example) in devargs to identify the right bus (pci / vdev) > - pci scan code is reused to create a rte_pci_device object etc... > - vdev init code is reused > > > We end up with something like this. > An arrow means that the structure contains a pointer to an object of the > other struct. > I only wrote the fields I mentioned in this mail, for pci device a lot > of other fields are omitted. > > - for ethdev on top of pci devices > > +------------------+ +-------------------------------+ > | | | | > | rte_pci_device | | rte_pci_driver | > | | | | > +-------------+ | +--------------+ | | +---------------------------+ | > | | | | | | | | | | > | rte_eth_dev +---> rte_device +-----> rte_driver | | > | | | | char name[] | | | | char name[] | | > +-------------+ | | | | | | int init(rte_device *) | | > | +--------------+ | | | int uninit(rte_device *) | | > | | | | | | > +------------------+ | +---------------------------+ | > | | > +-------------------------------+ > > - for ethdev on top of vdev devices > > +------------------+ +-------------------------------+ > | | | | > | drv specific | | rte_vdev_driver | > | | | | > +-------------+ | +--------------+ | | +---------------------------+ | > | | | | | | | | | | > | rte_eth_dev +---> rte_device +-----> rte_driver | | > | | | | char name[] | | | | char name[] | | > +-------------+ | | | | | | int init(rte_device *) | | > | +--------------+ | | | int uninit(rte_device *) | | > | | | | | | > +------------------+ | +---------------------------+ | > | | > | int priv_size | > | | > +-------------------------------+ > > > Thanks for reading. > Comments ? > > Hey David, form the work we done on the cryptodev I definitely agree this is something we need to look at. There is some duplication of code between ethdev and cryptodev which would disappear with a common rte_device abstraction. I just haven't had time to write anything down and didn't want to further complicate the cryptodev patches by introducing EAL changes with it as well. In your proposal above, having an bus type enumeration in the rte_device which specifies the bus type might be simpler than having to parse a specific name formatting. Moving the hot-plugging infrastructure to the EAL would be great as this is an element we didn't address in the cryptodev layer as currently we would need to re-implement a lot of the same logic which is done in lib_ethdev. Again this was something on our list of things to look at. One thing that we are working on at the moment and it will affect your proposed solution above quite a lot is the ability to support devices with share-able buses in DPDK, we will have a RFC for this proposal in the next few days but I'll give a quick overview now. The Quick Assist device which we currently support a symmetric crypto cryptodev PMD for in DPDK is a mufti-function device, in that it supports symmetric and asymmetric crypto and compression functionality. These features are supported from a single device instance through different hardware queues. We want to provide each service through as a separate dpdk device but with and underlying shared bus (in our case PCI), this way a device/queue will only provide 1 service type. What we are going to propose is to allow a rte_pci_device to be shared my multiple pdevs, to do this we would register multiple drivers against the same pci_id and with a small modification to the EAL rte_eal_pci_probe_all()/ rte_eal_pci_probe() functions we create a device instance for each matched driver as long as the diver has a share-able flag. So with the current DPDK architecture where the rte_cryptodev (also the same for the rte_eth_dev) has a pointer from the rte_pci_driver to the rte_pci_device and a back pointer from the rte_pci_device to the rte_pci_driver, we are proposing to remove the back pointer from rte_pci_device and then allow multiple rte_pci_driver. This will require an API change to the PCI device uninit function. We'll outline all of this in the RFC, but I just wanted to make you aware of it, as something to keep in mind. Our solution will logically look something a bit like below. +-----------------------+ +------------------------+ | qat_sym_pmd | | qat_asym_pmd | +-----------------------+ +------------------------+ | | V V +-----------------------+ +------------------------+ | rte_cryptodev | | rte_cryptodev | +-----------------------+ +------------------------+ | | V V +----------------+ +----------------+ | rte_pci_driver | | rte_pci_driver | +----------------+ +----------------+ ^ | | ^ | ------------------------ | / | / / V / | +----------------+ | ---------| rte_pci_device |--------- +----------------+ I guess the main change from your proposal to allow similar functionality would be that dependency chain is reversed with the rte_driver having the reference to the rte_device which could be shared by multiple drivers. +----------------------------+ +----------------+ | | | | | drv specific | | bus specific | | | | | +-------------+ |+--------------------------+| |+--------------+| | | || || || || | rte_eth_dev +--> rte_driver +--+--> rte_device || | | || char name[] || | || char name[] || +-------------+ || int init(rte_device *) || | || bool shared || || int uninit(rte_device *)|| | || mutex lock || || || | || || |+--------------------------+| | |+--------------+| | | | +----------------+ +----------------------------+ | | | +----------------------------+ | | | | | drv specific | | | | | +-------------+ |+--------------------------+| | | | || || | | rte_eth_dev +--> rte_driver +--+ | | || char name[] || +-------------+ || int init(rte_device *) || || int uninit(rte_device *) || || || |+--------------------------+| | | +----------------------------+ ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [dpdk-dev] Proposal for a big eal / ethdev cleanup 2016-01-18 14:54 ` Declan Doherty @ 2016-01-18 15:45 ` David Marchand 0 siblings, 0 replies; 9+ messages in thread From: David Marchand @ 2016-01-18 15:45 UTC (permalink / raw) To: Declan Doherty; +Cc: dev, Jan Viktorin Hello Declan, On Mon, Jan 18, 2016 at 3:54 PM, Declan Doherty <declan.doherty@intel.com> wrote: > In your proposal above, having an bus type enumeration in the rte_device > which specifies the bus type might be simpler than having to parse a > specific name formatting. This is not simpler. This is useless afaics. "upper" / "functional" layer should not rely on the "lower" / "physical" layer information. Your crypto device should be described through a struct with crypto ops functions. When registering the device, the (whatever bus) driver should register a crypto device with its own crypto ops. The only place I can think of, where parsing a resource name would have an interest, is when crossing borders dpdk <-> user application. This is why I proposed a naming convention to identify the devices. > One thing that we are working on at the moment and it will affect your > proposed solution above quite a lot is the ability to support devices with > share-able buses in DPDK, we will have a RFC for this proposal in the next > few days but I'll give a quick overview now. The Quick Assist device which > we currently support a symmetric crypto cryptodev PMD for in DPDK is a > mufti-function device, in that it supports symmetric and asymmetric crypto > and compression functionality. These features are supported from a single > device instance through different hardware queues. We want to provide each > service through as a separate dpdk device but with and underlying shared bus > (in our case PCI), this way a device/queue will only provide 1 service type. > What we are going to propose is to allow a rte_pci_device to be shared my > multiple pdevs, to do this we would register multiple drivers against the > same pci_id and with a small modification to the EAL > rte_eal_pci_probe_all()/ rte_eal_pci_probe() functions we create a device > instance for each matched driver as long as the diver has a share-able flag. >From the description you give, it sounds to me you want to expose two crypto devices that share the same pci device but I can't see where my proposed solution can not work. eal finds one pci device, associates it to one pci driver which then creates two different crypto devices (with their own specific logic). Those crypto devices references a generic rte_device (which happens to be a pci device) referencing a generic rte_driver. So to me, this is pure specific driver/crypto stuff, nothing to do in eal. What did I miss ? Regards, -- David Marchand ^ permalink raw reply [flat|nested] 9+ messages in thread
[parent not found: <20160118155834.04cb31f2@pcviktorin.fit.vutbr.cz>]
* Re: [dpdk-dev] Proposal for a big eal / ethdev cleanup [not found] ` <20160118155834.04cb31f2@pcviktorin.fit.vutbr.cz> @ 2016-01-18 21:11 ` David Marchand 2016-01-19 10:29 ` Jan Viktorin 0 siblings, 1 reply; 9+ messages in thread From: David Marchand @ 2016-01-18 21:11 UTC (permalink / raw) To: Jan Viktorin; +Cc: dev Jan, I was waiting for some others feedbacks before going into the code. Glad to see you already tried this. On Mon, Jan 18, 2016 at 3:58 PM, Jan Viktorin <viktorin@rehivetech.com> wrote: > On Thu, 14 Jan 2016 11:38:16 +0100 > David Marchand <david.marchand@6wind.com> wrote: >> - no need for a rte_pci_driver reference in rte_pci_device, since we >> have the rte_device driver > > This is an issue, see below. > >> >> - rte_pci_driver is modified to embed a rte_driver > > The rte_driver and rte_pci_driver are related in a much different way > at the moment. The meaning of rte_driver is more like an rte_module in > the current DPDK. > > In fact, we don't have any generic rte_driver suitable for this purpose. > Thus, the transition to this model needs to rename rte_driver to > rte_module and to introduce a new data structure named rte_driver. > > Quite confusing... but this is how I understand it. Hum, yes. Well, looking at current rte_driver, this code has been first thought as a way to load pmd through dso, so yes, this is more like module init. Then the hotplug has been hooked on this, adding to the confusion. > (What is the current relation between rte_pci_device and rte_pci_driver? > Is the rte_pci_driver a singleton? I doubt. Well, it cannot be, as it > is embedded in each eth_driver.) Not sure I understand the question. At the moment, a rte_pci_device references a rte_pci_driver. Associating those happens at pci "probe" time lib/librte_eal/common/eal_common_pci.c +202 I agree there is a pci_driver embedded in eth_driver, but that does not mean pci drivers must be eth drivers. > Another way, not that beautiful... Introduce rte_generic_driver and > rte_generic_device. (Or rte_gen_driver/rte_gen_device or > rte_bus_driver/rte_bus_device if you want). This enables to let the > rte_driver as it is and it avoids a lot of quite terrible transition > patches that can break everything. > >> - no more devinit and devuninit functions in rte_pci_driver, they can >> be moved as init / uninit functions in rte_driver > > The rte_driver has init/uninit already and its semantics seem to be > module_init and module_uninit. Ok, so what you propose is something like this ? - keep rte_driver as it is (init and uninit), I would say the name can be changed later. - add rte_bus_driver (idem, not sure it is a good name) in place of the rte_driver I mentioned in my initial mail. Rather than have init / uninit, how about attach / detach methods ? Regards, -- David Marchand ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [dpdk-dev] Proposal for a big eal / ethdev cleanup 2016-01-18 21:11 ` David Marchand @ 2016-01-19 10:29 ` Jan Viktorin 2016-01-19 10:59 ` David Marchand 0 siblings, 1 reply; 9+ messages in thread From: Jan Viktorin @ 2016-01-19 10:29 UTC (permalink / raw) To: David Marchand; +Cc: dev On Mon, 18 Jan 2016 22:11:56 +0100 David Marchand <david.marchand@6wind.com> wrote: > Jan, > > I was waiting for some others feedbacks before going into the code. > Glad to see you already tried this. Of course... I think, it's better to have a particular code (if possible) to talk about ;). It is quite difficult to see all the impacts. I hope to see more people to join this discussion. > > > On Mon, Jan 18, 2016 at 3:58 PM, Jan Viktorin <viktorin@rehivetech.com> wrote: > > On Thu, 14 Jan 2016 11:38:16 +0100 > > David Marchand <david.marchand@6wind.com> wrote: > >> - no need for a rte_pci_driver reference in rte_pci_device, since we > >> have the rte_device driver > > > > This is an issue, see below. > > > >> > >> - rte_pci_driver is modified to embed a rte_driver > > > > The rte_driver and rte_pci_driver are related in a much different way > > at the moment. The meaning of rte_driver is more like an rte_module in > > the current DPDK. > > > > In fact, we don't have any generic rte_driver suitable for this purpose. > > Thus, the transition to this model needs to rename rte_driver to > > rte_module and to introduce a new data structure named rte_driver. > > > > Quite confusing... but this is how I understand it. > > Hum, yes. > Well, looking at current rte_driver, this code has been first thought > as a way to load pmd through dso, so yes, this is more like module > init. > Then the hotplug has been hooked on this, adding to the confusion. > > > > (What is the current relation between rte_pci_device and rte_pci_driver? > > Is the rte_pci_driver a singleton? I doubt. Well, it cannot be, as it > > is embedded in each eth_driver.) > > Not sure I understand the question. I was just thinking loudly. This note was not very important. It was a part of my confusion. Result: rte_driver is semantically very different from rte_pci_driver. > > At the moment, a rte_pci_device references a rte_pci_driver. > Associating those happens at pci "probe" time > lib/librte_eal/common/eal_common_pci.c +202 > > I agree there is a pci_driver embedded in eth_driver, but that does > not mean pci drivers must be eth drivers. > > > > Another way, not that beautiful... Introduce rte_generic_driver and > > rte_generic_device. (Or rte_gen_driver/rte_gen_device or > > rte_bus_driver/rte_bus_device if you want). This enables to let the > > rte_driver as it is and it avoids a lot of quite terrible transition > > patches that can break everything. > > > >> - no more devinit and devuninit functions in rte_pci_driver, they can > >> be moved as init / uninit functions in rte_driver > > > > The rte_driver has init/uninit already and its semantics seem to be > > module_init and module_uninit. > > Ok, so what you propose is something like this ? I've expressed my basic understanding of this topic in the RFC patch set yesterday (as you know). > > - keep rte_driver as it is (init and uninit), I would say the name can > be changed later. Agreed. > - add rte_bus_driver (idem, not sure it is a good name) in place of > the rte_driver I mentioned in my initial mail. I don't like the name either. I have no other idea at the moment. > Rather than have init / uninit, how about attach / detach methods ? You mean attach a driver to a device? Yes, much better. And what about probe? I was quite confused when writing a PMD as I couldn't understand clearly where should I start touching the hardware. Regards Jan > > > Regards, -- Jan Viktorin E-mail: Viktorin@RehiveTech.com System Architect Web: www.RehiveTech.com RehiveTech Brno, Czech Republic ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [dpdk-dev] Proposal for a big eal / ethdev cleanup 2016-01-19 10:29 ` Jan Viktorin @ 2016-01-19 10:59 ` David Marchand 0 siblings, 0 replies; 9+ messages in thread From: David Marchand @ 2016-01-19 10:59 UTC (permalink / raw) To: Jan Viktorin; +Cc: dev Jan, On Tue, Jan 19, 2016 at 11:29 AM, Jan Viktorin <viktorin@rehivetech.com> wrote: > On Mon, 18 Jan 2016 22:11:56 +0100 > David Marchand <david.marchand@6wind.com> wrote: >> Ok, so what you propose is something like this ? > > I've expressed my basic understanding of this topic in the RFC patch set > yesterday (as you know). > >> >> - keep rte_driver as it is (init and uninit), I would say the name can >> be changed later. > > Agreed. > >> - add rte_bus_driver (idem, not sure it is a good name) in place of >> the rte_driver I mentioned in my initial mail. > > I don't like the name either. I have no other idea at the moment. My initial intention was to go as far as possible with the approach I described without caring about the api / abi. Then if the result is worth, see how we could maintain the api / abi and how to manage the changes if not possible. So please, do not hesitate to break stuff. >> Rather than have init / uninit, how about attach / detach methods ? > > You mean attach a driver to a device? Yes, much better. And what about > probe? I was quite confused when writing a PMD as I couldn't understand > clearly where should I start touching the hardware. Yes, I also thought of probe name, but then for unplugging ? We could use the same names as linux kernel probe/remove ? I think freebsd kernel uses the same, so why not. Regards, -- David Marchand ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2016-01-19 11:00 UTC | newest] Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2016-01-14 10:38 [dpdk-dev] Proposal for a big eal / ethdev cleanup David Marchand 2016-01-14 11:46 ` Jan Viktorin 2016-01-16 15:53 ` David Marchand 2016-01-18 15:49 ` Thomas Monjalon 2016-01-18 14:54 ` Declan Doherty 2016-01-18 15:45 ` David Marchand [not found] ` <20160118155834.04cb31f2@pcviktorin.fit.vutbr.cz> 2016-01-18 21:11 ` David Marchand 2016-01-19 10:29 ` Jan Viktorin 2016-01-19 10:59 ` David Marchand
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).