* [dpdk-dev] [PATCH 0/2] Fixes on IOVA mode selection
From: David Marchand @ 2019-07-10 21:48 UTC
To: dev
Cc: anatoly.burakov, jerinj, thomas

Following the issues reported by Jerin and the discussion that emerged
from it, here are fixes to restore and document the behavior of the EAL
and the pci bus driver.

I pondered all the arguments and tried to make as few changes as possible.
I can't find a need for a flag that just announces support of physical
addresses from the pmd point of view.
So it ended up with something really close to what Jerin had suggested.

But the problem is that this is still unfinished wrt the documentation.
I will be offline for 10 days and we need this to move forward, so I am
sending it anyway.

TODO on the second patch:
- split it (?),
- add documentation on PCI bus considerations,
- add more rationale on RTE_IOVA_DC in the commitlog and the documentation,
- fix the remaining bugs (hopefully, none),

-- 
David Marchand

David Marchand (2):
  Revert "bus/pci: add Mellanox kernel driver type"
  eal: fix IOVA mode selection as VA for pci drivers

 doc/guides/prog_guide/env_abstraction_layer.rst | 31 +++++++++++++++++++++++++
 drivers/bus/pci/linux/pci.c | 24 +++++--------------
 drivers/bus/pci/pci_common.c | 30 +++++++++++++++++++-----
 drivers/bus/pci/rte_bus_pci.h | 4 ++--
 drivers/net/atlantic/atl_ethdev.c | 3 +--
 drivers/net/bnxt/bnxt_ethdev.c | 3 +--
 drivers/net/e1000/em_ethdev.c | 3 +--
 drivers/net/e1000/igb_ethdev.c | 5 ++--
 drivers/net/enic/enic_ethdev.c | 3 +--
 drivers/net/fm10k/fm10k_ethdev.c | 3 +--
 drivers/net/i40e/i40e_ethdev.c | 3 +--
 drivers/net/i40e/i40e_ethdev_vf.c | 2 +-
 drivers/net/iavf/iavf_ethdev.c | 3 +--
 drivers/net/ice/ice_ethdev.c | 3 +--
 drivers/net/ixgbe/ixgbe_ethdev.c | 5 ++--
 drivers/net/mlx4/mlx4.c | 3 +--
 drivers/net/mlx5/mlx5.c | 2 +-
 drivers/net/nfp/nfp_net.c | 6 ++---
 drivers/net/octeontx2/otx2_ethdev.c | 5 ----
 drivers/net/qede/qede_ethdev.c | 6 ++---
 drivers/raw/ioat/ioat_rawdev.c | 3 +--
 lib/librte_eal/common/eal_common_bus.c | 30 +++++++++++++++++++++---
 lib/librte_eal/common/include/rte_dev.h | 1 -
 23 files changed, 110 insertions(+), 71 deletions(-)

-- 
1.8.3.1
* [dpdk-dev] [PATCH 1/2] Revert "bus/pci: add Mellanox kernel driver type"
From: David Marchand @ 2019-07-10 21:48 UTC
To: dev
Cc: anatoly.burakov, jerinj, thomas

This reverts commit 0cb86518db57d35e0abc14d6703fad561a0310e2.

The pci bus now reports DC (don't care) when faced with a device bound to
an unknown driver and, in such a case, the IOVA mode is selected based on
physical address availability.

As a consequence, there is no reason for this special case for Mellanox
drivers.

Fixes: 703458e19c16 ("bus/pci: consider only usable devices for IOVA mode")

Signed-off-by: David Marchand <david.marchand@redhat.com>
---
 drivers/bus/pci/linux/pci.c | 8 --------
 lib/librte_eal/common/include/rte_dev.h | 1 -
 2 files changed, 9 deletions(-)

diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 33c8ea7..b12f10a 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -329,9 +329,6 @@
 			dev->kdrv = RTE_KDRV_IGB_UIO;
 		else if (!strcmp(driver, "uio_pci_generic"))
 			dev->kdrv = RTE_KDRV_UIO_GENERIC;
-		else if (!strcmp(driver, "mlx4_core") ||
-				!strcmp(driver, "mlx5_core"))
-			dev->kdrv = RTE_KDRV_NIC_MLX;
 		else
 			dev->kdrv = RTE_KDRV_UNKNOWN;
 	} else
@@ -591,11 +588,6 @@ enum rte_iova_mode
 		break;
 	}
 
-	case RTE_KDRV_NIC_MLX:
-		if ((pdrv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) == 0)
-			iova_mode = RTE_IOVA_PA;
-		break;
-
 	case RTE_KDRV_IGB_UIO:
 	case RTE_KDRV_UIO_GENERIC:
 		iova_mode = RTE_IOVA_PA;
diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
index 94829f6..c25e09e 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -63,7 +63,6 @@ enum rte_kernel_driver {
 	RTE_KDRV_VFIO,
 	RTE_KDRV_UIO_GENERIC,
 	RTE_KDRV_NIC_UIO,
-	RTE_KDRV_NIC_MLX,
 	RTE_KDRV_NONE,
 };
 
-- 
1.8.3.1
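After the Mellanox special case is removed, a device bound to mlx4_core or mlx5_core is treated like any other unknown kernel driver. The sketch below models that per-device classification in plain C. It assumes the post-series behavior stated in the commit message: an unknown driver reports DC unless the PMD sets the VA-only flag, which is the default branch introduced by patch 2/2. `iova_mode`, `kdrv`, `DRV_IOVA_AS_VA` and both functions are hypothetical local stand-ins, not the real DPDK definitions. The IOMMU address-width check done by the real code is also left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins mirroring enum rte_iova_mode / rte_kernel_driver;
 * these are NOT the DPDK headers. */
enum iova_mode { IOVA_DC = 0, IOVA_PA = 1 << 0, IOVA_VA = 1 << 1 };
enum kdrv { KDRV_UNKNOWN, KDRV_VFIO, KDRV_IGB_UIO, KDRV_UIO_GENERIC };

/* Hypothetical stand-in for RTE_PCI_DRV_IOVA_AS_VA ("driver only supports VA"). */
#define DRV_IOVA_AS_VA 0x0040u

/* Map the kernel driver name (as read from sysfs) to a driver type.
 * After the revert, mlx4_core/mlx5_core fall through to KDRV_UNKNOWN. */
static enum kdrv kdrv_from_name(const char *driver)
{
	assert(driver != NULL);
	if (strcmp(driver, "vfio-pci") == 0)
		return KDRV_VFIO;
	if (strcmp(driver, "igb_uio") == 0)
		return KDRV_IGB_UIO;
	if (strcmp(driver, "uio_pci_generic") == 0)
		return KDRV_UIO_GENERIC;
	return KDRV_UNKNOWN;
}

/* Per-device IOVA requirement, simplified from pci_device_iova_mode()
 * as changed by patch 2/2 (IOMMU width check omitted). */
static enum iova_mode device_iova_mode(enum kdrv kdrv, unsigned int drv_flags,
				       bool vfio_noiommu)
{
	enum iova_mode mode = IOVA_DC;

	switch (kdrv) {
	case KDRV_VFIO:
		if (vfio_noiommu)
			mode = IOVA_PA;   /* no IOMMU: only physical addresses work */
		else if (drv_flags & DRV_IOVA_AS_VA)
			mode = IOVA_VA;   /* driver cannot work with PA */
		break;
	case KDRV_IGB_UIO:
	case KDRV_UIO_GENERIC:
		mode = IOVA_PA;           /* UIO drivers never go through an IOMMU */
		break;
	default:
		if (drv_flags & DRV_IOVA_AS_VA)
			mode = IOVA_VA;
		break;                    /* otherwise: don't care */
	}
	return mode;
}
```

With this model, a device bound to mlx5_core and without the flag reports DC. That matches the commit message: the final mode is then left to the physical address availability check.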
* Re: [dpdk-dev] [EXT] [PATCH 1/2] Revert "bus/pci: add Mellanox kernel driver type"
From: Jerin Jacob Kollanukkaran @ 2019-07-16 10:37 UTC
To: David Marchand, dev
Cc: anatoly.burakov, thomas

> -----Original Message-----
> From: David Marchand <david.marchand@redhat.com>
> Sent: Thursday, July 11, 2019 3:19 AM
> To: dev@dpdk.org
> Cc: anatoly.burakov@intel.com; Jerin Jacob Kollanukkaran
> <jerinj@marvell.com>; thomas@monjalon.net
> Subject: [EXT] [PATCH 1/2] Revert "bus/pci: add Mellanox kernel driver type"
>
> This reverts commit 0cb86518db57d35e0abc14d6703fad561a0310e2.
>
> The pci bus now reports DC (don't care) when faced with a device bound to
> an unknown driver and, in such a case, the IOVA mode is selected based on
> physical address availability.
>
> As a consequence, there is no reason for this special case for Mellanox
> drivers.
>
> Fixes: 703458e19c16 ("bus/pci: consider only usable devices for IOVA mode")
>
> Signed-off-by: David Marchand <david.marchand@redhat.com>

Reviewed-by: Jerin Jacob <jerinj@marvell.com>
* [dpdk-dev] [PATCH 2/2] eal: fix IOVA mode selection as VA for pci drivers
From: David Marchand @ 2019-07-10 21:48 UTC
To: dev
Cc: anatoly.burakov, jerinj, thomas, John McNamara, Marko Kovacevic, Igor Russkikh, Pavel Belous, Ajit Khaparde, Somnath Kotur, Wenzhuo Lu, John Daley, Hyong Youb Kim, Qi Zhang, Xiao Wang, Beilei Xing, Jingjing Wu, Qiming Yang, Konstantin Ananyev, Matan Azrad, Shahaf Shuler, Yongseok Koh, Viacheslav Ovsiienko, Alejandro Lucero, Nithin Dabilpuram, Kiran Kumar K, Rasesh Mody, Shahed Shaikh, Bruce Richardson

The offending commit broke the use of RTE_PCI_DRV_IOVA_AS_VA, which
was intended to mean "driver only supports VA" but had been understood
as "driver supports both PA and VA" by most net drivers and used to let
DPDK processes run as non-root (non-root processes do not have access to
physical addresses on recent kernels).

The check on physical addresses actually closed the gap for those
drivers. We don't need to mark them with RTE_PCI_DRV_IOVA_AS_VA, and this
flag can retain its intended meaning.
Document its meaning explicitly.

We can check that a driver's IOVA mode requirement is fulfilled before
trying to probe a device.

Finally, document the heuristic used to select the IOVA mode and hope
that we won't break it again.
Fixes: 703458e19c16 ("bus/pci: consider only usable devices for IOVA mode")

Signed-off-by: David Marchand <david.marchand@redhat.com>
---
 doc/guides/prog_guide/env_abstraction_layer.rst | 31 +++++++++++++++++++++++++
 drivers/bus/pci/linux/pci.c | 16 +++++-------
 drivers/bus/pci/pci_common.c | 30 +++++++++++++++++++-----
 drivers/bus/pci/rte_bus_pci.h | 4 ++--
 drivers/net/atlantic/atl_ethdev.c | 3 +--
 drivers/net/bnxt/bnxt_ethdev.c | 3 +--
 drivers/net/e1000/em_ethdev.c | 3 +--
 drivers/net/e1000/igb_ethdev.c | 5 ++--
 drivers/net/enic/enic_ethdev.c | 3 +--
 drivers/net/fm10k/fm10k_ethdev.c | 3 +--
 drivers/net/i40e/i40e_ethdev.c | 3 +--
 drivers/net/i40e/i40e_ethdev_vf.c | 2 +-
 drivers/net/iavf/iavf_ethdev.c | 3 +--
 drivers/net/ice/ice_ethdev.c | 3 +--
 drivers/net/ixgbe/ixgbe_ethdev.c | 5 ++--
 drivers/net/mlx4/mlx4.c | 3 +--
 drivers/net/mlx5/mlx5.c | 2 +-
 drivers/net/nfp/nfp_net.c | 6 ++---
 drivers/net/octeontx2/otx2_ethdev.c | 5 ----
 drivers/net/qede/qede_ethdev.c | 6 ++---
 drivers/raw/ioat/ioat_rawdev.c | 3 +--
 lib/librte_eal/common/eal_common_bus.c | 30 +++++++++++++++++++++---
 22 files changed, 110 insertions(+), 62 deletions(-)

diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
index f15bcd9..77307e3 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -419,6 +419,37 @@ Misc Functions
 
 Locks and atomic operations are per-architecture (i686 and x86_64).
 
+IOVA Mode Detection
+~~~~~~~~~~~~~~~~~~~
+
+IOVA Mode is selected by considering what the currently usable devices on the
+system require and/or support.
+
+Below is the 2-step heuristic for this choice.
+
+For the first step, EAL asks each bus for its requirement in terms of IOVA mode
+and decides on a preferred IOVA mode.
+
+- if at least one bus reports RTE_IOVA_PA and none reports RTE_IOVA_VA, then the preferred IOVA mode is RTE_IOVA_PA,
+- if at least one bus reports RTE_IOVA_VA and none reports RTE_IOVA_PA, then the preferred IOVA mode is RTE_IOVA_VA,
+- if all buses report RTE_IOVA_DC, no bus expressed a preference, then the
+  preferred mode is RTE_IOVA_DC,
+- if the buses disagree (at least one wants RTE_IOVA_PA and at least one wants
+  RTE_IOVA_VA), then the preferred IOVA mode is RTE_IOVA_DC (see below with the
+  check on Physical Addresses availability).
+
+The second step checks whether the preferred mode complies with the Physical
+Addresses availability, since those are only available to the root user on
+recent kernels.
+
+- if the preferred mode is RTE_IOVA_PA but there is no access to Physical
+  Addresses, then EAL init will fail early, since later probing of the devices
+  would fail anyway,
+- if the preferred mode is RTE_IOVA_DC then based on the Physical Addresses
+  availability, the preferred mode is adjusted to RTE_IOVA_PA or RTE_IOVA_VA.
+  In the case when the buses had disagreed on the IOVA Mode at the first step,
+  some of the buses won't work because of this decision.
+
 IOVA Mode Configuration
 ~~~~~~~~~~~~~~~~~~~~~~~
 
diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index b12f10a..1a2f99b 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -578,12 +578,10 @@ enum rte_iova_mode
 			else
 				is_vfio_noiommu_enabled = 0;
 		}
-		if ((pdrv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) == 0) {
+		if (is_vfio_noiommu_enabled != 0)
 			iova_mode = RTE_IOVA_PA;
-		} else if (is_vfio_noiommu_enabled != 0) {
-			RTE_LOG(DEBUG, EAL, "Forcing to 'PA', vfio-noiommu mode configured\n");
-			iova_mode = RTE_IOVA_PA;
-		}
+		else if ((pdrv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) != 0)
+			iova_mode = RTE_IOVA_VA;
 #endif
 		break;
 	}
@@ -594,8 +592,8 @@ enum rte_iova_mode
 		break;
 
 	default:
-		RTE_LOG(DEBUG, EAL, "Unsupported kernel driver? Defaulting to IOVA as 'PA'\n");
-		iova_mode = RTE_IOVA_PA;
+		if ((pdrv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) != 0)
+			iova_mode = RTE_IOVA_VA;
 		break;
 	}
 
@@ -607,10 +605,8 @@ enum rte_iova_mode
 		if (iommu_no_va == -1)
 			iommu_no_va = pci_one_device_iommu_support_va(pdev)
 					? 0 : 1;
-		if (iommu_no_va != 0) {
-			RTE_LOG(DEBUG, EAL, "Forcing to 'PA', IOMMU does not support IOVA as 'VA'\n");
+		if (iommu_no_va != 0)
 			iova_mode = RTE_IOVA_PA;
-		}
 	}
 	return iova_mode;
 }
diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index d2af472..ed55b07 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -169,8 +169,22 @@ static struct rte_devargs *pci_devargs_lookup(struct rte_pci_device *dev)
 	 * This needs to be before rte_pci_map_device(), as it enables to use
 	 * driver flags for adjusting configuration.
 	 */
-	if (!already_probed)
+	if (!already_probed) {
+		enum rte_iova_mode dev_iova_mode;
+		enum rte_iova_mode iova_mode;
+
+		dev_iova_mode = pci_device_iova_mode(dr, dev);
+		iova_mode = rte_eal_iova_mode();
+		if (dev_iova_mode != RTE_IOVA_DC &&
+		    dev_iova_mode != iova_mode) {
+			RTE_LOG(ERR, EAL, " Expecting '%s' IOVA mode but current mode is '%s', not initializing\n",
+				dev_iova_mode == RTE_IOVA_PA ? "PA" : "VA",
+				iova_mode == RTE_IOVA_PA ? "PA" : "VA");
+			return -EINVAL;
+		}
+
 		dev->driver = dr;
+	}
 
 	if (!already_probed && (dr->drv_flags & RTE_PCI_DRV_NEED_MAPPING)) {
 		/* map resources for devices that use igb_uio */
@@ -629,12 +643,16 @@ enum rte_iova_mode
 				devices_want_va = true;
 		}
 	}
-	if (devices_want_pa) {
-		iova_mode = RTE_IOVA_PA;
-		if (devices_want_va)
-			RTE_LOG(WARNING, EAL, "Some devices want 'VA' but forcing 'PA' because other devices want it\n");
-	} else if (devices_want_va) {
+	if (devices_want_va && !devices_want_pa) {
 		iova_mode = RTE_IOVA_VA;
+	} else if (devices_want_pa && !devices_want_va) {
+		iova_mode = RTE_IOVA_PA;
+	} else {
+		iova_mode = RTE_IOVA_DC;
+		if (devices_want_va) {
+			RTE_LOG(WARNING, EAL, "Some devices want 'VA' but forcing 'DC' because other devices want 'PA'.\n");
+			RTE_LOG(WARNING, EAL, "Depending on the final decision by the EAL, part of your devices won't initialise.\n");
+		}
 	}
 	return iova_mode;
 }
diff --git a/drivers/bus/pci/rte_bus_pci.h b/drivers/bus/pci/rte_bus_pci.h
index 06e004c..0f21775 100644
--- a/drivers/bus/pci/rte_bus_pci.h
+++ b/drivers/bus/pci/rte_bus_pci.h
@@ -187,8 +187,8 @@ struct rte_pci_bus {
 #define RTE_PCI_DRV_INTR_RMV 0x0010
 /** Device driver needs to keep mapped resources if unsupported dev detected */
 #define RTE_PCI_DRV_KEEP_MAPPED_RES 0x0020
-/** Device driver supports IOVA as VA */
-#define RTE_PCI_DRV_IOVA_AS_VA 0X0040
+/** Device driver only supports IOVA as VA and cannot work with IOVA as PA */
+#define RTE_PCI_DRV_IOVA_AS_VA 0x0040
 
 /**
  * Map the PCI device resources in user space virtual memory address
diff --git a/drivers/net/atlantic/atl_ethdev.c b/drivers/net/atlantic/atl_ethdev.c
index fdc0a7f..fa89ae7 100644
--- a/drivers/net/atlantic/atl_ethdev.c
+++ b/drivers/net/atlantic/atl_ethdev.c
@@ -157,8 +157,7 @@ static void atl_dev_info_get(struct rte_eth_dev *dev,
 
 static struct rte_pci_driver rte_atl_pmd = {
 	.id_table = pci_id_atl_map,
-	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC |
-		     RTE_PCI_DRV_IOVA_AS_VA,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
 	.probe = eth_atl_pci_probe,
 	.remove = eth_atl_pci_remove,
 };
diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c
index 8fc5103..9306d56 100644
--- a/drivers/net/bnxt/bnxt_ethdev.c
+++ b/drivers/net/bnxt/bnxt_ethdev.c
@@ -4028,8 +4028,7 @@ static int bnxt_pci_remove(struct rte_pci_device *pci_dev)
 
 static struct rte_pci_driver bnxt_rte_pmd = {
 	.id_table = bnxt_pci_id_map,
-	.drv_flags = RTE_PCI_DRV_NEED_MAPPING |
-		RTE_PCI_DRV_INTR_LSC | RTE_PCI_DRV_IOVA_AS_VA,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
 	.probe = bnxt_pci_probe,
 	.remove = bnxt_pci_remove,
 };
diff --git a/drivers/net/e1000/em_ethdev.c b/drivers/net/e1000/em_ethdev.c
index dc88661..0c859e5 100644
--- a/drivers/net/e1000/em_ethdev.c
+++ b/drivers/net/e1000/em_ethdev.c
@@ -352,8 +352,7 @@ static int eth_em_pci_remove(struct rte_pci_device *pci_dev)
 
 static struct rte_pci_driver rte_em_pmd = {
 	.id_table = pci_id_em_map,
-	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC |
-		RTE_PCI_DRV_IOVA_AS_VA,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
 	.probe = eth_em_pci_probe,
 	.remove = eth_em_pci_remove,
 };
diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index 3ee28cf..e784eeb 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -1116,8 +1116,7 @@ static int eth_igb_pci_remove(struct rte_pci_device *pci_dev)
 
 static struct rte_pci_driver rte_igb_pmd = {
 	.id_table = pci_id_igb_map,
-	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC |
-		     RTE_PCI_DRV_IOVA_AS_VA,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
 	.probe = eth_igb_pci_probe,
 	.remove = eth_igb_pci_remove,
 };
@@ -1140,7 +1139,7 @@ static int eth_igbvf_pci_remove(struct rte_pci_device *pci_dev)
  */
 static struct rte_pci_driver rte_igbvf_pmd = {
 	.id_table = pci_id_igbvf_map,
-	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_IOVA_AS_VA,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
 	.probe = eth_igbvf_pci_probe,
 	.remove = eth_igbvf_pci_remove,
 };
diff --git a/drivers/net/enic/enic_ethdev.c b/drivers/net/enic/enic_ethdev.c
index 5cfbd31..e9c6f83 100644
--- a/drivers/net/enic/enic_ethdev.c
+++ b/drivers/net/enic/enic_ethdev.c
@@ -1247,8 +1247,7 @@ static int eth_enic_pci_remove(struct rte_pci_device *pci_dev)
 
 static struct rte_pci_driver rte_enic_pmd = {
 	.id_table = pci_id_enic_map,
-	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC |
-		     RTE_PCI_DRV_IOVA_AS_VA,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
 	.probe = eth_enic_pci_probe,
 	.remove = eth_enic_pci_remove,
 };
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index a1e3836..2d3c477 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -3268,8 +3268,7 @@ static int eth_fm10k_pci_remove(struct rte_pci_device *pci_dev)
 
 static struct rte_pci_driver rte_pmd_fm10k = {
 	.id_table = pci_id_fm10k_map,
-	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC |
-		     RTE_PCI_DRV_IOVA_AS_VA,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
 	.probe = eth_fm10k_pci_probe,
 	.remove = eth_fm10k_pci_remove,
 };
diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 2b9fc45..dd46d4d 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -696,8 +696,7 @@ static int eth_i40e_pci_remove(struct rte_pci_device *pci_dev)
 
 static struct rte_pci_driver rte_i40e_pmd = {
 	.id_table = pci_id_i40e_map,
-	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC |
-		     RTE_PCI_DRV_IOVA_AS_VA,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
 	.probe = eth_i40e_pci_probe,
 	.remove = eth_i40e_pci_remove,
 };
diff --git a/drivers/net/i40e/i40e_ethdev_vf.c b/drivers/net/i40e/i40e_ethdev_vf.c
index 5be32b0..3ff2f60 100644
--- a/drivers/net/i40e/i40e_ethdev_vf.c
+++ b/drivers/net/i40e/i40e_ethdev_vf.c
@@ -1557,7 +1557,7 @@ static int eth_i40evf_pci_remove(struct rte_pci_device *pci_dev)
  */
 static struct rte_pci_driver rte_i40evf_pmd = {
 	.id_table = pci_id_i40evf_map,
-	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_IOVA_AS_VA,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
 	.probe = eth_i40evf_pci_probe,
 	.remove = eth_i40evf_pci_remove,
 };
diff --git a/drivers/net/iavf/iavf_ethdev.c b/drivers/net/iavf/iavf_ethdev.c
index 53dc05c..a97cd76 100644
--- a/drivers/net/iavf/iavf_ethdev.c
+++ b/drivers/net/iavf/iavf_ethdev.c
@@ -1402,8 +1402,7 @@ static int eth_iavf_pci_remove(struct rte_pci_device *pci_dev)
 /* Adaptive virtual function driver struct */
 static struct rte_pci_driver rte_iavf_pmd = {
 	.id_table = pci_id_iavf_map,
-	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC |
-		     RTE_PCI_DRV_IOVA_AS_VA,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
 	.probe = eth_iavf_pci_probe,
 	.remove = eth_iavf_pci_remove,
 };
diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
index 9ce730c..f05b48c 100644
--- a/drivers/net/ice/ice_ethdev.c
+++ b/drivers/net/ice/ice_ethdev.c
@@ -3737,8 +3737,7 @@ static int ice_xstats_get_names(__rte_unused struct rte_eth_dev *dev,
 
 static struct rte_pci_driver rte_ice_pmd = {
 	.id_table = pci_id_ice_map,
-	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC |
-		     RTE_PCI_DRV_IOVA_AS_VA,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
 	.probe = ice_pci_probe,
 	.remove = ice_pci_remove,
 };
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 22c5b2c..4a6e5c3 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -1869,8 +1869,7 @@ static int eth_ixgbe_pci_remove(struct rte_pci_device *pci_dev)
 
 static struct rte_pci_driver rte_ixgbe_pmd = {
 	.id_table = pci_id_ixgbe_map,
-	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC |
-		     RTE_PCI_DRV_IOVA_AS_VA,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
 	.probe = eth_ixgbe_pci_probe,
 	.remove = eth_ixgbe_pci_remove,
 };
@@ -1892,7 +1891,7 @@ static int eth_ixgbevf_pci_remove(struct rte_pci_device *pci_dev)
  */
 static struct rte_pci_driver rte_ixgbevf_pmd = {
 	.id_table = pci_id_ixgbevf_map,
-	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_IOVA_AS_VA,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
 	.probe = eth_ixgbevf_pci_probe,
 	.remove = eth_ixgbevf_pci_remove,
 };
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 2e169b0..d6e5753 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -1142,8 +1142,7 @@ struct mlx4_conf {
 	},
 	.id_table = mlx4_pci_id_map,
 	.probe = mlx4_pci_probe,
-	.drv_flags = RTE_PCI_DRV_INTR_LSC | RTE_PCI_DRV_INTR_RMV |
-		     RTE_PCI_DRV_IOVA_AS_VA,
+	.drv_flags = RTE_PCI_DRV_INTR_LSC | RTE_PCI_DRV_INTR_RMV,
 };
 
 #ifdef RTE_IBVERBS_LINK_DLOPEN
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index d93f92d..0f05853 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -2087,7 +2087,7 @@ struct mlx5_dev_spawn_data {
 	.dma_map = mlx5_dma_map,
 	.dma_unmap = mlx5_dma_unmap,
 	.drv_flags = RTE_PCI_DRV_INTR_LSC | RTE_PCI_DRV_INTR_RMV |
-		     RTE_PCI_DRV_PROBE_AGAIN | RTE_PCI_DRV_IOVA_AS_VA,
+		     RTE_PCI_DRV_PROBE_AGAIN,
 };
 
 #ifdef RTE_IBVERBS_LINK_DLOPEN
diff --git a/drivers/net/nfp/nfp_net.c b/drivers/net/nfp/nfp_net.c
index 1a7aa17..f5d33ef 100644
--- a/drivers/net/nfp/nfp_net.c
+++ b/drivers/net/nfp/nfp_net.c
@@ -3760,16 +3760,14 @@ static int eth_nfp_pci_remove(struct rte_pci_device *pci_dev)
 
 static struct rte_pci_driver rte_nfp_net_pf_pmd = {
 	.id_table = pci_id_nfp_pf_net_map,
-	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC |
-		     RTE_PCI_DRV_IOVA_AS_VA,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
 	.probe = nfp_pf_pci_probe,
 	.remove = eth_nfp_pci_remove,
 };
 
 static struct rte_pci_driver rte_nfp_net_vf_pmd = {
 	.id_table = pci_id_nfp_vf_net_map,
-	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC |
-		     RTE_PCI_DRV_IOVA_AS_VA,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
 	.probe = eth_nfp_pci_probe,
 	.remove = eth_nfp_pci_remove,
 };
diff --git a/drivers/net/octeontx2/otx2_ethdev.c b/drivers/net/octeontx2/otx2_ethdev.c
index 156e7d3..2697842 100644
--- a/drivers/net/octeontx2/otx2_ethdev.c
+++ b/drivers/net/octeontx2/otx2_ethdev.c
@@ -1188,11 +1188,6 @@
 		goto fail;
 	}
 
-	if (rte_eal_iova_mode() != RTE_IOVA_VA) {
-		otx2_err("iova mode should be va");
-		goto fail;
-	}
-
 	if (conf->link_speeds & ETH_LINK_SPEED_FIXED) {
 		otx2_err("Setting link speed/duplex not supported");
 		goto fail;
diff --git a/drivers/net/qede/qede_ethdev.c b/drivers/net/qede/qede_ethdev.c
index 82363e6..0b3046a 100644
--- a/drivers/net/qede/qede_ethdev.c
+++ b/drivers/net/qede/qede_ethdev.c
@@ -2737,8 +2737,7 @@ static int qedevf_eth_dev_pci_remove(struct rte_pci_device *pci_dev)
 
 static struct rte_pci_driver rte_qedevf_pmd = {
 	.id_table = pci_id_qedevf_map,
-	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC |
-		     RTE_PCI_DRV_IOVA_AS_VA,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
 	.probe = qedevf_eth_dev_pci_probe,
 	.remove = qedevf_eth_dev_pci_remove,
 };
@@ -2757,8 +2756,7 @@ static int qede_eth_dev_pci_remove(struct rte_pci_device *pci_dev)
 
 static struct rte_pci_driver rte_qede_pmd = {
 	.id_table = pci_id_qede_map,
-	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC |
-		     RTE_PCI_DRV_IOVA_AS_VA,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
 	.probe = qede_eth_dev_pci_probe,
 	.remove = qede_eth_dev_pci_remove,
 };
diff --git a/drivers/raw/ioat/ioat_rawdev.c b/drivers/raw/ioat/ioat_rawdev.c
index d509b66..7270ad7 100644
--- a/drivers/raw/ioat/ioat_rawdev.c
+++ b/drivers/raw/ioat/ioat_rawdev.c
@@ -338,8 +338,7 @@
 
 static struct rte_pci_driver ioat_pmd_drv = {
 	.id_table = pci_id_ioat_map,
-	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC |
-		     RTE_PCI_DRV_IOVA_AS_VA,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
 	.probe = ioat_rawdev_probe,
 	.remove = ioat_rawdev_remove,
 };
diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
index 77f1be1..bf0da6e 100644
--- a/lib/librte_eal/common/eal_common_bus.c
+++ b/lib/librte_eal/common/eal_common_bus.c
@@ -228,13 +228,37 @@ struct rte_bus *
 enum rte_iova_mode
 rte_bus_get_iommu_class(void)
 {
-	int mode = RTE_IOVA_DC;
+	enum rte_iova_mode mode = RTE_IOVA_DC;
+	bool buses_want_va = false;
+	bool buses_want_pa = false;
 	struct rte_bus *bus;
 
 	TAILQ_FOREACH(bus, &rte_bus_list, next) {
+		enum rte_iova_mode bus_iova_mode;
 
-		if (bus->get_iommu_class)
-			mode |= bus->get_iommu_class();
+		if (bus->get_iommu_class == NULL)
+			continue;
+
+		bus_iova_mode = bus->get_iommu_class();
+		RTE_LOG(DEBUG, EAL, "Bus %s wants IOVA as '%s'\n",
+			bus->name,
+			bus_iova_mode == RTE_IOVA_DC ? "DC" :
+			(bus_iova_mode == RTE_IOVA_PA ? "PA" : "VA"));
+		if (bus_iova_mode == RTE_IOVA_PA)
+			buses_want_pa = true;
+		else if (bus_iova_mode == RTE_IOVA_VA)
+			buses_want_va = true;
+	}
+	if (buses_want_va && !buses_want_pa) {
+		mode = RTE_IOVA_VA;
+	} else if (buses_want_pa && !buses_want_va) {
+		mode = RTE_IOVA_PA;
+	} else {
+		mode = RTE_IOVA_DC;
+		if (buses_want_va) {
+			RTE_LOG(WARNING, EAL, "Some buses want 'VA' but forcing 'DC' because other buses want 'PA'.\n");
+			RTE_LOG(WARNING, EAL, "Depending on the final decision by the EAL, part of your buses won't initialise.\n");
+		}
 	}
 
 	return mode;
-- 
1.8.3.1
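The two-step heuristic documented in this patch, together with the new probe-time check in pci_common.c, can be modelled outside DPDK as below. This is a simplified sketch. `iova_mode` only mirrors the values of `enum rte_iova_mode`. Physical address availability is passed in as a boolean rather than detected. The function names are made up for illustration and are not DPDK APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in with the same values as enum rte_iova_mode. */
enum iova_mode { IOVA_DC = 0, IOVA_PA = 1 << 0, IOVA_VA = 1 << 1 };

/* Step 1: combine per-bus answers like rte_bus_get_iommu_class() in this
 * patch. DC answers are ignored; a PA/VA conflict falls back to DC. */
static enum iova_mode preferred_iova_mode(const enum iova_mode *bus_modes, size_t n)
{
	bool want_pa = false;
	bool want_va = false;

	assert(n == 0 || bus_modes != NULL);
	for (size_t i = 0; i < n; i++) {
		if (bus_modes[i] == IOVA_PA)
			want_pa = true;
		else if (bus_modes[i] == IOVA_VA)
			want_va = true;
	}
	if (want_va && !want_pa)
		return IOVA_VA;
	if (want_pa && !want_va)
		return IOVA_PA;
	if (want_pa && want_va)
		fprintf(stderr, "buses disagree (PA vs VA), preferring DC\n");
	return IOVA_DC;
}

/* Step 2: reconcile the preference with physical address availability.
 * Returns -1 where EAL init would fail early, 0 otherwise. */
static int resolve_iova_mode(enum iova_mode preferred, bool phys_addrs_available,
			     enum iova_mode *out)
{
	assert(out != NULL);
	if (preferred == IOVA_PA && !phys_addrs_available)
		return -1;
	if (preferred == IOVA_DC)
		preferred = phys_addrs_available ? IOVA_PA : IOVA_VA;
	*out = preferred;
	return 0;
}

/* Probe-time check (pci_common.c hunk): a device that is not DC must match
 * the mode EAL selected, otherwise probing is refused with -EINVAL. */
static bool device_matches_eal_mode(enum iova_mode dev_mode, enum iova_mode eal_mode)
{
	return dev_mode == IOVA_DC || dev_mode == eal_mode;
}
```

A bus answering DC never tips step 1 one way or the other. It only matters when every bus answers DC. When PA and VA requests collide, the probe-time check is what keeps the losing side's devices from being initialized in the wrong mode.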
* Re: [dpdk-dev] [PATCH 2/2] eal: fix IOVA mode selection as VA for pci drivers
From: Thomas Monjalon @ 2019-07-11 14:40 UTC
To: dev
Cc: David Marchand, anatoly.burakov, jerinj, John McNamara, Marko Kovacevic, Igor Russkikh, Pavel Belous, Ajit Khaparde, Somnath Kotur, Wenzhuo Lu, John Daley, Hyong Youb Kim, Qi Zhang, Xiao Wang, Beilei Xing, Jingjing Wu, Qiming Yang, Konstantin Ananyev, Matan Azrad, Shahaf Shuler, Yongseok Koh, Viacheslav Ovsiienko, Alejandro Lucero, Nithin Dabilpuram, Kiran Kumar K, Rasesh Mody, Shahed Shaikh, Bruce Richardson

I was expecting some replies / reviews of this patch today.

10/07/2019 23:48, David Marchand:
> The offending commit broke the use of RTE_PCI_DRV_IOVA_AS_VA, which
> was intended to mean "driver only supports VA" but had been understood
> as "driver supports both PA and VA" by most net drivers and used to let
> DPDK processes run as non-root (non-root processes do not have access to
> physical addresses on recent kernels).
>
> The check on physical addresses actually closed the gap for those
> drivers. We don't need to mark them with RTE_PCI_DRV_IOVA_AS_VA, and this
> flag can retain its intended meaning.
> Document its meaning explicitly.
>
> We can check that a driver's IOVA mode requirement is fulfilled before
> trying to probe a device.
>
> Finally, document the heuristic used to select the IOVA mode and hope
> that we won't break it again.
>
> Fixes: 703458e19c16 ("bus/pci: consider only usable devices for IOVA mode")
>
> Signed-off-by: David Marchand <david.marchand@redhat.com>
* Re: [dpdk-dev] [PATCH 2/2] eal: fix IOVA mode selection as VA for pci drivers
From: Jerin Jacob Kollanukkaran @ 2019-07-12 8:05 UTC
To: Thomas Monjalon, dev
Cc: David Marchand, anatoly.burakov, John McNamara, Marko Kovacevic, Igor Russkikh, Pavel Belous, Ajit Khaparde, Somnath Kotur, Wenzhuo Lu, John Daley, Hyong Youb Kim, Qi Zhang, Xiao Wang, Beilei Xing, Jingjing Wu, Qiming Yang, Konstantin Ananyev, Matan Azrad, Shahaf Shuler, Yongseok Koh, Viacheslav Ovsiienko, Alejandro Lucero, Nithin Kumar Dabilpuram, Kiran Kumar Kokkilagadda, Rasesh Mody, Shahed Shaikh, Bruce Richardson

> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Thursday, July 11, 2019 8:11 PM
> To: dev@dpdk.org
> Cc: David Marchand <david.marchand@redhat.com>;
> anatoly.burakov@intel.com; Jerin Jacob Kollanukkaran
> <jerinj@marvell.com>; John McNamara <john.mcnamara@intel.com>;
> Marko Kovacevic <marko.kovacevic@intel.com>; Igor Russkikh
> <igor.russkikh@aquantia.com>; Pavel Belous <pavel.belous@aquantia.com>;
> Ajit Khaparde <ajit.khaparde@broadcom.com>; Somnath Kotur
> <somnath.kotur@broadcom.com>; Wenzhuo Lu <wenzhuo.lu@intel.com>;
> John Daley <johndale@cisco.com>; Hyong Youb Kim <hyonkim@cisco.com>;
> Qi Zhang <qi.z.zhang@intel.com>; Xiao Wang <xiao.w.wang@intel.com>;
> Beilei Xing <beilei.xing@intel.com>; Jingjing Wu <jingjing.wu@intel.com>;
> Qiming Yang <qiming.yang@intel.com>; Konstantin Ananyev
> <konstantin.ananyev@intel.com>; Matan Azrad <matan@mellanox.com>;
> Shahaf Shuler <shahafs@mellanox.com>; Yongseok Koh
> <yskoh@mellanox.com>; Viacheslav Ovsiienko
> <viacheslavo@mellanox.com>; Alejandro Lucero
> <alejandro.lucero@netronome.com>; Nithin Kumar Dabilpuram
> <ndabilpuram@marvell.com>; Kiran Kumar Kokkilagadda
> <kirankumark@marvell.com>; Rasesh Mody <rmody@marvell.com>; Shahed
> Shaikh <shshaikh@marvell.com>; Bruce Richardson
> <bruce.richardson@intel.com>
> Subject: Re: [dpdk-dev] [PATCH 2/2] eal: fix IOVA mode selection as VA for
> pci drivers
>
> I was expecting some replies / reviews of this patch today.

In general, the approach of this patch is OK. It will fix the existing problems.
I need to spend more time reviewing the code and documentation.
* Re: [dpdk-dev] [PATCH 2/2] eal: fix IOVA mode selection as VA for pci drivers
From: Burakov, Anatoly @ 2019-07-12 11:03 UTC
To: David Marchand, dev
Cc: jerinj, thomas, John McNamara, Marko Kovacevic, Igor Russkikh, Pavel Belous, Ajit Khaparde, Somnath Kotur, Wenzhuo Lu, John Daley, Hyong Youb Kim, Qi Zhang, Xiao Wang, Beilei Xing, Jingjing Wu, Qiming Yang, Konstantin Ananyev, Matan Azrad, Shahaf Shuler, Yongseok Koh, Viacheslav Ovsiienko, Alejandro Lucero, Nithin Dabilpuram, Kiran Kumar K, Rasesh Mody, Shahed Shaikh, Bruce Richardson

On 10-Jul-19 10:48 PM, David Marchand wrote:
> The offending commit broke the use of RTE_PCI_DRV_IOVA_AS_VA, which
> was intended to mean "driver only supports VA" but had been understood
> as "driver supports both PA and VA" by most net drivers and used to let
> DPDK processes run as non-root (non-root processes do not have access to
> physical addresses on recent kernels).
>
> The check on physical addresses actually closed the gap for those
> drivers. We don't need to mark them with RTE_PCI_DRV_IOVA_AS_VA, and this
> flag can retain its intended meaning.
> Document its meaning explicitly.
>

So, we always assume that all devices support both IOVA as PA and IOVA
as VA by default. Well, as long as it's understood and documented :)

Unless...

<snip>

> +
> +IOVA Mode is selected by considering what the currently usable devices on the
> +system require and/or support.
> +
> +Below is the 2-step heuristic for this choice.
> +
> +For the first step, EAL asks each bus for its requirement in terms of IOVA mode
> +and decides on a preferred IOVA mode.
> + > +- if all buses report RTE_IOVA_PA, then the preferred IOVA mode is RTE_IOVA_PA, > +- if all buses report RTE_IOVA_VA, then the preferred IOVA mode is RTE_IOVA_VA, > +- if all buses report RTE_IOVA_DC, no bus expressed a preferrence, then the > + preferred mode is RTE_IOVA_DC, > +- if the buses disagree (at least one wants RTE_IOVA_PA and at least one wants > + RTE_IOVA_VA), then the preferred IOVA mode is RTE_IOVA_DC (see below with the > + check on Physical Addresses availability), > + > +The second step is checking if the preferred mode complies with the Physical > +Addresses availability since those are only available to root user in recent > +kernels. > + > +- if the preferred mode is RTE_IOVA_PA but there is no access to Physical > + Addresses, then EAL init will fail early, since later probing of the devices > + would fail anyway, > +- if the preferred mode is RTE_IOVA_DC then based on the Physical Addresses > + availability, the preferred mode is adjusted to RTE_IOVA_PA or RTE_IOVA_VA. > + In the case when the buses had disagreed on the IOVA Mode at the first step, > + part of the buses won't work because of this decision. Is there any specific reason why we always prefer PA if physical addresses are available? Since we're already assuming that all devices support PA and VA anyway, what's the harm in enabling VA by default? I seem to recall there were some concerns around SPDK and physical address availability - doesn't that mean that the assumption regarding PA and VA mode always being supported doesn't actually hold in practice? By the way, the reason i'm harping on IOVA as VA being the default is because having IOVA as PA is not a free (as in beer) choice - we sacrifice some usability by doing that. Right now, by default, mempool will ask for IOVA-contiguous memory first, and this is slow in IOVA as PA mode - meaning, e.g. testpmd startup time is greatly increased for smaller page sizes because IOVA as PA mode is the default in DPDK.
I would also like to steer people away from using real physical addresses, because doing so while requiring lots of IOVA-contiguous memory also requires legacy mem mode, which i would rather people not use and grow dependent on, and which i would like to remove at some point, as it adds a lot of complexity for a corner case. So, picking the address mode is not *just* about which modes the device supports - it has usability implications as well. -- Thanks, Anatoly
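The first step of the heuristic quoted above can be read as a vote over the buses. Below is a minimal sketch in C, not the EAL implementation: `enum iova_mode` and `select_preferred_iova_mode()` are hypothetical stand-ins for `enum rte_iova_mode` and the bus loop in eal_common_bus.c. It also assumes that a bus answering DC casts no vote, a case the quoted bullet list does not spell out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of RTE_IOVA_DC / RTE_IOVA_PA / RTE_IOVA_VA. */
enum iova_mode {
	IOVA_DC = 0, /* "don't care": no preference */
	IOVA_PA = 1,
	IOVA_VA = 2,
};

/*
 * Step 1: combine the per-bus requests into one preferred mode.
 * A bus answering IOVA_DC expresses no preference and is ignored.
 * If some buses want PA and others want VA, the result is IOVA_DC,
 * leaving the decision to step 2 (physical address availability).
 */
static enum iova_mode
select_preferred_iova_mode(const enum iova_mode *bus_modes, size_t nb_buses)
{
	bool want_pa = false;
	bool want_va = false;

	for (size_t i = 0; i < nb_buses; i++) {
		if (bus_modes[i] == IOVA_PA)
			want_pa = true;
		else if (bus_modes[i] == IOVA_VA)
			want_va = true;
	}

	if (want_pa && !want_va)
		return IOVA_PA;
	if (want_va && !want_pa)
		return IOVA_VA;
	return IOVA_DC; /* nobody expressed a preference, or the buses disagree */
}
```

With this reading, buses answering {DC, VA} give VA, {PA, VA} give DC, and an empty bus list gives DC.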
* Re: [dpdk-dev] [PATCH 2/2] eal: fix IOVA mode selection as VA for pci drivers 2019-07-12 11:03 ` Burakov, Anatoly @ 2019-07-12 12:43 ` Thomas Monjalon 2019-07-12 12:58 ` Burakov, Anatoly 2019-07-15 14:26 ` Jerin Jacob Kollanukkaran 0 siblings, 2 replies; 57+ messages in thread From: Thomas Monjalon @ 2019-07-12 12:43 UTC (permalink / raw) To: Burakov, Anatoly Cc: David Marchand, dev, jerinj, John McNamara, Marko Kovacevic, Igor Russkikh, Pavel Belous, Ajit Khaparde, Somnath Kotur, Wenzhuo Lu, John Daley, Hyong Youb Kim, Qi Zhang, Xiao Wang, Beilei Xing, Jingjing Wu, Qiming Yang, Konstantin Ananyev, Matan Azrad, Shahaf Shuler, Yongseok Koh, Viacheslav Ovsiienko, Alejandro Lucero, Nithin Dabilpuram, Kiran Kumar K, Rasesh Mody, Shahed Shaikh, Bruce Richardson, alialnu, aconole 12/07/2019 13:03, Burakov, Anatoly: > On 10-Jul-19 10:48 PM, David Marchand wrote: > > The incriminated commit broke the use of RTE_PCI_DRV_IOVA_AS_VA which > > was intended to mean "driver only supports VA" but had been understood > > as "driver supports both PA and VA" by most net drivers and used to let > > dpdk processes to run as non root (which do not have access to physical > > addresses on recent kernels). > > > > The check on physical addresses actually closed the gap for those > > drivers. We don't need to mark them with RTE_PCI_DRV_IOVA_AS_VA and this > > flag can retain its intended meaning. > > Document explicitly its meaning. > > > > So, we always assume that all devices support both IOVA as PA and IOVA > as VA by default. Well, as long as it's understood and documented :) Yes Please make sure it is well documented. > Unless... > > > <snip> > > > + > > +IOVA Mode is selected by considering what the current usable Devices on the > > +system requires and/or supports. > > + > > +Below is the 2-step heuristic for this choice. > > + > > +For the first step, EAL asks each bus its requirement in terms of IOVA mode > > +and decides on a preferred IOVA mode. 
> > + > > +- if all buses report RTE_IOVA_PA, then the preferred IOVA mode is RTE_IOVA_PA, > > +- if all buses report RTE_IOVA_VA, then the preferred IOVA mode is RTE_IOVA_VA, > > +- if all buses report RTE_IOVA_DC, no bus expressed a preferrence, then the > > + preferred mode is RTE_IOVA_DC, > > +- if the buses disagree (at least one wants RTE_IOVA_PA and at least one wants > > + RTE_IOVA_VA), then the preferred IOVA mode is RTE_IOVA_DC (see below with the > > + check on Physical Addresses availability), > > + > > +The second step is checking if the preferred mode complies with the Physical > > +Addresses availability since those are only available to root user in recent > > +kernels. > > + > > +- if the preferred mode is RTE_IOVA_PA but there is no access to Physical > > + Addresses, then EAL init will fail early, since later probing of the devices > > + would fail anyway, > > +- if the preferred mode is RTE_IOVA_DC then based on the Physical Addresses > > + availability, the preferred mode is adjusted to RTE_IOVA_PA or RTE_IOVA_VA. > > + In the case when the buses had disagreed on the IOVA Mode at the first step, > > + part of the buses won't work because of this decision. > > Is there any specific reason why we always prefer PA if physical > addresses are available? Since we're already assuming that all devices > support PA and VA anyway, what's the harm in enabling VA by default? If PA is available, it means we are running as root. We can assume that using root is a choice, probably related to a preference for PA. > I seem to recall there were some concerns around SPDK and PA address > availability - doesn't that mean that the assumption regarding PA and VA > mode always being supported doesn't actually hold in practice? > > By the way, the reason i'm harping away on IOVA as VA being the default > is because having IOVA as PA is not a free (as in beer) choice - we > sacrifice some usability by doing that. 
Right now, by default, mempool > will ask for IOVA-contiguous memory first, and this is slow in IOVA as > PA mode - meaning, e.g. testpmd startup time is greatly increased for > smaller page sizes because of IOVA as PA mode is the default in DPDK. > > I would also like to steer people away from using real physical > addresses because doing so while requiring lots of IOVA contiguous > memory also requires legacy mem mode, which i would rather people not > use and grow dependent on, and would like to remove it at some point as > it adds a lot of complexity for a corner case. That's why we should encourage users not to run as root. We need more documentation about how to run as a normal user. > So, picking address mode is not *just* about whether the device supports > them - it has usability implications as well. If we consider running as root an exception, then it makes sense to pick the address mode which fits this exception (PA).
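The "if PA is available, we are running as root" argument rests on the kernel hiding page frame numbers (PFNs) from unprivileged processes. Strictly speaking, the requirement is CAP_SYS_ADMIN, not the root user itself. A minimal Linux-only sketch of such a probe follows. It assumes the /proc/self/pagemap format described in the kernel's pagemap documentation: one 64-bit entry per virtual page, bit 63 meaning "page present", bits 0-54 holding the PFN, and the PFN reported as zero without CAP_SYS_ADMIN since Linux 4.2. The function name `phys_addrs_available()` is hypothetical; DPDK's own check lives in its EAL memory code and may differ in detail.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Returns 1 if /proc/self/pagemap reports a non-zero PFN for a page this
 * process owns, 0 otherwise (PFN hidden because the process lacks
 * CAP_SYS_ADMIN, no pagemap file, or any other error).
 */
static int
phys_addrs_available(void)
{
	long page_size = sysconf(_SC_PAGESIZE);
	if (page_size <= 0)
		return 0;

	/* Allocate a page-sized buffer and write to it so it is resident. */
	volatile char *buf = malloc((size_t)page_size);
	if (buf == NULL)
		return 0;
	buf[0] = 1;

	/* Linux 4.0/4.1 refuse unprivileged opens; treat that as "no PA". */
	int fd = open("/proc/self/pagemap", O_RDONLY);
	if (fd < 0) {
		free((void *)buf);
		return 0;
	}

	/* pagemap holds one 64-bit entry per virtual page. */
	uint64_t entry = 0;
	off_t offset = (off_t)((uintptr_t)buf / (uintptr_t)page_size) *
		       (off_t)sizeof(entry);
	ssize_t n = pread(fd, &entry, sizeof(entry), offset);
	close(fd);
	free((void *)buf);

	if (n != (ssize_t)sizeof(entry))
		return 0;

	/* Bit 63: page present. Bits 0-54: PFN (zeroed when unprivileged). */
	int present = (int)((entry >> 63) & 1);
	uint64_t pfn = entry & ((UINT64_C(1) << 55) - 1);
	return present && pfn != 0;
}
```

A root process inside a container that has dropped CAP_SYS_ADMIN would get 0 from this probe. This is one reason "physical addresses available" and "running as root" are related, but not strictly equivalent.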
* Re: [dpdk-dev] [PATCH 2/2] eal: fix IOVA mode selection as VA for pci drivers 2019-07-12 12:43 ` Thomas Monjalon @ 2019-07-12 12:58 ` Burakov, Anatoly 2019-07-12 13:19 ` Bruce Richardson 2019-07-15 14:26 ` Jerin Jacob Kollanukkaran 1 sibling, 1 reply; 57+ messages in thread From: Burakov, Anatoly @ 2019-07-12 12:58 UTC (permalink / raw) To: Thomas Monjalon Cc: David Marchand, dev, jerinj, John McNamara, Marko Kovacevic, Igor Russkikh, Pavel Belous, Ajit Khaparde, Somnath Kotur, Wenzhuo Lu, John Daley, Hyong Youb Kim, Qi Zhang, Xiao Wang, Beilei Xing, Jingjing Wu, Qiming Yang, Konstantin Ananyev, Matan Azrad, Shahaf Shuler, Yongseok Koh, Viacheslav Ovsiienko, Alejandro Lucero, Nithin Dabilpuram, Kiran Kumar K, Rasesh Mody, Shahed Shaikh, Bruce Richardson, alialnu, aconole On 12-Jul-19 1:43 PM, Thomas Monjalon wrote: > 12/07/2019 13:03, Burakov, Anatoly: >> On 10-Jul-19 10:48 PM, David Marchand wrote: >>> The incriminated commit broke the use of RTE_PCI_DRV_IOVA_AS_VA which >>> was intended to mean "driver only supports VA" but had been understood >>> as "driver supports both PA and VA" by most net drivers and used to let >>> dpdk processes to run as non root (which do not have access to physical >>> addresses on recent kernels). >>> >>> The check on physical addresses actually closed the gap for those >>> drivers. We don't need to mark them with RTE_PCI_DRV_IOVA_AS_VA and this >>> flag can retain its intended meaning. >>> Document explicitly its meaning. >>> >> >> So, we always assume that all devices support both IOVA as PA and IOVA >> as VA by default. Well, as long as it's understood and documented :) > > Yes > Please make sure it is well documented. > >> Unless... >> >> >> <snip> >> >>> + >>> +IOVA Mode is selected by considering what the current usable Devices on the >>> +system requires and/or supports. >>> + >>> +Below is the 2-step heuristic for this choice. 
>>> + >>> +For the first step, EAL asks each bus its requirement in terms of IOVA mode >>> +and decides on a preferred IOVA mode. >>> + >>> +- if all buses report RTE_IOVA_PA, then the preferred IOVA mode is RTE_IOVA_PA, >>> +- if all buses report RTE_IOVA_VA, then the preferred IOVA mode is RTE_IOVA_VA, >>> +- if all buses report RTE_IOVA_DC, no bus expressed a preferrence, then the >>> + preferred mode is RTE_IOVA_DC, >>> +- if the buses disagree (at least one wants RTE_IOVA_PA and at least one wants >>> + RTE_IOVA_VA), then the preferred IOVA mode is RTE_IOVA_DC (see below with the >>> + check on Physical Addresses availability), >>> + >>> +The second step is checking if the preferred mode complies with the Physical >>> +Addresses availability since those are only available to root user in recent >>> +kernels. >>> + >>> +- if the preferred mode is RTE_IOVA_PA but there is no access to Physical >>> + Addresses, then EAL init will fail early, since later probing of the devices >>> + would fail anyway, >>> +- if the preferred mode is RTE_IOVA_DC then based on the Physical Addresses >>> + availability, the preferred mode is adjusted to RTE_IOVA_PA or RTE_IOVA_VA. >>> + In the case when the buses had disagreed on the IOVA Mode at the first step, >>> + part of the buses won't work because of this decision. >> >> Is there any specific reason why we always prefer PA if physical >> addresses are available? Since we're already assuming that all devices >> support PA and VA anyway, what's the harm in enabling VA by default? > > If PA is available, it means we are running as root. > We can assume that using root is a choice, probably related > to a preference for PA. > >> I seem to recall there were some concerns around SPDK and PA address >> availability - doesn't that mean that the assumption regarding PA and VA >> mode always being supported doesn't actually hold in practice? 
>> >> By the way, the reason i'm harping away on IOVA as VA being the default >> is because having IOVA as PA is not a free (as in beer) choice - we >> sacrifice some usability by doing that. Right now, by default, mempool >> will ask for IOVA-contiguous memory first, and this is slow in IOVA as >> PA mode - meaning, e.g. testpmd startup time is greatly increased for >> smaller page sizes because of IOVA as PA mode is the default in DPDK. >> >> I would also like to steer people away from using real physical >> addresses because doing so while requiring lots of IOVA contiguous >> memory also requires legacy mem mode, which i would rather people not >> use and grow dependent on, and would like to remove it at some point as >> it adds a lot of complexity for a corner case. > > That's why we should better encourage to not run as root. > We need more documentation about how to run as normal user. > >> So, picking address mode is not *just* about whether the device supports >> them - it has usability implications as well. > > If we consider running as root an exception, then it makes > sense to pick address mode which fits this exception (PA). > When you put it that way, that does indeed make sense. Typically though, developers tend to run as root. I shall hereby stop doing so :) -- Thanks, Anatoly
* Re: [dpdk-dev] [PATCH 2/2] eal: fix IOVA mode selection as VA for pci drivers 2019-07-12 12:58 ` Burakov, Anatoly @ 2019-07-12 13:19 ` Bruce Richardson 0 siblings, 0 replies; 57+ messages in thread From: Bruce Richardson @ 2019-07-12 13:19 UTC (permalink / raw) To: Burakov, Anatoly Cc: Thomas Monjalon, David Marchand, dev, jerinj, John McNamara, Marko Kovacevic, Igor Russkikh, Pavel Belous, Ajit Khaparde, Somnath Kotur, Wenzhuo Lu, John Daley, Hyong Youb Kim, Qi Zhang, Xiao Wang, Beilei Xing, Jingjing Wu, Qiming Yang, Konstantin Ananyev, Matan Azrad, Shahaf Shuler, Yongseok Koh, Viacheslav Ovsiienko, Alejandro Lucero, Nithin Dabilpuram, Kiran Kumar K, Rasesh Mody, Shahed Shaikh, alialnu, aconole On Fri, Jul 12, 2019 at 01:58:46PM +0100, Burakov, Anatoly wrote: > On 12-Jul-19 1:43 PM, Thomas Monjalon wrote: > > If we consider running as root an exception, then it makes > > sense to pick address mode which fits this exception (PA). > > > > When you put it that way, that does indeed make sense. Typically though, > developers tend to run as root. I shall hereby stop doing so :) > Welcome to the sane side! Learn to love "sudo"!
* Re: [dpdk-dev] [PATCH 2/2] eal: fix IOVA mode selection as VA for pci drivers 2019-07-12 12:43 ` Thomas Monjalon 2019-07-12 12:58 ` Burakov, Anatoly @ 2019-07-15 14:26 ` Jerin Jacob Kollanukkaran 2019-07-15 15:03 ` Thomas Monjalon 1 sibling, 1 reply; 57+ messages in thread From: Jerin Jacob Kollanukkaran @ 2019-07-15 14:26 UTC (permalink / raw) To: Thomas Monjalon, Burakov, Anatoly Cc: David Marchand, dev, John McNamara, Marko Kovacevic, Igor Russkikh, Pavel Belous, Ajit Khaparde, Somnath Kotur, Wenzhuo Lu, John Daley, Hyong Youb Kim, Qi Zhang, Xiao Wang, Beilei Xing, Jingjing Wu, Qiming Yang, Konstantin Ananyev, Matan Azrad, Shahaf Shuler, Yongseok Koh, Viacheslav Ovsiienko, Alejandro Lucero, Nithin Kumar Dabilpuram, Kiran Kumar Kokkilagadda, Rasesh Mody, Shahed Shaikh, Bruce Richardson, alialnu, aconole > > > + > > > +IOVA Mode is selected by considering what the current usable > > > +Devices on the system requires and/or supports. > > > + > > > +Below is the 2-step heuristic for this choice. > > > + > > > +For the first step, EAL asks each bus its requirement in terms of > > > +IOVA mode and decides on a preferred IOVA mode. > > > + > > > +- if all buses report RTE_IOVA_PA, then the preferred IOVA mode is > > > +RTE_IOVA_PA, > > > +- if all buses report RTE_IOVA_VA, then the preferred IOVA mode is > > > +RTE_IOVA_VA, > > > +- if all buses report RTE_IOVA_DC, no bus expressed a preferrence, > > > +then the > > > + preferred mode is RTE_IOVA_DC, > > > +- if the buses disagree (at least one wants RTE_IOVA_PA and at > > > +least one wants > > > + RTE_IOVA_VA), then the preferred IOVA mode is RTE_IOVA_DC (see > > > +below with the > > > + check on Physical Addresses availability), > > > + > > > +The second step is checking if the preferred mode complies with the > > > +Physical Addresses availability since those are only available to > > > +root user in recent kernels. 
> > > + > > > +- if the preferred mode is RTE_IOVA_PA but there is no access to > > > +Physical > > > + Addresses, then EAL init will fail early, since later probing of > > > +the devices > > > + would fail anyway, > > > +- if the preferred mode is RTE_IOVA_DC then based on the Physical > > > +Addresses > > > + availability, the preferred mode is adjusted to RTE_IOVA_PA or > RTE_IOVA_VA. > > > + In the case when the buses had disagreed on the IOVA Mode at the > > > +first step, > > > + part of the buses won't work because of this decision. > > > > Is there any specific reason why we always prefer PA if physical > > addresses are available? Since we're already assuming that all devices > > support PA and VA anyway, what's the harm in enabling VA by default? > > If PA is available, it means we are running as root. > We can assume that using root is a choice, probably related to a preference > for PA. # Even if we are running as root, why choose PA in the case of DC? I.e. the following logic is not needed: if (iova_mode == RTE_IOVA_DC) { iova_mode = phys_addrs ? RTE_IOVA_PA : RTE_IOVA_VA; RTE_LOG(DEBUG, EAL, "Buses did not request a specific IOVA mode, using '%s' based on physical addresses availability.\n", phys_addrs ? "PA" : "VA"); } # When DPDK is running in a guest, it cannot access the real PA anyway; it will be an IPA. So I don't understand the logic behind choosing PA when DC. To me, it makes sense to choose PA when DC. # To align with the RTE_PCI_DRV_NEED_MAPPING flag and to reflect "need" rather than "support", I think the flag can be renamed to RTE_PCI_DRV_NEED_IOVA_AS_VA. Other than the above points, I reviewed this patch and tested it on octeontx2. It looks good to me.
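The second step, and the DC case Jerin questions, can be sketched as a function with an explicit policy switch. In this hypothetical helper, `dc_follows_phys_addrs = true` models the quoted `phys_addrs ? RTE_IOVA_PA : RTE_IOVA_VA` logic, and `false` models resolving DC to VA regardless of privileges. The enum is redeclared so that the block stands alone; none of these names are DPDK API.

```c
#include <stdbool.h>

/* Hypothetical mirror of RTE_IOVA_DC / RTE_IOVA_PA / RTE_IOVA_VA. */
enum iova_mode { IOVA_DC = 0, IOVA_PA = 1, IOVA_VA = 2 };

/*
 * Step 2: check the preferred mode against physical address availability.
 * Returns 0 and stores the final mode in *out, or -1 when PA is required
 * but physical addresses are not readable (EAL init would fail early).
 */
static int
resolve_iova_mode(enum iova_mode preferred, bool phys_addrs,
		  bool dc_follows_phys_addrs, enum iova_mode *out)
{
	switch (preferred) {
	case IOVA_PA:
		if (!phys_addrs)
			return -1; /* device probing would fail anyway */
		*out = IOVA_PA;
		return 0;
	case IOVA_VA:
		*out = IOVA_VA;
		return 0;
	case IOVA_DC:
	default:
		if (dc_follows_phys_addrs)
			*out = phys_addrs ? IOVA_PA : IOVA_VA; /* patch as posted */
		else
			*out = IOVA_VA; /* Jerin's suggestion: DC always means VA */
		return 0;
	}
}
```

The two policies differ only when the buses express no preference (or disagree) and the process can read physical addresses. That is exactly the "running as root" case debated in this thread.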
* Re: [dpdk-dev] [PATCH 2/2] eal: fix IOVA mode selection as VA for pci drivers 2019-07-15 14:26 ` Jerin Jacob Kollanukkaran @ 2019-07-15 15:03 ` Thomas Monjalon 2019-07-15 15:35 ` Jerin Jacob Kollanukkaran 0 siblings, 1 reply; 57+ messages in thread From: Thomas Monjalon @ 2019-07-15 15:03 UTC (permalink / raw) To: Jerin Jacob Kollanukkaran Cc: Burakov, Anatoly, David Marchand, dev, John McNamara, Marko Kovacevic, Igor Russkikh, Pavel Belous, Ajit Khaparde, Somnath Kotur, Wenzhuo Lu, John Daley, Hyong Youb Kim, Qi Zhang, Xiao Wang, Beilei Xing, Jingjing Wu, Qiming Yang, Konstantin Ananyev, Matan Azrad, Shahaf Shuler, Yongseok Koh, Viacheslav Ovsiienko, Alejandro Lucero, Nithin Kumar Dabilpuram, Kiran Kumar Kokkilagadda, Rasesh Mody, Shahed Shaikh, Bruce Richardson, alialnu, aconole 15/07/2019 16:26, Jerin Jacob Kollanukkaran: > > > > + > > > > +IOVA Mode is selected by considering what the current usable > > > > +Devices on the system requires and/or supports. > > > > + > > > > +Below is the 2-step heuristic for this choice. > > > > + > > > > +For the first step, EAL asks each bus its requirement in terms of > > > > +IOVA mode and decides on a preferred IOVA mode. > > > > + > > > > +- if all buses report RTE_IOVA_PA, then the preferred IOVA mode is > > > > +RTE_IOVA_PA, > > > > +- if all buses report RTE_IOVA_VA, then the preferred IOVA mode is > > > > +RTE_IOVA_VA, > > > > +- if all buses report RTE_IOVA_DC, no bus expressed a preferrence, > > > > +then the > > > > + preferred mode is RTE_IOVA_DC, > > > > +- if the buses disagree (at least one wants RTE_IOVA_PA and at > > > > +least one wants > > > > + RTE_IOVA_VA), then the preferred IOVA mode is RTE_IOVA_DC (see > > > > +below with the > > > > + check on Physical Addresses availability), > > > > + > > > > +The second step is checking if the preferred mode complies with the > > > > +Physical Addresses availability since those are only available to > > > > +root user in recent kernels. 
> > > > + > > > > +- if the preferred mode is RTE_IOVA_PA but there is no access to > > > > +Physical > > > > + Addresses, then EAL init will fail early, since later probing of > > > > +the devices > > > > + would fail anyway, > > > > +- if the preferred mode is RTE_IOVA_DC then based on the Physical > > > > +Addresses > > > > + availability, the preferred mode is adjusted to RTE_IOVA_PA or > > RTE_IOVA_VA. > > > > + In the case when the buses had disagreed on the IOVA Mode at the > > > > +first step, > > > > + part of the buses won't work because of this decision. > > > > > > Is there any specific reason why we always prefer PA if physical > > > addresses are available? Since we're already assuming that all devices > > > support PA and VA anyway, what's the harm in enabling VA by default? > > > > If PA is available, it means we are running as root. > > We can assume that using root is a choice, probably related to a preference > > for PA. > > # Even if we are running as root, Why to choose PA in case of DC? > ie. Following logic is not need > if (iova_mode == RTE_IOVA_DC) { > iova_mode = phys_addrs ? RTE_IOVA_PA : RTE_IOVA_VA; > RTE_LOG(DEBUG, EAL, > "Buses did not request a specific IOVA mode, using '%s' based on physical addresses availability.\n", > phys_addrs ? "PA" : "VA"); > } Why run as root if using VA anyway? We can assume the user knows what he is doing, so it is a user choice. We want to allow the user to choose, right? > # When DPDK running on guest, Anyway it can not access the real PA, It will be IPA. What is IPA? Isn't it a beer? > So I don't understand logic behind choose PA when DC. > To me, it make sense to choose PA when DC. You probably mean "choose VA". > # To align with RTE_PCI_DRV_NEED_MAPPING flag and reflect it "need" rather > than support, I think, flag can be changed to RTE_PCI_DRV_NEED_IOVA_AS_VA I think the most important thing is to have good documentation of this flag (it was not done properly when Cavium introduced it initially).
If you want to rename the flag, you can do it in a separate patch. If renaming, I really would like to get an answer to an old question: why is the IO address called IOVA? The name "IOVA_AS_VA" looks strange. For reference, one description of addressing: https://lists.linuxfoundation.org/pipermail/iommu/2018-May/027686.html About the naming, do you remember how I insisted on having correct naming for all the related stuff in DPDK? It was hard to get it accepted, the discussion was not nice, and I stopped insisting on getting all the details right because I just got bored. It was a really bad experience. You may ask why I am bringing this up now. Because we must take care of all details, make sure our messages are well understood, and be cooperative. > Other than above points, > Reviewed this patch and tested on octeontx2, It looks good to me.
* Re: [dpdk-dev] [PATCH 2/2] eal: fix IOVA mode selection as VA for pci drivers 2019-07-15 15:03 ` Thomas Monjalon @ 2019-07-15 15:35 ` Jerin Jacob Kollanukkaran 2019-07-15 16:06 ` Thomas Monjalon 0 siblings, 1 reply; 57+ messages in thread From: Jerin Jacob Kollanukkaran @ 2019-07-15 15:35 UTC (permalink / raw) To: Thomas Monjalon Cc: Burakov, Anatoly, David Marchand, dev, John McNamara, Marko Kovacevic, Igor Russkikh, Pavel Belous, Ajit Khaparde, Somnath Kotur, Wenzhuo Lu, John Daley, Hyong Youb Kim, Qi Zhang, Xiao Wang, Beilei Xing, Jingjing Wu, Qiming Yang, Konstantin Ananyev, Matan Azrad, Shahaf Shuler, Yongseok Koh, Viacheslav Ovsiienko, Alejandro Lucero, Nithin Kumar Dabilpuram, Kiran Kumar Kokkilagadda, Rasesh Mody, Shahed Shaikh, Bruce Richardson, alialnu, aconole > -----Original Message----- > From: Thomas Monjalon <thomas@monjalon.net> > Sent: Monday, July 15, 2019 8:34 PM > To: Jerin Jacob Kollanukkaran <jerinj@marvell.com> > Cc: Burakov, Anatoly <anatoly.burakov@intel.com>; David Marchand > <david.marchand@redhat.com>; dev@dpdk.org; John McNamara > <john.mcnamara@intel.com>; Marko Kovacevic > <marko.kovacevic@intel.com>; Igor Russkikh > <igor.russkikh@aquantia.com>; Pavel Belous <pavel.belous@aquantia.com>; > Ajit Khaparde <ajit.khaparde@broadcom.com>; Somnath Kotur > <somnath.kotur@broadcom.com>; Wenzhuo Lu <wenzhuo.lu@intel.com>; > John Daley <johndale@cisco.com>; Hyong Youb Kim <hyonkim@cisco.com>; > Qi Zhang <qi.z.zhang@intel.com>; Xiao Wang <xiao.w.wang@intel.com>; > Beilei Xing <beilei.xing@intel.com>; Jingjing Wu <jingjing.wu@intel.com>; > Qiming Yang <qiming.yang@intel.com>; Konstantin Ananyev > <konstantin.ananyev@intel.com>; Matan Azrad <matan@mellanox.com>; > Shahaf Shuler <shahafs@mellanox.com>; Yongseok Koh > <yskoh@mellanox.com>; Viacheslav Ovsiienko > <viacheslavo@mellanox.com>; Alejandro Lucero > <alejandro.lucero@netronome.com>; Nithin Kumar Dabilpuram > <ndabilpuram@marvell.com>; Kiran Kumar Kokkilagadda > <kirankumark@marvell.com>; 
Rasesh Mody <rmody@marvell.com>; Shahed > Shaikh <shshaikh@marvell.com>; Bruce Richardson > <bruce.richardson@intel.com>; alialnu@mellanox.com; > aconole@redhat.com > Subject: Re: [PATCH 2/2] eal: fix IOVA mode selection as VA for pci drivers > > 15/07/2019 16:26, Jerin Jacob Kollanukkaran: > > > > > + > > > > > +IOVA Mode is selected by considering what the current usable > > > > > +Devices on the system requires and/or supports. > > > > > + > > > > > +Below is the 2-step heuristic for this choice. > > > > > + > > > > > +For the first step, EAL asks each bus its requirement in terms > > > > > +of IOVA mode and decides on a preferred IOVA mode. > > > > > + > > > > > +- if all buses report RTE_IOVA_PA, then the preferred IOVA mode > > > > > +is RTE_IOVA_PA, > > > > > +- if all buses report RTE_IOVA_VA, then the preferred IOVA mode > > > > > +is RTE_IOVA_VA, > > > > > +- if all buses report RTE_IOVA_DC, no bus expressed a > > > > > +preferrence, then the > > > > > + preferred mode is RTE_IOVA_DC, > > > > > +- if the buses disagree (at least one wants RTE_IOVA_PA and at > > > > > +least one wants > > > > > + RTE_IOVA_VA), then the preferred IOVA mode is RTE_IOVA_DC > > > > > +(see below with the > > > > > + check on Physical Addresses availability), > > > > > + > > > > > +The second step is checking if the preferred mode complies with > > > > > +the Physical Addresses availability since those are only > > > > > +available to root user in recent kernels. > > > > > + > > > > > +- if the preferred mode is RTE_IOVA_PA but there is no access > > > > > +to Physical > > > > > + Addresses, then EAL init will fail early, since later probing > > > > > +of the devices > > > > > + would fail anyway, > > > > > +- if the preferred mode is RTE_IOVA_DC then based on the > > > > > +Physical Addresses > > > > > + availability, the preferred mode is adjusted to RTE_IOVA_PA > > > > > +or > > > RTE_IOVA_VA. 
> > > > > + In the case when the buses had disagreed on the IOVA Mode at > > > > > +the first step, > > > > > + part of the buses won't work because of this decision. > > > > > > > > Is there any specific reason why we always prefer PA if physical > > > > addresses are available? Since we're already assuming that all > > > > devices support PA and VA anyway, what's the harm in enabling VA by > default? > > > > > > If PA is available, it means we are running as root. > > > We can assume that using root is a choice, probably related to a > > > preference for PA. > > > > # Even if we are running as root, Why to choose PA in case of DC? > > ie. Following logic is not need > > if (iova_mode == RTE_IOVA_DC) { > > iova_mode = phys_addrs ? RTE_IOVA_PA : RTE_IOVA_VA; > > RTE_LOG(DEBUG, EAL, > > "Buses did not request a specific IOVA mode, using '%s' > based on physical addresses availability.\n", > > phys_addrs ? "PA" : "VA"); > > } > > Why running as root if using VA anyway? > We can assume the user knows what he is doing, so it is a user choice. > We want to allow the user choosing, right? The user can override the mode with the iova=pa/va EAL argument if they need to run in a specific mode. Running as root for various other reasons (or just out of laziness) is not, or should not be, connected to setting the mode to PA. > > > # When DPDK running on guest, Anyway it can not access the real PA, It will > be IPA. > > What is IPA? Isn't it a beer? There may be a beer with that name. In this context, it is "Intermediate physical address". > > > So I don't understand logic behind choose PA when DC. > > To me, it make sense to choose PA when DC. > > You probably mean "choose VA". Yup. > > > # To align with RTE_PCI_DRV_NEED_MAPPING flag and reflect it "need" > > rather than support, I think, flag can be changed to > > RTE_PCI_DRV_NEED_IOVA_AS_VA > > I think the most important is to have a good documentation of this flag (it > was not done properly when Cavium introduced it initially).
> If you want to rename the flag, you can do it in a separate patch. > If renaming, I really would like to get an answer to an old question: > Why IO adress is called IOVA? The name "IOVA_AS_VA" looks strange. IOVA = IO virtual address. Since IOVA can be PA or VA, the name IOVA_AS_VA was chosen. > For reference, one description of addressing: > https://lists.linuxfoundation.org/pipermail/iommu/2018-May/027686.html > > About the naming, do you remember how I insisted to have a correct naming > of all related stuff in DPDK? It was hard to get it accepted, the discussion was > not nice and I stopped insisting to get all details fine because I just got bored. > It was a really bad experience. I agree. To me, that bad experience was mostly due to not having enough technical comments on the proposal, though I am not the author/owner of it. > You can ask why I remind this now? Because we must take care of all details, > make sure our messages are well understood, and be cooperative. No disagreement. If we look at the history, the meaning got changed/updated in this commit by adding Intel drivers to it. I wouldn't say it is a big deal; it is just C code, and it can be changed based on the need. I think what is really important is maintaining the feature and the commitment towards fixing any issue. commit f37dfab21c988d2d0ecb3c82be4ba9738c7e51c7 Author: Jianfeng Tan <jianfeng.tan@intel.com> Date: Wed Oct 11 10:33:48 2017 +0000 drivers/net: enable IOVA mode for Intel PMDs If we want to enable IOVA mode, introduced by commit 93878cf0255e ("eal: introduce helper API for IOVA mode"), we need PMDs (for PCI devices) to expose this flag. Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com> Reviewed-by: Santosh Shukla <santosh.shukla@caviumnetworks.com> > > > Other than above points, > > Reviewed this patch and tested on octeontx2, It looks good to me.
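Jerin's point that the user can force the mode with the EAL `--iova-mode` option (values `pa` or `va`), and Thomas's concern that some applications may not expose EAL arguments, both come down to precedence. Below is a minimal sketch assuming an explicit user choice always wins over the bus heuristic. `parse_iova_arg()` and `final_iova_mode()` are hypothetical helpers, not EAL functions.

```c
#include <string.h>

/* Hypothetical mirror of RTE_IOVA_DC / RTE_IOVA_PA / RTE_IOVA_VA. */
enum iova_mode { IOVA_DC = 0, IOVA_PA = 1, IOVA_VA = 2 };

/* Map the value of an "--iova-mode" style option; NULL means "not given". */
static int
parse_iova_arg(const char *arg, enum iova_mode *out)
{
	if (arg == NULL) {
		*out = IOVA_DC; /* no user preference */
		return 0;
	}
	if (strcmp(arg, "pa") == 0) {
		*out = IOVA_PA;
		return 0;
	}
	if (strcmp(arg, "va") == 0) {
		*out = IOVA_VA;
		return 0;
	}
	return -1; /* unknown value: reject rather than guess */
}

/* An explicit user request overrides whatever the heuristic chose. */
static enum iova_mode
final_iova_mode(enum iova_mode user_request, enum iova_mode heuristic)
{
	return user_request != IOVA_DC ? user_request : heuristic;
}
```

In this sketch the override only bypasses the bus vote. A real implementation would still need to reject a forced PA when physical addresses are unavailable.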
* Re: [dpdk-dev] [PATCH 2/2] eal: fix IOVA mode selection as VA for pci drivers 2019-07-15 15:35 ` Jerin Jacob Kollanukkaran @ 2019-07-15 16:06 ` Thomas Monjalon 2019-07-15 16:27 ` Jerin Jacob Kollanukkaran 0 siblings, 1 reply; 57+ messages in thread From: Thomas Monjalon @ 2019-07-15 16:06 UTC (permalink / raw) To: Jerin Jacob Kollanukkaran Cc: Burakov, Anatoly, David Marchand, dev, John McNamara, Marko Kovacevic, Igor Russkikh, Pavel Belous, Ajit Khaparde, Somnath Kotur, Wenzhuo Lu, John Daley, Hyong Youb Kim, Qi Zhang, Xiao Wang, Beilei Xing, Jingjing Wu, Qiming Yang, Konstantin Ananyev, Matan Azrad, Shahaf Shuler, Yongseok Koh, Viacheslav Ovsiienko, Alejandro Lucero, Nithin Kumar Dabilpuram, Kiran Kumar Kokkilagadda, Rasesh Mody, Shahed Shaikh, Bruce Richardson, alialnu, aconole 15/07/2019 17:35, Jerin Jacob Kollanukkaran: > From: Thomas Monjalon <thomas@monjalon.net> > > 15/07/2019 16:26, Jerin Jacob Kollanukkaran: > > > > > Is there any specific reason why we always prefer PA if physical > > > > > addresses are available? Since we're already assuming that all > > > > > devices support PA and VA anyway, what's the harm in enabling VA by > > default? > > > > > > > > If PA is available, it means we are running as root. > > > > We can assume that using root is a choice, probably related to a > > > > preference for PA. > > > > > > # Even if we are running as root, Why to choose PA in case of DC? > > > ie. Following logic is not need > > > if (iova_mode == RTE_IOVA_DC) { > > > iova_mode = phys_addrs ? RTE_IOVA_PA : RTE_IOVA_VA; > > > RTE_LOG(DEBUG, EAL, > > > "Buses did not request a specific IOVA mode, using '%s' > > based on physical addresses availability.\n", > > > phys_addrs ? "PA" : "VA"); > > > } > > > > Why running as root if using VA anyway? > > We can assume the user knows what he is doing, so it is a user choice. > > We want to allow the user choosing, right? > > The user can override iova=pa/va as eal argument if user needs to run a specific mode. 
> Running as root for various other reason(just be lazy) etc. it is not or it should not > be connected to set the mode as PA. Good point. I tend to prefer avoiding the use of EAL arguments because they may be unavailable, depending on the application. > > > # When DPDK running on guest, Anyway it can not access the real PA, It will > > be IPA. > > > > What is IPA? Isn't it a beer? > > There may a beer with that name. In this context, it is "Intermediate physical address" > > > > So I don't understand logic behind choose PA when DC. > > > To me, it make sense to choose PA when DC. > > > > You probably mean "choose VA". > > Yup. > > > > # To align with RTE_PCI_DRV_NEED_MAPPING flag and reflect it "need" > > > rather than support, I think, flag can be changed to > > > RTE_PCI_DRV_NEED_IOVA_AS_VA > > > > I think the most important is to have a good documentation of this flag (it > > was not done properly when Cavium introduced it initially). > > If you want to rename the flag, you can do it in a separate patch. > > If renaming, I really would like to get an answer to an old question: > > Why IO adress is called IOVA? The name "IOVA_AS_VA" looks strange. > > IOVA = IO virtual address > Since IOVA can be PA or VA, the name IOVA_AS_VA as chosen We could also call it "bus address" or "device address". I think the word "IOVA" was enforced by Linux. Anyway, my real issue when using "virtual" is that we don't really know what we are talking about: is it an IOMMU translated address on the device side or an MMU translated address on the application side? I think we should better explain things. One diagram which can help: https://en.wikipedia.org/wiki/Input%E2%80%93output_memory_management_unit#/media/File:MMU_and_IOMMU.svg > > For reference, one description of addressing: > > https://lists.linuxfoundation.org/pipermail/iommu/2018-May/027686.html > > > > About the naming, do you remember how I insisted to have a correct naming > > of all related stuff in DPDK? 
It was hard to get it accepted, the discussion was > > not nice and I stopped insisting to get all details fine because I just got bored. > > It was a really bad experience. > > I agree. > To me that bad experience was due to mostly not having enough technical comments > On the proposal. Though I am not the author/owner of it. > > > You can ask why I remind this now? Because we must take care of all details, > > make sure our messages are well understood, and be cooperative. > > No disagreement. > If we see the history the meaning got changed/updated in this commit > By adding intel drivers to it. I would nt say it is big ideal, It just C code, > It can be changed based on the need. I think, what really import is, > maintain the the feature and commitment towards fixing any issue. > > commit f37dfab21c988d2d0ecb3c82be4ba9738c7e51c7 > Author: Jianfeng Tan <jianfeng.tan@intel.com> > Date: Wed Oct 11 10:33:48 2017 +0000 > > drivers/net: enable IOVA mode for Intel PMDs > > If we want to enable IOVA mode, introduced by > commit 93878cf0255e ("eal: introduce helper API for IOVA mode"), > we need PMDs (for PCI devices) to expose this flag. > > Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com> > Acked-by: Anatoly Burakov <anatoly.burakov@intel.com> > Reviewed-by: Santosh Shukla <santosh.shukla@caviumnetworks.com> The doxygen meaning did not change from day one: /** Device driver supports IOVA as VA */ But the commit log meaning was: "Flag used when driver needs to operate in iova=va mode." And the Intel commit log had a different understanding: "If we want to enable IOVA mode, [..] we need PMDs [..] to expose this flag." Anyway, we agree that the new meaning should be the one the original author had in mind (i.e. "driver needs"). > > > Other than above points, > > > Reviewed this patch and tested on octeontx2, It looks good to me.
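The two readings of the flag ("driver supports VA" versus "driver needs VA") lead to different answers for a single PCI device. Below is a minimal sketch of the "driver needs VA" reading agreed on above. It assumes that a driver without the flag expresses no preference (DC) unless the kernel driver itself forces PA. The enums and the function name are hypothetical stand-ins, not DPDK API. The logic loosely follows the pci_device_iova_mode() change in patch 2/4 of this thread, but it leaves out the IOMMU capability check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for enum rte_iova_mode and enum rte_kernel_driver. */
enum iova_mode { IOVA_DC, IOVA_PA, IOVA_VA };
enum kernel_driver { KDRV_VFIO, KDRV_IGB_UIO, KDRV_UIO_GENERIC, KDRV_UNKNOWN };

/*
 * IOVA requirement of one device, reading the driver flag as
 * "the driver NEEDS IOVA as VA" rather than "the driver supports it".
 */
static enum iova_mode
device_iova_requirement(enum kernel_driver kdrv, bool drv_needs_va,
			bool vfio_noiommu)
{
	enum iova_mode mode = IOVA_DC; /* no constraint unless stated below */

	switch (kdrv) {
	case KDRV_VFIO:
		if (vfio_noiommu)
			mode = IOVA_PA; /* no IOMMU translation available */
		else if (drv_needs_va)
			mode = IOVA_VA;
		break;
	case KDRV_IGB_UIO:
	case KDRV_UIO_GENERIC:
		mode = IOVA_PA; /* UIO sets up no IOMMU mapping */
		break;
	default:
		/* Unknown kernel driver, e.g. mlx4_core/mlx5_core after the revert. */
		if (drv_needs_va)
			mode = IOVA_VA;
		break;
	}
	return mode;
}
```

Under the older "supports" reading, a VFIO-bound device whose driver lacked the flag forced PA, and so did any device on an unknown kernel driver. Under this reading, both cases simply leave the choice open.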
* Re: [dpdk-dev] [PATCH 2/2] eal: fix IOVA mode selection as VA for pci drivers 2019-07-15 16:06 ` Thomas Monjalon @ 2019-07-15 16:27 ` Jerin Jacob Kollanukkaran 0 siblings, 0 replies; 57+ messages in thread From: Jerin Jacob Kollanukkaran @ 2019-07-15 16:27 UTC (permalink / raw) To: Thomas Monjalon Cc: Burakov, Anatoly, David Marchand, dev, John McNamara, Marko Kovacevic, Igor Russkikh, Pavel Belous, Ajit Khaparde, Somnath Kotur, Wenzhuo Lu, John Daley, Hyong Youb Kim, Qi Zhang, Xiao Wang, Beilei Xing, Jingjing Wu, Qiming Yang, Konstantin Ananyev, Matan Azrad, Shahaf Shuler, Yongseok Koh, Viacheslav Ovsiienko, Alejandro Lucero, Nithin Kumar Dabilpuram, Kiran Kumar Kokkilagadda, Rasesh Mody, Shahed Shaikh, Bruce Richardson, alialnu, aconole > -----Original Message----- > From: Thomas Monjalon <thomas@monjalon.net> > Sent: Monday, July 15, 2019 9:36 PM > To: Jerin Jacob Kollanukkaran <jerinj@marvell.com> > Cc: Burakov, Anatoly <anatoly.burakov@intel.com>; David Marchand > <david.marchand@redhat.com>; dev@dpdk.org; John McNamara > <john.mcnamara@intel.com>; Marko Kovacevic > <marko.kovacevic@intel.com>; Igor Russkikh > <igor.russkikh@aquantia.com>; Pavel Belous <pavel.belous@aquantia.com>; > Ajit Khaparde <ajit.khaparde@broadcom.com>; Somnath Kotur > <somnath.kotur@broadcom.com>; Wenzhuo Lu <wenzhuo.lu@intel.com>; > John Daley <johndale@cisco.com>; Hyong Youb Kim <hyonkim@cisco.com>; > Qi Zhang <qi.z.zhang@intel.com>; Xiao Wang <xiao.w.wang@intel.com>; > Beilei Xing <beilei.xing@intel.com>; Jingjing Wu <jingjing.wu@intel.com>; > Qiming Yang <qiming.yang@intel.com>; Konstantin Ananyev > <konstantin.ananyev@intel.com>; Matan Azrad <matan@mellanox.com>; > Shahaf Shuler <shahafs@mellanox.com>; Yongseok Koh > <yskoh@mellanox.com>; Viacheslav Ovsiienko > <viacheslavo@mellanox.com>; Alejandro Lucero > <alejandro.lucero@netronome.com>; Nithin Kumar Dabilpuram > <ndabilpuram@marvell.com>; Kiran Kumar Kokkilagadda > <kirankumark@marvell.com>; Rasesh Mody <rmody@marvell.com>; 
Shahed > Shaikh <shshaikh@marvell.com>; Bruce Richardson > <bruce.richardson@intel.com>; alialnu@mellanox.com; > aconole@redhat.com > Subject: Re: [PATCH 2/2] eal: fix IOVA mode selection as VA for pci drivers > > 15/07/2019 17:35, Jerin Jacob Kollanukkaran: > > From: Thomas Monjalon <thomas@monjalon.net> > > > 15/07/2019 16:26, Jerin Jacob Kollanukkaran: > > > > > > Is there any specific reason why we always prefer PA if > > > > > > physical addresses are available? Since we're already assuming > > > > > > that all devices support PA and VA anyway, what's the harm in > > > > > > enabling VA by > > > default? > > > > > > > > > > If PA is available, it means we are running as root. > > > > > We can assume that using root is a choice, probably related to a > > > > > preference for PA. > > > > > > > > # Even if we are running as root, Why to choose PA in case of DC? > > > > ie. Following logic is not need > > > > if (iova_mode == RTE_IOVA_DC) { > > > > iova_mode = phys_addrs ? RTE_IOVA_PA : RTE_IOVA_VA; > > > > RTE_LOG(DEBUG, EAL, > > > > "Buses did not request a specific IOVA mode, using '%s' > > > based on physical addresses availability.\n", > > > > phys_addrs ? "PA" : "VA"); > > > > } > > > > > > Why running as root if using VA anyway? > > > We can assume the user knows what he is doing, so it is a user choice. > > > We want to allow the user choosing, right? > > > > The user can override iova=pa/va as eal argument if user needs to run a > specific mode. > > Running as root for various other reason(just be lazy) etc. it is not > > or it should not be connected to set the mode as PA. > > Good point. > I tend to prefer avoiding the use of EAL arguments because they may be > unavailable, depending on the application. Yes. The default case suffices here, i.e. when DC is chosen by the bus layer, select VA. I don't see any point in overriding that. It is a good default. Do you think there is any case where it needs to be "changed"?
If not, let's stick with VA unless there is a HARD requirement for PA, i.e. stay away from PA WHEN possible. > > > > > # When DPDK running on guest, Anyway it can not access the real > > > > PA, It will > > > be IPA. > > > > > > What is IPA? Isn't it a beer? > > > > There may a beer with that name. In this context, it is "Intermediate > physical address" > > > > > > So I don't understand logic behind choose PA when DC. > > > > To me, it make sense to choose PA when DC. > > > > > > You probably mean "choose VA". > > > > Yup. > > > > > > # To align with RTE_PCI_DRV_NEED_MAPPING flag and reflect it "need" > > > > rather than support, I think, flag can be changed to > > > > RTE_PCI_DRV_NEED_IOVA_AS_VA > > > > > > I think the most important is to have a good documentation of this > > > flag (it was not done properly when Cavium introduced it initially). > > > If you want to rename the flag, you can do it in a separate patch. > > > If renaming, I really would like to get an answer to an old question: > > > Why IO adress is called IOVA? The name "IOVA_AS_VA" looks strange. > > > > IOVA = IO virtual address > > Since IOVA can be PA or VA, the name IOVA_AS_VA as chosen > > We could also call it "bus address" or "device address". > I think the word "IOVA" was enforced by Linux. > Anyway, my real issue when using "virtual" is that we don't really know what > we are talking about: is it an IOMMU translated address on the device side or > an MMU translated address on the application side? Actually, the Linux kernel creates the same mapping for the device and the CPU, so both can access the very same address. In this context, IOVA means the virtual address for the device. The OS can do the same mapping in the CPU MMU tables as well. > > I think we should better explain things.
> One diagram which can help: > https://en.wikipedia.org/wiki/Input%E2%80%93output_memory_management_unit#/media/File:MMU_and_IOMMU.svg > > > > For reference, one description of addressing: > > > https://lists.linuxfoundation.org/pipermail/iommu/2018-May/027686.html > > > > > > About the naming, do you remember how I insisted to have a correct > > > naming of all related stuff in DPDK? It was hard to get it accepted, > > > the discussion was not nice and I stopped insisting to get all details fine > because I just got bored. > > > It was a really bad experience. > > > > I agree. > > To me that bad experience was due to mostly not having enough > > technical comments On the proposal. Though I am not the author/owner of > it. > > > > > You can ask why I remind this now? Because we must take care of all > > > details, make sure our messages are well understood, and be > cooperative. > > > > No disagreement. > > If we see the history the meaning got changed/updated in this commit > > By adding intel drivers to it. I would nt say it is big ideal, It > > just C code, It can be changed based on the need. I think, what really > > import is, maintain the the feature and commitment towards fixing any > issue. > > > > commit f37dfab21c988d2d0ecb3c82be4ba9738c7e51c7 > > Author: Jianfeng Tan <jianfeng.tan@intel.com> > > Date: Wed Oct 11 10:33:48 2017 +0000 > > > > drivers/net: enable IOVA mode for Intel PMDs > > > > If we want to enable IOVA mode, introduced by > > commit 93878cf0255e ("eal: introduce helper API for IOVA mode"), > > we need PMDs (for PCI devices) to expose this flag. > > > > Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com> > > Acked-by: Anatoly Burakov <anatoly.burakov@intel.com> > > Reviewed-by: Santosh Shukla <santosh.shukla@caviumnetworks.com> > > The doxygen meaning did not change from day one: > /** Device driver supports IOVA as VA */ But the commit log > meaning was: > "Flag used when driver needs to operate in iova=va mode."
> And the Intel commit log had a different understanding: > "If we want to enable IOVA mode, [..] we need PMDs [..] to expose > this flag." > > Anyway we agree on the new meaning to be the original one the author had > in mind (i.e. "driver needs"). > > > > > Other than above points, > > > > Reviewed this patch and tested on octeontx2, It looks good to me.
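The disagreement above is only about how a DC result from the buses gets resolved. The sketch below contrasts two rules:

- the rule quoted in the thread: PA whenever physical addresses are readable, which in practice means running as root;
- Jerin's proposal: VA unless PA is strictly required.

The names are hypothetical. `phys_addrs_available` stands in for EAL's internal check of whether physical addresses can be obtained.

```c
#include <assert.h>
#include <stdbool.h>

enum iova_mode { IOVA_DC, IOVA_PA, IOVA_VA };

/* Rule applied when no bus expressed a preference. */
enum dc_policy {
	DC_PA_IF_AVAILABLE, /* rule quoted in the thread */
	DC_ALWAYS_VA,       /* proposal: stay away from PA when possible */
};

static enum iova_mode
resolve_default(enum iova_mode bus_choice, bool phys_addrs_available,
		enum dc_policy policy)
{
	if (bus_choice != IOVA_DC)
		return bus_choice; /* an explicit bus request always wins */

	if (policy == DC_PA_IF_AVAILABLE)
		return phys_addrs_available ? IOVA_PA : IOVA_VA;
	return IOVA_VA;
}
```

With the second policy, running as root no longer changes the default. A user who really wants PA can still ask for it with the EAL `--iova-mode` option, although, as Thomas notes, not every application exposes EAL arguments.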
* [dpdk-dev] [PATCH v2 0/4] Fixes on IOVA mode selection 2019-07-10 21:48 [dpdk-dev] [PATCH 0/2] Fixes on IOVA mode selection David Marchand 2019-07-10 21:48 ` [dpdk-dev] [PATCH 1/2] Revert "bus/pci: add Mellanox kernel driver type" David Marchand 2019-07-10 21:48 ` [dpdk-dev] [PATCH 2/2] eal: fix IOVA mode selection as VA for pci drivers David Marchand @ 2019-07-16 13:46 ` jerinj 2019-07-16 13:46 ` [dpdk-dev] [PATCH v2 1/4] Revert "bus/pci: add Mellanox kernel driver type" jerinj ` (4 more replies) 2019-07-22 12:56 ` [dpdk-dev] [PATCH v4 " David Marchand 3 siblings, 5 replies; 57+ messages in thread From: jerinj @ 2019-07-16 13:46 UTC (permalink / raw) To: dev; +Cc: thomas, david.marchand, anatoly.burakov, Jerin Jacob From: Jerin Jacob <jerinj@marvell.com> Original V1 cover letter from David Marchand: Following the issues reported by Jerin and the discussion that emerged from it, here are fixes to restore and document the behavior of the EAL and the pci bus driver. I pondered all the arguments and tried to have the less changes possible. I can't find a need for a flag to just announce support of physical addresses from the pmd point of view. So it ended up with something really close to what Jerin had suggested. But the problem is that this is still unfinished wrt the documentation. I will be offline for 10 days and we need this to move forward, so sending anyway.
v2:
- Renamed the RTE_PCI_DRV_IOVA_AS_VA flag to RTE_PCI_DRV_NEED_IOVA_AS_VA (patch 3/4)
- Changed the IOVA mode to VA for the default case (patch 4/4), with documentation
- Tested the patch series on the octeontx2 platform

David Marchand (2):
  Revert "bus/pci: add Mellanox kernel driver type"
  eal: fix IOVA mode selection as VA for pci drivers

Jerin Jacob (2):
  eal: change RTE_PCI_DRV_IOVA_AS_VA flag name
  eal: select IOVA mode as VA for default case

 .../prog_guide/env_abstraction_layer.rst    | 37 +++++++++++++++++++
 drivers/bus/pci/linux/pci.c                 | 24 +++---------
 drivers/bus/pci/pci_common.c                | 30 ++++++++++++---
 drivers/bus/pci/rte_bus_pci.h               |  4 +-
 drivers/event/octeontx/timvf_probe.c        |  2 +-
 drivers/event/octeontx2/otx2_evdev.c        |  2 +-
 drivers/mempool/octeontx/octeontx_fpavf.c   |  2 +-
 drivers/mempool/octeontx2/otx2_mempool.c    |  2 +-
 drivers/net/atlantic/atl_ethdev.c           |  3 +-
 drivers/net/bnxt/bnxt_ethdev.c              |  3 +-
 drivers/net/e1000/em_ethdev.c               |  3 +-
 drivers/net/e1000/igb_ethdev.c              |  5 +--
 drivers/net/enic/enic_ethdev.c              |  3 +-
 drivers/net/fm10k/fm10k_ethdev.c            |  3 +-
 drivers/net/i40e/i40e_ethdev.c              |  3 +-
 drivers/net/i40e/i40e_ethdev_vf.c           |  2 +-
 drivers/net/iavf/iavf_ethdev.c              |  3 +-
 drivers/net/ice/ice_ethdev.c                |  3 +-
 drivers/net/ixgbe/ixgbe_ethdev.c            |  5 +--
 drivers/net/mlx4/mlx4.c                     |  3 +-
 drivers/net/mlx5/mlx5.c                     |  2 +-
 drivers/net/nfp/nfp_net.c                   |  6 +--
 drivers/net/octeontx2/otx2_ethdev.c         |  7 +---
 drivers/net/qede/qede_ethdev.c              |  6 +--
 drivers/raw/ioat/ioat_rawdev.c              |  3 +-
 drivers/raw/octeontx2_dma/otx2_dpi_rawdev.c |  2 +-
 lib/librte_eal/common/eal_common_bus.c      | 30 +++++++++++++--
 lib/librte_eal/common/include/rte_dev.h     |  1 -
 lib/librte_eal/linux/eal/eal.c              |  6 +--
 29 files changed, 124 insertions(+), 81 deletions(-)

--
2.22.0
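The two-step selection documented in patch 2/4, combined with the v2 change to the default case, can be modelled roughly as below. This is a simplified sketch with hypothetical names, not the EAL code. The assumptions are as follows:

- `combine_prefs()` follows the device-level logic that patch 2/4 adds to rte_pci_get_iommu_class(): DC answers are ignored, and a PA/VA conflict yields DC. Applying the same rule across buses is an assumption here.
- DC resolves to VA, as the v2 cover letter states for patch 4/4.
- -1 marks the case where EAL init would fail early.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum iova_mode { IOVA_DC, IOVA_PA, IOVA_VA };

/*
 * Step 1: combine preferences. DC answers are ignored; if both PA and VA
 * are requested, no single mode satisfies everyone, so fall back to DC.
 */
static enum iova_mode
combine_prefs(const enum iova_mode *prefs, size_t n)
{
	bool want_pa = false;
	bool want_va = false;

	for (size_t i = 0; i < n; i++) {
		if (prefs[i] == IOVA_PA)
			want_pa = true;
		else if (prefs[i] == IOVA_VA)
			want_va = true;
	}
	if (want_va && !want_pa)
		return IOVA_VA;
	if (want_pa && !want_va)
		return IOVA_PA;
	return IOVA_DC; /* nobody cares, or the requests conflict */
}

/*
 * Step 2: check the preferred mode against physical address availability.
 * Returns the selected mode, or -1 where EAL init would fail early.
 */
static int
select_iova_mode(const enum iova_mode *bus_prefs, size_t n,
		 bool phys_addrs_available)
{
	enum iova_mode mode = combine_prefs(bus_prefs, n);

	if (mode == IOVA_PA && !phys_addrs_available)
		return -1; /* probing PA-only devices would fail anyway */
	if (mode == IOVA_DC)
		mode = IOVA_VA; /* v2 default; v1 used phys_addrs ? PA : VA */
	return (int)mode;
}
```

This model also shows the case the documentation warns about. When one bus wants PA and another wants VA, step 1 yields DC and step 2 picks VA, so the PA-only devices will later refuse to probe.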
* [dpdk-dev] [PATCH v2 1/4] Revert "bus/pci: add Mellanox kernel driver type" 2019-07-16 13:46 ` [dpdk-dev] [PATCH v2 0/4] Fixes on IOVA mode selection jerinj @ 2019-07-16 13:46 ` jerinj 2019-07-16 13:46 ` [dpdk-dev] [PATCH v2 2/4] eal: fix IOVA mode selection as VA for pci drivers jerinj ` (3 subsequent siblings) 4 siblings, 0 replies; 57+ messages in thread From: jerinj @ 2019-07-16 13:46 UTC (permalink / raw) To: dev; +Cc: thomas, david.marchand, anatoly.burakov, Jerin Jacob From: David Marchand <david.marchand@redhat.com> This reverts commit 0cb86518db57d35e0abc14d6703fad561a0310e2. The pci bus now reports DC when faced with a device bound to an unknown driver and, in such a case, the IOVA mode is selected against physical address availability. As a consequence, there is no reason for this special case for Mellanox drivers. Fixes: 703458e19c16 ("bus/pci: consider only usable devices for IOVA mode") Signed-off-by: David Marchand <david.marchand@redhat.com> Reviewed-by: Jerin Jacob <jerinj@marvell.com> --- drivers/bus/pci/linux/pci.c | 8 -------- lib/librte_eal/common/include/rte_dev.h | 1 - 2 files changed, 9 deletions(-) diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c index 33c8ea7e9..b12f10af5 100644 --- a/drivers/bus/pci/linux/pci.c +++ b/drivers/bus/pci/linux/pci.c @@ -329,9 +329,6 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr) dev->kdrv = RTE_KDRV_IGB_UIO; else if (!strcmp(driver, "uio_pci_generic")) dev->kdrv = RTE_KDRV_UIO_GENERIC; - else if (!strcmp(driver, "mlx4_core") || - !strcmp(driver, "mlx5_core")) - dev->kdrv = RTE_KDRV_NIC_MLX; else dev->kdrv = RTE_KDRV_UNKNOWN; } else @@ -591,11 +588,6 @@ pci_device_iova_mode(const struct rte_pci_driver *pdrv, break; } - case RTE_KDRV_NIC_MLX: - if ((pdrv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) == 0) - iova_mode = RTE_IOVA_PA; - break; - case RTE_KDRV_IGB_UIO: case RTE_KDRV_UIO_GENERIC: iova_mode = RTE_IOVA_PA; diff --git a/lib/librte_eal/common/include/rte_dev.h 
b/lib/librte_eal/common/include/rte_dev.h index 94829f6e4..c25e09e3d 100644 --- a/lib/librte_eal/common/include/rte_dev.h +++ b/lib/librte_eal/common/include/rte_dev.h @@ -63,7 +63,6 @@ enum rte_kernel_driver { RTE_KDRV_VFIO, RTE_KDRV_UIO_GENERIC, RTE_KDRV_NIC_UIO, - RTE_KDRV_NIC_MLX, RTE_KDRV_NONE, }; -- 2.22.0
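With the Mellanox special case gone, a device bound to an unknown kernel driver reports DC when its driver carries no flag, so it never constrains the global choice. Patch 2/4 adds a matching check at probe time: a device is refused when its requirement is not DC and differs from the mode EAL selected. A minimal sketch of that check follows. The names are hypothetical; per the diff, the real check calls rte_eal_iova_mode() and returns -EINVAL.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

enum iova_mode { IOVA_DC, IOVA_PA, IOVA_VA };

static const char *
iova_mode_name(enum iova_mode m)
{
	return m == IOVA_PA ? "PA" : m == IOVA_VA ? "VA" : "DC";
}

/* Returns 0 if the device may be probed in the current mode, -EINVAL otherwise. */
static int
check_device_iova(enum iova_mode dev_req, enum iova_mode current)
{
	/* A DC device accepts whatever EAL selected. */
	if (dev_req != IOVA_DC && dev_req != current) {
		fprintf(stderr,
			"Expecting '%s' IOVA mode but current mode is '%s', not initializing\n",
			iova_mode_name(dev_req), iova_mode_name(current));
		return -EINVAL;
	}
	return 0;
}
```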
* [dpdk-dev] [PATCH v2 2/4] eal: fix IOVA mode selection as VA for pci drivers 2019-07-16 13:46 ` [dpdk-dev] [PATCH v2 0/4] Fixes on IOVA mode selection jerinj 2019-07-16 13:46 ` [dpdk-dev] [PATCH v2 1/4] Revert "bus/pci: add Mellanox kernel driver type" jerinj @ 2019-07-16 13:46 ` jerinj 2019-07-16 14:26 ` Burakov, Anatoly 2019-07-16 13:46 ` [dpdk-dev] [PATCH v2 3/4] eal: change RTE_PCI_DRV_IOVA_AS_VA flag name jerinj ` (2 subsequent siblings) 4 siblings, 1 reply; 57+ messages in thread From: jerinj @ 2019-07-16 13:46 UTC (permalink / raw) To: dev, Anatoly Burakov, John McNamara, Marko Kovacevic, Igor Russkikh, Pavel Belous, Ajit Khaparde, Somnath Kotur, Wenzhuo Lu, John Daley, Hyong Youb Kim, Qi Zhang, Xiao Wang, Beilei Xing, Jingjing Wu, Qiming Yang, Konstantin Ananyev, Matan Azrad, Shahaf Shuler, Yongseok Koh, Viacheslav Ovsiienko, Alejandro Lucero, Jerin Jacob, Nithin Dabilpuram, Kiran Kumar K, Rasesh Mody, Shahed Shaikh, Bruce Richardson Cc: thomas, david.marchand From: David Marchand <david.marchand@redhat.com> The incriminated commit broke the use of RTE_PCI_DRV_IOVA_AS_VA, which was intended to mean "driver only supports VA" but had been understood as "driver supports both PA and VA" by most net drivers and used to let dpdk processes run as non-root (which do not have access to physical addresses on recent kernels). The check on physical addresses actually closed the gap for those drivers. We don't need to mark them with RTE_PCI_DRV_IOVA_AS_VA and this flag can retain its intended meaning. Document its meaning explicitly. We can check that a driver requirement wrt IOVA mode is fulfilled before trying to probe a device. Finally, document the heuristic used to select the IOVA mode and hope that we won't break it again.
Fixes: 703458e19c16 ("bus/pci: consider only usable devices for IOVA mode") Signed-off-by: David Marchand <david.marchand@redhat.com> Reviewed-by: Jerin Jacob <jerinj@marvell.com> Tested-by: Jerin Jacob <jerinj@marvell.com> --- .../prog_guide/env_abstraction_layer.rst | 31 +++++++++++++++++++ drivers/bus/pci/linux/pci.c | 16 ++++------ drivers/bus/pci/pci_common.c | 30 ++++++++++++++---- drivers/bus/pci/rte_bus_pci.h | 4 +-- drivers/net/atlantic/atl_ethdev.c | 3 +- drivers/net/bnxt/bnxt_ethdev.c | 3 +- drivers/net/e1000/em_ethdev.c | 3 +- drivers/net/e1000/igb_ethdev.c | 5 ++- drivers/net/enic/enic_ethdev.c | 3 +- drivers/net/fm10k/fm10k_ethdev.c | 3 +- drivers/net/i40e/i40e_ethdev.c | 3 +- drivers/net/i40e/i40e_ethdev_vf.c | 2 +- drivers/net/iavf/iavf_ethdev.c | 3 +- drivers/net/ice/ice_ethdev.c | 3 +- drivers/net/ixgbe/ixgbe_ethdev.c | 5 ++- drivers/net/mlx4/mlx4.c | 3 +- drivers/net/mlx5/mlx5.c | 2 +- drivers/net/nfp/nfp_net.c | 6 ++-- drivers/net/octeontx2/otx2_ethdev.c | 5 --- drivers/net/qede/qede_ethdev.c | 6 ++-- drivers/raw/ioat/ioat_rawdev.c | 3 +- lib/librte_eal/common/eal_common_bus.c | 30 ++++++++++++++++-- 22 files changed, 110 insertions(+), 62 deletions(-) diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst index f15bcd976..77307e3a6 100644 --- a/doc/guides/prog_guide/env_abstraction_layer.rst +++ b/doc/guides/prog_guide/env_abstraction_layer.rst @@ -419,6 +419,37 @@ Misc Functions Locks and atomic operations are per-architecture (i686 and x86_64). +IOVA Mode Detection +~~~~~~~~~~~~~~~~~~~ + +IOVA Mode is selected by considering what the current usable Devices on the +system require and/or support. + +Below is the 2-step heuristic for this choice. + +For the first step, EAL asks each bus its requirement in terms of IOVA mode +and decides on a preferred IOVA mode.
+ +- if all buses report RTE_IOVA_PA, then the preferred IOVA mode is RTE_IOVA_PA, +- if all buses report RTE_IOVA_VA, then the preferred IOVA mode is RTE_IOVA_VA, +- if all buses report RTE_IOVA_DC, no bus expressed a preference, then the + preferred mode is RTE_IOVA_DC, +- if the buses disagree (at least one wants RTE_IOVA_PA and at least one wants + RTE_IOVA_VA), then the preferred IOVA mode is RTE_IOVA_DC (see below with the + check on Physical Addresses availability). + +The second step is checking if the preferred mode complies with the Physical +Addresses availability since those are only available to the root user on recent +kernels. + +- if the preferred mode is RTE_IOVA_PA but there is no access to Physical + Addresses, then EAL init will fail early, since later probing of the devices + would fail anyway, +- if the preferred mode is RTE_IOVA_DC then based on the Physical Addresses + availability, the preferred mode is adjusted to RTE_IOVA_PA or RTE_IOVA_VA. + In the case when the buses had disagreed on the IOVA Mode at the first step, + some of the buses won't work because of this decision. + IOVA Mode Configuration ~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c index b12f10af5..1a2f99b32 100644 --- a/drivers/bus/pci/linux/pci.c +++ b/drivers/bus/pci/linux/pci.c @@ -578,12 +578,10 @@ pci_device_iova_mode(const struct rte_pci_driver *pdrv, else is_vfio_noiommu_enabled = 0; } - if ((pdrv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) == 0) { + if (is_vfio_noiommu_enabled != 0) iova_mode = RTE_IOVA_PA; - } else if (is_vfio_noiommu_enabled != 0) { - RTE_LOG(DEBUG, EAL, "Forcing to 'PA', vfio-noiommu mode configured\n"); - iova_mode = RTE_IOVA_PA; - } + else if ((pdrv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) != 0) + iova_mode = RTE_IOVA_VA; #endif break; } @@ -594,8 +592,8 @@ pci_device_iova_mode(const struct rte_pci_driver *pdrv, break; default: - RTE_LOG(DEBUG, EAL, "Unsupported kernel driver?
Defaulting to IOVA as 'PA'\n"); - iova_mode = RTE_IOVA_PA; + if ((pdrv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) != 0) + iova_mode = RTE_IOVA_VA; break; } @@ -607,10 +605,8 @@ pci_device_iova_mode(const struct rte_pci_driver *pdrv, if (iommu_no_va == -1) iommu_no_va = pci_one_device_iommu_support_va(pdev) ? 0 : 1; - if (iommu_no_va != 0) { - RTE_LOG(DEBUG, EAL, "Forcing to 'PA', IOMMU does not support IOVA as 'VA'\n"); + if (iommu_no_va != 0) iova_mode = RTE_IOVA_PA; - } } return iova_mode; } diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c index d2af472ef..ed55b07f3 100644 --- a/drivers/bus/pci/pci_common.c +++ b/drivers/bus/pci/pci_common.c @@ -169,8 +169,22 @@ rte_pci_probe_one_driver(struct rte_pci_driver *dr, * This needs to be before rte_pci_map_device(), as it enables to use * driver flags for adjusting configuration. */ - if (!already_probed) + if (!already_probed) { + enum rte_iova_mode dev_iova_mode; + enum rte_iova_mode iova_mode; + + dev_iova_mode = pci_device_iova_mode(dr, dev); + iova_mode = rte_eal_iova_mode(); + if (dev_iova_mode != RTE_IOVA_DC && + dev_iova_mode != iova_mode) { + RTE_LOG(ERR, EAL, " Expecting '%s' IOVA mode but current mode is '%s', not initializing\n", + dev_iova_mode == RTE_IOVA_PA ? "PA" : "VA", + iova_mode == RTE_IOVA_PA ? 
"PA" : "VA"); + return -EINVAL; + } + dev->driver = dr; + } if (!already_probed && (dr->drv_flags & RTE_PCI_DRV_NEED_MAPPING)) { /* map resources for devices that use igb_uio */ @@ -629,12 +643,16 @@ rte_pci_get_iommu_class(void) devices_want_va = true; } } - if (devices_want_pa) { - iova_mode = RTE_IOVA_PA; - if (devices_want_va) - RTE_LOG(WARNING, EAL, "Some devices want 'VA' but forcing 'PA' because other devices want it\n"); - } else if (devices_want_va) { + if (devices_want_va && !devices_want_pa) { iova_mode = RTE_IOVA_VA; + } else if (devices_want_pa && !devices_want_va) { + iova_mode = RTE_IOVA_PA; + } else { + iova_mode = RTE_IOVA_DC; + if (devices_want_va) { + RTE_LOG(WARNING, EAL, "Some devices want 'VA' but forcing 'DC' because other devices want 'PA'.\n"); + RTE_LOG(WARNING, EAL, "Depending on the final decision by the EAL, part of your devices won't initialise.\n"); + } } return iova_mode; } diff --git a/drivers/bus/pci/rte_bus_pci.h b/drivers/bus/pci/rte_bus_pci.h index 06e004cd3..0f2177564 100644 --- a/drivers/bus/pci/rte_bus_pci.h +++ b/drivers/bus/pci/rte_bus_pci.h @@ -187,8 +187,8 @@ struct rte_pci_bus { #define RTE_PCI_DRV_INTR_RMV 0x0010 /** Device driver needs to keep mapped resources if unsupported dev detected */ #define RTE_PCI_DRV_KEEP_MAPPED_RES 0x0020 -/** Device driver supports IOVA as VA */ -#define RTE_PCI_DRV_IOVA_AS_VA 0X0040 +/** Device driver only supports IOVA as VA and cannot work with IOVA as PA */ +#define RTE_PCI_DRV_IOVA_AS_VA 0x0040 /** * Map the PCI device resources in user space virtual memory address diff --git a/drivers/net/atlantic/atl_ethdev.c b/drivers/net/atlantic/atl_ethdev.c index fdc0a7f2d..fa89ae755 100644 --- a/drivers/net/atlantic/atl_ethdev.c +++ b/drivers/net/atlantic/atl_ethdev.c @@ -157,8 +157,7 @@ static const struct rte_pci_id pci_id_atl_map[] = { static struct rte_pci_driver rte_atl_pmd = { .id_table = pci_id_atl_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC | - 
RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC, .probe = eth_atl_pci_probe, .remove = eth_atl_pci_remove, }; diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c index 8fc510351..9306d5655 100644 --- a/drivers/net/bnxt/bnxt_ethdev.c +++ b/drivers/net/bnxt/bnxt_ethdev.c @@ -4028,8 +4028,7 @@ static int bnxt_pci_remove(struct rte_pci_device *pci_dev) static struct rte_pci_driver bnxt_rte_pmd = { .id_table = bnxt_pci_id_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | - RTE_PCI_DRV_INTR_LSC | RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC, .probe = bnxt_pci_probe, .remove = bnxt_pci_remove, }; diff --git a/drivers/net/e1000/em_ethdev.c b/drivers/net/e1000/em_ethdev.c index dc886613a..0c859e52b 100644 --- a/drivers/net/e1000/em_ethdev.c +++ b/drivers/net/e1000/em_ethdev.c @@ -352,8 +352,7 @@ static int eth_em_pci_remove(struct rte_pci_device *pci_dev) static struct rte_pci_driver rte_em_pmd = { .id_table = pci_id_em_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC | - RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC, .probe = eth_em_pci_probe, .remove = eth_em_pci_remove, }; diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c index 3ee28cfbc..e784eeb73 100644 --- a/drivers/net/e1000/igb_ethdev.c +++ b/drivers/net/e1000/igb_ethdev.c @@ -1116,8 +1116,7 @@ static int eth_igb_pci_remove(struct rte_pci_device *pci_dev) static struct rte_pci_driver rte_igb_pmd = { .id_table = pci_id_igb_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC | - RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC, .probe = eth_igb_pci_probe, .remove = eth_igb_pci_remove, }; @@ -1140,7 +1139,7 @@ static int eth_igbvf_pci_remove(struct rte_pci_device *pci_dev) */ static struct rte_pci_driver rte_igbvf_pmd = { .id_table = pci_id_igbvf_map, - .drv_flags = 
RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING, .probe = eth_igbvf_pci_probe, .remove = eth_igbvf_pci_remove, }; diff --git a/drivers/net/enic/enic_ethdev.c b/drivers/net/enic/enic_ethdev.c index 5cfbd31a2..e9c6f83ce 100644 --- a/drivers/net/enic/enic_ethdev.c +++ b/drivers/net/enic/enic_ethdev.c @@ -1247,8 +1247,7 @@ static int eth_enic_pci_remove(struct rte_pci_device *pci_dev) static struct rte_pci_driver rte_enic_pmd = { .id_table = pci_id_enic_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC | - RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC, .probe = eth_enic_pci_probe, .remove = eth_enic_pci_remove, }; diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c index a1e3836cb..2d3c47763 100644 --- a/drivers/net/fm10k/fm10k_ethdev.c +++ b/drivers/net/fm10k/fm10k_ethdev.c @@ -3268,8 +3268,7 @@ static const struct rte_pci_id pci_id_fm10k_map[] = { static struct rte_pci_driver rte_pmd_fm10k = { .id_table = pci_id_fm10k_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC | - RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC, .probe = eth_fm10k_pci_probe, .remove = eth_fm10k_pci_remove, }; diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c index 2b9fc4572..dd46d4d9d 100644 --- a/drivers/net/i40e/i40e_ethdev.c +++ b/drivers/net/i40e/i40e_ethdev.c @@ -696,8 +696,7 @@ static int eth_i40e_pci_remove(struct rte_pci_device *pci_dev) static struct rte_pci_driver rte_i40e_pmd = { .id_table = pci_id_i40e_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC | - RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC, .probe = eth_i40e_pci_probe, .remove = eth_i40e_pci_remove, }; diff --git a/drivers/net/i40e/i40e_ethdev_vf.c b/drivers/net/i40e/i40e_ethdev_vf.c index 5be32b069..3ff2f6097 100644 --- 
a/drivers/net/i40e/i40e_ethdev_vf.c +++ b/drivers/net/i40e/i40e_ethdev_vf.c @@ -1557,7 +1557,7 @@ static int eth_i40evf_pci_remove(struct rte_pci_device *pci_dev) */ static struct rte_pci_driver rte_i40evf_pmd = { .id_table = pci_id_i40evf_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING, .probe = eth_i40evf_pci_probe, .remove = eth_i40evf_pci_remove, }; diff --git a/drivers/net/iavf/iavf_ethdev.c b/drivers/net/iavf/iavf_ethdev.c index 53dc05c78..a97cd76fd 100644 --- a/drivers/net/iavf/iavf_ethdev.c +++ b/drivers/net/iavf/iavf_ethdev.c @@ -1402,8 +1402,7 @@ static int eth_iavf_pci_remove(struct rte_pci_device *pci_dev) /* Adaptive virtual function driver struct */ static struct rte_pci_driver rte_iavf_pmd = { .id_table = pci_id_iavf_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC | - RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC, .probe = eth_iavf_pci_probe, .remove = eth_iavf_pci_remove, }; diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c index 9ce730cd4..f05b48c01 100644 --- a/drivers/net/ice/ice_ethdev.c +++ b/drivers/net/ice/ice_ethdev.c @@ -3737,8 +3737,7 @@ ice_pci_remove(struct rte_pci_device *pci_dev) static struct rte_pci_driver rte_ice_pmd = { .id_table = pci_id_ice_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC | - RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC, .probe = ice_pci_probe, .remove = ice_pci_remove, }; diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c index 22c5b2c5c..4a6e5c32e 100644 --- a/drivers/net/ixgbe/ixgbe_ethdev.c +++ b/drivers/net/ixgbe/ixgbe_ethdev.c @@ -1869,8 +1869,7 @@ static int eth_ixgbe_pci_remove(struct rte_pci_device *pci_dev) static struct rte_pci_driver rte_ixgbe_pmd = { .id_table = pci_id_ixgbe_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC | - 
RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC, .probe = eth_ixgbe_pci_probe, .remove = eth_ixgbe_pci_remove, }; @@ -1892,7 +1891,7 @@ static int eth_ixgbevf_pci_remove(struct rte_pci_device *pci_dev) */ static struct rte_pci_driver rte_ixgbevf_pmd = { .id_table = pci_id_ixgbevf_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING, .probe = eth_ixgbevf_pci_probe, .remove = eth_ixgbevf_pci_remove, }; diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c index 2e169b088..d6e5753bf 100644 --- a/drivers/net/mlx4/mlx4.c +++ b/drivers/net/mlx4/mlx4.c @@ -1142,8 +1142,7 @@ static struct rte_pci_driver mlx4_driver = { }, .id_table = mlx4_pci_id_map, .probe = mlx4_pci_probe, - .drv_flags = RTE_PCI_DRV_INTR_LSC | RTE_PCI_DRV_INTR_RMV | - RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_INTR_LSC | RTE_PCI_DRV_INTR_RMV, }; #ifdef RTE_IBVERBS_LINK_DLOPEN diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index d93f92db5..0f05853f9 100644 --- a/drivers/net/mlx5/mlx5.c +++ b/drivers/net/mlx5/mlx5.c @@ -2087,7 +2087,7 @@ static struct rte_pci_driver mlx5_driver = { .dma_map = mlx5_dma_map, .dma_unmap = mlx5_dma_unmap, .drv_flags = RTE_PCI_DRV_INTR_LSC | RTE_PCI_DRV_INTR_RMV | - RTE_PCI_DRV_PROBE_AGAIN | RTE_PCI_DRV_IOVA_AS_VA, + RTE_PCI_DRV_PROBE_AGAIN, }; #ifdef RTE_IBVERBS_LINK_DLOPEN diff --git a/drivers/net/nfp/nfp_net.c b/drivers/net/nfp/nfp_net.c index 1a7aa17ee..f5d33efcf 100644 --- a/drivers/net/nfp/nfp_net.c +++ b/drivers/net/nfp/nfp_net.c @@ -3760,16 +3760,14 @@ static int eth_nfp_pci_remove(struct rte_pci_device *pci_dev) static struct rte_pci_driver rte_nfp_net_pf_pmd = { .id_table = pci_id_nfp_pf_net_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC | - RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC, .probe = nfp_pf_pci_probe, .remove = eth_nfp_pci_remove, }; static struct rte_pci_driver 
rte_nfp_net_vf_pmd = { .id_table = pci_id_nfp_vf_net_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC | - RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC, .probe = eth_nfp_pci_probe, .remove = eth_nfp_pci_remove, }; diff --git a/drivers/net/octeontx2/otx2_ethdev.c b/drivers/net/octeontx2/otx2_ethdev.c index fcb1869d5..5ec55511b 100644 --- a/drivers/net/octeontx2/otx2_ethdev.c +++ b/drivers/net/octeontx2/otx2_ethdev.c @@ -1188,11 +1188,6 @@ otx2_nix_configure(struct rte_eth_dev *eth_dev) goto fail; } - if (rte_eal_iova_mode() != RTE_IOVA_VA) { - otx2_err("iova mode should be va"); - goto fail; - } - if (conf->link_speeds & ETH_LINK_SPEED_FIXED) { otx2_err("Setting link speed/duplex not supported"); goto fail; diff --git a/drivers/net/qede/qede_ethdev.c b/drivers/net/qede/qede_ethdev.c index 82363e6eb..0b3046a8a 100644 --- a/drivers/net/qede/qede_ethdev.c +++ b/drivers/net/qede/qede_ethdev.c @@ -2737,8 +2737,7 @@ static int qedevf_eth_dev_pci_remove(struct rte_pci_device *pci_dev) static struct rte_pci_driver rte_qedevf_pmd = { .id_table = pci_id_qedevf_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC | - RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC, .probe = qedevf_eth_dev_pci_probe, .remove = qedevf_eth_dev_pci_remove, }; @@ -2757,8 +2756,7 @@ static int qede_eth_dev_pci_remove(struct rte_pci_device *pci_dev) static struct rte_pci_driver rte_qede_pmd = { .id_table = pci_id_qede_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC | - RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC, .probe = qede_eth_dev_pci_probe, .remove = qede_eth_dev_pci_remove, }; diff --git a/drivers/raw/ioat/ioat_rawdev.c b/drivers/raw/ioat/ioat_rawdev.c index d509b6606..7270ad7aa 100644 --- a/drivers/raw/ioat/ioat_rawdev.c +++ b/drivers/raw/ioat/ioat_rawdev.c @@ -338,8 +338,7 @@ static const struct rte_pci_id 
pci_id_ioat_map[] = { static struct rte_pci_driver ioat_pmd_drv = { .id_table = pci_id_ioat_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC | - RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC, .probe = ioat_rawdev_probe, .remove = ioat_rawdev_remove, }; diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c index 77f1be1b4..bf0da6e4f 100644 --- a/lib/librte_eal/common/eal_common_bus.c +++ b/lib/librte_eal/common/eal_common_bus.c @@ -228,13 +228,37 @@ rte_bus_find_by_device_name(const char *str) enum rte_iova_mode rte_bus_get_iommu_class(void) { - int mode = RTE_IOVA_DC; + enum rte_iova_mode mode = RTE_IOVA_DC; + bool buses_want_va = false; + bool buses_want_pa = false; struct rte_bus *bus; TAILQ_FOREACH(bus, &rte_bus_list, next) { + enum rte_iova_mode bus_iova_mode; - if (bus->get_iommu_class) - mode |= bus->get_iommu_class(); + if (bus->get_iommu_class == NULL) + continue; + + bus_iova_mode = bus->get_iommu_class(); + RTE_LOG(DEBUG, EAL, "Bus %s wants IOVA as '%s'\n", + bus->name, + bus_iova_mode == RTE_IOVA_DC ? "DC" : + (bus_iova_mode == RTE_IOVA_PA ? "PA" : "VA")); + if (bus_iova_mode == RTE_IOVA_PA) + buses_want_pa = true; + else if (bus_iova_mode == RTE_IOVA_VA) + buses_want_va = true; + } + if (buses_want_va && !buses_want_pa) { + mode = RTE_IOVA_VA; + } else if (buses_want_pa && !buses_want_va) { + mode = RTE_IOVA_PA; + } else { + mode = RTE_IOVA_DC; + if (buses_want_va) { + RTE_LOG(WARNING, EAL, "Some buses want 'VA' but forcing 'DC' because other buses want 'PA'.\n"); + RTE_LOG(WARNING, EAL, "Depending on the final decision by the EAL, part of your buses won't initialise.\n"); + } } return mode; -- 2.22.0
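The patched `rte_bus_get_iommu_class()` no longer ORs the bus answers together; it records whether any bus asked for VA or for PA and only commits to a mode when those requests do not conflict. The decision table can be modeled in isolation as below. This is a sketch, not the EAL code: `enum iova_mode` is a local stand-in for `enum rte_iova_mode`, and `combine_iova_votes()` is a hypothetical helper that takes the answers as an array instead of walking `rte_bus_list`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-in for enum rte_iova_mode (illustrative values). */
enum iova_mode { IOVA_DC = 0, IOVA_PA = 1, IOVA_VA = 2 };

/*
 * Hypothetical helper mirroring the patched aggregation rule:
 * DC answers are ignored, a VA-only or PA-only set of requests wins,
 * and a VA/PA conflict (or no request at all) falls back to DC.
 */
static enum iova_mode combine_iova_votes(const enum iova_mode *votes, size_t n)
{
	bool want_va = false;
	bool want_pa = false;
	size_t i;

	assert(votes != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (votes[i] == IOVA_PA)
			want_pa = true;
		else if (votes[i] == IOVA_VA)
			want_va = true;
	}

	if (want_va && !want_pa)
		return IOVA_VA;
	if (want_pa && !want_va)
		return IOVA_PA;
	/* No preference, or a conflict: leave the final choice to the EAL. */
	return IOVA_DC;
}
```

The same three-way rule appears one level down, per device, in the `devices_want_va`/`devices_want_pa` hunk of `rte_pci_get_iommu_class()` quoted in the review that follows. Assuming the usual bit values (DC = 0, PA = 1, VA = 2), the old `mode |= ...` would have produced PA|VA on a mixed system, which is not a named mode at all.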
* Re: [dpdk-dev] [PATCH v2 2/4] eal: fix IOVA mode selection as VA for pci drivers 2019-07-16 13:46 ` [dpdk-dev] [PATCH v2 2/4] eal: fix IOVA mode selection as VA for pci drivers jerinj @ 2019-07-16 14:26 ` Burakov, Anatoly 2019-07-16 15:07 ` Jerin Jacob Kollanukkaran 0 siblings, 1 reply; 57+ messages in thread From: Burakov, Anatoly @ 2019-07-16 14:26 UTC (permalink / raw) To: jerinj, dev, John McNamara, Marko Kovacevic, Igor Russkikh, Pavel Belous, Ajit Khaparde, Somnath Kotur, Wenzhuo Lu, John Daley, Hyong Youb Kim, Qi Zhang, Xiao Wang, Beilei Xing, Jingjing Wu, Qiming Yang, Konstantin Ananyev, Matan Azrad, Shahaf Shuler, Yongseok Koh, Viacheslav Ovsiienko, Alejandro Lucero, Nithin Dabilpuram, Kiran Kumar K, Rasesh Mody, Shahed Shaikh, Bruce Richardson Cc: thomas, david.marchand On 16-Jul-19 2:46 PM, jerinj@marvell.com wrote: > From: David Marchand <david.marchand@redhat.com> > > The incriminated commit broke the use of RTE_PCI_DRV_IOVA_AS_VA which > was intended to mean "driver only supports VA" but had been understood > as "driver supports both PA and VA" by most net drivers and used to let > dpdk processes to run as non root (which do not have access to physical > addresses on recent kernels). > > The check on physical addresses actually closed the gap for those > drivers. We don't need to mark them with RTE_PCI_DRV_IOVA_AS_VA and this > flag can retain its intended meaning. > Document explicitly its meaning. > > We can check that a driver requirement wrt to IOVA mode is fulfilled > before trying to probe a device. > > Finally, document the heuristic used to select the IOVA mode and hope > that we won't break it again. 
> > Fixes: 703458e19c16 ("bus/pci: consider only usable devices for IOVA mode") > > Signed-off-by: David Marchand <david.marchand@redhat.com> > Reviewed-by: Jerin Jacob <jerinj@marvell.com> > Tested-by: Jerin Jacob <jerinj@marvell.com> > --- <snip> > @@ -629,12 +643,16 @@ rte_pci_get_iommu_class(void) > devices_want_va = true; > } > } > - if (devices_want_pa) { > - iova_mode = RTE_IOVA_PA; > - if (devices_want_va) > - RTE_LOG(WARNING, EAL, "Some devices want 'VA' but forcing 'PA' because other devices want it\n"); > - } else if (devices_want_va) { > + if (devices_want_va && !devices_want_pa) { > iova_mode = RTE_IOVA_VA; > + } else if (devices_want_pa && !devices_want_va) { > + iova_mode = RTE_IOVA_PA; > + } else { > + iova_mode = RTE_IOVA_DC; > + if (devices_want_va) { > + RTE_LOG(WARNING, EAL, "Some devices want 'VA' but forcing 'DC' because other devices want 'PA'.\n"); > + RTE_LOG(WARNING, EAL, "Depending on the final decision by the EAL, part of your devices won't initialise.\n"); Tiny nitpick - I generally don't like personal appeals in log messages, so perhaps drop the "your"? I.e. "Depending on final decision by EAL, not all devices may be able to initialize."? Same applies to the other instance of this error message. Otherwise, LGTM Acked-by: Anatoly Burakov <anatoly.burakov@intel.com> -- Thanks, Anatoly
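The commit message notes that a driver's IOVA requirement can be checked before a device is probed. A minimal sketch of such a check follows, under stated assumptions: `struct drv_model` and `can_probe()` are hypothetical names rather than DPDK's probe path, the EAL mode is assumed to be already resolved to PA or VA, and only the flag value 0x0040 is taken from `rte_bus_pci.h` in this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Only the 0x0040 value is taken from rte_bus_pci.h in this series. */
#define DRV_NEED_IOVA_AS_VA 0x0040u

enum iova_mode { IOVA_DC = 0, IOVA_PA = 1, IOVA_VA = 2 };

/* Hypothetical model of a PCI driver descriptor. */
struct drv_model {
	const char *name;
	unsigned int drv_flags;
};

/*
 * Hypothetical pre-probe check: a driver carrying the "needs VA" flag is
 * refused unless the EAL ended up in VA mode; any other driver is
 * expected to cope with whichever mode the EAL selected.
 */
static bool can_probe(const struct drv_model *drv, enum iova_mode eal_mode)
{
	assert(drv != NULL);
	if ((drv->drv_flags & DRV_NEED_IOVA_AS_VA) != 0 && eal_mode != IOVA_VA) {
		fprintf(stderr, "%s: expecting 'VA' IOVA mode, not initializing\n",
			drv->name);
		return false;
	}
	return true;
}
```

Refusing at probe time, rather than failing later inside the PMD, keeps the error in one place and lets drivers without the flag stay mode-agnostic.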
* Re: [dpdk-dev] [PATCH v2 2/4] eal: fix IOVA mode selection as VA for pci drivers 2019-07-16 14:26 ` Burakov, Anatoly @ 2019-07-16 15:07 ` Jerin Jacob Kollanukkaran 0 siblings, 0 replies; 57+ messages in thread From: Jerin Jacob Kollanukkaran @ 2019-07-16 15:07 UTC (permalink / raw) To: Burakov, Anatoly, dev, John McNamara, Marko Kovacevic, Igor Russkikh, Pavel Belous, Ajit Khaparde, Somnath Kotur, Wenzhuo Lu, John Daley, Hyong Youb Kim, Qi Zhang, Xiao Wang, Beilei Xing, Jingjing Wu, Qiming Yang, Konstantin Ananyev, Matan Azrad, Shahaf Shuler, Yongseok Koh, Viacheslav Ovsiienko, Alejandro Lucero, Nithin Kumar Dabilpuram, Kiran Kumar Kokkilagadda, Rasesh Mody, Shahed Shaikh, Bruce Richardson Cc: thomas, david.marchand > > On 16-Jul-19 2:46 PM, jerinj@marvell.com wrote: > > From: David Marchand <david.marchand@redhat.com> > > > > The incriminated commit broke the use of RTE_PCI_DRV_IOVA_AS_VA > which > > was intended to mean "driver only supports VA" but had been understood > > as "driver supports both PA and VA" by most net drivers and used to > > let dpdk processes to run as non root (which do not have access to > > physical addresses on recent kernels). > > > > The check on physical addresses actually closed the gap for those > > drivers. We don't need to mark them with RTE_PCI_DRV_IOVA_AS_VA and > > this flag can retain its intended meaning. > > Document explicitly its meaning. > > > > We can check that a driver requirement wrt to IOVA mode is fulfilled > > before trying to probe a device. > > > > Finally, document the heuristic used to select the IOVA mode and hope > > that we won't break it again. 
> > > > Fixes: 703458e19c16 ("bus/pci: consider only usable devices for IOVA > > mode") > > > > Signed-off-by: David Marchand <david.marchand@redhat.com> > > Reviewed-by: Jerin Jacob <jerinj@marvell.com> > > Tested-by: Jerin Jacob <jerinj@marvell.com> > > --- > > <snip> > > > @@ -629,12 +643,16 @@ rte_pci_get_iommu_class(void) > > devices_want_va = true; > > } > > } > > - if (devices_want_pa) { > > - iova_mode = RTE_IOVA_PA; > > - if (devices_want_va) > > - RTE_LOG(WARNING, EAL, "Some devices want 'VA' > but forcing 'PA' because other devices want it\n"); > > - } else if (devices_want_va) { > > + if (devices_want_va && !devices_want_pa) { > > iova_mode = RTE_IOVA_VA; > > + } else if (devices_want_pa && !devices_want_va) { > > + iova_mode = RTE_IOVA_PA; > > + } else { > > + iova_mode = RTE_IOVA_DC; > > + if (devices_want_va) { > > + RTE_LOG(WARNING, EAL, "Some devices want 'VA' > but forcing 'DC' because other devices want 'PA'.\n"); > > + RTE_LOG(WARNING, EAL, "Depending on the final > decision by the EAL, > > +part of your devices won't initialise.\n"); > > Tiny nitpick - I generally don't like personal appeals in log messages, so > perhaps drop the "your"? I.e. "Depending on final decision by EAL, not all > devices may be able to initialize."? Same applies to the other instance of this > error message. I will fix it in v3. > > Otherwise, LGTM > > Acked-by: Anatoly Burakov <anatoly.burakov@intel.com> > > -- > Thanks, > Anatoly
* [dpdk-dev] [PATCH v2 3/4] eal: change RTE_PCI_DRV_IOVA_AS_VA flag name 2019-07-16 13:46 ` [dpdk-dev] [PATCH v2 0/4] Fixes on IOVA mode selection jerinj 2019-07-16 13:46 ` [dpdk-dev] [PATCH v2 1/4] Revert "bus/pci: add Mellanox kernel driver type" jerinj 2019-07-16 13:46 ` [dpdk-dev] [PATCH v2 2/4] eal: fix IOVA mode selection as VA for pci drivers jerinj @ 2019-07-16 13:46 ` jerinj 2019-07-16 13:46 ` [dpdk-dev] [PATCH v2 4/4] eal: select IOVA mode as VA for default case jerinj 2019-07-18 6:45 ` [dpdk-dev] [PATCH v3 0/4] Fixes on IOVA mode selection jerinj 4 siblings, 0 replies; 57+ messages in thread From: jerinj @ 2019-07-16 13:46 UTC (permalink / raw) To: dev, Pavan Nikhilesh, Jerin Jacob, Nithin Dabilpuram, Vamsi Attunuru, Kiran Kumar K, Satha Rao Cc: thomas, david.marchand, anatoly.burakov From: Jerin Jacob <jerinj@marvell.com> In order to align the name with other PCI driver flags such as RTE_PCI_DRV_NEED_MAPPING and to reflect its purpose, change the RTE_PCI_DRV_IOVA_AS_VA flag name to RTE_PCI_DRV_NEED_IOVA_AS_VA.
Signed-off-by: Jerin Jacob <jerinj@marvell.com> --- drivers/bus/pci/linux/pci.c | 4 ++-- drivers/bus/pci/rte_bus_pci.h | 4 ++-- drivers/event/octeontx/timvf_probe.c | 2 +- drivers/event/octeontx2/otx2_evdev.c | 2 +- drivers/mempool/octeontx/octeontx_fpavf.c | 2 +- drivers/mempool/octeontx2/otx2_mempool.c | 2 +- drivers/net/octeontx2/otx2_ethdev.c | 2 +- drivers/raw/octeontx2_dma/otx2_dpi_rawdev.c | 2 +- 8 files changed, 10 insertions(+), 10 deletions(-) diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c index 1a2f99b32..1d8d20d93 100644 --- a/drivers/bus/pci/linux/pci.c +++ b/drivers/bus/pci/linux/pci.c @@ -580,7 +580,7 @@ pci_device_iova_mode(const struct rte_pci_driver *pdrv, } if (is_vfio_noiommu_enabled != 0) iova_mode = RTE_IOVA_PA; - else if ((pdrv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) != 0) + else if ((pdrv->drv_flags & RTE_PCI_DRV_NEED_IOVA_AS_VA) != 0) iova_mode = RTE_IOVA_VA; #endif break; @@ -592,7 +592,7 @@ pci_device_iova_mode(const struct rte_pci_driver *pdrv, break; default: - if ((pdrv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) != 0) + if ((pdrv->drv_flags & RTE_PCI_DRV_NEED_IOVA_AS_VA) != 0) iova_mode = RTE_IOVA_VA; break; } diff --git a/drivers/bus/pci/rte_bus_pci.h b/drivers/bus/pci/rte_bus_pci.h index 0f2177564..29bea6d70 100644 --- a/drivers/bus/pci/rte_bus_pci.h +++ b/drivers/bus/pci/rte_bus_pci.h @@ -187,8 +187,8 @@ struct rte_pci_bus { #define RTE_PCI_DRV_INTR_RMV 0x0010 /** Device driver needs to keep mapped resources if unsupported dev detected */ #define RTE_PCI_DRV_KEEP_MAPPED_RES 0x0020 -/** Device driver only supports IOVA as VA and cannot work with IOVA as PA */ -#define RTE_PCI_DRV_IOVA_AS_VA 0x0040 +/** Device driver needs IOVA as VA and cannot work with IOVA as PA */ +#define RTE_PCI_DRV_NEED_IOVA_AS_VA 0x0040 /** * Map the PCI device resources in user space virtual memory address diff --git a/drivers/event/octeontx/timvf_probe.c b/drivers/event/octeontx/timvf_probe.c index 08dbd2be9..af87625fd 100644 --- 
a/drivers/event/octeontx/timvf_probe.c +++ b/drivers/event/octeontx/timvf_probe.c @@ -140,7 +140,7 @@ static const struct rte_pci_id pci_timvf_map[] = { static struct rte_pci_driver pci_timvf = { .id_table = pci_timvf_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_NEED_IOVA_AS_VA, .probe = timvf_probe, .remove = NULL, }; diff --git a/drivers/event/octeontx2/otx2_evdev.c b/drivers/event/octeontx2/otx2_evdev.c index 56716c2ac..e6379e3b4 100644 --- a/drivers/event/octeontx2/otx2_evdev.c +++ b/drivers/event/octeontx2/otx2_evdev.c @@ -1630,7 +1630,7 @@ static const struct rte_pci_id pci_sso_map[] = { static struct rte_pci_driver pci_sso = { .id_table = pci_sso_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_NEED_IOVA_AS_VA, .probe = otx2_sso_probe, .remove = otx2_sso_remove, }; diff --git a/drivers/mempool/octeontx/octeontx_fpavf.c b/drivers/mempool/octeontx/octeontx_fpavf.c index 4cf387e8f..baabc0152 100644 --- a/drivers/mempool/octeontx/octeontx_fpavf.c +++ b/drivers/mempool/octeontx/octeontx_fpavf.c @@ -799,7 +799,7 @@ static const struct rte_pci_id pci_fpavf_map[] = { static struct rte_pci_driver pci_fpavf = { .id_table = pci_fpavf_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_NEED_IOVA_AS_VA, .probe = fpavf_probe, }; diff --git a/drivers/mempool/octeontx2/otx2_mempool.c b/drivers/mempool/octeontx2/otx2_mempool.c index 9a5f11cf4..3a4a9425f 100644 --- a/drivers/mempool/octeontx2/otx2_mempool.c +++ b/drivers/mempool/octeontx2/otx2_mempool.c @@ -443,7 +443,7 @@ static const struct rte_pci_id pci_npa_map[] = { static struct rte_pci_driver pci_npa = { .id_table = pci_npa_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_NEED_IOVA_AS_VA, .probe = npa_probe, .remove = 
npa_remove, }; diff --git a/drivers/net/octeontx2/otx2_ethdev.c b/drivers/net/octeontx2/otx2_ethdev.c index 5ec55511b..7b91f6b31 100644 --- a/drivers/net/octeontx2/otx2_ethdev.c +++ b/drivers/net/octeontx2/otx2_ethdev.c @@ -2001,7 +2001,7 @@ static const struct rte_pci_id pci_nix_map[] = { static struct rte_pci_driver pci_nix = { .id_table = pci_nix_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_IOVA_AS_VA | + .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_NEED_IOVA_AS_VA | RTE_PCI_DRV_INTR_LSC, .probe = nix_probe, .remove = nix_remove, diff --git a/drivers/raw/octeontx2_dma/otx2_dpi_rawdev.c b/drivers/raw/octeontx2_dma/otx2_dpi_rawdev.c index 6a1b43678..e398abb75 100644 --- a/drivers/raw/octeontx2_dma/otx2_dpi_rawdev.c +++ b/drivers/raw/octeontx2_dma/otx2_dpi_rawdev.c @@ -427,7 +427,7 @@ otx2_dpi_rawdev_remove(struct rte_pci_device *pci_dev) static struct rte_pci_driver rte_dpi_rawdev_pmd = { .id_table = pci_dma_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_NEED_IOVA_AS_VA, .probe = otx2_dpi_rawdev_probe, .remove = otx2_dpi_rawdev_remove, }; -- 2.22.0
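After the rename, `pci_device_iova_mode()` tests `RTE_PCI_DRV_NEED_IOVA_AS_VA` in two places. The model below covers only the branches visible in the `linux/pci.c` hunk of this patch. The `kdrv_model` enum, the `vfio_noiommu` parameter and `device_iova_pref()` are hypothetical, the `VFIO_PRESENT` build guard is ignored, and the kernel-driver case elided between the two hunks is not modeled, so `KDRV_OTHER` stands only for drivers that reach the `default:` label.

```c
#include <assert.h>
#include <stdbool.h>

#define DRV_NEED_IOVA_AS_VA 0x0040u /* value from rte_bus_pci.h */

enum iova_mode { IOVA_DC = 0, IOVA_PA = 1, IOVA_VA = 2 };

/* Hypothetical subset of kernel drivers a device can be bound to. */
enum kdrv_model { KDRV_VFIO, KDRV_OTHER };

/*
 * Hypothetical per-device preference, following the visible hunks:
 * VFIO in no-IOMMU mode can only do PA; otherwise a driver that needs
 * VA asks for VA, and everything else answers "don't care".
 */
static enum iova_mode device_iova_pref(enum kdrv_model kdrv,
				       bool vfio_noiommu,
				       unsigned int drv_flags)
{
	bool needs_va = (drv_flags & DRV_NEED_IOVA_AS_VA) != 0;

	assert(kdrv == KDRV_VFIO || kdrv == KDRV_OTHER);
	switch (kdrv) {
	case KDRV_VFIO:
		if (vfio_noiommu)
			return IOVA_PA;
		return needs_va ? IOVA_VA : IOVA_DC;
	default:
		return needs_va ? IOVA_VA : IOVA_DC;
	}
}
```

The per-device answers produced this way are what the bus-level vote then combines.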
* [dpdk-dev] [PATCH v2 4/4] eal: select IOVA mode as VA for default case 2019-07-16 13:46 ` [dpdk-dev] [PATCH v2 0/4] Fixes on IOVA mode selection jerinj ` (2 preceding siblings ...) 2019-07-16 13:46 ` [dpdk-dev] [PATCH v2 3/4] eal: change RTE_PCI_DRV_IOVA_AS_VA flag name jerinj @ 2019-07-16 13:46 ` jerinj 2019-07-16 14:33 ` Burakov, Anatoly 2019-07-18 6:45 ` [dpdk-dev] [PATCH v3 0/4] Fixes on IOVA mode selection jerinj 4 siblings, 1 reply; 57+ messages in thread From: jerinj @ 2019-07-16 13:46 UTC (permalink / raw) To: dev, Anatoly Burakov, John McNamara, Marko Kovacevic Cc: thomas, david.marchand, Jerin Jacob From: Jerin Jacob <jerinj@marvell.com> When bus layer selected the preferred mode as RTE_IOVA_DC then select the IOVA mode as RTE_IOVA_VA. The RTE_IOVA_VA selected as the default because, 1) All drivers work in RTE_IOVA_VA mode, irrespective of physical address availability. 2) By default, the mempool, first asks for IOVA-contiguous memory using RTE_MEMZONE_IOVA_CONTIG and this is slow in IOVA as PA mode and it may affect the application boot time. Signed-off-by: Jerin Jacob <jerinj@marvell.com> --- doc/guides/prog_guide/env_abstraction_layer.rst | 10 ++++++++-- lib/librte_eal/linux/eal/eal.c | 6 ++---- 2 files changed, 10 insertions(+), 6 deletions(-) diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst index 77307e3a6..1b0343eee 100644 --- a/doc/guides/prog_guide/env_abstraction_layer.rst +++ b/doc/guides/prog_guide/env_abstraction_layer.rst @@ -445,8 +445,14 @@ kernels. - if the preferred mode is RTE_IOVA_PA but there is no access to Physical Addresses, then EAL init will fail early, since later probing of the devices would fail anyway, -- if the preferred mode is RTE_IOVA_DC then based on the Physical Addresses - availability, the preferred mode is adjusted to RTE_IOVA_PA or RTE_IOVA_VA. +- if the preferred mode is RTE_IOVA_DC then select the IOVA mode as RTE_IOVA_VA. 
+ The RTE_IOVA_VA selected as the default because, + +#. All drivers work in RTE_IOVA_VA mode, irrespective of physical address availability. + +#. By default, the mempool, first asks for IOVA-contiguous memory using ``RTE_MEMZONE_IOVA_CONTIG``, + and this is slow in IOVA as PA mode and it may affect the application boot time. + In the case when the buses had disagreed on the IOVA Mode at the first step, part of the buses won't work because of this decision. diff --git a/lib/librte_eal/linux/eal/eal.c b/lib/librte_eal/linux/eal/eal.c index 2e5499f9b..34db78753 100644 --- a/lib/librte_eal/linux/eal/eal.c +++ b/lib/librte_eal/linux/eal/eal.c @@ -1061,10 +1061,8 @@ rte_eal_init(int argc, char **argv) enum rte_iova_mode iova_mode = rte_bus_get_iommu_class(); if (iova_mode == RTE_IOVA_DC) { - iova_mode = phys_addrs ? RTE_IOVA_PA : RTE_IOVA_VA; - RTE_LOG(DEBUG, EAL, - "Buses did not request a specific IOVA mode, using '%s' based on physical addresses availability.\n", - phys_addrs ? "PA" : "VA"); + iova_mode = RTE_IOVA_VA; + RTE_LOG(DEBUG, EAL, "Buses did not request a specific IOVA mode, select IOVA as VA mode.\n"); } #ifdef RTE_LIBRTE_KNI /* Workaround for KNI which requires physical address to work */ -- 2.22.0
* Re: [dpdk-dev] [PATCH v2 4/4] eal: select IOVA mode as VA for default case 2019-07-16 13:46 ` [dpdk-dev] [PATCH v2 4/4] eal: select IOVA mode as VA for default case jerinj @ 2019-07-16 14:33 ` Burakov, Anatoly 2019-07-17 8:33 ` [dpdk-dev] [EXT] " Jerin Jacob Kollanukkaran 0 siblings, 1 reply; 57+ messages in thread From: Burakov, Anatoly @ 2019-07-16 14:33 UTC (permalink / raw) To: jerinj, dev, John McNamara, Marko Kovacevic; +Cc: thomas, david.marchand On 16-Jul-19 2:46 PM, jerinj@marvell.com wrote: > From: Jerin Jacob <jerinj@marvell.com> > > When bus layer selected the preferred mode as RTE_IOVA_DC then > select the IOVA mode as RTE_IOVA_VA. > > The RTE_IOVA_VA selected as the default because, > > 1) All drivers work in RTE_IOVA_VA mode, irrespective of physical > address availability. > > 2) By default, the mempool, first asks for IOVA-contiguous memory > using RTE_MEMZONE_IOVA_CONTIG and this is slow in IOVA as PA mode > and it may affect the application boot time. > > Signed-off-by: Jerin Jacob <jerinj@marvell.com> > --- I should celebrate now :D > doc/guides/prog_guide/env_abstraction_layer.rst | 10 ++++++++-- > lib/librte_eal/linux/eal/eal.c | 6 ++---- > 2 files changed, 10 insertions(+), 6 deletions(-) > > diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst > index 77307e3a6..1b0343eee 100644 > --- a/doc/guides/prog_guide/env_abstraction_layer.rst > +++ b/doc/guides/prog_guide/env_abstraction_layer.rst > @@ -445,8 +445,14 @@ kernels. > - if the preferred mode is RTE_IOVA_PA but there is no access to Physical > Addresses, then EAL init will fail early, since later probing of the devices > would fail anyway, > -- if the preferred mode is RTE_IOVA_DC then based on the Physical Addresses > - availability, the preferred mode is adjusted to RTE_IOVA_PA or RTE_IOVA_VA. > +- if the preferred mode is RTE_IOVA_DC then select the IOVA mode as RTE_IOVA_VA. 
> + The RTE_IOVA_VA selected as the default because, > + > +#. All drivers work in RTE_IOVA_VA mode, irrespective of physical address availability. Is there anywhere we can document that any new driver must support both before being accepted? > + > +#. By default, the mempool, first asks for IOVA-contiguous memory using ``RTE_MEMZONE_IOVA_CONTIG``, > + and this is slow in IOVA as PA mode and it may affect the application boot time. I would also add a point about usability improvement for use-cases which require large amounts of IOVA-contiguous memory. Otherwise, Acked-by: Anatoly Burakov <anatoly.burakov@intel.com> -- Thanks, Anatoly
* Re: [dpdk-dev] [EXT] Re: [PATCH v2 4/4] eal: select IOVA mode as VA for default case 2019-07-16 14:33 ` Burakov, Anatoly @ 2019-07-17 8:33 ` Jerin Jacob Kollanukkaran 2019-07-17 12:38 ` Burakov, Anatoly 0 siblings, 1 reply; 57+ messages in thread From: Jerin Jacob Kollanukkaran @ 2019-07-17 8:33 UTC (permalink / raw) To: Burakov, Anatoly, dev, John McNamara, Marko Kovacevic Cc: thomas, david.marchand > -----Original Message----- > From: Burakov, Anatoly <anatoly.burakov@intel.com> > Sent: Tuesday, July 16, 2019 8:03 PM > To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; dev@dpdk.org; John > McNamara <john.mcnamara@intel.com>; Marko Kovacevic > <marko.kovacevic@intel.com> > Cc: thomas@monjalon.net; david.marchand@redhat.com > Subject: [EXT] Re: [dpdk-dev] [PATCH v2 4/4] eal: select IOVA mode as VA for > default case > > On 16-Jul-19 2:46 PM, jerinj@marvell.com wrote: > > From: Jerin Jacob <jerinj@marvell.com> > > > > When bus layer selected the preferred mode as RTE_IOVA_DC then select > > the IOVA mode as RTE_IOVA_VA. > > > > The RTE_IOVA_VA selected as the default because, > > > > 1) All drivers work in RTE_IOVA_VA mode, irrespective of physical > > address availability. > > > > 2) By default, the mempool, first asks for IOVA-contiguous memory > > using RTE_MEMZONE_IOVA_CONTIG and this is slow in IOVA as PA mode > and > > it may affect the application boot time. > > > > Signed-off-by: Jerin Jacob <jerinj@marvell.com> > > --- > > I should celebrate now :D > > > doc/guides/prog_guide/env_abstraction_layer.rst | 10 ++++++++-- > > lib/librte_eal/linux/eal/eal.c | 6 ++---- > > 2 files changed, 10 insertions(+), 6 deletions(-) > > > > diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst > > b/doc/guides/prog_guide/env_abstraction_layer.rst > > index 77307e3a6..1b0343eee 100644 > > --- a/doc/guides/prog_guide/env_abstraction_layer.rst > > +++ b/doc/guides/prog_guide/env_abstraction_layer.rst > > @@ -445,8 +445,14 @@ kernels. 
> > - if the preferred mode is RTE_IOVA_PA but there is no access to Physical > > Addresses, then EAL init will fail early, since later probing of the devices > > would fail anyway, > > -- if the preferred mode is RTE_IOVA_DC then based on the Physical > > Addresses > > - availability, the preferred mode is adjusted to RTE_IOVA_PA or > RTE_IOVA_VA. > > +- if the preferred mode is RTE_IOVA_DC then select the IOVA mode as > RTE_IOVA_VA. > > + The RTE_IOVA_VA selected as the default because, > > + > > +#. All drivers work in RTE_IOVA_VA mode, irrespective of physical address > availability. > > Is there anywhere we can document that any new driver must support both > before being accepted? Not sure why new drivers need to support both PA and VA. Do you mean VA? And not sure where to document this as well if need. > > > + > > +#. By default, the mempool, first asks for IOVA-contiguous memory using > ``RTE_MEMZONE_IOVA_CONTIG``, > > + and this is slow in IOVA as PA mode and it may affect the application > boot time. > > I would also add a point about usability improvement for use-cases which > require large amounts of IOVA-contiguous memory. I will add in next version: How about the following, Let me know if any change required. #. It is easy to enable large amount of IOVA-contiguous memory use-cases with IOVA in VA mode. > > Otherwise, > > Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
* Re: [dpdk-dev] [EXT] Re: [PATCH v2 4/4] eal: select IOVA mode as VA for default case 2019-07-17 8:33 ` [dpdk-dev] [EXT] " Jerin Jacob Kollanukkaran @ 2019-07-17 12:38 ` Burakov, Anatoly 2019-07-17 14:04 ` Jerin Jacob Kollanukkaran 0 siblings, 1 reply; 57+ messages in thread From: Burakov, Anatoly @ 2019-07-17 12:38 UTC (permalink / raw) To: Jerin Jacob Kollanukkaran, dev, John McNamara, Marko Kovacevic Cc: thomas, david.marchand On 17-Jul-19 9:33 AM, Jerin Jacob Kollanukkaran wrote: >> -----Original Message----- >> From: Burakov, Anatoly <anatoly.burakov@intel.com> >> Sent: Tuesday, July 16, 2019 8:03 PM >> To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; dev@dpdk.org; John >> McNamara <john.mcnamara@intel.com>; Marko Kovacevic >> <marko.kovacevic@intel.com> >> Cc: thomas@monjalon.net; david.marchand@redhat.com >> Subject: [EXT] Re: [dpdk-dev] [PATCH v2 4/4] eal: select IOVA mode as VA for >> default case >> >> On 16-Jul-19 2:46 PM, jerinj@marvell.com wrote: >>> From: Jerin Jacob <jerinj@marvell.com> >>> >>> When bus layer selected the preferred mode as RTE_IOVA_DC then select >>> the IOVA mode as RTE_IOVA_VA. >>> >>> The RTE_IOVA_VA selected as the default because, >>> >>> 1) All drivers work in RTE_IOVA_VA mode, irrespective of physical >>> address availability. >>> >>> 2) By default, the mempool, first asks for IOVA-contiguous memory >>> using RTE_MEMZONE_IOVA_CONTIG and this is slow in IOVA as PA mode >> and >>> it may affect the application boot time. 
>>> >>> Signed-off-by: Jerin Jacob <jerinj@marvell.com> >>> --- >> >> I should celebrate now :D >> >>> doc/guides/prog_guide/env_abstraction_layer.rst | 10 ++++++++-- >>> lib/librte_eal/linux/eal/eal.c | 6 ++---- >>> 2 files changed, 10 insertions(+), 6 deletions(-) >>> >>> diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst >>> b/doc/guides/prog_guide/env_abstraction_layer.rst >>> index 77307e3a6..1b0343eee 100644 >>> --- a/doc/guides/prog_guide/env_abstraction_layer.rst >>> +++ b/doc/guides/prog_guide/env_abstraction_layer.rst >>> @@ -445,8 +445,14 @@ kernels. >>> - if the preferred mode is RTE_IOVA_PA but there is no access to Physical >>> Addresses, then EAL init will fail early, since later probing of the devices >>> would fail anyway, >>> -- if the preferred mode is RTE_IOVA_DC then based on the Physical >>> Addresses >>> - availability, the preferred mode is adjusted to RTE_IOVA_PA or >> RTE_IOVA_VA. >>> +- if the preferred mode is RTE_IOVA_DC then select the IOVA mode as >> RTE_IOVA_VA. >>> + The RTE_IOVA_VA selected as the default because, >>> + >>> +#. All drivers work in RTE_IOVA_VA mode, irrespective of physical address >> availability. >> >> Is there anywhere we can document that any new driver must support both >> before being accepted? > > Not sure why new drivers need to support both PA and VA. Do you mean VA? > And not sure where to document this as well if need. We have a flag that indicates that the driver needs IOVA as VA. Absence of said flag indicates that it supports both IOVA as VA and IOVA as PA. So, absent this flag, any new driver must support both PA and VA, must it not? > >> >>> + >>> +#. By default, the mempool, first asks for IOVA-contiguous memory using >> ``RTE_MEMZONE_IOVA_CONTIG``, >>> + and this is slow in IOVA as PA mode and it may affect the application >> boot time. >> >> I would also add a point about usability improvement for use-cases which >> require large amounts of IOVA-contiguous memory.
> > I will add in next version: > How about the following, Let me know if any change required. > > #. It is easy to enable large amount of IOVA-contiguous memory use-cases with IOVA in VA mode. Yes, that looks OK. > >> >> Otherwise, >> >> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com> -- Thanks, Anatoly
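The agreed wording rests on a simple property: with IOVA as VA, the IOVA of a buffer is its virtual address, so any virtually contiguous allocation is also IOVA-contiguous, whereas with IOVA as PA every backing page must happen to be physically adjacent. A toy check of that property, assuming a hypothetical per-page IOVA table and an illustrative 2 MB page size (neither is a DPDK structure):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SZ (2u * 1024 * 1024) /* illustrative hugepage size */

/*
 * pages[i] holds the IOVA backing the i-th page of a virtually contiguous
 * buffer (hypothetical table). The buffer is usable as one IOVA-contiguous
 * block only if each page starts exactly where the previous one ends.
 */
static bool iova_contiguous(const uint64_t *pages, size_t n)
{
	size_t i;

	assert(pages != NULL || n == 0);
	for (i = 1; i < n; i++)
		if (pages[i] != pages[i - 1] + PAGE_SZ)
			return false;
	return true;
}
```

In VA mode the table is just the buffer's virtual page addresses, so the check always passes; in PA mode it depends on how the kernel placed the hugepages, which is why large contiguous requests are harder to satisfy there.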
* Re: [dpdk-dev] [EXT] Re: [PATCH v2 4/4] eal: select IOVA mode as VA for default case 2019-07-17 12:38 ` Burakov, Anatoly @ 2019-07-17 14:04 ` Jerin Jacob Kollanukkaran 0 siblings, 0 replies; 57+ messages in thread From: Jerin Jacob Kollanukkaran @ 2019-07-17 14:04 UTC (permalink / raw) To: Burakov, Anatoly, dev, John McNamara, Marko Kovacevic Cc: thomas, david.marchand > >>> Addresses, then EAL init will fail early, since later probing of the > devices > >>> would fail anyway, > >>> -- if the preferred mode is RTE_IOVA_DC then based on the Physical > >>> Addresses > >>> - availability, the preferred mode is adjusted to RTE_IOVA_PA or > >> RTE_IOVA_VA. > >>> +- if the preferred mode is RTE_IOVA_DC then select the IOVA mode as > >> RTE_IOVA_VA. > >>> + The RTE_IOVA_VA selected as the default because, > >>> + > >>> +#. All drivers work in RTE_IOVA_VA mode, irrespective of physical > >>> +address > >> availability. > >> > >> Is there anywhere we can document that any new driver must support > >> both before being accepted? > > > > Not sure why new drivers need to support both PA and VA. Do you mean > VA? > > And not sure where to document this as well if need. > > We have a flag that indicates that the driver needs IOVA as VA. Absence of > said flag indicates that it supports both IOVA as VA and IOVA as PA. > So, absent this flag, any new driver must support both PA and VA, must it > not? OK. I will add the following as a "note" after the "IOVA Mode Detection" section. I don't see any other place to put this info in the doc. If any change is needed, let me know. .. note:: If the device driver needs IOVA as VA and cannot work with IOVA as PA, the driver must request it from the PCI bus layer using the ``RTE_PCI_DRV_NEED_IOVA_AS_VA`` requirement flag. In the absence of this flag, the driver must support both IOVA as PA and IOVA as VA modes.
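The proposed note reduces to a two-row table: with `RTE_PCI_DRV_NEED_IOVA_AS_VA` a driver supports VA only, and without it the driver is expected to support both PA and VA. A sketch deriving that from `drv_flags`, with hypothetical `supports_pa()`/`supports_va()` helpers; only the 0x0040 value appears in this thread, and the values used for the mapping and LSC flags are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Only 0x0040 appears in this thread; the other two values are assumed. */
#define DRV_NEED_MAPPING    0x0001u
#define DRV_INTR_LSC        0x0008u
#define DRV_NEED_IOVA_AS_VA 0x0040u

static_assert((DRV_NEED_IOVA_AS_VA & (DRV_NEED_MAPPING | DRV_INTR_LSC)) == 0,
	      "flag bits must not overlap");

/* Hypothetical helper: per patch 4/4, every driver must handle IOVA as VA. */
static bool supports_va(unsigned int drv_flags)
{
	(void)drv_flags;
	return true;
}

/* Hypothetical helper: IOVA as PA is supported unless the driver needs VA. */
static bool supports_pa(unsigned int drv_flags)
{
	return (drv_flags & DRV_NEED_IOVA_AS_VA) == 0;
}
```

For instance, the octeontx2 NIX driver's flags from patch 3/4 (`NEED_MAPPING | NEED_IOVA_AS_VA | INTR_LSC`) yield VA only, while the ixgbe flags (`NEED_MAPPING | INTR_LSC`) yield both.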
* [dpdk-dev] [PATCH v3 0/4] Fixes on IOVA mode selection 2019-07-16 13:46 ` [dpdk-dev] [PATCH v2 0/4] Fixes on IOVA mode selection jerinj ` (3 preceding siblings ...) 2019-07-16 13:46 ` [dpdk-dev] [PATCH v2 4/4] eal: select IOVA mode as VA for default case jerinj @ 2019-07-18 6:45 ` jerinj 2019-07-18 6:45 ` [dpdk-dev] [PATCH v3 1/4] Revert "bus/pci: add Mellanox kernel driver type" jerinj ` (4 more replies) 4 siblings, 5 replies; 57+ messages in thread From: jerinj @ 2019-07-18 6:45 UTC (permalink / raw) To: dev; +Cc: thomas, david.marchand, anatoly.burakov, Jerin Jacob From: Jerin Jacob <jerinj@marvell.com> Original v1 cover letter from David Marchand: Following the issues reported by Jerin and the discussion that emerged from it, here are fixes to restore and document the behavior of the EAL and the pci bus driver. I pondered all the arguments and tried to make as few changes as possible. I can't find a need for a flag to just announce support of physical addresses from the pmd point of view. So it ended up with something really close to what Jerin had suggested. But the problem is that this is still unfinished wrt the documentation. I will be offline for 10 days and we need this to move forward, so sending anyway. v3: - Patch 2/4 - Remove personal appeals in log messages (Anatoly) - Patch 4/4 - Added the following documentation (Anatoly) a) #. It is easy to enable large amount of IOVA-contiguous memory use-cases with IOVA in VA mode. to the reasons for VA as default b) As a note: If the device driver needs IOVA as VA and it cannot work with IOVA as PA then the driver must request the PCI bus layer using ``RTE_PCI_DRV_NEED_IOVA_AS_VA`` requirement flag. Absence of this flag, dictates, the driver must support both IOVA as PA and VA modes.
v2: - Changed RTE_PCI_DRV_IOVA_AS_VA flag name to RTE_PCI_DRV_NEED_IOVA_AS_VA (patch 3/4) - Changed IOVA mode to VA for the default case (patch 4/4) with documentation - Tested the patch series on octeontx2 platform David Marchand (2): Revert "bus/pci: add Mellanox kernel driver type" eal: fix IOVA mode selection as VA for pci drivers Jerin Jacob (2): eal: change RTE_PCI_DRV_IOVA_AS_VA flag name eal: select IOVA mode as VA for default case .../prog_guide/env_abstraction_layer.rst | 45 +++++++++++++++++++ drivers/bus/pci/linux/pci.c | 24 +++------- drivers/bus/pci/pci_common.c | 30 ++++++++++--- drivers/bus/pci/rte_bus_pci.h | 4 +- drivers/event/octeontx/timvf_probe.c | 2 +- drivers/event/octeontx2/otx2_evdev.c | 2 +- drivers/mempool/octeontx/octeontx_fpavf.c | 2 +- drivers/mempool/octeontx2/otx2_mempool.c | 2 +- drivers/net/atlantic/atl_ethdev.c | 3 +- drivers/net/bnxt/bnxt_ethdev.c | 3 +- drivers/net/e1000/em_ethdev.c | 3 +- drivers/net/e1000/igb_ethdev.c | 5 +-- drivers/net/enic/enic_ethdev.c | 3 +- drivers/net/fm10k/fm10k_ethdev.c | 3 +- drivers/net/i40e/i40e_ethdev.c | 3 +- drivers/net/i40e/i40e_ethdev_vf.c | 2 +- drivers/net/iavf/iavf_ethdev.c | 3 +- drivers/net/ice/ice_ethdev.c | 3 +- drivers/net/ixgbe/ixgbe_ethdev.c | 5 +-- drivers/net/mlx4/mlx4.c | 3 +- drivers/net/mlx5/mlx5.c | 2 +- drivers/net/nfp/nfp_net.c | 6 +-- drivers/net/octeontx2/otx2_ethdev.c | 7 +-- drivers/net/qede/qede_ethdev.c | 6 +-- drivers/raw/ioat/ioat_rawdev.c | 3 +- drivers/raw/octeontx2_dma/otx2_dpi_rawdev.c | 2 +- lib/librte_eal/common/eal_common_bus.c | 30 +++++++++++-- lib/librte_eal/common/include/rte_dev.h | 1 - lib/librte_eal/linux/eal/eal.c | 6 +-- 29 files changed, 132 insertions(+), 81 deletions(-) -- 2.22.0
* [dpdk-dev] [PATCH v3 1/4] Revert "bus/pci: add Mellanox kernel driver type" 2019-07-18 6:45 ` [dpdk-dev] [PATCH v3 0/4] Fixes on IOVA mode selection jerinj @ 2019-07-18 6:45 ` jerinj 2019-07-18 6:45 ` [dpdk-dev] [PATCH v3 2/4] eal: fix IOVA mode selection as VA for pci drivers jerinj ` (3 subsequent siblings) 4 siblings, 0 replies; 57+ messages in thread From: jerinj @ 2019-07-18 6:45 UTC (permalink / raw) To: dev; +Cc: thomas, david.marchand, anatoly.burakov, Jerin Jacob From: David Marchand <david.marchand@redhat.com> This reverts commit 0cb86518db57d35e0abc14d6703fad561a0310e2. The pci bus now reports DC when faced with a device bound to an unknown driver and, in such a case, the IOVA mode is selected against physical address availability. As a consequence, there is no reason for this special case for Mellanox drivers. Fixes: 703458e19c16 ("bus/pci: consider only usable devices for IOVA mode") Signed-off-by: David Marchand <david.marchand@redhat.com> Reviewed-by: Jerin Jacob <jerinj@marvell.com> --- drivers/bus/pci/linux/pci.c | 8 -------- lib/librte_eal/common/include/rte_dev.h | 1 - 2 files changed, 9 deletions(-) diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c index 33c8ea7e9..b12f10af5 100644 --- a/drivers/bus/pci/linux/pci.c +++ b/drivers/bus/pci/linux/pci.c @@ -329,9 +329,6 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr) dev->kdrv = RTE_KDRV_IGB_UIO; else if (!strcmp(driver, "uio_pci_generic")) dev->kdrv = RTE_KDRV_UIO_GENERIC; - else if (!strcmp(driver, "mlx4_core") || - !strcmp(driver, "mlx5_core")) - dev->kdrv = RTE_KDRV_NIC_MLX; else dev->kdrv = RTE_KDRV_UNKNOWN; } else @@ -591,11 +588,6 @@ pci_device_iova_mode(const struct rte_pci_driver *pdrv, break; } - case RTE_KDRV_NIC_MLX: - if ((pdrv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) == 0) - iova_mode = RTE_IOVA_PA; - break; - case RTE_KDRV_IGB_UIO: case RTE_KDRV_UIO_GENERIC: iova_mode = RTE_IOVA_PA; diff --git a/lib/librte_eal/common/include/rte_dev.h 
b/lib/librte_eal/common/include/rte_dev.h index 94829f6e4..c25e09e3d 100644 --- a/lib/librte_eal/common/include/rte_dev.h +++ b/lib/librte_eal/common/include/rte_dev.h @@ -63,7 +63,6 @@ enum rte_kernel_driver { RTE_KDRV_VFIO, RTE_KDRV_UIO_GENERIC, RTE_KDRV_NIC_UIO, - RTE_KDRV_NIC_MLX, RTE_KDRV_NONE, }; -- 2.22.0
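This sketch shows what the revert means for a Mellanox device. Once the special case is gone, a device bound to `mlx4_core` or `mlx5_core` falls into the "unknown kernel driver" bucket. With the default-case change in patch 2/4, that bucket reports "don't care" unless the PMD sets the VA-only flag.

This is a plain C model with hypothetical names, not DPDK code. It assumes the `vfio-pci` string maps to the VFIO type, as in the part of `pci_scan_one()` not shown in this hunk. VFIO-bound devices are deliberately left out because their answer also depends on the IOMMU state. The later IOMMU capability check in `pci_device_iova_mode()` is also omitted.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for enum rte_iova_mode and enum rte_kernel_driver. */
enum iova_mode { IOVA_DC = 0, IOVA_PA = 1 << 0, IOVA_VA = 1 << 1 };
enum kdrv { KDRV_UNKNOWN, KDRV_IGB_UIO, KDRV_VFIO, KDRV_UIO_GENERIC };

/* Kernel driver name -> type, without the reverted Mellanox special case. */
static enum kdrv kdrv_from_name(const char *driver)
{
	if (strcmp(driver, "vfio-pci") == 0)
		return KDRV_VFIO;
	if (strcmp(driver, "igb_uio") == 0)
		return KDRV_IGB_UIO;
	if (strcmp(driver, "uio_pci_generic") == 0)
		return KDRV_UIO_GENERIC;
	/* mlx4_core and mlx5_core now land here. */
	return KDRV_UNKNOWN;
}

/* Per-device IOVA answer for UIO-bound and unknown-driver devices only. */
static enum iova_mode non_vfio_iova_mode(enum kdrv kdrv, bool drv_needs_va)
{
	if (kdrv == KDRV_IGB_UIO || kdrv == KDRV_UIO_GENERIC)
		return IOVA_PA; /* UIO gives the device no IOMMU translation */
	return drv_needs_va ? IOVA_VA : IOVA_DC;
}
```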
* [dpdk-dev] [PATCH v3 2/4] eal: fix IOVA mode selection as VA for pci drivers 2019-07-18 6:45 ` [dpdk-dev] [PATCH v3 0/4] Fixes on IOVA mode selection jerinj 2019-07-18 6:45 ` [dpdk-dev] [PATCH v3 1/4] Revert "bus/pci: add Mellanox kernel driver type" jerinj @ 2019-07-18 6:45 ` jerinj 2019-07-18 6:45 ` [dpdk-dev] [PATCH v3 3/4] eal: change RTE_PCI_DRV_IOVA_AS_VA flag name jerinj ` (2 subsequent siblings) 4 siblings, 0 replies; 57+ messages in thread From: jerinj @ 2019-07-18 6:45 UTC (permalink / raw) To: dev, Anatoly Burakov, John McNamara, Marko Kovacevic, Igor Russkikh, Pavel Belous, Ajit Khaparde, Somnath Kotur, Wenzhuo Lu, John Daley, Hyong Youb Kim, Qi Zhang, Xiao Wang, Beilei Xing, Jingjing Wu, Qiming Yang, Konstantin Ananyev, Matan Azrad, Shahaf Shuler, Yongseok Koh, Viacheslav Ovsiienko, Alejandro Lucero, Jerin Jacob, Nithin Dabilpuram, Kiran Kumar K, Rasesh Mody, Shahed Shaikh, Bruce Richardson Cc: thomas, david.marchand From: David Marchand <david.marchand@redhat.com> The commit referenced in the Fixes tag broke the use of RTE_PCI_DRV_IOVA_AS_VA, which was intended to mean "driver only supports VA" but had been understood as "driver supports both PA and VA" by most net drivers, which used it to let DPDK processes run as non-root (non-root processes do not have access to physical addresses on recent kernels). The check on physical addresses actually closed the gap for those drivers. We don't need to mark them with RTE_PCI_DRV_IOVA_AS_VA, and this flag can retain its intended meaning. Document its meaning explicitly. We can check that a driver's requirement with respect to IOVA mode is fulfilled before trying to probe a device. Finally, document the heuristic used to select the IOVA mode and hope that we won't break it again.
Fixes: 703458e19c16 ("bus/pci: consider only usable devices for IOVA mode") Signed-off-by: David Marchand <david.marchand@redhat.com> Reviewed-by: Jerin Jacob <jerinj@marvell.com> Tested-by: Jerin Jacob <jerinj@marvell.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com> --- .../prog_guide/env_abstraction_layer.rst | 31 +++++++++++++++++++ drivers/bus/pci/linux/pci.c | 16 ++++------ drivers/bus/pci/pci_common.c | 30 ++++++++++++++---- drivers/bus/pci/rte_bus_pci.h | 4 +-- drivers/net/atlantic/atl_ethdev.c | 3 +- drivers/net/bnxt/bnxt_ethdev.c | 3 +- drivers/net/e1000/em_ethdev.c | 3 +- drivers/net/e1000/igb_ethdev.c | 5 ++- drivers/net/enic/enic_ethdev.c | 3 +- drivers/net/fm10k/fm10k_ethdev.c | 3 +- drivers/net/i40e/i40e_ethdev.c | 3 +- drivers/net/i40e/i40e_ethdev_vf.c | 2 +- drivers/net/iavf/iavf_ethdev.c | 3 +- drivers/net/ice/ice_ethdev.c | 3 +- drivers/net/ixgbe/ixgbe_ethdev.c | 5 ++- drivers/net/mlx4/mlx4.c | 3 +- drivers/net/mlx5/mlx5.c | 2 +- drivers/net/nfp/nfp_net.c | 6 ++-- drivers/net/octeontx2/otx2_ethdev.c | 5 --- drivers/net/qede/qede_ethdev.c | 6 ++-- drivers/raw/ioat/ioat_rawdev.c | 3 +- lib/librte_eal/common/eal_common_bus.c | 30 ++++++++++++++++-- 22 files changed, 110 insertions(+), 62 deletions(-) diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst index f15bcd976..77307e3a6 100644 --- a/doc/guides/prog_guide/env_abstraction_layer.rst +++ b/doc/guides/prog_guide/env_abstraction_layer.rst @@ -419,6 +419,37 @@ Misc Functions Locks and atomic operations are per-architecture (i686 and x86_64). +IOVA Mode Detection +~~~~~~~~~~~~~~~~~~~ + +IOVA Mode is selected by considering what the current usable Devices on the +system requires and/or supports. + +Below is the 2-step heuristic for this choice. + +For the first step, EAL asks each bus its requirement in terms of IOVA mode +and decides on a preferred IOVA mode. 
+ +- if all buses report RTE_IOVA_PA, then the preferred IOVA mode is RTE_IOVA_PA, +- if all buses report RTE_IOVA_VA, then the preferred IOVA mode is RTE_IOVA_VA, +- if all buses report RTE_IOVA_DC, no bus expressed a preferrence, then the + preferred mode is RTE_IOVA_DC, +- if the buses disagree (at least one wants RTE_IOVA_PA and at least one wants + RTE_IOVA_VA), then the preferred IOVA mode is RTE_IOVA_DC (see below with the + check on Physical Addresses availability), + +The second step is checking if the preferred mode complies with the Physical +Addresses availability since those are only available to root user in recent +kernels. + +- if the preferred mode is RTE_IOVA_PA but there is no access to Physical + Addresses, then EAL init will fail early, since later probing of the devices + would fail anyway, +- if the preferred mode is RTE_IOVA_DC then based on the Physical Addresses + availability, the preferred mode is adjusted to RTE_IOVA_PA or RTE_IOVA_VA. + In the case when the buses had disagreed on the IOVA Mode at the first step, + part of the buses won't work because of this decision. + IOVA Mode Configuration ~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c index b12f10af5..1a2f99b32 100644 --- a/drivers/bus/pci/linux/pci.c +++ b/drivers/bus/pci/linux/pci.c @@ -578,12 +578,10 @@ pci_device_iova_mode(const struct rte_pci_driver *pdrv, else is_vfio_noiommu_enabled = 0; } - if ((pdrv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) == 0) { + if (is_vfio_noiommu_enabled != 0) iova_mode = RTE_IOVA_PA; - } else if (is_vfio_noiommu_enabled != 0) { - RTE_LOG(DEBUG, EAL, "Forcing to 'PA', vfio-noiommu mode configured\n"); - iova_mode = RTE_IOVA_PA; - } + else if ((pdrv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) != 0) + iova_mode = RTE_IOVA_VA; #endif break; } @@ -594,8 +592,8 @@ pci_device_iova_mode(const struct rte_pci_driver *pdrv, break; default: - RTE_LOG(DEBUG, EAL, "Unsupported kernel driver? 
Defaulting to IOVA as 'PA'\n"); - iova_mode = RTE_IOVA_PA; + if ((pdrv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) != 0) + iova_mode = RTE_IOVA_VA; break; } @@ -607,10 +605,8 @@ pci_device_iova_mode(const struct rte_pci_driver *pdrv, if (iommu_no_va == -1) iommu_no_va = pci_one_device_iommu_support_va(pdev) ? 0 : 1; - if (iommu_no_va != 0) { - RTE_LOG(DEBUG, EAL, "Forcing to 'PA', IOMMU does not support IOVA as 'VA'\n"); + if (iommu_no_va != 0) iova_mode = RTE_IOVA_PA; - } } return iova_mode; } diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c index d2af472ef..9794552fd 100644 --- a/drivers/bus/pci/pci_common.c +++ b/drivers/bus/pci/pci_common.c @@ -169,8 +169,22 @@ rte_pci_probe_one_driver(struct rte_pci_driver *dr, * This needs to be before rte_pci_map_device(), as it enables to use * driver flags for adjusting configuration. */ - if (!already_probed) + if (!already_probed) { + enum rte_iova_mode dev_iova_mode; + enum rte_iova_mode iova_mode; + + dev_iova_mode = pci_device_iova_mode(dr, dev); + iova_mode = rte_eal_iova_mode(); + if (dev_iova_mode != RTE_IOVA_DC && + dev_iova_mode != iova_mode) { + RTE_LOG(ERR, EAL, " Expecting '%s' IOVA mode but current mode is '%s', not initializing\n", + dev_iova_mode == RTE_IOVA_PA ? "PA" : "VA", + iova_mode == RTE_IOVA_PA ? 
"PA" : "VA"); + return -EINVAL; + } + dev->driver = dr; + } if (!already_probed && (dr->drv_flags & RTE_PCI_DRV_NEED_MAPPING)) { /* map resources for devices that use igb_uio */ @@ -629,12 +643,16 @@ rte_pci_get_iommu_class(void) devices_want_va = true; } } - if (devices_want_pa) { - iova_mode = RTE_IOVA_PA; - if (devices_want_va) - RTE_LOG(WARNING, EAL, "Some devices want 'VA' but forcing 'PA' because other devices want it\n"); - } else if (devices_want_va) { + if (devices_want_va && !devices_want_pa) { iova_mode = RTE_IOVA_VA; + } else if (devices_want_pa && !devices_want_va) { + iova_mode = RTE_IOVA_PA; + } else { + iova_mode = RTE_IOVA_DC; + if (devices_want_va) { + RTE_LOG(WARNING, EAL, "Some devices want 'VA' but forcing 'DC' because other devices want 'PA'.\n"); + RTE_LOG(WARNING, EAL, "Depending on the final decision by the EAL, not all devices may be able to initialize.\n"); + } } return iova_mode; } diff --git a/drivers/bus/pci/rte_bus_pci.h b/drivers/bus/pci/rte_bus_pci.h index 06e004cd3..0f2177564 100644 --- a/drivers/bus/pci/rte_bus_pci.h +++ b/drivers/bus/pci/rte_bus_pci.h @@ -187,8 +187,8 @@ struct rte_pci_bus { #define RTE_PCI_DRV_INTR_RMV 0x0010 /** Device driver needs to keep mapped resources if unsupported dev detected */ #define RTE_PCI_DRV_KEEP_MAPPED_RES 0x0020 -/** Device driver supports IOVA as VA */ -#define RTE_PCI_DRV_IOVA_AS_VA 0X0040 +/** Device driver only supports IOVA as VA and cannot work with IOVA as PA */ +#define RTE_PCI_DRV_IOVA_AS_VA 0x0040 /** * Map the PCI device resources in user space virtual memory address diff --git a/drivers/net/atlantic/atl_ethdev.c b/drivers/net/atlantic/atl_ethdev.c index fdc0a7f2d..fa89ae755 100644 --- a/drivers/net/atlantic/atl_ethdev.c +++ b/drivers/net/atlantic/atl_ethdev.c @@ -157,8 +157,7 @@ static const struct rte_pci_id pci_id_atl_map[] = { static struct rte_pci_driver rte_atl_pmd = { .id_table = pci_id_atl_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC | - 
RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC, .probe = eth_atl_pci_probe, .remove = eth_atl_pci_remove, }; diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c index 8fc510351..9306d5655 100644 --- a/drivers/net/bnxt/bnxt_ethdev.c +++ b/drivers/net/bnxt/bnxt_ethdev.c @@ -4028,8 +4028,7 @@ static int bnxt_pci_remove(struct rte_pci_device *pci_dev) static struct rte_pci_driver bnxt_rte_pmd = { .id_table = bnxt_pci_id_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | - RTE_PCI_DRV_INTR_LSC | RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC, .probe = bnxt_pci_probe, .remove = bnxt_pci_remove, }; diff --git a/drivers/net/e1000/em_ethdev.c b/drivers/net/e1000/em_ethdev.c index dc886613a..0c859e52b 100644 --- a/drivers/net/e1000/em_ethdev.c +++ b/drivers/net/e1000/em_ethdev.c @@ -352,8 +352,7 @@ static int eth_em_pci_remove(struct rte_pci_device *pci_dev) static struct rte_pci_driver rte_em_pmd = { .id_table = pci_id_em_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC | - RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC, .probe = eth_em_pci_probe, .remove = eth_em_pci_remove, }; diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c index 3ee28cfbc..e784eeb73 100644 --- a/drivers/net/e1000/igb_ethdev.c +++ b/drivers/net/e1000/igb_ethdev.c @@ -1116,8 +1116,7 @@ static int eth_igb_pci_remove(struct rte_pci_device *pci_dev) static struct rte_pci_driver rte_igb_pmd = { .id_table = pci_id_igb_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC | - RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC, .probe = eth_igb_pci_probe, .remove = eth_igb_pci_remove, }; @@ -1140,7 +1139,7 @@ static int eth_igbvf_pci_remove(struct rte_pci_device *pci_dev) */ static struct rte_pci_driver rte_igbvf_pmd = { .id_table = pci_id_igbvf_map, - .drv_flags = 
RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING, .probe = eth_igbvf_pci_probe, .remove = eth_igbvf_pci_remove, }; diff --git a/drivers/net/enic/enic_ethdev.c b/drivers/net/enic/enic_ethdev.c index 5cfbd31a2..e9c6f83ce 100644 --- a/drivers/net/enic/enic_ethdev.c +++ b/drivers/net/enic/enic_ethdev.c @@ -1247,8 +1247,7 @@ static int eth_enic_pci_remove(struct rte_pci_device *pci_dev) static struct rte_pci_driver rte_enic_pmd = { .id_table = pci_id_enic_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC | - RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC, .probe = eth_enic_pci_probe, .remove = eth_enic_pci_remove, }; diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c index a1e3836cb..2d3c47763 100644 --- a/drivers/net/fm10k/fm10k_ethdev.c +++ b/drivers/net/fm10k/fm10k_ethdev.c @@ -3268,8 +3268,7 @@ static const struct rte_pci_id pci_id_fm10k_map[] = { static struct rte_pci_driver rte_pmd_fm10k = { .id_table = pci_id_fm10k_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC | - RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC, .probe = eth_fm10k_pci_probe, .remove = eth_fm10k_pci_remove, }; diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c index 2b9fc4572..dd46d4d9d 100644 --- a/drivers/net/i40e/i40e_ethdev.c +++ b/drivers/net/i40e/i40e_ethdev.c @@ -696,8 +696,7 @@ static int eth_i40e_pci_remove(struct rte_pci_device *pci_dev) static struct rte_pci_driver rte_i40e_pmd = { .id_table = pci_id_i40e_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC | - RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC, .probe = eth_i40e_pci_probe, .remove = eth_i40e_pci_remove, }; diff --git a/drivers/net/i40e/i40e_ethdev_vf.c b/drivers/net/i40e/i40e_ethdev_vf.c index 5be32b069..3ff2f6097 100644 --- 
a/drivers/net/i40e/i40e_ethdev_vf.c +++ b/drivers/net/i40e/i40e_ethdev_vf.c @@ -1557,7 +1557,7 @@ static int eth_i40evf_pci_remove(struct rte_pci_device *pci_dev) */ static struct rte_pci_driver rte_i40evf_pmd = { .id_table = pci_id_i40evf_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING, .probe = eth_i40evf_pci_probe, .remove = eth_i40evf_pci_remove, }; diff --git a/drivers/net/iavf/iavf_ethdev.c b/drivers/net/iavf/iavf_ethdev.c index 53dc05c78..a97cd76fd 100644 --- a/drivers/net/iavf/iavf_ethdev.c +++ b/drivers/net/iavf/iavf_ethdev.c @@ -1402,8 +1402,7 @@ static int eth_iavf_pci_remove(struct rte_pci_device *pci_dev) /* Adaptive virtual function driver struct */ static struct rte_pci_driver rte_iavf_pmd = { .id_table = pci_id_iavf_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC | - RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC, .probe = eth_iavf_pci_probe, .remove = eth_iavf_pci_remove, }; diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c index 9ce730cd4..f05b48c01 100644 --- a/drivers/net/ice/ice_ethdev.c +++ b/drivers/net/ice/ice_ethdev.c @@ -3737,8 +3737,7 @@ ice_pci_remove(struct rte_pci_device *pci_dev) static struct rte_pci_driver rte_ice_pmd = { .id_table = pci_id_ice_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC | - RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC, .probe = ice_pci_probe, .remove = ice_pci_remove, }; diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c index 22c5b2c5c..4a6e5c32e 100644 --- a/drivers/net/ixgbe/ixgbe_ethdev.c +++ b/drivers/net/ixgbe/ixgbe_ethdev.c @@ -1869,8 +1869,7 @@ static int eth_ixgbe_pci_remove(struct rte_pci_device *pci_dev) static struct rte_pci_driver rte_ixgbe_pmd = { .id_table = pci_id_ixgbe_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC | - 
RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC, .probe = eth_ixgbe_pci_probe, .remove = eth_ixgbe_pci_remove, }; @@ -1892,7 +1891,7 @@ static int eth_ixgbevf_pci_remove(struct rte_pci_device *pci_dev) */ static struct rte_pci_driver rte_ixgbevf_pmd = { .id_table = pci_id_ixgbevf_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING, .probe = eth_ixgbevf_pci_probe, .remove = eth_ixgbevf_pci_remove, }; diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c index 2e169b088..d6e5753bf 100644 --- a/drivers/net/mlx4/mlx4.c +++ b/drivers/net/mlx4/mlx4.c @@ -1142,8 +1142,7 @@ static struct rte_pci_driver mlx4_driver = { }, .id_table = mlx4_pci_id_map, .probe = mlx4_pci_probe, - .drv_flags = RTE_PCI_DRV_INTR_LSC | RTE_PCI_DRV_INTR_RMV | - RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_INTR_LSC | RTE_PCI_DRV_INTR_RMV, }; #ifdef RTE_IBVERBS_LINK_DLOPEN diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index d93f92db5..0f05853f9 100644 --- a/drivers/net/mlx5/mlx5.c +++ b/drivers/net/mlx5/mlx5.c @@ -2087,7 +2087,7 @@ static struct rte_pci_driver mlx5_driver = { .dma_map = mlx5_dma_map, .dma_unmap = mlx5_dma_unmap, .drv_flags = RTE_PCI_DRV_INTR_LSC | RTE_PCI_DRV_INTR_RMV | - RTE_PCI_DRV_PROBE_AGAIN | RTE_PCI_DRV_IOVA_AS_VA, + RTE_PCI_DRV_PROBE_AGAIN, }; #ifdef RTE_IBVERBS_LINK_DLOPEN diff --git a/drivers/net/nfp/nfp_net.c b/drivers/net/nfp/nfp_net.c index 1a7aa17ee..f5d33efcf 100644 --- a/drivers/net/nfp/nfp_net.c +++ b/drivers/net/nfp/nfp_net.c @@ -3760,16 +3760,14 @@ static int eth_nfp_pci_remove(struct rte_pci_device *pci_dev) static struct rte_pci_driver rte_nfp_net_pf_pmd = { .id_table = pci_id_nfp_pf_net_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC | - RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC, .probe = nfp_pf_pci_probe, .remove = eth_nfp_pci_remove, }; static struct rte_pci_driver 
rte_nfp_net_vf_pmd = { .id_table = pci_id_nfp_vf_net_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC | - RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC, .probe = eth_nfp_pci_probe, .remove = eth_nfp_pci_remove, }; diff --git a/drivers/net/octeontx2/otx2_ethdev.c b/drivers/net/octeontx2/otx2_ethdev.c index fcb1869d5..5ec55511b 100644 --- a/drivers/net/octeontx2/otx2_ethdev.c +++ b/drivers/net/octeontx2/otx2_ethdev.c @@ -1188,11 +1188,6 @@ otx2_nix_configure(struct rte_eth_dev *eth_dev) goto fail; } - if (rte_eal_iova_mode() != RTE_IOVA_VA) { - otx2_err("iova mode should be va"); - goto fail; - } - if (conf->link_speeds & ETH_LINK_SPEED_FIXED) { otx2_err("Setting link speed/duplex not supported"); goto fail; diff --git a/drivers/net/qede/qede_ethdev.c b/drivers/net/qede/qede_ethdev.c index 82363e6eb..0b3046a8a 100644 --- a/drivers/net/qede/qede_ethdev.c +++ b/drivers/net/qede/qede_ethdev.c @@ -2737,8 +2737,7 @@ static int qedevf_eth_dev_pci_remove(struct rte_pci_device *pci_dev) static struct rte_pci_driver rte_qedevf_pmd = { .id_table = pci_id_qedevf_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC | - RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC, .probe = qedevf_eth_dev_pci_probe, .remove = qedevf_eth_dev_pci_remove, }; @@ -2757,8 +2756,7 @@ static int qede_eth_dev_pci_remove(struct rte_pci_device *pci_dev) static struct rte_pci_driver rte_qede_pmd = { .id_table = pci_id_qede_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC | - RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC, .probe = qede_eth_dev_pci_probe, .remove = qede_eth_dev_pci_remove, }; diff --git a/drivers/raw/ioat/ioat_rawdev.c b/drivers/raw/ioat/ioat_rawdev.c index d509b6606..7270ad7aa 100644 --- a/drivers/raw/ioat/ioat_rawdev.c +++ b/drivers/raw/ioat/ioat_rawdev.c @@ -338,8 +338,7 @@ static const struct rte_pci_id 
pci_id_ioat_map[] = { static struct rte_pci_driver ioat_pmd_drv = { .id_table = pci_id_ioat_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC | - RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC, .probe = ioat_rawdev_probe, .remove = ioat_rawdev_remove, }; diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c index 77f1be1b4..04590485b 100644 --- a/lib/librte_eal/common/eal_common_bus.c +++ b/lib/librte_eal/common/eal_common_bus.c @@ -228,13 +228,37 @@ rte_bus_find_by_device_name(const char *str) enum rte_iova_mode rte_bus_get_iommu_class(void) { - int mode = RTE_IOVA_DC; + enum rte_iova_mode mode = RTE_IOVA_DC; + bool buses_want_va = false; + bool buses_want_pa = false; struct rte_bus *bus; TAILQ_FOREACH(bus, &rte_bus_list, next) { + enum rte_iova_mode bus_iova_mode; - if (bus->get_iommu_class) - mode |= bus->get_iommu_class(); + if (bus->get_iommu_class == NULL) + continue; + + bus_iova_mode = bus->get_iommu_class(); + RTE_LOG(DEBUG, EAL, "Bus %s wants IOVA as '%s'\n", + bus->name, + bus_iova_mode == RTE_IOVA_DC ? "DC" : + (bus_iova_mode == RTE_IOVA_PA ? "PA" : "VA")); + if (bus_iova_mode == RTE_IOVA_PA) + buses_want_pa = true; + else if (bus_iova_mode == RTE_IOVA_VA) + buses_want_va = true; + } + if (buses_want_va && !buses_want_pa) { + mode = RTE_IOVA_VA; + } else if (buses_want_pa && !buses_want_va) { + mode = RTE_IOVA_PA; + } else { + mode = RTE_IOVA_DC; + if (buses_want_va) { + RTE_LOG(WARNING, EAL, "Some buses want 'VA' but forcing 'DC' because other buses want 'PA'.\n"); + RTE_LOG(WARNING, EAL, "Depending on the final decision by the EAL, not all buses may be able to initialize.\n"); + } } return mode; -- 2.22.0
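Here is the first step of the heuristic, extracted from `rte_bus_get_iommu_class()` into a standalone function over an array of per-bus answers. It is a sketch with hypothetical names, not the EAL code itself.

It also shows why the removed `mode |= bus->get_iommu_class()` was fragile. Assuming PA and VA are distinct bits, as in the enum mirrored here, one bus wanting PA and another wanting VA would OR to 3. That is not a valid mode. The rewritten logic maps that disagreement to DC instead.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in mirroring enum rte_iova_mode. */
enum iova_mode { IOVA_DC = 0, IOVA_PA = 1 << 0, IOVA_VA = 1 << 1 };

/*
 * Combine the per-bus answers into a preferred mode:
 * only VA requested -> VA, only PA requested -> PA,
 * nothing requested or a conflict -> DC (left to the EAL to resolve).
 */
static enum iova_mode preferred_iova_mode(const enum iova_mode *bus_modes,
					  size_t n)
{
	bool want_va = false;
	bool want_pa = false;

	for (size_t i = 0; i < n; i++) {
		if (bus_modes[i] == IOVA_PA)
			want_pa = true;
		else if (bus_modes[i] == IOVA_VA)
			want_va = true;
	}
	if (want_va && !want_pa)
		return IOVA_VA;
	if (want_pa && !want_va)
		return IOVA_PA;
	return IOVA_DC;
}
```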
* [dpdk-dev] [PATCH v3 3/4] eal: change RTE_PCI_DRV_IOVA_AS_VA flag name 2019-07-18 6:45 ` [dpdk-dev] [PATCH v3 0/4] Fixes on IOVA mode selection jerinj 2019-07-18 6:45 ` [dpdk-dev] [PATCH v3 1/4] Revert "bus/pci: add Mellanox kernel driver type" jerinj 2019-07-18 6:45 ` [dpdk-dev] [PATCH v3 2/4] eal: fix IOVA mode selection as VA for pci drivers jerinj @ 2019-07-18 6:45 ` jerinj 2019-07-18 6:45 ` [dpdk-dev] [PATCH v3 4/4] eal: select IOVA mode as VA for default case jerinj 2019-07-22 11:28 ` [dpdk-dev] [PATCH v3 0/4] Fixes on IOVA mode selection David Marchand 4 siblings, 0 replies; 57+ messages in thread From: jerinj @ 2019-07-18 6:45 UTC (permalink / raw) To: dev, Pavan Nikhilesh, Jerin Jacob, Nithin Dabilpuram, Vamsi Attunuru, Kiran Kumar K, Satha Rao Cc: thomas, david.marchand, anatoly.burakov From: Jerin Jacob <jerinj@marvell.com> In order to align the name with other PCI driver flags such as RTE_PCI_DRV_NEED_MAPPING and to reflect its purpose, rename the RTE_PCI_DRV_IOVA_AS_VA flag to RTE_PCI_DRV_NEED_IOVA_AS_VA.
Signed-off-by: Jerin Jacob <jerinj@marvell.com> --- drivers/bus/pci/linux/pci.c | 4 ++-- drivers/bus/pci/rte_bus_pci.h | 4 ++-- drivers/event/octeontx/timvf_probe.c | 2 +- drivers/event/octeontx2/otx2_evdev.c | 2 +- drivers/mempool/octeontx/octeontx_fpavf.c | 2 +- drivers/mempool/octeontx2/otx2_mempool.c | 2 +- drivers/net/octeontx2/otx2_ethdev.c | 2 +- drivers/raw/octeontx2_dma/otx2_dpi_rawdev.c | 2 +- 8 files changed, 10 insertions(+), 10 deletions(-) diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c index 1a2f99b32..1d8d20d93 100644 --- a/drivers/bus/pci/linux/pci.c +++ b/drivers/bus/pci/linux/pci.c @@ -580,7 +580,7 @@ pci_device_iova_mode(const struct rte_pci_driver *pdrv, } if (is_vfio_noiommu_enabled != 0) iova_mode = RTE_IOVA_PA; - else if ((pdrv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) != 0) + else if ((pdrv->drv_flags & RTE_PCI_DRV_NEED_IOVA_AS_VA) != 0) iova_mode = RTE_IOVA_VA; #endif break; @@ -592,7 +592,7 @@ pci_device_iova_mode(const struct rte_pci_driver *pdrv, break; default: - if ((pdrv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) != 0) + if ((pdrv->drv_flags & RTE_PCI_DRV_NEED_IOVA_AS_VA) != 0) iova_mode = RTE_IOVA_VA; break; } diff --git a/drivers/bus/pci/rte_bus_pci.h b/drivers/bus/pci/rte_bus_pci.h index 0f2177564..29bea6d70 100644 --- a/drivers/bus/pci/rte_bus_pci.h +++ b/drivers/bus/pci/rte_bus_pci.h @@ -187,8 +187,8 @@ struct rte_pci_bus { #define RTE_PCI_DRV_INTR_RMV 0x0010 /** Device driver needs to keep mapped resources if unsupported dev detected */ #define RTE_PCI_DRV_KEEP_MAPPED_RES 0x0020 -/** Device driver only supports IOVA as VA and cannot work with IOVA as PA */ -#define RTE_PCI_DRV_IOVA_AS_VA 0x0040 +/** Device driver needs IOVA as VA and cannot work with IOVA as PA */ +#define RTE_PCI_DRV_NEED_IOVA_AS_VA 0x0040 /** * Map the PCI device resources in user space virtual memory address diff --git a/drivers/event/octeontx/timvf_probe.c b/drivers/event/octeontx/timvf_probe.c index 08dbd2be9..af87625fd 100644 --- 
a/drivers/event/octeontx/timvf_probe.c +++ b/drivers/event/octeontx/timvf_probe.c @@ -140,7 +140,7 @@ static const struct rte_pci_id pci_timvf_map[] = { static struct rte_pci_driver pci_timvf = { .id_table = pci_timvf_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_NEED_IOVA_AS_VA, .probe = timvf_probe, .remove = NULL, }; diff --git a/drivers/event/octeontx2/otx2_evdev.c b/drivers/event/octeontx2/otx2_evdev.c index 56716c2ac..e6379e3b4 100644 --- a/drivers/event/octeontx2/otx2_evdev.c +++ b/drivers/event/octeontx2/otx2_evdev.c @@ -1630,7 +1630,7 @@ static const struct rte_pci_id pci_sso_map[] = { static struct rte_pci_driver pci_sso = { .id_table = pci_sso_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_NEED_IOVA_AS_VA, .probe = otx2_sso_probe, .remove = otx2_sso_remove, }; diff --git a/drivers/mempool/octeontx/octeontx_fpavf.c b/drivers/mempool/octeontx/octeontx_fpavf.c index 4cf387e8f..baabc0152 100644 --- a/drivers/mempool/octeontx/octeontx_fpavf.c +++ b/drivers/mempool/octeontx/octeontx_fpavf.c @@ -799,7 +799,7 @@ static const struct rte_pci_id pci_fpavf_map[] = { static struct rte_pci_driver pci_fpavf = { .id_table = pci_fpavf_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_NEED_IOVA_AS_VA, .probe = fpavf_probe, }; diff --git a/drivers/mempool/octeontx2/otx2_mempool.c b/drivers/mempool/octeontx2/otx2_mempool.c index 9a5f11cf4..3a4a9425f 100644 --- a/drivers/mempool/octeontx2/otx2_mempool.c +++ b/drivers/mempool/octeontx2/otx2_mempool.c @@ -443,7 +443,7 @@ static const struct rte_pci_id pci_npa_map[] = { static struct rte_pci_driver pci_npa = { .id_table = pci_npa_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_NEED_IOVA_AS_VA, .probe = npa_probe, .remove = 
npa_remove, }; diff --git a/drivers/net/octeontx2/otx2_ethdev.c b/drivers/net/octeontx2/otx2_ethdev.c index 5ec55511b..7b91f6b31 100644 --- a/drivers/net/octeontx2/otx2_ethdev.c +++ b/drivers/net/octeontx2/otx2_ethdev.c @@ -2001,7 +2001,7 @@ static const struct rte_pci_id pci_nix_map[] = { static struct rte_pci_driver pci_nix = { .id_table = pci_nix_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_IOVA_AS_VA | + .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_NEED_IOVA_AS_VA | RTE_PCI_DRV_INTR_LSC, .probe = nix_probe, .remove = nix_remove, diff --git a/drivers/raw/octeontx2_dma/otx2_dpi_rawdev.c b/drivers/raw/octeontx2_dma/otx2_dpi_rawdev.c index 6a1b43678..e398abb75 100644 --- a/drivers/raw/octeontx2_dma/otx2_dpi_rawdev.c +++ b/drivers/raw/octeontx2_dma/otx2_dpi_rawdev.c @@ -427,7 +427,7 @@ otx2_dpi_rawdev_remove(struct rte_pci_device *pci_dev) static struct rte_pci_driver rte_dpi_rawdev_pmd = { .id_table = pci_dma_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_NEED_IOVA_AS_VA, .probe = otx2_dpi_rawdev_probe, .remove = otx2_dpi_rawdev_remove, }; -- 2.22.0
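The renamed flag is consulted in the VFIO branch of `pci_device_iova_mode()`. This sketch reproduces only that branch, as it reads after patches 2/4 and 3/4. It assumes the per-device mode starts out as DC before the branch runs.

If VFIO runs in no-IOMMU mode, the device sees untranslated addresses and must be given PA. Otherwise, a driver carrying the NEED_IOVA_AS_VA flag asks for VA. Everything else stays DC. Names are hypothetical stand-ins, and the later IOMMU capability check in the real function is not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins mirroring enum rte_iova_mode and the flag bit. */
enum iova_mode { IOVA_DC = 0, IOVA_PA = 1 << 0, IOVA_VA = 1 << 1 };
#define DRV_NEED_IOVA_AS_VA 0x0040

/* Per-device IOVA answer for a VFIO-bound device. */
static enum iova_mode vfio_device_iova_mode(uint32_t drv_flags,
					    bool vfio_noiommu)
{
	enum iova_mode mode = IOVA_DC;

	if (vfio_noiommu)
		mode = IOVA_PA; /* no IOMMU translation: physical addresses only */
	else if ((drv_flags & DRV_NEED_IOVA_AS_VA) != 0)
		mode = IOVA_VA; /* driver cannot work with IOVA as PA */
	return mode;
}
```

A VA-only driver bound to vfio-pci in no-IOMMU mode therefore reports PA here. The probe-time check added in patch 2/4 then refuses it if the EAL ends up in VA mode, or refuses any other VA-requiring device if the EAL ends up in PA mode.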
* [dpdk-dev] [PATCH v3 4/4] eal: select IOVA mode as VA for default case From: jerinj @ 2019-07-18 6:45 UTC To: dev, Anatoly Burakov, John McNamara, Marko Kovacevic Cc: thomas, david.marchand, Jerin Jacob From: Jerin Jacob <jerinj@marvell.com> When the bus layer reports RTE_IOVA_DC as the preferred mode, select RTE_IOVA_VA as the IOVA mode. RTE_IOVA_VA is selected as the default because: 1) All drivers work in RTE_IOVA_VA mode, irrespective of physical address availability. 2) By default, the mempool first asks for IOVA-contiguous memory using RTE_MEMZONE_IOVA_CONTIG, which is slow in IOVA as PA mode and may affect the application boot time. Signed-off-by: Jerin Jacob <jerinj@marvell.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com> --- .../prog_guide/env_abstraction_layer.rst | 18 ++++++++++++++++-- lib/librte_eal/linux/eal/eal.c | 6 ++---- 2 files changed, 18 insertions(+), 6 deletions(-) diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst index 77307e3a6..b17c9dad9 100644 --- a/doc/guides/prog_guide/env_abstraction_layer.rst +++ b/doc/guides/prog_guide/env_abstraction_layer.rst @@ -445,11 +445,25 @@ kernels. - if the preferred mode is RTE_IOVA_PA but there is no access to Physical Addresses, then EAL init will fail early, since later probing of the devices would fail anyway, -- if the preferred mode is RTE_IOVA_DC then based on the Physical Addresses - availability, the preferred mode is adjusted to RTE_IOVA_PA or RTE_IOVA_VA. +- if the preferred mode is RTE_IOVA_DC, then the IOVA mode is set to RTE_IOVA_VA.
+ RTE_IOVA_VA is selected as the default because: + +#. All drivers work in RTE_IOVA_VA mode, irrespective of physical address availability. + +#. By default, the mempool first asks for IOVA-contiguous memory using ``RTE_MEMZONE_IOVA_CONTIG``, + which is slow in IOVA as PA mode and may affect the application boot time. + +#. Use cases that need large amounts of IOVA-contiguous memory are easier to enable with IOVA in VA mode. + In the case when the buses had disagreed on the IOVA Mode at the first step, part of the buses won't work because of this decision. +.. note:: + + If a device driver needs IOVA as VA and cannot work with IOVA as PA, + the driver must request it from the PCI bus layer with the ``RTE_PCI_DRV_NEED_IOVA_AS_VA`` + requirement flag. The absence of this flag means the driver must support both IOVA as PA and VA modes. + IOVA Mode Configuration ~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/lib/librte_eal/linux/eal/eal.c b/lib/librte_eal/linux/eal/eal.c index 2e5499f9b..34db78753 100644 --- a/lib/librte_eal/linux/eal/eal.c +++ b/lib/librte_eal/linux/eal/eal.c @@ -1061,10 +1061,8 @@ rte_eal_init(int argc, char **argv) enum rte_iova_mode iova_mode = rte_bus_get_iommu_class(); if (iova_mode == RTE_IOVA_DC) { - iova_mode = phys_addrs ? RTE_IOVA_PA : RTE_IOVA_VA; - RTE_LOG(DEBUG, EAL, - "Buses did not request a specific IOVA mode, using '%s' based on physical addresses availability.\n", - phys_addrs ? "PA" : "VA"); + iova_mode = RTE_IOVA_VA; + RTE_LOG(DEBUG, EAL, "Buses did not request a specific IOVA mode, select IOVA as VA mode.\n"); } #ifdef RTE_LIBRTE_KNI /* Workaround for KNI which requires physical address to work */ -- 2.22.0
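The default-case change in eal.c, together with the early failure documented for a PA preference without access to physical addresses, can be modeled as a small pure function. This is a sketch under stated assumptions: eal_select_iova_mode(), IOVA_INIT_FAIL and the local enum are hypothetical, and the sketch leaves out the KNI workaround visible at the end of the hunk and any mode forced on the command line.

```c
#include <stdbool.h>

/* Hypothetical local mirror of enum rte_iova_mode (values assumed). */
enum iova_mode { IOVA_DC = 0, IOVA_PA = 1 << 0, IOVA_VA = 1 << 1 };

#define IOVA_INIT_FAIL (-1) /* hypothetical "abort EAL init" marker */

/*
 * bus_pref:   what the buses agreed on (rte_bus_get_iommu_class()).
 * phys_addrs: whether physical addresses are readable (typically root).
 */
static int eal_select_iova_mode(enum iova_mode bus_pref, bool phys_addrs)
{
	if (bus_pref == IOVA_DC)
		return IOVA_VA;        /* new default, regardless of phys_addrs */
	if (bus_pref == IOVA_PA && !phys_addrs)
		return IOVA_INIT_FAIL; /* PA needed but unavailable: fail early */
	return bus_pref;           /* an explicit bus preference is honoured */
}
```

Note that phys_addrs no longer influences the DC case; before this patch, DC resolved to PA whenever physical addresses were readable.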
* Re: [dpdk-dev] [PATCH v3 0/4] Fixes on IOVA mode selection From: David Marchand @ 2019-07-22 11:28 UTC To: Jerin Jacob Kollanukkaran; +Cc: dev, Thomas Monjalon, Burakov, Anatoly On Thu, Jul 18, 2019 at 8:45 AM <jerinj@marvell.com> wrote: > > From: Jerin Jacob <jerinj@marvell.com> > > Original V1 cover letter from David Marchand: > > Following the issues reported by Jerin and the discussion that emerged > from it, here are fixes to restore and document the behavior of the EAL > and the pci bus driver. > > I pondered all the arguments and tried to keep the changes to a minimum. > I can't find a need for a flag to just announce support of physical > addresses from the pmd point of view. > So it ended up with something really close to what Jerin had suggested. > > But the problem is that this is still unfinished wrt the documentation. > I will be offline for 10 days and we need this to move forward, so > sending > anyway. > > > v3: > - Patch 2/4 - Remove personal appeals in log messages (Anatoly) > - Patch 4/4 - Added following documentation (Anatoly) > a) #. It is easy to enable large amount of IOVA-contiguous memory use-cases with IOVA in VA mode. > in the reasons for VA as default > b) As a note, > If the device driver needs IOVA as VA and it cannot work with IOVA > as PA then the driver must request the PCI bus layer using > ``RTE_PCI_DRV_NEED_IOVA_AS_VA`` requirement flag. > Absence of this flag, dictates, the driver must support both IOVA as PA and VA modes.
> > v2: > - Changed RTE_PCI_DRV_IOVA_AS_VA flag name to RTE_PCI_DRV_NEED_IOVA_AS_VA (patch 3/4) > - Changed IOVA mode to VA for the default case (patch 4/4) with documentation > - Tested the patch series on octeontx2 platform Many thanks to both of you (Jerin, Anatoly) for working on this while I was away. I have some minor comments on the newly added patches; I will send a v4 with the changes in it directly. -- David Marchand
* [dpdk-dev] [PATCH v4 0/4] Fixes on IOVA mode selection From: David Marchand @ 2019-07-22 12:56 UTC To: dev; +Cc: anatoly.burakov, jerinj, thomas Following the issues reported by Jerin and the discussion that emerged from it, here are fixes to restore and document the behavior of the EAL and the PCI bus driver. I pondered all the arguments and tried to keep the changes to a minimum. I can't find a need for a flag that just announces support of physical addresses from the PMD point of view, so this ended up really close to what Jerin had suggested. But the problem is that this is still unfinished with regard to the documentation. I will be offline for 10 days and we need this to move forward, so sending anyway.
Changelog since v3: - fixed typos in patch 2, - updated patch 3 title, - moved and reworded comments in the note section in patch 4, Changelog since v2 (Jerin): - Patch 2/4 - Remove personal appeals in log messages (Anatoly) - Patch 4/4 - Added documentation (Anatoly) Changelog since v1 (Jerin): - Changed RTE_PCI_DRV_IOVA_AS_VA flag name to RTE_PCI_DRV_NEED_IOVA_AS_VA (patch 3/4) - Changed IOVA mode to VA for the default case (patch 4/4) with documentation - Tested the patch series on octeontx2 platform -- David Marchand David Marchand (2): Revert "bus/pci: add Mellanox kernel driver type" eal: fix IOVA mode selection as VA for PCI drivers Jerin Jacob (2): drivers: change IOVA as VA PCI flag name eal: select IOVA as VA mode for default case doc/guides/prog_guide/env_abstraction_layer.rst | 49 +++++++++++++++++++++++++ drivers/bus/pci/linux/pci.c | 24 +++--------- drivers/bus/pci/pci_common.c | 30 ++++++++++++--- drivers/bus/pci/rte_bus_pci.h | 4 +- drivers/event/octeontx/timvf_probe.c | 2 +- drivers/event/octeontx2/otx2_evdev.c | 2 +- drivers/mempool/octeontx/octeontx_fpavf.c | 2 +- drivers/mempool/octeontx2/otx2_mempool.c | 2 +- drivers/net/atlantic/atl_ethdev.c | 3 +- drivers/net/bnxt/bnxt_ethdev.c | 3 +- drivers/net/e1000/em_ethdev.c | 3 +- drivers/net/e1000/igb_ethdev.c | 5 +-- drivers/net/enic/enic_ethdev.c | 3 +- drivers/net/fm10k/fm10k_ethdev.c | 3 +- drivers/net/i40e/i40e_ethdev.c | 3 +- drivers/net/i40e/i40e_ethdev_vf.c | 2 +- drivers/net/iavf/iavf_ethdev.c | 3 +- drivers/net/ice/ice_ethdev.c | 3 +- drivers/net/ixgbe/ixgbe_ethdev.c | 5 +-- drivers/net/mlx4/mlx4.c | 3 +- drivers/net/mlx5/mlx5.c | 2 +- drivers/net/nfp/nfp_net.c | 6 +-- drivers/net/octeontx2/otx2_ethdev.c | 7 +--- drivers/net/qede/qede_ethdev.c | 6 +-- drivers/raw/ioat/ioat_rawdev.c | 3 +- drivers/raw/octeontx2_dma/otx2_dpi_rawdev.c | 2 +- lib/librte_eal/common/eal_common_bus.c | 30 +++++++++++++-- lib/librte_eal/common/include/rte_dev.h | 1 - 
lib/librte_eal/linux/eal/eal.c | 6 +-- 29 files changed, 136 insertions(+), 81 deletions(-) -- 1.8.3.1
* [dpdk-dev] [PATCH v4 1/4] Revert "bus/pci: add Mellanox kernel driver type" From: David Marchand @ 2019-07-22 12:56 UTC To: dev; +Cc: anatoly.burakov, jerinj, thomas This reverts commit 0cb86518db57d35e0abc14d6703fad561a0310e2. The PCI bus now reports DC when faced with a device bound to an unknown driver and, in such a case, the IOVA mode is selected based on physical address availability. As a consequence, there is no reason for this special case for Mellanox drivers. Fixes: 703458e19c16 ("bus/pci: consider only usable devices for IOVA mode") Signed-off-by: David Marchand <david.marchand@redhat.com> Reviewed-by: Jerin Jacob <jerinj@marvell.com> --- drivers/bus/pci/linux/pci.c | 8 -------- lib/librte_eal/common/include/rte_dev.h | 1 - 2 files changed, 9 deletions(-) diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c index 33c8ea7..b12f10a 100644 --- a/drivers/bus/pci/linux/pci.c +++ b/drivers/bus/pci/linux/pci.c @@ -329,9 +329,6 @@ dev->kdrv = RTE_KDRV_IGB_UIO; else if (!strcmp(driver, "uio_pci_generic")) dev->kdrv = RTE_KDRV_UIO_GENERIC; - else if (!strcmp(driver, "mlx4_core") || - !strcmp(driver, "mlx5_core")) - dev->kdrv = RTE_KDRV_NIC_MLX; else dev->kdrv = RTE_KDRV_UNKNOWN; } else @@ -591,11 +588,6 @@ enum rte_iova_mode break; } - case RTE_KDRV_NIC_MLX: - if ((pdrv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) == 0) - iova_mode = RTE_IOVA_PA; - break; - case RTE_KDRV_IGB_UIO: case RTE_KDRV_UIO_GENERIC: iova_mode = RTE_IOVA_PA; diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h index 94829f6..c25e09e 100644 --- a/lib/librte_eal/common/include/rte_dev.h +++ b/lib/librte_eal/common/include/rte_dev.h @@ -63,7 +63,6 @@
enum rte_kernel_driver { RTE_KDRV_VFIO, RTE_KDRV_UIO_GENERIC, RTE_KDRV_NIC_UIO, - RTE_KDRV_NIC_MLX, RTE_KDRV_NONE, }; -- 1.8.3.1
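With the revert, mlx4_core and mlx5_core no longer get their own kernel-driver type and fall through to the unknown-driver branch of the sysfs lookup. A rough sketch of that lookup, and of the preference the commit message describes for unknown drivers ("reports DC"), follows. All names are hypothetical; "vfio-pci" is assumed to be the driver name matched for VFIO; the 0x0040 value is the VA-only driver flag from rte_bus_pci.h; and the VFIO branch and the IOMMU capability check, which can still force PA, are deliberately not modeled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical local mirrors of rte_iova_mode and rte_kernel_driver. */
enum iova_mode { IOVA_DC = 0, IOVA_PA = 1 << 0, IOVA_VA = 1 << 1 };
enum kdrv { KDRV_UNKNOWN, KDRV_IGB_UIO, KDRV_VFIO, KDRV_UIO_GENERIC };

#define DRV_VA_ONLY 0x0040 /* the "VA only" PCI driver flag value */

/* Driver-name lookup after the revert: no Mellanox special case. */
static enum kdrv classify_kernel_driver(const char *driver)
{
	if (strcmp(driver, "vfio-pci") == 0)
		return KDRV_VFIO;
	if (strcmp(driver, "igb_uio") == 0)
		return KDRV_IGB_UIO;
	if (strcmp(driver, "uio_pci_generic") == 0)
		return KDRV_UIO_GENERIC;
	return KDRV_UNKNOWN; /* mlx4_core, mlx5_core, ... end up here */
}

/* Per-device preference for the UIO and unknown-driver cases only. */
static enum iova_mode non_vfio_device_pref(enum kdrv k, uint32_t drv_flags)
{
	assert(k != KDRV_VFIO); /* VFIO handling is not modelled here */
	if (k == KDRV_IGB_UIO || k == KDRV_UIO_GENERIC)
		return IOVA_PA;
	/* Unknown kernel driver: no opinion unless the PMD demands VA. */
	return (drv_flags & DRV_VA_ONLY) ? IOVA_VA : IOVA_DC;
}
```

Under these assumptions, a Mellanox device without the flag now yields DC and leaves the decision to the EAL, instead of forcing PA as the removed RTE_KDRV_NIC_MLX case did.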
* [dpdk-dev] [PATCH v4 2/4] eal: fix IOVA mode selection as VA for PCI drivers From: David Marchand @ 2019-07-22 12:56 UTC To: dev Cc: anatoly.burakov, jerinj, thomas, John McNamara, Marko Kovacevic, Igor Russkikh, Pavel Belous, Ajit Khaparde, Somnath Kotur, Wenzhuo Lu, John Daley, Hyong Youb Kim, Qi Zhang, Xiao Wang, Beilei Xing, Jingjing Wu, Qiming Yang, Konstantin Ananyev, Matan Azrad, Shahaf Shuler, Yongseok Koh, Viacheslav Ovsiienko, Alejandro Lucero, Nithin Dabilpuram, Kiran Kumar K, Rasesh Mody, Shahed Shaikh, Bruce Richardson The offending commit broke the use of RTE_PCI_DRV_IOVA_AS_VA, which was intended to mean "driver only supports VA" but had been understood as "driver supports both PA and VA" by most net drivers, and was used to let DPDK processes run as non-root (such processes do not have access to physical addresses on recent kernels). The check on physical addresses actually closed the gap for those drivers. We don't need to mark them with RTE_PCI_DRV_IOVA_AS_VA, and this flag can retain its intended meaning. Document its meaning explicitly. We can check that a driver's requirement with regard to the IOVA mode is fulfilled before trying to probe a device. Finally, document the heuristic used to select the IOVA mode and hope that we won't break it again.
Fixes: 703458e19c16 ("bus/pci: consider only usable devices for IOVA mode") Signed-off-by: David Marchand <david.marchand@redhat.com> Reviewed-by: Jerin Jacob <jerinj@marvell.com> Tested-by: Jerin Jacob <jerinj@marvell.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com> --- Changelog since v3: - fixed typos, --- doc/guides/prog_guide/env_abstraction_layer.rst | 31 +++++++++++++++++++++++++ drivers/bus/pci/linux/pci.c | 16 +++++-------- drivers/bus/pci/pci_common.c | 30 +++++++++++++++++++----- drivers/bus/pci/rte_bus_pci.h | 4 ++-- drivers/net/atlantic/atl_ethdev.c | 3 +-- drivers/net/bnxt/bnxt_ethdev.c | 3 +-- drivers/net/e1000/em_ethdev.c | 3 +-- drivers/net/e1000/igb_ethdev.c | 5 ++-- drivers/net/enic/enic_ethdev.c | 3 +-- drivers/net/fm10k/fm10k_ethdev.c | 3 +-- drivers/net/i40e/i40e_ethdev.c | 3 +-- drivers/net/i40e/i40e_ethdev_vf.c | 2 +- drivers/net/iavf/iavf_ethdev.c | 3 +-- drivers/net/ice/ice_ethdev.c | 3 +-- drivers/net/ixgbe/ixgbe_ethdev.c | 5 ++-- drivers/net/mlx4/mlx4.c | 3 +-- drivers/net/mlx5/mlx5.c | 2 +- drivers/net/nfp/nfp_net.c | 6 ++--- drivers/net/octeontx2/otx2_ethdev.c | 5 ---- drivers/net/qede/qede_ethdev.c | 6 ++--- drivers/raw/ioat/ioat_rawdev.c | 3 +-- lib/librte_eal/common/eal_common_bus.c | 30 +++++++++++++++++++++--- 22 files changed, 110 insertions(+), 62 deletions(-) diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst index f15bcd9..1d63675 100644 --- a/doc/guides/prog_guide/env_abstraction_layer.rst +++ b/doc/guides/prog_guide/env_abstraction_layer.rst @@ -419,6 +419,37 @@ Misc Functions Locks and atomic operations are per-architecture (i686 and x86_64). +IOVA Mode Detection +~~~~~~~~~~~~~~~~~~~ + +IOVA Mode is selected by considering what the current usable Devices on the +system require and/or support. + +Below is the 2-step heuristic for this choice. 
+ +For the first step, EAL asks each bus its requirement in terms of IOVA mode +and decides on a preferred IOVA mode. + +- if all buses report RTE_IOVA_PA, then the preferred IOVA mode is RTE_IOVA_PA, +- if all buses report RTE_IOVA_VA, then the preferred IOVA mode is RTE_IOVA_VA, +- if all buses report RTE_IOVA_DC, no bus expressed a preference, then the + preferred mode is RTE_IOVA_DC, +- if the buses disagree (at least one wants RTE_IOVA_PA and at least one wants + RTE_IOVA_VA), then the preferred IOVA mode is RTE_IOVA_DC (see below with the + check on Physical Addresses availability), + +The second step checks if the preferred mode complies with the Physical +Addresses availability, since those are only available to the root user in recent +kernels. + +- if the preferred mode is RTE_IOVA_PA but there is no access to Physical + Addresses, then EAL init fails early, since later probing of the devices + would fail anyway, +- if the preferred mode is RTE_IOVA_DC then based on the Physical Addresses + availability, the preferred mode is adjusted to RTE_IOVA_PA or RTE_IOVA_VA. + In the case when the buses had disagreed on the IOVA Mode at the first step, + part of the buses won't work because of this decision. + IOVA Mode Configuration ~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c index b12f10a..1a2f99b 100644 --- a/drivers/bus/pci/linux/pci.c +++ b/drivers/bus/pci/linux/pci.c @@ -578,12 +578,10 @@ enum rte_iova_mode else is_vfio_noiommu_enabled = 0; } - if ((pdrv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) == 0) { + if (is_vfio_noiommu_enabled != 0) iova_mode = RTE_IOVA_PA; - } else if (is_vfio_noiommu_enabled != 0) { - RTE_LOG(DEBUG, EAL, "Forcing to 'PA', vfio-noiommu mode configured\n"); - iova_mode = RTE_IOVA_PA; - } + else if ((pdrv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) != 0) + iova_mode = RTE_IOVA_VA; #endif break; } @@ -594,8 +592,8 @@ enum rte_iova_mode break; default: - RTE_LOG(DEBUG, EAL, "Unsupported kernel driver?
Defaulting to IOVA as 'PA'\n"); - iova_mode = RTE_IOVA_PA; + if ((pdrv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) != 0) + iova_mode = RTE_IOVA_VA; break; } @@ -607,10 +605,8 @@ enum rte_iova_mode if (iommu_no_va == -1) iommu_no_va = pci_one_device_iommu_support_va(pdev) ? 0 : 1; - if (iommu_no_va != 0) { - RTE_LOG(DEBUG, EAL, "Forcing to 'PA', IOMMU does not support IOVA as 'VA'\n"); + if (iommu_no_va != 0) iova_mode = RTE_IOVA_PA; - } } return iova_mode; } diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c index d2af472..9794552 100644 --- a/drivers/bus/pci/pci_common.c +++ b/drivers/bus/pci/pci_common.c @@ -169,8 +169,22 @@ static struct rte_devargs *pci_devargs_lookup(struct rte_pci_device *dev) * This needs to be before rte_pci_map_device(), as it enables to use * driver flags for adjusting configuration. */ - if (!already_probed) + if (!already_probed) { + enum rte_iova_mode dev_iova_mode; + enum rte_iova_mode iova_mode; + + dev_iova_mode = pci_device_iova_mode(dr, dev); + iova_mode = rte_eal_iova_mode(); + if (dev_iova_mode != RTE_IOVA_DC && + dev_iova_mode != iova_mode) { + RTE_LOG(ERR, EAL, " Expecting '%s' IOVA mode but current mode is '%s', not initializing\n", + dev_iova_mode == RTE_IOVA_PA ? "PA" : "VA", + iova_mode == RTE_IOVA_PA ? 
"PA" : "VA"); + return -EINVAL; + } + dev->driver = dr; + } if (!already_probed && (dr->drv_flags & RTE_PCI_DRV_NEED_MAPPING)) { /* map resources for devices that use igb_uio */ @@ -629,12 +643,16 @@ enum rte_iova_mode devices_want_va = true; } } - if (devices_want_pa) { - iova_mode = RTE_IOVA_PA; - if (devices_want_va) - RTE_LOG(WARNING, EAL, "Some devices want 'VA' but forcing 'PA' because other devices want it\n"); - } else if (devices_want_va) { + if (devices_want_va && !devices_want_pa) { iova_mode = RTE_IOVA_VA; + } else if (devices_want_pa && !devices_want_va) { + iova_mode = RTE_IOVA_PA; + } else { + iova_mode = RTE_IOVA_DC; + if (devices_want_va) { + RTE_LOG(WARNING, EAL, "Some devices want 'VA' but forcing 'DC' because other devices want 'PA'.\n"); + RTE_LOG(WARNING, EAL, "Depending on the final decision by the EAL, not all devices may be able to initialize.\n"); + } } return iova_mode; } diff --git a/drivers/bus/pci/rte_bus_pci.h b/drivers/bus/pci/rte_bus_pci.h index 06e004c..0f21775 100644 --- a/drivers/bus/pci/rte_bus_pci.h +++ b/drivers/bus/pci/rte_bus_pci.h @@ -187,8 +187,8 @@ struct rte_pci_bus { #define RTE_PCI_DRV_INTR_RMV 0x0010 /** Device driver needs to keep mapped resources if unsupported dev detected */ #define RTE_PCI_DRV_KEEP_MAPPED_RES 0x0020 -/** Device driver supports IOVA as VA */ -#define RTE_PCI_DRV_IOVA_AS_VA 0X0040 +/** Device driver only supports IOVA as VA and cannot work with IOVA as PA */ +#define RTE_PCI_DRV_IOVA_AS_VA 0x0040 /** * Map the PCI device resources in user space virtual memory address diff --git a/drivers/net/atlantic/atl_ethdev.c b/drivers/net/atlantic/atl_ethdev.c index fdc0a7f..fa89ae7 100644 --- a/drivers/net/atlantic/atl_ethdev.c +++ b/drivers/net/atlantic/atl_ethdev.c @@ -157,8 +157,7 @@ static void atl_dev_info_get(struct rte_eth_dev *dev, static struct rte_pci_driver rte_atl_pmd = { .id_table = pci_id_atl_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC | - RTE_PCI_DRV_IOVA_AS_VA, + 
.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC, .probe = eth_atl_pci_probe, .remove = eth_atl_pci_remove, }; diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c index 8fc5103..9306d56 100644 --- a/drivers/net/bnxt/bnxt_ethdev.c +++ b/drivers/net/bnxt/bnxt_ethdev.c @@ -4028,8 +4028,7 @@ static int bnxt_pci_remove(struct rte_pci_device *pci_dev) static struct rte_pci_driver bnxt_rte_pmd = { .id_table = bnxt_pci_id_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | - RTE_PCI_DRV_INTR_LSC | RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC, .probe = bnxt_pci_probe, .remove = bnxt_pci_remove, }; diff --git a/drivers/net/e1000/em_ethdev.c b/drivers/net/e1000/em_ethdev.c index dc88661..0c859e5 100644 --- a/drivers/net/e1000/em_ethdev.c +++ b/drivers/net/e1000/em_ethdev.c @@ -352,8 +352,7 @@ static int eth_em_pci_remove(struct rte_pci_device *pci_dev) static struct rte_pci_driver rte_em_pmd = { .id_table = pci_id_em_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC | - RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC, .probe = eth_em_pci_probe, .remove = eth_em_pci_remove, }; diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c index 3ee28cf..e784eeb 100644 --- a/drivers/net/e1000/igb_ethdev.c +++ b/drivers/net/e1000/igb_ethdev.c @@ -1116,8 +1116,7 @@ static int eth_igb_pci_remove(struct rte_pci_device *pci_dev) static struct rte_pci_driver rte_igb_pmd = { .id_table = pci_id_igb_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC | - RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC, .probe = eth_igb_pci_probe, .remove = eth_igb_pci_remove, }; @@ -1140,7 +1139,7 @@ static int eth_igbvf_pci_remove(struct rte_pci_device *pci_dev) */ static struct rte_pci_driver rte_igbvf_pmd = { .id_table = pci_id_igbvf_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_IOVA_AS_VA, 
+ .drv_flags = RTE_PCI_DRV_NEED_MAPPING, .probe = eth_igbvf_pci_probe, .remove = eth_igbvf_pci_remove, }; diff --git a/drivers/net/enic/enic_ethdev.c b/drivers/net/enic/enic_ethdev.c index 5cfbd31..e9c6f83 100644 --- a/drivers/net/enic/enic_ethdev.c +++ b/drivers/net/enic/enic_ethdev.c @@ -1247,8 +1247,7 @@ static int eth_enic_pci_remove(struct rte_pci_device *pci_dev) static struct rte_pci_driver rte_enic_pmd = { .id_table = pci_id_enic_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC | - RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC, .probe = eth_enic_pci_probe, .remove = eth_enic_pci_remove, }; diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c index a1e3836..2d3c477 100644 --- a/drivers/net/fm10k/fm10k_ethdev.c +++ b/drivers/net/fm10k/fm10k_ethdev.c @@ -3268,8 +3268,7 @@ static int eth_fm10k_pci_remove(struct rte_pci_device *pci_dev) static struct rte_pci_driver rte_pmd_fm10k = { .id_table = pci_id_fm10k_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC | - RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC, .probe = eth_fm10k_pci_probe, .remove = eth_fm10k_pci_remove, }; diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c index 2b9fc45..dd46d4d 100644 --- a/drivers/net/i40e/i40e_ethdev.c +++ b/drivers/net/i40e/i40e_ethdev.c @@ -696,8 +696,7 @@ static int eth_i40e_pci_remove(struct rte_pci_device *pci_dev) static struct rte_pci_driver rte_i40e_pmd = { .id_table = pci_id_i40e_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC | - RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC, .probe = eth_i40e_pci_probe, .remove = eth_i40e_pci_remove, }; diff --git a/drivers/net/i40e/i40e_ethdev_vf.c b/drivers/net/i40e/i40e_ethdev_vf.c index 5be32b0..3ff2f60 100644 --- a/drivers/net/i40e/i40e_ethdev_vf.c +++ b/drivers/net/i40e/i40e_ethdev_vf.c @@ -1557,7 
+1557,7 @@ static int eth_i40evf_pci_remove(struct rte_pci_device *pci_dev) */ static struct rte_pci_driver rte_i40evf_pmd = { .id_table = pci_id_i40evf_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING, .probe = eth_i40evf_pci_probe, .remove = eth_i40evf_pci_remove, }; diff --git a/drivers/net/iavf/iavf_ethdev.c b/drivers/net/iavf/iavf_ethdev.c index 53dc05c..a97cd76 100644 --- a/drivers/net/iavf/iavf_ethdev.c +++ b/drivers/net/iavf/iavf_ethdev.c @@ -1402,8 +1402,7 @@ static int eth_iavf_pci_remove(struct rte_pci_device *pci_dev) /* Adaptive virtual function driver struct */ static struct rte_pci_driver rte_iavf_pmd = { .id_table = pci_id_iavf_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC | - RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC, .probe = eth_iavf_pci_probe, .remove = eth_iavf_pci_remove, }; diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c index 9ce730c..f05b48c 100644 --- a/drivers/net/ice/ice_ethdev.c +++ b/drivers/net/ice/ice_ethdev.c @@ -3737,8 +3737,7 @@ static int ice_xstats_get_names(__rte_unused struct rte_eth_dev *dev, static struct rte_pci_driver rte_ice_pmd = { .id_table = pci_id_ice_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC | - RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC, .probe = ice_pci_probe, .remove = ice_pci_remove, }; diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c index 22c5b2c..4a6e5c3 100644 --- a/drivers/net/ixgbe/ixgbe_ethdev.c +++ b/drivers/net/ixgbe/ixgbe_ethdev.c @@ -1869,8 +1869,7 @@ static int eth_ixgbe_pci_remove(struct rte_pci_device *pci_dev) static struct rte_pci_driver rte_ixgbe_pmd = { .id_table = pci_id_ixgbe_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC | - RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC, .probe = 
eth_ixgbe_pci_probe, .remove = eth_ixgbe_pci_remove, }; @@ -1892,7 +1891,7 @@ static int eth_ixgbevf_pci_remove(struct rte_pci_device *pci_dev) */ static struct rte_pci_driver rte_ixgbevf_pmd = { .id_table = pci_id_ixgbevf_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING, .probe = eth_ixgbevf_pci_probe, .remove = eth_ixgbevf_pci_remove, }; diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c index 2e169b0..d6e5753 100644 --- a/drivers/net/mlx4/mlx4.c +++ b/drivers/net/mlx4/mlx4.c @@ -1142,8 +1142,7 @@ struct mlx4_conf { }, .id_table = mlx4_pci_id_map, .probe = mlx4_pci_probe, - .drv_flags = RTE_PCI_DRV_INTR_LSC | RTE_PCI_DRV_INTR_RMV | - RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_INTR_LSC | RTE_PCI_DRV_INTR_RMV, }; #ifdef RTE_IBVERBS_LINK_DLOPEN diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index d93f92d..0f05853 100644 --- a/drivers/net/mlx5/mlx5.c +++ b/drivers/net/mlx5/mlx5.c @@ -2087,7 +2087,7 @@ struct mlx5_dev_spawn_data { .dma_map = mlx5_dma_map, .dma_unmap = mlx5_dma_unmap, .drv_flags = RTE_PCI_DRV_INTR_LSC | RTE_PCI_DRV_INTR_RMV | - RTE_PCI_DRV_PROBE_AGAIN | RTE_PCI_DRV_IOVA_AS_VA, + RTE_PCI_DRV_PROBE_AGAIN, }; #ifdef RTE_IBVERBS_LINK_DLOPEN diff --git a/drivers/net/nfp/nfp_net.c b/drivers/net/nfp/nfp_net.c index 1a7aa17..f5d33ef 100644 --- a/drivers/net/nfp/nfp_net.c +++ b/drivers/net/nfp/nfp_net.c @@ -3760,16 +3760,14 @@ static int eth_nfp_pci_remove(struct rte_pci_device *pci_dev) static struct rte_pci_driver rte_nfp_net_pf_pmd = { .id_table = pci_id_nfp_pf_net_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC | - RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC, .probe = nfp_pf_pci_probe, .remove = eth_nfp_pci_remove, }; static struct rte_pci_driver rte_nfp_net_vf_pmd = { .id_table = pci_id_nfp_vf_net_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC | - RTE_PCI_DRV_IOVA_AS_VA, + 
.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC, .probe = eth_nfp_pci_probe, .remove = eth_nfp_pci_remove, }; diff --git a/drivers/net/octeontx2/otx2_ethdev.c b/drivers/net/octeontx2/otx2_ethdev.c index fcb1869..5ec5551 100644 --- a/drivers/net/octeontx2/otx2_ethdev.c +++ b/drivers/net/octeontx2/otx2_ethdev.c @@ -1188,11 +1188,6 @@ goto fail; } - if (rte_eal_iova_mode() != RTE_IOVA_VA) { - otx2_err("iova mode should be va"); - goto fail; - } - if (conf->link_speeds & ETH_LINK_SPEED_FIXED) { otx2_err("Setting link speed/duplex not supported"); goto fail; diff --git a/drivers/net/qede/qede_ethdev.c b/drivers/net/qede/qede_ethdev.c index 82363e6..0b3046a 100644 --- a/drivers/net/qede/qede_ethdev.c +++ b/drivers/net/qede/qede_ethdev.c @@ -2737,8 +2737,7 @@ static int qedevf_eth_dev_pci_remove(struct rte_pci_device *pci_dev) static struct rte_pci_driver rte_qedevf_pmd = { .id_table = pci_id_qedevf_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC | - RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC, .probe = qedevf_eth_dev_pci_probe, .remove = qedevf_eth_dev_pci_remove, }; @@ -2757,8 +2756,7 @@ static int qede_eth_dev_pci_remove(struct rte_pci_device *pci_dev) static struct rte_pci_driver rte_qede_pmd = { .id_table = pci_id_qede_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC | - RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC, .probe = qede_eth_dev_pci_probe, .remove = qede_eth_dev_pci_remove, }; diff --git a/drivers/raw/ioat/ioat_rawdev.c b/drivers/raw/ioat/ioat_rawdev.c index d509b66..7270ad7 100644 --- a/drivers/raw/ioat/ioat_rawdev.c +++ b/drivers/raw/ioat/ioat_rawdev.c @@ -338,8 +338,7 @@ static struct rte_pci_driver ioat_pmd_drv = { .id_table = pci_id_ioat_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC | - RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC, .probe = 
ioat_rawdev_probe, .remove = ioat_rawdev_remove, }; diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c index 77f1be1..0459048 100644 --- a/lib/librte_eal/common/eal_common_bus.c +++ b/lib/librte_eal/common/eal_common_bus.c @@ -228,13 +228,37 @@ struct rte_bus * enum rte_iova_mode rte_bus_get_iommu_class(void) { - int mode = RTE_IOVA_DC; + enum rte_iova_mode mode = RTE_IOVA_DC; + bool buses_want_va = false; + bool buses_want_pa = false; struct rte_bus *bus; TAILQ_FOREACH(bus, &rte_bus_list, next) { + enum rte_iova_mode bus_iova_mode; - if (bus->get_iommu_class) - mode |= bus->get_iommu_class(); + if (bus->get_iommu_class == NULL) + continue; + + bus_iova_mode = bus->get_iommu_class(); + RTE_LOG(DEBUG, EAL, "Bus %s wants IOVA as '%s'\n", + bus->name, + bus_iova_mode == RTE_IOVA_DC ? "DC" : + (bus_iova_mode == RTE_IOVA_PA ? "PA" : "VA")); + if (bus_iova_mode == RTE_IOVA_PA) + buses_want_pa = true; + else if (bus_iova_mode == RTE_IOVA_VA) + buses_want_va = true; + } + if (buses_want_va && !buses_want_pa) { + mode = RTE_IOVA_VA; + } else if (buses_want_pa && !buses_want_va) { + mode = RTE_IOVA_PA; + } else { + mode = RTE_IOVA_DC; + if (buses_want_va) { + RTE_LOG(WARNING, EAL, "Some buses want 'VA' but forcing 'DC' because other buses want 'PA'.\n"); + RTE_LOG(WARNING, EAL, "Depending on the final decision by the EAL, not all buses may be able to initialize.\n"); + } } return mode; -- 1.8.3.1
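The reworked rte_bus_get_iommu_class() (and the matching per-device loop in the PCI bus) no longer ORs the answers together. With the usual 1/2 encoding of PA and VA, the old OR yielded 3 on a conflict, a value that matches no enumerator. Instead, the new code records whether anyone wants PA and whether anyone wants VA, with DC counting as an abstention. A standalone sketch of that reduction over a hypothetical array of preferences, rather than the real bus list, follows; reduce_iova_prefs() and the local enum are illustrative names only.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical local mirror of enum rte_iova_mode (values assumed). */
enum iova_mode { IOVA_DC = 0, IOVA_PA = 1 << 0, IOVA_VA = 1 << 1 };

/*
 * Reduce a list of preferences to one mode: DC entries abstain,
 * VA wins only if nobody wants PA, PA wins only if nobody wants VA,
 * and a PA/VA conflict (or no opinion at all) yields DC.
 */
static enum iova_mode reduce_iova_prefs(const enum iova_mode *prefs, size_t n)
{
	bool want_pa = false;
	bool want_va = false;

	for (size_t i = 0; i < n; i++) {
		if (prefs[i] == IOVA_PA)
			want_pa = true;
		else if (prefs[i] == IOVA_VA)
			want_va = true;
	}

	if (want_va && !want_pa)
		return IOVA_VA;
	if (want_pa && !want_va)
		return IOVA_PA;
	return IOVA_DC; /* no preference, or the buses disagree */
}
```

With one bus wanting PA and another wanting VA, the result is DC, which the EAL then resolves on its own (by physical address availability in this patch, and to VA once patch 4/4 is applied).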
* [dpdk-dev] [PATCH v4 2/4] eal: fix IOVA mode selection as VA for PCI drivers 2019-07-22 12:56 ` [dpdk-dev] [PATCH v4 2/4] eal: fix IOVA mode selection as VA for PCI drivers David Marchand @ 2019-11-25 9:33 ` Ferruh Yigit 2019-11-25 10:22 ` Thomas Monjalon 2019-11-25 11:07 ` Jerin Jacob 0 siblings, 2 replies; 57+ messages in thread From: Ferruh Yigit @ 2019-11-25 9:33 UTC (permalink / raw) To: david.marchand Cc: ajit.khaparde, alejandro.lucero, anatoly.burakov, beilei.xing, bruce.richardson, dev, hyonkim, igor.russkikh, jerinj, jingjing.wu, john.mcnamara, johndale, kirankumark, konstantin.ananyev, marko.kovacevic, matan, ndabilpuram, pavel.belous, qi.z.zhang, qiming.yang, rmody, shahafs, shshaikh, somnath.kotur, thomas, viacheslavo, wenzhuo.lu, xiao.w.wang, yskoh On 7/22/2019 1:56 PM, David Marchand wrote: > The incriminated commit broke the use of RTE_PCI_DRV_IOVA_AS_VA which > was intended to mean "driver only supports VA" but had been understood > as "driver supports both PA and VA" by most net drivers and used to let > dpdk processes to run as non root (which do not have access to physical > addresses on recent kernels). > > The check on physical addresses actually closed the gap for those > drivers. We don't need to mark them with RTE_PCI_DRV_IOVA_AS_VA and this > flag can retain its intended meaning. > Document explicitly its meaning. > > We can check that a driver requirement wrt to IOVA mode is fulfilled > before trying to probe a device. > > Finally, document the heuristic used to select the IOVA mode and hope > that we won't break it again. 
> > Fixes: 703458e19c16 ("bus/pci: consider only usable devices for IOVA mode") > > Signed-off-by: David Marchand <david.marchand@redhat.com> > Reviewed-by: Jerin Jacob <jerinj@marvell.com> > Tested-by: Jerin Jacob <jerinj@marvell.com> > Acked-by: Anatoly Burakov <anatoly.burakov@intel.com> <...> > diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c > index d2af472..9794552 100644 > --- a/drivers/bus/pci/pci_common.c > +++ b/drivers/bus/pci/pci_common.c > @@ -169,8 +169,22 @@ static struct rte_devargs *pci_devargs_lookup(struct rte_pci_device *dev) > * This needs to be before rte_pci_map_device(), as it enables to use > * driver flags for adjusting configuration. > */ > - if (!already_probed) > + if (!already_probed) { > + enum rte_iova_mode dev_iova_mode; > + enum rte_iova_mode iova_mode; > + > + dev_iova_mode = pci_device_iova_mode(dr, dev); > + iova_mode = rte_eal_iova_mode(); > + if (dev_iova_mode != RTE_IOVA_DC && > + dev_iova_mode != iova_mode) { > + RTE_LOG(ERR, EAL, " Expecting '%s' IOVA mode but current mode is '%s', not initializing\n", > + dev_iova_mode == RTE_IOVA_PA ? "PA" : "VA", > + iova_mode == RTE_IOVA_PA ? "PA" : "VA"); > + return -EINVAL; > + } > + OvS reported an error while hotplugging a device. It looks like DPDK application initialized as IOVA=VA, and the new device is bound to 'igb_uio' which forces it to PA, fails on above check. I would like to get your comment on the issue. For the OvS mode, hopefully binding the device to 'vfio-pci' can be a solution, but for the cases we don't have that option, can/should we force the DPDK to PA mode after initialization? ^ permalink raw reply [flat|nested] 57+ messages in thread
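The probe-time check quoted above reduces to a comparison between what the device needs and what the EAL already chose at initialization. The following is a hedged model of that comparison, not the DPDK function itself: `check_dev_iova()`, `iova_mode_str()` and the `IOVA_*` enumerators are hypothetical names, and the EAL mode is assumed to be already resolved to PA or VA by the time devices are probed.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-ins for the values of DPDK's enum rte_iova_mode. */
enum iova_mode { IOVA_DC, IOVA_PA, IOVA_VA };

static const char *iova_mode_str(enum iova_mode m)
{
	return m == IOVA_DC ? "DC" : (m == IOVA_PA ? "PA" : "VA");
}

/*
 * Model of the check added before probing in pci_common.c: a device with
 * no preference (DC) is always accepted; otherwise its required mode must
 * equal the mode the EAL fixed at initialization.
 */
static int check_dev_iova(enum iova_mode dev_mode, enum iova_mode eal_mode)
{
	/* Assumption: the EAL has already resolved DC to PA or VA. */
	assert(eal_mode != IOVA_DC);

	if (dev_mode != IOVA_DC && dev_mode != eal_mode) {
		fprintf(stderr, "Expecting '%s' IOVA mode but current mode is '%s', not initializing\n",
			iova_mode_str(dev_mode), iova_mode_str(eal_mode));
		return -EINVAL;
	}
	return 0;
}
```

In the OvS report, an `igb_uio` device (which needs PA) hotplugged into a process initialized with VA corresponds to `check_dev_iova(IOVA_PA, IOVA_VA)`, which returns `-EINVAL`. A device with no preference is accepted in either mode, and a VA-only device hotplugged into a VA process is accepted.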
* Re: [dpdk-dev] [PATCH v4 2/4] eal: fix IOVA mode selection as VA for PCI drivers 2019-11-25 9:33 ` Ferruh Yigit @ 2019-11-25 10:22 ` Thomas Monjalon 2019-11-25 12:03 ` Ferruh Yigit 2019-11-25 11:07 ` Jerin Jacob 1 sibling, 1 reply; 57+ messages in thread From: Thomas Monjalon @ 2019-11-25 10:22 UTC (permalink / raw) To: Ferruh Yigit Cc: david.marchand, ajit.khaparde, alejandro.lucero, anatoly.burakov, beilei.xing, bruce.richardson, dev, hyonkim, igor.russkikh, jerinj, jingjing.wu, john.mcnamara, johndale, kirankumark, konstantin.ananyev, marko.kovacevic, matan, ndabilpuram, pavel.belous, qi.z.zhang, qiming.yang, rmody, shahafs, shshaikh, somnath.kotur, viacheslavo, wenzhuo.lu, xiao.w.wang, yskoh 25/11/2019 10:33, Ferruh Yigit: > It looks like DPDK application initialized as IOVA=VA, > and the new device is bound to 'igb_uio' which forces it to PA, > fails on above check. Do you mean this use case was not tested earlier with DPDK 19.08? > I would like to get your comment on the issue. > > For the OvS mode, hopefully binding the device to 'vfio-pci' > can be a solution, but for the cases we don't have that option, > can/should we force the DPDK to PA mode after initialization? I think this is expected, because VA is the new default since 19.08: http://doc.dpdk.org/guides/rel_notes/release_19_08.html#new-features In case, there is no constraint on initialization, we have to decide which mode is preferred. Previously PA was preferred. For the sake of modernity (and because it fits with some new devices), the preference has been changed to VA. If igb_uio device is used at initialization, the PA mode should be used. If igb_uio (PA-only) device is hotplugged, no luck! If VA-only device is hotplugged, it works! I think this change is one step in deprecating igb_uio. ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [dpdk-dev] [PATCH v4 2/4] eal: fix IOVA mode selection as VA for PCI drivers 2019-11-25 10:22 ` Thomas Monjalon @ 2019-11-25 12:03 ` Ferruh Yigit 2019-11-25 12:36 ` David Marchand 0 siblings, 1 reply; 57+ messages in thread From: Ferruh Yigit @ 2019-11-25 12:03 UTC (permalink / raw) To: Thomas Monjalon Cc: david.marchand, ajit.khaparde, alejandro.lucero, anatoly.burakov, beilei.xing, bruce.richardson, dev, hyonkim, igor.russkikh, jerinj, jingjing.wu, john.mcnamara, johndale, kirankumark, konstantin.ananyev, marko.kovacevic, matan, ndabilpuram, pavel.belous, qi.z.zhang, qiming.yang, rmody, shahafs, shshaikh, somnath.kotur, viacheslavo, wenzhuo.lu, xiao.w.wang, yskoh On 11/25/2019 10:22 AM, Thomas Monjalon wrote: > 25/11/2019 10:33, Ferruh Yigit: >> It looks like DPDK application initialized as IOVA=VA, >> and the new device is bound to 'igb_uio' which forces it to PA, >> fails on above check. > > Do you mean this use case was not tested earlier with DPDK 19.08? Perhaps (just a guess), this could be a side effect of only using LTS releases. > > >> I would like to get your comment on the issue. >> >> For the OvS mode, hopefully binding the device to 'vfio-pci' >> can be a solution, but for the cases we don't have that option, >> can/should we force the DPDK to PA mode after initialization? > > I think this is expected, because VA is the new default since 19.08: > http://doc.dpdk.org/guides/rel_notes/release_19_08.html#new-features > > In case, there is no constraint on initialization, > we have to decide which mode is preferred. > Previously PA was preferred. > For the sake of modernity (and because it fits with some new devices), > the preference has been changed to VA. > > If igb_uio device is used at initialization, > the PA mode should be used. > If igb_uio (PA-only) device is hotplugged, no luck! > If VA-only device is hotplugged, it works! > > I think this change is one step in deprecating igb_uio.
> I just want to confirm/clarify that this behavior change is by design, not a defect. Should we document this behavior change more clearly, or highlight more, to not let catch others too? ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [dpdk-dev] [PATCH v4 2/4] eal: fix IOVA mode selection as VA for PCI drivers 2019-11-25 12:03 ` Ferruh Yigit @ 2019-11-25 12:36 ` David Marchand 2019-11-25 12:58 ` Burakov, Anatoly 0 siblings, 1 reply; 57+ messages in thread From: David Marchand @ 2019-11-25 12:36 UTC (permalink / raw) To: Ferruh Yigit Cc: Thomas Monjalon, Ajit Khaparde, Alejandro Lucero, Burakov, Anatoly, Beilei Xing, Bruce Richardson, dev, Hyong Youb Kim, Igor Russkikh, Jerin Jacob Kollanukkaran, Jingjing Wu, Mcnamara, John, John Daley, Kiran Kumar Kokkilagadda, Ananyev, Konstantin, Kovacevic, Marko, Matan Azrad, Nithin Dabilpuram, Pavel Belous, Qi Zhang, Qiming Yang, Rasesh Mody, Shahaf Shuler, Shahed Shaikh, Somnath Kotur, Viacheslav Ovsiienko, Wenzhuo Lu, Xiao Wang, Yongseok Koh On Mon, Nov 25, 2019 at 1:03 PM Ferruh Yigit <ferruh.yigit@intel.com> wrote: > >> I would like to get your comment on the issue. > >> > >> For the OvS mode, hopefully binding the device to 'vfio-pci' > >> can be a solution, but for the cases we don't have that option, > >> can/should we force the DPDK to PA mode after initialization? > > > > I think this is expected, because VA is the new default since 19.08: > > http://doc.dpdk.org/guides/rel_notes/release_19_08.html#new-features > > > > In case, there is no constraint on initialization, > > we have to decide which mode is preferred. > > Previously PA was preferred. > > For the sake of modernity (and because it fits with some new devices), > > the preference has been changed to VA. > > > > If igb_uio device is used at initialization, > > the PA mode should be used. > > If igb_uio (PA-only) device is hotplugged, no luck! > > If VA-only device is hotplugged, it works! > > > > I think this change is one step in deprecating igb_uio. > > > > I just want to confirm/clarify that this behavior change is by design, not a defect. > Should we document this behavior change more clearly, or highlight more, to not > let catch others too? 
The behavior change happened with: https://git.dpdk.org/dpdk/commit?id=bbe29a9bd7ab6feab9a52051c32092a94ee886eb And there is an entry in the 19.08 release notes: http://doc.dpdk.org/guides/rel_notes/release_19_08.html#new-features -- David Marchand ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [dpdk-dev] [PATCH v4 2/4] eal: fix IOVA mode selection as VA for PCI drivers 2019-11-25 12:36 ` David Marchand @ 2019-11-25 12:58 ` Burakov, Anatoly 2019-11-25 14:29 ` Thomas Monjalon 0 siblings, 1 reply; 57+ messages in thread From: Burakov, Anatoly @ 2019-11-25 12:58 UTC (permalink / raw) To: David Marchand, Ferruh Yigit Cc: Thomas Monjalon, Ajit Khaparde, Alejandro Lucero, Beilei Xing, Bruce Richardson, dev, Hyong Youb Kim, Igor Russkikh, Jerin Jacob Kollanukkaran, Jingjing Wu, Mcnamara, John, John Daley, Kiran Kumar Kokkilagadda, Ananyev, Konstantin, Kovacevic, Marko, Matan Azrad, Nithin Dabilpuram, Pavel Belous, Qi Zhang, Qiming Yang, Rasesh Mody, Shahaf Shuler, Shahed Shaikh, Somnath Kotur, Viacheslav Ovsiienko, Wenzhuo Lu, Xiao Wang, Yongseok Koh On 25-Nov-19 12:36 PM, David Marchand wrote: > On Mon, Nov 25, 2019 at 1:03 PM Ferruh Yigit <ferruh.yigit@intel.com> wrote: >>>> I would like to get your comment on the issue. >>>> >>>> For the OvS mode, hopefully binding the device to 'vfio-pci' >>>> can be a solution, but for the cases we don't have that option, >>>> can/should we force the DPDK to PA mode after initialization? >>> >>> I think this is expected, because VA is the new default since 19.08: >>> http://doc.dpdk.org/guides/rel_notes/release_19_08.html#new-features >>> >>> In case, there is no constraint on initialization, >>> we have to decide which mode is preferred. >>> Previously PA was preferred. >>> For the sake of modernity (and because it fits with some new devices), >>> the preference has been changed to VA. >>> >>> If igb_uio device is used at initialization, >>> the PA mode should be used. >>> If igb_uio (PA-only) device is hotplugged, no luck! >>> If VA-only device is hotplugged, it works! >>> >>> I think this change is one step in deprecating igb_uio. >>> >> >> I just want to confirm/clarify that this behavior change is by design, not a defect. 
>> Should we document this behavior change more clearly, or highlight more, to not >> let catch others too? > > The behavior change happened with: > https://git.dpdk.org/dpdk/commit?id=bbe29a9bd7ab6feab9a52051c32092a94ee886eb > And there is an entry in the 19.08 release notes: > http://doc.dpdk.org/guides/rel_notes/release_19_08.html#new-features > > Should we perhaps also provide LTS release notes, i.e. "all changes since last LTS"? I think it's unreasonable to expect people using LTS's to trace through every release notes between LTS's. -- Thanks, Anatoly ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [dpdk-dev] [PATCH v4 2/4] eal: fix IOVA mode selection as VA for PCI drivers 2019-11-25 12:58 ` Burakov, Anatoly @ 2019-11-25 14:29 ` Thomas Monjalon 0 siblings, 0 replies; 57+ messages in thread From: Thomas Monjalon @ 2019-11-25 14:29 UTC (permalink / raw) To: Burakov, Anatoly Cc: David Marchand, Ferruh Yigit, Ajit Khaparde, Beilei Xing, Bruce Richardson, dev, Hyong Youb Kim, Igor Russkikh, Jerin Jacob Kollanukkaran, Jingjing Wu, Mcnamara, John, John Daley, Kiran Kumar Kokkilagadda, Ananyev, Konstantin, Kovacevic, Marko, Matan Azrad, Nithin Dabilpuram, Pavel Belous, Qi Zhang, Qiming Yang, Rasesh Mody, Shahaf Shuler, Shahed Shaikh, Somnath Kotur, Viacheslav Ovsiienko, Wenzhuo Lu, Xiao Wang, Yongseok Koh 25/11/2019 13:58, Burakov, Anatoly: > On 25-Nov-19 12:36 PM, David Marchand wrote: > > On Mon, Nov 25, 2019 at 1:03 PM Ferruh Yigit <ferruh.yigit@intel.com> wrote: > >>>> I would like to get your comment on the issue. > >>>> > >>>> For the OvS mode, hopefully binding the device to 'vfio-pci' > >>>> can be a solution, but for the cases we don't have that option, > >>>> can/should we force the DPDK to PA mode after initialization? > >>> > >>> I think this is expected, because VA is the new default since 19.08: > >>> http://doc.dpdk.org/guides/rel_notes/release_19_08.html#new-features > >>> > >>> In case, there is no constraint on initialization, > >>> we have to decide which mode is preferred. > >>> Previously PA was preferred. > >>> For the sake of modernity (and because it fits with some new devices), > >>> the preference has been changed to VA. > >>> > >>> If igb_uio device is used at initialization, > >>> the PA mode should be used. > >>> If igb_uio (PA-only) device is hotplugged, no luck! > >>> If VA-only device is hotplugged, it works! > >>> > >>> I think this change is one step in deprecating igb_uio. > >>> > >> > >> I just want to confirm/clarify that this behavior change is by design, not a defect. 
> >> Should we document this behavior change more clearly, or highlight more, to not > >> let catch others too? > > > > The behavior change happened with: > > https://git.dpdk.org/dpdk/commit?id=bbe29a9bd7ab6feab9a52051c32092a94ee886eb > > And there is an entry in the 19.08 release notes: > > http://doc.dpdk.org/guides/rel_notes/release_19_08.html#new-features > > > > > > Should we perhaps also provide LTS release notes, i.e. "all changes > since last LTS"? I think it's unreasonable to expect people using LTS's > to trace through every release notes between LTS's. I don't think it is unreasonable. There are only 4 releases since last LTS. But I am OK for switching to 3 releases per year :) ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [dpdk-dev] [PATCH v4 2/4] eal: fix IOVA mode selection as VA for PCI drivers 2019-11-25 9:33 ` Ferruh Yigit 2019-11-25 10:22 ` Thomas Monjalon @ 2019-11-25 11:07 ` Jerin Jacob 1 sibling, 0 replies; 57+ messages in thread From: Jerin Jacob @ 2019-11-25 11:07 UTC (permalink / raw) To: Ferruh Yigit Cc: David Marchand, Ajit Khaparde, Alejandro Lucero, Anatoly Burakov, Beilei Xing, Richardson, Bruce, dpdk-dev, Hyong Youb Kim, igor.russkikh, Jerin Jacob, Jingjing Wu, John McNamara, John Daley, Kiran Kumar K, Ananyev, Konstantin, Marko Kovacevic, Matan Azrad, Nithin Dabilpuram, pavel.belous, Qi Zhang, Qiming Yang, Rasesh Mody, Shahaf Shuler, Shahed Shaikh, Somnath Kotur, Thomas Monjalon, Slava Ovsiienko, Wenzhuo Lu, Xiao Wang, Yongseok Koh On Mon, Nov 25, 2019 at 6:33 PM Ferruh Yigit <ferruh.yigit@intel.com> wrote: > > > On 7/22/2019 1:56 PM, David Marchand wrote: > > The incriminated commit broke the use of RTE_PCI_DRV_IOVA_AS_VA which > > was intended to mean "driver only supports VA" but had been understood > > as "driver supports both PA and VA" by most net drivers and used to let > > dpdk processes to run as non root (which do not have access to physical > > addresses on recent kernels). > > > > The check on physical addresses actually closed the gap for those > > drivers. We don't need to mark them with RTE_PCI_DRV_IOVA_AS_VA and this > > flag can retain its intended meaning. > > Document explicitly its meaning. > > > > We can check that a driver requirement wrt to IOVA mode is fulfilled > > before trying to probe a device. > > > > Finally, document the heuristic used to select the IOVA mode and hope > > that we won't break it again. 
> > > > Fixes: 703458e19c16 ("bus/pci: consider only usable devices for IOVA mode") > > > > Signed-off-by: David Marchand <david.marchand@redhat.com> > > Reviewed-by: Jerin Jacob <jerinj@marvell.com> > > Tested-by: Jerin Jacob <jerinj@marvell.com> > > Acked-by: Anatoly Burakov <anatoly.burakov@intel.com> > <...> > > > diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c > > index d2af472..9794552 100644 > > --- a/drivers/bus/pci/pci_common.c > > +++ b/drivers/bus/pci/pci_common.c > > @@ -169,8 +169,22 @@ static struct rte_devargs *pci_devargs_lookup(struct rte_pci_device *dev) > > * This needs to be before rte_pci_map_device(), as it enables to use > > * driver flags for adjusting configuration. > > */ > > - if (!already_probed) > > + if (!already_probed) { > > + enum rte_iova_mode dev_iova_mode; > > + enum rte_iova_mode iova_mode; > > + > > + dev_iova_mode = pci_device_iova_mode(dr, dev); > > + iova_mode = rte_eal_iova_mode(); > > + if (dev_iova_mode != RTE_IOVA_DC && > > + dev_iova_mode != iova_mode) { > > + RTE_LOG(ERR, EAL, " Expecting '%s' IOVA mode but current mode is '%s', not initializing\n", > > + dev_iova_mode == RTE_IOVA_PA ? "PA" : "VA", > > + iova_mode == RTE_IOVA_PA ? "PA" : "VA"); > > + return -EINVAL; > > + } > > + > > OvS reported an error while hotplugging a device. > > It looks like DPDK application initialized as IOVA=VA, and the new device is bound to 'igb_uio' which forces it to PA, fails on above check. Why are they binding to igb_uio if there is NO need for it? > > I would like to get your comment on the issue. > > For the OvS mode, hopefully binding the device to 'vfio-pci' can be a solution, but for the cases we don't have that option, can/should we force the DPDK to PA mode after initialization? On the other hand, if we force DPDK to PA, then VFIO-only devices will meet the same fate. There are two cases: 1) the system has a limitation on the specific mode; 2) the devices have a limitation on the specific mode.
Case (1): it is not applicable to hotplug, as the system can run in only one mode. We should be able to detect it in the first pass (before the hotplugged devices run). Case (2): are there any devices that can work ONLY in IOVA as PA mode? If yes, please enumerate. Maybe something in the storage domain. > ^ permalink raw reply [flat|nested] 57+ messages in thread
* [dpdk-dev] [PATCH v4 3/4] drivers: change IOVA as VA PCI flag name 2019-07-22 12:56 ` [dpdk-dev] [PATCH v4 " David Marchand 2019-07-22 12:56 ` [dpdk-dev] [PATCH v4 1/4] Revert "bus/pci: add Mellanox kernel driver type" David Marchand 2019-07-22 12:56 ` [dpdk-dev] [PATCH v4 2/4] eal: fix IOVA mode selection as VA for PCI drivers David Marchand @ 2019-07-22 12:56 ` David Marchand 2019-07-22 12:56 ` [dpdk-dev] [PATCH v4 4/4] eal: select IOVA as VA mode for default case David Marchand 2019-07-22 15:53 ` [dpdk-dev] [PATCH v4 0/4] Fixes on IOVA mode selection Thomas Monjalon 4 siblings, 0 replies; 57+ messages in thread From: David Marchand @ 2019-07-22 12:56 UTC (permalink / raw) To: dev Cc: anatoly.burakov, jerinj, thomas, Pavan Nikhilesh, Nithin Dabilpuram, Vamsi Attunuru, Kiran Kumar K, Satha Rao From: Jerin Jacob <jerinj@marvell.com> In order to align name with other PCI driver flag such as RTE_PCI_DRV_NEED_MAPPING and to reflect its purpose, change RTE_PCI_DRV_IOVA_AS_VA flag name as RTE_PCI_DRV_NEED_IOVA_AS_VA. 
Signed-off-by: Jerin Jacob <jerinj@marvell.com> Signed-off-by: David Marchand <david.marchand@redhat.com> --- Changelog since v3: - updated title, --- drivers/bus/pci/linux/pci.c | 4 ++-- drivers/bus/pci/rte_bus_pci.h | 4 ++-- drivers/event/octeontx/timvf_probe.c | 2 +- drivers/event/octeontx2/otx2_evdev.c | 2 +- drivers/mempool/octeontx/octeontx_fpavf.c | 2 +- drivers/mempool/octeontx2/otx2_mempool.c | 2 +- drivers/net/octeontx2/otx2_ethdev.c | 2 +- drivers/raw/octeontx2_dma/otx2_dpi_rawdev.c | 2 +- 8 files changed, 10 insertions(+), 10 deletions(-) diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c index 1a2f99b..1d8d20d 100644 --- a/drivers/bus/pci/linux/pci.c +++ b/drivers/bus/pci/linux/pci.c @@ -580,7 +580,7 @@ enum rte_iova_mode } if (is_vfio_noiommu_enabled != 0) iova_mode = RTE_IOVA_PA; - else if ((pdrv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) != 0) + else if ((pdrv->drv_flags & RTE_PCI_DRV_NEED_IOVA_AS_VA) != 0) iova_mode = RTE_IOVA_VA; #endif break; @@ -592,7 +592,7 @@ enum rte_iova_mode break; default: - if ((pdrv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) != 0) + if ((pdrv->drv_flags & RTE_PCI_DRV_NEED_IOVA_AS_VA) != 0) iova_mode = RTE_IOVA_VA; break; } diff --git a/drivers/bus/pci/rte_bus_pci.h b/drivers/bus/pci/rte_bus_pci.h index 0f21775..29bea6d 100644 --- a/drivers/bus/pci/rte_bus_pci.h +++ b/drivers/bus/pci/rte_bus_pci.h @@ -187,8 +187,8 @@ struct rte_pci_bus { #define RTE_PCI_DRV_INTR_RMV 0x0010 /** Device driver needs to keep mapped resources if unsupported dev detected */ #define RTE_PCI_DRV_KEEP_MAPPED_RES 0x0020 -/** Device driver only supports IOVA as VA and cannot work with IOVA as PA */ -#define RTE_PCI_DRV_IOVA_AS_VA 0x0040 +/** Device driver needs IOVA as VA and cannot work with IOVA as PA */ +#define RTE_PCI_DRV_NEED_IOVA_AS_VA 0x0040 /** * Map the PCI device resources in user space virtual memory address diff --git a/drivers/event/octeontx/timvf_probe.c b/drivers/event/octeontx/timvf_probe.c index 08dbd2b..af87625 100644 --- 
a/drivers/event/octeontx/timvf_probe.c +++ b/drivers/event/octeontx/timvf_probe.c @@ -140,7 +140,7 @@ struct timdev { static struct rte_pci_driver pci_timvf = { .id_table = pci_timvf_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_NEED_IOVA_AS_VA, .probe = timvf_probe, .remove = NULL, }; diff --git a/drivers/event/octeontx2/otx2_evdev.c b/drivers/event/octeontx2/otx2_evdev.c index 56716c2..e6379e3 100644 --- a/drivers/event/octeontx2/otx2_evdev.c +++ b/drivers/event/octeontx2/otx2_evdev.c @@ -1630,7 +1630,7 @@ static struct rte_pci_driver pci_sso = { .id_table = pci_sso_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_NEED_IOVA_AS_VA, .probe = otx2_sso_probe, .remove = otx2_sso_remove, }; diff --git a/drivers/mempool/octeontx/octeontx_fpavf.c b/drivers/mempool/octeontx/octeontx_fpavf.c index 4cf387e..baabc01 100644 --- a/drivers/mempool/octeontx/octeontx_fpavf.c +++ b/drivers/mempool/octeontx/octeontx_fpavf.c @@ -799,7 +799,7 @@ struct octeontx_fpadev { static struct rte_pci_driver pci_fpavf = { .id_table = pci_fpavf_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_NEED_IOVA_AS_VA, .probe = fpavf_probe, }; diff --git a/drivers/mempool/octeontx2/otx2_mempool.c b/drivers/mempool/octeontx2/otx2_mempool.c index 9a5f11c..3a4a942 100644 --- a/drivers/mempool/octeontx2/otx2_mempool.c +++ b/drivers/mempool/octeontx2/otx2_mempool.c @@ -443,7 +443,7 @@ static struct rte_pci_driver pci_npa = { .id_table = pci_npa_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_NEED_IOVA_AS_VA, .probe = npa_probe, .remove = npa_remove, }; diff --git a/drivers/net/octeontx2/otx2_ethdev.c b/drivers/net/octeontx2/otx2_ethdev.c index 5ec5551..7b91f6b 100644 --- 
a/drivers/net/octeontx2/otx2_ethdev.c +++ b/drivers/net/octeontx2/otx2_ethdev.c @@ -2001,7 +2001,7 @@ static struct rte_pci_driver pci_nix = { .id_table = pci_nix_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_IOVA_AS_VA | + .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_NEED_IOVA_AS_VA | RTE_PCI_DRV_INTR_LSC, .probe = nix_probe, .remove = nix_remove, diff --git a/drivers/raw/octeontx2_dma/otx2_dpi_rawdev.c b/drivers/raw/octeontx2_dma/otx2_dpi_rawdev.c index 6a1b436..e398abb 100644 --- a/drivers/raw/octeontx2_dma/otx2_dpi_rawdev.c +++ b/drivers/raw/octeontx2_dma/otx2_dpi_rawdev.c @@ -427,7 +427,7 @@ static struct rte_pci_driver rte_dpi_rawdev_pmd = { .id_table = pci_dma_map, - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_IOVA_AS_VA, + .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_NEED_IOVA_AS_VA, .probe = otx2_dpi_rawdev_probe, .remove = otx2_dpi_rawdev_remove, }; -- 1.8.3.1 ^ permalink raw reply [flat|nested] 57+ messages in thread
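The renamed flag feeds the per-device mode derivation in `drivers/bus/pci/linux/pci.c`. The sketch below models only the branches visible in the diff above, plus a UIO branch that returns PA. That UIO branch is an assumption, based on the statements elsewhere in this thread that `igb_uio` forces PA. `enum kdrv`, the `KDRV_*` values, `DRV_NEED_IOVA_AS_VA` and `device_iova_mode()` are hypothetical names; only the bit value `0x0040` is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the values of DPDK's enum rte_iova_mode. */
enum iova_mode { IOVA_DC, IOVA_PA, IOVA_VA };

/* Hypothetical kernel-driver tags, loosely modelled on the PCI bus code. */
enum kdrv { KDRV_VFIO, KDRV_IGB_UIO, KDRV_UIO_GENERIC, KDRV_OTHER };

/* Same bit value as RTE_PCI_DRV_NEED_IOVA_AS_VA in the patch. */
#define DRV_NEED_IOVA_AS_VA 0x0040u

/* Derive what one device requires from its kernel driver and PMD flags. */
static enum iova_mode device_iova_mode(enum kdrv kdrv, uint32_t drv_flags,
				       bool vfio_noiommu)
{
	enum iova_mode mode = IOVA_DC; /* no preference by default */

	switch (kdrv) {
	case KDRV_VFIO:
		if (vfio_noiommu)
			mode = IOVA_PA; /* no IOMMU translation available */
		else if ((drv_flags & DRV_NEED_IOVA_AS_VA) != 0)
			mode = IOVA_VA;
		break;
	case KDRV_IGB_UIO:
	case KDRV_UIO_GENERIC:
		/* Assumption: UIO offers no IOMMU remapping, so PA only. */
		mode = IOVA_PA;
		break;
	default:
		if ((drv_flags & DRV_NEED_IOVA_AS_VA) != 0)
			mode = IOVA_VA;
		break;
	}
	return mode;
}
```

Under this model, a VFIO-bound device whose driver sets the flag needs VA, unless VFIO runs in no-IOMMU mode, in which case PA is required. A driver without the flag leaves the decision to the bus (DC).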
* [dpdk-dev] [PATCH v4 4/4] eal: select IOVA as VA mode for default case 2019-07-22 12:56 ` [dpdk-dev] [PATCH v4 " David Marchand ` (2 preceding siblings ...) 2019-07-22 12:56 ` [dpdk-dev] [PATCH v4 3/4] drivers: change IOVA as VA PCI flag name David Marchand @ 2019-07-22 12:56 ` David Marchand 2019-07-22 15:53 ` [dpdk-dev] [PATCH v4 0/4] Fixes on IOVA mode selection Thomas Monjalon 4 siblings, 0 replies; 57+ messages in thread From: David Marchand @ 2019-07-22 12:56 UTC (permalink / raw) To: dev; +Cc: anatoly.burakov, jerinj, thomas, John McNamara, Marko Kovacevic From: Jerin Jacob <jerinj@marvell.com> When bus layer reports the preferred mode as RTE_IOVA_DC then select the RTE_IOVA_VA mode: - All drivers work in RTE_IOVA_VA mode, irrespective of physical address availability. - By default, a mempool asks for IOVA-contiguous memory using RTE_MEMZONE_IOVA_CONTIG. This is slow in RTE_IOVA_PA mode and it may affect the application boot time. Signed-off-by: Jerin Jacob <jerinj@marvell.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com> Signed-off-by: David Marchand <david.marchand@redhat.com> --- Changelog since v3: - moved the explanations on RTE_IOVA_VA choice in the note section, - reworded the comments on the PCI driver flag, --- doc/guides/prog_guide/env_abstraction_layer.rst | 22 ++++++++++++++++++++-- lib/librte_eal/linux/eal/eal.c | 6 ++---- 2 files changed, 22 insertions(+), 6 deletions(-) diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst index 1d63675..1487ea5 100644 --- a/doc/guides/prog_guide/env_abstraction_layer.rst +++ b/doc/guides/prog_guide/env_abstraction_layer.rst @@ -445,11 +445,29 @@ kernels. 
- if the preferred mode is RTE_IOVA_PA but there is no access to Physical Addresses, then EAL init fails early, since later probing of the devices would fail anyway, -- if the preferred mode is RTE_IOVA_DC then based on the Physical Addresses - availability, the preferred mode is adjusted to RTE_IOVA_PA or RTE_IOVA_VA. +- if the preferred mode is RTE_IOVA_DC then EAL selects the RTE_IOVA_VA mode. In the case when the buses had disagreed on the IOVA Mode at the first step, part of the buses won't work because of this decision. +.. note:: + + The RTE_IOVA_VA mode is selected as the default for the following reasons: + + - All drivers are expected to work in RTE_IOVA_VA mode, irrespective of + physical address availability. + - By default, the mempool, first asks for IOVA-contiguous memory using + ``RTE_MEMZONE_IOVA_CONTIG``. This is slow in RTE_IOVA_PA mode and it may + affect the application boot time. + - It is easy to enable large amount of IOVA-contiguous memory use-cases + with IOVA in VA mode. + + It is expected that all PCI drivers work in both RTE_IOVA_PA and + RTE_IOVA_VA modes. + + If a PCI driver does not support RTE_IOVA_PA mode, the + ``RTE_PCI_DRV_NEED_IOVA_AS_VA`` flag is used to dictate that this PCI + driver can only work in RTE_IOVA_VA mode. + IOVA Mode Configuration ~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/lib/librte_eal/linux/eal/eal.c b/lib/librte_eal/linux/eal/eal.c index 2e5499f..34db787 100644 --- a/lib/librte_eal/linux/eal/eal.c +++ b/lib/librte_eal/linux/eal/eal.c @@ -1061,10 +1061,8 @@ static void rte_eal_init_alert(const char *msg) enum rte_iova_mode iova_mode = rte_bus_get_iommu_class(); if (iova_mode == RTE_IOVA_DC) { - iova_mode = phys_addrs ? RTE_IOVA_PA : RTE_IOVA_VA; - RTE_LOG(DEBUG, EAL, - "Buses did not request a specific IOVA mode, using '%s' based on physical addresses availability.\n", - phys_addrs ? 
"PA" : "VA"); + iova_mode = RTE_IOVA_VA; + RTE_LOG(DEBUG, EAL, "Buses did not request a specific IOVA mode, select IOVA as VA mode.\n"); } #ifdef RTE_LIBRTE_KNI /* Workaround for KNI which requires physical address to work */ -- 1.8.3.1 ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [dpdk-dev] [PATCH v4 0/4] Fixes on IOVA mode selection 2019-07-22 12:56 ` [dpdk-dev] [PATCH v4 " David Marchand ` (3 preceding siblings ...) 2019-07-22 12:56 ` [dpdk-dev] [PATCH v4 4/4] eal: select IOVA as VA mode for default case David Marchand @ 2019-07-22 15:53 ` Thomas Monjalon 2019-07-23 3:35 ` Stojaczyk, Dariusz 4 siblings, 1 reply; 57+ messages in thread From: Thomas Monjalon @ 2019-07-22 15:53 UTC (permalink / raw) To: David Marchand, anatoly.burakov, jerinj; +Cc: dev 22/07/2019 14:56, David Marchand: > Following the issues reported by Jerin and the discussion that emerged > from it, here are fixes to restore and document the behavior of the EAL > and the pci bus driver. > > I pondered all the arguments and tried to have the less changes > possible. > I can't find a need for a flag to just announce support of physical > addresses from the pmd point of view. > So it ended up with something really close to what Jerin had suggested. > > But the problem is that this is still unfinished wrt the documentation. > I will be offline for 10 days and we need this to move forward, so > sending > anyway. > > Changelog since v3: > - fixed typos in patch 2, > - updated patch 3 title, > - moved and reworded comments in the note section in patch 4, > > Changelog since v2 (Jerin): > - Patch 2/4 - Remove personal appeals in log messages(Anatoly) > - Patch 4/4 - Added documentation (Anatoly) > > Changelog since v1 (Jerin): > - Changed RTE_PCI_DRV_IOVA_AS_VA flag name as RTE_PCI_DRV_NEED_IOVA_AS_VA > (patch 3/4) > - Changed IOVA mode as VA for default case(patch 4/4) with documentation > - Tested the patch series on octeontx2 platform Applied, thanks Jerin, Anatoly and David for converging on a documented solution together. ^ permalink raw reply [flat|nested] 57+ messages in thread
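The effect of patch 4/4 on the EAL's final choice can be shown by putting the old and new handling of a DC bus result side by side. This is a simplified model with hypothetical names (`resolve_dc_old()`, `resolve_dc_new()`, `IOVA_*`). It leaves out the KNI workaround and the early init failure when PA is requested without access to physical addresses.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the values of DPDK's enum rte_iova_mode. */
enum iova_mode { IOVA_DC, IOVA_PA, IOVA_VA };

/* Before patch 4/4: a DC result followed physical address availability. */
static enum iova_mode resolve_dc_old(enum iova_mode bus_mode, bool phys_addrs)
{
	if (bus_mode == IOVA_DC)
		return phys_addrs ? IOVA_PA : IOVA_VA;
	return bus_mode;
}

/* After patch 4/4: a DC result always selects VA. */
static enum iova_mode resolve_dc_new(enum iova_mode bus_mode)
{
	return bus_mode == IOVA_DC ? IOVA_VA : bus_mode;
}
```

For a process that has access to physical addresses and no devices bound at startup, the old rule picks PA and the new rule picks VA. This is the situation reported in this thread, where a UIO-bound device hotplugged afterwards can no longer be probed.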
* Re: [dpdk-dev] [PATCH v4 0/4] Fixes on IOVA mode selection 2019-07-22 15:53 ` [dpdk-dev] [PATCH v4 0/4] Fixes on IOVA mode selection Thomas Monjalon @ 2019-07-23 3:35 ` Stojaczyk, Dariusz 2019-07-23 4:18 ` Jerin Jacob Kollanukkaran 0 siblings, 1 reply; 57+ messages in thread From: Stojaczyk, Dariusz @ 2019-07-23 3:35 UTC (permalink / raw) To: Thomas Monjalon, David Marchand, Burakov, Anatoly, jerinj; +Cc: dev This introduces a regression when uio-bound devices are attached to a DPDK app at runtime. When there are no devices attached at initialization, the only safe default should be RTE_IOVA_PA. With RTE_IOVA_VA we just won't be able to do any DMA to uio-bound PCI devices. Can we revert this patch? D. > -----Original Message----- > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Thomas Monjalon > Sent: Monday, July 22, 2019 5:53 PM > To: David Marchand <david.marchand@redhat.com>; Burakov, Anatoly > <anatoly.burakov@intel.com>; jerinj@marvell.com > Cc: dev@dpdk.org > Subject: Re: [dpdk-dev] [PATCH v4 0/4] Fixes on IOVA mode selection > > 22/07/2019 14:56, David Marchand: > > Following the issues reported by Jerin and the discussion that emerged > > from it, here are fixes to restore and document the behavior of the EAL > > and the pci bus driver. > > > > I pondered all the arguments and tried to have the less changes > > possible. > > I can't find a need for a flag to just announce support of physical > > addresses from the pmd point of view. > > So it ended up with something really close to what Jerin had suggested. > > > > But the problem is that this is still unfinished wrt the documentation. > > I will be offline for 10 days and we need this to move forward, so > > sending > > anyway.
> > > > Changelog since v3: > > - fixed typos in patch 2, > > - updated patch 3 title, > > - moved and reworded comments in the note section in patch 4, > > > > Changelog since v2 (Jerin): > > - Patch 2/4 - Remove personal appeals in log messages(Anatoly) > > - Patch 4/4 - Added documentation (Anatoly) > > > > Changelog since v1 (Jerin): > > - Changed RTE_PCI_DRV_IOVA_AS_VA flag name as > RTE_PCI_DRV_NEED_IOVA_AS_VA > > (patch 3/4) > > - Changed IOVA mode as VA for default case(patch 4/4) with > documentation > > - Tested the patch series on octeontx2 platform > > Applied, thanks Jerin, Anatoly and David for converging > on a documented solution together. > > ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [dpdk-dev] [PATCH v4 0/4] Fixes on IOVA mode selection 2019-07-23 3:35 ` Stojaczyk, Dariusz @ 2019-07-23 4:18 ` Jerin Jacob Kollanukkaran 2019-07-23 4:54 ` Stojaczyk, Dariusz 0 siblings, 1 reply; 57+ messages in thread From: Jerin Jacob Kollanukkaran @ 2019-07-23 4:18 UTC (permalink / raw) To: Stojaczyk, Dariusz, Thomas Monjalon, David Marchand, Burakov, Anatoly; +Cc: dev > -----Original Message----- > From: Stojaczyk, Dariusz <dariusz.stojaczyk@intel.com> > Sent: Tuesday, July 23, 2019 9:06 AM > To: Thomas Monjalon <thomas@monjalon.net>; David Marchand > <david.marchand@redhat.com>; Burakov, Anatoly > <anatoly.burakov@intel.com>; Jerin Jacob Kollanukkaran > <jerinj@marvell.com> > Cc: dev@dpdk.org > Subject: [EXT] RE: [dpdk-dev] [PATCH v4 0/4] Fixes on IOVA mode selection > > This introduces a regression where uio-bound devies are attached to a DPDK > app at runtime. Just to understand the requirements; # Is this requirement for SPDK? # Is brand new PCI device scanned and attached to DPDK at runtime? # Any specific reason for using uio vs vfio? If it is for SPDK, # How about introducing rte_eal_init_with_mode(enum rte_iova_mode)? # How about adding a dummy bus whose get_iommu_class() callback returns RTE_IOVA_PA in the SPDK code base? > > When there are no devices attached at initialization, the only safe default > should be RTE_IOVA_PA. With RTE_IOVA_VA we just won't be able to do > any DMA to uio-bound PCI devices. > > Can we revert this patch? > > D.
> > > -----Original Message----- > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Thomas Monjalon > > Sent: Monday, July 22, 2019 5:53 PM > > To: David Marchand <david.marchand@redhat.com>; Burakov, Anatoly > > <anatoly.burakov@intel.com>; jerinj@marvell.com > > Cc: dev@dpdk.org > > Subject: Re: [dpdk-dev] [PATCH v4 0/4] Fixes on IOVA mode selection > > > > 22/07/2019 14:56, David Marchand: > > > Following the issues reported by Jerin and the discussion that > > > emerged from it, here are fixes to restore and document the behavior > > > of the EAL and the pci bus driver. > > > > > > I pondered all the arguments and tried to have the less changes > > > possible. > > > I can't find a need for a flag to just announce support of physical > > > addresses from the pmd point of view. > > > So it ended up with something really close to what Jerin had suggested. > > > > > > But the problem is that this is still unfinished wrt the documentation. > > > I will be offline for 10 days and we need this to move forward, so > > > sending anyway. > > > > > > Changelog since v3: > > > - fixed typos in patch 2, > > > - updated patch 3 title, > > > - moved and reworded comments in the note section in patch 4, > > > > > > Changelog since v2 (Jerin): > > > - Patch 2/4 - Remove personal appeals in log messages(Anatoly) > > > - Patch 4/4 - Added documentation (Anatoly) > > > > > > Changelog since v1 (Jerin): > > > - Changed RTE_PCI_DRV_IOVA_AS_VA flag name as > > RTE_PCI_DRV_NEED_IOVA_AS_VA > > > (patch 3/4) > > > - Changed IOVA mode as VA for default case(patch 4/4) with > > documentation > > > - Tested the patch series on octeontx2 platform > > > > Applied, thanks Jerin, Anatoly and David for converging on a > > documented solution together. > > > > ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [dpdk-dev] [PATCH v4 0/4] Fixes on IOVA mode selection 2019-07-23 4:18 ` Jerin Jacob Kollanukkaran @ 2019-07-23 4:54 ` Stojaczyk, Dariusz 2019-07-23 5:27 ` Jerin Jacob Kollanukkaran 0 siblings, 1 reply; 57+ messages in thread From: Stojaczyk, Dariusz @ 2019-07-23 4:54 UTC (permalink / raw) To: Jerin Jacob Kollanukkaran, Thomas Monjalon, David Marchand, Burakov, Anatoly Cc: dev > -----Original Message----- > From: Jerin Jacob Kollanukkaran [mailto:jerinj@marvell.com] > Sent: Tuesday, July 23, 2019 6:19 AM > > > -----Original Message----- > > From: Stojaczyk, Dariusz <dariusz.stojaczyk@intel.com> > > Sent: Tuesday, July 23, 2019 9:06 AM > > To: Thomas Monjalon <thomas@monjalon.net>; David Marchand > > <david.marchand@redhat.com>; Burakov, Anatoly > > <anatoly.burakov@intel.com>; Jerin Jacob Kollanukkaran > > <jerinj@marvell.com> > > Cc: dev@dpdk.org > > Subject: [EXT] RE: [dpdk-dev] [PATCH v4 0/4] Fixes on IOVA mode selection > > > > This introduces a regression where uio-bound devies are attached to a > DPDK > > app at runtime. > > Just to understand the requirements; > # Is this requirement for SPDK? > # Is brand new PCI device scanned and attached to DPDK at runtime? > # Any specific reason for using uio vs vfio? Jerin, It came up in SPDK tests, but it's certainly nothing SPDK-specific. I can't give you the steps, but it should be reproducible even with testpmd. The PCI device could simply have been hotplugged into the system after the DPDK app started. DPDK didn't know about it at initialization, so it picked RTE_IOVA_VA and would then fail to attach any UIO-bound device ever after:
EAL: Expecting 'PA' IOVA mode but current mode is 'VA', not initializing
EAL: Driver cannot attach the device (0000:00:09.0)
EAL: Failed to attach device on primary process
UIO is commonly used on systems without an IOMMU, including VMs. > > If it is for SPDK, > # How about introducing rte_eal_init_with_mode(enum rte_iova_mode)?
> # How about adding dummy bus which returns RTE_IOVA_PA in the > bus_get_iommus_class() in SPDK code base? There's already an --iova-mode option in DPDK that forces the IOVA mode. I'm not concerned about configurability, but about the regression in the default behavior. I can add workarounds to SPDK, sure, but that wouldn't be a very healthy approach. D. > > > > > When there are no devices attached at initialization, the only safe default > > should be RTE_IOVA_PA. With RTE_IOVA_VA we just won't be able to do > > any DMA to uio-bound PCI devices. > > > > Can we revert this patch? > > > > D. > > > > > -----Original Message----- > > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Thomas > Monjalon > > > Sent: Monday, July 22, 2019 5:53 PM > > > To: David Marchand <david.marchand@redhat.com>; Burakov, Anatoly > > > <anatoly.burakov@intel.com>; jerinj@marvell.com > > > Cc: dev@dpdk.org > > > Subject: Re: [dpdk-dev] [PATCH v4 0/4] Fixes on IOVA mode selection > > > > > > 22/07/2019 14:56, David Marchand: > > > > Following the issues reported by Jerin and the discussion that > > > > emerged from it, here are fixes to restore and document the behavior > > > > of the EAL and the pci bus driver. > > > > > > > > I pondered all the arguments and tried to have the less changes > > > > possible. > > > > I can't find a need for a flag to just announce support of physical > > > > addresses from the pmd point of view. > > > > So it ended up with something really close to what Jerin had suggested. > > > > > > > > But the problem is that this is still unfinished wrt the documentation. > > > > I will be offline for 10 days and we need this to move forward, so > > > > sending anyway.
> > > > > > > > Changelog since v3: > > > > - fixed typos in patch 2, > > > > - updated patch 3 title, > > > > - moved and reworded comments in the note section in patch 4, > > > > > > > > Changelog since v2 (Jerin): > > > > - Patch 2/4 - Remove personal appeals in log messages(Anatoly) > > > > - Patch 4/4 - Added documentation (Anatoly) > > > > > > > > Changelog since v1 (Jerin): > > > > - Changed RTE_PCI_DRV_IOVA_AS_VA flag name as > > > RTE_PCI_DRV_NEED_IOVA_AS_VA > > > > (patch 3/4) > > > > - Changed IOVA mode as VA for default case(patch 4/4) with > > > documentation > > > > - Tested the patch series on octeontx2 platform > > > > > > Applied, thanks Jerin, Anatoly and David for converging on a > > > documented solution together. > > > > > > ^ permalink raw reply [flat|nested] 57+ messages in thread
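The failure reported above comes from a per-device check made at attach time: a device whose kernel driver can only DMA with physical addresses (UIO) cannot be attached once the process has already committed to IOVA as VA. The C sketch below models that check in isolation. It is a simplified model under stated assumptions, not DPDK's actual code: the enum values mirror `RTE_IOVA_DC`, `RTE_IOVA_PA` and `RTE_IOVA_VA`, while the names `iova_mode`, `kdrv`, `device_iova_mode` and `can_attach` are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; the values mirror DPDK's RTE_IOVA_DC/PA/VA. */
enum iova_mode { IOVA_DC = 0, IOVA_PA = 1 << 0, IOVA_VA = 1 << 1 };

/* Hypothetical, simplified kernel-driver kinds. */
enum kdrv { KDRV_UIO, KDRV_VFIO, KDRV_VFIO_NOIOMMU };

/* Mode a device requires, given its kernel driver and whether its PMD
 * can only work with IOVA as VA (cf. RTE_PCI_DRV_NEED_IOVA_AS_VA). */
static enum iova_mode device_iova_mode(enum kdrv drv, bool pmd_needs_va)
{
	switch (drv) {
	case KDRV_UIO:
	case KDRV_VFIO_NOIOMMU:
		/* No IOMMU translation: DMA must use physical addresses. */
		return IOVA_PA;
	case KDRV_VFIO:
	default:
		/* With an IOMMU, either mode works unless the PMD insists on VA. */
		return pmd_needs_va ? IOVA_VA : IOVA_DC;
	}
}

/* A device can be attached only if it does not care, or if its
 * requirement matches the mode the process already committed to. */
static bool can_attach(enum iova_mode dev_mode, enum iova_mode process_mode)
{
	return dev_mode == IOVA_DC || dev_mode == process_mode;
}
```

In this model, `can_attach(device_iova_mode(KDRV_UIO, false), IOVA_VA)` is false, which corresponds to the "Expecting 'PA' IOVA mode but current mode is 'VA'" error in the log above.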
* Re: [dpdk-dev] [PATCH v4 0/4] Fixes on IOVA mode selection 2019-07-23 4:54 ` Stojaczyk, Dariusz @ 2019-07-23 5:27 ` Jerin Jacob Kollanukkaran 2019-07-23 7:21 ` Thomas Monjalon 2019-07-23 9:57 ` Burakov, Anatoly 0 siblings, 2 replies; 57+ messages in thread From: Jerin Jacob Kollanukkaran @ 2019-07-23 5:27 UTC (permalink / raw) To: Stojaczyk, Dariusz, Thomas Monjalon, David Marchand, Burakov, Anatoly; +Cc: dev > -----Original Message----- > From: Stojaczyk, Dariusz <dariusz.stojaczyk@intel.com> > Sent: Tuesday, July 23, 2019 10:24 AM > To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; Thomas Monjalon > <thomas@monjalon.net>; David Marchand <david.marchand@redhat.com>; > Burakov, Anatoly <anatoly.burakov@intel.com> > Cc: dev@dpdk.org > Subject: [EXT] RE: [dpdk-dev] [PATCH v4 0/4] Fixes on IOVA mode selection > > > -----Original Message----- > > From: Jerin Jacob Kollanukkaran [mailto:jerinj@marvell.com] > > Sent: Tuesday, July 23, 2019 6:19 AM > > > > > -----Original Message----- > > > From: Stojaczyk, Dariusz <dariusz.stojaczyk@intel.com> > > > Sent: Tuesday, July 23, 2019 9:06 AM > > > To: Thomas Monjalon <thomas@monjalon.net>; David Marchand > > > <david.marchand@redhat.com>; Burakov, Anatoly > > > <anatoly.burakov@intel.com>; Jerin Jacob Kollanukkaran > > > <jerinj@marvell.com> > > > Cc: dev@dpdk.org > > > Subject: [EXT] RE: [dpdk-dev] [PATCH v4 0/4] Fixes on IOVA mode > > > selection > > > > > > This introduces a regression where uio-bound devies are attached to > > > a > > DPDK > > > app at runtime. > > > > Just to understand the requirements; > > # Is this requirement for SPDK? > > # Is brand new PCI device scanned and attached to DPDK at runtime? > > # Any specific reason for using uio vs vfio? > > Jerin, Stojaczyk, The reasons for choosing VA when the buses report DC (don't care) are the following: - All drivers are expected to work in RTE_IOVA_VA mode, irrespective of physical address availability.
- By default, the mempool first asks for IOVA-contiguous memory using ``RTE_MEMZONE_IOVA_CONTIG``. This is slow in RTE_IOVA_PA mode and may affect the application boot time. - Use cases that need a large amount of IOVA-contiguous memory are easy to enable with IOVA in VA mode. > > It came up in SPDK tests, but it's certainly nothing SPDK-specific, I can't give > you the steps but it should be reproducible even with testpmd. > > The PCI device could have been simply hotplugged to the system after DPDK > app start. DPDK didn't know about it at initialization, so it picked > RTE_IOVA_VA and then would fail to attach any UIO-bound device ever > after: > > EAL: Expecting 'PA' IOVA mode but current mode is 'VA', not initializing We have RTE_PCI_DRV_NEED_IOVA_AS_VA devices in DPDK, which can only work in VA mode. If we default to 'PA' in case of DC, then what do we do when those devices are hotplugged? > EAL: Driver cannot attach the device (0000:00:09.0) > EAL: Failed to attach device on primary process > > UIO is commonly used on systems without IOMMU- including VMs. The latest machines have an IOMMU. Which machines are you testing against? Can we detect machines without an IOMMU and switch to PA? > > > > > If it is for SPDK, > > # How about introducing rte_eal_init_with_mode(enum rte_iova_mode)? > > # How about adding dummy bus which returns RTE_IOVA_PA in the > > bus_get_iommus_class() in SPDK code base? > > There's already an --iova=mode option in DPDK that forces the iova mode. > I'm not concerned about configurability, but the regression in the default > behavior. > > I can add workarounds to SPDK, sure, but that wouldn't be a very healthy > approach. This is not meant as a workaround; I am looking for options to express the requirement for PA. > > D. > > > > > > > > > When there are no devices attached at initialization, the only safe > > > default should be RTE_IOVA_PA. With RTE_IOVA_VA we just won't be > > > able to do any DMA to uio-bound PCI devices.
> > > > > > Can we revert this patch? > > > > > > D. > > > > > > > -----Original Message----- > > > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Thomas > > Monjalon > > > > Sent: Monday, July 22, 2019 5:53 PM > > > > To: David Marchand <david.marchand@redhat.com>; Burakov, Anatoly > > > > <anatoly.burakov@intel.com>; jerinj@marvell.com > > > > Cc: dev@dpdk.org > > > > Subject: Re: [dpdk-dev] [PATCH v4 0/4] Fixes on IOVA mode > > > > selection > > > > > > > > 22/07/2019 14:56, David Marchand: > > > > > Following the issues reported by Jerin and the discussion that > > > > > emerged from it, here are fixes to restore and document the > > > > > behavior of the EAL and the pci bus driver. > > > > > > > > > > I pondered all the arguments and tried to have the less changes > > > > > possible. > > > > > I can't find a need for a flag to just announce support of > > > > > physical addresses from the pmd point of view. > > > > > So it ended up with something really close to what Jerin had > suggested. > > > > > > > > > > But the problem is that this is still unfinished wrt the documentation. > > > > > I will be offline for 10 days and we need this to move forward, > > > > > so sending anyway. > > > > > > > > > > Changelog since v3: > > > > > - fixed typos in patch 2, > > > > > - updated patch 3 title, > > > > > - moved and reworded comments in the note section in patch 4, > > > > > > > > > > Changelog since v2 (Jerin): > > > > > - Patch 2/4 - Remove personal appeals in log messages(Anatoly) > > > > > - Patch 4/4 - Added documentation (Anatoly) > > > > > > > > > > Changelog since v1 (Jerin): > > > > > - Changed RTE_PCI_DRV_IOVA_AS_VA flag name as > > > > RTE_PCI_DRV_NEED_IOVA_AS_VA > > > > > (patch 3/4) > > > > > - Changed IOVA mode as VA for default case(patch 4/4) with > > > > documentation > > > > > - Tested the patch series on octeontx2 platform > > > > > > > > Applied, thanks Jerin, Anatoly and David for converging on a > > > > documented solution together. 
* Re: [dpdk-dev] [PATCH v4 0/4] Fixes on IOVA mode selection 2019-07-23 5:27 ` Jerin Jacob Kollanukkaran @ 2019-07-23 7:21 ` Thomas Monjalon 2019-07-23 9:57 ` Burakov, Anatoly 1 sibling, 0 replies; 57+ messages in thread From: Thomas Monjalon @ 2019-07-23 7:21 UTC (permalink / raw) To: Stojaczyk, Dariusz Cc: Jerin Jacob Kollanukkaran, David Marchand, Burakov, Anatoly, dev 23/07/2019 07:27, Jerin Jacob Kollanukkaran: > From: Stojaczyk, Dariusz <dariusz.stojaczyk@intel.com> > > From: Jerin Jacob Kollanukkaran [mailto:jerinj@marvell.com] > > > From: Stojaczyk, Dariusz <dariusz.stojaczyk@intel.com> > > > > > > > > This introduces a regression where uio-bound devies are attached to > > > > a DPDK app at runtime. Yes, it is a regression on purpose. We can also call it a behaviour change (more below). > > > [...] > There reason to choose VA incase if bus detects DC is following: > > - All drivers are expected to work in RTE_IOVA_VA mode, irrespective of > physical address availability. > - By default, the mempool, first asks for IOVA-contiguous memory using > ``RTE_MEMZONE_IOVA_CONTIG``. This is slow in RTE_IOVA_PA mode and it may > affect the application boot time. > - It is easy to enable large amount of IOVA-contiguous memory use-cases > with IOVA in VA mode. > [...] > > The PCI device could have been simply hotplugged to the system after DPDK > > app start. DPDK didn't know about it at initialization, so it picked > > RTE_IOVA_VA and then would fail to attach any UIO-bound device ever > > after: > > > > EAL: Expecting 'PA' IOVA mode but current mode is 'VA', not initializing > > We have RTE_PCI_DRV_NEED_IOVA_AS_VA devices in DPDK, Which can work > Only on VA. If we default 'PA' incase of DC, then what do with hotplugging on those devices? [...] > > > > When there are no devices attached at initialization, the only safe > > > > default should be RTE_IOVA_PA. With RTE_IOVA_VA we just won't be > > > > able to do any DMA to uio-bound PCI devices.
As Jerin explained, there is no safe default. There are two cases which cannot work together:
1/ no IOMMU
2/ a driver supporting only IOMMU addresses (named IOVA_AS_VA)
In the past we defaulted to physical addressing, which favored case 1. Now we have decided to switch to IOMMU addresses by default, which favors case 2. As Jerin explained above, this is considered an improvement. We should explain this change in the known issues section of the release notes. The only real fix would be to allow both kinds of addresses at the same time, with separate memory allocators.
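Thomas's point that there is no safe default can be made concrete with a small model of the selection step: each bus reports PA, VA or DC, the requests are combined, and a DC result is resolved with a policy default. The C sketch below is an illustrative simplification, not the EAL implementation; all names are hypothetical and the enum values merely mirror `RTE_IOVA_DC/PA/VA`. It shows that, with no device present at init, either default leaves one class of later-hotplugged device unable to attach.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; the values mirror DPDK's RTE_IOVA_DC/PA/VA. */
enum iova_mode { IOVA_DC = 0, IOVA_PA = 1 << 0, IOVA_VA = 1 << 1 };

/* Combine per-bus requests: a single concrete preference wins; no
 * preference, or conflicting preferences, yields DC (simplified). */
static enum iova_mode combine_requests(const enum iova_mode *req, size_t n)
{
	bool want_pa = false, want_va = false;

	for (size_t i = 0; i < n; i++) {
		if (req[i] == IOVA_PA)
			want_pa = true;
		else if (req[i] == IOVA_VA)
			want_va = true;
	}
	if (want_va && !want_pa)
		return IOVA_VA;
	if (want_pa && !want_va)
		return IOVA_PA;
	return IOVA_DC;
}

/* Resolve DC with a policy default: PA in the past, VA now. */
static enum iova_mode select_mode(enum iova_mode combined, enum iova_mode dflt)
{
	return combined == IOVA_DC ? dflt : combined;
}

/* A hotplugged device attaches only if it does not care or matches. */
static bool hotplug_ok(enum iova_mode dev_needs, enum iova_mode mode)
{
	return dev_needs == IOVA_DC || dev_needs == mode;
}
```

With no devices at init, `combine_requests(NULL, 0)` is DC. Resolving it to PA blocks a later VA-only device (case 2), and resolving it to VA blocks a later UIO device (case 1), which is exactly the trade-off described above.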
* Re: [dpdk-dev] [PATCH v4 0/4] Fixes on IOVA mode selection 2019-07-23 5:27 ` Jerin Jacob Kollanukkaran 2019-07-23 7:21 ` Thomas Monjalon @ 2019-07-23 9:57 ` Burakov, Anatoly 2019-07-23 10:25 ` Thomas Monjalon 1 sibling, 1 reply; 57+ messages in thread From: Burakov, Anatoly @ 2019-07-23 9:57 UTC (permalink / raw) To: Jerin Jacob Kollanukkaran, Stojaczyk, Dariusz, Thomas Monjalon, David Marchand Cc: dev On 23-Jul-19 6:27 AM, Jerin Jacob Kollanukkaran wrote: >> -----Original Message----- >> From: Stojaczyk, Dariusz <dariusz.stojaczyk@intel.com> >> Sent: Tuesday, July 23, 2019 10:24 AM >> To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; Thomas Monjalon >> <thomas@monjalon.net>; David Marchand <david.marchand@redhat.com>; >> Burakov, Anatoly <anatoly.burakov@intel.com> >> Cc: dev@dpdk.org >> Subject: [EXT] RE: [dpdk-dev] [PATCH v4 0/4] Fixes on IOVA mode selection >> >>> -----Original Message----- >>> From: Jerin Jacob Kollanukkaran [mailto:jerinj@marvell.com] >>> Sent: Tuesday, July 23, 2019 6:19 AM >>> >>>> -----Original Message----- >>>> From: Stojaczyk, Dariusz <dariusz.stojaczyk@intel.com> >>>> Sent: Tuesday, July 23, 2019 9:06 AM >>>> To: Thomas Monjalon <thomas@monjalon.net>; David Marchand >>>> <david.marchand@redhat.com>; Burakov, Anatoly >>>> <anatoly.burakov@intel.com>; Jerin Jacob Kollanukkaran >>>> <jerinj@marvell.com> >>>> Cc: dev@dpdk.org >>>> Subject: [EXT] RE: [dpdk-dev] [PATCH v4 0/4] Fixes on IOVA mode >>>> selection >>>> >>>> This introduces a regression where uio-bound devies are attached to >>>> a >>> DPDK >>>> app at runtime. >>> >>> Just to understand the requirements; >>> # Is this requirement for SPDK? >>> # Is brand new PCI device scanned and attached to DPDK at runtime? >>> # Any specific reason for using uio vs vfio? >> >> Jerin, > > Stojaczyk, > > There reason to choose VA incase if bus detects DC is following: > > - All drivers are expected to work in RTE_IOVA_VA mode, irrespective of > physical address availability. 
> - By default, the mempool, first asks for IOVA-contiguous memory using > ``RTE_MEMZONE_IOVA_CONTIG``. This is slow in RTE_IOVA_PA mode and it may > affect the application boot time. > - It is easy to enable large amount of IOVA-contiguous memory use-cases > with IOVA in VA mode. > >> >> It came up in SPDK tests, but it's certainly nothing SPDK-specific, I can't give >> you the steps but it should be reproducible even with testpmd. >> >> The PCI device could have been simply hotplugged to the system after DPDK >> app start. DPDK didn't know about it at initialization, so it picked >> RTE_IOVA_VA and then would fail to attach any UIO-bound device ever >> after: >> >> EAL: Expecting 'PA' IOVA mode but current mode is 'VA', not initializing > > We have RTE_PCI_DRV_NEED_IOVA_AS_VA devices in DPDK, Which can work > Only on VA. If we default 'PA' incase of DC, then what do with hotplugging on those devices? > > >> EAL: Driver cannot attach the device (0000:00:09.0) >> EAL: Failed to attach device on primary process >> >> UIO is commonly used on systems without IOMMU- including VMs. > > The latest machines has IOMMU. Which machines you are testing against, > Can we detect the machines without IOMMU and switch to PA? A machine without an IOMMU shouldn't have picked IOVA as VA in the first place. Perhaps this is something we could fix? I'm not sure how to detect that condition, though; I don't think there's a mechanism to know that for sure. Some kernels create "iommu" sysfs directories, but I'm not sure whether they are 1) present on the older kernels we support, and 2) always there. On machines with an IOMMU, VFIO should be the default, and we should discourage people from using igb_uio. Is there any reason why SPDK is not using VFIO by default? On my machine, "/sys/devices/virtual/iommu" exists when the IOMMU is enabled, but not when it is disabled ("/sys/class/iommu" exists in both cases, but is empty when the IOMMU is disabled). Perhaps we could go off that?
-- Thanks, Anatoly
* Re: [dpdk-dev] [PATCH v4 0/4] Fixes on IOVA mode selection 2019-07-23 9:57 ` Burakov, Anatoly @ 2019-07-23 10:25 ` Thomas Monjalon 2019-07-23 13:56 ` Burakov, Anatoly 0 siblings, 1 reply; 57+ messages in thread From: Thomas Monjalon @ 2019-07-23 10:25 UTC (permalink / raw) To: Burakov, Anatoly Cc: Jerin Jacob Kollanukkaran, Stojaczyk, Dariusz, David Marchand, dev 23/07/2019 11:57, Burakov, Anatoly: > A machine without an IOMMU shouldn't have picked IOVA as VA in the first > place. Perhaps this is something we could fix? I'm not sure how to > detected that condition though, i don't think there's a mechanism to > know that for sure. Some kernels create a "iommu" sysfs directories, but > i'm not too sure if they're 1) there for older kernels we support, and > 2) always there. [..] > On my machine, "/sys/devices/virtual/iommu" exists when IOMMU is > enabled, but doesn't exist if it isn't ("/sys/class/iommu" exists in > both cases, but is empty when IOMMU is disabled). Perhaps we could go > off that? Yes, good idea. We need to check how these sysfs entries are managed, and how old they are by looking at Linux code history. ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [dpdk-dev] [PATCH v4 0/4] Fixes on IOVA mode selection 2019-07-23 10:25 ` Thomas Monjalon @ 2019-07-23 13:56 ` Burakov, Anatoly 2019-07-23 14:24 ` [dpdk-dev] [EXT] " Jerin Jacob Kollanukkaran 2019-07-23 14:29 ` [dpdk-dev] " Burakov, Anatoly 0 siblings, 2 replies; 57+ messages in thread From: Burakov, Anatoly @ 2019-07-23 13:56 UTC (permalink / raw) To: Thomas Monjalon Cc: Jerin Jacob Kollanukkaran, Stojaczyk, Dariusz, David Marchand, dev On 23-Jul-19 11:25 AM, Thomas Monjalon wrote: > 23/07/2019 11:57, Burakov, Anatoly: >> A machine without an IOMMU shouldn't have picked IOVA as VA in the first >> place. Perhaps this is something we could fix? I'm not sure how to >> detected that condition though, i don't think there's a mechanism to >> know that for sure. Some kernels create a "iommu" sysfs directories, but >> i'm not too sure if they're 1) there for older kernels we support, and >> 2) always there. > [..] >> On my machine, "/sys/devices/virtual/iommu" exists when IOMMU is >> enabled, but doesn't exist if it isn't ("/sys/class/iommu" exists in >> both cases, but is empty when IOMMU is disabled). Perhaps we could go >> off that? > > Yes, good idea. > We need to check how these sysfs entries are managed, > and how old they are by looking at Linux code history. > A quick (and by no means thorough) Google search reveals that the IOMMU driver's sysfs-related code dates back as far as kernel version 3.17: https://elixir.bootlin.com/linux/v3.17.8/source/drivers/iommu/iommu-sysfs.c I'm not a kernel code expert, but the code *looks* like it creates an IOMMU-related entry in sysfs. So, I take it we can be reasonably sure these entries are present from v3.17 onwards? Do we support kernels which don't have this code? -- Thanks, Anatoly
* Re: [dpdk-dev] [EXT] Re: [PATCH v4 0/4] Fixes on IOVA mode selection 2019-07-23 13:56 ` Burakov, Anatoly @ 2019-07-23 14:24 ` Jerin Jacob Kollanukkaran 2019-07-23 14:29 ` [dpdk-dev] " Burakov, Anatoly 1 sibling, 0 replies; 57+ messages in thread From: Jerin Jacob Kollanukkaran @ 2019-07-23 14:24 UTC (permalink / raw) To: Burakov, Anatoly, Thomas Monjalon; +Cc: Stojaczyk, Dariusz, David Marchand, dev > -----Original Message----- > From: Burakov, Anatoly <anatoly.burakov@intel.com> > Sent: Tuesday, July 23, 2019 7:27 PM > To: Thomas Monjalon <thomas@monjalon.net> > Cc: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; Stojaczyk, Dariusz > <dariusz.stojaczyk@intel.com>; David Marchand > <david.marchand@redhat.com>; dev@dpdk.org > Subject: [EXT] Re: [dpdk-dev] [PATCH v4 0/4] Fixes on IOVA mode selection > > ---------------------------------------------------------------------- > On 23-Jul-19 11:25 AM, Thomas Monjalon wrote: > > 23/07/2019 11:57, Burakov, Anatoly: > >> A machine without an IOMMU shouldn't have picked IOVA as VA in the > >> first place. Perhaps this is something we could fix? I'm not sure how > >> to detected that condition though, i don't think there's a mechanism > >> to know that for sure. Some kernels create a "iommu" sysfs > >> directories, but i'm not too sure if they're 1) there for older > >> kernels we support, and > >> 2) always there. > > [..] > >> On my machine, "/sys/devices/virtual/iommu" exists when IOMMU is > >> enabled, but doesn't exist if it isn't ("/sys/class/iommu" exists in > >> both cases, but is empty when IOMMU is disabled). Perhaps we could go > >> off that? > > > > Yes, good idea. > > We need to check how these sysfs entries are managed, and how old they > > are by looking at Linux code history. 
> > > > Quick (and by no means thorough) Google reveals that IOMMU driver's > sysfs-related code dates back as far as kernel version 3.17: > > https://elixir.bootlin.com/linux/v3.17.8/source/drivers/iommu/iommu- > sysfs.c > > I'm not a kernel code expert, but the code *looks* like it's creating an > IOMMU-related entry in sysfs. So, i take it we can be reasonably sure of > these entries' presence at least since v3.17 onwards? Do we support kernels > which don't have this code? I checked on an x86 and an arm64 machine. I could not see "/sys/devices/virtual/iommu", but it looks like "/sys/class/iommu/" has entries when an IOMMU is present.
$ uname -a
Linux jerin-lab 5.1.15-arch1-1-ARCH #1 SMP PREEMPT Tue Jun 25 04:49:39 UTC 2019 x86_64 GNU/Linux
$ ls /sys/devices/virtual/
bdi dmi drm graphics mem misc msr net powercap thermal tty vc vtconsole workqueue
# ls /sys/class/iommu/
# uname -a
Linux alarm 4.14.76-5.0.0-g12f0519 #63 SMP PREEMPT Thu Jul 11 17:43:54 IST 2019 aarch64 GNU/Linux
# ls /sys/devices/virtual/
bdi block graphics input mem misc net otx-bphy-ctr otx-gpio-ctr ppp tty vc vfio vtconsole workqueue
# ls /sys/class/iommu/
smmu3.0x0000830000000000
> > -- > Thanks, > Anatoly
* Re: [dpdk-dev] [PATCH v4 0/4] Fixes on IOVA mode selection 2019-07-23 13:56 ` Burakov, Anatoly 2019-07-23 14:24 ` [dpdk-dev] [EXT] " Jerin Jacob Kollanukkaran @ 2019-07-23 14:29 ` Burakov, Anatoly 2019-07-23 14:36 ` [dpdk-dev] [EXT] " Jerin Jacob Kollanukkaran 1 sibling, 1 reply; 57+ messages in thread From: Burakov, Anatoly @ 2019-07-23 14:29 UTC (permalink / raw) To: Thomas Monjalon Cc: Jerin Jacob Kollanukkaran, Stojaczyk, Dariusz, David Marchand, dev On 23-Jul-19 2:56 PM, Burakov, Anatoly wrote: > On 23-Jul-19 11:25 AM, Thomas Monjalon wrote: >> 23/07/2019 11:57, Burakov, Anatoly: >>> A machine without an IOMMU shouldn't have picked IOVA as VA in the first >>> place. Perhaps this is something we could fix? I'm not sure how to >>> detected that condition though, i don't think there's a mechanism to >>> know that for sure. Some kernels create a "iommu" sysfs directories, but >>> i'm not too sure if they're 1) there for older kernels we support, and >>> 2) always there. >> [..] >>> On my machine, "/sys/devices/virtual/iommu" exists when IOMMU is >>> enabled, but doesn't exist if it isn't ("/sys/class/iommu" exists in >>> both cases, but is empty when IOMMU is disabled). Perhaps we could go >>> off that? >> >> Yes, good idea. >> We need to check how these sysfs entries are managed, >> and how old they are by looking at Linux code history. >> > > Quick (and by no means thorough) Google reveals that IOMMU driver's > sysfs-related code dates back as far as kernel version 3.17: > > https://elixir.bootlin.com/linux/v3.17.8/source/drivers/iommu/iommu-sysfs.c > > I'm not a kernel code expert, but the code *looks* like it's creating an > IOMMU-related entry in sysfs. So, i take it we can be reasonably sure of > these entries' presence at least since v3.17 onwards? Do we support > kernels which don't have this code? > After a short chat with Ferruh, I think we have an even better way to determine whether the IOMMU is enabled - the /sys/kernel/iommu filesystem.
Those entries are created whenever it is possible for VFIO to run, even if the VFIO driver itself is not loaded. They have been there since kernel 3.6, so I believe this approach meets our minimum requirements. -- Thanks, Anatoly
* Re: [dpdk-dev] [EXT] Re: [PATCH v4 0/4] Fixes on IOVA mode selection 2019-07-23 14:29 ` [dpdk-dev] " Burakov, Anatoly @ 2019-07-23 14:36 ` Jerin Jacob Kollanukkaran 2019-07-23 15:47 ` Burakov, Anatoly 0 siblings, 1 reply; 57+ messages in thread From: Jerin Jacob Kollanukkaran @ 2019-07-23 14:36 UTC (permalink / raw) To: Burakov, Anatoly, Thomas Monjalon; +Cc: Stojaczyk, Dariusz, David Marchand, dev > -----Original Message----- > From: Burakov, Anatoly <anatoly.burakov@intel.com> > Sent: Tuesday, July 23, 2019 8:00 PM > To: Thomas Monjalon <thomas@monjalon.net> > Cc: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; Stojaczyk, Dariusz > <dariusz.stojaczyk@intel.com>; David Marchand > <david.marchand@redhat.com>; dev@dpdk.org > Subject: [EXT] Re: [dpdk-dev] [PATCH v4 0/4] Fixes on IOVA mode selection > On 23-Jul-19 2:56 PM, Burakov, Anatoly wrote: > > On 23-Jul-19 11:25 AM, Thomas Monjalon wrote: > >> 23/07/2019 11:57, Burakov, Anatoly: > >>> A machine without an IOMMU shouldn't have picked IOVA as VA in the > >>> first place. Perhaps this is something we could fix? I'm not sure > >>> how to detected that condition though, i don't think there's a > >>> mechanism to know that for sure. Some kernels create a "iommu" sysfs > >>> directories, but i'm not too sure if they're 1) there for older > >>> kernels we support, and > >>> 2) always there. > >> [..] > >>> On my machine, "/sys/devices/virtual/iommu" exists when IOMMU is > >>> enabled, but doesn't exist if it isn't ("/sys/class/iommu" exists in > >>> both cases, but is empty when IOMMU is disabled). Perhaps we could > >>> go off that? > >> > >> Yes, good idea. > >> We need to check how these sysfs entries are managed, and how old > >> they are by looking at Linux code history. 
> >> > > > > Quick (and by no means thorough) Google reveals that IOMMU driver's > > sysfs-related code dates back as far as kernel version 3.17: > > > > https://elixir.bootlin.com/linux/v3.17.8/source/drivers/iommu/iommu-sy > > sfs.c > > > > I'm not a kernel code expert, but the code *looks* like it's creating > > an IOMMU-related entry in sysfs. So, i take it we can be reasonably > > sure of these entries' presence at least since v3.17 onwards? Do we > > support kernels which don't have this code? > > > > After a short chat with Ferruh, i think we have even better way to determine > whether IOMMU is enabled - the /sys/kernel/iommu filesystem. > Those are created whenever it is possible for VFIO to run, even if VFIO driver > itself is not loaded. These have been there since kernel 3.6, so our minimum > requirements are met with this approach, i believe. I can see /sys/kernel/iommu_groups/ on IOMMU systems, not /sys/kernel/iommu. > -- > Thanks, > Anatoly
* Re: [dpdk-dev] [EXT] Re: [PATCH v4 0/4] Fixes on IOVA mode selection 2019-07-23 14:36 ` [dpdk-dev] [EXT] " Jerin Jacob Kollanukkaran @ 2019-07-23 15:47 ` Burakov, Anatoly 0 siblings, 0 replies; 57+ messages in thread From: Burakov, Anatoly @ 2019-07-23 15:47 UTC (permalink / raw) To: Jerin Jacob Kollanukkaran, Thomas Monjalon Cc: Stojaczyk, Dariusz, David Marchand, dev On 23-Jul-19 3:36 PM, Jerin Jacob Kollanukkaran wrote: >> -----Original Message----- >> From: Burakov, Anatoly <anatoly.burakov@intel.com> >> Sent: Tuesday, July 23, 2019 8:00 PM >> To: Thomas Monjalon <thomas@monjalon.net> >> Cc: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; Stojaczyk, Dariusz >> <dariusz.stojaczyk@intel.com>; David Marchand >> <david.marchand@redhat.com>; dev@dpdk.org >> Subject: [EXT] Re: [dpdk-dev] [PATCH v4 0/4] Fixes on IOVA mode selection >> On 23-Jul-19 2:56 PM, Burakov, Anatoly wrote: >>> On 23-Jul-19 11:25 AM, Thomas Monjalon wrote: >>>> 23/07/2019 11:57, Burakov, Anatoly: >>>>> A machine without an IOMMU shouldn't have picked IOVA as VA in the >>>>> first place. Perhaps this is something we could fix? I'm not sure >>>>> how to detected that condition though, i don't think there's a >>>>> mechanism to know that for sure. Some kernels create a "iommu" sysfs >>>>> directories, but i'm not too sure if they're 1) there for older >>>>> kernels we support, and >>>>> 2) always there. >>>> [..] >>>>> On my machine, "/sys/devices/virtual/iommu" exists when IOMMU is >>>>> enabled, but doesn't exist if it isn't ("/sys/class/iommu" exists in >>>>> both cases, but is empty when IOMMU is disabled). Perhaps we could >>>>> go off that? >>>> >>>> Yes, good idea. >>>> We need to check how these sysfs entries are managed, and how old >>>> they are by looking at Linux code history. 
>>>> >>> >>> Quick (and by no means thorough) Google reveals that IOMMU driver's >>> sysfs-related code dates back as far as kernel version 3.17: >>> >>> https://elixir.bootlin.com/linux/v3.17.8/source/drivers/iommu/iommu-sy >>> sfs.c >>> >>> I'm not a kernel code expert, but the code *looks* like it's creating >>> an IOMMU-related entry in sysfs. So, i take it we can be reasonably >>> sure of these entries' presence at least since v3.17 onwards? Do we >>> support kernels which don't have this code? >>> >> >> After a short chat with Ferruh, i think we have even better way to determine >> whether IOMMU is enabled - the /sys/kernel/iommu filesystem. >> Those are created whenever it is possible for VFIO to run, even if VFIO driver >> itself is not loaded. These have been there since kernel 3.6, so our minimum >> requirements are met with this approach, i believe. > > I can see /sys/kernel/iommu_groups/ on IOMMU systems not /sys/kernel/iommu Sorry, yes, a typo. It's /sys/kernel/iommu_groups/. > >> -- >> Thanks, >> Anatoly -- Thanks, Anatoly ^ permalink raw reply [flat|nested] 57+ messages in thread
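The detection approach the thread converges on, checking `/sys/kernel/iommu_groups/`, could look roughly like the following C sketch. It relies on the thread's statement that the directory is present on kernels from 3.6 onwards. It also treats the IOMMU as enabled only when the directory contains at least one entry; that is a conservative assumption of this sketch, not something confirmed above. The helper names (`dir_has_entries`, `resolve_dc`) are hypothetical. The last function shows how a DC result could fall back to PA on machines without an IOMMU, as Jerin asked.

```c
#include <dirent.h>
#include <stdbool.h>
#include <string.h>

/* Directory discussed in the thread. */
#define IOMMU_GROUPS_PATH "/sys/kernel/iommu_groups"

/* Hypothetical stand-ins; the values mirror DPDK's RTE_IOVA_DC/PA/VA. */
enum iova_mode { IOVA_DC = 0, IOVA_PA = 1 << 0, IOVA_VA = 1 << 1 };

/* True if `path` can be opened and holds an entry other than "." and "..". */
static bool dir_has_entries(const char *path)
{
	DIR *dir = opendir(path);
	struct dirent *d;
	bool found = false;

	if (dir == NULL)
		return false; /* missing directory: assume no IOMMU */
	while ((d = readdir(dir)) != NULL) {
		if (strcmp(d->d_name, ".") != 0 && strcmp(d->d_name, "..") != 0) {
			found = true;
			break;
		}
	}
	closedir(dir);
	return found;
}

/* Resolve a DC request: VA when an IOMMU is usable, PA otherwise. */
static enum iova_mode resolve_dc(bool iommu_enabled)
{
	return iommu_enabled ? IOVA_VA : IOVA_PA;
}
```

A caller would combine the two as `resolve_dc(dir_has_entries(IOMMU_GROUPS_PATH))`. Keeping the path as a parameter of `dir_has_entries` makes the check easy to exercise against other directories.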
end of thread

Thread overview: 57+ messages
2019-07-10 21:48 [dpdk-dev] [PATCH 0/2] Fixes on IOVA mode selection David Marchand
2019-07-10 21:48 ` [dpdk-dev] [PATCH 1/2] Revert "bus/pci: add Mellanox kernel driver type" David Marchand
2019-07-16 10:37 ` [dpdk-dev] [EXT] " Jerin Jacob Kollanukkaran
2019-07-10 21:48 ` [dpdk-dev] [PATCH 2/2] eal: fix IOVA mode selection as VA for pci drivers David Marchand
2019-07-11 14:40 ` Thomas Monjalon
2019-07-12 8:05 ` Jerin Jacob Kollanukkaran
2019-07-12 11:03 ` Burakov, Anatoly
2019-07-12 12:43 ` Thomas Monjalon
2019-07-12 12:58 ` Burakov, Anatoly
2019-07-12 13:19 ` Bruce Richardson
2019-07-15 14:26 ` Jerin Jacob Kollanukkaran
2019-07-15 15:03 ` Thomas Monjalon
2019-07-15 15:35 ` Jerin Jacob Kollanukkaran
2019-07-15 16:06 ` Thomas Monjalon
2019-07-15 16:27 ` Jerin Jacob Kollanukkaran
2019-07-16 13:46 ` [dpdk-dev] [PATCH v2 0/4] Fixes on IOVA mode selection jerinj
2019-07-16 13:46 ` [dpdk-dev] [PATCH v2 1/4] Revert "bus/pci: add Mellanox kernel driver type" jerinj
2019-07-16 13:46 ` [dpdk-dev] [PATCH v2 2/4] eal: fix IOVA mode selection as VA for pci drivers jerinj
2019-07-16 14:26 ` Burakov, Anatoly
2019-07-16 15:07 ` Jerin Jacob Kollanukkaran
2019-07-16 13:46 ` [dpdk-dev] [PATCH v2 3/4] eal: change RTE_PCI_DRV_IOVA_AS_VA flag name jerinj
2019-07-16 13:46 ` [dpdk-dev] [PATCH v2 4/4] eal: select IOVA mode as VA for default case jerinj
2019-07-16 14:33 ` Burakov, Anatoly
2019-07-17 8:33 ` [dpdk-dev] [EXT] " Jerin Jacob Kollanukkaran
2019-07-17 12:38 ` Burakov, Anatoly
2019-07-17 14:04 ` Jerin Jacob Kollanukkaran
2019-07-18 6:45 ` [dpdk-dev] [PATCH v3 0/4] Fixes on IOVA mode selection jerinj
2019-07-18 6:45 ` [dpdk-dev] [PATCH v3 1/4] Revert "bus/pci: add Mellanox kernel driver type" jerinj
2019-07-18 6:45 ` [dpdk-dev] [PATCH v3 2/4] eal: fix IOVA mode selection as VA for pci drivers jerinj
2019-07-18 6:45 ` [dpdk-dev] [PATCH v3 3/4] eal: change RTE_PCI_DRV_IOVA_AS_VA flag name jerinj
2019-07-18 6:45 ` [dpdk-dev] [PATCH v3 4/4] eal: select IOVA mode as VA for default case jerinj
2019-07-22 11:28 ` [dpdk-dev] [PATCH v3 0/4] Fixes on IOVA mode selection David Marchand
2019-07-22 12:56 ` [dpdk-dev] [PATCH v4 " David Marchand
2019-07-22 12:56 ` [dpdk-dev] [PATCH v4 1/4] Revert "bus/pci: add Mellanox kernel driver type" David Marchand
2019-07-22 12:56 ` [dpdk-dev] [PATCH v4 2/4] eal: fix IOVA mode selection as VA for PCI drivers David Marchand
2019-11-25 9:33 ` Ferruh Yigit
2019-11-25 10:22 ` Thomas Monjalon
2019-11-25 12:03 ` Ferruh Yigit
2019-11-25 12:36 ` David Marchand
2019-11-25 12:58 ` Burakov, Anatoly
2019-11-25 14:29 ` Thomas Monjalon
2019-11-25 11:07 ` Jerin Jacob
2019-07-22 12:56 ` [dpdk-dev] [PATCH v4 3/4] drivers: change IOVA as VA PCI flag name David Marchand
2019-07-22 12:56 ` [dpdk-dev] [PATCH v4 4/4] eal: select IOVA as VA mode for default case David Marchand
2019-07-22 15:53 ` [dpdk-dev] [PATCH v4 0/4] Fixes on IOVA mode selection Thomas Monjalon
2019-07-23 3:35 ` Stojaczyk, Dariusz
2019-07-23 4:18 ` Jerin Jacob Kollanukkaran
2019-07-23 4:54 ` Stojaczyk, Dariusz
2019-07-23 5:27 ` Jerin Jacob Kollanukkaran
2019-07-23 7:21 ` Thomas Monjalon
2019-07-23 9:57 ` Burakov, Anatoly
2019-07-23 10:25 ` Thomas Monjalon
2019-07-23 13:56 ` Burakov, Anatoly
2019-07-23 14:24 ` [dpdk-dev] [EXT] " Jerin Jacob Kollanukkaran
2019-07-23 14:29 ` [dpdk-dev] " Burakov, Anatoly
2019-07-23 14:36 ` [dpdk-dev] [EXT] " Jerin Jacob Kollanukkaran
2019-07-23 15:47 ` Burakov, Anatoly