* [dpdk-dev] [PATCH 0/2] enable 4KB + VFIO-PCI
From: Jianfeng Tan @ 2017-10-11 10:33 UTC
To: dev
Cc: santosh.shukla, sergio.gonzalez.monroy, thomas, ferruh.yigit, Jianfeng Tan

Patch 1: Use VA as IOVA if IOVA mode is enabled.
Patch 2: Enable IOVA mode for the PMDs for Intel NICs.

How to test:

$ (bind nic to vfio-pci)
$ testpmd -c 0x3 -n 4 -m 2048 --no-huge -- -i --no-numa

Jianfeng Tan (2):
  eal: honor IOVA mode for no-huge case
  net: enable IOVA mode for PMDs

 drivers/net/e1000/em_ethdev.c            | 3 ++-
 drivers/net/e1000/igb_ethdev.c           | 5 +++--
 drivers/net/fm10k/fm10k_ethdev.c         | 3 ++-
 drivers/net/i40e/i40e_ethdev.c           | 3 ++-
 drivers/net/i40e/i40e_ethdev_vf.c        | 2 +-
 drivers/net/ixgbe/ixgbe_ethdev.c         | 5 +++--
 lib/librte_eal/linuxapp/eal/eal_memory.c | 5 ++++-
 7 files changed, 17 insertions(+), 9 deletions(-)

--
2.7.4
* [dpdk-dev] [PATCH 1/2] eal: honor IOVA mode for no-huge case
From: Jianfeng Tan @ 2017-10-11 10:33 UTC
To: dev
Cc: santosh.shukla, sergio.gonzalez.monroy, thomas, ferruh.yigit, Jianfeng Tan

With the introduction of IOVA mode, the only blocker to run
with 4KB pages for NICs binding to vfio-pci, is that
RTE_BAD_PHYS_ADDR is not a valid IOVA address.

We can refine this by using VA as IOVA if it's IOVA mode.

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
---
 lib/librte_eal/linuxapp/eal/eal_memory.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index 28bca49..187d338 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -1030,7 +1030,10 @@ rte_eal_hugepage_init(void)
 					strerror(errno));
 			return -1;
 		}
-		mcfg->memseg[0].phys_addr = RTE_BAD_PHYS_ADDR;
+		if (rte_eal_iova_mode() == RTE_IOVA_VA)
+			mcfg->memseg[0].phys_addr = (uintptr_t)addr;
+		else
+			mcfg->memseg[0].phys_addr = RTE_BAD_PHYS_ADDR;
 		mcfg->memseg[0].addr = addr;
 		mcfg->memseg[0].hugepage_sz = RTE_PGSIZE_4K;
 		mcfg->memseg[0].len = internal_config.memory;
--
2.7.4
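The selection logic in this patch can be sketched in isolation. The snippet below is a minimal standalone illustration, not the EAL code: `enum iova_mode`, `BAD_IOVA` and `nohuge_memseg_iova()` are hypothetical stand-ins for `rte_eal_iova_mode()`, `RTE_BAD_PHYS_ADDR` and the assignment to `memseg[0].phys_addr`, and the sentinel value is an assumption.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the EAL names used in the patch;
 * the values are illustrative, not copied from DPDK headers. */
enum iova_mode { IOVA_PA, IOVA_VA };
#define BAD_IOVA ((uint64_t)-1)

/* Pick the IOVA recorded for the single 4KB-page memseg in --no-huge mode. */
static uint64_t
nohuge_memseg_iova(enum iova_mode mode, void *addr)
{
	if (mode == IOVA_VA)
		return (uint64_t)(uintptr_t)addr; /* the VA doubles as the IOVA */
	return BAD_IOVA; /* PA mode: keep the "no valid address" sentinel */
}
```

In VA mode the device is handed the process virtual address, so the mapping programmed through vfio-pci has to cover that same address range.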
* Re: [dpdk-dev] [PATCH 1/2] eal: honor IOVA mode for no-huge case
From: Burakov, Anatoly @ 2017-10-11 11:27 UTC
To: Jianfeng Tan, dev
Cc: santosh.shukla, sergio.gonzalez.monroy, thomas, ferruh.yigit

On 11-Oct-17 11:33 AM, Jianfeng Tan wrote:
> With the introduction of IOVA mode, the only blocker to run
> with 4KB pages for NICs binding to vfio-pci, is that
> RTE_BAD_PHYS_ADDR is not a valid IOVA address.
>
> We can refine this by using VA as IOVA if it's IOVA mode.
>
> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
> ---
>  lib/librte_eal/linuxapp/eal/eal_memory.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
> index 28bca49..187d338 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_memory.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
> @@ -1030,7 +1030,10 @@ rte_eal_hugepage_init(void)
>  					strerror(errno));
>  			return -1;
>  		}
> -		mcfg->memseg[0].phys_addr = RTE_BAD_PHYS_ADDR;
> +		if (rte_eal_iova_mode() == RTE_IOVA_VA)
> +			mcfg->memseg[0].phys_addr = (uintptr_t)addr;
> +		else
> +			mcfg->memseg[0].phys_addr = RTE_BAD_PHYS_ADDR;
>  		mcfg->memseg[0].addr = addr;
>  		mcfg->memseg[0].hugepage_sz = RTE_PGSIZE_4K;
>  		mcfg->memseg[0].len = internal_config.memory;
>

Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

--
Thanks,
Anatoly
* Re: [dpdk-dev] [PATCH 1/2] eal: honor IOVA mode for no-huge case
From: santosh @ 2017-10-11 11:30 UTC
To: Jianfeng Tan, dev
Cc: sergio.gonzalez.monroy, thomas, ferruh.yigit

On Wednesday 11 October 2017 04:03 PM, Jianfeng Tan wrote:
> With the introduction of IOVA mode, the only blocker to run
> with 4KB pages for NICs binding to vfio-pci, is that
> RTE_BAD_PHYS_ADDR is not a valid IOVA address.
>
> We can refine this by using VA as IOVA if it's IOVA mode.
>
> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
> ---

Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
* Re: [dpdk-dev] [PATCH 1/2] eal: honor IOVA mode for no-huge case
From: Ferruh Yigit @ 2017-10-31 21:49 UTC
To: Jianfeng Tan, dev
Cc: santosh.shukla, sergio.gonzalez.monroy, thomas

On 10/11/2017 3:33 AM, Jianfeng Tan wrote:
> With the introduction of IOVA mode, the only blocker to run
> with 4KB pages for NICs binding to vfio-pci, is that
> RTE_BAD_PHYS_ADDR is not a valid IOVA address.
>
> We can refine this by using VA as IOVA if it's IOVA mode.
>
> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
> ---
>  lib/librte_eal/linuxapp/eal/eal_memory.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
> index 28bca49..187d338 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_memory.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
> @@ -1030,7 +1030,10 @@ rte_eal_hugepage_init(void)
>  					strerror(errno));
>  			return -1;
>  		}
> -		mcfg->memseg[0].phys_addr = RTE_BAD_PHYS_ADDR;
> +		if (rte_eal_iova_mode() == RTE_IOVA_VA)
> +			mcfg->memseg[0].phys_addr = (uintptr_t)addr;
> +		else
> +			mcfg->memseg[0].phys_addr = RTE_BAD_PHYS_ADDR;

This breaks KNI which requires physical address.

Any idea how to disable RTE_IOVA_VA when KNI used?

>  		mcfg->memseg[0].addr = addr;
>  		mcfg->memseg[0].hugepage_sz = RTE_PGSIZE_4K;
>  		mcfg->memseg[0].len = internal_config.memory;
>
* Re: [dpdk-dev] [PATCH 1/2] eal: honor IOVA mode for no-huge case
From: Ferruh Yigit @ 2017-10-31 22:37 UTC
To: Jianfeng Tan, dev
Cc: santosh.shukla, sergio.gonzalez.monroy, thomas

On 10/31/2017 2:49 PM, Ferruh Yigit wrote:
> On 10/11/2017 3:33 AM, Jianfeng Tan wrote:
>> With the introduction of IOVA mode, the only blocker to run
>> with 4KB pages for NICs binding to vfio-pci, is that
>> RTE_BAD_PHYS_ADDR is not a valid IOVA address.
>>
>> We can refine this by using VA as IOVA if it's IOVA mode.
>>
>> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
>> ---
>>  lib/librte_eal/linuxapp/eal/eal_memory.c | 5 ++++-
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
>> index 28bca49..187d338 100644
>> --- a/lib/librte_eal/linuxapp/eal/eal_memory.c
>> +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
>> @@ -1030,7 +1030,10 @@ rte_eal_hugepage_init(void)
>>  					strerror(errno));
>>  			return -1;
>>  		}
>> -		mcfg->memseg[0].phys_addr = RTE_BAD_PHYS_ADDR;
>> +		if (rte_eal_iova_mode() == RTE_IOVA_VA)
>> +			mcfg->memseg[0].phys_addr = (uintptr_t)addr;
>> +		else
>> +			mcfg->memseg[0].phys_addr = RTE_BAD_PHYS_ADDR;
>
> This breaks KNI which requires physical address.

My bad, this patch is for no_hugetlbfs case.

Issue seen starting from next patch in the set [1], which enables IOVA mode for
Intel PMDs.

With IOVA mode enabled, KNI fails.

Does it make sense to add an API to set iova mode explicitly by application?
Application can set iova to PA and allocate memzones it requires.

[1]
http://dpdk.org/commit/f37dfab2

>
> Any idea how to disable RTE_IOVA_VA when KNI used?
>
>>  		mcfg->memseg[0].addr = addr;
>>  		mcfg->memseg[0].hugepage_sz = RTE_PGSIZE_4K;
>>  		mcfg->memseg[0].len = internal_config.memory;
>>
>
* Re: [dpdk-dev] [PATCH 1/2] eal: honor IOVA mode for no-huge case
From: Ferruh Yigit @ 2017-11-01 1:10 UTC
To: Jianfeng Tan, dev
Cc: santosh.shukla, sergio.gonzalez.monroy, thomas

On 10/31/2017 3:37 PM, Ferruh Yigit wrote:
> On 10/31/2017 2:49 PM, Ferruh Yigit wrote:
>> On 10/11/2017 3:33 AM, Jianfeng Tan wrote:
>>> With the introduction of IOVA mode, the only blocker to run
>>> with 4KB pages for NICs binding to vfio-pci, is that
>>> RTE_BAD_PHYS_ADDR is not a valid IOVA address.
>>>
>>> We can refine this by using VA as IOVA if it's IOVA mode.
>>>
>>> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
>>> ---
>>>  lib/librte_eal/linuxapp/eal/eal_memory.c | 5 ++++-
>>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
>>> index 28bca49..187d338 100644
>>> --- a/lib/librte_eal/linuxapp/eal/eal_memory.c
>>> +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
>>> @@ -1030,7 +1030,10 @@ rte_eal_hugepage_init(void)
>>>  					strerror(errno));
>>>  			return -1;
>>>  		}
>>> -		mcfg->memseg[0].phys_addr = RTE_BAD_PHYS_ADDR;
>>> +		if (rte_eal_iova_mode() == RTE_IOVA_VA)
>>> +			mcfg->memseg[0].phys_addr = (uintptr_t)addr;
>>> +		else
>>> +			mcfg->memseg[0].phys_addr = RTE_BAD_PHYS_ADDR;
>>
>> This breaks KNI which requires physical address.
>
> My bad, this patch is for no_hugetlbfs case.
>
> Issue seen starting from next patch in the set [1], which enables IOVA mode for
> Intel PMDs.
>
> With IOVA mode enabled, KNI fails.
>
> Does it make sense to add an API to set iova mode explicitly by application?
> Application can set iova to PA and allocate memzones it requires.

Added config option to disable IOVA mode detection:
http://dpdk.org/dev/patchwork/patch/31071/

Still concerned that this may hit someone. Since the result for KNI is a kernel
crash, it would be nice to have more solid protection here.

Any suggestion welcome.

Thanks,
ferruh

>
> [1]
> http://dpdk.org/commit/f37dfab2
>
>>
>> Any idea how to disable RTE_IOVA_VA when KNI used?
>>
>>>  		mcfg->memseg[0].addr = addr;
>>>  		mcfg->memseg[0].hugepage_sz = RTE_PGSIZE_4K;
>>>  		mcfg->memseg[0].len = internal_config.memory;
>>>
>>
>
* [dpdk-dev] [PATCH 2/2] net: enable IOVA mode for PMDs
From: Jianfeng Tan @ 2017-10-11 10:33 UTC
To: dev
Cc: santosh.shukla, sergio.gonzalez.monroy, thomas, ferruh.yigit, Jianfeng Tan

If we want to enable IOVA mode, introduced by
commit 93878cf0255e ("eal: introduce helper API for IOVA mode"),
we need PMDs (for PCI devices) to expose this flag.

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
---
 drivers/net/e1000/em_ethdev.c     | 3 ++-
 drivers/net/e1000/igb_ethdev.c    | 5 +++--
 drivers/net/fm10k/fm10k_ethdev.c  | 3 ++-
 drivers/net/i40e/i40e_ethdev.c    | 3 ++-
 drivers/net/i40e/i40e_ethdev_vf.c | 2 +-
 drivers/net/ixgbe/ixgbe_ethdev.c  | 5 +++--
 6 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/drivers/net/e1000/em_ethdev.c b/drivers/net/e1000/em_ethdev.c
index a59947d..324f051 100644
--- a/drivers/net/e1000/em_ethdev.c
+++ b/drivers/net/e1000/em_ethdev.c
@@ -432,7 +432,8 @@ static int eth_em_pci_remove(struct rte_pci_device *pci_dev)
 
 static struct rte_pci_driver rte_em_pmd = {
 	.id_table = pci_id_em_map,
-	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC |
+		     RTE_PCI_DRV_IOVA_AS_VA,
 	.probe = eth_em_pci_probe,
 	.remove = eth_em_pci_remove,
 };
diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index 040dd9f..a760011 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -1168,7 +1168,8 @@ static int eth_igb_pci_remove(struct rte_pci_device *pci_dev)
 
 static struct rte_pci_driver rte_igb_pmd = {
 	.id_table = pci_id_igb_map,
-	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC |
+		     RTE_PCI_DRV_IOVA_AS_VA,
 	.probe = eth_igb_pci_probe,
 	.remove = eth_igb_pci_remove,
 };
@@ -1191,7 +1192,7 @@ static int eth_igbvf_pci_remove(struct rte_pci_device *pci_dev)
  */
 static struct rte_pci_driver rte_igbvf_pmd = {
 	.id_table = pci_id_igbvf_map,
-	.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_IOVA_AS_VA,
 	.probe = eth_igbvf_pci_probe,
 	.remove = eth_igbvf_pci_remove,
 };
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 15ea2a5..bf36e71 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -3142,7 +3142,8 @@ static const struct rte_pci_id pci_id_fm10k_map[] = {
 
 static struct rte_pci_driver rte_pmd_fm10k = {
 	.id_table = pci_id_fm10k_map,
-	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC |
+		     RTE_PCI_DRV_IOVA_AS_VA,
 	.probe = eth_fm10k_pci_probe,
 	.remove = eth_fm10k_pci_remove,
 };
diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 536365d..f6330d2 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -654,7 +654,8 @@ static int eth_i40e_pci_remove(struct rte_pci_device *pci_dev)
 
 static struct rte_pci_driver rte_i40e_pmd = {
 	.id_table = pci_id_i40e_map,
-	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC |
+		     RTE_PCI_DRV_IOVA_AS_VA,
 	.probe = eth_i40e_pci_probe,
 	.remove = eth_i40e_pci_remove,
 };
diff --git a/drivers/net/i40e/i40e_ethdev_vf.c b/drivers/net/i40e/i40e_ethdev_vf.c
index 111ac39..4cadf83 100644
--- a/drivers/net/i40e/i40e_ethdev_vf.c
+++ b/drivers/net/i40e/i40e_ethdev_vf.c
@@ -1527,7 +1527,7 @@ static int eth_i40evf_pci_remove(struct rte_pci_device *pci_dev)
  */
 static struct rte_pci_driver rte_i40evf_pmd = {
 	.id_table = pci_id_i40evf_map,
-	.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_IOVA_AS_VA,
 	.probe = eth_i40evf_pci_probe,
 	.remove = eth_i40evf_pci_remove,
 };
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index a7d7acc..6ad28b3 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -1781,7 +1781,8 @@ static int eth_ixgbe_pci_remove(struct rte_pci_device *pci_dev)
 
 static struct rte_pci_driver rte_ixgbe_pmd = {
 	.id_table = pci_id_ixgbe_map,
-	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC |
+		     RTE_PCI_DRV_IOVA_AS_VA,
 	.probe = eth_ixgbe_pci_probe,
 	.remove = eth_ixgbe_pci_remove,
 };
@@ -1803,7 +1804,7 @@ static int eth_ixgbevf_pci_remove(struct rte_pci_device *pci_dev)
  */
 static struct rte_pci_driver rte_ixgbevf_pmd = {
 	.id_table = pci_id_ixgbevf_map,
-	.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_IOVA_AS_VA,
 	.probe = eth_ixgbevf_pci_probe,
 	.remove = eth_ixgbevf_pci_remove,
 };
--
2.7.4
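Each hunk in this patch only ORs one more capability bit into `drv_flags`. As a rough standalone sketch of what testing such a bit looks like, the snippet below uses local macro names with illustrative values (assumptions, not taken from the DPDK PCI bus header), and `drv_supports_iova_va()` is a hypothetical helper. How the bus layer combines the flag across devices to pick the final mode is not shown in this thread and is not modelled here.

```c
#include <stdint.h>

/* Illustrative flag bits; the real definitions live in the DPDK PCI bus
 * headers and may use different values. */
#define DRV_NEED_MAPPING 0x0001u
#define DRV_INTR_LSC     0x0008u
#define DRV_IOVA_AS_VA   0x0040u

/* Return 1 if a driver's flag word advertises IOVA-as-VA support. */
static int
drv_supports_iova_va(uint32_t drv_flags)
{
	return (drv_flags & DRV_IOVA_AS_VA) != 0;
}
```

A driver that leaves the bit clear, like the pre-patch definitions above, would report 0 here.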
* Re: [dpdk-dev] [PATCH 2/2] net: enable IOVA mode for PMDs
From: Burakov, Anatoly @ 2017-10-11 10:43 UTC
To: Jianfeng Tan, dev
Cc: santosh.shukla, sergio.gonzalez.monroy, thomas, ferruh.yigit

On 11-Oct-17 11:33 AM, Jianfeng Tan wrote:
> If we want to enable IOVA mode, introduced by
> commit 93878cf0255e ("eal: introduce helper API for IOVA mode"),
> we need PMDs (for PCI devices) to expose this flag.
>
> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
> ---

Is this the complete list of drivers which need this flag? Do other
devices (e.g. cryptodev?) need this flag?

--
Thanks,
Anatoly
* Re: [dpdk-dev] [PATCH 2/2] net: enable IOVA mode for PMDs
From: Tan, Jianfeng @ 2017-10-11 10:56 UTC
To: Burakov, Anatoly, dev
Cc: santosh.shukla, sergio.gonzalez.monroy, thomas, ferruh.yigit

On 10/11/2017 6:43 PM, Burakov, Anatoly wrote:
> On 11-Oct-17 11:33 AM, Jianfeng Tan wrote:
>> If we want to enable IOVA mode, introduced by
>> commit 93878cf0255e ("eal: introduce helper API for IOVA mode"),
>> we need PMDs (for PCI devices) to expose this flag.
>>
>> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
>> ---
>
> Is this the complete list of drivers which need this flag? Do other
> devices (e.g. cryptodev?) need this flag?

No, these are just NICs from Intel (as an example). If other NICs want
to enable this, I'm more than happy to cover it in v2.

Thanks,
Jianfeng
* Re: [dpdk-dev] [PATCH 2/2] net: enable IOVA mode for PMDs
From: Burakov, Anatoly @ 2017-10-11 11:30 UTC
To: Jianfeng Tan, dev
Cc: santosh.shukla, sergio.gonzalez.monroy, thomas, ferruh.yigit

On 11-Oct-17 11:33 AM, Jianfeng Tan wrote:
> If we want to enable IOVA mode, introduced by
> commit 93878cf0255e ("eal: introduce helper API for IOVA mode"),
> we need PMDs (for PCI devices) to expose this flag.
>
> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
> ---

Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

--
Thanks,
Anatoly
* Re: [dpdk-dev] [PATCH 2/2] net: enable IOVA mode for PMDs
From: santosh @ 2017-10-11 11:33 UTC
To: Jianfeng Tan, dev
Cc: sergio.gonzalez.monroy, thomas, ferruh.yigit

On Wednesday 11 October 2017 04:03 PM, Jianfeng Tan wrote:
> If we want to enable IOVA mode, introduced by
> commit 93878cf0255e ("eal: introduce helper API for IOVA mode"),
> we need PMDs (for PCI devices) to expose this flag.
>
> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
> ---

Reviewed-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
* Re: [dpdk-dev] [PATCH 2/2] net: enable IOVA mode for PMDs
From: Maxime Coquelin @ 2018-01-05 10:32 UTC
To: Jianfeng Tan, dev, santosh.shukla, ferruh.yigit
Cc: sergio.gonzalez.monroy, thomas, Peter Xu

Hi Jianfeng,

On 10/11/2017 12:33 PM, Jianfeng Tan wrote:
> If we want to enable IOVA mode, introduced by
> commit 93878cf0255e ("eal: introduce helper API for IOVA mode"),
> we need PMDs (for PCI devices) to expose this flag.
>
> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
> ---
>  drivers/net/e1000/em_ethdev.c     | 3 ++-
>  drivers/net/e1000/igb_ethdev.c    | 5 +++--
>  drivers/net/fm10k/fm10k_ethdev.c  | 3 ++-
>  drivers/net/i40e/i40e_ethdev.c    | 3 ++-
>  drivers/net/i40e/i40e_ethdev_vf.c | 2 +-
>  drivers/net/ixgbe/ixgbe_ethdev.c  | 5 +++--
>  6 files changed, 13 insertions(+), 8 deletions(-)

This patch introduces a regression when doing device assignment in
guest, because current VT-d emulation only supports a 39-bit guest
address width [0].

In the Bz, Peter suggests we could have an IOVA allocator algorithm,
which could start to allocate IOVAs from 0. I think it could solve the
--no-huge case your series addresses, do you agree?

But it would be a long-term solution, and we need to fix this in stable.

Is the --no-huge option used in production, or is it only for testing?
If the latter, do you think we could revert your patch while we find a
solution that makes all cases work?

Ferruh, I see you also faced problems with KNI, how did you solve it?

Thanks,
Maxime

[0]: https://bugzilla.redhat.com/show_bug.cgi?id=1530957#c3
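The regression described above comes down to a width check: when the VA is used as the IOVA, the address must be expressible in the IOMMU's address width. A minimal sketch of that check follows, assuming a hypothetical helper `iova_fits_width()`; the sample address is only an illustration of a typical Linux x86-64 user-space mapping in the upper part of the 47-bit range, not a value taken from the bug report.

```c
#include <stdint.h>

/* Return 1 if addr can be expressed in an address space of `width` bits. */
static int
iova_fits_width(uint64_t addr, unsigned int width)
{
	if (width >= 64)
		return 1; /* avoid an undefined shift by 64 or more */
	return (addr >> width) == 0;
}
```

For instance, an address such as 0x7f0000000000 passes the check for a 48-bit width but fails it for 39 bits, which is the situation in the emulated VT-d case.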
* Re: [dpdk-dev] [PATCH 2/2] net: enable IOVA mode for PMDs
From: Maxime Coquelin @ 2018-01-05 12:04 UTC
To: Jianfeng Tan, dev, santosh.shukla, ferruh.yigit
Cc: sergio.gonzalez.monroy, thomas, Peter Xu

On 01/05/2018 11:32 AM, Maxime Coquelin wrote:
> Hi Jianfeng,
>
> On 10/11/2017 12:33 PM, Jianfeng Tan wrote:
>> If we want to enable IOVA mode, introduced by
>> commit 93878cf0255e ("eal: introduce helper API for IOVA mode"),
>> we need PMDs (for PCI devices) to expose this flag.
>>
>> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
>> ---
>>  drivers/net/e1000/em_ethdev.c     | 3 ++-
>>  drivers/net/e1000/igb_ethdev.c    | 5 +++--
>>  drivers/net/fm10k/fm10k_ethdev.c  | 3 ++-
>>  drivers/net/i40e/i40e_ethdev.c    | 3 ++-
>>  drivers/net/i40e/i40e_ethdev_vf.c | 2 +-
>>  drivers/net/ixgbe/ixgbe_ethdev.c  | 5 +++--
>>  6 files changed, 13 insertions(+), 8 deletions(-)
>
> This patch introduces a regression when doing device assignment in
> guest, because current VT-d emulation only supports a 39-bit guest
> address width [0].
>
> In the Bz, Peter suggests we could have an IOVA allocator algorithm,
> which could start to allocate IOVAs from 0. I think it could solve the
> --no-huge case your series addresses, do you agree?
>
> But it would be a long-term solution, and we need to fix this in stable.
>
> Is the --no-huge option used in production, or is it only for testing?
> If the latter, do you think we could revert your patch while we find a
> solution that makes all cases work?

It seems that we can get Intel IOMMU's Guest Address Width from the
sysfs, as the CAP register is exposed.

So we can get the SAGAW value (see [1], page 217):

On bare metal:
# echo $(((0x`cat /sys/class/iommu/dmar0/intel-iommu/cap` >> 8) & 0x1f))
4
=> 48 bits

In guest:
# echo $(((0x`cat /sys/class/iommu/dmar0/intel-iommu/cap` >> 8) & 0x1f))
2
=> 39 bits

Using this, we could allow or disallow the VA mode when using Intel IOMMU.

Any thoughts?

Regards,
Maxime

[1]: https://software.intel.com/sites/default/files/managed/c5/15/vt-directed-io-spec.pdf

> Ferruh, I see you also faced problems with KNI, how did you solve it?
>
> Thanks,
> Maxime
>
> [0]: https://bugzilla.redhat.com/show_bug.cgi?id=1530957#c3
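The shell one-liner above extracts a 5-bit field at bit 8 of the capability register. The same decoding can be sketched in C as below; the function names are hypothetical, reading the value from sysfs is left out, and only the two encodings quoted in this message (2 for 39 bits, 4 for 48 bits) are handled. Because the field is treated as a bitmask here, a value with both bits set reports the wider width; that interpretation is an assumption worth checking against the VT-d specification.

```c
#include <stdint.h>

/* Extract the 5-bit SAGAW field (bits 12:8) from a VT-d capability value,
 * mirroring the ">> 8 & 0x1f" in the shell command above. */
static unsigned int
vtd_cap_sagaw(uint64_t cap)
{
	return (unsigned int)((cap >> 8) & 0x1f);
}

/* Widest address width (in bits) among the encodings mentioned in the
 * thread, or 0 if neither of them is advertised. */
static unsigned int
vtd_max_agaw_bits(uint64_t cap)
{
	unsigned int sagaw = vtd_cap_sagaw(cap);

	if (sagaw & 0x4)
		return 48; /* value 4 in the thread: 48 bits */
	if (sagaw & 0x2)
		return 39; /* value 2 in the thread: 39 bits */
	return 0;
}
```

A caller could compare the result against 48 before allowing VA mode, which is the policy floated in this message.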
* Re: [dpdk-dev] [PATCH 2/2] net: enable IOVA mode for PMDs
From: santosh @ 2018-01-05 12:10 UTC
To: Maxime Coquelin, Jianfeng Tan, dev, ferruh.yigit
Cc: sergio.gonzalez.monroy, thomas, Peter Xu

Hi Maxime,

On Friday 05 January 2018 04:02 PM, Maxime Coquelin wrote:
> Hi Jianfeng,
>
> On 10/11/2017 12:33 PM, Jianfeng Tan wrote:
>> If we want to enable IOVA mode, introduced by
>> commit 93878cf0255e ("eal: introduce helper API for IOVA mode"),
>> we need PMDs (for PCI devices) to expose this flag.
>>
>> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
>> ---

[...]

> Ferruh, I see you also faced problems with KNI, how did you solve it?
>
By checking lsmod for the rte_kni module and, if found, setting .iova_mode = _pa; refer to [1].
You may follow a similar approach, meaning detect emulation mode. Or if not, the
other way is to introduce an --iova-mode=<> EAL arg.

[1] http://dpdk.org/browse/dpdk/tree/lib/librte_eal/linuxapp/eal/eal.c#n810

Thanks.

> Thanks,
> Maxime
>
> [0]: https://bugzilla.redhat.com/show_bug.cgi?id=1530957#c3
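One way to approximate the "is rte_kni loaded" check described above is to look for the module's directory under sysfs. The sketch below is an assumption about how such a check could be written, not the code at the referenced eal.c line: `kmod_is_loaded()` is a hypothetical helper, and the sysfs root is passed in as a parameter so the function can be exercised against any directory.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Return 1 if <sysfs_module_dir>/<name> exists and is a directory, else 0.
 * On Linux a loaded module normally appears as /sys/module/<name>. */
static int
kmod_is_loaded(const char *sysfs_module_dir, const char *name)
{
	char path[256];
	struct stat st;
	int n;

	n = snprintf(path, sizeof(path), "%s/%s", sysfs_module_dir, name);
	if (n < 0 || (size_t)n >= sizeof(path))
		return 0; /* formatting error or truncated path */
	return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}
```

A caller would then fall back to PA mode when `kmod_is_loaded("/sys/module", "rte_kni")` returns 1.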
* Re: [dpdk-dev] [PATCH 2/2] net: enable IOVA mode for PMDs
From: Maxime Coquelin @ 2018-01-05 12:57 UTC
To: santosh, Jianfeng Tan, dev, ferruh.yigit
Cc: sergio.gonzalez.monroy, thomas, Peter Xu

Hi Santosh,

On 01/05/2018 01:10 PM, santosh wrote:
> Hi Maxime,
>
>
> On Friday 05 January 2018 04:02 PM, Maxime Coquelin wrote:
>> Hi Jianfeng,
>>
>> On 10/11/2017 12:33 PM, Jianfeng Tan wrote:
>>> If we want to enable IOVA mode, introduced by
>>> commit 93878cf0255e ("eal: introduce helper API for IOVA mode"),
>>> we need PMDs (for PCI devices) to expose this flag.
>>>
>>> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
>>> ---
>
> [...]
>
>> Ferruh, I see you also faced problems with KNI, how did you solve it?
>>
> By checking lsmod for the rte_kni module and, if found, setting .iova_mode = _pa; refer to [1].
> You may follow a similar approach, meaning detect emulation mode. Or if not, the
> other way is to introduce an --iova-mode=<> EAL arg.

Thanks for the information.

Detecting whether we are in host or guest is not that trivial, and as
Peter pointed out to me, the VT-d spec specifies the 39-bit guest address
width, so there are certainly some processors in the wild using it.

And I don't think introducing a new EAL arg in -stable is a good idea.
If this is the only solution, then we should keep PA by default.

When using Intel IOMMU, I think the best solution is to forbid VA mode
if GAW is 39 bits.

Regards,
Maxime

> [1] http://dpdk.org/browse/dpdk/tree/lib/librte_eal/linuxapp/eal/eal.c#n810
>
> Thanks.
>
>> Thanks,
>> Maxime
>>
>> [0]: https://bugzilla.redhat.com/show_bug.cgi?id=1530957#c3
>
* Re: [dpdk-dev] [PATCH 0/2] enable 4KB + VFIO-PCI
From: Burakov, Anatoly @ 2017-10-11 10:47 UTC
To: Jianfeng Tan, dev
Cc: santosh.shukla, sergio.gonzalez.monroy, thomas, ferruh.yigit

On 11-Oct-17 11:33 AM, Jianfeng Tan wrote:
> Patch 1: Use VA as IOVA if IOVA mode is enabled.
> Patch 2: Enable IOVA mode for the PMDs for Intel NICs.
>
> How to test:
>
> $ (bind nic to vfio-pci)
> $ testpmd -c 0x3 -n 4 -m 2048 --no-huge -- -i --no-numa
>
> Jianfeng Tan (2):
>   eal: honor IOVA mode for no-huge case
>   net: enable IOVA mode for PMDs
>
>  drivers/net/e1000/em_ethdev.c            | 3 ++-
>  drivers/net/e1000/igb_ethdev.c           | 5 +++--
>  drivers/net/fm10k/fm10k_ethdev.c         | 3 ++-
>  drivers/net/i40e/i40e_ethdev.c           | 3 ++-
>  drivers/net/i40e/i40e_ethdev_vf.c        | 2 +-
>  drivers/net/ixgbe/ixgbe_ethdev.c         | 5 +++--
>  lib/librte_eal/linuxapp/eal/eal_memory.c | 5 ++++-
>  7 files changed, 17 insertions(+), 9 deletions(-)
>

The patchset should probably mention its dependency on IOVA patches from
Santosh.

--
Thanks,
Anatoly
* Re: [dpdk-dev] [PATCH 0/2] enable 4KB + VFIO-PCI
From: Thomas Monjalon @ 2017-10-11 10:50 UTC
To: Burakov, Anatoly
Cc: Jianfeng Tan, dev, santosh.shukla, sergio.gonzalez.monroy, ferruh.yigit

11/10/2017 12:47, Burakov, Anatoly:
> On 11-Oct-17 11:33 AM, Jianfeng Tan wrote:
> > Patch 1: Use VA as IOVA if IOVA mode is enabled.
> > Patch 2: Enable IOVA mode for the PMDs for Intel NICs.
[...]
>
> The patchset should probably mention its dependency on IOVA patches from
> Santosh.

No need because IOVA patches are merged.
* Re: [dpdk-dev] [PATCH 0/2] enable 4KB + VFIO-PCI
From: Ferruh Yigit @ 2017-10-12 19:57 UTC
To: Jianfeng Tan, dev
Cc: santosh.shukla, sergio.gonzalez.monroy, thomas

On 10/11/2017 11:33 AM, Jianfeng Tan wrote:
> Patch 1: Use VA as IOVA if IOVA mode is enabled.
> Patch 2: Enable IOVA mode for the PMDs for Intel NICs.
>
> How to test:
>
> $ (bind nic to vfio-pci)
> $ testpmd -c 0x3 -n 4 -m 2048 --no-huge -- -i --no-numa
>
> Jianfeng Tan (2):
>   eal: honor IOVA mode for no-huge case
>   net: enable IOVA mode for PMDs

Series applied to dpdk/master, thanks.
end of thread, other threads:[~2018-01-05 12:58 UTC | newest]

Thread overview: 19+ messages
2017-10-11 10:33 [dpdk-dev] [PATCH 0/2] enable 4KB + VFIO-PCI Jianfeng Tan
2017-10-11 10:33 ` [dpdk-dev] [PATCH 1/2] eal: honor IOVA mode for no-huge case Jianfeng Tan
2017-10-11 11:27   ` Burakov, Anatoly
2017-10-11 11:30   ` santosh
2017-10-31 21:49   ` Ferruh Yigit
2017-10-31 22:37     ` Ferruh Yigit
2017-11-01  1:10       ` Ferruh Yigit
2017-10-11 10:33 ` [dpdk-dev] [PATCH 2/2] net: enable IOVA mode for PMDs Jianfeng Tan
2017-10-11 10:43   ` Burakov, Anatoly
2017-10-11 10:56     ` Tan, Jianfeng
2017-10-11 11:30   ` Burakov, Anatoly
2017-10-11 11:33   ` santosh
2018-01-05 10:32   ` Maxime Coquelin
2018-01-05 12:04     ` Maxime Coquelin
2018-01-05 12:10     ` santosh
2018-01-05 12:57       ` Maxime Coquelin
2017-10-11 10:47 ` [dpdk-dev] [PATCH 0/2] enable 4KB + VFIO-PCI Burakov, Anatoly
2017-10-11 10:50   ` Thomas Monjalon
2017-10-12 19:57 ` Ferruh Yigit