DPDK patches and discussions
* [dpdk-dev] [PATCH 0/2] enable 4KB + VFIO-PCI
@ 2017-10-11 10:33 Jianfeng Tan
  2017-10-11 10:33 ` [dpdk-dev] [PATCH 1/2] eal: honor IOVA mode for no-huge case Jianfeng Tan
                   ` (3 more replies)
  0 siblings, 4 replies; 19+ messages in thread
From: Jianfeng Tan @ 2017-10-11 10:33 UTC (permalink / raw)
  To: dev
  Cc: santosh.shukla, sergio.gonzalez.monroy, thomas, ferruh.yigit,
	Jianfeng Tan

Patch 1: Use VA as IOVA in the no-huge case when the IOVA mode is VA (RTE_IOVA_VA).
Patch 2: Enable IOVA as VA mode for the Intel NIC PMDs.
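
For readers skimming the series, the decision made in patch 1 can be sketched
as a small standalone function. This is only an illustration: the enum, the
BAD_PHYS_ADDR macro and the function name below are stand-ins invented for the
example, not the EAL definitions.

```c
#include <stdint.h>

/* Stand-ins for the EAL definitions (hypothetical names, for illustration). */
enum iova_mode { IOVA_PA, IOVA_VA };
#define BAD_PHYS_ADDR UINT64_MAX

/*
 * Pick the address recorded for the single 4KB-page memory segment that
 * is created when running with --no-huge.
 */
static uint64_t
nohuge_segment_iova(enum iova_mode mode, const void *addr)
{
	/* In VA mode the device can use the process virtual address. */
	if (mode == IOVA_VA)
		return (uint64_t)(uintptr_t)addr;

	/* Otherwise no usable physical address is known for 4KB pages. */
	return BAD_PHYS_ADDR;
}
```

With the mode set to IOVA_VA the segment gets a usable address, which is what
lets a vfio-pci bound NIC work without hugepages.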

How to test:

$ (bind nic to vfio-pci)
$ testpmd -c 0x3 -n 4 -m 2048 --no-huge -- -i --no-numa

Jianfeng Tan (2):
  eal: honor IOVA mode for no-huge case
  net: enable IOVA mode for PMDs

 drivers/net/e1000/em_ethdev.c            | 3 ++-
 drivers/net/e1000/igb_ethdev.c           | 5 +++--
 drivers/net/fm10k/fm10k_ethdev.c         | 3 ++-
 drivers/net/i40e/i40e_ethdev.c           | 3 ++-
 drivers/net/i40e/i40e_ethdev_vf.c        | 2 +-
 drivers/net/ixgbe/ixgbe_ethdev.c         | 5 +++--
 lib/librte_eal/linuxapp/eal/eal_memory.c | 5 ++++-
 7 files changed, 17 insertions(+), 9 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [dpdk-dev] [PATCH 1/2] eal: honor IOVA mode for no-huge case
  2017-10-11 10:33 [dpdk-dev] [PATCH 0/2] enable 4KB + VFIO-PCI Jianfeng Tan
@ 2017-10-11 10:33 ` Jianfeng Tan
  2017-10-11 11:27   ` Burakov, Anatoly
                     ` (2 more replies)
  2017-10-11 10:33 ` [dpdk-dev] [PATCH 2/2] net: enable IOVA mode for PMDs Jianfeng Tan
                   ` (2 subsequent siblings)
  3 siblings, 3 replies; 19+ messages in thread
From: Jianfeng Tan @ 2017-10-11 10:33 UTC (permalink / raw)
  To: dev
  Cc: santosh.shukla, sergio.gonzalez.monroy, thomas, ferruh.yigit,
	Jianfeng Tan

With the introduction of IOVA mode, the only blocker to running
with 4KB pages for NICs bound to vfio-pci is that
RTE_BAD_PHYS_ADDR is not a valid IOVA address.

Fix this by using the VA as the IOVA when the IOVA mode is RTE_IOVA_VA.

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
---
 lib/librte_eal/linuxapp/eal/eal_memory.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index 28bca49..187d338 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -1030,7 +1030,10 @@ rte_eal_hugepage_init(void)
 					strerror(errno));
 			return -1;
 		}
-		mcfg->memseg[0].phys_addr = RTE_BAD_PHYS_ADDR;
+		if (rte_eal_iova_mode() == RTE_IOVA_VA)
+			mcfg->memseg[0].phys_addr = (uintptr_t)addr;
+		else
+			mcfg->memseg[0].phys_addr = RTE_BAD_PHYS_ADDR;
 		mcfg->memseg[0].addr = addr;
 		mcfg->memseg[0].hugepage_sz = RTE_PGSIZE_4K;
 		mcfg->memseg[0].len = internal_config.memory;
-- 
2.7.4

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [dpdk-dev] [PATCH 2/2] net: enable IOVA mode for PMDs
  2017-10-11 10:33 [dpdk-dev] [PATCH 0/2] enable 4KB + VFIO-PCI Jianfeng Tan
  2017-10-11 10:33 ` [dpdk-dev] [PATCH 1/2] eal: honor IOVA mode for no-huge case Jianfeng Tan
@ 2017-10-11 10:33 ` Jianfeng Tan
  2017-10-11 10:43   ` Burakov, Anatoly
                     ` (3 more replies)
  2017-10-11 10:47 ` [dpdk-dev] [PATCH 0/2] enable 4KB + VFIO-PCI Burakov, Anatoly
  2017-10-12 19:57 ` Ferruh Yigit
  3 siblings, 4 replies; 19+ messages in thread
From: Jianfeng Tan @ 2017-10-11 10:33 UTC (permalink / raw)
  To: dev
  Cc: santosh.shukla, sergio.gonzalez.monroy, thomas, ferruh.yigit,
	Jianfeng Tan

To enable IOVA as VA mode, introduced by
commit 93878cf0255e ("eal: introduce helper API for IOVA mode"),
PMDs for PCI devices need to expose the RTE_PCI_DRV_IOVA_AS_VA flag.

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
---
 drivers/net/e1000/em_ethdev.c     | 3 ++-
 drivers/net/e1000/igb_ethdev.c    | 5 +++--
 drivers/net/fm10k/fm10k_ethdev.c  | 3 ++-
 drivers/net/i40e/i40e_ethdev.c    | 3 ++-
 drivers/net/i40e/i40e_ethdev_vf.c | 2 +-
 drivers/net/ixgbe/ixgbe_ethdev.c  | 5 +++--
 6 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/drivers/net/e1000/em_ethdev.c b/drivers/net/e1000/em_ethdev.c
index a59947d..324f051 100644
--- a/drivers/net/e1000/em_ethdev.c
+++ b/drivers/net/e1000/em_ethdev.c
@@ -432,7 +432,8 @@ static int eth_em_pci_remove(struct rte_pci_device *pci_dev)
 
 static struct rte_pci_driver rte_em_pmd = {
 	.id_table = pci_id_em_map,
-	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC |
+		     RTE_PCI_DRV_IOVA_AS_VA,
 	.probe = eth_em_pci_probe,
 	.remove = eth_em_pci_remove,
 };
diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index 040dd9f..a760011 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -1168,7 +1168,8 @@ static int eth_igb_pci_remove(struct rte_pci_device *pci_dev)
 
 static struct rte_pci_driver rte_igb_pmd = {
 	.id_table = pci_id_igb_map,
-	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC |
+		     RTE_PCI_DRV_IOVA_AS_VA,
 	.probe = eth_igb_pci_probe,
 	.remove = eth_igb_pci_remove,
 };
@@ -1191,7 +1192,7 @@ static int eth_igbvf_pci_remove(struct rte_pci_device *pci_dev)
  */
 static struct rte_pci_driver rte_igbvf_pmd = {
 	.id_table = pci_id_igbvf_map,
-	.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_IOVA_AS_VA,
 	.probe = eth_igbvf_pci_probe,
 	.remove = eth_igbvf_pci_remove,
 };
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 15ea2a5..bf36e71 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -3142,7 +3142,8 @@ static const struct rte_pci_id pci_id_fm10k_map[] = {
 
 static struct rte_pci_driver rte_pmd_fm10k = {
 	.id_table = pci_id_fm10k_map,
-	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC |
+		     RTE_PCI_DRV_IOVA_AS_VA,
 	.probe = eth_fm10k_pci_probe,
 	.remove = eth_fm10k_pci_remove,
 };
diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 536365d..f6330d2 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -654,7 +654,8 @@ static int eth_i40e_pci_remove(struct rte_pci_device *pci_dev)
 
 static struct rte_pci_driver rte_i40e_pmd = {
 	.id_table = pci_id_i40e_map,
-	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC |
+		     RTE_PCI_DRV_IOVA_AS_VA,
 	.probe = eth_i40e_pci_probe,
 	.remove = eth_i40e_pci_remove,
 };
diff --git a/drivers/net/i40e/i40e_ethdev_vf.c b/drivers/net/i40e/i40e_ethdev_vf.c
index 111ac39..4cadf83 100644
--- a/drivers/net/i40e/i40e_ethdev_vf.c
+++ b/drivers/net/i40e/i40e_ethdev_vf.c
@@ -1527,7 +1527,7 @@ static int eth_i40evf_pci_remove(struct rte_pci_device *pci_dev)
  */
 static struct rte_pci_driver rte_i40evf_pmd = {
 	.id_table = pci_id_i40evf_map,
-	.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_IOVA_AS_VA,
 	.probe = eth_i40evf_pci_probe,
 	.remove = eth_i40evf_pci_remove,
 };
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index a7d7acc..6ad28b3 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -1781,7 +1781,8 @@ static int eth_ixgbe_pci_remove(struct rte_pci_device *pci_dev)
 
 static struct rte_pci_driver rte_ixgbe_pmd = {
 	.id_table = pci_id_ixgbe_map,
-	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC |
+		     RTE_PCI_DRV_IOVA_AS_VA,
 	.probe = eth_ixgbe_pci_probe,
 	.remove = eth_ixgbe_pci_remove,
 };
@@ -1803,7 +1804,7 @@ static int eth_ixgbevf_pci_remove(struct rte_pci_device *pci_dev)
  */
 static struct rte_pci_driver rte_ixgbevf_pmd = {
 	.id_table = pci_id_ixgbevf_map,
-	.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_IOVA_AS_VA,
 	.probe = eth_ixgbevf_pci_probe,
 	.remove = eth_ixgbevf_pci_remove,
 };
-- 
2.7.4

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [dpdk-dev] [PATCH 2/2] net: enable IOVA mode for PMDs
  2017-10-11 10:33 ` [dpdk-dev] [PATCH 2/2] net: enable IOVA mode for PMDs Jianfeng Tan
@ 2017-10-11 10:43   ` Burakov, Anatoly
  2017-10-11 10:56     ` Tan, Jianfeng
  2017-10-11 11:30   ` Burakov, Anatoly
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 19+ messages in thread
From: Burakov, Anatoly @ 2017-10-11 10:43 UTC (permalink / raw)
  To: Jianfeng Tan, dev
  Cc: santosh.shukla, sergio.gonzalez.monroy, thomas, ferruh.yigit

On 11-Oct-17 11:33 AM, Jianfeng Tan wrote:
> If we want to enable IOVA mode, introduced by
> commit 93878cf0255e ("eal: introduce helper API for IOVA mode"),
> we need PMDs (for PCI devices) to expose this flag.
> 
> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
> ---

Is this the complete list of drivers which need this flag? Do other 
devices (e.g. cryptodev?) need this flag?

-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [dpdk-dev] [PATCH 0/2] enable 4KB + VFIO-PCI
  2017-10-11 10:33 [dpdk-dev] [PATCH 0/2] enable 4KB + VFIO-PCI Jianfeng Tan
  2017-10-11 10:33 ` [dpdk-dev] [PATCH 1/2] eal: honor IOVA mode for no-huge case Jianfeng Tan
  2017-10-11 10:33 ` [dpdk-dev] [PATCH 2/2] net: enable IOVA mode for PMDs Jianfeng Tan
@ 2017-10-11 10:47 ` Burakov, Anatoly
  2017-10-11 10:50   ` Thomas Monjalon
  2017-10-12 19:57 ` Ferruh Yigit
  3 siblings, 1 reply; 19+ messages in thread
From: Burakov, Anatoly @ 2017-10-11 10:47 UTC (permalink / raw)
  To: Jianfeng Tan, dev
  Cc: santosh.shukla, sergio.gonzalez.monroy, thomas, ferruh.yigit

On 11-Oct-17 11:33 AM, Jianfeng Tan wrote:
> Patch 1: Use VA as IOVA if IOVA mode is enabled.
> Patch 2: Enable IOVA mode for the PMDs for Intel NICs.
> 
> How to test:
> 
> $ (bind nic to vfio-pci)
> $ testpmd -c 0x3 -n 4 -m 2048 --no-huge -- -i --no-numa
> 
> Jianfeng Tan (2):
>    eal: honor IOVA mode for no-huge case
>    net: enable IOVA mode for PMDs
> 
>   drivers/net/e1000/em_ethdev.c            | 3 ++-
>   drivers/net/e1000/igb_ethdev.c           | 5 +++--
>   drivers/net/fm10k/fm10k_ethdev.c         | 3 ++-
>   drivers/net/i40e/i40e_ethdev.c           | 3 ++-
>   drivers/net/i40e/i40e_ethdev_vf.c        | 2 +-
>   drivers/net/ixgbe/ixgbe_ethdev.c         | 5 +++--
>   lib/librte_eal/linuxapp/eal/eal_memory.c | 5 ++++-
>   7 files changed, 17 insertions(+), 9 deletions(-)
> 

The patchset should probably mention its dependency on IOVA patches from 
Santosh.

-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [dpdk-dev] [PATCH 0/2] enable 4KB + VFIO-PCI
  2017-10-11 10:47 ` [dpdk-dev] [PATCH 0/2] enable 4KB + VFIO-PCI Burakov, Anatoly
@ 2017-10-11 10:50   ` Thomas Monjalon
  0 siblings, 0 replies; 19+ messages in thread
From: Thomas Monjalon @ 2017-10-11 10:50 UTC (permalink / raw)
  To: Burakov, Anatoly
  Cc: Jianfeng Tan, dev, santosh.shukla, sergio.gonzalez.monroy, ferruh.yigit

11/10/2017 12:47, Burakov, Anatoly:
> On 11-Oct-17 11:33 AM, Jianfeng Tan wrote:
> > Patch 1: Use VA as IOVA if IOVA mode is enabled.
> > Patch 2: Enable IOVA mode for the PMDs for Intel NICs.
[...]
> 
> The patchset should probably mention its dependency on IOVA patches from 
> Santosh.

No need because IOVA patches are merged.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [dpdk-dev] [PATCH 2/2] net: enable IOVA mode for PMDs
  2017-10-11 10:43   ` Burakov, Anatoly
@ 2017-10-11 10:56     ` Tan, Jianfeng
  0 siblings, 0 replies; 19+ messages in thread
From: Tan, Jianfeng @ 2017-10-11 10:56 UTC (permalink / raw)
  To: Burakov, Anatoly, dev
  Cc: santosh.shukla, sergio.gonzalez.monroy, thomas, ferruh.yigit



On 10/11/2017 6:43 PM, Burakov, Anatoly wrote:
> On 11-Oct-17 11:33 AM, Jianfeng Tan wrote:
>> If we want to enable IOVA mode, introduced by
>> commit 93878cf0255e ("eal: introduce helper API for IOVA mode"),
>> we need PMDs (for PCI devices) to expose this flag.
>>
>> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
>> ---
>
> Is this the complete list of drivers which need this flag? Do other 
> devices (e.g. cryptodev?) need this flag?

No, these are just NICs from Intel (as an example). If other NICs want 
to enable this, I'm more than happy to cover it in v2.
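
To illustrate why the flag matters per driver, here is a rough sketch. It
assumes that VA mode is only usable when every driver in use advertises the
flag; that rule, the flag values and the function name are assumptions made
for the example, not taken from the bus code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag values; the real ones are defined by the PCI bus. */
#define DRV_NEED_MAPPING 0x0001u
#define DRV_INTR_LSC     0x0008u
#define DRV_IOVA_AS_VA   0x0040u

/* Return true only if every driver in the list advertises IOVA as VA. */
static bool
all_drivers_support_va(const uint32_t *drv_flags, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!(drv_flags[i] & DRV_IOVA_AS_VA))
			return false;
	}
	return true;
}
```

Under that assumption, a single driver without the flag is enough to keep the
whole process out of VA mode, which is why the list of covered PMDs matters.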

Thanks,
Jianfeng

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [dpdk-dev] [PATCH 1/2] eal: honor IOVA mode for no-huge case
  2017-10-11 10:33 ` [dpdk-dev] [PATCH 1/2] eal: honor IOVA mode for no-huge case Jianfeng Tan
@ 2017-10-11 11:27   ` Burakov, Anatoly
  2017-10-11 11:30   ` santosh
  2017-10-31 21:49   ` Ferruh Yigit
  2 siblings, 0 replies; 19+ messages in thread
From: Burakov, Anatoly @ 2017-10-11 11:27 UTC (permalink / raw)
  To: Jianfeng Tan, dev
  Cc: santosh.shukla, sergio.gonzalez.monroy, thomas, ferruh.yigit

On 11-Oct-17 11:33 AM, Jianfeng Tan wrote:
> With the introduction of IOVA mode, the only blocker to run
> with 4KB pages for NICs binding to vfio-pci, is that
> RTE_BAD_PHYS_ADDR is not a valid IOVA address.
> 
> We can refine this by using VA as IOVA if it's IOVA mode.
> 
> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
> ---
>   lib/librte_eal/linuxapp/eal/eal_memory.c | 5 ++++-
>   1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
> index 28bca49..187d338 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_memory.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
> @@ -1030,7 +1030,10 @@ rte_eal_hugepage_init(void)
>   					strerror(errno));
>   			return -1;
>   		}
> -		mcfg->memseg[0].phys_addr = RTE_BAD_PHYS_ADDR;
> +		if (rte_eal_iova_mode() == RTE_IOVA_VA)
> +			mcfg->memseg[0].phys_addr = (uintptr_t)addr;
> +		else
> +			mcfg->memseg[0].phys_addr = RTE_BAD_PHYS_ADDR;
>   		mcfg->memseg[0].addr = addr;
>   		mcfg->memseg[0].hugepage_sz = RTE_PGSIZE_4K;
>   		mcfg->memseg[0].len = internal_config.memory;
> 
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [dpdk-dev] [PATCH 2/2] net: enable IOVA mode for PMDs
  2017-10-11 10:33 ` [dpdk-dev] [PATCH 2/2] net: enable IOVA mode for PMDs Jianfeng Tan
  2017-10-11 10:43   ` Burakov, Anatoly
@ 2017-10-11 11:30   ` Burakov, Anatoly
  2017-10-11 11:33   ` santosh
  2018-01-05 10:32   ` Maxime Coquelin
  3 siblings, 0 replies; 19+ messages in thread
From: Burakov, Anatoly @ 2017-10-11 11:30 UTC (permalink / raw)
  To: Jianfeng Tan, dev
  Cc: santosh.shukla, sergio.gonzalez.monroy, thomas, ferruh.yigit

On 11-Oct-17 11:33 AM, Jianfeng Tan wrote:
> If we want to enable IOVA mode, introduced by
> commit 93878cf0255e ("eal: introduce helper API for IOVA mode"),
> we need PMDs (for PCI devices) to expose this flag.
> 
> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
> ---

Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [dpdk-dev] [PATCH 1/2] eal: honor IOVA mode for no-huge case
  2017-10-11 10:33 ` [dpdk-dev] [PATCH 1/2] eal: honor IOVA mode for no-huge case Jianfeng Tan
  2017-10-11 11:27   ` Burakov, Anatoly
@ 2017-10-11 11:30   ` santosh
  2017-10-31 21:49   ` Ferruh Yigit
  2 siblings, 0 replies; 19+ messages in thread
From: santosh @ 2017-10-11 11:30 UTC (permalink / raw)
  To: Jianfeng Tan, dev; +Cc: sergio.gonzalez.monroy, thomas, ferruh.yigit


On Wednesday 11 October 2017 04:03 PM, Jianfeng Tan wrote:
> With the introduction of IOVA mode, the only blocker to run
> with 4KB pages for NICs binding to vfio-pci, is that
> RTE_BAD_PHYS_ADDR is not a valid IOVA address.
>
> We can refine this by using VA as IOVA if it's IOVA mode.
>
> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
> ---

Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [dpdk-dev] [PATCH 2/2] net: enable IOVA mode for PMDs
  2017-10-11 10:33 ` [dpdk-dev] [PATCH 2/2] net: enable IOVA mode for PMDs Jianfeng Tan
  2017-10-11 10:43   ` Burakov, Anatoly
  2017-10-11 11:30   ` Burakov, Anatoly
@ 2017-10-11 11:33   ` santosh
  2018-01-05 10:32   ` Maxime Coquelin
  3 siblings, 0 replies; 19+ messages in thread
From: santosh @ 2017-10-11 11:33 UTC (permalink / raw)
  To: Jianfeng Tan, dev; +Cc: sergio.gonzalez.monroy, thomas, ferruh.yigit


On Wednesday 11 October 2017 04:03 PM, Jianfeng Tan wrote:
> If we want to enable IOVA mode, introduced by
> commit 93878cf0255e ("eal: introduce helper API for IOVA mode"),
> we need PMDs (for PCI devices) to expose this flag.
>
> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
> ---

Reviewed-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [dpdk-dev] [PATCH 0/2] enable 4KB + VFIO-PCI
  2017-10-11 10:33 [dpdk-dev] [PATCH 0/2] enable 4KB + VFIO-PCI Jianfeng Tan
                   ` (2 preceding siblings ...)
  2017-10-11 10:47 ` [dpdk-dev] [PATCH 0/2] enable 4KB + VFIO-PCI Burakov, Anatoly
@ 2017-10-12 19:57 ` Ferruh Yigit
  3 siblings, 0 replies; 19+ messages in thread
From: Ferruh Yigit @ 2017-10-12 19:57 UTC (permalink / raw)
  To: Jianfeng Tan, dev; +Cc: santosh.shukla, sergio.gonzalez.monroy, thomas

On 10/11/2017 11:33 AM, Jianfeng Tan wrote:
> Patch 1: Use VA as IOVA if IOVA mode is enabled.
> Patch 2: Enable IOVA mode for the PMDs for Intel NICs.
> 
> How to test:
> 
> $ (bind nic to vfio-pci)
> $ testpmd -c 0x3 -n 4 -m 2048 --no-huge -- -i --no-numa
> 
> Jianfeng Tan (2):
>   eal: honor IOVA mode for no-huge case
>   net: enable IOVA mode for PMDs

Series applied to dpdk/master, thanks.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [dpdk-dev] [PATCH 1/2] eal: honor IOVA mode for no-huge case
  2017-10-11 10:33 ` [dpdk-dev] [PATCH 1/2] eal: honor IOVA mode for no-huge case Jianfeng Tan
  2017-10-11 11:27   ` Burakov, Anatoly
  2017-10-11 11:30   ` santosh
@ 2017-10-31 21:49   ` Ferruh Yigit
  2017-10-31 22:37     ` Ferruh Yigit
  2 siblings, 1 reply; 19+ messages in thread
From: Ferruh Yigit @ 2017-10-31 21:49 UTC (permalink / raw)
  To: Jianfeng Tan, dev; +Cc: santosh.shukla, sergio.gonzalez.monroy, thomas

On 10/11/2017 3:33 AM, Jianfeng Tan wrote:
> With the introduction of IOVA mode, the only blocker to run
> with 4KB pages for NICs binding to vfio-pci, is that
> RTE_BAD_PHYS_ADDR is not a valid IOVA address.
> 
> We can refine this by using VA as IOVA if it's IOVA mode.
> 
> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
> ---
>  lib/librte_eal/linuxapp/eal/eal_memory.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
> index 28bca49..187d338 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_memory.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
> @@ -1030,7 +1030,10 @@ rte_eal_hugepage_init(void)
>  					strerror(errno));
>  			return -1;
>  		}
> -		mcfg->memseg[0].phys_addr = RTE_BAD_PHYS_ADDR;
> +		if (rte_eal_iova_mode() == RTE_IOVA_VA)
> +			mcfg->memseg[0].phys_addr = (uintptr_t)addr;
> +		else
> +			mcfg->memseg[0].phys_addr = RTE_BAD_PHYS_ADDR;

This breaks KNI, which requires physical addresses.

Any idea how to disable RTE_IOVA_VA when KNI is used?

>  		mcfg->memseg[0].addr = addr;
>  		mcfg->memseg[0].hugepage_sz = RTE_PGSIZE_4K;
>  		mcfg->memseg[0].len = internal_config.memory;
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [dpdk-dev] [PATCH 1/2] eal: honor IOVA mode for no-huge case
  2017-10-31 21:49   ` Ferruh Yigit
@ 2017-10-31 22:37     ` Ferruh Yigit
  2017-11-01  1:10       ` Ferruh Yigit
  0 siblings, 1 reply; 19+ messages in thread
From: Ferruh Yigit @ 2017-10-31 22:37 UTC (permalink / raw)
  To: Jianfeng Tan, dev; +Cc: santosh.shukla, sergio.gonzalez.monroy, thomas

On 10/31/2017 2:49 PM, Ferruh Yigit wrote:
> On 10/11/2017 3:33 AM, Jianfeng Tan wrote:
>> With the introduction of IOVA mode, the only blocker to run
>> with 4KB pages for NICs binding to vfio-pci, is that
>> RTE_BAD_PHYS_ADDR is not a valid IOVA address.
>>
>> We can refine this by using VA as IOVA if it's IOVA mode.
>>
>> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
>> ---
>>  lib/librte_eal/linuxapp/eal/eal_memory.c | 5 ++++-
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
>> index 28bca49..187d338 100644
>> --- a/lib/librte_eal/linuxapp/eal/eal_memory.c
>> +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
>> @@ -1030,7 +1030,10 @@ rte_eal_hugepage_init(void)
>>  					strerror(errno));
>>  			return -1;
>>  		}
>> -		mcfg->memseg[0].phys_addr = RTE_BAD_PHYS_ADDR;
>> +		if (rte_eal_iova_mode() == RTE_IOVA_VA)
>> +			mcfg->memseg[0].phys_addr = (uintptr_t)addr;
>> +		else
>> +			mcfg->memseg[0].phys_addr = RTE_BAD_PHYS_ADDR;
> 
> This breaks KNI which requires physical address.

My bad, this patch is for the no_hugetlbfs case.

The issue is seen starting from the next patch in the set [1], which enables
IOVA mode for Intel PMDs.

With IOVA mode enabled, KNI fails.

Would it make sense to add an API that lets the application set the IOVA mode
explicitly? The application could then set the IOVA mode to PA and allocate
the memzones it requires.

[1]
http://dpdk.org/commit/f37dfab2

> 
> Any idea how to disable RTE_IOVA_VA when KNI used?
> 
>>  		mcfg->memseg[0].addr = addr;
>>  		mcfg->memseg[0].hugepage_sz = RTE_PGSIZE_4K;
>>  		mcfg->memseg[0].len = internal_config.memory;
>>
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [dpdk-dev] [PATCH 1/2] eal: honor IOVA mode for no-huge case
  2017-10-31 22:37     ` Ferruh Yigit
@ 2017-11-01  1:10       ` Ferruh Yigit
  0 siblings, 0 replies; 19+ messages in thread
From: Ferruh Yigit @ 2017-11-01  1:10 UTC (permalink / raw)
  To: Jianfeng Tan, dev; +Cc: santosh.shukla, sergio.gonzalez.monroy, thomas

On 10/31/2017 3:37 PM, Ferruh Yigit wrote:
> On 10/31/2017 2:49 PM, Ferruh Yigit wrote:
>> On 10/11/2017 3:33 AM, Jianfeng Tan wrote:
>>> With the introduction of IOVA mode, the only blocker to run
>>> with 4KB pages for NICs binding to vfio-pci, is that
>>> RTE_BAD_PHYS_ADDR is not a valid IOVA address.
>>>
>>> We can refine this by using VA as IOVA if it's IOVA mode.
>>>
>>> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
>>> ---
>>>  lib/librte_eal/linuxapp/eal/eal_memory.c | 5 ++++-
>>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
>>> index 28bca49..187d338 100644
>>> --- a/lib/librte_eal/linuxapp/eal/eal_memory.c
>>> +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
>>> @@ -1030,7 +1030,10 @@ rte_eal_hugepage_init(void)
>>>  					strerror(errno));
>>>  			return -1;
>>>  		}
>>> -		mcfg->memseg[0].phys_addr = RTE_BAD_PHYS_ADDR;
>>> +		if (rte_eal_iova_mode() == RTE_IOVA_VA)
>>> +			mcfg->memseg[0].phys_addr = (uintptr_t)addr;
>>> +		else
>>> +			mcfg->memseg[0].phys_addr = RTE_BAD_PHYS_ADDR;
>>
>> This breaks KNI which requires physical address.
> 
> My bad, this patch is for no_hugetlbfs case.
> 
> Issue seen starting from next patch in the set [1], which enables IOVA mode for
> Intel PMDs.
> 
> With IOVA mode enabled, KNI fails.
> 
> Does it make sense to add an API to set iova mode explicitly by application?
> Application can set iova to PA and allocate memzones it requires.

I added a config option to disable IOVA mode detection:
http://dpdk.org/dev/patchwork/patch/31071/

I am still concerned that this may hit someone. Since the result for KNI is a
kernel crash, it would be nice to have more solid protection here.

Any suggestions are welcome.

Thanks,
ferruh

> 
> [1]
> http://dpdk.org/commit/f37dfab2
> 
>>
>> Any idea how to disable RTE_IOVA_VA when KNI used?
>>
>>>  		mcfg->memseg[0].addr = addr;
>>>  		mcfg->memseg[0].hugepage_sz = RTE_PGSIZE_4K;
>>>  		mcfg->memseg[0].len = internal_config.memory;
>>>
>>
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [dpdk-dev] [PATCH 2/2] net: enable IOVA mode for PMDs
  2017-10-11 10:33 ` [dpdk-dev] [PATCH 2/2] net: enable IOVA mode for PMDs Jianfeng Tan
                     ` (2 preceding siblings ...)
  2017-10-11 11:33   ` santosh
@ 2018-01-05 10:32   ` Maxime Coquelin
  2018-01-05 12:04     ` Maxime Coquelin
  2018-01-05 12:10     ` santosh
  3 siblings, 2 replies; 19+ messages in thread
From: Maxime Coquelin @ 2018-01-05 10:32 UTC (permalink / raw)
  To: Jianfeng Tan, dev, santosh.shukla, ferruh.yigit
  Cc: sergio.gonzalez.monroy, thomas, Peter Xu

Hi Jianfeng,

On 10/11/2017 12:33 PM, Jianfeng Tan wrote:
> If we want to enable IOVA mode, introduced by
> commit 93878cf0255e ("eal: introduce helper API for IOVA mode"),
> we need PMDs (for PCI devices) to expose this flag.
> 
> Signed-off-by: Jianfeng Tan<jianfeng.tan@intel.com>
> ---
>   drivers/net/e1000/em_ethdev.c     | 3 ++-
>   drivers/net/e1000/igb_ethdev.c    | 5 +++--
>   drivers/net/fm10k/fm10k_ethdev.c  | 3 ++-
>   drivers/net/i40e/i40e_ethdev.c    | 3 ++-
>   drivers/net/i40e/i40e_ethdev_vf.c | 2 +-
>   drivers/net/ixgbe/ixgbe_ethdev.c  | 5 +++--
>   6 files changed, 13 insertions(+), 8 deletions(-)

This patch introduces a regression when doing device assignment in a
guest, because the current VT-d emulation only supports a 39-bit guest
address width [0].
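
To make the failure mode concrete, a minimal sketch of the width check is
shown here. The function name is made up for the example, and the sample
address is only a typical-looking x86-64 userspace virtual address, not one
taken from the bug report.

```c
#include <stdbool.h>
#include <stdint.h>

/* Return true if the IOVA can be expressed in width_bits address bits. */
static bool
iova_fits_width(uint64_t iova, unsigned int width_bits)
{
	/* Shifting a 64-bit value by 64 or more is undefined in C. */
	if (width_bits >= 64)
		return true;

	return (iova >> width_bits) == 0;
}
```

An address such as 0x7f0000000000 needs 47 bits, so it passes the check for a
48-bit width but fails it for a 39-bit one. That is the situation a VA used as
IOVA runs into behind the emulated IOMMU.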

In the BZ, Peter suggests we could have an IOVA allocator algorithm
that starts allocating IOVAs from 0. I think it could solve the
--no-huge case your series addresses, do you agree?

But that would be a long-term solution, and we need to fix this in stable.

Is the --no-huge option used in production, or is it only for testing?
If the latter, do you think we could revert your patch while we find a
solution that makes all cases work?

Ferruh, I see you also faced problems with KNI, how did you solve it?

Thanks,
Maxime

[0]: https://bugzilla.redhat.com/show_bug.cgi?id=1530957#c3

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [dpdk-dev] [PATCH 2/2] net: enable IOVA mode for PMDs
  2018-01-05 10:32   ` Maxime Coquelin
@ 2018-01-05 12:04     ` Maxime Coquelin
  2018-01-05 12:10     ` santosh
  1 sibling, 0 replies; 19+ messages in thread
From: Maxime Coquelin @ 2018-01-05 12:04 UTC (permalink / raw)
  To: Jianfeng Tan, dev, santosh.shukla, ferruh.yigit
  Cc: sergio.gonzalez.monroy, thomas, Peter Xu



On 01/05/2018 11:32 AM, Maxime Coquelin wrote:
> Hi Jianfeng,
> 
> On 10/11/2017 12:33 PM, Jianfeng Tan wrote:
>> If we want to enable IOVA mode, introduced by
>> commit 93878cf0255e ("eal: introduce helper API for IOVA mode"),
>> we need PMDs (for PCI devices) to expose this flag.
>>
>> Signed-off-by: Jianfeng Tan<jianfeng.tan@intel.com>
>> ---
>>   drivers/net/e1000/em_ethdev.c     | 3 ++-
>>   drivers/net/e1000/igb_ethdev.c    | 5 +++--
>>   drivers/net/fm10k/fm10k_ethdev.c  | 3 ++-
>>   drivers/net/i40e/i40e_ethdev.c    | 3 ++-
>>   drivers/net/i40e/i40e_ethdev_vf.c | 2 +-
>>   drivers/net/ixgbe/ixgbe_ethdev.c  | 5 +++--
>>   6 files changed, 13 insertions(+), 8 deletions(-)
> 
> This patch introduces a regression when doing device assignment in
> guest, because current VT-d emulation only supports 39bits guest address
> width [0].
> 
> In the Bz, Peter suggest we could have an IOVA allocator algorithm,
> which could start to allocate IOVAs from 0. I think it could solve the
> --no-huge case your series address, do you agree?
> 
> But it would be a long term solution, we need to fix this in stable.
> 
> Is the --no-huge option used in production, or is it only for testing?
> If the latter do you think we could revert your patch while we find a
> solution that makes all cases to work?

It seems that we can get the Intel IOMMU's guest address width from
sysfs, as the CAP register is exposed there.

So we can get the SAGAW value (see [1], page 217):

On Bare Metal:
# echo $(((0x`cat /sys/class/iommu/dmar0/intel-iommu/cap` >> 8) & 0x1f))
4
=> 48bits

In guest:
# echo $(((0x`cat /sys/class/iommu/dmar0/intel-iommu/cap` >> 8) & 0x1f))
2
=> 39bits

Using this, we could decide whether or not to allow VA mode when using the
Intel IOMMU. Any thoughts?
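
As a rough sketch only, the same decoding could be done in C along these
lines. The function names are invented for the example, and only the two
SAGAW values shown above (2 and 4) are mapped; treating SAGAW as a bitmask in
which several widths may be set at once is my reading of the spec.

```c
#include <stdint.h>
#include <stdlib.h>

/* Parse the CAP register as printed by sysfs: hex digits, no 0x prefix. */
static uint64_t
parse_cap_hex(const char *s)
{
	return (uint64_t)strtoull(s, NULL, 16);
}

/* Extract SAGAW using the same shift and mask as the shell one-liner. */
static unsigned int
cap_to_sagaw(uint64_t cap)
{
	return (unsigned int)((cap >> 8) & 0x1f);
}

/*
 * Map SAGAW to the widest supported address width in bits.
 * Only the two encodings seen in this thread are handled;
 * 0 means "not covered by this sketch".
 */
static unsigned int
sagaw_to_bits(unsigned int sagaw)
{
	if (sagaw & 0x4)
		return 48;
	if (sagaw & 0x2)
		return 39;
	return 0;
}
```

For example, a synthetic CAP string of "400" decodes to SAGAW 4 and therefore
48 bits, while "200" decodes to SAGAW 2 and 39 bits.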

Regards,
Maxime

[1]: 
https://software.intel.com/sites/default/files/managed/c5/15/vt-directed-io-spec.pdf

> Ferruh, I see you also faced problems with KNI, how did you solved it?
> 
> Thanks,
> Maxime
> 
> [0]: https://bugzilla.redhat.com/show_bug.cgi?id=1530957#c3

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [dpdk-dev] [PATCH 2/2] net: enable IOVA mode for PMDs
  2018-01-05 10:32   ` Maxime Coquelin
  2018-01-05 12:04     ` Maxime Coquelin
@ 2018-01-05 12:10     ` santosh
  2018-01-05 12:57       ` Maxime Coquelin
  1 sibling, 1 reply; 19+ messages in thread
From: santosh @ 2018-01-05 12:10 UTC (permalink / raw)
  To: Maxime Coquelin, Jianfeng Tan, dev, ferruh.yigit
  Cc: sergio.gonzalez.monroy, thomas, Peter Xu

Hi Maxime,


On Friday 05 January 2018 04:02 PM, Maxime Coquelin wrote:
> Hi Jianfeng,
>
> On 10/11/2017 12:33 PM, Jianfeng Tan wrote:
>> If we want to enable IOVA mode, introduced by
>> commit 93878cf0255e ("eal: introduce helper API for IOVA mode"),
>> we need PMDs (for PCI devices) to expose this flag.
>>
>> Signed-off-by: Jianfeng Tan<jianfeng.tan@intel.com>
>> ---

[...]

> Ferruh, I see you also faced problems with KNI, how did you solved it?
>
By checking lsmod for the rte_kni module and, if it is found, setting
.iova_mode = _pa; refer to [1]. You may follow a similar approach, meaning
detect emulation mode, or otherwise introduce an --iova-mode=<> EAL argument.
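
A rough sketch of that kind of check follows. It is not the code at [1]: the
helper names are invented here, and it assumes a loaded module can be detected
by looking for its directory under a sysfs root such as /sys/module.

```c
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>

/* Stand-in for the EAL enum (hypothetical names). */
enum iova_mode { IOVA_PA, IOVA_VA };

/*
 * Return true if <sysfs_root>/<name> exists and is a directory.
 * sysfs_root would normally be "/sys/module"; it is a parameter so
 * the helper can be exercised against any directory tree.
 */
static bool
module_loaded(const char *sysfs_root, const char *name)
{
	char path[256];
	struct stat st;
	int n = snprintf(path, sizeof(path), "%s/%s", sysfs_root, name);

	/* Treat formatting errors and truncated paths as "not loaded". */
	if (n < 0 || (size_t)n >= sizeof(path))
		return false;

	return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}

/* Fall back to PA whenever the KNI kernel module is present. */
static enum iova_mode
adjust_iova_mode(enum iova_mode detected, bool kni_loaded)
{
	return kni_loaded ? IOVA_PA : detected;
}
```

A caller would combine the two, for instance
adjust_iova_mode(mode, module_loaded("/sys/module", "rte_kni")).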

[1] http://dpdk.org/browse/dpdk/tree/lib/librte_eal/linuxapp/eal/eal.c#n810

Thanks.

> Thanks,
> Maxime
>
> [0]: https://bugzilla.redhat.com/show_bug.cgi?id=1530957#c3

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [dpdk-dev] [PATCH 2/2] net: enable IOVA mode for PMDs
  2018-01-05 12:10     ` santosh
@ 2018-01-05 12:57       ` Maxime Coquelin
  0 siblings, 0 replies; 19+ messages in thread
From: Maxime Coquelin @ 2018-01-05 12:57 UTC (permalink / raw)
  To: santosh, Jianfeng Tan, dev, ferruh.yigit
  Cc: sergio.gonzalez.monroy, thomas, Peter Xu

Hi Santosh,

On 01/05/2018 01:10 PM, santosh wrote:
> Hi Maxim,
> 
> 
> On Friday 05 January 2018 04:02 PM, Maxime Coquelin wrote:
>> Hi Jianfeng,
>>
>> On 10/11/2017 12:33 PM, Jianfeng Tan wrote:
>>> If we want to enable IOVA mode, introduced by
>>> commit 93878cf0255e ("eal: introduce helper API for IOVA mode"),
>>> we need PMDs (for PCI devices) to expose this flag.
>>>
>>> Signed-off-by: Jianfeng Tan<jianfeng.tan@intel.com>
>>> ---
> 
> [...]
> 
>> Ferruh, I see you also faced problems with KNI, how did you solved it?
>>
> By checking lsmod for rte_kni module and if found then set .iova_mode = _pa, refer [1].
> You may follow similar approach.. meaning detect emulation mode Or if not then
> other-way to introduce --iova-mode=<> eal arg.

Thanks for the information.

Detecting whether we are in a host or a guest is not that trivial and, as
Peter pointed out to me, the VT-d spec defines the 39-bit guest address
width, so there are certainly some processors in the wild using it.

And I don't think introducing a new EAL arg in -stable is a good idea.
If this is the only solution, then we should keep PA by default.

When using the Intel IOMMU, I think the best solution is to forbid VA mode
if the GAW is 39 bits.

Regards,
Maxime

> [1] http://dpdk.org/browse/dpdk/tree/lib/librte_eal/linuxapp/eal/eal.c#n810
> 
> Thanks.
> 
>> Thanks,
>> Maxime
>>
>> [0]: https://bugzilla.redhat.com/show_bug.cgi?id=1530957#c3
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

Thread overview: 19+ messages
2017-10-11 10:33 [dpdk-dev] [PATCH 0/2] enable 4KB + VFIO-PCI Jianfeng Tan
2017-10-11 10:33 ` [dpdk-dev] [PATCH 1/2] eal: honor IOVA mode for no-huge case Jianfeng Tan
2017-10-11 11:27   ` Burakov, Anatoly
2017-10-11 11:30   ` santosh
2017-10-31 21:49   ` Ferruh Yigit
2017-10-31 22:37     ` Ferruh Yigit
2017-11-01  1:10       ` Ferruh Yigit
2017-10-11 10:33 ` [dpdk-dev] [PATCH 2/2] net: enable IOVA mode for PMDs Jianfeng Tan
2017-10-11 10:43   ` Burakov, Anatoly
2017-10-11 10:56     ` Tan, Jianfeng
2017-10-11 11:30   ` Burakov, Anatoly
2017-10-11 11:33   ` santosh
2018-01-05 10:32   ` Maxime Coquelin
2018-01-05 12:04     ` Maxime Coquelin
2018-01-05 12:10     ` santosh
2018-01-05 12:57       ` Maxime Coquelin
2017-10-11 10:47 ` [dpdk-dev] [PATCH 0/2] enable 4KB + VFIO-PCI Burakov, Anatoly
2017-10-11 10:50   ` Thomas Monjalon
2017-10-12 19:57 ` Ferruh Yigit
