DPDK patches and discussions
 help / color / mirror / Atom feed
* [PATCH] vfio: do not coalesce DMA mappings
@ 2022-12-30  9:58 Nipun Gupta
  2023-01-04  5:19 ` [PATCH v2] " Nipun Gupta
  2023-06-29  8:21 ` [PATCH] " Ding, Xuan
  0 siblings, 2 replies; 30+ messages in thread
From: Nipun Gupta @ 2022-12-30  9:58 UTC (permalink / raw)
  To: dev, thomas, anatoly.burakov, ferruh.yigit; +Cc: nikhil.agarwal, Nipun Gupta

At the cleanup time when dma unmap is done, linux kernel
does not allow unmap of individual segments which were
coalesced together while creating the DMA map for type1 IOMMU
mappings. So, this change updates the mapping of the memory
segments(hugepages) on a per-page basis.

Signed-off-by: Nipun Gupta <nipun.gupta@amd.com>
---

When hotplug of devices is used, multiple pages gets colaeced
and a single mapping gets created for these pages (using APIs
rte_memseg_contig_walk() and type1_map_contig(). On the cleanup
time when the memory is released, the VFIO does not cleans up
that memory and following error is observed in the eal for 2MB
hugepages:
EAL: Unexpected size 0 of DMA remapping cleared instead of 2097152

This is because VFIO does not clear the DMA (refer API
vfio_dma_do_unmap() -
https://elixir.bootlin.com/linux/latest/source/drivers/vfio/vfio_iommu_type1.c#L1330),
where it checks the dma mapping where it checks for IOVA to free:
https://elixir.bootlin.com/linux/latest/source/drivers/vfio/vfio_iommu_type1.c#L1418.

Thus this change updates the mapping to be created individually
instead of colaecing them.

 lib/eal/linux/eal_vfio.c | 29 -----------------------------
 1 file changed, 29 deletions(-)

diff --git a/lib/eal/linux/eal_vfio.c b/lib/eal/linux/eal_vfio.c
index 549b86ae1d..56edccb0db 100644
--- a/lib/eal/linux/eal_vfio.c
+++ b/lib/eal/linux/eal_vfio.c
@@ -1369,19 +1369,6 @@ rte_vfio_get_group_num(const char *sysfs_base,
 	return 1;
 }
 
-static int
-type1_map_contig(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
-		size_t len, void *arg)
-{
-	int *vfio_container_fd = arg;
-
-	if (msl->external)
-		return 0;
-
-	return vfio_type1_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
-			len, 1);
-}
-
 static int
 type1_map(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
 		void *arg)
@@ -1396,10 +1383,6 @@ type1_map(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
 	if (ms->iova == RTE_BAD_IOVA)
 		return 0;
 
-	/* if IOVA mode is VA, we've already mapped the internal segments */
-	if (!msl->external && rte_eal_iova_mode() == RTE_IOVA_VA)
-		return 0;
-
 	return vfio_type1_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
 			ms->len, 1);
 }
@@ -1464,18 +1447,6 @@ vfio_type1_dma_mem_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova,
 static int
 vfio_type1_dma_map(int vfio_container_fd)
 {
-	if (rte_eal_iova_mode() == RTE_IOVA_VA) {
-		/* with IOVA as VA mode, we can get away with mapping contiguous
-		 * chunks rather than going page-by-page.
-		 */
-		int ret = rte_memseg_contig_walk(type1_map_contig,
-				&vfio_container_fd);
-		if (ret)
-			return ret;
-		/* we have to continue the walk because we've skipped the
-		 * external segments during the config walk.
-		 */
-	}
 	return rte_memseg_walk(type1_map, &vfio_container_fd);
 }
 
-- 
2.25.1


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v2] vfio: do not coalesce DMA mappings
  2022-12-30  9:58 [PATCH] vfio: do not coalesce DMA mappings Nipun Gupta
@ 2023-01-04  5:19 ` Nipun Gupta
  2023-02-02 10:48   ` David Marchand
  2023-05-15 11:16   ` David Marchand
  2023-06-29  8:21 ` [PATCH] " Ding, Xuan
  1 sibling, 2 replies; 30+ messages in thread
From: Nipun Gupta @ 2023-01-04  5:19 UTC (permalink / raw)
  To: dev, thomas, anatoly.burakov, ferruh.yigit; +Cc: nikhil.agarwal, Nipun Gupta

At the cleanup time when dma unmap is done, linux kernel
does not allow unmap of individual segments which were
coalesced together while creating the DMA map for type1 IOMMU
mappings. So, this change updates the mapping of the memory
segments(hugepages) on a per-page basis.

Signed-off-by: Nipun Gupta <nipun.gupta@amd.com>
Signed-off-by: Nikhil Agarwal <nikhil.agarwal@amd.com>
---

Changes in v2:
- Fix checkpatch errors by updated mailmap

 .mailmap                 |  4 ++--
 lib/eal/linux/eal_vfio.c | 29 -----------------------------
 2 files changed, 2 insertions(+), 31 deletions(-)

diff --git a/.mailmap b/.mailmap
index 75884b6fe2..a234c9b3de 100644
--- a/.mailmap
+++ b/.mailmap
@@ -954,7 +954,7 @@ Nicolas Chautru <nicolas.chautru@intel.com>
 Nicolas Dichtel <nicolas.dichtel@6wind.com>
 Nicolas Harnois <nicolas.harnois@6wind.com>
 Nicolás Pernas Maradei <nico@emutex.com> <nicolas.pernas.maradei@emutex.com>
-Nikhil Agarwal <nikhil.agarwal@linaro.org>
+Nikhil Agarwal <nikhil.agarwal@amd.com> <nikhil.agarwal@xilinx.com> <nikhil.agarwal@linaro.org>
 Nikhil Jagtap <nikhil.jagtap@gmail.com>
 Nikhil Rao <nikhil.rao@intel.com>
 Nikhil Vasoya <nikhil.vasoya@chelsio.com>
@@ -962,7 +962,7 @@ Nikita Kozlov <nikita@elyzion.net>
 Niklas Söderlund <niklas.soderlund@corigine.com>
 Nikolay Nikolaev <nicknickolaev@gmail.com>
 Ning Li <muziding001@163.com> <lining18@jd.com>
-Nipun Gupta <nipun.gupta@nxp.com>
+Nipun Gupta <nipun.gupta@amd.com> <nipun.gupta@xilinx.com> <nipun.gupta@nxp.com>
 Nir Efrati <nir.efrati@intel.com>
 Nirmoy Das <ndas@suse.de>
 Nithin Dabilpuram <ndabilpuram@marvell.com> <nithin.dabilpuram@caviumnetworks.com>
diff --git a/lib/eal/linux/eal_vfio.c b/lib/eal/linux/eal_vfio.c
index 549b86ae1d..56edccb0db 100644
--- a/lib/eal/linux/eal_vfio.c
+++ b/lib/eal/linux/eal_vfio.c
@@ -1369,19 +1369,6 @@ rte_vfio_get_group_num(const char *sysfs_base,
 	return 1;
 }
 
-static int
-type1_map_contig(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
-		size_t len, void *arg)
-{
-	int *vfio_container_fd = arg;
-
-	if (msl->external)
-		return 0;
-
-	return vfio_type1_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
-			len, 1);
-}
-
 static int
 type1_map(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
 		void *arg)
@@ -1396,10 +1383,6 @@ type1_map(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
 	if (ms->iova == RTE_BAD_IOVA)
 		return 0;
 
-	/* if IOVA mode is VA, we've already mapped the internal segments */
-	if (!msl->external && rte_eal_iova_mode() == RTE_IOVA_VA)
-		return 0;
-
 	return vfio_type1_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
 			ms->len, 1);
 }
@@ -1464,18 +1447,6 @@ vfio_type1_dma_mem_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova,
 static int
 vfio_type1_dma_map(int vfio_container_fd)
 {
-	if (rte_eal_iova_mode() == RTE_IOVA_VA) {
-		/* with IOVA as VA mode, we can get away with mapping contiguous
-		 * chunks rather than going page-by-page.
-		 */
-		int ret = rte_memseg_contig_walk(type1_map_contig,
-				&vfio_container_fd);
-		if (ret)
-			return ret;
-		/* we have to continue the walk because we've skipped the
-		 * external segments during the config walk.
-		 */
-	}
 	return rte_memseg_walk(type1_map, &vfio_container_fd);
 }
 
-- 
2.25.1


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] vfio: do not coalesce DMA mappings
  2023-01-04  5:19 ` [PATCH v2] " Nipun Gupta
@ 2023-02-02 10:48   ` David Marchand
  2023-02-07  8:56     ` Gupta, Nipun
  2023-05-15 11:16   ` David Marchand
  1 sibling, 1 reply; 30+ messages in thread
From: David Marchand @ 2023-02-02 10:48 UTC (permalink / raw)
  To: anatoly.burakov, Nipun Gupta; +Cc: dev, thomas, ferruh.yigit, nikhil.agarwal

Hi,

On Wed, Jan 4, 2023 at 6:19 AM Nipun Gupta <nipun.gupta@amd.com> wrote:
>
> At the cleanup time when dma unmap is done, linux kernel
> does not allow unmap of individual segments which were
> coalesced together while creating the DMA map for type1 IOMMU
> mappings. So, this change updates the mapping of the memory
> segments(hugepages) on a per-page basis.
>
> Signed-off-by: Nipun Gupta <nipun.gupta@amd.com>
> Signed-off-by: Nikhil Agarwal <nikhil.agarwal@amd.com>

This change is scary.

I won't take it without a review from the maintainer.
Anatoly, can you have a look?


Thanks.
-- 
David Marchand


^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH v2] vfio: do not coalesce DMA mappings
  2023-02-02 10:48   ` David Marchand
@ 2023-02-07  8:56     ` Gupta, Nipun
  2023-04-04 14:57       ` Gupta, Nipun
  2023-04-04 15:13       ` Burakov, Anatoly
  0 siblings, 2 replies; 30+ messages in thread
From: Gupta, Nipun @ 2023-02-07  8:56 UTC (permalink / raw)
  To: David Marchand, anatoly.burakov
  Cc: dev, thomas, Yigit, Ferruh, Agarwal, Nikhil

[AMD Official Use Only - General]

Hi David,

I agree that change is not straightforward to review, but it should not cause any functional issue as we are still creating all the memory mappings, but one by one for each segment.
For hot plug case this causes issue as mentioned, that VFIO does not allow unmap of the individual segments in case mapping was created of a single coalesced segment. 

But yes, I am not sure why this code was added, which Anatoly may have more understanding on.

Anatoly,

Can you please provide your feedback on this change?

Thanks,
Nipun

> -----Original Message-----
> From: David Marchand <david.marchand@redhat.com>
> Sent: Thursday, February 2, 2023 4:19 PM
> To: anatoly.burakov@intel.com; Gupta, Nipun <Nipun.Gupta@amd.com>
> Cc: dev@dpdk.org; thomas@monjalon.net; Yigit, Ferruh
> <Ferruh.Yigit@amd.com>; Agarwal, Nikhil <nikhil.agarwal@amd.com>
> Subject: Re: [PATCH v2] vfio: do not coalesce DMA mappings
> 
> Caution: This message originated from an External Source. Use proper caution
> when opening attachments, clicking links, or responding.
> 
> 
> Hi,
> 
> On Wed, Jan 4, 2023 at 6:19 AM Nipun Gupta <nipun.gupta@amd.com> wrote:
> >
> > At the cleanup time when dma unmap is done, linux kernel
> > does not allow unmap of individual segments which were
> > coalesced together while creating the DMA map for type1 IOMMU
> > mappings. So, this change updates the mapping of the memory
> > segments(hugepages) on a per-page basis.
> >
> > Signed-off-by: Nipun Gupta <nipun.gupta@amd.com>
> > Signed-off-by: Nikhil Agarwal <nikhil.agarwal@amd.com>
> 
> This change is scary.
> 
> I won't take it without a review from the maintainer.
> Anatoly, can you have a look?
> 
> 
> Thanks.
> --
> David Marchand

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH v2] vfio: do not coalesce DMA mappings
  2023-02-07  8:56     ` Gupta, Nipun
@ 2023-04-04 14:57       ` Gupta, Nipun
  2023-04-04 15:13       ` Burakov, Anatoly
  1 sibling, 0 replies; 30+ messages in thread
From: Gupta, Nipun @ 2023-04-04 14:57 UTC (permalink / raw)
  To: David Marchand, anatoly.burakov
  Cc: dev, thomas, Yigit, Ferruh, Agarwal, Nikhil, Gupta, Nipun

Hi David,
	As the DPDK 23.03 release is out can we have a relook at this change?

Thanks,
Nipun

> -----Original Message-----
> From: Gupta, Nipun
> Sent: Tuesday, February 7, 2023 2:27 PM
> To: David Marchand <david.marchand@redhat.com>;
> anatoly.burakov@intel.com
> Cc: dev@dpdk.org; thomas@monjalon.net; Yigit, Ferruh
> <Ferruh.Yigit@amd.com>; Agarwal, Nikhil <nikhil.agarwal@amd.com>
> Subject: RE: [PATCH v2] vfio: do not coalesce DMA mappings
> 
> [AMD Official Use Only - General]
> 
> Hi David,
> 
> I agree that change is not straightforward to review, but it should not cause any
> functional issue as we are still creating all the memory mappings, but one by one
> for each segment.
> For hot plug case this causes issue as mentioned, that VFIO does not allow
> unmap of the individual segments in case mapping was created of a single
> coalesced segment.
> 
> But yes, I am not sure why this code was added, which Anatoly may have more
> understanding on.
> 
> Anatoly,
> 
> Can you please provide your feedback on this change?
> 
> Thanks,
> Nipun
> 
> > -----Original Message-----
> > From: David Marchand <david.marchand@redhat.com>
> > Sent: Thursday, February 2, 2023 4:19 PM
> > To: anatoly.burakov@intel.com; Gupta, Nipun <Nipun.Gupta@amd.com>
> > Cc: dev@dpdk.org; thomas@monjalon.net; Yigit, Ferruh
> > <Ferruh.Yigit@amd.com>; Agarwal, Nikhil <nikhil.agarwal@amd.com>
> > Subject: Re: [PATCH v2] vfio: do not coalesce DMA mappings
> >
> > Caution: This message originated from an External Source. Use proper caution
> > when opening attachments, clicking links, or responding.
> >
> >
> > Hi,
> >
> > On Wed, Jan 4, 2023 at 6:19 AM Nipun Gupta <nipun.gupta@amd.com>
> wrote:
> > >
> > > At the cleanup time when dma unmap is done, linux kernel
> > > does not allow unmap of individual segments which were
> > > coalesced together while creating the DMA map for type1 IOMMU
> > > mappings. So, this change updates the mapping of the memory
> > > segments(hugepages) on a per-page basis.
> > >
> > > Signed-off-by: Nipun Gupta <nipun.gupta@amd.com>
> > > Signed-off-by: Nikhil Agarwal <nikhil.agarwal@amd.com>
> >
> > This change is scary.
> >
> > I won't take it without a review from the maintainer.
> > Anatoly, can you have a look?
> >
> >
> > Thanks.
> > --
> > David Marchand

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] vfio: do not coalesce DMA mappings
  2023-02-07  8:56     ` Gupta, Nipun
  2023-04-04 14:57       ` Gupta, Nipun
@ 2023-04-04 15:13       ` Burakov, Anatoly
  2023-04-04 16:32         ` Nipun Gupta
  2023-04-07  6:13         ` Nipun Gupta
  1 sibling, 2 replies; 30+ messages in thread
From: Burakov, Anatoly @ 2023-04-04 15:13 UTC (permalink / raw)
  To: Gupta, Nipun, David Marchand; +Cc: dev, thomas, Yigit, Ferruh, Agarwal, Nikhil

On 2/7/2023 8:56 AM, Gupta, Nipun wrote:
> [AMD Official Use Only - General]
> 
> Hi David,
> 
> I agree that change is not straightforward to review, but it should not cause any functional issue as we are still creating all the memory mappings, but one by one for each segment.
> For hot plug case this causes issue as mentioned, that VFIO does not allow unmap of the individual segments in case mapping was created of a single coalesced segment.
> 
> But yes, I am not sure why this code was added, which Anatoly may have more understanding on.

The motivation behind this code was that Linux allows limited amount of 
page mappings, so we were trying to save on those. However, since then 
there have been a few changes related to partial unmaps that may make it 
so that this code is not only no longer necessary, but is in fact 
actively harmful. I agree that this at least warrants a second look.

> 
> Anatoly,
> 
> Can you please provide your feedback on this change?

The patch probably shouldn't include the mailmap changes :)

Could you please provide some steps to reproduce the hotplug issue 
you're having? It would be great to have a test case for this patchset 
to put it in context.

-- 
Thanks,
Anatoly


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] vfio: do not coalesce DMA mappings
  2023-04-04 15:13       ` Burakov, Anatoly
@ 2023-04-04 16:32         ` Nipun Gupta
  2023-04-05 14:16           ` Burakov, Anatoly
  2023-04-11 15:13           ` Thomas Monjalon
  2023-04-07  6:13         ` Nipun Gupta
  1 sibling, 2 replies; 30+ messages in thread
From: Nipun Gupta @ 2023-04-04 16:32 UTC (permalink / raw)
  To: Burakov, Anatoly, David Marchand
  Cc: dev, thomas, Yigit, Ferruh, Agarwal, Nikhil



On 4/4/2023 8:43 PM, Burakov, Anatoly wrote:
> Caution: This message originated from an External Source. Use proper 
> caution when opening attachments, clicking links, or responding.
> 
> 
> On 2/7/2023 8:56 AM, Gupta, Nipun wrote:
>> [AMD Official Use Only - General]
>>
>> Hi David,
>>
>> I agree that change is not straightforward to review, but it should 
>> not cause any functional issue as we are still creating all the memory 
>> mappings, but one by one for each segment.
>> For hot plug case this causes issue as mentioned, that VFIO does not 
>> allow unmap of the individual segments in case mapping was created of 
>> a single coalesced segment.
>>
>> But yes, I am not sure why this code was added, which Anatoly may have 
>> more understanding on.
> 
> The motivation behind this code was that Linux allows limited amount of
> page mappings, so we were trying to save on those. However, since then
> there have been a few changes related to partial unmaps that may make it
> so that this code is not only no longer necessary, but is in fact
> actively harmful. I agree that this at least warrants a second look.
> 
>>
>> Anatoly,
>>
>> Can you please provide your feedback on this change?
> 
> The patch probably shouldn't include the mailmap changes :)

Sure, will send a separate patch for it.

> 
> Could you please provide some steps to reproduce the hotplug issue
> you're having? It would be great to have a test case for this patchset
> to put it in context.

I am working on CDX bus 
(http://patchwork.dpdk.org/project/dpdk/patch/20230124140746.594066-2-nipun.gupta@amd.com/) 
and trying out some cases for plug/unplug.

The test is as follows:
   # Run testpmd application
   ./dpdk-testpmd -c 0x3 -- -i --nb-cores=1

   # Bind to VFIO
   echo "vfio-cdx" >  /sys/bus/cdx/devices/cdx-00\:00/driver_override
   echo "cdx-00:00" > /sys/bus/cdx/drivers_probe

   # Plug a device
   testpmd> port attach cdx:cdx-00:00

   #quit testpmd
   testpmd> quit

This gave error at testpmd exit that memory cannot be freed. On 
debugging I updated this code and seems it should be seen with any of 
the device.

I see similar test case (without quit) mentioned 
https://doc.dpdk.org/dts/test_plans/hotplug_test_plan.html, but the 
difference is that it is with igb_uio and issue is being observed with VFIO.

Please note the device/bus mentioned in the commands is not yet 
upstreamed in DPDK, but patches would be sent out soon.

Thanks,
Nipun

> 
> -- 
> Thanks,
> Anatoly
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] vfio: do not coalesce DMA mappings
  2023-04-04 16:32         ` Nipun Gupta
@ 2023-04-05 14:16           ` Burakov, Anatoly
  2023-04-05 15:06             ` Gupta, Nipun
  2023-04-24 15:22             ` David Marchand
  2023-04-11 15:13           ` Thomas Monjalon
  1 sibling, 2 replies; 30+ messages in thread
From: Burakov, Anatoly @ 2023-04-05 14:16 UTC (permalink / raw)
  To: Nipun Gupta, David Marchand; +Cc: dev, thomas, Yigit, Ferruh, Agarwal, Nikhil

On 4/4/2023 5:32 PM, Nipun Gupta wrote:
> 
> 
> On 4/4/2023 8:43 PM, Burakov, Anatoly wrote:
>> Caution: This message originated from an External Source. Use proper 
>> caution when opening attachments, clicking links, or responding.
>>
>>
>> On 2/7/2023 8:56 AM, Gupta, Nipun wrote:
>>> [AMD Official Use Only - General]
>>>
>>> Hi David,
>>>
>>> I agree that change is not straightforward to review, but it should 
>>> not cause any functional issue as we are still creating all the 
>>> memory mappings, but one by one for each segment.
>>> For hot plug case this causes issue as mentioned, that VFIO does not 
>>> allow unmap of the individual segments in case mapping was created of 
>>> a single coalesced segment.
>>>
>>> But yes, I am not sure why this code was added, which Anatoly may 
>>> have more understanding on.
>>
>> The motivation behind this code was that Linux allows limited amount of
>> page mappings, so we were trying to save on those. However, since then
>> there have been a few changes related to partial unmaps that may make it
>> so that this code is not only no longer necessary, but is in fact
>> actively harmful. I agree that this at least warrants a second look.
>>
>>>
>>> Anatoly,
>>>
>>> Can you please provide your feedback on this change?
>>
>> The patch probably shouldn't include the mailmap changes :)
> 
> Sure, will send a separate patch for it.
> 
>>
>> Could you please provide some steps to reproduce the hotplug issue
>> you're having? It would be great to have a test case for this patchset
>> to put it in context.
> 
> I am working on CDX bus 
> (http://patchwork.dpdk.org/project/dpdk/patch/20230124140746.594066-2-nipun.gupta@amd.com/) and trying out some cases for plug/unplug.
> 
> The test is as follows:
>    # Run testpmd application
>    ./dpdk-testpmd -c 0x3 -- -i --nb-cores=1
> 
>    # Bind to VFIO
>    echo "vfio-cdx" >  /sys/bus/cdx/devices/cdx-00\:00/driver_override
>    echo "cdx-00:00" > /sys/bus/cdx/drivers_probe
> 
>    # Plug a device
>    testpmd> port attach cdx:cdx-00:00
> 
>    #quit testpmd
>    testpmd> quit
> 
> This gave error at testpmd exit that memory cannot be freed. On 
> debugging I updated this code and seems it should be seen with any of 
> the device.
> 
> I see similar test case (without quit) mentioned 
> https://doc.dpdk.org/dts/test_plans/hotplug_test_plan.html, but the 
> difference is that it is with igb_uio and issue is being observed with 
> VFIO.
> 
> Please note the device/bus mentioned in the commands is not yet 
> upstreamed in DPDK, but patches would be sent out soon.
> 
> Thanks,
> Nipun
> 

Thanks, I can reproduce this issue with regular devices too (run testpmd 
with no devices, bind a NIC to VFIO, attach it, then quit). You're 
correct in that since the initial mapping was done with mapping large 
contiguous zones (such as when mempools are created before attach), any 
subsequent freeing of memory will cause these errors to happen.

I don't think this can be fixed by anything other than not doing the 
contiguous mapping thing, so provisionally, I think this patch should be 
accepted. I'll play around with it some more and get back to you :)

>>
>> -- 
>> Thanks,
>> Anatoly
>>

-- 
Thanks,
Anatoly


^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH v2] vfio: do not coalesce DMA mappings
  2023-04-05 14:16           ` Burakov, Anatoly
@ 2023-04-05 15:06             ` Gupta, Nipun
  2023-04-24 15:22             ` David Marchand
  1 sibling, 0 replies; 30+ messages in thread
From: Gupta, Nipun @ 2023-04-05 15:06 UTC (permalink / raw)
  To: Burakov, Anatoly, David Marchand
  Cc: dev, thomas, Yigit, Ferruh, Agarwal, Nikhil

[AMD Official Use Only - General]



> -----Original Message-----
> From: Burakov, Anatoly <anatoly.burakov@intel.com>
> Sent: Wednesday, April 5, 2023 7:46 PM
> To: Gupta, Nipun <Nipun.Gupta@amd.com>; David Marchand
> <david.marchand@redhat.com>
> Cc: dev@dpdk.org; thomas@monjalon.net; Yigit, Ferruh
> <Ferruh.Yigit@amd.com>; Agarwal, Nikhil <nikhil.agarwal@amd.com>
> Subject: Re: [PATCH v2] vfio: do not coalesce DMA mappings
> 
> Caution: This message originated from an External Source. Use proper caution
> when opening attachments, clicking links, or responding.
> 
> 
> On 4/4/2023 5:32 PM, Nipun Gupta wrote:
> >
> >
> > On 4/4/2023 8:43 PM, Burakov, Anatoly wrote:
> >> Caution: This message originated from an External Source. Use proper
> >> caution when opening attachments, clicking links, or responding.
> >>
> >>
> >> On 2/7/2023 8:56 AM, Gupta, Nipun wrote:
> >>> [AMD Official Use Only - General]
> >>>
> >>> Hi David,
> >>>
> >>> I agree that change is not straightforward to review, but it should
> >>> not cause any functional issue as we are still creating all the
> >>> memory mappings, but one by one for each segment.
> >>> For hot plug case this causes issue as mentioned, that VFIO does not
> >>> allow unmap of the individual segments in case mapping was created of
> >>> a single coalesced segment.
> >>>
> >>> But yes, I am not sure why this code was added, which Anatoly may
> >>> have more understanding on.
> >>
> >> The motivation behind this code was that Linux allows limited amount of
> >> page mappings, so we were trying to save on those. However, since then
> >> there have been a few changes related to partial unmaps that may make it
> >> so that this code is not only no longer necessary, but is in fact
> >> actively harmful. I agree that this at least warrants a second look.
> >>
> >>>
> >>> Anatoly,
> >>>
> >>> Can you please provide your feedback on this change?
> >>
> >> The patch probably shouldn't include the mailmap changes :)
> >
> > Sure, will send a separate patch for it.
> >
> >>
> >> Could you please provide some steps to reproduce the hotplug issue
> >> you're having? It would be great to have a test case for this patchset
> >> to put it in context.
> >
> > I am working on CDX bus
> > (http://patchwork.dpdk.org/project/dpdk/patch/20230124140746.594066-2-
> nipun.gupta@amd.com/) and trying out some cases for plug/unplug.
> >
> > The test is as follows:
> >    # Run testpmd application
> >    ./dpdk-testpmd -c 0x3 -- -i --nb-cores=1
> >
> >    # Bind to VFIO
> >    echo "vfio-cdx" >  /sys/bus/cdx/devices/cdx-00\:00/driver_override
> >    echo "cdx-00:00" > /sys/bus/cdx/drivers_probe
> >
> >    # Plug a device
> >    testpmd> port attach cdx:cdx-00:00
> >
> >    #quit testpmd
> >    testpmd> quit
> >
> > This gave error at testpmd exit that memory cannot be freed. On
> > debugging I updated this code and seems it should be seen with any of
> > the device.
> >
> > I see similar test case (without quit) mentioned
> > https://doc.dpdk.org/dts/test_plans/hotplug_test_plan.html, but the
> > difference is that it is with igb_uio and issue is being observed with
> > VFIO.
> >
> > Please note the device/bus mentioned in the commands is not yet
> > upstreamed in DPDK, but patches would be sent out soon.
> >
> > Thanks,
> > Nipun
> >
> 
> Thanks, I can reproduce this issue with regular devices too (run testpmd
> with no devices, bind a NIC to VFIO, attach it, then quit). You're
> correct in that since the initial mapping was done with mapping large
> contiguous zones (such as when mempools are created before attach), any
> subsequent freeing of memory will cause these errors to happen.

Thanks for trying on other devices too, and good to know that this is also
reproduced with them :)

Regards,
Nipun

> 
> I don't think this can be fixed by anything other than not doing the
> contiguous mapping thing, so provisionally, I think this patch should be
> accepted. I'll play around with it some more and get back to you :)
> 
> >>
> >> --
> >> Thanks,
> >> Anatoly
> >>
> 
> --
> Thanks,
> Anatoly

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] vfio: do not coalesce DMA mappings
  2023-04-04 15:13       ` Burakov, Anatoly
  2023-04-04 16:32         ` Nipun Gupta
@ 2023-04-07  6:13         ` Nipun Gupta
  2023-04-07  7:26           ` David Marchand
  1 sibling, 1 reply; 30+ messages in thread
From: Nipun Gupta @ 2023-04-07  6:13 UTC (permalink / raw)
  To: Burakov, Anatoly, David Marchand
  Cc: dev, thomas, Yigit, Ferruh, Agarwal, Nikhil



On 4/4/2023 8:43 PM, Burakov, Anatoly wrote:
> Caution: This message originated from an External Source. Use proper 
> caution when opening attachments, clicking links, or responding.
> 
> 
> On 2/7/2023 8:56 AM, Gupta, Nipun wrote:
>> [AMD Official Use Only - General]
>>
>> Hi David,
>>
>> I agree that change is not straightforward to review, but it should 
>> not cause any functional issue as we are still creating all the memory 
>> mappings, but one by one for each segment.
>> For hot plug case this causes issue as mentioned, that VFIO does not 
>> allow unmap of the individual segments in case mapping was created of 
>> a single coalesced segment.
>>
>> But yes, I am not sure why this code was added, which Anatoly may have 
>> more understanding on.
> 
> The motivation behind this code was that Linux allows limited amount of
> page mappings, so we were trying to save on those. However, since then
> there have been a few changes related to partial unmaps that may make it
> so that this code is not only no longer necessary, but is in fact
> actively harmful. I agree that this at least warrants a second look.
> 
>>
>> Anatoly,
>>
>> Can you please provide your feedback on this change?
> 
> The patch probably shouldn't include the mailmap changes :)

I see in "git log" that all the mailmap changes are with the patch 
submitted, probably as it shows checkpatch warning, so it seems this 
should be fine?

Thanks,
Nipun

> 
> Could you please provide some steps to reproduce the hotplug issue
> you're having? It would be great to have a test case for this patchset
> to put it in context.
> 
> -- 
> Thanks,
> Anatoly
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] vfio: do not coalesce DMA mappings
  2023-04-07  6:13         ` Nipun Gupta
@ 2023-04-07  7:26           ` David Marchand
  0 siblings, 0 replies; 30+ messages in thread
From: David Marchand @ 2023-04-07  7:26 UTC (permalink / raw)
  To: Nipun Gupta, Burakov, Anatoly; +Cc: dev, thomas, Yigit, Ferruh, Agarwal, Nikhil

Hello Nipun, Anatoly,

On Fri, Apr 7, 2023 at 8:13 AM Nipun Gupta <nipun.gupta@amd.com> wrote:
> > The patch probably shouldn't include the mailmap changes :)

Mailmap changes can be done by the submitter.
So Nipun did nothing wrong.

>
> I see in "git log" that all the mailmap changes are with the patch
> submitted, probably as it shows checkpatch warning, so it seems this
> should be fine?

If a submitter includes the mailmap change in his patches, it is fine.
Otherwise, the maintainer that merges the first patch of a new
contributor will fix the mailmap at the same time.


-- 
David Marchand


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] vfio: do not coalesce DMA mappings
  2023-04-04 16:32         ` Nipun Gupta
  2023-04-05 14:16           ` Burakov, Anatoly
@ 2023-04-11 15:13           ` Thomas Monjalon
  2023-04-11 16:51             ` Gupta, Nipun
  1 sibling, 1 reply; 30+ messages in thread
From: Thomas Monjalon @ 2023-04-11 15:13 UTC (permalink / raw)
  To: Burakov, Anatoly, David Marchand, Nipun Gupta
  Cc: dev, Yigit, Ferruh, Agarwal, Nikhil

04/04/2023 18:32, Nipun Gupta:
> On 4/4/2023 8:43 PM, Burakov, Anatoly wrote:
> > The patch probably shouldn't include the mailmap changes :)
> 
> Sure, will send a separate patch for it.

No please, we squash mailmap changes with 1st patch.



^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH v2] vfio: do not coalesce DMA mappings
  2023-04-11 15:13           ` Thomas Monjalon
@ 2023-04-11 16:51             ` Gupta, Nipun
  0 siblings, 0 replies; 30+ messages in thread
From: Gupta, Nipun @ 2023-04-11 16:51 UTC (permalink / raw)
  To: Thomas Monjalon, Burakov, Anatoly, David Marchand
  Cc: dev, Yigit, Ferruh, Agarwal, Nikhil

[AMD Official Use Only - General]



> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Tuesday, April 11, 2023 8:43 PM
> To: Burakov, Anatoly <anatoly.burakov@intel.com>; David Marchand
> <david.marchand@redhat.com>; Gupta, Nipun <Nipun.Gupta@amd.com>
> Cc: dev@dpdk.org; Yigit, Ferruh <Ferruh.Yigit@amd.com>; Agarwal, Nikhil
> <nikhil.agarwal@amd.com>
> Subject: Re: [PATCH v2] vfio: do not coalesce DMA mappings
> 
> Caution: This message originated from an External Source. Use proper caution
> when opening attachments, clicking links, or responding.
> 
> 
> 04/04/2023 18:32, Nipun Gupta:
> > On 4/4/2023 8:43 PM, Burakov, Anatoly wrote:
> > > The patch probably shouldn't include the mailmap changes :)
> >
> > Sure, will send a separate patch for it.
> 
> No please, we squash mailmap changes with 1st patch.

Agree, I have replied same to this later in this mail thread.
So, there is no change/resubmission required in this patch as of now.

> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] vfio: do not coalesce DMA mappings
  2023-04-05 14:16           ` Burakov, Anatoly
  2023-04-05 15:06             ` Gupta, Nipun
@ 2023-04-24 15:22             ` David Marchand
  2023-04-24 16:10               ` Stephen Hemminger
  2023-05-10 12:58               ` Nipun Gupta
  1 sibling, 2 replies; 30+ messages in thread
From: David Marchand @ 2023-04-24 15:22 UTC (permalink / raw)
  To: Burakov, Anatoly; +Cc: Nipun Gupta, dev, thomas, Yigit, Ferruh, Agarwal, Nikhil

Hello Anatoly,

On Wed, Apr 5, 2023 at 4:17 PM Burakov, Anatoly
<anatoly.burakov@intel.com> wrote:
> >> Could you please provide some steps to reproduce the hotplug issue
> >> you're having? It would be great to have a test case for this patchset
> >> to put it in context.
> >
> > I am working on CDX bus
> > (http://patchwork.dpdk.org/project/dpdk/patch/20230124140746.594066-2-nipun.gupta@amd.com/) and trying out some cases for plug/unplug.
> >
> > The test is as follows:
> >    # Run testpmd application
> >    ./dpdk-testpmd -c 0x3 -- -i --nb-cores=1
> >
> >    # Bind to VFIO
> >    echo "vfio-cdx" >  /sys/bus/cdx/devices/cdx-00\:00/driver_override
> >    echo "cdx-00:00" > /sys/bus/cdx/drivers_probe
> >
> >    # Plug a device
> >    testpmd> port attach cdx:cdx-00:00
> >
> >    #quit testpmd
> >    testpmd> quit
> >
> > This gave error at testpmd exit that memory cannot be freed. On
> > debugging I updated this code and seems it should be seen with any of
> > the device.
> >
> > I see similar test case (without quit) mentioned
> > https://doc.dpdk.org/dts/test_plans/hotplug_test_plan.html, but the
> > difference is that it is with igb_uio and issue is being observed with
> > VFIO.
> >
> > Please note the device/bus mentioned in the commands is not yet
> > upstreamed in DPDK, but patches would be sent out soon.
> >
> > Thanks,
> > Nipun
> >
>
> Thanks, I can reproduce this issue with regular devices too (run testpmd
> with no devices, bind a NIC to VFIO, attach it, then quit). You're
> correct in that since the initial mapping was done with mapping large
> contiguous zones (such as when mempools are created before attach), any
> subsequent freeing of memory will cause these errors to happen.
>
> I don't think this can be fixed by anything other than not doing the
> contiguous mapping thing, so provisionally, I think this patch should be
> accepted. I'll play around with it some more and get back to you :)

Can we conclude on this topic?
It is best we merge this kind of change the sooner possible for a release.

Thanks.


-- 
David Marchand


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] vfio: do not coalesce DMA mappings
  2023-04-24 15:22             ` David Marchand
@ 2023-04-24 16:10               ` Stephen Hemminger
  2023-04-24 16:16                 ` Gupta, Nipun
  2023-05-10 12:58               ` Nipun Gupta
  1 sibling, 1 reply; 30+ messages in thread
From: Stephen Hemminger @ 2023-04-24 16:10 UTC (permalink / raw)
  To: David Marchand
  Cc: Burakov, Anatoly, Nipun Gupta, dev, thomas, Yigit, Ferruh,
	Agarwal, Nikhil

On Mon, 24 Apr 2023 17:22:46 +0200
David Marchand <david.marchand@redhat.com> wrote:

> > >  
> >
> > Thanks, I can reproduce this issue with regular devices too (run testpmd
> > with no devices, bind a NIC to VFIO, attach it, then quit). You're
> > correct in that since the initial mapping was done with mapping large
> > contiguous zones (such as when mempools are created before attach), any
> > subsequent freeing of memory will cause these errors to happen.
> >
> > I don't think this can be fixed by anything other than not doing the
> > contiguous mapping thing, so provisionally, I think this patch should be
> > accepted. I'll play around with it some more and get back to you :)  
> 
> Can we conclude on this topic?
> It is best we merge this kind of change the sooner possible for a release.
> 
> Thanks.

Shouldn't the coalesced mappings be able to have correct datastructure
(accounting) so that on shutdown the unmap's are done for the right size?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH v2] vfio: do not coalesce DMA mappings
  2023-04-24 16:10               ` Stephen Hemminger
@ 2023-04-24 16:16                 ` Gupta, Nipun
  0 siblings, 0 replies; 30+ messages in thread
From: Gupta, Nipun @ 2023-04-24 16:16 UTC (permalink / raw)
  To: Stephen Hemminger, David Marchand
  Cc: Burakov, Anatoly, dev, thomas, Yigit, Ferruh, Agarwal, Nikhil



> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Monday, April 24, 2023 9:41 PM
> To: David Marchand <david.marchand@redhat.com>
> Cc: Burakov, Anatoly <anatoly.burakov@intel.com>; Gupta, Nipun
> <Nipun.Gupta@amd.com>; dev@dpdk.org; thomas@monjalon.net; Yigit, Ferruh
> <Ferruh.Yigit@amd.com>; Agarwal, Nikhil <nikhil.agarwal@amd.com>
> Subject: Re: [PATCH v2] vfio: do not coalesce DMA mappings
> 
> 
> On Mon, 24 Apr 2023 17:22:46 +0200
> David Marchand <david.marchand@redhat.com> wrote:
> 
> > > >
> > >
> > > Thanks, I can reproduce this issue with regular devices too (run testpmd
> > > with no devices, bind a NIC to VFIO, attach it, then quit). You're
> > > correct in that since the initial mapping was done with mapping large
> > > contiguous zones (such as when mempools are created before attach), any
> > > subsequent freeing of memory will cause these errors to happen.
> > >
> > > I don't think this can be fixed by anything other than not doing the
> > > contiguous mapping thing, so provisionally, I think this patch should be
> > > accepted. I'll play around with it some more and get back to you :)
> >
> > Can we conclude on this topic?
> > It is best we merge this kind of change the sooner possible for a release.
> >
> > Thanks.
> 
> Shouldn't the coalesced mappings be able to have correct datastructure
> (accounting) so that on shutdown the unmap's are done for the right size?

This issue occurs only on the hotplug case. Other devices which are not hot plugged
and are existing from the start of the application need to have individual (non-
coalesced) mappings. So individual (non-coalesced) mappings are definitely
required. IMO we should not maintain separate mapping for each hot-plugged
device as it would be unrequired overhead.

Regards,
Nipun

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] vfio: do not coalesce DMA mappings
  2023-04-24 15:22             ` David Marchand
  2023-04-24 16:10               ` Stephen Hemminger
@ 2023-05-10 12:58               ` Nipun Gupta
  2023-05-11 14:08                 ` Burakov, Anatoly
  1 sibling, 1 reply; 30+ messages in thread
From: Nipun Gupta @ 2023-05-10 12:58 UTC (permalink / raw)
  To: David Marchand, Burakov, Anatoly
  Cc: dev, thomas, Yigit, Ferruh, Agarwal, Nikhil



On 4/24/2023 8:52 PM, David Marchand wrote:
> 
> Hello Anatoly,
> 
> On Wed, Apr 5, 2023 at 4:17 PM Burakov, Anatoly
> <anatoly.burakov@intel.com> wrote:
>>>> Could you please provide some steps to reproduce the hotplug issue
>>>> you're having? It would be great to have a test case for this patchset
>>>> to put it in context.
>>>
>>> I am working on CDX bus
>>> (http://patchwork.dpdk.org/project/dpdk/patch/20230124140746.594066-2-nipun.gupta@amd.com/) and trying out some cases for plug/unplug.
>>>
>>> The test is as follows:
>>>     # Run testpmd application
>>>     ./dpdk-testpmd -c 0x3 -- -i --nb-cores=1
>>>
>>>     # Bind to VFIO
>>>     echo "vfio-cdx" >  /sys/bus/cdx/devices/cdx-00\:00/driver_override
>>>     echo "cdx-00:00" > /sys/bus/cdx/drivers_probe
>>>
>>>     # Plug a device
>>>     testpmd> port attach cdx:cdx-00:00
>>>
>>>     #quit testpmd
>>>     testpmd> quit
>>>
>>> This gave error at testpmd exit that memory cannot be freed. On
>>> debugging I updated this code and seems it should be seen with any of
>>> the device.
>>>
>>> I see similar test case (without quit) mentioned
>>> https://doc.dpdk.org/dts/test_plans/hotplug_test_plan.html, but the
>>> difference is that it is with igb_uio and issue is being observed with
>>> VFIO.
>>>
>>> Please note the device/bus mentioned in the commands is not yet
>>> upstreamed in DPDK, but patches would be sent out soon.
>>>
>>> Thanks,
>>> Nipun
>>>
>>
>> Thanks, I can reproduce this issue with regular devices too (run testpmd
>> with no devices, bind a NIC to VFIO, attach it, then quit). You're
>> correct in that since the initial mapping was done with mapping large
>> contiguous zones (such as when mempools are created before attach), any
>> subsequent freeing of memory will cause these errors to happen.
>>
>> I don't think this can be fixed by anything other than not doing the
>> contiguous mapping thing, so provisionally, I think this patch should be
>> accepted. I'll play around with it some more and get back to you :)
> 
> Can we conclude on this topic?
> It is best we merge this kind of change the sooner possible for a release.

Hi Anatoly,
	Can you kindly update on this?

Thanks,
Nipun

> 
> Thanks.
> 
> 
> --
> David Marchand
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] vfio: do not coalesce DMA mappings
  2023-05-10 12:58               ` Nipun Gupta
@ 2023-05-11 14:08                 ` Burakov, Anatoly
  0 siblings, 0 replies; 30+ messages in thread
From: Burakov, Anatoly @ 2023-05-11 14:08 UTC (permalink / raw)
  To: Nipun Gupta, David Marchand; +Cc: dev, thomas, Yigit, Ferruh, Agarwal, Nikhil

On 5/10/2023 1:58 PM, Nipun Gupta wrote:
> 
> 
> On 4/24/2023 8:52 PM, David Marchand wrote:
>>
>> Hello Anatoly,
>>
>> On Wed, Apr 5, 2023 at 4:17 PM Burakov, Anatoly
>> <anatoly.burakov@intel.com> wrote:
>>>>> Could you please provide some steps to reproduce the hotplug issue
>>>>> you're having? It would be great to have a test case for this patchset
>>>>> to put it in context.
>>>>
>>>> I am working on CDX bus
>>>> (http://patchwork.dpdk.org/project/dpdk/patch/20230124140746.594066-2-nipun.gupta@amd.com/) and trying out some cases for plug/unplug.
>>>>
>>>> The test is as follows:
>>>>     # Run testpmd application
>>>>     ./dpdk-testpmd -c 0x3 -- -i --nb-cores=1
>>>>
>>>>     # Bind to VFIO
>>>>     echo "vfio-cdx" >  /sys/bus/cdx/devices/cdx-00\:00/driver_override
>>>>     echo "cdx-00:00" > /sys/bus/cdx/drivers_probe
>>>>
>>>>     # Plug a device
>>>>     testpmd> port attach cdx:cdx-00:00
>>>>
>>>>     #quit testpmd
>>>>     testpmd> quit
>>>>
>>>> This gave error at testpmd exit that memory cannot be freed. On
>>>> debugging I updated this code and seems it should be seen with any of
>>>> the device.
>>>>
>>>> I see similar test case (without quit) mentioned
>>>> https://doc.dpdk.org/dts/test_plans/hotplug_test_plan.html, but the
>>>> difference is that it is with igb_uio and issue is being observed with
>>>> VFIO.
>>>>
>>>> Please note the device/bus mentioned in the commands is not yet
>>>> upstreamed in DPDK, but patches would be sent out soon.
>>>>
>>>> Thanks,
>>>> Nipun
>>>>
>>>
>>> Thanks, I can reproduce this issue with regular devices too (run testpmd
>>> with no devices, bind a NIC to VFIO, attach it, then quit). You're
>>> correct in that since the initial mapping was done with mapping large
>>> contiguous zones (such as when mempools are created before attach), any
>>> subsequent freeing of memory will cause these errors to happen.
>>>
>>> I don't think this can be fixed by anything other than not doing the
>>> contiguous mapping thing, so provisionally, I think this patch should be
>>> accepted. I'll play around with it some more and get back to you :)
>>
>> Can we conclude on this topic?
>> It is best we merge this kind of change the sooner possible for a 
>> release.
> 
> Hi Anatoly,
>      Can you kindly update on this?
> 

Hi all,

apologies for late reply.

Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>

-- 
Thanks,
Anatoly


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] vfio: do not coalesce DMA mappings
  2023-01-04  5:19 ` [PATCH v2] " Nipun Gupta
  2023-02-02 10:48   ` David Marchand
@ 2023-05-15 11:16   ` David Marchand
  1 sibling, 0 replies; 30+ messages in thread
From: David Marchand @ 2023-05-15 11:16 UTC (permalink / raw)
  To: Nipun Gupta; +Cc: dev, thomas, anatoly.burakov, ferruh.yigit, nikhil.agarwal

On Wed, Jan 4, 2023 at 6:19 AM Nipun Gupta <nipun.gupta@amd.com> wrote:
> At the cleanup time when dma unmap is done, linux kernel
> does not allow unmap of individual segments which were
> coalesced together while creating the DMA map for type1 IOMMU
> mappings. So, this change updates the mapping of the memory
> segments(hugepages) on a per-page basis.
>
> Signed-off-by: Nipun Gupta <nipun.gupta@amd.com>
> Signed-off-by: Nikhil Agarwal <nikhil.agarwal@amd.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>


Applied, thanks.


-- 
David Marchand


^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH] vfio: do not coalesce DMA mappings
  2022-12-30  9:58 [PATCH] vfio: do not coalesce DMA mappings Nipun Gupta
  2023-01-04  5:19 ` [PATCH v2] " Nipun Gupta
@ 2023-06-29  8:21 ` Ding, Xuan
  2023-06-30  1:45   ` Nipun Gupta
  1 sibling, 1 reply; 30+ messages in thread
From: Ding, Xuan @ 2023-06-29  8:21 UTC (permalink / raw)
  To: Nipun Gupta, dev, thomas, Burakov, Anatoly, ferruh.yigit
  Cc: nikhil.agarwal, He, Xingguang, Ling, WeiX

Hi Nipun,

I'd like to appreciate your time reading this email.

Our QA team found that since this commit "a399d7b5a994: do not coalesce DMA mappings" is introduced,
the dpdk testpmd start with "--no-huge" parameters will failed, and shows "EAL: Cannot set up DMA remapping, error 28 (No space left on device)".
So they reported it on dpdk Bugzilla: https://bugs.dpdk.org/show_bug.cgi?id=1235.

I understand this feature is to keep consistent with the kernel and not allow memory segments be merged.
The side effect is the testpmd with "--no-huge" parameters will not be able to start because the too many pages will exceed the capability of IOMMU.
Is it expected? Should we remove the --no-huge" in our testcase?

Regards,
Xuan

> -----Original Message-----
> From: Nipun Gupta <nipun.gupta@amd.com>
> Sent: Friday, December 30, 2022 5:59 PM
> To: dev@dpdk.org; thomas@monjalon.net; Burakov, Anatoly
> <anatoly.burakov@intel.com>; ferruh.yigit@amd.com
> Cc: nikhil.agarwal@amd.com; Nipun Gupta <nipun.gupta@amd.com>
> Subject: [PATCH] vfio: do not coalesce DMA mappings
> 
> At the cleanup time when dma unmap is done, linux kernel does not allow
> unmap of individual segments which were coalesced together while creating
> the DMA map for type1 IOMMU mappings. So, this change updates the
> mapping of the memory
> segments(hugepages) on a per-page basis.
> 
> Signed-off-by: Nipun Gupta <nipun.gupta@amd.com>
> ---
> 
> When hotplug of devices is used, multiple pages gets colaeced and a single
> mapping gets created for these pages (using APIs
> rte_memseg_contig_walk() and type1_map_contig(). On the cleanup time
> when the memory is released, the VFIO does not cleans up that memory and
> following error is observed in the eal for 2MB
> hugepages:
> EAL: Unexpected size 0 of DMA remapping cleared instead of 2097152
> 
> This is because VFIO does not clear the DMA (refer API
> vfio_dma_do_unmap() -
> https://elixir.bootlin.com/linux/latest/source/drivers/vfio/vfio_iommu_type1.
> c#L1330),
> where it checks the dma mapping where it checks for IOVA to free:
> https://elixir.bootlin.com/linux/latest/source/drivers/vfio/vfio_iommu_type1.
> c#L1418.
> 
> Thus this change updates the mapping to be created individually instead of
> colaecing them.
> 
>  lib/eal/linux/eal_vfio.c | 29 -----------------------------
>  1 file changed, 29 deletions(-)
> 
> diff --git a/lib/eal/linux/eal_vfio.c b/lib/eal/linux/eal_vfio.c index
> 549b86ae1d..56edccb0db 100644
> --- a/lib/eal/linux/eal_vfio.c
> +++ b/lib/eal/linux/eal_vfio.c
> @@ -1369,19 +1369,6 @@ rte_vfio_get_group_num(const char *sysfs_base,
>  	return 1;
>  }
> 
> -static int
> -type1_map_contig(const struct rte_memseg_list *msl, const struct
> rte_memseg *ms,
> -		size_t len, void *arg)
> -{
> -	int *vfio_container_fd = arg;
> -
> -	if (msl->external)
> -		return 0;
> -
> -	return vfio_type1_dma_mem_map(*vfio_container_fd, ms->addr_64,
> ms->iova,
> -			len, 1);
> -}
> -
>  static int
>  type1_map(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
>  		void *arg)
> @@ -1396,10 +1383,6 @@ type1_map(const struct rte_memseg_list *msl,
> const struct rte_memseg *ms,
>  	if (ms->iova == RTE_BAD_IOVA)
>  		return 0;
> 
> -	/* if IOVA mode is VA, we've already mapped the internal segments */
> -	if (!msl->external && rte_eal_iova_mode() == RTE_IOVA_VA)
> -		return 0;
> -
>  	return vfio_type1_dma_mem_map(*vfio_container_fd, ms->addr_64,
> ms->iova,
>  			ms->len, 1);
>  }
> @@ -1464,18 +1447,6 @@ vfio_type1_dma_mem_map(int vfio_container_fd,
> uint64_t vaddr, uint64_t iova,  static int  vfio_type1_dma_map(int
> vfio_container_fd)  {
> -	if (rte_eal_iova_mode() == RTE_IOVA_VA) {
> -		/* with IOVA as VA mode, we can get away with mapping
> contiguous
> -		 * chunks rather than going page-by-page.
> -		 */
> -		int ret = rte_memseg_contig_walk(type1_map_contig,
> -				&vfio_container_fd);
> -		if (ret)
> -			return ret;
> -		/* we have to continue the walk because we've skipped the
> -		 * external segments during the config walk.
> -		 */
> -	}
>  	return rte_memseg_walk(type1_map, &vfio_container_fd);  }
> 
> --
> 2.25.1


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] vfio: do not coalesce DMA mappings
  2023-06-29  8:21 ` [PATCH] " Ding, Xuan
@ 2023-06-30  1:45   ` Nipun Gupta
  2023-06-30  5:58     ` Ding, Xuan
  0 siblings, 1 reply; 30+ messages in thread
From: Nipun Gupta @ 2023-06-30  1:45 UTC (permalink / raw)
  To: Ding, Xuan, dev, thomas, Burakov, Anatoly, ferruh.yigit
  Cc: nikhil.agarwal, He, Xingguang, Ling, WeiX

Hi Xuan,

Thanks for pointing out the issue and figuring out the patch which 
introduced this. If you have answers to below queries, please let me know:

Is there any other test cases which tests "--no-huge" which pass?

Also, if we change the "-m" option to provide lower memory, does the 
test pass?

When you mention too many pages exceed the capability of IOMMU, you are 
referring to HW capability to create multiple pages? Here it seems in 
case of 4K page size we need 256K pages which is limiting the capacity?

Regards,
Nipun

On 6/29/2023 1:51 PM, Ding, Xuan wrote:
> Hi Nipun,
> 
> I'd like to appreciate your time reading this email.
> 
> Our QA team found that since this commit "a399d7b5a994: do not coalesce DMA mappings" is introduced,
> the dpdk testpmd start with "--no-huge" parameters will failed, and shows "EAL: Cannot set up DMA remapping, error 28 (No space left on device)".
> So they reported it on dpdk Bugzilla: https://bugs.dpdk.org/show_bug.cgi?id=1235.
> 
> I understand this feature is to keep consistent with the kernel and not allow memory segments be merged.
> The side effect is the testpmd with "--no-huge" parameters will not be able to start because the too many pages will exceed the capability of IOMMU.
> Is it expected? Should we remove the --no-huge" in our testcase?
> 
> Regards,
> Xuan
> 
>> -----Original Message-----
>> From: Nipun Gupta <nipun.gupta@amd.com>
>> Sent: Friday, December 30, 2022 5:59 PM
>> To: dev@dpdk.org; thomas@monjalon.net; Burakov, Anatoly
>> <anatoly.burakov@intel.com>; ferruh.yigit@amd.com
>> Cc: nikhil.agarwal@amd.com; Nipun Gupta <nipun.gupta@amd.com>
>> Subject: [PATCH] vfio: do not coalesce DMA mappings
>>
>> At the cleanup time when dma unmap is done, linux kernel does not allow
>> unmap of individual segments which were coalesced together while creating
>> the DMA map for type1 IOMMU mappings. So, this change updates the
>> mapping of the memory
>> segments(hugepages) on a per-page basis.
>>
>> Signed-off-by: Nipun Gupta <nipun.gupta@amd.com>
>> ---
>>
>> When hotplug of devices is used, multiple pages gets colaeced and a single
>> mapping gets created for these pages (using APIs
>> rte_memseg_contig_walk() and type1_map_contig(). On the cleanup time
>> when the memory is released, the VFIO does not cleans up that memory and
>> following error is observed in the eal for 2MB
>> hugepages:
>> EAL: Unexpected size 0 of DMA remapping cleared instead of 2097152
>>
>> This is because VFIO does not clear the DMA (refer API
>> vfio_dma_do_unmap() -
>> https://elixir.bootlin.com/linux/latest/source/drivers/vfio/vfio_iommu_type1.
>> c#L1330),
>> where it checks the dma mapping where it checks for IOVA to free:
>> https://elixir.bootlin.com/linux/latest/source/drivers/vfio/vfio_iommu_type1.
>> c#L1418.
>>
>> Thus this change updates the mapping to be created individually instead of
>> colaecing them.
>>
>>   lib/eal/linux/eal_vfio.c | 29 -----------------------------
>>   1 file changed, 29 deletions(-)
>>
>> diff --git a/lib/eal/linux/eal_vfio.c b/lib/eal/linux/eal_vfio.c index
>> 549b86ae1d..56edccb0db 100644
>> --- a/lib/eal/linux/eal_vfio.c
>> +++ b/lib/eal/linux/eal_vfio.c
>> @@ -1369,19 +1369,6 @@ rte_vfio_get_group_num(const char *sysfs_base,
>>   	return 1;
>>   }
>>
>> -static int
>> -type1_map_contig(const struct rte_memseg_list *msl, const struct
>> rte_memseg *ms,
>> -		size_t len, void *arg)
>> -{
>> -	int *vfio_container_fd = arg;
>> -
>> -	if (msl->external)
>> -		return 0;
>> -
>> -	return vfio_type1_dma_mem_map(*vfio_container_fd, ms->addr_64,
>> ms->iova,
>> -			len, 1);
>> -}
>> -
>>   static int
>>   type1_map(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
>>   		void *arg)
>> @@ -1396,10 +1383,6 @@ type1_map(const struct rte_memseg_list *msl,
>> const struct rte_memseg *ms,
>>   	if (ms->iova == RTE_BAD_IOVA)
>>   		return 0;
>>
>> -	/* if IOVA mode is VA, we've already mapped the internal segments */
>> -	if (!msl->external && rte_eal_iova_mode() == RTE_IOVA_VA)
>> -		return 0;
>> -
>>   	return vfio_type1_dma_mem_map(*vfio_container_fd, ms->addr_64,
>> ms->iova,
>>   			ms->len, 1);
>>   }
>> @@ -1464,18 +1447,6 @@ vfio_type1_dma_mem_map(int vfio_container_fd,
>> uint64_t vaddr, uint64_t iova,  static int  vfio_type1_dma_map(int
>> vfio_container_fd)  {
>> -	if (rte_eal_iova_mode() == RTE_IOVA_VA) {
>> -		/* with IOVA as VA mode, we can get away with mapping
>> contiguous
>> -		 * chunks rather than going page-by-page.
>> -		 */
>> -		int ret = rte_memseg_contig_walk(type1_map_contig,
>> -				&vfio_container_fd);
>> -		if (ret)
>> -			return ret;
>> -		/* we have to continue the walk because we've skipped the
>> -		 * external segments during the config walk.
>> -		 */
>> -	}
>>   	return rte_memseg_walk(type1_map, &vfio_container_fd);  }
>>
>> --
>> 2.25.1
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH] vfio: do not coalesce DMA mappings
  2023-06-30  1:45   ` Nipun Gupta
@ 2023-06-30  5:58     ` Ding, Xuan
  2023-07-04  5:13       ` Ding, Xuan
  0 siblings, 1 reply; 30+ messages in thread
From: Ding, Xuan @ 2023-06-30  5:58 UTC (permalink / raw)
  To: Nipun Gupta, dev, thomas, Burakov, Anatoly, ferruh.yigit
  Cc: nikhil.agarwal, He, Xingguang, Ling, WeiX

Hi Nipun,

Replies are inline.

> -----Original Message-----
> From: Nipun Gupta <nipun.gupta@amd.com>
> Sent: Friday, June 30, 2023 9:46 AM
> To: Ding, Xuan <xuan.ding@intel.com>; dev@dpdk.org;
> thomas@monjalon.net; Burakov, Anatoly <anatoly.burakov@intel.com>;
> ferruh.yigit@amd.com
> Cc: nikhil.agarwal@amd.com; He, Xingguang <xingguang.he@intel.com>; Ling,
> WeiX <weix.ling@intel.com>
> Subject: Re: [PATCH] vfio: do not coalesce DMA mappings
> 
> Hi Xuan,
> 
> Thanks for pointing out the issue and figuring out the patch which introduced
> this. If you have answers to below queries, please let me know:
> 
> Is there any other test cases which tests "--no-huge" which pass?

Yes, there are test cases adding "--no-huge" option to validate 4k page size in async vhost.
Actually, the page size is decided by front-end, so I think this case can be removed.

Previously, testpmd can start with "--no-huge" options (not sure if there are test cases).
Cmd: ./build/app/dpdk-testpmd -l 5-6 -n 4 --no-huge -m 1024 -- -i

> 
> Also, if we change the "-m" option to provide lower memory, does the test
> pass?

"-m" option is also added and does not work.

> 
> When you mention too many pages exceed the capability of IOMMU, you are
> referring to HW capability to create multiple pages? Here it seems in case of
> 4K page size we need 256K pages which is limiting the capacity?

Yes, this is the result of my initial debugging.
The direct impact is that this kind of testpmd cases cannot start now.
If this is expected, I think we can close this defect and ignore the "--no-huge" option when start.

Regards,
Xuan

> 
> Regards,
> Nipun
> 
> On 6/29/2023 1:51 PM, Ding, Xuan wrote:
> > Hi Nipun,
> >
> > I'd like to appreciate your time reading this email.
> >
> > Our QA team found that since this commit "a399d7b5a994: do not
> > coalesce DMA mappings" is introduced, the dpdk testpmd start with "--no-
> huge" parameters will failed, and shows "EAL: Cannot set up DMA
> remapping, error 28 (No space left on device)".
> > So they reported it on dpdk Bugzilla:
> https://bugs.dpdk.org/show_bug.cgi?id=1235.
> >
> > I understand this feature is to keep consistent with the kernel and not allow
> memory segments be merged.
> > The side effect is the testpmd with "--no-huge" parameters will not be able
> to start because the too many pages will exceed the capability of IOMMU.
> > Is it expected? Should we remove the --no-huge" in our testcase?
> >
> > Regards,
> > Xuan
> >
> >> -----Original Message-----
> >> From: Nipun Gupta <nipun.gupta@amd.com>
> >> Sent: Friday, December 30, 2022 5:59 PM
> >> To: dev@dpdk.org; thomas@monjalon.net; Burakov, Anatoly
> >> <anatoly.burakov@intel.com>; ferruh.yigit@amd.com
> >> Cc: nikhil.agarwal@amd.com; Nipun Gupta <nipun.gupta@amd.com>
> >> Subject: [PATCH] vfio: do not coalesce DMA mappings
> >>
> >> At the cleanup time when dma unmap is done, linux kernel does not
> >> allow unmap of individual segments which were coalesced together
> >> while creating the DMA map for type1 IOMMU mappings. So, this change
> >> updates the mapping of the memory
> >> segments(hugepages) on a per-page basis.
> >>
> >> Signed-off-by: Nipun Gupta <nipun.gupta@amd.com>
> >> ---
> >>
> >> When hotplug of devices is used, multiple pages gets colaeced and a
> >> single mapping gets created for these pages (using APIs
> >> rte_memseg_contig_walk() and type1_map_contig(). On the cleanup time
> >> when the memory is released, the VFIO does not cleans up that memory
> >> and following error is observed in the eal for 2MB
> >> hugepages:
> >> EAL: Unexpected size 0 of DMA remapping cleared instead of 2097152
> >>
> >> This is because VFIO does not clear the DMA (refer API
> >> vfio_dma_do_unmap() -
> >>
> https://elixir.bootlin.com/linux/latest/source/drivers/vfio/vfio_iommu_type
> 1.
> >> c#L1330),
> >> where it checks the dma mapping where it checks for IOVA to free:
> >>
> https://elixir.bootlin.com/linux/latest/source/drivers/vfio/vfio_iommu_type
> 1.
> >> c#L1418.
> >>
> >> Thus this change updates the mapping to be created individually
> >> instead of colaecing them.
> >>
> >>   lib/eal/linux/eal_vfio.c | 29 -----------------------------
> >>   1 file changed, 29 deletions(-)
> >>
> >> diff --git a/lib/eal/linux/eal_vfio.c b/lib/eal/linux/eal_vfio.c
> >> index 549b86ae1d..56edccb0db 100644
> >> --- a/lib/eal/linux/eal_vfio.c
> >> +++ b/lib/eal/linux/eal_vfio.c
> >> @@ -1369,19 +1369,6 @@ rte_vfio_get_group_num(const char
> *sysfs_base,
> >>   	return 1;
> >>   }
> >>
> >> -static int
> >> -type1_map_contig(const struct rte_memseg_list *msl, const struct
> >> rte_memseg *ms,
> >> -		size_t len, void *arg)
> >> -{
> >> -	int *vfio_container_fd = arg;
> >> -
> >> -	if (msl->external)
> >> -		return 0;
> >> -
> >> -	return vfio_type1_dma_mem_map(*vfio_container_fd, ms->addr_64,
> >> ms->iova,
> >> -			len, 1);
> >> -}
> >> -
> >>   static int
> >>   type1_map(const struct rte_memseg_list *msl, const struct rte_memseg
> *ms,
> >>   		void *arg)
> >> @@ -1396,10 +1383,6 @@ type1_map(const struct rte_memseg_list *msl,
> >> const struct rte_memseg *ms,
> >>   	if (ms->iova == RTE_BAD_IOVA)
> >>   		return 0;
> >>
> >> -	/* if IOVA mode is VA, we've already mapped the internal segments
> */
> >> -	if (!msl->external && rte_eal_iova_mode() == RTE_IOVA_VA)
> >> -		return 0;
> >> -
> >>   	return vfio_type1_dma_mem_map(*vfio_container_fd, ms->addr_64,
> >> ms->iova,
> >>   			ms->len, 1);
> >>   }
> >> @@ -1464,18 +1447,6 @@ vfio_type1_dma_mem_map(int
> vfio_container_fd,
> >> uint64_t vaddr, uint64_t iova,  static int  vfio_type1_dma_map(int
> >> vfio_container_fd)  {
> >> -	if (rte_eal_iova_mode() == RTE_IOVA_VA) {
> >> -		/* with IOVA as VA mode, we can get away with mapping
> >> contiguous
> >> -		 * chunks rather than going page-by-page.
> >> -		 */
> >> -		int ret = rte_memseg_contig_walk(type1_map_contig,
> >> -				&vfio_container_fd);
> >> -		if (ret)
> >> -			return ret;
> >> -		/* we have to continue the walk because we've skipped the
> >> -		 * external segments during the config walk.
> >> -		 */
> >> -	}
> >>   	return rte_memseg_walk(type1_map, &vfio_container_fd);  }
> >>
> >> --
> >> 2.25.1
> >

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH] vfio: do not coalesce DMA mappings
  2023-06-30  5:58     ` Ding, Xuan
@ 2023-07-04  5:13       ` Ding, Xuan
  2023-07-04  6:53         ` Gupta, Nipun
  0 siblings, 1 reply; 30+ messages in thread
From: Ding, Xuan @ 2023-07-04  5:13 UTC (permalink / raw)
  To: Nipun Gupta, dev, thomas, Burakov, Anatoly, ferruh.yigit
  Cc: nikhil.agarwal, He, Xingguang, Ling, WeiX

Hi Nipun,

> -----Original Message-----
> From: Ding, Xuan
> Sent: Friday, June 30, 2023 1:58 PM
> To: Nipun Gupta <nipun.gupta@amd.com>; dev@dpdk.org;
> thomas@monjalon.net; Burakov, Anatoly <anatoly.burakov@intel.com>;
> ferruh.yigit@amd.com
> Cc: nikhil.agarwal@amd.com; He, Xingguang <xingguang.he@intel.com>; Ling,
> WeiX <weix.ling@intel.com>
> Subject: RE: [PATCH] vfio: do not coalesce DMA mappings
> 
> Hi Nipun,
> 
> Replies are inline.
> 
> > -----Original Message-----
> > From: Nipun Gupta <nipun.gupta@amd.com>
> > Sent: Friday, June 30, 2023 9:46 AM
> > To: Ding, Xuan <xuan.ding@intel.com>; dev@dpdk.org;
> > thomas@monjalon.net; Burakov, Anatoly <anatoly.burakov@intel.com>;
> > ferruh.yigit@amd.com
> > Cc: nikhil.agarwal@amd.com; He, Xingguang <xingguang.he@intel.com>;
> > Ling, WeiX <weix.ling@intel.com>
> > Subject: Re: [PATCH] vfio: do not coalesce DMA mappings
> >
> > Hi Xuan,
> >
> > Thanks for pointing out the issue and figuring out the patch which
> > introduced this. If you have answers to below queries, please let me know:
> >
> > Is there any other test cases which tests "--no-huge" which pass?
> 
> Yes, there are test cases adding "--no-huge" option to validate 4k page size in
> async vhost.
> Actually, the page size is decided by front-end, so I think this case can be
> removed.
> 
> Previously, testpmd can start with "--no-huge" options (not sure if there are
> test cases).
> Cmd: ./build/app/dpdk-testpmd -l 5-6 -n 4 --no-huge -m 1024 -- -i
> 
> >
> > Also, if we change the "-m" option to provide lower memory, does the
> > test pass?
> 
> "-m" option is also added and does not work.
> 
> >
> > When you mention too many pages exceed the capability of IOMMU, you
> > are referring to HW capability to create multiple pages? Here it seems
> > in case of 4K page size we need 256K pages which is limiting the capacity?
> 
> Yes, this is the result of my initial debugging.
> The direct impact is that this kind of testpmd cases cannot start now.
> If this is expected, I think we can close this defect and ignore the "--no-huge"
> option when start.

Any insights? Should we just ignore the "--no-huge" option and close this defect?
Now we did this as a workaround. Seems no one uses the "--no-huge" option in testpmd now.

Thanks,
Xuan

> 
> Regards,
> Xuan
> 
> >
> > Regards,
> > Nipun
> >
> > On 6/29/2023 1:51 PM, Ding, Xuan wrote:
> > > Hi Nipun,
> > >
> > > I'd like to appreciate your time reading this email.
> > >
> > > Our QA team found that since this commit "a399d7b5a994: do not
> > > coalesce DMA mappings" is introduced, the dpdk testpmd start with
> > > "--no-
> > huge" parameters will failed, and shows "EAL: Cannot set up DMA
> > remapping, error 28 (No space left on device)".
> > > So they reported it on dpdk Bugzilla:
> > https://bugs.dpdk.org/show_bug.cgi?id=1235.
> > >
> > > I understand this feature is to keep consistent with the kernel and
> > > not allow
> > memory segments be merged.
> > > The side effect is the testpmd with "--no-huge" parameters will not
> > > be able
> > to start because the too many pages will exceed the capability of IOMMU.
> > > Is it expected? Should we remove the --no-huge" in our testcase?
> > >
> > > Regards,
> > > Xuan
> > >
> > >> -----Original Message-----
> > >> From: Nipun Gupta <nipun.gupta@amd.com>
> > >> Sent: Friday, December 30, 2022 5:59 PM
> > >> To: dev@dpdk.org; thomas@monjalon.net; Burakov, Anatoly
> > >> <anatoly.burakov@intel.com>; ferruh.yigit@amd.com
> > >> Cc: nikhil.agarwal@amd.com; Nipun Gupta <nipun.gupta@amd.com>
> > >> Subject: [PATCH] vfio: do not coalesce DMA mappings
> > >>
> > >> At the cleanup time when dma unmap is done, linux kernel does not
> > >> allow unmap of individual segments which were coalesced together
> > >> while creating the DMA map for type1 IOMMU mappings. So, this
> > >> change updates the mapping of the memory
> > >> segments(hugepages) on a per-page basis.
> > >>
> > >> Signed-off-by: Nipun Gupta <nipun.gupta@amd.com>
> > >> ---
> > >>
> > >> When hotplug of devices is used, multiple pages gets colaeced and a
> > >> single mapping gets created for these pages (using APIs
> > >> rte_memseg_contig_walk() and type1_map_contig(). On the cleanup
> > >> time when the memory is released, the VFIO does not cleans up that
> > >> memory and following error is observed in the eal for 2MB
> > >> hugepages:
> > >> EAL: Unexpected size 0 of DMA remapping cleared instead of 2097152
> > >>
> > >> This is because VFIO does not clear the DMA (refer API
> > >> vfio_dma_do_unmap() -
> > >>
> > https://elixir.bootlin.com/linux/latest/source/drivers/vfio/vfio_iommu
> > _type
> > 1.
> > >> c#L1330),
> > >> where it checks the dma mapping where it checks for IOVA to free:
> > >>
> > https://elixir.bootlin.com/linux/latest/source/drivers/vfio/vfio_iommu
> > _type
> > 1.
> > >> c#L1418.
> > >>
> > >> Thus this change updates the mapping to be created individually
> > >> instead of colaecing them.
> > >>
> > >>   lib/eal/linux/eal_vfio.c | 29 -----------------------------
> > >>   1 file changed, 29 deletions(-)
> > >>
> > >> diff --git a/lib/eal/linux/eal_vfio.c b/lib/eal/linux/eal_vfio.c
> > >> index 549b86ae1d..56edccb0db 100644
> > >> --- a/lib/eal/linux/eal_vfio.c
> > >> +++ b/lib/eal/linux/eal_vfio.c
> > >> @@ -1369,19 +1369,6 @@ rte_vfio_get_group_num(const char
> > *sysfs_base,
> > >>   	return 1;
> > >>   }
> > >>
> > >> -static int
> > >> -type1_map_contig(const struct rte_memseg_list *msl, const struct
> > >> rte_memseg *ms,
> > >> -		size_t len, void *arg)
> > >> -{
> > >> -	int *vfio_container_fd = arg;
> > >> -
> > >> -	if (msl->external)
> > >> -		return 0;
> > >> -
> > >> -	return vfio_type1_dma_mem_map(*vfio_container_fd, ms->addr_64,
> > >> ms->iova,
> > >> -			len, 1);
> > >> -}
> > >> -
> > >>   static int
> > >>   type1_map(const struct rte_memseg_list *msl, const struct
> > >> rte_memseg
> > *ms,
> > >>   		void *arg)
> > >> @@ -1396,10 +1383,6 @@ type1_map(const struct rte_memseg_list *msl,
> > >> const struct rte_memseg *ms,
> > >>   	if (ms->iova == RTE_BAD_IOVA)
> > >>   		return 0;
> > >>
> > >> -	/* if IOVA mode is VA, we've already mapped the internal segments
> > */
> > >> -	if (!msl->external && rte_eal_iova_mode() == RTE_IOVA_VA)
> > >> -		return 0;
> > >> -
> > >>   	return vfio_type1_dma_mem_map(*vfio_container_fd, ms->addr_64,
> > >> ms->iova,
> > >>   			ms->len, 1);
> > >>   }
> > >> @@ -1464,18 +1447,6 @@ vfio_type1_dma_mem_map(int
> > vfio_container_fd,
> > >> uint64_t vaddr, uint64_t iova,  static int  vfio_type1_dma_map(int
> > >> vfio_container_fd)  {
> > >> -	if (rte_eal_iova_mode() == RTE_IOVA_VA) {
> > >> -		/* with IOVA as VA mode, we can get away with mapping
> > >> contiguous
> > >> -		 * chunks rather than going page-by-page.
> > >> -		 */
> > >> -		int ret = rte_memseg_contig_walk(type1_map_contig,
> > >> -				&vfio_container_fd);
> > >> -		if (ret)
> > >> -			return ret;
> > >> -		/* we have to continue the walk because we've skipped the
> > >> -		 * external segments during the config walk.
> > >> -		 */
> > >> -	}
> > >>   	return rte_memseg_walk(type1_map, &vfio_container_fd);  }
> > >>
> > >> --
> > >> 2.25.1
> > >

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH] vfio: do not coalesce DMA mappings
  2023-07-04  5:13       ` Ding, Xuan
@ 2023-07-04  6:53         ` Gupta, Nipun
  2023-07-04  8:06           ` Ding, Xuan
  0 siblings, 1 reply; 30+ messages in thread
From: Gupta, Nipun @ 2023-07-04  6:53 UTC (permalink / raw)
  To: Ding, Xuan, dev, thomas, Burakov, Anatoly, Yigit, Ferruh, David Marchand
  Cc: Agarwal, Nikhil, He, Xingguang, Ling, WeiX

Hi Xuan,

Please see inline.

> -----Original Message-----
> From: Ding, Xuan <xuan.ding@intel.com>
> Sent: Tuesday, July 4, 2023 10:43 AM
> To: Gupta, Nipun <Nipun.Gupta@amd.com>; dev@dpdk.org;
> thomas@monjalon.net; Burakov, Anatoly <anatoly.burakov@intel.com>; Yigit,
> Ferruh <Ferruh.Yigit@amd.com>
> Cc: Agarwal, Nikhil <nikhil.agarwal@amd.com>; He, Xingguang
> <xingguang.he@intel.com>; Ling, WeiX <weix.ling@intel.com>
> Subject: RE: [PATCH] vfio: do not coalesce DMA mappings
> 
> Hi Nipun,
> 
> > -----Original Message-----
> > From: Ding, Xuan
> > Sent: Friday, June 30, 2023 1:58 PM
> > To: Nipun Gupta <nipun.gupta@amd.com>; dev@dpdk.org;
> > thomas@monjalon.net; Burakov, Anatoly <anatoly.burakov@intel.com>;
> > ferruh.yigit@amd.com
> > Cc: nikhil.agarwal@amd.com; He, Xingguang <xingguang.he@intel.com>; Ling,
> > WeiX <weix.ling@intel.com>
> > Subject: RE: [PATCH] vfio: do not coalesce DMA mappings
> >
> > Hi Nipun,
> >
> > Replies are inline.
> >
> > > -----Original Message-----
> > > From: Nipun Gupta <nipun.gupta@amd.com>
> > > Sent: Friday, June 30, 2023 9:46 AM
> > > To: Ding, Xuan <xuan.ding@intel.com>; dev@dpdk.org;
> > > thomas@monjalon.net; Burakov, Anatoly <anatoly.burakov@intel.com>;
> > > ferruh.yigit@amd.com
> > > Cc: nikhil.agarwal@amd.com; He, Xingguang <xingguang.he@intel.com>;
> > > Ling, WeiX <weix.ling@intel.com>
> > > Subject: Re: [PATCH] vfio: do not coalesce DMA mappings
> > >
> > > Hi Xuan,
> > >
> > > Thanks for pointing out the issue and figuring out the patch which
> > > introduced this. If you have answers to below queries, please let me know:
> > >
> > > Is there any other test cases which tests "--no-huge" which pass?
> >
> > Yes, there are test cases adding "--no-huge" option to validate 4k page size in
> > async vhost.
> > Actually, the page size is decided by front-end, so I think this case can be
> > removed.
> >
> > Previously, testpmd can start with "--no-huge" options (not sure if there are
> > test cases).
> > Cmd: ./build/app/dpdk-testpmd -l 5-6 -n 4 --no-huge -m 1024 -- -i
> >
> > >
> > > Also, if we change the "-m" option to provide lower memory, does the
> > > test pass?
> >
> > "-m" option is also added and does not work.
> >
> > >
> > > When you mention too many pages exceed the capability of IOMMU, you
> > > are referring to HW capability to create multiple pages? Here it seems
> > > in case of 4K page size we need 256K pages which is limiting the capacity?
> >
> > Yes, this is the result of my initial debugging.
> > The direct impact is that this kind of testpmd cases cannot start now.
> > If this is expected, I think we can close this defect and ignore the "--no-huge"
> > option when start.
> 
> Any insights? Should we just ignore the "--no-huge" option and close this defect?
> Now we did this as a workaround. Seems no one uses the "--no-huge" option in
> testpmd now.

VFIO supports dma_entry_limit as a module parameter, which has a default value of
U16_MAX i.e. 64K, most likely which is limiting creation of 256K entries for 4K pages
here. This can be modified while inserting vfio module:
        modprobe vfio_iommu_type1 dma_entry_limit=1000000

You can update the test case with updating this limit, and test case shall pass.

Thanks,
Nipun

> 
> Thanks,
> Xuan
> 
> >
> > Regards,
> > Xuan
> >
> > >
> > > Regards,
> > > Nipun
> > >
> > > On 6/29/2023 1:51 PM, Ding, Xuan wrote:
> > > > Hi Nipun,
> > > >
> > > > I'd like to appreciate your time reading this email.
> > > >
> > > > Our QA team found that since this commit "a399d7b5a994: do not
> > > > coalesce DMA mappings" is introduced, the dpdk testpmd start with
> > > > "--no-
> > > huge" parameters will failed, and shows "EAL: Cannot set up DMA
> > > remapping, error 28 (No space left on device)".
> > > > So they reported it on dpdk Bugzilla:
> > > https://bugs.dpdk.org/show_bug.cgi?id=1235.
> > > >
> > > > I understand this feature is to keep consistent with the kernel and
> > > > not allow
> > > memory segments be merged.
> > > > The side effect is the testpmd with "--no-huge" parameters will not
> > > > be able
> > > to start because the too many pages will exceed the capability of IOMMU.
> > > > Is it expected? Should we remove the --no-huge" in our testcase?
> > > >
> > > > Regards,
> > > > Xuan
> > > >
> > > >> -----Original Message-----
> > > >> From: Nipun Gupta <nipun.gupta@amd.com>
> > > >> Sent: Friday, December 30, 2022 5:59 PM
> > > >> To: dev@dpdk.org; thomas@monjalon.net; Burakov, Anatoly
> > > >> <anatoly.burakov@intel.com>; ferruh.yigit@amd.com
> > > >> Cc: nikhil.agarwal@amd.com; Nipun Gupta <nipun.gupta@amd.com>
> > > >> Subject: [PATCH] vfio: do not coalesce DMA mappings
> > > >>
> > > >> At the cleanup time when dma unmap is done, linux kernel does not
> > > >> allow unmap of individual segments which were coalesced together
> > > >> while creating the DMA map for type1 IOMMU mappings. So, this
> > > >> change updates the mapping of the memory
> > > >> segments(hugepages) on a per-page basis.
> > > >>
> > > >> Signed-off-by: Nipun Gupta <nipun.gupta@amd.com>
> > > >> ---
> > > >>
> > > >> When hotplug of devices is used, multiple pages gets colaeced and a
> > > >> single mapping gets created for these pages (using APIs
> > > >> rte_memseg_contig_walk() and type1_map_contig(). On the cleanup
> > > >> time when the memory is released, the VFIO does not cleans up that
> > > >> memory and following error is observed in the eal for 2MB
> > > >> hugepages:
> > > >> EAL: Unexpected size 0 of DMA remapping cleared instead of 2097152
> > > >>
> > > >> This is because VFIO does not clear the DMA (refer API
> > > >> vfio_dma_do_unmap() -
> > > >>
> > > https://elixir.bootlin.com/linux/latest/source/drivers/vfio/vfio_iommu
> > > _type
> > > 1.
> > > >> c#L1330),
> > > >> where it checks the dma mapping where it checks for IOVA to free:
> > > >>
> > > https://elixir.bootlin.com/linux/latest/source/drivers/vfio/vfio_iommu
> > > _type
> > > 1.
> > > >> c#L1418.
> > > >>
> > > >> Thus this change updates the mapping to be created individually
> > > >> instead of colaecing them.
> > > >>
> > > >>   lib/eal/linux/eal_vfio.c | 29 -----------------------------
> > > >>   1 file changed, 29 deletions(-)
> > > >>
> > > >> diff --git a/lib/eal/linux/eal_vfio.c b/lib/eal/linux/eal_vfio.c
> > > >> index 549b86ae1d..56edccb0db 100644
> > > >> --- a/lib/eal/linux/eal_vfio.c
> > > >> +++ b/lib/eal/linux/eal_vfio.c
> > > >> @@ -1369,19 +1369,6 @@ rte_vfio_get_group_num(const char
> > > *sysfs_base,
> > > >>   	return 1;
> > > >>   }
> > > >>
> > > >> -static int
> > > >> -type1_map_contig(const struct rte_memseg_list *msl, const struct
> > > >> rte_memseg *ms,
> > > >> -		size_t len, void *arg)
> > > >> -{
> > > >> -	int *vfio_container_fd = arg;
> > > >> -
> > > >> -	if (msl->external)
> > > >> -		return 0;
> > > >> -
> > > >> -	return vfio_type1_dma_mem_map(*vfio_container_fd, ms->addr_64,
> > > >> ms->iova,
> > > >> -			len, 1);
> > > >> -}
> > > >> -
> > > >>   static int
> > > >>   type1_map(const struct rte_memseg_list *msl, const struct
> > > >> rte_memseg
> > > *ms,
> > > >>   		void *arg)
> > > >> @@ -1396,10 +1383,6 @@ type1_map(const struct rte_memseg_list *msl,
> > > >> const struct rte_memseg *ms,
> > > >>   	if (ms->iova == RTE_BAD_IOVA)
> > > >>   		return 0;
> > > >>
> > > >> -	/* if IOVA mode is VA, we've already mapped the internal segments
> > > */
> > > >> -	if (!msl->external && rte_eal_iova_mode() == RTE_IOVA_VA)
> > > >> -		return 0;
> > > >> -
> > > >>   	return vfio_type1_dma_mem_map(*vfio_container_fd, ms->addr_64,
> > > >> ms->iova,
> > > >>   			ms->len, 1);
> > > >>   }
> > > >> @@ -1464,18 +1447,6 @@ vfio_type1_dma_mem_map(int
> > > vfio_container_fd,
> > > >> uint64_t vaddr, uint64_t iova,  static int  vfio_type1_dma_map(int
> > > >> vfio_container_fd)  {
> > > >> -	if (rte_eal_iova_mode() == RTE_IOVA_VA) {
> > > >> -		/* with IOVA as VA mode, we can get away with mapping
> > > >> contiguous
> > > >> -		 * chunks rather than going page-by-page.
> > > >> -		 */
> > > >> -		int ret = rte_memseg_contig_walk(type1_map_contig,
> > > >> -				&vfio_container_fd);
> > > >> -		if (ret)
> > > >> -			return ret;
> > > >> -		/* we have to continue the walk because we've skipped the
> > > >> -		 * external segments during the config walk.
> > > >> -		 */
> > > >> -	}
> > > >>   	return rte_memseg_walk(type1_map, &vfio_container_fd);  }
> > > >>
> > > >> --
> > > >> 2.25.1
> > > >

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH] vfio: do not coalesce DMA mappings
  2023-07-04  6:53         ` Gupta, Nipun
@ 2023-07-04  8:06           ` Ding, Xuan
  2023-07-04  9:23             ` Gupta, Nipun
  0 siblings, 1 reply; 30+ messages in thread
From: Ding, Xuan @ 2023-07-04  8:06 UTC (permalink / raw)
  To: Gupta, Nipun, dev, thomas, Burakov, Anatoly, Yigit, Ferruh,
	David Marchand
  Cc: Agarwal, Nikhil, He, Xingguang, Ling, WeiX

Hi Nipun,

> -----Original Message-----
> From: Gupta, Nipun <Nipun.Gupta@amd.com>
> Sent: Tuesday, July 4, 2023 2:54 PM
> To: Ding, Xuan <xuan.ding@intel.com>; dev@dpdk.org; thomas@monjalon.net;
> Burakov, Anatoly <anatoly.burakov@intel.com>; Yigit, Ferruh
> <Ferruh.Yigit@amd.com>; David Marchand <david.marchand@redhat.com>
> Cc: Agarwal, Nikhil <nikhil.agarwal@amd.com>; He, Xingguang
> <xingguang.he@intel.com>; Ling, WeiX <weix.ling@intel.com>
> Subject: RE: [PATCH] vfio: do not coalesce DMA mappings
> 
> Hi Xuan,
> 
> Please see inline.
> 
> > -----Original Message-----
> > From: Ding, Xuan <xuan.ding@intel.com>
> > Sent: Tuesday, July 4, 2023 10:43 AM
> > To: Gupta, Nipun <Nipun.Gupta@amd.com>; dev@dpdk.org;
> > thomas@monjalon.net; Burakov, Anatoly <anatoly.burakov@intel.com>;
> > Yigit, Ferruh <Ferruh.Yigit@amd.com>
> > Cc: Agarwal, Nikhil <nikhil.agarwal@amd.com>; He, Xingguang
> > <xingguang.he@intel.com>; Ling, WeiX <weix.ling@intel.com>
> > Subject: RE: [PATCH] vfio: do not coalesce DMA mappings
> >
> > Hi Nipun,
> >
> > > -----Original Message-----
> > > From: Ding, Xuan
> > > Sent: Friday, June 30, 2023 1:58 PM
> > > To: Nipun Gupta <nipun.gupta@amd.com>; dev@dpdk.org;
> > > thomas@monjalon.net; Burakov, Anatoly <anatoly.burakov@intel.com>;
> > > ferruh.yigit@amd.com
> > > Cc: nikhil.agarwal@amd.com; He, Xingguang <xingguang.he@intel.com>;
> > > Ling, WeiX <weix.ling@intel.com>
> > > Subject: RE: [PATCH] vfio: do not coalesce DMA mappings
> > >
> > > Hi Nipun,
> > >
> > > Replies are inline.
> > >
> > > > -----Original Message-----
> > > > From: Nipun Gupta <nipun.gupta@amd.com>
> > > > Sent: Friday, June 30, 2023 9:46 AM
> > > > To: Ding, Xuan <xuan.ding@intel.com>; dev@dpdk.org;
> > > > thomas@monjalon.net; Burakov, Anatoly <anatoly.burakov@intel.com>;
> > > > ferruh.yigit@amd.com
> > > > Cc: nikhil.agarwal@amd.com; He, Xingguang
> > > > <xingguang.he@intel.com>; Ling, WeiX <weix.ling@intel.com>
> > > > Subject: Re: [PATCH] vfio: do not coalesce DMA mappings
> > > >
> > > > Hi Xuan,
> > > >
> > > > Thanks for pointing out the issue and figuring out the patch which
> > > > introduced this. If you have answers to below queries, please let me know:
> > > >
> > > > Is there any other test cases which tests "--no-huge" which pass?
> > >
> > > Yes, there are test cases adding "--no-huge" option to validate 4k
> > > page size in async vhost.
> > > Actually, the page size is decided by front-end, so I think this
> > > case can be removed.
> > >
> > > Previously, testpmd can start with "--no-huge" options (not sure if
> > > there are test cases).
> > > Cmd: ./build/app/dpdk-testpmd -l 5-6 -n 4 --no-huge -m 1024 -- -i
> > >
> > > >
> > > > Also, if we change the "-m" option to provide lower memory, does
> > > > the test pass?
> > >
> > > "-m" option is also added and does not work.
> > >
> > > >
> > > > When you mention too many pages exceed the capability of IOMMU,
> > > > you are referring to HW capability to create multiple pages? Here
> > > > it seems in case of 4K page size we need 256K pages which is limiting the
> capacity?
> > >
> > > Yes, this is the result of my initial debugging.
> > > The direct impact is that this kind of testpmd cases cannot start now.
> > > If this is expected, I think we can close this defect and ignore the "--no-
> huge"
> > > option when start.
> >
> > Any insights? Should we just ignore the "--no-huge" option and close this
> defect?
> > Now we did this as a workaround. Seems no one uses the "--no-huge"
> > option in testpmd now.
> 
> VFIO supports dma_entry_limit as a module parameter, which has a default
> value of U16_MAX i.e. 64K, most likely which is limiting creation of 256K
> entries for 4K pages here. This can be modified while inserting vfio module:
>         modprobe vfio_iommu_type1 dma_entry_limit=1000000

Thanks for your suggestion. I tried it on ubuntu 22.04 but it does not work.
The reason I think is vfio-pci is build-in in kernel driver (since 20.04) and it does not support dynamic insmod/rmmod.

Does this command need to rmmod vfio first and then modprobe again?

Thanks,
Xuan

> 
> You can update the test case with updating this limit, and test case shall pass.
> 
> Thanks,
> Nipun
> 
> >
> > Thanks,
> > Xuan
> >
> > >
> > > Regards,
> > > Xuan
> > >
> > > >
> > > > Regards,
> > > > Nipun
> > > >
> > > > On 6/29/2023 1:51 PM, Ding, Xuan wrote:
> > > > > Hi Nipun,
> > > > >
> > > > > I'd like to appreciate your time reading this email.
> > > > >
> > > > > Our QA team found that since this commit "a399d7b5a994: do not
> > > > > coalesce DMA mappings" is introduced, the dpdk testpmd start
> > > > > with
> > > > > "--no-
> > > > huge" parameters will failed, and shows "EAL: Cannot set up DMA
> > > > remapping, error 28 (No space left on device)".
> > > > > So they reported it on dpdk Bugzilla:
> > > > https://bugs.dpdk.org/show_bug.cgi?id=1235.
> > > > >
> > > > > I understand this feature is to keep consistent with the kernel
> > > > > and not allow
> > > > memory segments be merged.
> > > > > The side effect is the testpmd with "--no-huge" parameters will
> > > > > not be able
> > > > to start because the too many pages will exceed the capability of IOMMU.
> > > > > Is it expected? Should we remove the --no-huge" in our testcase?
> > > > >
> > > > > Regards,
> > > > > Xuan
> > > > >
> > > > >> -----Original Message-----
> > > > >> From: Nipun Gupta <nipun.gupta@amd.com>
> > > > >> Sent: Friday, December 30, 2022 5:59 PM
> > > > >> To: dev@dpdk.org; thomas@monjalon.net; Burakov, Anatoly
> > > > >> <anatoly.burakov@intel.com>; ferruh.yigit@amd.com
> > > > >> Cc: nikhil.agarwal@amd.com; Nipun Gupta <nipun.gupta@amd.com>
> > > > >> Subject: [PATCH] vfio: do not coalesce DMA mappings
> > > > >>
> > > > >> At the cleanup time when dma unmap is done, linux kernel does
> > > > >> not allow unmap of individual segments which were coalesced
> > > > >> together while creating the DMA map for type1 IOMMU mappings.
> > > > >> So, this change updates the mapping of the memory
> > > > >> segments(hugepages) on a per-page basis.
> > > > >>
> > > > >> Signed-off-by: Nipun Gupta <nipun.gupta@amd.com>
> > > > >> ---
> > > > >>
> > > > >> When hotplug of devices is used, multiple pages gets colaeced
> > > > >> and a single mapping gets created for these pages (using APIs
> > > > >> rte_memseg_contig_walk() and type1_map_contig(). On the cleanup
> > > > >> time when the memory is released, the VFIO does not cleans up
> > > > >> that memory and following error is observed in the eal for 2MB
> > > > >> hugepages:
> > > > >> EAL: Unexpected size 0 of DMA remapping cleared instead of
> > > > >> 2097152
> > > > >>
> > > > >> This is because VFIO does not clear the DMA (refer API
> > > > >> vfio_dma_do_unmap() -
> > > > >>
> > > > https://elixir.bootlin.com/linux/latest/source/drivers/vfio/vfio_i
> > > > ommu
> > > > _type
> > > > 1.
> > > > >> c#L1330),
> > > > >> where it checks the dma mapping where it checks for IOVA to free:
> > > > >>
> > > > https://elixir.bootlin.com/linux/latest/source/drivers/vfio/vfio_i
> > > > ommu
> > > > _type
> > > > 1.
> > > > >> c#L1418.
> > > > >>
> > > > >> Thus this change updates the mapping to be created individually
> > > > >> instead of colaecing them.
> > > > >>
> > > > >>   lib/eal/linux/eal_vfio.c | 29 -----------------------------
> > > > >>   1 file changed, 29 deletions(-)
> > > > >>
> > > > >> diff --git a/lib/eal/linux/eal_vfio.c
> > > > >> b/lib/eal/linux/eal_vfio.c index 549b86ae1d..56edccb0db 100644
> > > > >> --- a/lib/eal/linux/eal_vfio.c
> > > > >> +++ b/lib/eal/linux/eal_vfio.c
> > > > >> @@ -1369,19 +1369,6 @@ rte_vfio_get_group_num(const char
> > > > *sysfs_base,
> > > > >>   	return 1;
> > > > >>   }
> > > > >>
> > > > >> -static int
> > > > >> -type1_map_contig(const struct rte_memseg_list *msl, const
> > > > >> struct rte_memseg *ms,
> > > > >> -		size_t len, void *arg)
> > > > >> -{
> > > > >> -	int *vfio_container_fd = arg;
> > > > >> -
> > > > >> -	if (msl->external)
> > > > >> -		return 0;
> > > > >> -
> > > > >> -	return vfio_type1_dma_mem_map(*vfio_container_fd, ms-
> >addr_64,
> > > > >> ms->iova,
> > > > >> -			len, 1);
> > > > >> -}
> > > > >> -
> > > > >>   static int
> > > > >>   type1_map(const struct rte_memseg_list *msl, const struct
> > > > >> rte_memseg
> > > > *ms,
> > > > >>   		void *arg)
> > > > >> @@ -1396,10 +1383,6 @@ type1_map(const struct rte_memseg_list
> > > > >> *msl, const struct rte_memseg *ms,
> > > > >>   	if (ms->iova == RTE_BAD_IOVA)
> > > > >>   		return 0;
> > > > >>
> > > > >> -	/* if IOVA mode is VA, we've already mapped the internal
> segments
> > > > */
> > > > >> -	if (!msl->external && rte_eal_iova_mode() == RTE_IOVA_VA)
> > > > >> -		return 0;
> > > > >> -
> > > > >>   	return vfio_type1_dma_mem_map(*vfio_container_fd,
> > > > >> ms->addr_64,
> > > > >> ms->iova,
> > > > >>   			ms->len, 1);
> > > > >>   }
> > > > >> @@ -1464,18 +1447,6 @@ vfio_type1_dma_mem_map(int
> > > > vfio_container_fd,
> > > > >> uint64_t vaddr, uint64_t iova,  static int
> > > > >> vfio_type1_dma_map(int
> > > > >> vfio_container_fd)  {
> > > > >> -	if (rte_eal_iova_mode() == RTE_IOVA_VA) {
> > > > >> -		/* with IOVA as VA mode, we can get away with
> mapping
> > > > >> contiguous
> > > > >> -		 * chunks rather than going page-by-page.
> > > > >> -		 */
> > > > >> -		int ret = rte_memseg_contig_walk(type1_map_contig,
> > > > >> -				&vfio_container_fd);
> > > > >> -		if (ret)
> > > > >> -			return ret;
> > > > >> -		/* we have to continue the walk because we've
> skipped the
> > > > >> -		 * external segments during the config walk.
> > > > >> -		 */
> > > > >> -	}
> > > > >>   	return rte_memseg_walk(type1_map, &vfio_container_fd);  }
> > > > >>
> > > > >> --
> > > > >> 2.25.1
> > > > >

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] vfio: do not coalesce DMA mappings
  2023-07-04  8:06           ` Ding, Xuan
@ 2023-07-04  9:23             ` Gupta, Nipun
  2023-07-04 14:09               ` Thomas Monjalon
  2023-07-04 15:11               ` Ding, Xuan
  0 siblings, 2 replies; 30+ messages in thread
From: Gupta, Nipun @ 2023-07-04  9:23 UTC (permalink / raw)
  To: Ding, Xuan, dev, thomas, Burakov, Anatoly, Yigit, Ferruh, David Marchand
  Cc: Agarwal, Nikhil, He, Xingguang, Ling, WeiX

Hi Xuan,

On 7/4/2023 1:36 PM, Ding, Xuan wrote:
> Hi Nipun,
> 
>> -----Original Message-----
>> From: Gupta, Nipun <Nipun.Gupta@amd.com>
>> Sent: Tuesday, July 4, 2023 2:54 PM
>> To: Ding, Xuan <xuan.ding@intel.com>; dev@dpdk.org; thomas@monjalon.net;
>> Burakov, Anatoly <anatoly.burakov@intel.com>; Yigit, Ferruh
>> <Ferruh.Yigit@amd.com>; David Marchand <david.marchand@redhat.com>
>> Cc: Agarwal, Nikhil <nikhil.agarwal@amd.com>; He, Xingguang
>> <xingguang.he@intel.com>; Ling, WeiX <weix.ling@intel.com>
>> Subject: RE: [PATCH] vfio: do not coalesce DMA mappings
>>
>> Hi Xuan,
>>
>> Please see inline.
>>
>>> -----Original Message-----
>>> From: Ding, Xuan <xuan.ding@intel.com>
>>> Sent: Tuesday, July 4, 2023 10:43 AM
>>> To: Gupta, Nipun <Nipun.Gupta@amd.com>; dev@dpdk.org;
>>> thomas@monjalon.net; Burakov, Anatoly <anatoly.burakov@intel.com>;
>>> Yigit, Ferruh <Ferruh.Yigit@amd.com>
>>> Cc: Agarwal, Nikhil <nikhil.agarwal@amd.com>; He, Xingguang
>>> <xingguang.he@intel.com>; Ling, WeiX <weix.ling@intel.com>
>>> Subject: RE: [PATCH] vfio: do not coalesce DMA mappings
>>>
>>> Hi Nipun,
>>>
>>>> -----Original Message-----
>>>> From: Ding, Xuan
>>>> Sent: Friday, June 30, 2023 1:58 PM
>>>> To: Nipun Gupta <nipun.gupta@amd.com>; dev@dpdk.org;
>>>> thomas@monjalon.net; Burakov, Anatoly <anatoly.burakov@intel.com>;
>>>> ferruh.yigit@amd.com
>>>> Cc: nikhil.agarwal@amd.com; He, Xingguang <xingguang.he@intel.com>;
>>>> Ling, WeiX <weix.ling@intel.com>
>>>> Subject: RE: [PATCH] vfio: do not coalesce DMA mappings
>>>>
>>>> Hi Nipun,
>>>>
>>>> Replies are inline.
>>>>
>>>>> -----Original Message-----
>>>>> From: Nipun Gupta <nipun.gupta@amd.com>
>>>>> Sent: Friday, June 30, 2023 9:46 AM
>>>>> To: Ding, Xuan <xuan.ding@intel.com>; dev@dpdk.org;
>>>>> thomas@monjalon.net; Burakov, Anatoly <anatoly.burakov@intel.com>;
>>>>> ferruh.yigit@amd.com
>>>>> Cc: nikhil.agarwal@amd.com; He, Xingguang
>>>>> <xingguang.he@intel.com>; Ling, WeiX <weix.ling@intel.com>
>>>>> Subject: Re: [PATCH] vfio: do not coalesce DMA mappings
>>>>>
>>>>> Hi Xuan,
>>>>>
>>>>> Thanks for pointing out the issue and figuring out the patch which
>>>>> introduced this. If you have answers to below queries, please let me know:
>>>>>
>>>>> Is there any other test cases which tests "--no-huge" which pass?
>>>>
>>>> Yes, there are test cases adding "--no-huge" option to validate 4k
>>>> page size in async vhost.
>>>> Actually, the page size is decided by front-end, so I think this
>>>> case can be removed.
>>>>
>>>> Previously, testpmd can start with "--no-huge" options (not sure if
>>>> there are test cases).
>>>> Cmd: ./build/app/dpdk-testpmd -l 5-6 -n 4 --no-huge -m 1024 -- -i
>>>>
>>>>>
>>>>> Also, if we change the "-m" option to provide lower memory, does
>>>>> the test pass?
>>>>
>>>> "-m" option is also added and does not work.
>>>>
>>>>>
>>>>> When you mention too many pages exceed the capability of IOMMU,
>>>>> you are referring to HW capability to create multiple pages? Here
>>>>> it seems in case of 4K page size we need 256K pages which is limiting the
>> capacity?
>>>>
>>>> Yes, this is the result of my initial debugging.
>>>> The direct impact is that this kind of testpmd cases cannot start now.
>>>> If this is expected, I think we can close this defect and ignore the "--no-
>> huge"
>>>> option when start.
>>>
>>> Any insights? Should we just ignore the "--no-huge" option and close this
>> defect?
>>> Now we did this as a workaround. Seems no one uses the "--no-huge"
>>> option in testpmd now.
>>
>> VFIO supports dma_entry_limit as a module parameter, which has a default
>> value of U16_MAX i.e. 64K, most likely which is limiting creation of 256K
>> entries for 4K pages here. This can be modified while inserting vfio module:
>>          modprobe vfio_iommu_type1 dma_entry_limit=1000000
> 
> Thanks for your suggestion. I tried it on ubuntu 22.04 but it does not work.
> The reason I think is vfio-pci is build-in in kernel driver (since 20.04) and it does not support dynamic insmod/rmmod.
> 
> Does this command need to rmmod vfio first and then modprobe again?
> 

If it is inserted as a module then you can remove using rmmod and then 
modprobe again with the dma_entry_limit parameter. Also note, 
vfio_iommu_type1 is the module which is limiting the entries to 64K, so 
this module needs to be inserted again providing the dma_entry_limit 
module param.

In case the module is built-in you can provide via kernel command line 
parameter (ref: 
https://www.kernel.org/doc/html/v4.12/admin-guide/kernel-parameters.html). 
As per this ref document, "vfio_iommu_type1.dma_entry_limit=1000000" 
should be used in the bootargs to set the module parameters.

FYI.. DPDK documentation also mentions the limitation at:
https://doc.dpdk.org/guides/linux_gsg/linux_drivers.html

Thanks,
Nipun

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] vfio: do not coalesce DMA mappings
  2023-07-04  9:23             ` Gupta, Nipun
@ 2023-07-04 14:09               ` Thomas Monjalon
  2023-07-04 16:39                 ` Gupta, Nipun
  2023-07-04 15:11               ` Ding, Xuan
  1 sibling, 1 reply; 30+ messages in thread
From: Thomas Monjalon @ 2023-07-04 14:09 UTC (permalink / raw)
  To: Ding, Xuan, Burakov, Anatoly, Gupta, Nipun
  Cc: dev, Yigit, Ferruh, David Marchand, Agarwal, Nikhil, He,
	Xingguang, Ling, WeiX

04/07/2023 11:23, Gupta, Nipun:
> On 7/4/2023 1:36 PM, Ding, Xuan wrote:
>> From: Gupta, Nipun <Nipun.Gupta@amd.com>
>>> From: Ding, Xuan <xuan.ding@intel.com>
>>>> From: Ding, Xuan
>>>>> From: Nipun Gupta <nipun.gupta@amd.com>
> >>>>> Hi Xuan,
> >>>>>
> >>>>> Thanks for pointing out the issue and figuring out the patch which
> >>>>> introduced this. If you have answers to below queries, please let me know:
> >>>>>
> >>>>> Is there any other test cases which tests "--no-huge" which pass?
> >>>>
> >>>> Yes, there are test cases adding "--no-huge" option to validate 4k
> >>>> page size in async vhost.
> >>>> Actually, the page size is decided by front-end, so I think this
> >>>> case can be removed.
> >>>>
> >>>> Previously, testpmd can start with "--no-huge" options (not sure if
> >>>> there are test cases).
> >>>> Cmd: ./build/app/dpdk-testpmd -l 5-6 -n 4 --no-huge -m 1024 -- -i
> >>>>
> >>>>>
> >>>>> Also, if we change the "-m" option to provide lower memory, does
> >>>>> the test pass?
> >>>>
> >>>> "-m" option is also added and does not work.
> >>>>
> >>>>>
> >>>>> When you mention too many pages exceed the capability of IOMMU,
> >>>>> you are referring to HW capability to create multiple pages? Here
> >>>>> it seems in case of 4K page size we need 256K pages which is limiting the
> >> capacity?
> >>>>
> >>>> Yes, this is the result of my initial debugging.
> >>>> The direct impact is that this kind of testpmd cases cannot start now.
> >>>> If this is expected, I think we can close this defect and ignore the "--no-
> >> huge"
> >>>> option when start.
> >>>
> >>> Any insights? Should we just ignore the "--no-huge" option and close this
> >> defect?
> >>> Now we did this as a workaround. Seems no one uses the "--no-huge"
> >>> option in testpmd now.
> >>
> >> VFIO supports dma_entry_limit as a module parameter, which has a default
> >> value of U16_MAX i.e. 64K, most likely which is limiting creation of 256K
> >> entries for 4K pages here. This can be modified while inserting vfio module:
> >>          modprobe vfio_iommu_type1 dma_entry_limit=1000000
> > 
> > Thanks for your suggestion. I tried it on ubuntu 22.04 but it does not work.
> > The reason I think is vfio-pci is build-in in kernel driver (since 20.04) and it does not support dynamic insmod/rmmod.
> > 
> > Does this command need to rmmod vfio first and then modprobe again?
> > 
> 
> If it is inserted as a module then you can remove using rmmod and then 
> modprobe again with the dma_entry_limit parameter. Also note, 
> vfio_iommu_type1 is the module which is limiting the entries to 64K, so 
> this module needs to be inserted again providing the dma_entry_limit 
> module param.
> 
> In case the module is built-in you can provide via kernel command line 
> parameter (ref: 
> https://www.kernel.org/doc/html/v4.12/admin-guide/kernel-parameters.html). 
> As per this ref document, "vfio_iommu_type1.dma_entry_limit=1000000" 
> should be used in the bootargs to set the module parameters.
> 
> FYI.. DPDK documentation also mentions the limitation at:
> https://doc.dpdk.org/guides/linux_gsg/linux_drivers.html

Yes the parameter is discussed in
https://doc.dpdk.org/guides/linux_gsg/linux_drivers.html#vfio-memory-mapping-limits
but it does not mention we may need to decrease it with --no-huge.
Please could you add this to the documentation?



^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH] vfio: do not coalesce DMA mappings
  2023-07-04  9:23             ` Gupta, Nipun
  2023-07-04 14:09               ` Thomas Monjalon
@ 2023-07-04 15:11               ` Ding, Xuan
  2023-07-04 16:42                 ` Gupta, Nipun
  1 sibling, 1 reply; 30+ messages in thread
From: Ding, Xuan @ 2023-07-04 15:11 UTC (permalink / raw)
  To: Gupta, Nipun, dev, thomas, Burakov, Anatoly, Yigit, Ferruh,
	David Marchand
  Cc: Agarwal, Nikhil, He, Xingguang, Ling, WeiX

Hi Nipun,

> -----Original Message-----
> From: Gupta, Nipun <nipun.gupta@amd.com>
> Sent: Tuesday, July 4, 2023 5:23 PM
> To: Ding, Xuan <xuan.ding@intel.com>; dev@dpdk.org; thomas@monjalon.net;
> Burakov, Anatoly <anatoly.burakov@intel.com>; Yigit, Ferruh
> <Ferruh.Yigit@amd.com>; David Marchand <david.marchand@redhat.com>
> Cc: Agarwal, Nikhil <nikhil.agarwal@amd.com>; He, Xingguang
> <xingguang.he@intel.com>; Ling, WeiX <weix.ling@intel.com>
> Subject: Re: [PATCH] vfio: do not coalesce DMA mappings
> 
> Hi Xuan,
> 
> On 7/4/2023 1:36 PM, Ding, Xuan wrote:
> > Hi Nipun,
> >
> >> -----Original Message-----
> >> From: Gupta, Nipun <Nipun.Gupta@amd.com>
> >> Sent: Tuesday, July 4, 2023 2:54 PM
> >> To: Ding, Xuan <xuan.ding@intel.com>; dev@dpdk.org;
> >> thomas@monjalon.net; Burakov, Anatoly <anatoly.burakov@intel.com>;
> >> Yigit, Ferruh <Ferruh.Yigit@amd.com>; David Marchand
> >> <david.marchand@redhat.com>
> >> Cc: Agarwal, Nikhil <nikhil.agarwal@amd.com>; He, Xingguang
> >> <xingguang.he@intel.com>; Ling, WeiX <weix.ling@intel.com>
> >> Subject: RE: [PATCH] vfio: do not coalesce DMA mappings
> >>
> >> Hi Xuan,
> >>
> >> Please see inline.
> >>
> >>> -----Original Message-----
> >>> From: Ding, Xuan <xuan.ding@intel.com>
> >>> Sent: Tuesday, July 4, 2023 10:43 AM
> >>> To: Gupta, Nipun <Nipun.Gupta@amd.com>; dev@dpdk.org;
> >>> thomas@monjalon.net; Burakov, Anatoly <anatoly.burakov@intel.com>;
> >>> Yigit, Ferruh <Ferruh.Yigit@amd.com>
> >>> Cc: Agarwal, Nikhil <nikhil.agarwal@amd.com>; He, Xingguang
> >>> <xingguang.he@intel.com>; Ling, WeiX <weix.ling@intel.com>
> >>> Subject: RE: [PATCH] vfio: do not coalesce DMA mappings
> >>>
> >>> Hi Nipun,
> >>>
> >>>> -----Original Message-----
> >>>> From: Ding, Xuan
> >>>> Sent: Friday, June 30, 2023 1:58 PM
> >>>> To: Nipun Gupta <nipun.gupta@amd.com>; dev@dpdk.org;
> >>>> thomas@monjalon.net; Burakov, Anatoly <anatoly.burakov@intel.com>;
> >>>> ferruh.yigit@amd.com
> >>>> Cc: nikhil.agarwal@amd.com; He, Xingguang <xingguang.he@intel.com>;
> >>>> Ling, WeiX <weix.ling@intel.com>
> >>>> Subject: RE: [PATCH] vfio: do not coalesce DMA mappings
> >>>>
> >>>> Hi Nipun,
> >>>>
> >>>> Replies are inline.
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: Nipun Gupta <nipun.gupta@amd.com>
> >>>>> Sent: Friday, June 30, 2023 9:46 AM
> >>>>> To: Ding, Xuan <xuan.ding@intel.com>; dev@dpdk.org;
> >>>>> thomas@monjalon.net; Burakov, Anatoly <anatoly.burakov@intel.com>;
> >>>>> ferruh.yigit@amd.com
> >>>>> Cc: nikhil.agarwal@amd.com; He, Xingguang
> >>>>> <xingguang.he@intel.com>; Ling, WeiX <weix.ling@intel.com>
> >>>>> Subject: Re: [PATCH] vfio: do not coalesce DMA mappings
> >>>>>
> >>>>> Hi Xuan,
> >>>>>
> >>>>> Thanks for pointing out the issue and figuring out the patch which
> >>>>> introduced this. If you have answers to below queries, please let me
> know:
> >>>>>
> >>>>> Is there any other test cases which tests "--no-huge" which pass?
> >>>>
> >>>> Yes, there are test cases adding "--no-huge" option to validate 4k
> >>>> page size in async vhost.
> >>>> Actually, the page size is decided by front-end, so I think this
> >>>> case can be removed.
> >>>>
> >>>> Previously, testpmd can start with "--no-huge" options (not sure if
> >>>> there are test cases).
> >>>> Cmd: ./build/app/dpdk-testpmd -l 5-6 -n 4 --no-huge -m 1024 -- -i
> >>>>
> >>>>>
> >>>>> Also, if we change the "-m" option to provide lower memory, does
> >>>>> the test pass?
> >>>>
> >>>> "-m" option is also added and does not work.
> >>>>
> >>>>>
> >>>>> When you mention too many pages exceed the capability of IOMMU,
> >>>>> you are referring to HW capability to create multiple pages? Here
> >>>>> it seems in case of 4K page size we need 256K pages which is
> >>>>> limiting the
> >> capacity?
> >>>>
> >>>> Yes, this is the result of my initial debugging.
> >>>> The direct impact is that this kind of testpmd cases cannot start now.
> >>>> If this is expected, I think we can close this defect and ignore
> >>>> the "--no-
> >> huge"
> >>>> option when start.
> >>>
> >>> Any insights? Should we just ignore the "--no-huge" option and close
> >>> this
> >> defect?
> >>> Now we did this as a workaround. Seems no one uses the "--no-huge"
> >>> option in testpmd now.
> >>
> >> VFIO supports dma_entry_limit as a module parameter, which has a
> >> default value of U16_MAX i.e. 64K, most likely which is limiting
> >> creation of 256K entries for 4K pages here. This can be modified while
> inserting vfio module:
> >>          modprobe vfio_iommu_type1 dma_entry_limit=1000000
> >
> > Thanks for your suggestion. I tried it on ubuntu 22.04 but it does not work.
> > The reason I think is vfio-pci is build-in in kernel driver (since 20.04) and it
> does not support dynamic insmod/rmmod.
> >
> > Does this command need to rmmod vfio first and then modprobe again?
> >
> 
> If it is inserted as a module then you can remove using rmmod and then
> modprobe again with the dma_entry_limit parameter. Also note,
> vfio_iommu_type1 is the module which is limiting the entries to 64K, so this
> module needs to be inserted again providing the dma_entry_limit module
> param.
> 
> In case the module is built-in you can provide via kernel command line
> parameter (ref:
> https://www.kernel.org/doc/html/v4.12/admin-guide/kernel-
> parameters.html).
> As per this ref document, "vfio_iommu_type1.dma_entry_limit=1000000"
> should be used in the bootargs to set the module parameters.
> 
> FYI.. DPDK documentation also mentions the limitation at:
> https://doc.dpdk.org/guides/linux_gsg/linux_drivers.html

Thanks for pointing out these references.

Add supplement for configuring build-in vfio module:
Except adding " vfio_iommu_type1.dma_entry_limit=1000000" in bootargs,
we can use kernel command line: "echo 1000000 > /sys/module/vfio_iommu_type1/parameters/dma_entry_limit".

Both methods works, thanks again.

Regards,
Xuan

> 
> Thanks,
> Nipun

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] vfio: do not coalesce DMA mappings
  2023-07-04 14:09               ` Thomas Monjalon
@ 2023-07-04 16:39                 ` Gupta, Nipun
  0 siblings, 0 replies; 30+ messages in thread
From: Gupta, Nipun @ 2023-07-04 16:39 UTC (permalink / raw)
  To: Thomas Monjalon, Ding, Xuan, Burakov, Anatoly
  Cc: dev, Yigit, Ferruh, David Marchand, Agarwal, Nikhil, He,
	Xingguang, Ling, WeiX



On 7/4/2023 7:39 PM, Thomas Monjalon wrote:
> 04/07/2023 11:23, Gupta, Nipun:
>> On 7/4/2023 1:36 PM, Ding, Xuan wrote:
>>> From: Gupta, Nipun <Nipun.Gupta@amd.com>
>>>> From: Ding, Xuan <xuan.ding@intel.com>
>>>>> From: Ding, Xuan
>>>>>> From: Nipun Gupta <nipun.gupta@amd.com>
>>>>>>> Hi Xuan,
>>>>>>>
>>>>>>> Thanks for pointing out the issue and figuring out the patch which
>>>>>>> introduced this. If you have answers to below queries, please let me know:
>>>>>>>
>>>>>>> Is there any other test cases which tests "--no-huge" which pass?
>>>>>>
>>>>>> Yes, there are test cases adding "--no-huge" option to validate 4k
>>>>>> page size in async vhost.
>>>>>> Actually, the page size is decided by front-end, so I think this
>>>>>> case can be removed.
>>>>>>
>>>>>> Previously, testpmd can start with "--no-huge" options (not sure if
>>>>>> there are test cases).
>>>>>> Cmd: ./build/app/dpdk-testpmd -l 5-6 -n 4 --no-huge -m 1024 -- -i
>>>>>>
>>>>>>>
>>>>>>> Also, if we change the "-m" option to provide lower memory, does
>>>>>>> the test pass?
>>>>>>
>>>>>> "-m" option is also added and does not work.
>>>>>>
>>>>>>>
>>>>>>> When you mention too many pages exceed the capability of IOMMU,
>>>>>>> you are referring to HW capability to create multiple pages? Here
>>>>>>> it seems in case of 4K page size we need 256K pages which is limiting the
>>>> capacity?
>>>>>>
>>>>>> Yes, this is the result of my initial debugging.
>>>>>> The direct impact is that this kind of testpmd cases cannot start now.
>>>>>> If this is expected, I think we can close this defect and ignore the "--no-
>>>> huge"
>>>>>> option when start.
>>>>>
>>>>> Any insights? Should we just ignore the "--no-huge" option and close this
>>>> defect?
>>>>> Now we did this as a workaround. Seems no one uses the "--no-huge"
>>>>> option in testpmd now.
>>>>
>>>> VFIO supports dma_entry_limit as a module parameter, which has a default
>>>> value of U16_MAX i.e. 64K, most likely which is limiting creation of 256K
>>>> entries for 4K pages here. This can be modified while inserting vfio module:
>>>>           modprobe vfio_iommu_type1 dma_entry_limit=1000000
>>>
>>> Thanks for your suggestion. I tried it on ubuntu 22.04 but it does not work.
>>> The reason I think is vfio-pci is build-in in kernel driver (since 20.04) and it does not support dynamic insmod/rmmod.
>>>
>>> Does this command need to rmmod vfio first and then modprobe again?
>>>
>>
>> If it is inserted as a module then you can remove using rmmod and then
>> modprobe again with the dma_entry_limit parameter. Also note,
>> vfio_iommu_type1 is the module which is limiting the entries to 64K, so
>> this module needs to be inserted again providing the dma_entry_limit
>> module param.
>>
>> In case the module is built-in you can provide via kernel command line
>> parameter (ref:
>> https://www.kernel.org/doc/html/v4.12/admin-guide/kernel-parameters.html).
>> As per this ref document, "vfio_iommu_type1.dma_entry_limit=1000000"
>> should be used in the bootargs to set the module parameters.
>>
>> FYI.. DPDK documentation also mentions the limitation at:
>> https://doc.dpdk.org/guides/linux_gsg/linux_drivers.html
> 
> Yes the parameter is discussed in
> https://doc.dpdk.org/guides/linux_gsg/linux_drivers.html#vfio-memory-mapping-limits
> but it does not mention we may need to decrease it with --no-huge.
> Please could you add this to the documentation?

sure! Ill send out a patch for this.

> 
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] vfio: do not coalesce DMA mappings
  2023-07-04 15:11               ` Ding, Xuan
@ 2023-07-04 16:42                 ` Gupta, Nipun
  0 siblings, 0 replies; 30+ messages in thread
From: Gupta, Nipun @ 2023-07-04 16:42 UTC (permalink / raw)
  To: Ding, Xuan, dev, thomas, Burakov, Anatoly, Yigit, Ferruh, David Marchand
  Cc: Agarwal, Nikhil, He, Xingguang, Ling, WeiX



On 7/4/2023 8:41 PM, Ding, Xuan wrote:
> Hi Nipun,
> 
>> -----Original Message-----
>> From: Gupta, Nipun <nipun.gupta@amd.com>
>> Sent: Tuesday, July 4, 2023 5:23 PM
>> To: Ding, Xuan <xuan.ding@intel.com>; dev@dpdk.org; thomas@monjalon.net;
>> Burakov, Anatoly <anatoly.burakov@intel.com>; Yigit, Ferruh
>> <Ferruh.Yigit@amd.com>; David Marchand <david.marchand@redhat.com>
>> Cc: Agarwal, Nikhil <nikhil.agarwal@amd.com>; He, Xingguang
>> <xingguang.he@intel.com>; Ling, WeiX <weix.ling@intel.com>
>> Subject: Re: [PATCH] vfio: do not coalesce DMA mappings
>>
>> Hi Xuan,
>>
>> On 7/4/2023 1:36 PM, Ding, Xuan wrote:
>>> Hi Nipun,
>>>
>>>> -----Original Message-----
>>>> From: Gupta, Nipun <Nipun.Gupta@amd.com>
>>>> Sent: Tuesday, July 4, 2023 2:54 PM
>>>> To: Ding, Xuan <xuan.ding@intel.com>; dev@dpdk.org;
>>>> thomas@monjalon.net; Burakov, Anatoly <anatoly.burakov@intel.com>;
>>>> Yigit, Ferruh <Ferruh.Yigit@amd.com>; David Marchand
>>>> <david.marchand@redhat.com>
>>>> Cc: Agarwal, Nikhil <nikhil.agarwal@amd.com>; He, Xingguang
>>>> <xingguang.he@intel.com>; Ling, WeiX <weix.ling@intel.com>
>>>> Subject: RE: [PATCH] vfio: do not coalesce DMA mappings
>>>>
>>>> Hi Xuan,
>>>>
>>>> Please see inline.
>>>>
>>>>> -----Original Message-----
>>>>> From: Ding, Xuan <xuan.ding@intel.com>
>>>>> Sent: Tuesday, July 4, 2023 10:43 AM
>>>>> To: Gupta, Nipun <Nipun.Gupta@amd.com>; dev@dpdk.org;
>>>>> thomas@monjalon.net; Burakov, Anatoly <anatoly.burakov@intel.com>;
>>>>> Yigit, Ferruh <Ferruh.Yigit@amd.com>
>>>>> Cc: Agarwal, Nikhil <nikhil.agarwal@amd.com>; He, Xingguang
>>>>> <xingguang.he@intel.com>; Ling, WeiX <weix.ling@intel.com>
>>>>> Subject: RE: [PATCH] vfio: do not coalesce DMA mappings
>>>>>
>>>>> Hi Nipun,
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Ding, Xuan
>>>>>> Sent: Friday, June 30, 2023 1:58 PM
>>>>>> To: Nipun Gupta <nipun.gupta@amd.com>; dev@dpdk.org;
>>>>>> thomas@monjalon.net; Burakov, Anatoly <anatoly.burakov@intel.com>;
>>>>>> ferruh.yigit@amd.com
>>>>>> Cc: nikhil.agarwal@amd.com; He, Xingguang <xingguang.he@intel.com>;
>>>>>> Ling, WeiX <weix.ling@intel.com>
>>>>>> Subject: RE: [PATCH] vfio: do not coalesce DMA mappings
>>>>>>
>>>>>> Hi Nipun,
>>>>>>
>>>>>> Replies are inline.
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Nipun Gupta <nipun.gupta@amd.com>
>>>>>>> Sent: Friday, June 30, 2023 9:46 AM
>>>>>>> To: Ding, Xuan <xuan.ding@intel.com>; dev@dpdk.org;
>>>>>>> thomas@monjalon.net; Burakov, Anatoly <anatoly.burakov@intel.com>;
>>>>>>> ferruh.yigit@amd.com
>>>>>>> Cc: nikhil.agarwal@amd.com; He, Xingguang
>>>>>>> <xingguang.he@intel.com>; Ling, WeiX <weix.ling@intel.com>
>>>>>>> Subject: Re: [PATCH] vfio: do not coalesce DMA mappings
>>>>>>>
>>>>>>> Hi Xuan,
>>>>>>>
>>>>>>> Thanks for pointing out the issue and figuring out the patch which
>>>>>>> introduced this. If you have answers to below queries, please let me
>> know:
>>>>>>>
>>>>>>> Is there any other test cases which tests "--no-huge" which pass?
>>>>>>
>>>>>> Yes, there are test cases adding "--no-huge" option to validate 4k
>>>>>> page size in async vhost.
>>>>>> Actually, the page size is decided by front-end, so I think this
>>>>>> case can be removed.
>>>>>>
>>>>>> Previously, testpmd can start with "--no-huge" options (not sure if
>>>>>> there are test cases).
>>>>>> Cmd: ./build/app/dpdk-testpmd -l 5-6 -n 4 --no-huge -m 1024 -- -i
>>>>>>
>>>>>>>
>>>>>>> Also, if we change the "-m" option to provide lower memory, does
>>>>>>> the test pass?
>>>>>>
>>>>>> "-m" option is also added and does not work.
>>>>>>
>>>>>>>
>>>>>>> When you mention too many pages exceed the capability of IOMMU,
>>>>>>> you are referring to HW capability to create multiple pages? Here
>>>>>>> it seems in case of 4K page size we need 256K pages which is
>>>>>>> limiting the
>>>> capacity?
>>>>>>
>>>>>> Yes, this is the result of my initial debugging.
>>>>>> The direct impact is that this kind of testpmd cases cannot start now.
>>>>>> If this is expected, I think we can close this defect and ignore
>>>>>> the "--no-
>>>> huge"
>>>>>> option when start.
>>>>>
>>>>> Any insights? Should we just ignore the "--no-huge" option and close
>>>>> this
>>>> defect?
>>>>> Now we did this as a workaround. Seems no one uses the "--no-huge"
>>>>> option in testpmd now.
>>>>
>>>> VFIO supports dma_entry_limit as a module parameter, which has a
>>>> default value of U16_MAX i.e. 64K, most likely which is limiting
>>>> creation of 256K entries for 4K pages here. This can be modified while
>> inserting vfio module:
>>>>           modprobe vfio_iommu_type1 dma_entry_limit=1000000
>>>
>>> Thanks for your suggestion. I tried it on ubuntu 22.04 but it does not work.
>>> The reason I think is vfio-pci is build-in in kernel driver (since 20.04) and it
>> does not support dynamic insmod/rmmod.
>>>
>>> Does this command need to rmmod vfio first and then modprobe again?
>>>
>>
>> If it is inserted as a module then you can remove using rmmod and then
>> modprobe again with the dma_entry_limit parameter. Also note,
>> vfio_iommu_type1 is the module which is limiting the entries to 64K, so this
>> module needs to be inserted again providing the dma_entry_limit module
>> param.
>>
>> In case the module is built-in you can provide via kernel command line
>> parameter (ref:
>> https://www.kernel.org/doc/html/v4.12/admin-guide/kernel-
>> parameters.html).
>> As per this ref document, "vfio_iommu_type1.dma_entry_limit=1000000"
>> should be used in the bootargs to set the module parameters.
>>
>> FYI.. DPDK documentation also mentions the limitation at:
>> https://doc.dpdk.org/guides/linux_gsg/linux_drivers.html
> 
> Thanks for pointing out these references.
> 
> Add supplement for configuring build-in vfio module:
> Except adding " vfio_iommu_type1.dma_entry_limit=1000000" in bootargs,
> we can use kernel command line: "echo 1000000 > /sys/module/vfio_iommu_type1/parameters/dma_entry_limit".
> 
> Both methods works, thanks again.

Good to know this works!

> 
> Regards,
> Xuan
> 
>>
>> Thanks,
>> Nipun

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2023-07-04 16:42 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-12-30  9:58 [PATCH] vfio: do not coalesce DMA mappings Nipun Gupta
2023-01-04  5:19 ` [PATCH v2] " Nipun Gupta
2023-02-02 10:48   ` David Marchand
2023-02-07  8:56     ` Gupta, Nipun
2023-04-04 14:57       ` Gupta, Nipun
2023-04-04 15:13       ` Burakov, Anatoly
2023-04-04 16:32         ` Nipun Gupta
2023-04-05 14:16           ` Burakov, Anatoly
2023-04-05 15:06             ` Gupta, Nipun
2023-04-24 15:22             ` David Marchand
2023-04-24 16:10               ` Stephen Hemminger
2023-04-24 16:16                 ` Gupta, Nipun
2023-05-10 12:58               ` Nipun Gupta
2023-05-11 14:08                 ` Burakov, Anatoly
2023-04-11 15:13           ` Thomas Monjalon
2023-04-11 16:51             ` Gupta, Nipun
2023-04-07  6:13         ` Nipun Gupta
2023-04-07  7:26           ` David Marchand
2023-05-15 11:16   ` David Marchand
2023-06-29  8:21 ` [PATCH] " Ding, Xuan
2023-06-30  1:45   ` Nipun Gupta
2023-06-30  5:58     ` Ding, Xuan
2023-07-04  5:13       ` Ding, Xuan
2023-07-04  6:53         ` Gupta, Nipun
2023-07-04  8:06           ` Ding, Xuan
2023-07-04  9:23             ` Gupta, Nipun
2023-07-04 14:09               ` Thomas Monjalon
2023-07-04 16:39                 ` Gupta, Nipun
2023-07-04 15:11               ` Ding, Xuan
2023-07-04 16:42                 ` Gupta, Nipun

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).