DPDK patches and discussions
From: Nipun Gupta <nipun.gupta@amd.com>
To: "Ding, Xuan" <xuan.ding@intel.com>, "dev@dpdk.org" <dev@dpdk.org>,
	"thomas@monjalon.net" <thomas@monjalon.net>,
	"Burakov, Anatoly" <anatoly.burakov@intel.com>,
	"ferruh.yigit@amd.com" <ferruh.yigit@amd.com>
Cc: "nikhil.agarwal@amd.com" <nikhil.agarwal@amd.com>,
	"He, Xingguang" <xingguang.he@intel.com>,
	"Ling, WeiX" <weix.ling@intel.com>
Subject: Re: [PATCH] vfio: do not coalesce DMA mappings
Date: Fri, 30 Jun 2023 07:15:38 +0530
Message-ID: <e1ce79ae-a6d1-35dd-b5d5-8237d1936490@amd.com>
In-Reply-To: <BN9PR11MB55137CBE3281E8EAC1CB8BB5E725A@BN9PR11MB5513.namprd11.prod.outlook.com>

Hi Xuan,

Thanks for pointing out the issue and identifying the patch that 
introduced it. If you have answers to the queries below, please let me know:

Are there any other test cases that use "--no-huge" and pass?

Also, if we change the "-m" option to request less memory, does the 
test pass?

When you mention that too many pages exceed the capability of the 
IOMMU, are you referring to the hardware's capacity for the number of 
mappings it can hold? Here it seems that with a 4K page size we need 
256K pages, and it is that count which hits the limit?
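
Some back-of-the-envelope arithmetic for reference (the 65535 figure is 
an assumption based on the default dma_entry_limit parameter of the 
vfio_iommu_type1 kernel module, which may be tuned differently on the 
test system):

    256K pages * 4 KiB/page = 1 GiB of --no-huge memory
    262144 required mappings > 65535 default DMA entries per container

If every 4K page needs its own VFIO DMA entry, the mapping would 
eventually fail with -ENOSPC, which matches the reported error 28.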

Regards,
Nipun

On 6/29/2023 1:51 PM, Ding, Xuan wrote:
> Hi Nipun,
> 
> Thank you for taking the time to read this email.
> 
> Our QA team found that since commit "a399d7b5a994: vfio: do not coalesce DMA mappings" was introduced,
> dpdk testpmd fails to start with the "--no-huge" parameter and shows "EAL: Cannot set up DMA remapping, error 28 (No space left on device)".
> They reported it on the DPDK Bugzilla: https://bugs.dpdk.org/show_bug.cgi?id=1235.
> 
> I understand this change keeps behavior consistent with the kernel by not allowing memory segments to be merged.
> The side effect is that testpmd with the "--no-huge" parameter is no longer able to start, because the large number of pages exceeds the capability of the IOMMU.
> Is this expected? Should we remove "--no-huge" from our test case?
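> 
> (One illustrative option, not a verified fix for this Bugzilla: the 
> vfio_iommu_type1 kernel module exposes a dma_entry_limit parameter, so 
> a test machine could raise the per-container mapping limit, e.g. 
> "modprobe vfio_iommu_type1 dma_entry_limit=262144", as an alternative 
> to dropping "--no-huge" from the test case.)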
> 
> Regards,
> Xuan
> 
>> -----Original Message-----
>> From: Nipun Gupta <nipun.gupta@amd.com>
>> Sent: Friday, December 30, 2022 5:59 PM
>> To: dev@dpdk.org; thomas@monjalon.net; Burakov, Anatoly
>> <anatoly.burakov@intel.com>; ferruh.yigit@amd.com
>> Cc: nikhil.agarwal@amd.com; Nipun Gupta <nipun.gupta@amd.com>
>> Subject: [PATCH] vfio: do not coalesce DMA mappings
>>
>> At cleanup time, when the DMA unmap is done, the Linux kernel does not
>> allow unmapping individual segments that were coalesced together while
>> creating the DMA map for type1 IOMMU mappings. So, this change updates
>> the mapping of the memory segments (hugepages) to be done on a
>> per-page basis.
>>
>> Signed-off-by: Nipun Gupta <nipun.gupta@amd.com>
>> ---
>>
>> When hotplug of devices is used, multiple pages get coalesced and a
>> single mapping is created for them (using the APIs
>> rte_memseg_contig_walk() and type1_map_contig()). At cleanup time,
>> when the memory is released, VFIO does not clean up that mapping, and
>> the following error is observed in the EAL for 2MB hugepages:
>>
>> EAL: Unexpected size 0 of DMA remapping cleared instead of 2097152
>>
>> This is because VFIO does not clear the DMA mapping (refer to
>> vfio_dma_do_unmap() -
>> https://elixir.bootlin.com/linux/latest/source/drivers/vfio/vfio_iommu_type1.c#L1330),
>> where it checks the IOVA of the mapping to free:
>> https://elixir.bootlin.com/linux/latest/source/drivers/vfio/vfio_iommu_type1.c#L1418.
>>
>> Thus, this change updates the mappings to be created individually
>> instead of coalescing them.
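>>
>> For illustration, a minimal sketch of the type1 ioctl pair involved
>> (the helper names are hypothetical and error handling is elided; it
>> shows why map and unmap must use matching boundaries):
>>
>> #include <stdint.h>
>> #include <sys/ioctl.h>
>> #include <linux/vfio.h>
>>
>> /* Map one memory segment; the kernel records one vfio_dma entry
>>  * per call, bounded by the container's dma_entry_limit.
>>  */
>> static int map_one(int container_fd, uint64_t vaddr, uint64_t iova,
>> 		uint64_t len)
>> {
>> 	struct vfio_iommu_type1_dma_map map = {
>> 		.argsz = sizeof(map),
>> 		.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
>> 		.vaddr = vaddr,
>> 		.iova = iova,
>> 		.size = len,
>> 	};
>> 	return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);
>> }
>>
>> /* Unmap must line up with how entries were created: a range that
>>  * bisects an existing entry is rejected, so segments coalesced at
>>  * map time cannot be unmapped one by one later.
>>  */
>> static int unmap_one(int container_fd, uint64_t iova, uint64_t len)
>> {
>> 	struct vfio_iommu_type1_dma_unmap unmap = {
>> 		.argsz = sizeof(unmap),
>> 		.iova = iova,
>> 		.size = len,
>> 	};
>> 	return ioctl(container_fd, VFIO_IOMMU_UNMAP_DMA, &unmap);
>> }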
>>
>>   lib/eal/linux/eal_vfio.c | 29 -----------------------------
>>   1 file changed, 29 deletions(-)
>>
>> diff --git a/lib/eal/linux/eal_vfio.c b/lib/eal/linux/eal_vfio.c
>> index 549b86ae1d..56edccb0db 100644
>> --- a/lib/eal/linux/eal_vfio.c
>> +++ b/lib/eal/linux/eal_vfio.c
>> @@ -1369,19 +1369,6 @@ rte_vfio_get_group_num(const char *sysfs_base,
>>  	return 1;
>>  }
>>
>> -static int
>> -type1_map_contig(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
>> -		size_t len, void *arg)
>> -{
>> -	int *vfio_container_fd = arg;
>> -
>> -	if (msl->external)
>> -		return 0;
>> -
>> -	return vfio_type1_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
>> -			len, 1);
>> -}
>> -
>>  static int
>>  type1_map(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
>>  		void *arg)
>> @@ -1396,10 +1383,6 @@ type1_map(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
>>  	if (ms->iova == RTE_BAD_IOVA)
>>  		return 0;
>>
>> -	/* if IOVA mode is VA, we've already mapped the internal segments */
>> -	if (!msl->external && rte_eal_iova_mode() == RTE_IOVA_VA)
>> -		return 0;
>> -
>>  	return vfio_type1_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
>>  			ms->len, 1);
>>  }
>> @@ -1464,18 +1447,6 @@ vfio_type1_dma_mem_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova,
>>  static int
>>  vfio_type1_dma_map(int vfio_container_fd)
>>  {
>> -	if (rte_eal_iova_mode() == RTE_IOVA_VA) {
>> -		/* with IOVA as VA mode, we can get away with mapping contiguous
>> -		 * chunks rather than going page-by-page.
>> -		 */
>> -		int ret = rte_memseg_contig_walk(type1_map_contig,
>> -				&vfio_container_fd);
>> -		if (ret)
>> -			return ret;
>> -		/* we have to continue the walk because we've skipped the
>> -		 * external segments during the config walk.
>> -		 */
>> -	}
>>  	return rte_memseg_walk(type1_map, &vfio_container_fd);
>>  }
>>
>> --
>> 2.25.1
> 


Thread overview: 30+ messages
2022-12-30  9:58 Nipun Gupta
2023-01-04  5:19 ` [PATCH v2] " Nipun Gupta
2023-02-02 10:48   ` David Marchand
2023-02-07  8:56     ` Gupta, Nipun
2023-04-04 14:57       ` Gupta, Nipun
2023-04-04 15:13       ` Burakov, Anatoly
2023-04-04 16:32         ` Nipun Gupta
2023-04-05 14:16           ` Burakov, Anatoly
2023-04-05 15:06             ` Gupta, Nipun
2023-04-24 15:22             ` David Marchand
2023-04-24 16:10               ` Stephen Hemminger
2023-04-24 16:16                 ` Gupta, Nipun
2023-05-10 12:58               ` Nipun Gupta
2023-05-11 14:08                 ` Burakov, Anatoly
2023-04-11 15:13           ` Thomas Monjalon
2023-04-11 16:51             ` Gupta, Nipun
2023-04-07  6:13         ` Nipun Gupta
2023-04-07  7:26           ` David Marchand
2023-05-15 11:16   ` David Marchand
2023-06-29  8:21 ` [PATCH] " Ding, Xuan
2023-06-30  1:45   ` Nipun Gupta [this message]
2023-06-30  5:58     ` Ding, Xuan
2023-07-04  5:13       ` Ding, Xuan
2023-07-04  6:53         ` Gupta, Nipun
2023-07-04  8:06           ` Ding, Xuan
2023-07-04  9:23             ` Gupta, Nipun
2023-07-04 14:09               ` Thomas Monjalon
2023-07-04 16:39                 ` Gupta, Nipun
2023-07-04 15:11               ` Ding, Xuan
2023-07-04 16:42                 ` Gupta, Nipun
