DPDK patches and discussions
From: "Ding, Xuan" <xuan.ding@intel.com>
To: Nipun Gupta <nipun.gupta@amd.com>, "dev@dpdk.org" <dev@dpdk.org>,
	"thomas@monjalon.net" <thomas@monjalon.net>,
	"Burakov, Anatoly" <anatoly.burakov@intel.com>,
	"ferruh.yigit@amd.com" <ferruh.yigit@amd.com>
Cc: "nikhil.agarwal@amd.com" <nikhil.agarwal@amd.com>,
	"He, Xingguang" <xingguang.he@intel.com>,
	"Ling, WeiX" <weix.ling@intel.com>
Subject: RE: [PATCH] vfio: do not coalesce DMA mappings
Date: Fri, 30 Jun 2023 05:58:25 +0000
Message-ID: <BN9PR11MB551366E95D6746FC8C853833E72AA@BN9PR11MB5513.namprd11.prod.outlook.com>
In-Reply-To: <e1ce79ae-a6d1-35dd-b5d5-8237d1936490@amd.com>

Hi Nipun,

Replies are inline.

> -----Original Message-----
> From: Nipun Gupta <nipun.gupta@amd.com>
> Sent: Friday, June 30, 2023 9:46 AM
> To: Ding, Xuan <xuan.ding@intel.com>; dev@dpdk.org;
> thomas@monjalon.net; Burakov, Anatoly <anatoly.burakov@intel.com>;
> ferruh.yigit@amd.com
> Cc: nikhil.agarwal@amd.com; He, Xingguang <xingguang.he@intel.com>; Ling,
> WeiX <weix.ling@intel.com>
> Subject: Re: [PATCH] vfio: do not coalesce DMA mappings
> 
> Hi Xuan,
> 
> Thanks for pointing out the issue and identifying the patch that
> introduced it. If you have answers to the queries below, please let me know:
> 
> Are there any other test cases that test "--no-huge" and pass?

Yes, there are test cases that add the "--no-huge" option to validate the 4K page size in async vhost.
Actually, the page size is decided by the front-end, so I think this case can be removed.

Previously, testpmd could start with the "--no-huge" option (I am not sure whether there are dedicated test cases).
Cmd: ./build/app/dpdk-testpmd -l 5-6 -n 4 --no-huge -m 1024 -- -i
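
For context, my understanding of "--no-huge" (a simplified sketch, not the actual EAL code; alloc_no_huge() and mem_size are hypothetical names) is that the EAL memory is then backed by anonymous base-size (4 KiB) pages instead of hugepages, roughly:

    /* Simplified sketch (assumption, not the real lib/eal code): with
     * --no-huge the area comes from anonymous mmap(), so it consists of
     * 4 KiB base pages, each tracked as its own memseg. */
    #include <stddef.h>
    #include <sys/mman.h>

    static void *alloc_no_huge(size_t mem_size) /* hypothetical helper */
    {
            return mmap(NULL, mem_size, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    }

With the per-page VFIO mapping from this patch, each of those 4 KiB pages then needs its own DMA map entry, which is where the large mapping count comes from.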

> 
> Also, if we change the "-m" option to provide lower memory, does the test
> pass?

"-m" option is also added and does not work.

> 
> When you mention that too many pages exceed the capability of the IOMMU,
> are you referring to the HW capability to create multiple page mappings?
> Here it seems that with a 4K page size we need 256K pages, which is
> hitting the capacity limit?

Yes, this is the result of my initial debugging.
The direct impact is that this kind of testpmd case cannot start now.
If this is expected, I think we can close this defect and drop the "--no-huge" option from the test at startup.
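
For reference, a back-of-the-envelope calculation (assuming the default value of the vfio_iommu_type1 kernel module parameter dma_entry_limit, 65535; it can be raised at module load time):

    1024 MB in 4 KiB pages:  1024 MiB / 4 KiB = 262144 DMA mappings needed
    default dma_entry_limit: 65535 mappings per container
    262144 > 65535       =>  VFIO_IOMMU_MAP_DMA fails with -ENOSPC (error 28)

With 2 MB hugepages, the same 1024 MB needs only 512 mappings, which is why only the "--no-huge" runs hit the limit.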

Regards,
Xuan

> 
> Regards,
> Nipun
> 
> On 6/29/2023 1:51 PM, Ding, Xuan wrote:
> > Hi Nipun,
> >
> > Thank you for taking the time to read this email.
> >
> > Our QA team found that since the commit "a399d7b5a994: do not
> > coalesce DMA mappings" was introduced, dpdk-testpmd started with the
> > "--no-huge" parameter fails with "EAL: Cannot set up DMA remapping,
> > error 28 (No space left on device)".
> > So they reported it on the DPDK Bugzilla:
> > https://bugs.dpdk.org/show_bug.cgi?id=1235.
> >
> > I understand this feature is meant to keep consistent with the kernel
> > and not allow memory segments to be merged.
> > The side effect is that testpmd with the "--no-huge" parameter is not
> > able to start, because too many pages exceed the capability of the
> > IOMMU.
> > Is this expected? Should we remove "--no-huge" from our test case?
> >
> > Regards,
> > Xuan
> >
> >> -----Original Message-----
> >> From: Nipun Gupta <nipun.gupta@amd.com>
> >> Sent: Friday, December 30, 2022 5:59 PM
> >> To: dev@dpdk.org; thomas@monjalon.net; Burakov, Anatoly
> >> <anatoly.burakov@intel.com>; ferruh.yigit@amd.com
> >> Cc: nikhil.agarwal@amd.com; Nipun Gupta <nipun.gupta@amd.com>
> >> Subject: [PATCH] vfio: do not coalesce DMA mappings
> >>
> >> At cleanup time, when the DMA unmap is done, the Linux kernel does
> >> not allow unmapping individual segments that were coalesced together
> >> while creating the DMA map for type1 IOMMU mappings. So, this change
> >> updates the mapping of the memory segments (hugepages) to a per-page
> >> basis.
> >>
> >> Signed-off-by: Nipun Gupta <nipun.gupta@amd.com>
> >> ---
> >>
> >> When hotplug of devices is used, multiple pages get coalesced and a
> >> single mapping gets created for these pages (using the APIs
> >> rte_memseg_contig_walk() and type1_map_contig()). At cleanup time,
> >> when the memory is released, VFIO does not clean up that memory, and
> >> the following error is observed in the EAL for 2MB hugepages:
> >> EAL: Unexpected size 0 of DMA remapping cleared instead of 2097152
> >>
> >> This is because VFIO does not clear the DMA mapping (refer to the API
> >> vfio_dma_do_unmap() -
> >> https://elixir.bootlin.com/linux/latest/source/drivers/vfio/vfio_iommu_type1.c#L1330),
> >> where it checks for the IOVA to free:
> >> https://elixir.bootlin.com/linux/latest/source/drivers/vfio/vfio_iommu_type1.c#L1418.
> >>
> >> Thus this change updates the mappings to be created individually
> >> instead of coalescing them.
> >>
> >>   lib/eal/linux/eal_vfio.c | 29 -----------------------------
> >>   1 file changed, 29 deletions(-)
> >>
> >> diff --git a/lib/eal/linux/eal_vfio.c b/lib/eal/linux/eal_vfio.c
> >> index 549b86ae1d..56edccb0db 100644
> >> --- a/lib/eal/linux/eal_vfio.c
> >> +++ b/lib/eal/linux/eal_vfio.c
> >> @@ -1369,19 +1369,6 @@ rte_vfio_get_group_num(const char *sysfs_base,
> >>  	return 1;
> >>  }
> >>
> >> -static int
> >> -type1_map_contig(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
> >> -		size_t len, void *arg)
> >> -{
> >> -	int *vfio_container_fd = arg;
> >> -
> >> -	if (msl->external)
> >> -		return 0;
> >> -
> >> -	return vfio_type1_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
> >> -			len, 1);
> >> -}
> >> -
> >>  static int
> >>  type1_map(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
> >>  		void *arg)
> >> @@ -1396,10 +1383,6 @@ type1_map(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
> >>  	if (ms->iova == RTE_BAD_IOVA)
> >>  		return 0;
> >>
> >> -	/* if IOVA mode is VA, we've already mapped the internal segments */
> >> -	if (!msl->external && rte_eal_iova_mode() == RTE_IOVA_VA)
> >> -		return 0;
> >> -
> >>  	return vfio_type1_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
> >>  			ms->len, 1);
> >>  }
> >> @@ -1464,18 +1447,6 @@ vfio_type1_dma_mem_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova,
> >>  static int
> >>  vfio_type1_dma_map(int vfio_container_fd)
> >>  {
> >> -	if (rte_eal_iova_mode() == RTE_IOVA_VA) {
> >> -		/* with IOVA as VA mode, we can get away with mapping contiguous
> >> -		 * chunks rather than going page-by-page.
> >> -		 */
> >> -		int ret = rte_memseg_contig_walk(type1_map_contig,
> >> -				&vfio_container_fd);
> >> -		if (ret)
> >> -			return ret;
> >> -		/* we have to continue the walk because we've skipped the
> >> -		 * external segments during the config walk.
> >> -		 */
> >> -	}
> >>  	return rte_memseg_walk(type1_map, &vfio_container_fd);
> >>  }
> >>
> >> --
> >> 2.25.1
> >

Thread overview: 30+ messages
2022-12-30  9:58 Nipun Gupta
2023-01-04  5:19 ` [PATCH v2] " Nipun Gupta
2023-02-02 10:48   ` David Marchand
2023-02-07  8:56     ` Gupta, Nipun
2023-04-04 14:57       ` Gupta, Nipun
2023-04-04 15:13       ` Burakov, Anatoly
2023-04-04 16:32         ` Nipun Gupta
2023-04-05 14:16           ` Burakov, Anatoly
2023-04-05 15:06             ` Gupta, Nipun
2023-04-24 15:22             ` David Marchand
2023-04-24 16:10               ` Stephen Hemminger
2023-04-24 16:16                 ` Gupta, Nipun
2023-05-10 12:58               ` Nipun Gupta
2023-05-11 14:08                 ` Burakov, Anatoly
2023-04-11 15:13           ` Thomas Monjalon
2023-04-11 16:51             ` Gupta, Nipun
2023-04-07  6:13         ` Nipun Gupta
2023-04-07  7:26           ` David Marchand
2023-05-15 11:16   ` David Marchand
2023-06-29  8:21 ` [PATCH] " Ding, Xuan
2023-06-30  1:45   ` Nipun Gupta
2023-06-30  5:58     ` Ding, Xuan [this message]
2023-07-04  5:13       ` Ding, Xuan
2023-07-04  6:53         ` Gupta, Nipun
2023-07-04  8:06           ` Ding, Xuan
2023-07-04  9:23             ` Gupta, Nipun
2023-07-04 14:09               ` Thomas Monjalon
2023-07-04 16:39                 ` Gupta, Nipun
2023-07-04 15:11               ` Ding, Xuan
2023-07-04 16:42                 ` Gupta, Nipun
