RE: [PATCH] vfio: do not coalesce DMA mappings

DPDK patches and discussions
 help / color / mirror / Atom feed

From: "Ding, Xuan" <xuan.ding@intel.com>
To: "Gupta, Nipun" <Nipun.Gupta@amd.com>,
	"dev@dpdk.org" <dev@dpdk.org>,
	"thomas@monjalon.net" <thomas@monjalon.net>,
	"Burakov, Anatoly" <anatoly.burakov@intel.com>,
	"Yigit, Ferruh" <Ferruh.Yigit@amd.com>,
	"David Marchand" <david.marchand@redhat.com>
Cc: "Agarwal, Nikhil" <nikhil.agarwal@amd.com>,
	"He, Xingguang" <xingguang.he@intel.com>,
	"Ling, WeiX" <weix.ling@intel.com>
Subject: RE: [PATCH] vfio: do not coalesce DMA mappings
Date: Tue, 4 Jul 2023 08:06:16 +0000	[thread overview]
Message-ID: <BN9PR11MB5513695681A6AD2B3F1B3C5EE72EA@BN9PR11MB5513.namprd11.prod.outlook.com> (raw)
In-Reply-To: <CH3PR12MB83081EB825913CE4A2F3AEFCE82EA@CH3PR12MB8308.namprd12.prod.outlook.com>

Hi Nipun,

> -----Original Message-----
> From: Gupta, Nipun <Nipun.Gupta@amd.com>
> Sent: Tuesday, July 4, 2023 2:54 PM
> To: Ding, Xuan <xuan.ding@intel.com>; dev@dpdk.org; thomas@monjalon.net;
> Burakov, Anatoly <anatoly.burakov@intel.com>; Yigit, Ferruh
> <Ferruh.Yigit@amd.com>; David Marchand <david.marchand@redhat.com>
> Cc: Agarwal, Nikhil <nikhil.agarwal@amd.com>; He, Xingguang
> <xingguang.he@intel.com>; Ling, WeiX <weix.ling@intel.com>
> Subject: RE: [PATCH] vfio: do not coalesce DMA mappings
> 
> Hi Xuan,
> 
> Please see inline.
> 
> > -----Original Message-----
> > From: Ding, Xuan <xuan.ding@intel.com>
> > Sent: Tuesday, July 4, 2023 10:43 AM
> > To: Gupta, Nipun <Nipun.Gupta@amd.com>; dev@dpdk.org;
> > thomas@monjalon.net; Burakov, Anatoly <anatoly.burakov@intel.com>;
> > Yigit, Ferruh <Ferruh.Yigit@amd.com>
> > Cc: Agarwal, Nikhil <nikhil.agarwal@amd.com>; He, Xingguang
> > <xingguang.he@intel.com>; Ling, WeiX <weix.ling@intel.com>
> > Subject: RE: [PATCH] vfio: do not coalesce DMA mappings
> >
> > Hi Nipun,
> >
> > > -----Original Message-----
> > > From: Ding, Xuan
> > > Sent: Friday, June 30, 2023 1:58 PM
> > > To: Nipun Gupta <nipun.gupta@amd.com>; dev@dpdk.org;
> > > thomas@monjalon.net; Burakov, Anatoly <anatoly.burakov@intel.com>;
> > > ferruh.yigit@amd.com
> > > Cc: nikhil.agarwal@amd.com; He, Xingguang <xingguang.he@intel.com>;
> > > Ling, WeiX <weix.ling@intel.com>
> > > Subject: RE: [PATCH] vfio: do not coalesce DMA mappings
> > >
> > > Hi Nipun,
> > >
> > > Replies are inline.
> > >
> > > > -----Original Message-----
> > > > From: Nipun Gupta <nipun.gupta@amd.com>
> > > > Sent: Friday, June 30, 2023 9:46 AM
> > > > To: Ding, Xuan <xuan.ding@intel.com>; dev@dpdk.org;
> > > > thomas@monjalon.net; Burakov, Anatoly <anatoly.burakov@intel.com>;
> > > > ferruh.yigit@amd.com
> > > > Cc: nikhil.agarwal@amd.com; He, Xingguang
> > > > <xingguang.he@intel.com>; Ling, WeiX <weix.ling@intel.com>
> > > > Subject: Re: [PATCH] vfio: do not coalesce DMA mappings
> > > >
> > > > Hi Xuan,
> > > >
> > > > Thanks for pointing out the issue and figuring out the patch which
> > > > introduced this. If you have answers to below queries, please let me know:
> > > >
> > > > Is there any other test cases which tests "--no-huge" which pass?
> > >
> > > Yes, there are test cases adding "--no-huge" option to validate 4k
> > > page size in async vhost.
> > > Actually, the page size is decided by front-end, so I think this
> > > case can be removed.
> > >
> > > Previously, testpmd can start with "--no-huge" options (not sure if
> > > there are test cases).
> > > Cmd: ./build/app/dpdk-testpmd -l 5-6 -n 4 --no-huge -m 1024 -- -i
> > >
> > > >
> > > > Also, if we change the "-m" option to provide lower memory, does
> > > > the test pass?
> > >
> > > "-m" option is also added and does not work.
> > >
> > > >
> > > > When you mention too many pages exceed the capability of IOMMU,
> > > > you are referring to HW capability to create multiple pages? Here
> > > > it seems in case of 4K page size we need 256K pages which is limiting the
> capacity?
> > >
> > > Yes, this is the result of my initial debugging.
> > > The direct impact is that this kind of testpmd cases cannot start now.
> > > If this is expected, I think we can close this defect and ignore the "--no-
> huge"
> > > option when start.
> >
> > Any insights? Should we just ignore the "--no-huge" option and close this
> defect?
> > Now we did this as a workaround. Seems no one uses the "--no-huge"
> > option in testpmd now.
> 
> VFIO supports dma_entry_limit as a module parameter, which has a default
> value of U16_MAX i.e. 64K, most likely which is limiting creation of 256K
> entries for 4K pages here. This can be modified while inserting vfio module:
>         modprobe vfio_iommu_type1 dma_entry_limit=1000000

Thanks for your suggestion. I tried it on ubuntu 22.04 but it does not work.
The reason I think is vfio-pci is build-in in kernel driver (since 20.04) and it does not support dynamic insmod/rmmod.

Does this command need to rmmod vfio first and then modprobe again?

Thanks,
Xuan

> 
> You can update the test case with updating this limit, and test case shall pass.
> 
> Thanks,
> Nipun
> 
> >
> > Thanks,
> > Xuan
> >
> > >
> > > Regards,
> > > Xuan
> > >
> > > >
> > > > Regards,
> > > > Nipun
> > > >
> > > > On 6/29/2023 1:51 PM, Ding, Xuan wrote:
> > > > > Hi Nipun,
> > > > >
> > > > > I'd like to appreciate your time reading this email.
> > > > >
> > > > > Our QA team found that since this commit "a399d7b5a994: do not
> > > > > coalesce DMA mappings" is introduced, the dpdk testpmd start
> > > > > with
> > > > > "--no-
> > > > huge" parameters will failed, and shows "EAL: Cannot set up DMA
> > > > remapping, error 28 (No space left on device)".
> > > > > So they reported it on dpdk Bugzilla:
> > > > https://bugs.dpdk.org/show_bug.cgi?id=1235.
> > > > >
> > > > > I understand this feature is to keep consistent with the kernel
> > > > > and not allow
> > > > memory segments be merged.
> > > > > The side effect is the testpmd with "--no-huge" parameters will
> > > > > not be able
> > > > to start because the too many pages will exceed the capability of IOMMU.
> > > > > Is it expected? Should we remove the --no-huge" in our testcase?
> > > > >
> > > > > Regards,
> > > > > Xuan
> > > > >
> > > > >> -----Original Message-----
> > > > >> From: Nipun Gupta <nipun.gupta@amd.com>
> > > > >> Sent: Friday, December 30, 2022 5:59 PM
> > > > >> To: dev@dpdk.org; thomas@monjalon.net; Burakov, Anatoly
> > > > >> <anatoly.burakov@intel.com>; ferruh.yigit@amd.com
> > > > >> Cc: nikhil.agarwal@amd.com; Nipun Gupta <nipun.gupta@amd.com>
> > > > >> Subject: [PATCH] vfio: do not coalesce DMA mappings
> > > > >>
> > > > >> At the cleanup time when dma unmap is done, linux kernel does
> > > > >> not allow unmap of individual segments which were coalesced
> > > > >> together while creating the DMA map for type1 IOMMU mappings.
> > > > >> So, this change updates the mapping of the memory
> > > > >> segments(hugepages) on a per-page basis.
> > > > >>
> > > > >> Signed-off-by: Nipun Gupta <nipun.gupta@amd.com>
> > > > >> ---
> > > > >>
> > > > >> When hotplug of devices is used, multiple pages gets colaeced
> > > > >> and a single mapping gets created for these pages (using APIs
> > > > >> rte_memseg_contig_walk() and type1_map_contig(). On the cleanup
> > > > >> time when the memory is released, the VFIO does not cleans up
> > > > >> that memory and following error is observed in the eal for 2MB
> > > > >> hugepages:
> > > > >> EAL: Unexpected size 0 of DMA remapping cleared instead of
> > > > >> 2097152
> > > > >>
> > > > >> This is because VFIO does not clear the DMA (refer API
> > > > >> vfio_dma_do_unmap() -
> > > > >>
> > > > https://elixir.bootlin.com/linux/latest/source/drivers/vfio/vfio_i
> > > > ommu
> > > > _type
> > > > 1.
> > > > >> c#L1330),
> > > > >> where it checks the dma mapping where it checks for IOVA to free:
> > > > >>
> > > > https://elixir.bootlin.com/linux/latest/source/drivers/vfio/vfio_i
> > > > ommu
> > > > _type
> > > > 1.
> > > > >> c#L1418.
> > > > >>
> > > > >> Thus this change updates the mapping to be created individually
> > > > >> instead of colaecing them.
> > > > >>
> > > > >>   lib/eal/linux/eal_vfio.c | 29 -----------------------------
> > > > >>   1 file changed, 29 deletions(-)
> > > > >>
> > > > >> diff --git a/lib/eal/linux/eal_vfio.c
> > > > >> b/lib/eal/linux/eal_vfio.c index 549b86ae1d..56edccb0db 100644
> > > > >> --- a/lib/eal/linux/eal_vfio.c
> > > > >> +++ b/lib/eal/linux/eal_vfio.c
> > > > >> @@ -1369,19 +1369,6 @@ rte_vfio_get_group_num(const char
> > > > *sysfs_base,
> > > > >>   	return 1;
> > > > >>   }
> > > > >>
> > > > >> -static int
> > > > >> -type1_map_contig(const struct rte_memseg_list *msl, const
> > > > >> struct rte_memseg *ms,
> > > > >> -		size_t len, void *arg)
> > > > >> -{
> > > > >> -	int *vfio_container_fd = arg;
> > > > >> -
> > > > >> -	if (msl->external)
> > > > >> -		return 0;
> > > > >> -
> > > > >> -	return vfio_type1_dma_mem_map(*vfio_container_fd, ms-
> >addr_64,
> > > > >> ms->iova,
> > > > >> -			len, 1);
> > > > >> -}
> > > > >> -
> > > > >>   static int
> > > > >>   type1_map(const struct rte_memseg_list *msl, const struct
> > > > >> rte_memseg
> > > > *ms,
> > > > >>   		void *arg)
> > > > >> @@ -1396,10 +1383,6 @@ type1_map(const struct rte_memseg_list
> > > > >> *msl, const struct rte_memseg *ms,
> > > > >>   	if (ms->iova == RTE_BAD_IOVA)
> > > > >>   		return 0;
> > > > >>
> > > > >> -	/* if IOVA mode is VA, we've already mapped the internal
> segments
> > > > */
> > > > >> -	if (!msl->external && rte_eal_iova_mode() == RTE_IOVA_VA)
> > > > >> -		return 0;
> > > > >> -
> > > > >>   	return vfio_type1_dma_mem_map(*vfio_container_fd,
> > > > >> ms->addr_64,
> > > > >> ms->iova,
> > > > >>   			ms->len, 1);
> > > > >>   }
> > > > >> @@ -1464,18 +1447,6 @@ vfio_type1_dma_mem_map(int
> > > > vfio_container_fd,
> > > > >> uint64_t vaddr, uint64_t iova,  static int
> > > > >> vfio_type1_dma_map(int
> > > > >> vfio_container_fd)  {
> > > > >> -	if (rte_eal_iova_mode() == RTE_IOVA_VA) {
> > > > >> -		/* with IOVA as VA mode, we can get away with
> mapping
> > > > >> contiguous
> > > > >> -		 * chunks rather than going page-by-page.
> > > > >> -		 */
> > > > >> -		int ret = rte_memseg_contig_walk(type1_map_contig,
> > > > >> -				&vfio_container_fd);
> > > > >> -		if (ret)
> > > > >> -			return ret;
> > > > >> -		/* we have to continue the walk because we've
> skipped the
> > > > >> -		 * external segments during the config walk.
> > > > >> -		 */
> > > > >> -	}
> > > > >>   	return rte_memseg_walk(type1_map, &vfio_container_fd);  }
> > > > >>
> > > > >> --
> > > > >> 2.25.1
> > > > >

next prev parent reply	other threads:[~2023-07-04  8:06 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-12-30  9:58 Nipun Gupta
2023-01-04  5:19 ` [PATCH v2] " Nipun Gupta
2023-02-02 10:48   ` David Marchand
2023-02-07  8:56     ` Gupta, Nipun
2023-04-04 14:57       ` Gupta, Nipun
2023-04-04 15:13       ` Burakov, Anatoly
2023-04-04 16:32         ` Nipun Gupta
2023-04-05 14:16           ` Burakov, Anatoly
2023-04-05 15:06             ` Gupta, Nipun
2023-04-24 15:22             ` David Marchand
2023-04-24 16:10               ` Stephen Hemminger
2023-04-24 16:16                 ` Gupta, Nipun
2023-05-10 12:58               ` Nipun Gupta
2023-05-11 14:08                 ` Burakov, Anatoly
2023-04-11 15:13           ` Thomas Monjalon
2023-04-11 16:51             ` Gupta, Nipun
2023-04-07  6:13         ` Nipun Gupta
2023-04-07  7:26           ` David Marchand
2023-05-15 11:16   ` David Marchand
2023-06-29  8:21 ` [PATCH] " Ding, Xuan
2023-06-30  1:45   ` Nipun Gupta
2023-06-30  5:58     ` Ding, Xuan
2023-07-04  5:13       ` Ding, Xuan
2023-07-04  6:53         ` Gupta, Nipun
2023-07-04  8:06           ` Ding, Xuan [this message]
2023-07-04  9:23             ` Gupta, Nipun
2023-07-04 14:09               ` Thomas Monjalon
2023-07-04 16:39                 ` Gupta, Nipun
2023-07-04 15:11               ` Ding, Xuan
2023-07-04 16:42                 ` Gupta, Nipun

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=BN9PR11MB5513695681A6AD2B3F1B3C5EE72EA@BN9PR11MB5513.namprd11.prod.outlook.com \
    --to=xuan.ding@intel.com \
    --cc=Ferruh.Yigit@amd.com \
    --cc=Nipun.Gupta@amd.com \
    --cc=anatoly.burakov@intel.com \
    --cc=david.marchand@redhat.com \
    --cc=dev@dpdk.org \
    --cc=nikhil.agarwal@amd.com \
    --cc=thomas@monjalon.net \
    --cc=weix.ling@intel.com \
    --cc=xingguang.he@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).