From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <dev-bounces@dpdk.org>
Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124])
	by inbox.dpdk.org (Postfix) with ESMTP id 637D642D8A;
	Thu, 29 Jun 2023 10:21:45 +0200 (CEST)
Received: from mails.dpdk.org (localhost [127.0.0.1])
	by mails.dpdk.org (Postfix) with ESMTP id DC73440EDB;
	Thu, 29 Jun 2023 10:21:44 +0200 (CEST)
Received: from mga17.intel.com (mga17.intel.com [192.55.52.151])
 by mails.dpdk.org (Postfix) with ESMTP id C9FCE406B7
 for <dev@dpdk.org>; Thu, 29 Jun 2023 10:21:42 +0200 (CEST)
Received: from orsmga004.jf.intel.com ([10.7.209.38])
 by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 29 Jun 2023 01:21:19 -0700
Received: from orsmsx601.amr.corp.intel.com ([10.22.229.14])
 by orsmga004.jf.intel.com with ESMTP; 29 Jun 2023 01:21:17 -0700
Received: from orsmsx610.amr.corp.intel.com (10.22.229.23) by
 ORSMSX601.amr.corp.intel.com (10.22.229.14) with Microsoft SMTP Server
 (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id
 15.1.2507.27; Thu, 29 Jun 2023 01:21:16 -0700
Received: from ORSEDG601.ED.cps.intel.com (10.7.248.6) by
 orsmsx610.amr.corp.intel.com (10.22.229.23) with Microsoft SMTP Server
 (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id
 15.1.2507.27 via Frontend Transport; Thu, 29 Jun 2023 01:21:16 -0700
Received: from NAM12-DM6-obe.outbound.protection.outlook.com (104.47.59.173)
 by edgegateway.intel.com (134.134.137.102) with Microsoft SMTP Server
 (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id
 15.1.2507.27; Thu, 29 Jun 2023 01:21:16 -0700
Received: from BN9PR11MB5513.namprd11.prod.outlook.com (2603:10b6:408:102::11)
 by SA0PR11MB4574.namprd11.prod.outlook.com (2603:10b6:806:71::11)
 with Microsoft SMTP Server (version=TLS1_2,
 cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6521.24; Thu, 29 Jun
 2023 08:21:14 +0000
Received: from BN9PR11MB5513.namprd11.prod.outlook.com
 ([fe80::8575:f58b:94d7:e121]) by BN9PR11MB5513.namprd11.prod.outlook.com
 ([fe80::8575:f58b:94d7:e121%7]) with mapi id 15.20.6521.024; Thu, 29 Jun 2023
 08:21:14 +0000
From: "Ding, Xuan" <xuan.ding@intel.com>
To: Nipun Gupta <nipun.gupta@amd.com>, "dev@dpdk.org" <dev@dpdk.org>,
 "thomas@monjalon.net" <thomas@monjalon.net>, "Burakov, Anatoly"
 <anatoly.burakov@intel.com>, "ferruh.yigit@amd.com" <ferruh.yigit@amd.com>
CC: "nikhil.agarwal@amd.com" <nikhil.agarwal@amd.com>, "He, Xingguang"
 <xingguang.he@intel.com>, "Ling, WeiX" <weix.ling@intel.com>
Subject: RE: [PATCH] vfio: do not coalesce DMA mappings
Thread-Topic: [PATCH] vfio: do not coalesce DMA mappings
Thread-Index: AQHZHDVrAwVw6KlvPU+E8wN/ilq+d6+ijFzQ
Date: Thu, 29 Jun 2023 08:21:14 +0000
Message-ID: <BN9PR11MB55137CBE3281E8EAC1CB8BB5E725A@BN9PR11MB5513.namprd11.prod.outlook.com>
References: <20221230095853.1323616-1-nipun.gupta@amd.com>
In-Reply-To: <20221230095853.1323616-1-nipun.gupta@amd.com>
Accept-Language: zh-CN, en-US
Content-Language: en-US
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
Errors-To: dev-bounces@dpdk.org

Hi Nipun,

Thank you for taking the time to read this email.

Our QA team found that since commit a399d7b5a994 ("vfio: do not coalesce
DMA mappings") was introduced, DPDK testpmd fails to start with the
"--no-huge" parameter and reports
"EAL: Cannot set up DMA remapping, error 28 (No space left on device)".
They reported it in the DPDK Bugzilla:
https://bugs.dpdk.org/show_bug.cgi?id=1235

I understand that this change is meant to stay consistent with the kernel
by not allowing memory segments to be merged. The side effect is that
testpmd cannot start with "--no-huge", because the large number of
per-page mappings exceeds the capability of the IOMMU.
Is this expected? Should we remove "--no-huge" from our test case?
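
For a rough sense of scale, here is a small sketch that counts how many
DMA mappings are needed when every page is mapped individually. The limit
of 65535 is an assumption: it is the default value of the "dma_entry_limit"
parameter of the kernel's vfio_iommu_type1 module as far as I know, and it
may differ on a given system. The function names and the 1 GiB figure in
the note below are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed default of the vfio_iommu_type1 "dma_entry_limit" module
 * parameter. Not taken from this thread; check the running kernel. */
#define ASSUMED_DMA_ENTRY_LIMIT 65535u

/* Number of DMA mappings needed when mem_bytes of memory is mapped
 * one page at a time (hypothetical helper, not a DPDK API). */
static uint64_t
mappings_needed(uint64_t mem_bytes, uint64_t page_bytes)
{
	assert(page_bytes != 0);
	/* Round up so that a partial trailing page still counts. */
	return (mem_bytes + page_bytes - 1) / page_bytes;
}

/* True if per-page mapping stays within the assumed entry limit. */
static bool
fits_entry_limit(uint64_t mem_bytes, uint64_t page_bytes)
{
	return mappings_needed(mem_bytes, page_bytes) <= ASSUMED_DMA_ENTRY_LIMIT;
}
```

Under these assumptions, 1 GiB of memory needs 262144 mappings with 4 KiB
pages (the "--no-huge" case), which is above the assumed limit, but only
512 mappings with 2 MiB hugepages. That would be consistent with the
"error 28 (No space left on device)" message above.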

Regards,
Xuan

> -----Original Message-----
> From: Nipun Gupta <nipun.gupta@amd.com>
> Sent: Friday, December 30, 2022 5:59 PM
> To: dev@dpdk.org; thomas@monjalon.net; Burakov, Anatoly
> <anatoly.burakov@intel.com>; ferruh.yigit@amd.com
> Cc: nikhil.agarwal@amd.com; Nipun Gupta <nipun.gupta@amd.com>
> Subject: [PATCH] vfio: do not coalesce DMA mappings
>
> At the cleanup time when dma unmap is done, linux kernel does not allow
> unmap of individual segments which were coalesced together while creating
> the DMA map for type1 IOMMU mappings. So, this change updates the
> mapping of the memory segments(hugepages) on a per-page basis.
>
> Signed-off-by: Nipun Gupta <nipun.gupta@amd.com>
> ---
>
> When hotplug of devices is used, multiple pages get coalesced and a single
> mapping gets created for these pages (using APIs
> rte_memseg_contig_walk() and type1_map_contig()). At cleanup time,
> when the memory is released, VFIO does not clean up that memory and the
> following error is observed in the EAL for 2MB hugepages:
> EAL: Unexpected size 0 of DMA remapping cleared instead of 2097152
>
> This is because VFIO does not clear the DMA mapping (refer to
> vfio_dma_do_unmap() -
> https://elixir.bootlin.com/linux/latest/source/drivers/vfio/vfio_iommu_type1.c#L1330),
> where it checks the DMA mapping for the IOVA to free:
> https://elixir.bootlin.com/linux/latest/source/drivers/vfio/vfio_iommu_type1.c#L1418.
>
> Thus this change updates the mappings to be created individually instead
> of coalescing them.
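
The behaviour described in the notes above can be illustrated with a toy
model. This is not kernel or DPDK code. It assumes, as a deliberate
simplification, that an unmap request clears a mapping only when it matches
that mapping exactly and otherwise clears 0 bytes; the real
vfio_dma_do_unmap() logic is more involved. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_MAPS 16

/* One recorded DMA mapping in the toy model. */
struct toy_map {
	uint64_t iova;
	uint64_t len;
	int used;
};

/* Toy stand-in for the container's list of DMA mappings. */
struct toy_iommu {
	struct toy_map maps[TOY_MAX_MAPS];
};

/* Record a mapping; returns 0 on success, -1 if the table is full. */
static int
toy_map_add(struct toy_iommu *t, uint64_t iova, uint64_t len)
{
	for (size_t i = 0; i < TOY_MAX_MAPS; i++) {
		if (!t->maps[i].used) {
			t->maps[i] = (struct toy_map){ iova, len, 1 };
			return 0;
		}
	}
	return -1;
}

/* Unmap [iova, iova + len). Simplifying assumption: only an exact match
 * of a recorded mapping is cleared. Returns the number of bytes cleared. */
static uint64_t
toy_unmap(struct toy_iommu *t, uint64_t iova, uint64_t len)
{
	for (size_t i = 0; i < TOY_MAX_MAPS; i++) {
		struct toy_map *m = &t->maps[i];

		if (m->used && m->iova == iova && m->len == len) {
			m->used = 0;
			return len;
		}
	}
	return 0;
}
```

In this model, two 2 MiB pages recorded as one 4 MiB mapping cannot be
released page by page: unmapping the second page clears 0 bytes, which
mirrors the "Unexpected size 0" message quoted above. Recording the same
two pages as two separate mappings lets each unmap clear the full 2 MiB.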
>
>  lib/eal/linux/eal_vfio.c | 29 -----------------------------
>  1 file changed, 29 deletions(-)
>
> diff --git a/lib/eal/linux/eal_vfio.c b/lib/eal/linux/eal_vfio.c
> index 549b86ae1d..56edccb0db 100644
> --- a/lib/eal/linux/eal_vfio.c
> +++ b/lib/eal/linux/eal_vfio.c
> @@ -1369,19 +1369,6 @@ rte_vfio_get_group_num(const char *sysfs_base,
>  	return 1;
>  }
>
> -static int
> -type1_map_contig(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
> -		size_t len, void *arg)
> -{
> -	int *vfio_container_fd = arg;
> -
> -	if (msl->external)
> -		return 0;
> -
> -	return vfio_type1_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
> -			len, 1);
> -}
> -
>  static int
>  type1_map(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
>  		void *arg)
> @@ -1396,10 +1383,6 @@ type1_map(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
>  	if (ms->iova == RTE_BAD_IOVA)
>  		return 0;
>
> -	/* if IOVA mode is VA, we've already mapped the internal segments */
> -	if (!msl->external && rte_eal_iova_mode() == RTE_IOVA_VA)
> -		return 0;
> -
>  	return vfio_type1_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
>  			ms->len, 1);
>  }
> @@ -1464,18 +1447,6 @@ vfio_type1_dma_mem_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova,
>  static int
>  vfio_type1_dma_map(int vfio_container_fd)
>  {
> -	if (rte_eal_iova_mode() == RTE_IOVA_VA) {
> -		/* with IOVA as VA mode, we can get away with mapping contiguous
> -		 * chunks rather than going page-by-page.
> -		 */
> -		int ret = rte_memseg_contig_walk(type1_map_contig,
> -				&vfio_container_fd);
> -		if (ret)
> -			return ret;
> -		/* we have to continue the walk because we've skipped the
> -		 * external segments during the config walk.
> -		 */
> -	}
>  	return rte_memseg_walk(type1_map, &vfio_container_fd);
>  }
>
> --
> 2.25.1