From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <anatoly.burakov@intel.com>
Received: from mga06.intel.com (mga06.intel.com [134.134.136.31])
 by dpdk.org (Postfix) with ESMTP id 6D1E71B173
 for <dev@dpdk.org>; Mon, 16 Apr 2018 12:02:51 +0200 (CEST)
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from orsmga001.jf.intel.com ([10.7.209.18])
 by orsmga104.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;
 16 Apr 2018 03:02:50 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.48,459,1517904000"; d="scan'208";a="48222619"
Received: from aburakov-mobl.ger.corp.intel.com (HELO [10.237.220.128])
 ([10.237.220.128])
 by orsmga001.jf.intel.com with ESMTP; 16 Apr 2018 03:02:46 -0700
To: Xiao Wang <xiao.w.wang@intel.com>, ferruh.yigit@intel.com
Cc: dev@dpdk.org, maxime.coquelin@redhat.com, zhihong.wang@intel.com,
 tiwei.bie@intel.com, jianfeng.tan@intel.com, cunming.liang@intel.com,
 dan.daly@intel.com, thomas@monjalon.net,
 Junjie Chen <junjie.j.chen@intel.com>
References: <20180412071956.66178-2-xiao.w.wang@intel.com>
 <20180415153349.62105-1-xiao.w.wang@intel.com>
 <20180415153349.62105-2-xiao.w.wang@intel.com>
From: "Burakov, Anatoly" <anatoly.burakov@intel.com>
Message-ID: <9f41b581-ea27-51fb-dfbe-affd64d7996c@intel.com>
Date: Mon, 16 Apr 2018 11:02:44 +0100
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101
 Thunderbird/52.7.0
MIME-Version: 1.0
In-Reply-To: <20180415153349.62105-2-xiao.w.wang@intel.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Subject: Re: [dpdk-dev] [PATCH v7 1/5] vfio: extend data structure for multi
	container
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Mon, 16 Apr 2018 10:02:52 -0000

On 15-Apr-18 4:33 PM, Xiao Wang wrote:
> Currently eal vfio framework binds vfio group fd to the default
> container fd during rte_vfio_setup_device, while in some cases,
> e.g. vDPA (vhost data path acceleration), we want to put vfio group
> to a separate container and program IOMMU via this container.
> 
> This patch extends the vfio_config structure to contain per-container
> user_mem_maps and defines an array of vfio_config. The next patch will
> base on this to add container API.
> 
> Signed-off-by: Junjie Chen <junjie.j.chen@intel.com>
> Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
> ---
>   config/common_base                     |   1 +
>   lib/librte_eal/linuxapp/eal/eal_vfio.c | 407 ++++++++++++++++++++++-----------
>   lib/librte_eal/linuxapp/eal/eal_vfio.h |  19 +-
>   3 files changed, 275 insertions(+), 152 deletions(-)
> 
> diff --git a/config/common_base b/config/common_base
> index c4236fd1f..4a76d2f14 100644
> --- a/config/common_base
> +++ b/config/common_base
> @@ -87,6 +87,7 @@ CONFIG_RTE_EAL_ALWAYS_PANIC_ON_ERROR=n
>   CONFIG_RTE_EAL_IGB_UIO=n
>   CONFIG_RTE_EAL_VFIO=n
>   CONFIG_RTE_MAX_VFIO_GROUPS=64
> +CONFIG_RTE_MAX_VFIO_CONTAINERS=64
>   CONFIG_RTE_MALLOC_DEBUG=n
>   CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES=n
>   CONFIG_RTE_USE_LIBBSD=n
> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> index 589d7d478..46fba2d8d 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> @@ -22,8 +22,46 @@
>   
>   #define VFIO_MEM_EVENT_CLB_NAME "vfio_mem_event_clb"
>   
> +/*
> + * we don't need to store device fd's anywhere since they can be obtained from
> + * the group fd via an ioctl() call.
> + */
> +struct vfio_group {
> +	int group_no;
> +	int fd;
> +	int devices;
> +};

What is the purpose of moving this into .c file? Seems like an 
unnecessary change.

> +
> +/* hot plug/unplug of VFIO groups may cause all DMA maps to be dropped. we can
> + * recreate the mappings for DPDK segments, but we cannot do so for memory that
> + * was registered by the user themselves, so we need to store the user mappings
> + * somewhere, to recreate them later.
> + */
> +#define VFIO_MAX_USER_MEM_MAPS 256
> +struct user_mem_map {
> +	uint64_t addr;
> +	uint64_t iova;
> +	uint64_t len;
> +};
> +

<...>

> +static struct vfio_config *
> +get_vfio_cfg_by_group_no(int iommu_group_no)
> +{
> +	struct vfio_config *vfio_cfg;
> +	int i, j;
> +
> +	for (i = 0; i < VFIO_MAX_CONTAINERS; i++) {
> +		vfio_cfg = &vfio_cfgs[i];
> +		for (j = 0; j < VFIO_MAX_GROUPS; j++) {
> +			if (vfio_cfg->vfio_groups[j].group_no ==
> +					iommu_group_no)
> +				return vfio_cfg;
> +		}
> +	}
> +
> +	return default_vfio_cfg;

Here and in other places: i'm not sure returning default vfio config if 
group not found is such a good idea. It would be better if calling code 
explicitly handled case of group not existing yet.

> +}
> +
> +static struct vfio_config *
> +get_vfio_cfg_by_group_fd(int vfio_group_fd)
> +{
> +	struct vfio_config *vfio_cfg;
> +	int i, j;
> +
> +	for (i = 0; i < VFIO_MAX_CONTAINERS; i++) {
> +		vfio_cfg = &vfio_cfgs[i];
> +		for (j = 0; j < VFIO_MAX_GROUPS; j++)
> +			if (vfio_cfg->vfio_groups[j].fd == vfio_group_fd)
> +				return vfio_cfg;
> +	}
>   

<...>

> -	for (i = 0; i < VFIO_MAX_GROUPS; i++) {
> -		vfio_cfg.vfio_groups[i].fd = -1;
> -		vfio_cfg.vfio_groups[i].group_no = -1;
> -		vfio_cfg.vfio_groups[i].devices = 0;
> +	rte_spinlock_recursive_t lock = RTE_SPINLOCK_RECURSIVE_INITIALIZER;
> +
> +	for (i = 0; i < VFIO_MAX_CONTAINERS; i++) {
> +		vfio_cfgs[i].vfio_container_fd = -1;
> +		vfio_cfgs[i].vfio_active_groups = 0;
> +		vfio_cfgs[i].vfio_iommu_type = NULL;
> +		vfio_cfgs[i].mem_maps.lock = lock;

Nitpick - why copy, instead of straight up initializing with 
RTE_SPINLOCK_RECURSIVE_INITIALIZER?

> +
> +		for (j = 0; j < VFIO_MAX_GROUPS; j++) {
> +			vfio_cfgs[i].vfio_groups[j].fd = -1;
> +			vfio_cfgs[i].vfio_groups[j].group_no = -1;
> +			vfio_cfgs[i].vfio_groups[j].devices = 0;
> +		}
>   	}
>   
>   	/* inform the user that we are probing for VFIO */
> @@ -841,12 +971,12 @@ rte_vfio_enable(const char *modname)
>   		return 0;
>   	}

<...>

-- 
Thanks,
Anatoly