From mboxrd@z Thu Jan 1 00:00:00 1970
From: Li Zhang
Subject: [PATCH 11/16] vdpa/mlx5: add MT task for VM memory registration
Date: Mon, 6 Jun 2022 14:20:57 +0300
Message-ID: <20220606112109.208873-21-lizh@nvidia.com>
X-Mailer: git-send-email 2.31.1
In-Reply-To: <20220606112109.208873-1-lizh@nvidia.com>
References: <20220408075606.33056-1-lizh@nvidia.com>
 <20220606112109.208873-1-lizh@nvidia.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain
List-Id: DPDK patches and discussions

The driver creates a direct MR object of the HW for each VM memory
region, which maps the VM physical address to the actual physical
address. Later, after all the MRs are ready, the driver creates an
indirect MR to group all the direct MRs into one virtual space from the
HW perspective.

Create direct MRs in parallel using the MT mechanism. After completion,
the primary thread creates the indirect MR needed for the following
virtqs configurations.

This optimization accelerates the LM process and reduces its time by
5%.

Signed-off-by: Li Zhang
---
 drivers/vdpa/mlx5/mlx5_vdpa.c         |   1 -
 drivers/vdpa/mlx5/mlx5_vdpa.h         |  31 ++-
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c |  47 ++++-
 drivers/vdpa/mlx5/mlx5_vdpa_mem.c     | 270 ++++++++++++++++++--------
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   |   6 +-
 5 files changed, 258 insertions(+), 97 deletions(-)
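Note: before the diff, a minimal standalone model of the bulk-task flow
described above may help; it is not driver code. It mirrors the
scheduling rule used in this patch (the primary thread keeps every
(max_thrds + 1)-th region for itself, offloads the rest, then polls the
shared completion and error counters) using plain pthreads and C11
atomics instead of the configuration-thread ring; all names in it are
illustrative.

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

#define NUM_WORKERS 4	/* stands in for conf_thread_mng.max_thrds */
#define NUM_REGIONS 16	/* stands in for mem->nregions */

static atomic_uint remaining_cnt;	/* tasks still owned by workers */
static atomic_uint err_cnt;		/* tasks that failed */

/* Stand-in for mlx5_vdpa_register_mr(): register one VM memory region. */
static int register_region(unsigned int idx)
{
	printf("registering region %u\n", idx);
	return 0; /* Nonzero would count as an error below. */
}

static void *worker(void *arg)
{
	unsigned int idx = *(unsigned int *)arg;

	if (register_region(idx))
		atomic_fetch_add(&err_cnt, 1);
	/* Consume the task even on failure, or the waiter spins forever. */
	atomic_fetch_sub(&remaining_cnt, 1);
	return NULL;
}

int main(void)
{
	pthread_t thr[NUM_REGIONS];
	unsigned int idx[NUM_REGIONS], i;

	for (i = 0; i < NUM_REGIONS; i++) {
		idx[i] = i;
		if (i % (NUM_WORKERS + 1) == 0) {
			/* The primary thread registers this region itself. */
			if (register_region(i))
				return 1;
			continue;
		}
		atomic_fetch_add(&remaining_cnt, 1);
		pthread_create(&thr[i], NULL, worker, &idx[i]);
	}
	/* Same shape as mlx5_vdpa_c_thread_wait_bulk_tasks_done(). */
	while (atomic_load(&remaining_cnt) != 0)
		usleep(100);
	for (i = 0; i < NUM_REGIONS; i++)
		if (i % (NUM_WORKERS + 1) != 0)
			pthread_join(thr[i], NULL);
	return atomic_load(&err_cnt) ? 1 : 0;
}

Build with cc -pthread. The indirect-mkey creation that follows the
wait in the real driver is omitted here.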
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index a9d023ed08..e3b32fa087 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -768,7 +768,6 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev,
 		rte_errno = rte_errno ? rte_errno : EINVAL;
 		goto error;
 	}
-	SLIST_INIT(&priv->mr_list);
 	pthread_mutex_lock(&priv_list_lock);
 	TAILQ_INSERT_TAIL(&priv_list, priv, next);
 	pthread_mutex_unlock(&priv_list_lock);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 2bbb868ec6..3316ce42be 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -59,7 +59,6 @@ struct mlx5_vdpa_event_qp {
 };
 
 struct mlx5_vdpa_query_mr {
-	SLIST_ENTRY(mlx5_vdpa_query_mr) next;
 	union {
 		struct ibv_mr *mr;
 		struct mlx5_devx_obj *mkey;
@@ -76,10 +75,17 @@ enum {
 #define MLX5_VDPA_MAX_C_THRD 256
 #define MLX5_VDPA_MAX_TASKS_PER_THRD 4096
 #define MLX5_VDPA_TASKS_PER_DEV 64
+#define MLX5_VDPA_MAX_MRS 0xFFFF
+
+/* Vdpa task types. */
+enum mlx5_vdpa_task_type {
+	MLX5_VDPA_TASK_REG_MR = 1,
+};
 
 /* Generic task information and size must be multiple of 4B. */
 struct mlx5_vdpa_task {
 	struct mlx5_vdpa_priv *priv;
+	enum mlx5_vdpa_task_type type;
 	uint32_t *remaining_cnt;
 	uint32_t *err_cnt;
 	uint32_t idx;
@@ -101,6 +107,14 @@ struct mlx5_vdpa_conf_thread_mng {
 };
 extern struct mlx5_vdpa_conf_thread_mng conf_thread_mng;
 
+struct mlx5_vdpa_vmem_info {
+	struct rte_vhost_memory *vmem;
+	uint32_t entries_num;
+	uint64_t gcd;
+	uint64_t size;
+	uint8_t mode;
+};
+
 struct mlx5_vdpa_virtq {
 	SLIST_ENTRY(mlx5_vdpa_virtq) next;
 	uint8_t enable;
@@ -176,7 +190,7 @@ struct mlx5_vdpa_priv {
 	struct mlx5_hca_vdpa_attr caps;
 	uint32_t gpa_mkey_index;
 	struct ibv_mr *null_mr;
-	struct rte_vhost_memory *vmem;
+	struct mlx5_vdpa_vmem_info vmem_info;
 	struct mlx5dv_devx_event_channel *eventc;
 	struct mlx5dv_devx_event_channel *err_chnl;
 	struct mlx5_uar uar;
@@ -187,11 +201,13 @@ struct mlx5_vdpa_priv {
 	uint8_t num_lag_ports;
 	uint64_t features; /* Negotiated features. */
 	uint16_t log_max_rqt_size;
+	uint16_t last_c_thrd_idx;
+	uint16_t num_mrs; /* Number of memory regions. */
 	struct mlx5_vdpa_steer steer;
 	struct mlx5dv_var *var;
 	void *virtq_db_addr;
 	struct mlx5_pmd_wrapped_mr lm_mr;
-	SLIST_HEAD(mr_list, mlx5_vdpa_query_mr) mr_list;
+	struct mlx5_vdpa_query_mr **mrs;
 	struct mlx5_vdpa_virtq virtqs[];
 };
 
@@ -548,5 +564,12 @@ mlx5_vdpa_mult_threads_destroy(bool need_unlock);
 bool
 mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
 		uint32_t thrd_idx,
-		uint32_t num);
+		enum mlx5_vdpa_task_type task_type,
+		uint32_t *bulk_refcnt, uint32_t *bulk_err_cnt,
+		void **task_data, uint32_t num);
+int
+mlx5_vdpa_register_mr(struct mlx5_vdpa_priv *priv, uint32_t idx);
+bool
+mlx5_vdpa_c_thread_wait_bulk_tasks_done(uint32_t *remaining_cnt,
+		uint32_t *err_cnt, uint32_t sleep_time);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
index 1fdc92d3ad..10391931ae 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
@@ -47,16 +47,23 @@ mlx5_vdpa_c_thrd_ring_enqueue_bulk(struct rte_ring *r,
 bool
 mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
 		uint32_t thrd_idx,
-		uint32_t num)
+		enum mlx5_vdpa_task_type task_type,
+		uint32_t *remaining_cnt, uint32_t *err_cnt,
+		void **task_data, uint32_t num)
 {
 	struct rte_ring *rng = conf_thread_mng.cthrd[thrd_idx].rng;
 	struct mlx5_vdpa_task task[MLX5_VDPA_TASKS_PER_DEV];
+	uint32_t *data = (uint32_t *)task_data;
 	uint32_t i;
 
 	MLX5_ASSERT(num <= MLX5_VDPA_TASKS_PER_DEV);
 	for (i = 0 ; i < num; i++) {
 		task[i].priv = priv;
 		/* To be added later. */
+		task[i].type = task_type;
+		task[i].remaining_cnt = remaining_cnt;
+		task[i].err_cnt = err_cnt;
+		task[i].idx = data[i];
 	}
 	if (!mlx5_vdpa_c_thrd_ring_enqueue_bulk(rng, (void **)&task, num, NULL))
 		return -1;
@@ -71,6 +78,23 @@ mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
 	return 0;
 }
 
+bool
+mlx5_vdpa_c_thread_wait_bulk_tasks_done(uint32_t *remaining_cnt,
+		uint32_t *err_cnt, uint32_t sleep_time)
+{
+	/* Check and wait all tasks done. */
+	while (__atomic_load_n(remaining_cnt,
+		__ATOMIC_RELAXED) != 0) {
+		rte_delay_us_sleep(sleep_time);
+	}
+	if (__atomic_load_n(err_cnt,
+		__ATOMIC_RELAXED)) {
+		DRV_LOG(ERR, "Tasks done with error.");
+		return true;
+	}
+	return false;
+}
+
 static void *
 mlx5_vdpa_c_thread_handle(void *arg)
 {
@@ -81,6 +105,7 @@ mlx5_vdpa_c_thread_handle(void *arg)
 	struct rte_ring *rng;
 	uint32_t thrd_idx;
 	uint32_t task_num;
+	int ret;
 
 	for (thrd_idx = 0; thrd_idx < multhrd->max_thrds;
 		thrd_idx++)
@@ -99,13 +124,29 @@
 				&multhrd->cthrd[thrd_idx].c_cond,
 				&multhrd->cthrd_lock);
 			pthread_mutex_unlock(&multhrd->cthrd_lock);
+			continue;
 		}
 		priv = task.priv;
 		if (priv == NULL)
			continue;
-		__atomic_fetch_sub(task.remaining_cnt,
+		switch (task.type) {
+		case MLX5_VDPA_TASK_REG_MR:
+			ret = mlx5_vdpa_register_mr(priv, task.idx);
+			if (ret) {
+				DRV_LOG(ERR,
+				"Failed to register mr %d.", task.idx);
+				__atomic_fetch_add(task.err_cnt, 1,
+				__ATOMIC_RELAXED);
+			}
+			break;
+		default:
+			DRV_LOG(ERR, "Invalid vdpa task type %d.",
+			task.type);
+			break;
+		}
+		if (task.remaining_cnt)
+			__atomic_fetch_sub(task.remaining_cnt,
 			1, __ATOMIC_RELAXED);
-		/* To be added later. */
 	}
 	return NULL;
 }
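Note: the mlx5_vdpa_mem.c diff below replaces the MR SLIST with one
flat array. A hypothetical sketch of that layout (not driver code):
direct MRs are indexed by their region number, which is what lets the
worker threads above fill disjoint slots concurrently without locking,
while teardown walks the array backwards so the indirect mkey in the
extra last slot is destroyed before the direct MRs it references.

/*
 * priv->mrs after mlx5_vdpa_mem_register(), with nregions == N
 * (so priv->num_mrs == N + 1):
 *
 *   mrs[0]   .. direct MR for VM memory region 0    - any thread
 *   ...
 *   mrs[N-1] .. direct MR for VM memory region N-1  - any thread
 *   mrs[N]   .. indirect mkey grouping mrs[0..N-1]  - primary thread only
 *
 * mlx5_vdpa_mem_dereg() loops i = num_mrs - 1 down to 0, destroying
 * the indirect mkey first.
 */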
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_mem.c b/drivers/vdpa/mlx5/mlx5_vdpa_mem.c
index d6e3dd664b..e333f0bca6 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_mem.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_mem.c
@@ -17,25 +17,33 @@
 void
 mlx5_vdpa_mem_dereg(struct mlx5_vdpa_priv *priv)
 {
+	struct mlx5_vdpa_query_mr *mrs =
+		(struct mlx5_vdpa_query_mr *)priv->mrs;
 	struct mlx5_vdpa_query_mr *entry;
-	struct mlx5_vdpa_query_mr *next;
+	int i;
 
-	entry = SLIST_FIRST(&priv->mr_list);
-	while (entry) {
-		next = SLIST_NEXT(entry, next);
-		if (entry->is_indirect)
-			claim_zero(mlx5_devx_cmd_destroy(entry->mkey));
-		else
-			claim_zero(mlx5_glue->dereg_mr(entry->mr));
-		SLIST_REMOVE(&priv->mr_list, entry, mlx5_vdpa_query_mr, next);
-		rte_free(entry);
-		entry = next;
+	if (priv->mrs) {
+		for (i = priv->num_mrs - 1; i >= 0; i--) {
+			entry = &mrs[i];
+			if (entry->is_indirect) {
+				if (entry->mkey)
+					claim_zero(
+					mlx5_devx_cmd_destroy(entry->mkey));
+			} else {
+				if (entry->mr)
+					claim_zero(
+					mlx5_glue->dereg_mr(entry->mr));
+			}
+		}
+		rte_free(priv->mrs);
+		priv->mrs = NULL;
+		priv->num_mrs = 0;
 	}
-	SLIST_INIT(&priv->mr_list);
-	if (priv->vmem) {
-		free(priv->vmem);
-		priv->vmem = NULL;
+	if (priv->vmem_info.vmem) {
+		free(priv->vmem_info.vmem);
+		priv->vmem_info.vmem = NULL;
 	}
+	priv->gpa_mkey_index = 0;
 }
 
 static int
@@ -167,72 +175,29 @@ mlx5_vdpa_mem_cmp(struct rte_vhost_memory *mem1, struct rte_vhost_memory *mem2)
 
 #define KLM_SIZE_MAX_ALIGN(sz) ((sz) > MLX5_MAX_KLM_BYTE_COUNT ? \
 				MLX5_MAX_KLM_BYTE_COUNT : (sz))
 
-/*
- * The target here is to group all the physical memory regions of the
- * virtio device in one indirect mkey.
- * For KLM Fixed Buffer Size mode (HW find the translation entry in one
- * read according to the guest physical address):
- * All the sub-direct mkeys of it must be in the same size, hence, each
- * one of them should be in the GCD size of all the virtio memory
- * regions and the holes between them.
- * For KLM mode (each entry may be in different size so HW must iterate
- * the entries):
- * Each virtio memory region and each hole between them have one entry,
- * just need to cover the maximum allowed size(2G) by splitting entries
- * which their associated memory regions are bigger than 2G.
- * It means that each virtio memory region may be mapped to more than
- * one direct mkey in the 2 modes.
- * All the holes of invalid memory between the virtio memory regions
- * will be mapped to the null memory region for security.
- */
-int
-mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv)
+static int
+mlx5_vdpa_create_indirect_mkey(struct mlx5_vdpa_priv *priv)
 {
 	struct mlx5_devx_mkey_attr mkey_attr;
-	struct mlx5_vdpa_query_mr *entry = NULL;
-	struct rte_vhost_mem_region *reg = NULL;
-	uint8_t mode = 0;
-	uint32_t entries_num = 0;
-	uint32_t i;
-	uint64_t gcd = 0;
+	struct mlx5_vdpa_query_mr *mrs =
+		(struct mlx5_vdpa_query_mr *)priv->mrs;
+	struct mlx5_vdpa_query_mr *entry;
+	struct rte_vhost_mem_region *reg;
+	uint8_t mode = priv->vmem_info.mode;
+	uint32_t entries_num = priv->vmem_info.entries_num;
+	struct rte_vhost_memory *mem = priv->vmem_info.vmem;
+	struct mlx5_klm klm_array[entries_num];
+	uint64_t gcd = priv->vmem_info.gcd;
+	int ret = -rte_errno;
 	uint64_t klm_size;
-	uint64_t mem_size;
-	uint64_t k;
 	int klm_index = 0;
-	int ret;
-	struct rte_vhost_memory *mem = mlx5_vdpa_vhost_mem_regions_prepare
-			(priv->vid, &mode, &mem_size, &gcd, &entries_num);
-	struct mlx5_klm klm_array[entries_num];
+	uint64_t k;
+	uint32_t i;
 
-	if (!mem)
-		return -rte_errno;
-	if (priv->vmem != NULL) {
-		if (mlx5_vdpa_mem_cmp(mem, priv->vmem) == 0) {
-			/* VM memory not changed, reuse resources. */
-			free(mem);
-			return 0;
-		}
-		mlx5_vdpa_mem_dereg(priv);
-	}
-	priv->vmem = mem;
+	/* If it is the last entry, create indirect mkey. */
 	for (i = 0; i < mem->nregions; i++) {
+		entry = &mrs[i];
 		reg = &mem->regions[i];
-		entry = rte_zmalloc(__func__, sizeof(*entry), 0);
-		if (!entry) {
-			ret = -ENOMEM;
-			DRV_LOG(ERR, "Failed to allocate mem entry memory.");
-			goto error;
-		}
-		entry->mr = mlx5_glue->reg_mr_iova(priv->cdev->pd,
-				(void *)(uintptr_t)(reg->host_user_addr),
-				reg->size, reg->guest_phys_addr,
-				IBV_ACCESS_LOCAL_WRITE);
-		if (!entry->mr) {
-			DRV_LOG(ERR, "Failed to create direct Mkey.");
-			ret = -rte_errno;
-			goto error;
-		}
-		entry->is_indirect = 0;
 		if (i > 0) {
 			uint64_t sadd;
 			uint64_t empty_region_sz = reg->guest_phys_addr -
@@ -265,11 +230,10 @@ mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv)
 			klm_array[klm_index].address = reg->guest_phys_addr + k;
 			klm_index++;
 		}
-		SLIST_INSERT_HEAD(&priv->mr_list, entry, next);
 	}
 	memset(&mkey_attr, 0, sizeof(mkey_attr));
 	mkey_attr.addr = (uintptr_t)(mem->regions[0].guest_phys_addr);
-	mkey_attr.size = mem_size;
+	mkey_attr.size = priv->vmem_info.size;
 	mkey_attr.pd = priv->cdev->pdn;
 	mkey_attr.umem_id = 0;
 	/* Must be zero for KLM mode. */
@@ -278,25 +242,159 @@
 	mkey_attr.pg_access = 0;
 	mkey_attr.klm_array = klm_array;
 	mkey_attr.klm_num = klm_index;
-	entry = rte_zmalloc(__func__, sizeof(*entry), 0);
-	if (!entry) {
-		DRV_LOG(ERR, "Failed to allocate memory for indirect entry.");
-		ret = -ENOMEM;
-		goto error;
-	}
+	entry = &mrs[mem->nregions];
 	entry->mkey = mlx5_devx_cmd_mkey_create(priv->cdev->ctx, &mkey_attr);
 	if (!entry->mkey) {
 		DRV_LOG(ERR, "Failed to create indirect Mkey.");
-		ret = -rte_errno;
-		goto error;
+		rte_errno = -ret;
+		return ret;
 	}
 	entry->is_indirect = 1;
-	SLIST_INSERT_HEAD(&priv->mr_list, entry, next);
 	priv->gpa_mkey_index = entry->mkey->id;
 	return 0;
+}
+
+/*
+ * The target here is to group all the physical memory regions of the
+ * virtio device in one indirect mkey.
+ * For KLM Fixed Buffer Size mode (HW find the translation entry in one
+ * read according to the guest physical address):
+ * All the sub-direct mkeys of it must be in the same size, hence, each
+ * one of them should be in the GCD size of all the virtio memory
+ * regions and the holes between them.
+ * For KLM mode (each entry may be in different size so HW must iterate
+ * the entries):
+ * Each virtio memory region and each hole between them have one entry,
+ * just need to cover the maximum allowed size(2G) by splitting entries
+ * which their associated memory regions are bigger than 2G.
+ * It means that each virtio memory region may be mapped to more than
+ * one direct mkey in the 2 modes.
+ * All the holes of invalid memory between the virtio memory regions
+ * will be mapped to the null memory region for security.
+ */
+int
+mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv)
+{
+	void *mrs;
+	uint8_t mode = 0;
+	int ret = -rte_errno;
+	uint32_t i, thrd_idx, data[1];
+	uint32_t remaining_cnt = 0, err_cnt = 0, task_num = 0;
+	struct rte_vhost_memory *mem = mlx5_vdpa_vhost_mem_regions_prepare
+		(priv->vid, &mode, &priv->vmem_info.size,
+		&priv->vmem_info.gcd, &priv->vmem_info.entries_num);
+
+	if (!mem)
+		return -rte_errno;
+	if (priv->vmem_info.vmem != NULL) {
+		if (mlx5_vdpa_mem_cmp(mem, priv->vmem_info.vmem) == 0) {
+			/* VM memory not changed, reuse resources. */
+			free(mem);
+			return 0;
+		}
+		mlx5_vdpa_mem_dereg(priv);
+	}
+	priv->vmem_info.vmem = mem;
+	priv->vmem_info.mode = mode;
+	priv->num_mrs = mem->nregions;
+	if (!priv->num_mrs || priv->num_mrs >= MLX5_VDPA_MAX_MRS) {
+		DRV_LOG(ERR,
+		"Invalid number of memory regions.");
+		goto error;
+	}
+	/* The last one is indirect mkey entry. */
+	priv->num_mrs++;
+	mrs = rte_zmalloc("mlx5 vDPA memory regions",
+		sizeof(struct mlx5_vdpa_query_mr) * priv->num_mrs, 0);
+	priv->mrs = mrs;
+	if (!priv->mrs) {
+		DRV_LOG(ERR, "Failed to allocate private memory regions.");
+		goto error;
+	}
+	if (priv->use_c_thread) {
+		uint32_t main_task_idx[mem->nregions];
+
+		for (i = 0; i < mem->nregions; i++) {
+			thrd_idx = i % (conf_thread_mng.max_thrds + 1);
+			if (!thrd_idx) {
+				main_task_idx[task_num] = i;
+				task_num++;
+				continue;
+			}
+			thrd_idx = priv->last_c_thrd_idx + 1;
+			if (thrd_idx >= conf_thread_mng.max_thrds)
+				thrd_idx = 0;
+			priv->last_c_thrd_idx = thrd_idx;
+			data[0] = i;
+			if (mlx5_vdpa_task_add(priv, thrd_idx,
+				MLX5_VDPA_TASK_REG_MR,
+				&remaining_cnt, &err_cnt,
+				(void **)&data, 1)) {
+				DRV_LOG(ERR,
+				"Failed to add task mem region (%d)", i);
+				main_task_idx[task_num] = i;
+				task_num++;
+			}
+		}
+		for (i = 0; i < task_num; i++) {
+			ret = mlx5_vdpa_register_mr(priv,
+					main_task_idx[i]);
+			if (ret) {
+				DRV_LOG(ERR,
+				"Failed to register mem region %d.", i);
+				goto error;
+			}
+		}
+		if (mlx5_vdpa_c_thread_wait_bulk_tasks_done(&remaining_cnt,
+			&err_cnt, 100)) {
+			DRV_LOG(ERR,
+			"Failed to wait register mem region tasks ready.");
+			goto error;
+		}
+	} else {
+		for (i = 0; i < mem->nregions; i++) {
+			ret = mlx5_vdpa_register_mr(priv, i);
+			if (ret) {
+				DRV_LOG(ERR,
+				"Failed to register mem region %d.", i);
+				goto error;
+			}
+		}
+	}
+	ret = mlx5_vdpa_create_indirect_mkey(priv);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to create indirect mkey.");
+		goto error;
+	}
+	return 0;
 error:
-	rte_free(entry);
 	mlx5_vdpa_mem_dereg(priv);
 	rte_errno = -ret;
 	return ret;
 }
+
+int
+mlx5_vdpa_register_mr(struct mlx5_vdpa_priv *priv, uint32_t idx)
+{
+	struct rte_vhost_memory *mem = priv->vmem_info.vmem;
+	struct mlx5_vdpa_query_mr *mrs =
+		(struct mlx5_vdpa_query_mr *)priv->mrs;
+	struct mlx5_vdpa_query_mr *entry;
+	struct rte_vhost_mem_region *reg;
+	int ret;
+
+	reg = &mem->regions[idx];
+	entry = &mrs[idx];
+	entry->mr = mlx5_glue->reg_mr_iova
+		(priv->cdev->pd,
+		(void *)(uintptr_t)(reg->host_user_addr),
+		reg->size, reg->guest_phys_addr,
+		IBV_ACCESS_LOCAL_WRITE);
+	if (!entry->mr) {
+		DRV_LOG(ERR, "Failed to create direct Mkey.");
+		ret = -rte_errno;
+		return ret;
+	}
+	entry->is_indirect = 0;
+	return 0;
+}
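Note: a worked example of the KLM comment block above, with invented
sizes. Say the VM exposes two regions with one hole between them:
region 0 of 1 GiB at GPA 0, a 2 GiB hole, then region 1 of 4 GiB.

 * KLM FBS mode: every klm_array entry must have the same size, the GCD
 * of all region and hole sizes: GCD(1 GiB, 2 GiB, 4 GiB) = 1 GiB. The
 * indirect mkey therefore takes 1 + 2 + 4 = 7 entries, the two hole
 * entries pointing at the null MR.
 * KLM mode: entries may differ in size but are clipped to the 2 GiB
 * maximum by KLM_SIZE_MAX_ALIGN(), so the same layout takes
 * 1 (region 0) + 1 (hole) + 2 (region 1, split) = 4 entries.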
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 599809b09b..0b317655db 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -353,21 +353,21 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 		}
 	}
 	if (attr->q_type == MLX5_VIRTQ_TYPE_SPLIT) {
-		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem,
+		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem_info.vmem,
 			(uint64_t)(uintptr_t)vq->desc);
 		if (!gpa) {
 			DRV_LOG(ERR, "Failed to get descriptor ring GPA.");
 			return -1;
 		}
 		attr->desc_addr = gpa;
-		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem,
+		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem_info.vmem,
 			(uint64_t)(uintptr_t)vq->used);
 		if (!gpa) {
 			DRV_LOG(ERR, "Failed to get GPA for used ring.");
 			return -1;
 		}
 		attr->used_addr = gpa;
-		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem,
+		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem_info.vmem,
 			(uint64_t)(uintptr_t)vq->avail);
 		if (!gpa) {
 			DRV_LOG(ERR, "Failed to get GPA for available ring.");
-- 
2.31.1