From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <167198.611281009-sendEmail@dhcp-10-21-177-184>
From: "eagostini@nvidia.com"
To: "dev@dpdk.org"
Date: Tue, 5 Oct 2021 14:45:32 +0000
Subject: [dpdk-dev] [RFC PATCH] gpu/cuda
List-Id: DPDK patches and discussions
X-Mailer: sendEmail-1.56
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Sender: "dev" <dev-bounces@dpdk.org>

>From 8b603e6a9aa7bf1c5e89bad49f0f4a1b902dd299 Mon Sep 17 00:00:00 2001
From: Elena Agostini
Date: Wed, 15 Sep 2021 17:04:53 +0200
Subject: [PATCH] gpu/cuda: introduce CUDA driver

This is the CUDA implementation of the gpudev library.
Functionalities implemented through the CUDA Driver API are:
- Device probe and remove
- Manage device memory allocations
- Register/unregister external CPU memory in the device memory area

Signed-off-by: Elena Agostini
---
 drivers/gpu/cuda/cuda.c      | 751 +++++++++++++++++++++++++++++++++++
 drivers/gpu/cuda/meson.build |  30 ++
 drivers/gpu/cuda/version.map |   3 +
 drivers/gpu/meson.build      |   2 +-
 4 files changed, 785 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/cuda/cuda.c
 create mode 100644 drivers/gpu/cuda/meson.build
 create mode 100644 drivers/gpu/cuda/version.map

diff --git a/drivers/gpu/cuda/cuda.c b/drivers/gpu/cuda/cuda.c
new file mode 100644
index 0000000000..202f0a0c0c
--- /dev/null
+++ b/drivers/gpu/cuda/cuda.c
@@ -0,0 +1,751 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2021 NVIDIA Corporation & Affiliates
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include
+#include
+
+/* NVIDIA GPU vendor */
+#define NVIDIA_GPU_VENDOR_ID (0x10de)
+
+/* NVIDIA GPU device IDs */
+#define NVIDIA_GPU_A100_40GB_DEVICE_ID (0x20f1)
+#define NVIDIA_GPU_V100_32GB_DEVICE_ID (0x1db6)
+
+#define CUDA_MAX_ALLOCATION_NUM 512
+
+#define GPU_PAGE_SHIFT 16
+#define GPU_PAGE_SIZE (1UL << GPU_PAGE_SHIFT)
+
+RTE_LOG_REGISTER_DEFAULT(gpu_logtype, NOTICE);
+
+/** Helper macro for logging */
+#define rte_gpu_log(level, fmt, ...) \
+	rte_log(RTE_LOG_ ## level, gpu_logtype, fmt "\n", ##__VA_ARGS__)
+
+#define rte_gpu_log_debug(fmt, ...) \
+	rte_gpu_log(DEBUG, RTE_STR(__LINE__) ":%s() " fmt, __func__, \
+		##__VA_ARGS__)
+
+/* NVIDIA GPU address map */
+static struct rte_pci_id pci_id_cuda_map[] = {
+	{
+		RTE_PCI_DEVICE(NVIDIA_GPU_VENDOR_ID,
+				NVIDIA_GPU_A100_40GB_DEVICE_ID)
+	},
+	{
+		RTE_PCI_DEVICE(NVIDIA_GPU_VENDOR_ID,
+				NVIDIA_GPU_V100_32GB_DEVICE_ID)
+	},
+	/* {.device_id = 0}, ?? */
+};
+
+/* Device private info */
+struct cuda_info {
+	char gpu_name[RTE_DEV_NAME_MAX_LEN];
+	CUdevice cu_dev;
+};
+
+/* Type of memory allocated by CUDA driver */
+enum mem_type {
+	GPU_MEM = 0,
+	CPU_REGISTERED,
+	GPU_REGISTERED /* Not used yet */
+};
+
+/* Key associated to a memory address */
+typedef uintptr_t ptr_key;
+
+/* Single entry of the memory list */
+struct mem_entry {
+	CUdeviceptr ptr_d;
+	void *ptr_h;
+	size_t size;
+	struct rte_gpu *dev;
+	CUcontext ctx;
+	ptr_key pkey;
+	enum mem_type mtype;
+	struct mem_entry *prev;
+	struct mem_entry *next;
+};
+
+struct mem_entry *mem_alloc_list_head = NULL;
+struct mem_entry *mem_alloc_list_tail = NULL;
+uint32_t mem_alloc_list_last_elem = 0;
+
+/* Generate a key from a memory pointer */
+static ptr_key
+get_hash_from_ptr(void *ptr)
+{
+	return (uintptr_t) ptr;
+}
+
+static uint32_t
+mem_list_count_item(void)
+{
+	return mem_alloc_list_last_elem;
+}
+
+/* Initiate list of memory allocations if not done yet */
+static struct mem_entry *
+mem_list_add_item(void)
+{
+	/* Initiate list of memory allocations if not done yet */
+	if(mem_alloc_list_head == NULL)
+	{
+		mem_alloc_list_head = rte_zmalloc(NULL, sizeof(struct mem_entry), RTE_CACHE_LINE_SIZE);
+		if (mem_alloc_list_head == NULL) {
+			rte_gpu_log(ERR, "Failed to allocate memory for memory list.\n");
+			return NULL;
+		}
+
+		mem_alloc_list_head->next = NULL;
+		mem_alloc_list_head->prev = NULL;
+		mem_alloc_list_tail = mem_alloc_list_head;
+	}
+	else
+	{
+		struct mem_entry *mem_alloc_list_cur = rte_zmalloc(NULL, sizeof(struct mem_entry), RTE_CACHE_LINE_SIZE);
+		if (mem_alloc_list_cur == NULL) {
+			rte_gpu_log(ERR, "Failed to allocate memory for memory list.\n");
+			return NULL;
+		}
+
+		mem_alloc_list_tail->next = mem_alloc_list_cur;
+		mem_alloc_list_cur->prev = mem_alloc_list_tail;
+		mem_alloc_list_tail = mem_alloc_list_tail->next;
+		mem_alloc_list_tail->next = NULL;
+	}
+
+	mem_alloc_list_last_elem++;
+
+	return mem_alloc_list_tail;
+}
+
+static struct mem_entry *
+mem_list_find_item(ptr_key pk)
+{
+	struct mem_entry *mem_alloc_list_cur = NULL;
+
+	if(mem_alloc_list_head == NULL)
+	{
+		rte_gpu_log(ERR, "Memory list doesn't exist\n");
+		return NULL;
+	}
+
+	if(mem_list_count_item() == 0)
+	{
+		rte_gpu_log(ERR, "No items in memory list\n");
+		return NULL;
+	}
+
+	mem_alloc_list_cur = mem_alloc_list_head;
+
+	while(mem_alloc_list_cur != NULL)
+	{
+		if(mem_alloc_list_cur->pkey == pk)
+			return mem_alloc_list_cur;
+		mem_alloc_list_cur = mem_alloc_list_cur->next;
+	}
+
+	return mem_alloc_list_cur;
+}
+
+static int
+mem_list_del_item(ptr_key pk)
+{
+	struct mem_entry *mem_alloc_list_cur = NULL;
+
+	mem_alloc_list_cur = mem_list_find_item(pk);
+	if(mem_alloc_list_cur == NULL)
+		return -EINVAL;
+
+	/* if key is in head */
+	if(mem_alloc_list_cur->prev == NULL)
+		mem_alloc_list_head = mem_alloc_list_cur->next;
+	else
+	{
+		mem_alloc_list_cur->prev->next = mem_alloc_list_cur->next;
+		if(mem_alloc_list_cur->next != NULL)
+			mem_alloc_list_cur->next->prev = mem_alloc_list_cur->prev;
+	}
+
+	rte_free(mem_alloc_list_cur);
+
+	mem_alloc_list_last_elem--;
+
+	return 0;
+}
+
+static int
+cuda_dev_info_get(struct rte_gpu *dev, struct rte_gpu_info *info)
+{
+	int ret = 0;
+	CUresult res;
+	struct rte_gpu_info parent_info;
+	CUexecAffinityParam affinityPrm;
+	const char *err_string;
+	struct cuda_info *private;
+	CUcontext current_ctx;
+	CUcontext input_ctx;
+
+	if(dev == NULL)
+		return -EINVAL;
+
+	/* Child initialization time probably called by rte_gpu_add_child() */
+	if(
+		dev->mpshared->info.parent != RTE_GPU_ID_NONE &&
+		dev->mpshared->dev_private == NULL
+	)
+	{
+		/* Store current ctx */
+		res = cuCtxGetCurrent(&current_ctx);
+		if(CUDA_SUCCESS != res)
+		{
+			cuGetErrorString(res, &(err_string));
+			rte_gpu_log(ERR, "cuCtxGetCurrent failed with %s.\n", err_string);
+
+			return -1;
+		}
+
+		/* Set child ctx as current ctx */
+		input_ctx = (CUcontext)dev->mpshared->info.context;
+		res = cuCtxSetCurrent(input_ctx);
+		if(CUDA_SUCCESS != res)
+		{
+			cuGetErrorString(res, &(err_string));
+			rte_gpu_log(ERR, "cuda_dev_info_get cuCtxSetCurrent input failed with %s.\n", err_string);
+
+			return -1;
+		}
+
+		/*
+		 * Ctx capacity info
+		 */
+
+		/* MPS compatible */
+		res = cuCtxGetExecAffinity(&affinityPrm, CU_EXEC_AFFINITY_TYPE_SM_COUNT);
+		if(CUDA_SUCCESS != res)
+		{
+			cuGetErrorString(res, &(err_string));
+			rte_gpu_log(ERR, "cuCtxGetExecAffinity failed with %s.\n", err_string);
+		}
+		dev->mpshared->info.processor_count = (uint32_t)affinityPrm.param.smCount.val;
+
+		ret = rte_gpu_info_get(dev->mpshared->info.parent, &parent_info);
+		if (ret)
+			return -ENODEV;
+		dev->mpshared->info.total_memory = parent_info.total_memory;
+
+		/*
+		 * GPU Device private info
+		 */
+		dev->mpshared->dev_private = rte_zmalloc(NULL, sizeof(struct cuda_info), RTE_CACHE_LINE_SIZE);
+		if (dev->mpshared->dev_private == NULL) {
+			rte_gpu_log(ERR, "Failed to allocate memory for GPU process private.\n");
+
+			return -1;
+		}
+
+		private = (struct cuda_info *)dev->mpshared->dev_private;
+
+		res = cuCtxGetDevice(&(private->cu_dev));
+		if(CUDA_SUCCESS != res)
+		{
+			cuGetErrorString(res, &(err_string));
+			rte_gpu_log(ERR, "cuCtxGetDevice failed with %s.\n", err_string);
+
+			return -1;
+		}
+
+		res = cuDeviceGetName(private->gpu_name, RTE_DEV_NAME_MAX_LEN, private->cu_dev);
+		if(CUDA_SUCCESS != res)
+		{
+			cuGetErrorString(res, &(err_string));
+			rte_gpu_log(ERR, "cuDeviceGetName failed with %s.\n", err_string);
+
+			return -1;
+		}
+
+		/* Restore original ctx as current ctx */
+		res = cuCtxSetCurrent(current_ctx);
+		if(CUDA_SUCCESS != res)
+		{
+			cuGetErrorString(res, &(err_string));
+			rte_gpu_log(ERR, "cuda_dev_info_get cuCtxSetCurrent current failed with %s.\n", err_string);
+
+			return -1;
+		}
+	}
+
+	*info = dev->mpshared->info;
+
+	return 0;
+}
+
+/*
+ * GPU Memory
+ */
+
+static int
+cuda_mem_alloc(struct rte_gpu *dev, size_t size, void **ptr)
+{
+	CUresult res;
+	const char *err_string;
+	CUcontext current_ctx;
+	CUcontext input_ctx;
+	unsigned int flag = 1;
+
+	if(dev == NULL || size == 0)
+		return -EINVAL;
+
+	/* Store current ctx */
+	res = cuCtxGetCurrent(&current_ctx);
+	if(CUDA_SUCCESS != res)
+	{
+		cuGetErrorString(res, &(err_string));
+		rte_gpu_log(ERR, "cuCtxGetCurrent failed with %s.\n", err_string);
+
+		return -1;
+	}
+
+	/* Set child ctx as current ctx */
+	input_ctx = (CUcontext)dev->mpshared->info.context;
+	res = cuCtxSetCurrent(input_ctx);
+	if(CUDA_SUCCESS != res)
+	{
+		cuGetErrorString(res, &(err_string));
+		rte_gpu_log(ERR, "cuda_mem_alloc cuCtxSetCurrent input failed with %s.\n", err_string);
+
+		return -1;
+	}
+
+	/* Get next memory list item */
+	mem_alloc_list_tail = mem_list_add_item();
+	if(mem_alloc_list_tail == NULL)
+		return -ENOMEM;
+
+	/* Allocate memory */
+	mem_alloc_list_tail->size = size;
+	res = cuMemAlloc(&(mem_alloc_list_tail->ptr_d), mem_alloc_list_tail->size);
+	if (CUDA_SUCCESS != res) {
+		cuGetErrorString(res, &(err_string));
+		rte_gpu_log(ERR, "cuda_mem_alloc cuMemAlloc failed with %s.\n", err_string);
+
+		return -1;
+	}
+
+	/* GPUDirect RDMA attribute required */
+	res = cuPointerSetAttribute(&flag, CU_POINTER_ATTRIBUTE_SYNC_MEMOPS, mem_alloc_list_tail->ptr_d);
+	if (CUDA_SUCCESS != res) {
+		rte_gpu_log(ERR, "Could not set SYNC MEMOP attribute for GPU memory at %llx , err %d\n", mem_alloc_list_tail->ptr_d, res);
+		return -1;
+	}
+
+	mem_alloc_list_tail->pkey = get_hash_from_ptr((void *) mem_alloc_list_tail->ptr_d);
+	mem_alloc_list_tail->ptr_h = NULL;
+	mem_alloc_list_tail->size = size;
+	mem_alloc_list_tail->dev = dev;
+	mem_alloc_list_tail->ctx = (CUcontext)dev->mpshared->info.context;
+	mem_alloc_list_tail->mtype = GPU_MEM;
+
+	/* Restore original ctx as current ctx */
+	res = cuCtxSetCurrent(current_ctx);
+	if(CUDA_SUCCESS != res)
+	{
+		cuGetErrorString(res, &(err_string));
+		rte_gpu_log(ERR, "cuda_mem_alloc cuCtxSetCurrent current failed with %s.\n", err_string);
+
+		return -1;
+	}
+
+	*ptr = (void *) mem_alloc_list_tail->ptr_d;
+
+	return 0;
+}
+
+static int
+cuda_mem_register(struct rte_gpu *dev, size_t size, void *ptr)
+{
+	CUresult res;
+	const char *err_string;
+	CUcontext current_ctx;
+	CUcontext input_ctx;
+	unsigned int flag = 1;
+	int use_ptr_h = 0;
+
+	if(dev == NULL || size == 0 || ptr == NULL)
+		return -EINVAL;
+
+	/* Store current ctx */
+	res = cuCtxGetCurrent(&current_ctx);
+	if(CUDA_SUCCESS != res)
+	{
+		cuGetErrorString(res, &(err_string));
+		rte_gpu_log(ERR, "cuCtxGetCurrent failed with %s.\n", err_string);
+
+		return -1;
+	}
+
+	/* Set child ctx as current ctx */
+	input_ctx = (CUcontext)dev->mpshared->info.context;
+	res = cuCtxSetCurrent(input_ctx);
+	if(CUDA_SUCCESS != res)
+	{
+		cuGetErrorString(res, &(err_string));
+		rte_gpu_log(ERR, "cuda_mem_register cuCtxSetCurrent input failed with %s.\n", err_string);
+
+		return -1;
+	}
+
+	/* Get next memory list item */
+	mem_alloc_list_tail = mem_list_add_item();
+	if(mem_alloc_list_tail == NULL)
+		return -ENOMEM;
+
+	/* Register CPU memory */
+	mem_alloc_list_tail->size = size;
+	mem_alloc_list_tail->ptr_h = ptr;
+
+	res = cuMemHostRegister(mem_alloc_list_tail->ptr_h, mem_alloc_list_tail->size, CU_MEMHOSTREGISTER_PORTABLE | CU_MEMHOSTREGISTER_DEVICEMAP);
+	if (CUDA_SUCCESS != res) {
+		cuGetErrorString(res, &(err_string));
+		rte_gpu_log(ERR, "cuda_mem_register cuMemHostRegister failed with %s ptr %p size %zd.\n",
+				err_string, mem_alloc_list_tail->ptr_h, mem_alloc_list_tail->size);
+
+		return -1;
+	}
+
+	res = cuDeviceGetAttribute(&(use_ptr_h),
+			CU_DEVICE_ATTRIBUTE_CAN_USE_HOST_POINTER_FOR_REGISTERED_MEM,
+			((struct cuda_info *)(dev->mpshared->dev_private))->cu_dev);
+	if(CUDA_SUCCESS != res)
+	{
+		cuGetErrorString(res, &(err_string));
+		rte_gpu_log(ERR, "cuDeviceGetAttribute failed with %s.\n", err_string);
+
+		return -1;
+	}
+
+	if(use_ptr_h == 0)
+	{
+		res = cuMemHostGetDevicePointer(&(mem_alloc_list_tail->ptr_d), mem_alloc_list_tail->ptr_h, 0);
+		if (CUDA_SUCCESS != res) {
+			cuGetErrorString(res, &(err_string));
+			rte_gpu_log(ERR, "cuMemHostGetDevicePointer failed with %s.\n", err_string);
+
+			return -1;
+		}
+
+		if((uintptr_t) mem_alloc_list_tail->ptr_d != (uintptr_t) mem_alloc_list_tail->ptr_h)
+		{
+			rte_gpu_log(ERR, "Host input pointer differs from the GPU registered pointer\n");
+			return -1;
+		}
+	}
+	else
+		mem_alloc_list_tail->ptr_d = (CUdeviceptr) mem_alloc_list_tail->ptr_h;
+
+	/* GPUDirect RDMA attribute required */
+	res = cuPointerSetAttribute(&flag, CU_POINTER_ATTRIBUTE_SYNC_MEMOPS, mem_alloc_list_tail->ptr_d);
+	if (CUDA_SUCCESS != res) {
+		rte_gpu_log(ERR, "Could not set SYNC MEMOP attribute for GPU memory at %llx , err %d\n", mem_alloc_list_tail->ptr_d, res);
+		return -1;
+	}
+
+	mem_alloc_list_tail->pkey = get_hash_from_ptr((void *) mem_alloc_list_tail->ptr_h);
+	mem_alloc_list_tail->size = size;
+	mem_alloc_list_tail->dev = dev;
+	mem_alloc_list_tail->ctx = (CUcontext)dev->mpshared->info.context;
+	mem_alloc_list_tail->mtype = CPU_REGISTERED;
+
+	/* Restore original ctx as current ctx */
+	res = cuCtxSetCurrent(current_ctx);
+	if(CUDA_SUCCESS != res)
+	{
+		cuGetErrorString(res, &(err_string));
+		rte_gpu_log(ERR, "cuda_mem_register cuCtxSetCurrent current failed with %s.\n", err_string);
+
+		return -1;
+	}
+
+	return 0;
+}
+
+static int
+cuda_mem_free(struct rte_gpu *dev, void *ptr)
+{
+	CUresult res;
+	struct mem_entry *mem_item;
+	const char *err_string;
+	ptr_key hk;
+
+	if(dev == NULL || ptr == NULL)
+		return -EINVAL;
+
+	hk = get_hash_from_ptr((void *) ptr);
+
+	mem_item = mem_list_find_item(hk);
+	if(mem_item == NULL)
+	{
+		rte_gpu_log(ERR, "Memory address 0x%p not found in driver memory\n", ptr);
+		return -1;
+	}
+
+	if(mem_item->mtype == GPU_MEM)
+	{
+		res = cuMemFree(mem_item->ptr_d);
+		if(CUDA_SUCCESS != res)
+		{
+			cuGetErrorString(res, &(err_string));
+			rte_gpu_log(ERR, "cuMemFree current failed with %s.\n", err_string);
+
+			return -1;
+		}
+
+		return mem_list_del_item(hk);
+	}
+	else
+	{
+		rte_gpu_log(ERR, "Memory type %d not supported\n", mem_item->mtype);
+		return -1;
+	}
+
+	return 0;
+}
+
+static int
+cuda_mem_unregister(struct rte_gpu *dev, void *ptr)
+{
+	CUresult res;
+	struct mem_entry *mem_item;
+	const char *err_string;
+	ptr_key hk;
+
+	if(dev == NULL || ptr == NULL)
+		return -EINVAL;
+
+	hk = get_hash_from_ptr((void *) ptr);
+
+	mem_item = mem_list_find_item(hk);
+	if(mem_item == NULL)
+	{
+		rte_gpu_log(ERR, "Memory address 0x%p not found in driver memory\n", ptr);
+		return -1;
+	}
+
+	if(mem_item->mtype == CPU_REGISTERED)
+	{
+		res = cuMemHostUnregister(ptr);
+		if(CUDA_SUCCESS != res)
+		{
+			cuGetErrorString(res, &(err_string));
+			rte_gpu_log(ERR, "cuMemHostUnregister current failed with %s.\n", err_string);
+
+			return -1;
+		}
+
+		return mem_list_del_item(hk);
+	}
+	else
+	{
+		rte_gpu_log(ERR, "Memory type %d not supported\n", mem_item->mtype);
+		return -1;
+	}
+
+	return 0;
+}
+
+static int
+cuda_dev_close(struct rte_gpu *dev)
+{
+	if (dev == NULL)
+		return -EINVAL;
+
+	rte_free(dev->mpshared->dev_private);
+
+	return 0;
+}
+
+static int
+cuda_gpu_probe(__rte_unused struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
+{
+	struct rte_gpu *dev = NULL;
+	CUresult res;
+	CUdevice cu_dev_id;
+	CUcontext pctx;
+	char dev_name[RTE_DEV_NAME_MAX_LEN];
+	const char *err_string;
+	int processor_count = 0;
+	struct cuda_info *private;
+
+	if (pci_dev == NULL) {
+		rte_gpu_log(ERR, "NULL PCI device");
+		return -EINVAL;
+	}
+
+	rte_pci_device_name(&pci_dev->addr, dev_name, sizeof(dev_name));
+
+	/* Allocate memory to be used privately by drivers */
+	dev = rte_gpu_allocate(pci_dev->device.name);
+	if (dev == NULL)
+		return -ENODEV;
+
+	/* Fill HW specific part of device structure */
+	dev->device = &pci_dev->device;
+	dev->mpshared->info.numa_node = pci_dev->device.numa_node;
+
+	/*
+	 * GPU Device init
+	 */
+
+	/*
+	 * Required to initialize the CUDA Driver.
+	 * Multiple calls of cuInit() will return immediately
+	 * without making any relevant change
+	 */
+	cuInit(0);
+
+	/* Get NVIDIA GPU Device descriptor */
+	res = cuDeviceGetByPCIBusId(&cu_dev_id, dev->device->name);
+	if(CUDA_SUCCESS != res)
+	{
+		cuGetErrorString(res, &(err_string));
+		rte_gpu_log(ERR, "cuDeviceGetByPCIBusId name %s failed with %d: %s.\n",
+				dev->device->name, res, err_string);
+
+		return -1;
+	}
+
+	res = cuDevicePrimaryCtxRetain(&pctx, cu_dev_id);
+	if(CUDA_SUCCESS != res)
+	{
+		cuGetErrorString(res, &(err_string));
+		rte_gpu_log(ERR, "cuDevicePrimaryCtxRetain name %s failed with %d: %s.\n",
+				dev->device->name, res, err_string);
+
+		return -1;
+	}
+
+	dev->mpshared->info.context = (uint64_t) pctx;
+
+	/*
+	 * GPU Device generic info
+	 */
+
+	/* Processor count */
+	res = cuDeviceGetAttribute(&(processor_count), CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT, cu_dev_id);
+	if(CUDA_SUCCESS != res)
+	{
+		cuGetErrorString(res, &(err_string));
+		rte_gpu_log(ERR, "cuDeviceGetAttribute failed with %s.\n", err_string);
+
+		return -1;
+	}
+	dev->mpshared->info.processor_count = (uint32_t)processor_count;
+
+	/* Total memory */
+	res = cuDeviceTotalMem(&dev->mpshared->info.total_memory, cu_dev_id);
+	if(CUDA_SUCCESS != res)
+	{
+		cuGetErrorString(res, &(err_string));
+		rte_gpu_log(ERR, "cuDeviceTotalMem failed with %s.\n", err_string);
+
+		return -1;
+	}
+
+	/*
+	 * GPU Device private info
+	 */
+	dev->mpshared->dev_private = rte_zmalloc(NULL, sizeof(struct cuda_info), RTE_CACHE_LINE_SIZE);
+	if (dev->mpshared->dev_private == NULL) {
+		rte_gpu_log(ERR, "Failed to allocate memory for GPU process private.\n");
+
+		return -1;
+	}
+
+	private = (struct cuda_info *)dev->mpshared->dev_private;
+	private->cu_dev = cu_dev_id;
+	res = cuDeviceGetName(private->gpu_name, RTE_DEV_NAME_MAX_LEN, cu_dev_id);
+	if(CUDA_SUCCESS != res)
+	{
+		cuGetErrorString(res, &(err_string));
+		rte_gpu_log(ERR, "cuDeviceGetName failed with %s.\n", err_string);
+
+		return -1;
+	}
+
+	dev->ops.mem_alloc = cuda_mem_alloc;
+	dev->ops.mem_free = cuda_mem_free;
+	dev->ops.mem_register = cuda_mem_register;
+	dev->ops.mem_unregister = cuda_mem_unregister;
+	dev->ops.dev_info_get = cuda_dev_info_get;
+	dev->ops.dev_close = cuda_dev_close;
+
+	rte_gpu_complete_new(dev);
+
+	rte_gpu_log_debug("dev id = %u name = %s\n", dev->mpshared->info.dev_id, private->gpu_name);
+
+	return 0;
+}
+
+static int
+cuda_gpu_remove(struct rte_pci_device *pci_dev)
+{
+	struct rte_gpu *dev;
+	int ret;
+	uint8_t gpu_id;
+
+	if (pci_dev == NULL)
+		return -EINVAL;
+
+	dev = rte_gpu_get_by_name(pci_dev->device.name);
+	if (dev == NULL) {
+		rte_gpu_log(ERR,
+			"Couldn't find HW dev \"%s\" to uninitialise it",
+			pci_dev->device.name);
+		return -ENODEV;
+	}
+	gpu_id = dev->mpshared->info.dev_id;
+
+	/* release dev from library */
+	ret = rte_gpu_release(dev);
+	if (ret)
+		rte_gpu_log(ERR, "Device %i failed to uninit: %i", gpu_id, ret);
+
+	rte_gpu_log_debug("Destroyed dev = %u", gpu_id);
+
+	return 0;
+}
+
+static struct rte_pci_driver rte_cuda_driver = {
+	.id_table = pci_id_cuda_map,
+	.drv_flags = RTE_PCI_DRV_WC_ACTIVATE,
+	.probe = cuda_gpu_probe,
+	.remove = cuda_gpu_remove,
+};
+
+RTE_PMD_REGISTER_PCI(gpu_cuda, rte_cuda_driver);
+RTE_PMD_REGISTER_PCI_TABLE(gpu_cuda, pci_id_cuda_map);
+RTE_PMD_REGISTER_KMOD_DEP(gpu_cuda, "* nvidia & (nv_peer_mem | nvpeer_mem)");
diff --git a/drivers/gpu/cuda/meson.build b/drivers/gpu/cuda/meson.build
new file mode 100644
index 0000000000..53e40e6832
--- /dev/null
+++ b/drivers/gpu/cuda/meson.build
@@ -0,0 +1,30 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright (c) 2021 NVIDIA Corporation & Affiliates
+
+if not is_linux
+    build = false
+    reason = 'only supported on Linux'
+endif
+
+# cuda_dep = dependency('cuda-11.1', required: true, version : '>=11.1', method: 'pkg-config')
+# if not cuda_dep.found()
+#     build = false
+#     reason = 'missing dependency, "CUDA"'
+#     subdir_done()
+# endif
+# ext_deps += cuda_dep
+
+cuda_dep = dependency('cuda', version : '>=11', modules: ['cuda'])
+ext_deps += cuda_dep
+
+# cudart_dep = dependency('cudart-11.1', required: true, version : '>=11.1', method: 'pkg-config')
+# if not cudart_dep.found()
+#     build = false
+#     reason = 'missing dependency, "CUDA RT"'
+#     subdir_done()
+# endif
+# ext_deps += cudart_dep
+
+deps += ['gpudev', 'pci', 'bus_pci', 'hash']
+sources = files('cuda.c')
+# headers = files('header.h')
diff --git a/drivers/gpu/cuda/version.map b/drivers/gpu/cuda/version.map
new file mode 100644
index 0000000000..4a76d1d52d
--- /dev/null
+++ b/drivers/gpu/cuda/version.map
@@ -0,0 +1,3 @@
+DPDK_21 {
+	local: *;
+};
diff --git a/drivers/gpu/meson.build b/drivers/gpu/meson.build
index e51ad3381b..601bedcd61 100644
--- a/drivers/gpu/meson.build
+++ b/drivers/gpu/meson.build
@@ -1,4 +1,4 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright (c) 2021 NVIDIA Corporation & Affiliates
 
-drivers = []
+drivers = [ 'cuda' ]
-- 
2.21.1 (Apple Git-122.3)
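For reviewers: the driver tracks every allocation and registration in a pointer-keyed doubly-linked list (`mem_list_add_item()` / `mem_list_find_item()` / `mem_list_del_item()` above). A minimal self-contained sketch of that bookkeeping in plain C follows, with `calloc` standing in for `rte_zmalloc` and hypothetical names (`entry`, `list_add`, `list_find`, `list_del`) that are not part of the patch. Note the sketch also updates the tail pointer on removal, which `mem_list_del_item()` in the RFC does not do when the deleted node is the tail.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Mirrors struct mem_entry: one node per tracked allocation, keyed by
 * the pointer value, exactly as get_hash_from_ptr() does. */
struct entry {
	uintptr_t pkey;
	size_t size;
	struct entry *prev;
	struct entry *next;
};

static struct entry *head, *tail;
static uint32_t count;

/* Append a zeroed node at the tail, like mem_list_add_item(). */
static struct entry *
list_add(void)
{
	struct entry *e = calloc(1, sizeof(*e));
	if (e == NULL)
		return NULL;
	if (head == NULL) {
		head = tail = e;
	} else {
		tail->next = e;
		e->prev = tail;
		tail = e;
	}
	count++;
	return e;
}

/* Linear scan by key, like mem_list_find_item(). */
static struct entry *
list_find(uintptr_t pkey)
{
	struct entry *cur;

	for (cur = head; cur != NULL; cur = cur->next)
		if (cur->pkey == pkey)
			return cur;
	return NULL;
}

/* Unlink and free one node, like mem_list_del_item(), but also
 * fixing up the tail pointer when the last node is removed. */
static int
list_del(uintptr_t pkey)
{
	struct entry *cur = list_find(pkey);

	if (cur == NULL)
		return -1;
	if (cur->prev == NULL)
		head = cur->next;
	else
		cur->prev->next = cur->next;
	if (cur->next == NULL)
		tail = cur->prev;
	else
		cur->next->prev = cur->prev;
	free(cur);
	count--;
	return 0;
}
```

In the driver, `cuda_mem_alloc()` plays the role of `list_add()` plus the CUDA allocation, and `cuda_mem_free()` / `cuda_mem_unregister()` combine `list_find()` and `list_del()` after releasing the CUDA resource.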