From: Pravin M Bathija
Subject: [PATCH v3 3/5] vhost_user: Function defs for add/rem mem regions
Date: Tue, 4 Nov 2025 04:21:39 +0000
Message-ID: <20251104042142.2787631-4-pravin.bathija@dell.com>
In-Reply-To: <20251104042142.2787631-1-pravin.bathija@dell.com>
References: <20251104042142.2787631-1-pravin.bathija@dell.com>
List-Id: DPDK patches and discussions

These changes add the function definitions for the add/remove memory
region calls, which are invoked on receiving the corresponding
vhost-user messages from the vhost-user front-end (e.g. QEMU). The
front-end sends an add memory region message that carries the guest
physical address (GPA) range, the memory size, the front-end's virtual
address, and the offset within the file-descriptor mapping. The
back-end (DPDK) uses this information to create a mapping in its own
address space by calling mmap on the fd at the given offset. The added
memory region serves as shared memory for passing data back and forth
between the vhost-user front-end and back-end. Similarly, for remove
memory region, the specified region is unmapped from the back-end
(DPDK). In addition to the QEMU front-end, testing has also been
performed with a libblkio front-end and an SPDK/DPDK back-end, doing
I/O through a libblkio-based device driver to SPDK-based drives. There
are also changes to the set mem table handling and a new definition
for get memory slots.
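
As a rough sketch of the fd + offset mapping described above
(simplified, illustrative names only, not the code in this patch; the
real payload layout lives in the vhost-user message structures):

    /*
     * Illustrative only: how a back-end can map the region described by
     * an add-memory-region message. Struct and function names here are
     * placeholders.
     */
    #include <stdint.h>
    #include <sys/mman.h>

    struct mem_region_msg {            /* simplified message payload    */
            uint64_t guest_phys_addr;  /* guest physical address (GPA)  */
            uint64_t memory_size;      /* size of the region            */
            uint64_t userspace_addr;   /* front-end virtual address     */
            uint64_t mmap_offset;      /* offset into the passed fd     */
    };

    static void *map_region(const struct mem_region_msg *m, int fd)
    {
            /* Map from offset 0 and add mmap_offset afterwards, so the
             * offset passed to mmap() does not need to be page aligned. */
            void *base = mmap(NULL, m->mmap_offset + m->memory_size,
                              PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
            if (base == MAP_FAILED)
                    return NULL;
            return (uint8_t *)base + m->mmap_offset;
    }

In the handler added by this patch, the mapping itself is done by the
existing vhost_user_mmap_region() helper and the region is recorded in
dev->mem before the virtqueue ring addresses are re-translated.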

The vhost set memory table message is how the vhost-user front-end
(QEMU or libblkio) tells the vhost-user back-end (DPDK) about all of
its guest memory regions. This allows the back-end to translate guest
physical addresses to back-end virtual addresses and perform direct I/O
to guest memory. Our changes rework the set memory table call to use
common support functions. The get memory slots message is how the
vhost-user front-end queries the back-end for the number of memory
slots available to be registered.

Signed-off-by: Pravin M Bathija
---
 lib/vhost/vhost_user.c | 253 +++++++++++++++++++++++++++++++++++------
 1 file changed, 221 insertions(+), 32 deletions(-)

diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
index 4bfb13fb98..168432e7d1 100644
--- a/lib/vhost/vhost_user.c
+++ b/lib/vhost/vhost_user.c
@@ -71,6 +71,9 @@ VHOST_MESSAGE_HANDLER(VHOST_USER_SET_FEATURES, vhost_user_set_features, false, t
 VHOST_MESSAGE_HANDLER(VHOST_USER_SET_OWNER, vhost_user_set_owner, false, true) \
 VHOST_MESSAGE_HANDLER(VHOST_USER_RESET_OWNER, vhost_user_reset_owner, false, false) \
 VHOST_MESSAGE_HANDLER(VHOST_USER_SET_MEM_TABLE, vhost_user_set_mem_table, true, true) \
+VHOST_MESSAGE_HANDLER(VHOST_USER_GET_MAX_MEM_SLOTS, vhost_user_get_max_mem_slots, false, false) \
+VHOST_MESSAGE_HANDLER(VHOST_USER_ADD_MEM_REG, vhost_user_add_mem_reg, true, true) \
+VHOST_MESSAGE_HANDLER(VHOST_USER_REM_MEM_REG, vhost_user_rem_mem_reg, false, true) \
 VHOST_MESSAGE_HANDLER(VHOST_USER_SET_LOG_BASE, vhost_user_set_log_base, true, true) \
 VHOST_MESSAGE_HANDLER(VHOST_USER_SET_LOG_FD, vhost_user_set_log_fd, true, true) \
 VHOST_MESSAGE_HANDLER(VHOST_USER_SET_VRING_NUM, vhost_user_set_vring_num, false, true) \
@@ -1390,7 +1393,6 @@ vhost_user_set_mem_table(struct virtio_net **pdev,
 	struct virtio_net *dev = *pdev;
 	struct VhostUserMemory *memory = &ctx->msg.payload.memory;
 	struct rte_vhost_mem_region *reg;
-	int numa_node = SOCKET_ID_ANY;
 	uint64_t mmap_offset;
 	uint32_t i;
 	bool async_notify = false;
@@ -1435,39 +1437,13 @@ vhost_user_set_mem_table(struct virtio_net **pdev,
 		if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
 			vhost_user_iotlb_flush_all(dev);
 
-		free_mem_region(dev);
+		free_all_mem_regions(dev);
 		rte_free(dev->mem);
 		dev->mem = NULL;
 	}
 
-	/*
-	 * If VQ 0 has already been allocated, try to allocate on the same
-	 * NUMA node. It can be reallocated later in numa_realloc().
-	 */
-	if (dev->nr_vring > 0)
-		numa_node = dev->virtqueue[0]->numa_node;
-
-	dev->nr_guest_pages = 0;
-	if (dev->guest_pages == NULL) {
-		dev->max_guest_pages = 8;
-		dev->guest_pages = rte_zmalloc_socket(NULL,
-					dev->max_guest_pages *
-					sizeof(struct guest_page),
-					RTE_CACHE_LINE_SIZE,
-					numa_node);
-		if (dev->guest_pages == NULL) {
-			VHOST_CONFIG_LOG(dev->ifname, ERR,
-				"failed to allocate memory for dev->guest_pages");
-			goto close_msg_fds;
-		}
-	}
-
-	dev->mem = rte_zmalloc_socket("vhost-mem-table", sizeof(struct rte_vhost_memory) +
-		sizeof(struct rte_vhost_mem_region) * memory->nregions, 0, numa_node);
-	if (dev->mem == NULL) {
-		VHOST_CONFIG_LOG(dev->ifname, ERR, "failed to allocate memory for dev->mem");
-		goto free_guest_pages;
-	}
+	if (vhost_user_initialize_memory(pdev) < 0)
+		goto close_msg_fds;
 
 	for (i = 0; i < memory->nregions; i++) {
 		reg = &dev->mem->regions[i];
@@ -1531,11 +1507,182 @@ vhost_user_set_mem_table(struct virtio_net **pdev,
 	return RTE_VHOST_MSG_RESULT_OK;
 
 free_mem_table:
-	free_mem_region(dev);
+	free_all_mem_regions(dev);
 	rte_free(dev->mem);
 	dev->mem = NULL;
+	rte_free(dev->guest_pages);
+	dev->guest_pages = NULL;
+close_msg_fds:
+	close_msg_fds(ctx);
+	return RTE_VHOST_MSG_RESULT_ERR;
+}
+
-free_guest_pages:
+static int
+vhost_user_get_max_mem_slots(struct virtio_net **pdev __rte_unused,
+			struct vhu_msg_context *ctx,
+			int main_fd __rte_unused)
+{
+	uint32_t max_mem_slots = VHOST_MEMORY_MAX_NREGIONS;
+
+	ctx->msg.payload.u64 = (uint64_t)max_mem_slots;
+	ctx->msg.size = sizeof(ctx->msg.payload.u64);
+	ctx->fd_num = 0;
+
+	return RTE_VHOST_MSG_RESULT_REPLY;
+}
+
+static int
+vhost_user_add_mem_reg(struct virtio_net **pdev,
+			struct vhu_msg_context *ctx,
+			int main_fd __rte_unused)
+{
+	struct virtio_net *dev = *pdev;
+	struct VhostUserMemoryRegion *region = &ctx->msg.payload.memory_single.region;
+	uint32_t i;
+
+	/* make sure new region will fit */
+	if (dev->mem != NULL && dev->mem->nregions >= VHOST_MEMORY_MAX_NREGIONS) {
+		VHOST_CONFIG_LOG(dev->ifname, ERR,
+			"too many memory regions already (%u)",
+			dev->mem->nregions);
+		goto close_msg_fds;
+	}
+
+	/* make sure supplied memory fd present */
+	if (ctx->fd_num != 1) {
+		VHOST_CONFIG_LOG(dev->ifname, ERR,
+			"fd count makes no sense (%u)",
+			ctx->fd_num);
+		goto close_msg_fds;
+	}
+
+	/* Make sure no overlap in guest virtual address space */
+	if (dev->mem != NULL && dev->mem->nregions > 0) {
+		for (uint32_t i = 0; i < VHOST_MEMORY_MAX_NREGIONS; i++) {
+			struct rte_vhost_mem_region *current_region = &dev->mem->regions[i];
+
+			if (current_region->mmap_size == 0)
+				continue;
+
+			uint64_t current_region_guest_start = current_region->guest_user_addr;
+			uint64_t current_region_guest_end = current_region_guest_start +
+				current_region->mmap_size - 1;
+			uint64_t proposed_region_guest_start = region->userspace_addr;
+			uint64_t proposed_region_guest_end = proposed_region_guest_start +
+				region->memory_size - 1;
+			bool overlap = false;
+
+			bool curent_region_guest_start_overlap =
+				current_region_guest_start >= proposed_region_guest_start
+				&& current_region_guest_start <= proposed_region_guest_end;
+			bool curent_region_guest_end_overlap =
+				current_region_guest_end >= proposed_region_guest_start
+				&& current_region_guest_end <= proposed_region_guest_end;
+			bool proposed_region_guest_start_overlap =
+				proposed_region_guest_start >= current_region_guest_start
+				&& proposed_region_guest_start <= current_region_guest_end;
+			bool proposed_region_guest_end_overlap =
+				proposed_region_guest_end >= current_region_guest_start
+				&& proposed_region_guest_end <= current_region_guest_end;
+
+			overlap = curent_region_guest_start_overlap
+				|| curent_region_guest_end_overlap
+				|| proposed_region_guest_start_overlap
+				|| proposed_region_guest_end_overlap;
+
+			if (overlap) {
+				VHOST_CONFIG_LOG(dev->ifname, ERR,
+					"requested memory region overlaps with another region");
+				VHOST_CONFIG_LOG(dev->ifname, ERR,
+					"\tRequested region address:0x%" PRIx64,
+					region->userspace_addr);
+				VHOST_CONFIG_LOG(dev->ifname, ERR,
+					"\tRequested region size:0x%" PRIx64,
+					region->memory_size);
+				VHOST_CONFIG_LOG(dev->ifname, ERR,
+					"\tOverlapping region address:0x%" PRIx64,
+					current_region->guest_user_addr);
+				VHOST_CONFIG_LOG(dev->ifname, ERR,
+					"\tOverlapping region size:0x%" PRIx64,
+					current_region->mmap_size);
+				goto close_msg_fds;
+			}
+
+		}
+	}
+
+	/* convert first region add to normal memory table set */
+	if (dev->mem == NULL) {
+		if (vhost_user_initialize_memory(pdev) < 0)
+			goto close_msg_fds;
+	}
+
+	/* find a new region and set it like memory table set does */
+	struct rte_vhost_mem_region *reg = NULL;
+	uint64_t mmap_offset;
+
+	for (uint32_t i = 0; i < VHOST_MEMORY_MAX_NREGIONS; i++) {
+		if (dev->mem->regions[i].guest_user_addr == 0) {
+			reg = &dev->mem->regions[i];
+			break;
+		}
+	}
+	if (reg == NULL) {
+		VHOST_CONFIG_LOG(dev->ifname, ERR, "no free memory region");
+		goto close_msg_fds;
+	}
+
+	reg->guest_phys_addr = region->guest_phys_addr;
+	reg->guest_user_addr = region->userspace_addr;
+	reg->size = region->memory_size;
+	reg->fd = ctx->fds[0];
+
+	mmap_offset = region->mmap_offset;
+
+	if (vhost_user_mmap_region(dev, reg, mmap_offset) < 0) {
+		VHOST_CONFIG_LOG(dev->ifname, ERR, "failed to mmap region");
+		goto close_msg_fds;
+	}
+
+	dev->mem->nregions++;
+
+	if (dev->async_copy && rte_vfio_is_enabled("vfio"))
+		async_dma_map(dev, true);
+
+	if (vhost_user_postcopy_register(dev, main_fd, ctx) < 0)
+		goto free_mem_table;
+
+	for (i = 0; i < dev->nr_vring; i++) {
+		struct vhost_virtqueue *vq = dev->virtqueue[i];
+
+		if (!vq)
+			continue;
+
+		if (vq->desc || vq->avail || vq->used) {
+			/* vhost_user_lock_all_queue_pairs locked all qps */
+			VHOST_USER_ASSERT_LOCK(dev, vq, VHOST_USER_ADD_MEM_REG);
+
+			/*
+			 * If the memory table got updated, the ring addresses
+			 * need to be translated again as virtual addresses have
+			 * changed.
+			 */
+			vring_invalidate(dev, vq);
+
+			translate_ring_addresses(&dev, &vq);
+			*pdev = dev;
+		}
+	}
+
+	dump_guest_pages(dev);
+
+	return RTE_VHOST_MSG_RESULT_OK;
+
+free_mem_table:
+	free_all_mem_regions(dev);
+	rte_free(dev->mem);
+	dev->mem = NULL;
 	rte_free(dev->guest_pages);
 	dev->guest_pages = NULL;
 close_msg_fds:
@@ -1543,6 +1690,48 @@ vhost_user_set_mem_table(struct virtio_net **pdev,
 	return RTE_VHOST_MSG_RESULT_ERR;
 }
 
+static int
+vhost_user_rem_mem_reg(struct virtio_net **pdev __rte_unused,
+			struct vhu_msg_context *ctx __rte_unused,
+			int main_fd __rte_unused)
+{
+	struct virtio_net *dev = *pdev;
+	struct VhostUserMemoryRegion *region = &ctx->msg.payload.memory_single.region;
+
+	if ((dev->mem) && (dev->flags & VIRTIO_DEV_VDPA_CONFIGURED)) {
+		struct rte_vdpa_device *vdpa_dev = dev->vdpa_dev;
+
+		if (vdpa_dev && vdpa_dev->ops->dev_close)
+			vdpa_dev->ops->dev_close(dev->vid);
+		dev->flags &= ~VIRTIO_DEV_VDPA_CONFIGURED;
+	}
+
+	if (dev->mem != NULL && dev->mem->nregions > 0) {
+		for (uint32_t i = 0; i < VHOST_MEMORY_MAX_NREGIONS; i++) {
+			struct rte_vhost_mem_region *current_region = &dev->mem->regions[i];
+
+			if (current_region->guest_user_addr == 0)
+				continue;
+
+			/*
+			 * According to the vhost-user specification:
+			 * The memory region to be removed is identified by its guest address,
+			 * user address and size. The mmap offset is ignored.
+			 */
+			if (region->userspace_addr == current_region->guest_user_addr
+				&& region->guest_phys_addr == current_region->guest_phys_addr
+				&& region->memory_size == current_region->size) {
+				free_mem_region(current_region);
+				dev->mem->nregions--;
+				return RTE_VHOST_MSG_RESULT_OK;
+			}
+		}
+	}
+
+	VHOST_CONFIG_LOG(dev->ifname, ERR, "failed to find region");
+	return RTE_VHOST_MSG_RESULT_ERR;
+}
+
 static bool
 vq_is_ready(struct virtio_net *dev, struct vhost_virtqueue *vq)
 {
-- 
2.43.0