From: Maxime Coquelin <maxime.coquelin@redhat.com>
To: "Wang, YuanX", "Xia, Chenbo"
Cc: dev@dpdk.org, "Hu, Jiayu", "Ding, Xuan", "Ma, WenwuX", "Ling, WeiX"
Subject: Re: [PATCH] vhost: fix data-plane access to released vq
Date: Thu, 27 Jan 2022 11:46:56 +0100
Message-ID: <1fda6254-4be4-9a2c-cdf2-d63893f727f5@redhat.com>
References: <20211203163400.164545-1-yuanx.wang@intel.com> <63fdcab8-d692-c8c6-240d-a87b01ed1778@redhat.com>

Hi,

On 1/27/22 11:30, Wang, YuanX wrote:
> Hi Maxime,
>
>> -----Original Message-----
>> From: Maxime Coquelin
>> Sent: Wednesday, January 26, 2022 10:03 PM
>> To: Wang, YuanX; Xia, Chenbo
>> Cc: dev@dpdk.org; Hu, Jiayu; Ding, Xuan; Ma, WenwuX; Ling, WeiX
>> Subject: Re: [PATCH] vhost: fix data-plane access to released vq
>>
>> Hi Yuan,
>>
>> On 12/3/21 17:34, Yuan Wang wrote:
>>> From: Yuan Wang
>>>
>>> When NUMA reallocation occurs, numa_realloc() on the control plane
>>> will free the old vq. If rte_vhost_dequeue_burst() on the data plane
>>> gets the vq just before the release, it will access the released vq.
>>> We need to move vq->access_lock into struct virtio_net so that this
>>> situation cannot happen.
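For readers outside the vhost code, here is a minimal standalone sketch of the hazard the commit message describes and of the layout it proposes. It uses plain pthreads and made-up names (struct device, struct queue) rather than the DPDK types; it is only an illustration, not the DPDK implementation. The point is that the lock guarding the queue pointer lives in the long-lived parent object, so it cannot itself be freed by a reallocation.

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct queue {
	int ring[256];                  /* stand-in for the vring */
};

struct device {
	pthread_mutex_t q_lock;         /* lives in the long-lived parent */
	struct queue *q;                /* may be freed/replaced by the control plane */
};

/* Data-plane path: the pointer is only dereferenced while q_lock is held,
 * and q_lock itself is never freed, so a concurrent reallocation cannot
 * pull the queue out from under us. */
static int data_plane_poll(struct device *dev)
{
	int v;

	pthread_mutex_lock(&dev->q_lock);
	v = dev->q->ring[0];
	pthread_mutex_unlock(&dev->q_lock);
	return v;
}

/* Control-plane path: the queue is freed and replaced under the same lock.
 * If the lock lived inside struct queue instead, a reader could fetch the
 * old pointer, lose the CPU, and then lock and read memory that was
 * already freed here -- the use-after-free the commit message describes. */
static void *control_plane_realloc(void *arg)
{
	struct device *dev = arg;
	struct queue *new_q = calloc(1, sizeof(*new_q));

	if (new_q == NULL)
		return NULL;

	pthread_mutex_lock(&dev->q_lock);
	memcpy(new_q, dev->q, sizeof(*new_q));
	free(dev->q);
	dev->q = new_q;
	pthread_mutex_unlock(&dev->q_lock);
	return NULL;
}

int main(void)
{
	struct device dev = { .q_lock = PTHREAD_MUTEX_INITIALIZER };
	pthread_t t;

	dev.q = calloc(1, sizeof(*dev.q));
	if (dev.q == NULL)
		return 1;

	pthread_create(&t, NULL, control_plane_realloc, &dev);
	printf("ring[0] = %d\n", data_plane_poll(&dev));
	pthread_join(t, NULL);
	free(dev.q);
	return 0;
}
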
>>
>> This patch is a fix, so the Fixes tag would be needed.
>>
>> But are you really facing this issue, or is this just based on code review?
>
> This issue is detected at run time with AddressSanitizer, which can be enabled with:
> meson configure -Db_sanitize=address
>
>>
>> Currently, NUMA reallocation is called whenever
>> translate_ring_addresses() is called.
>>
>> translate_ring_addresses() is primarily called at device initialization,
>> before the .new_device() callback is called. At that stage, there is no
>> risk in performing a NUMA reallocation, as the application is not
>> expected to use APIs requiring vq->access_lock acquisition.
>>
>> But I agree there are possibilities that numa_realloc() gets called
>> while the device is in running state. Even if that happened, I don't
>> think it is possible that numa_realloc() ends up reallocating the
>> virtqueue on a different NUMA node (the vring should not have moved
>> from a physical memory standpoint). And even if it happened, we should
>> be safe, because we ensure the VQ was not ready (so not usable by the
>> application) before proceeding with the reallocation:
>
> Here is a scenario where VQ ready has not been set:
> 1. Run testpmd and then start the data-plane process.
> 2. Run the front-end.
> 3. new_device() gets called when the first two queues are ready, even if the later queues are not.
> 4. When messages for the later queues are processed, numa_realloc() may be reached; their ready flag has not been set yet, so they can be reallocated.

I will need a bit more details here.

AFAICT, if the ready flag is not set for a given virtqueue, the virtqueue
is not supposed to be exposed to the application. Is there a case where
this happens? If so, the fix should consist in ensuring the application
cannot use the virtqueue if it is not ready.

Regards,
Maxime

>
> If all the queues are ready before new_device() is called, this issue
> does not occur. I think maybe that is another solution.

No, that was the older behaviour, but it causes issues with vDPA.
We cannot just revert to the older behaviour.

Thanks,
Maxime

> Thanks,
> Yuan
>
>>
>> static struct virtio_net*
>> numa_realloc(struct virtio_net *dev, int index)
>> {
>> 	int node, dev_node;
>> 	struct virtio_net *old_dev;
>> 	struct vhost_virtqueue *vq;
>> 	struct batch_copy_elem *bce;
>> 	struct guest_page *gp;
>> 	struct rte_vhost_memory *mem;
>> 	size_t mem_size;
>> 	int ret;
>>
>> 	old_dev = dev;
>> 	vq = dev->virtqueue[index];
>>
>> 	/*
>> 	 * If VQ is ready, it is too late to reallocate, it certainly already
>> 	 * happened anyway on VHOST_USER_SET_VRING_ADRR.
>> 	 */
>> 	if (vq->ready)
>> 		return dev;
>>
>> So, if this is fixing a real issue, I would need more details on the
>> issue in order to understand why vq->ready was not set when it should
>> have been.
>>
>> On a side note, while trying to understand how you could face an issue,
>> I noticed that translate_ring_addresses() may be called by
>> vhost_user_iotlb_msg(). In that case, vq->access_lock is not held, as
>> this is the handler for VHOST_USER_IOTLB_MSG. We may want to protect
>> translate_ring_addresses() calls by taking the VQ locks. I will post a
>> fix for it.
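As an aside, a rough self-contained sketch of the locking discipline being discussed above: every control-plane handler that may retranslate or move the ring takes the same per-queue lock as the data plane, and the data plane never touches a queue that is not marked ready. This again uses plain pthreads and invented names (handle_ring_update, poll_queue); it is not the fix Maxime says he will post, only an illustration of the idea.

#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct queue {
	pthread_mutex_t access_lock;    /* same lock the enqueue/dequeue paths take */
	bool ready;
	void *desc, *avail, *used;      /* stand-ins for the translated ring addresses */
};

/* Control-plane message handler (think of the VHOST_USER_IOTLB_MSG path):
 * retranslating the ring addresses happens under the queue lock, so an
 * in-flight burst on the data plane either finishes before the update or
 * starts after it, never in the middle of it. */
static void handle_ring_update(struct queue *q,
			       void *desc, void *avail, void *used)
{
	pthread_mutex_lock(&q->access_lock);
	q->desc = desc;
	q->avail = avail;
	q->used = used;
	q->ready = true;
	pthread_mutex_unlock(&q->access_lock);
}

/* Data-plane entry point: re-checks "ready" under the lock and bails out
 * if the queue is not usable yet. */
static bool poll_queue(struct queue *q)
{
	bool usable;

	pthread_mutex_lock(&q->access_lock);
	usable = q->ready && q->desc != NULL;
	/* ... descriptors would be processed here, still under the lock ... */
	pthread_mutex_unlock(&q->access_lock);
	return usable;
}

int main(void)
{
	struct queue q = { .access_lock = PTHREAD_MUTEX_INITIALIZER };
	static int dummy_ring[4];

	handle_ring_update(&q, dummy_ring, dummy_ring, dummy_ring);
	return poll_queue(&q) ? 0 : 1;
}
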
>>
>>> Signed-off-by: Yuan Wang
>>> ---
>>>  lib/vhost/vhost.c      | 26 +++++++++++++-------------
>>>  lib/vhost/vhost.h      |  4 +---
>>>  lib/vhost/vhost_user.c |  4 ++--
>>>  lib/vhost/virtio_net.c | 16 ++++++++--------
>>>  4 files changed, 24 insertions(+), 26 deletions(-)
>>>
>>
>> ...
>>
>>> diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
>>> index 7085e0885c..f85ce4fda5 100644
>>> --- a/lib/vhost/vhost.h
>>> +++ b/lib/vhost/vhost.h
>>> @@ -185,9 +185,6 @@ struct vhost_virtqueue {
>>>  	bool access_ok;
>>>  	bool ready;
>>>
>>> -	rte_spinlock_t access_lock;
>>> -
>>> -
>>>  	union {
>>>  		struct vring_used_elem *shadow_used_split;
>>>  		struct vring_used_elem_packed *shadow_used_packed;
>>> @@ -384,6 +381,7 @@ struct virtio_net {
>>>  	int extbuf;
>>>  	int linearbuf;
>>>  	struct vhost_virtqueue *virtqueue[VHOST_MAX_QUEUE_PAIRS * 2];
>>> +	rte_spinlock_t vq_access_lock[VHOST_MAX_QUEUE_PAIRS * 2];
>>
>> The problem here is that you'll be introducing false sharing, so I
>> expect performance will no longer scale with the number of queues.
>>
>> It also consumes unnecessary memory.
>>
>>>  	struct inflight_mem_info *inflight_info;
>>> #define IF_NAME_SZ (PATH_MAX > IFNAMSIZ ? PATH_MAX : IFNAMSIZ)
>>>  	char ifname[IF_NAME_SZ];
>>
>> Thanks,
>> Maxime
>
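To make the false-sharing point above concrete, here is a small standalone illustration. It assumes, as on most targets, that rte_spinlock_t is a 4-byte word; the struct names are made up and NR_VRINGS stands in for VHOST_MAX_QUEUE_PAIRS * 2. A flat per-device lock array packs many locks onto each cache line, so cores polling different queues bounce the same line on every lock/unlock; the usual cure, one cache line per lock, removes the bouncing but multiplies the memory footprint, which is the second cost mentioned. Keeping the lock inside each vhost_virtqueue, which is allocated per queue anyway, sidesteps both.

#include <stdalign.h>
#include <stdint.h>
#include <stdio.h>

#define NR_VRINGS 256   /* stand-in for VHOST_MAX_QUEUE_PAIRS * 2 */

/* Packed array as in the patch: a 4-byte lock means 16 locks share each
 * 64-byte cache line, so unrelated queues contend on the same line. */
struct packed_locks {
	uint32_t lock[NR_VRINGS];
};

/* Padded variant: one lock per cache line, no false sharing, but 64 bytes
 * of per-device memory for every possible vring, used or not. */
struct padded_lock {
	alignas(64) uint32_t lock;
};

struct padded_locks {
	struct padded_lock lock[NR_VRINGS];
};

int main(void)
{
	printf("packed: %zu bytes, padded: %zu bytes\n",
	       sizeof(struct packed_locks), sizeof(struct padded_locks));
	return 0;
}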