From: Maxime Coquelin <maxime.coquelin@redhat.com>
To: "Wang, YuanX", "Xia, Chenbo"
Cc: dev@dpdk.org, "Hu, Jiayu", "Ding, Xuan", "Ma, WenwuX", "Ling, WeiX"
Subject: Re: [PATCH] vhost: fix data-plane access to released vq
Date: Thu, 27 Jan 2022 11:46:56 +0100
Message-ID: <1fda6254-4be4-9a2c-cdf2-d63893f727f5@redhat.com>
References: <20211203163400.164545-1-yuanx.wang@intel.com> <63fdcab8-d692-c8c6-240d-a87b01ed1778@redhat.com>

Hi,

On 1/27/22 11:30, Wang, YuanX wrote:
> Hi Maxime,
>
>> -----Original Message-----
>> From: Maxime Coquelin
>> Sent: Wednesday, January 26, 2022 10:03 PM
>> To: Wang, YuanX; Xia, Chenbo
>> Cc: dev@dpdk.org; Hu, Jiayu; Ding, Xuan; Ma, WenwuX; Ling, WeiX
>> Subject: Re: [PATCH] vhost: fix data-plane access to released vq
>>
>> Hi Yuan,
>>
>> On 12/3/21 17:34, Yuan Wang wrote:
>>> From: Yuan Wang
>>>
>>> When NUMA reallocation occurs, numa_realloc() on the control plane
>>> will free the old vq. If rte_vhost_dequeue_burst() on the data plane
>>> gets the vq just before the release, it will access the released vq.
>>> We need to move vq->access_lock into struct virtio_net so that this
>>> situation cannot happen.
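For readers outside the vhost code, here is a minimal standalone sketch of the hazard the commit message describes and of the layout it proposes. It uses plain pthreads and made-up names (struct device, struct queue) rather than the DPDK types; it is only an illustration, not the DPDK implementation. The point is that the lock guarding the queue pointer lives in the long-lived parent object, so it cannot itself be freed by a reallocation.

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct queue {
	int ring[256];                  /* stand-in for the vring */
};

struct device {
	pthread_mutex_t q_lock;         /* lives in the long-lived parent */
	struct queue *q;                /* may be freed/replaced by the control plane */
};

/* Data-plane path: the pointer is only dereferenced while q_lock is held,
 * and q_lock itself is never freed, so a concurrent reallocation cannot
 * pull the queue out from under us. */
static int data_plane_poll(struct device *dev)
{
	int v;

	pthread_mutex_lock(&dev->q_lock);
	v = dev->q->ring[0];
	pthread_mutex_unlock(&dev->q_lock);
	return v;
}

/* Control-plane path: the queue is freed and replaced under the same lock.
 * If the lock lived inside struct queue instead, a reader could fetch the
 * old pointer, lose the CPU, and then lock and read memory that was
 * already freed here -- the use-after-free the commit message describes. */
static void *control_plane_realloc(void *arg)
{
	struct device *dev = arg;
	struct queue *new_q = calloc(1, sizeof(*new_q));

	if (new_q == NULL)
		return NULL;

	pthread_mutex_lock(&dev->q_lock);
	memcpy(new_q, dev->q, sizeof(*new_q));
	free(dev->q);
	dev->q = new_q;
	pthread_mutex_unlock(&dev->q_lock);
	return NULL;
}

int main(void)
{
	struct device dev = { .q_lock = PTHREAD_MUTEX_INITIALIZER };
	pthread_t t;

	dev.q = calloc(1, sizeof(*dev.q));
	if (dev.q == NULL)
		return 1;

	pthread_create(&t, NULL, control_plane_realloc, &dev);
	printf("ring[0] = %d\n", data_plane_poll(&dev));
	pthread_join(t, NULL);
	free(dev.q);
	return 0;
}
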
>>
>> This patch is a fix, so the Fixes tag would be needed.
>>
>> But are you really facing this issue, or is this just based on code review?
>
> This issue is detected at run time with AddressSanitizer, which can be enabled with:
> meson configure -Db_sanitize=address
>
>>
>> Currently, NUMA reallocation is called whenever
>> translate_ring_addresses() is called.
>>
>> translate_ring_addresses() is primarily called at device initialization,
>> before the .new_device() callback is called. At that stage, there is no
>> risk in performing a NUMA reallocation, as the application is not
>> expected to use APIs requiring vq->access_lock acquisition.
>>
>> But I agree there are possibilities that numa_realloc() gets called
>> while the device is in running state. Even if that happened, I don't
>> think it is possible that numa_realloc() ends up reallocating the
>> virtqueue on a different NUMA node (the vring should not have moved
>> from a physical memory standpoint). And even if it happened, we should
>> be safe, because we ensure the VQ was not ready (so not usable by the
>> application) before proceeding with the reallocation:
>
> Here is a scenario where VQ ready has not been set:
> 1. Run testpmd and then start the data-plane process.
> 2. Run the front-end.
> 3. new_device() gets called when the first two queues are ready, even if the later queues are not.
> 4. When messages for the later queues are processed, numa_realloc() may be reached; their ready flag has not been set yet, so they can be reallocated.

I will need a bit more details here.

AFAICT, if the ready flag is not set for a given virtqueue, the virtqueue
is not supposed to be exposed to the application. Is there a case where
this happens? If so, the fix should consist in ensuring the application
cannot use the virtqueue if it is not ready.

Regards,
Maxime

>
> If all the queues are ready before new_device() is called, this issue
> does not occur. I think maybe that is another solution.

No, that was the older behaviour, but it causes issues with vDPA.
We cannot just revert to the older behaviour.

Thanks,
Maxime

> Thanks,
> Yuan
>
>>
>> static struct virtio_net*
>> numa_realloc(struct virtio_net *dev, int index)
>> {
>> 	int node, dev_node;
>> 	struct virtio_net *old_dev;
>> 	struct vhost_virtqueue *vq;
>> 	struct batch_copy_elem *bce;
>> 	struct guest_page *gp;
>> 	struct rte_vhost_memory *mem;
>> 	size_t mem_size;
>> 	int ret;
>>
>> 	old_dev = dev;
>> 	vq = dev->virtqueue[index];
>>
>> 	/*
>> 	 * If VQ is ready, it is too late to reallocate, it certainly already
>> 	 * happened anyway on VHOST_USER_SET_VRING_ADRR.
>> 	 */
>> 	if (vq->ready)
>> 		return dev;
>>
>> So, if this is fixing a real issue, I would need more details on the
>> issue in order to understand why vq->ready was not set when it should
>> have been.
>>
>> On a side note, while trying to understand how you could face an issue,
>> I noticed that translate_ring_addresses() may be called by
>> vhost_user_iotlb_msg(). In that case, vq->access_lock is not held, as
>> this is the handler for VHOST_USER_IOTLB_MSG. We may want to protect
>> translate_ring_addresses() calls by taking the VQ locks. I will post a
>> fix for it.
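As an aside, a rough self-contained sketch of the locking discipline being discussed above: every control-plane handler that may retranslate or move the ring takes the same per-queue lock as the data plane, and the data plane never touches a queue that is not marked ready. This again uses plain pthreads and invented names (handle_ring_update, poll_queue); it is not the fix Maxime says he will post, only an illustration of the idea.

#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct queue {
	pthread_mutex_t access_lock;    /* same lock the enqueue/dequeue paths take */
	bool ready;
	void *desc, *avail, *used;      /* stand-ins for the translated ring addresses */
};

/* Control-plane message handler (think of the VHOST_USER_IOTLB_MSG path):
 * retranslating the ring addresses happens under the queue lock, so an
 * in-flight burst on the data plane either finishes before the update or
 * starts after it, never in the middle of it. */
static void handle_ring_update(struct queue *q,
			       void *desc, void *avail, void *used)
{
	pthread_mutex_lock(&q->access_lock);
	q->desc = desc;
	q->avail = avail;
	q->used = used;
	q->ready = true;
	pthread_mutex_unlock(&q->access_lock);
}

/* Data-plane entry point: re-checks "ready" under the lock and bails out
 * if the queue is not usable yet. */
static bool poll_queue(struct queue *q)
{
	bool usable;

	pthread_mutex_lock(&q->access_lock);
	usable = q->ready && q->desc != NULL;
	/* ... descriptors would be processed here, still under the lock ... */
	pthread_mutex_unlock(&q->access_lock);
	return usable;
}

int main(void)
{
	struct queue q = { .access_lock = PTHREAD_MUTEX_INITIALIZER };
	static int dummy_ring[4];

	handle_ring_update(&q, dummy_ring, dummy_ring, dummy_ring);
	return poll_queue(&q) ? 0 : 1;
}
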
>>
>>> Signed-off-by: Yuan Wang
>>> ---
>>>  lib/vhost/vhost.c      | 26 +++++++++++++-------------
>>>  lib/vhost/vhost.h      |  4 +---
>>>  lib/vhost/vhost_user.c |  4 ++--
>>>  lib/vhost/virtio_net.c | 16 ++++++++--------
>>>  4 files changed, 24 insertions(+), 26 deletions(-)
>>>
>>
>> ...
>>
>>> diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
>>> index 7085e0885c..f85ce4fda5 100644
>>> --- a/lib/vhost/vhost.h
>>> +++ b/lib/vhost/vhost.h
>>> @@ -185,9 +185,6 @@ struct vhost_virtqueue {
>>>  	bool access_ok;
>>>  	bool ready;
>>>
>>> -	rte_spinlock_t access_lock;
>>> -
>>> -
>>>  	union {
>>>  		struct vring_used_elem *shadow_used_split;
>>>  		struct vring_used_elem_packed *shadow_used_packed;
>>> @@ -384,6 +381,7 @@ struct virtio_net {
>>>  	int extbuf;
>>>  	int linearbuf;
>>>  	struct vhost_virtqueue *virtqueue[VHOST_MAX_QUEUE_PAIRS * 2];
>>> +	rte_spinlock_t vq_access_lock[VHOST_MAX_QUEUE_PAIRS * 2];
>>
>> The problem here is that you'll be introducing false sharing, so I
>> expect performance will no longer scale with the number of queues.
>>
>> It also consumes unnecessary memory.
>>
>>>  	struct inflight_mem_info *inflight_info;
>>> #define IF_NAME_SZ (PATH_MAX > IFNAMSIZ ? PATH_MAX : IFNAMSIZ)
>>>  	char ifname[IF_NAME_SZ];
>>
>> Thanks,
>> Maxime
>
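To make the false-sharing point above concrete, here is a small standalone illustration. It assumes, as on most targets, that rte_spinlock_t is a 4-byte word; the struct names are made up and NR_VRINGS stands in for VHOST_MAX_QUEUE_PAIRS * 2. A flat per-device lock array packs many locks onto each cache line, so cores polling different queues bounce the same line on every lock/unlock; the usual cure, one cache line per lock, removes the bouncing but multiplies the memory footprint, which is the second cost mentioned. Keeping the lock inside each vhost_virtqueue, which is allocated per queue anyway, sidesteps both.

#include <stdalign.h>
#include <stdint.h>
#include <stdio.h>

#define NR_VRINGS 256   /* stand-in for VHOST_MAX_QUEUE_PAIRS * 2 */

/* Packed array as in the patch: a 4-byte lock means 16 locks share each
 * 64-byte cache line, so unrelated queues contend on the same line. */
struct packed_locks {
	uint32_t lock[NR_VRINGS];
};

/* Padded variant: one lock per cache line, no false sharing, but 64 bytes
 * of per-device memory for every possible vring, used or not. */
struct padded_lock {
	alignas(64) uint32_t lock;
};

struct padded_locks {
	struct padded_lock lock[NR_VRINGS];
};

int main(void)
{
	printf("packed: %zu bytes, padded: %zu bytes\n",
	       sizeof(struct packed_locks), sizeof(struct padded_locks));
	return 0;
}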