From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <maxime.coquelin@redhat.com>
Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28])
 by dpdk.org (Postfix) with ESMTP id D0F401B6B9
 for <dev@dpdk.org>; Fri,  3 Nov 2017 16:54:22 +0100 (CET)
Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com
 [10.5.11.12])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by mx1.redhat.com (Postfix) with ESMTPS id 0FEE5D7E96;
 Fri,  3 Nov 2017 15:54:22 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com 0FEE5D7E96
Authentication-Results: ext-mx09.extmail.prod.ext.phx2.redhat.com;
 dmarc=none (p=none dis=none) header.from=redhat.com
Authentication-Results: ext-mx09.extmail.prod.ext.phx2.redhat.com;
 spf=fail smtp.mailfrom=maxime.coquelin@redhat.com
Received: from [10.36.112.52] (ovpn-112-52.ams2.redhat.com [10.36.112.52])
 by smtp.corp.redhat.com (Postfix) with ESMTPS id 04782609A5;
 Fri,  3 Nov 2017 15:54:07 +0000 (UTC)
To: "Michael S. Tsirkin" <mst@redhat.com>, "Yao, Lei A" <lei.a.yao@intel.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>, "yliu@fridaylinux.org"
 <yliu@fridaylinux.org>, "Horton, Remy" <remy.horton@intel.com>,
 "Bie, Tiwei" <tiwei.bie@intel.com>, "jfreiman@redhat.com"
 <jfreiman@redhat.com>, "vkaplans@redhat.com" <vkaplans@redhat.com>,
 "jasowang@redhat.com" <jasowang@redhat.com>
References: <20171005083627.27828-1-maxime.coquelin@redhat.com>
 <20171005083627.27828-18-maxime.coquelin@redhat.com>
 <2DBBFF226F7CF64BAFCA79B681719D953A2CD10A@shsmsx102.ccr.corp.intel.com>
 <8d27d7d9-9567-1a57-5a5f-760f2f117b73@redhat.com>
 <4054d863-8909-23a7-aba2-b8675cdbb4ba@redhat.com>
 <c915792b-34f7-5c22-334b-cde17b09d47a@redhat.com>
 <20171103171230-mutt-send-email-mst@kernel.org>
From: Maxime Coquelin <maxime.coquelin@redhat.com>
Message-ID: <041c6163-f4fc-a6e7-7142-8ea6594800b5@redhat.com>
Date: Fri, 3 Nov 2017 16:54:05 +0100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
 Thunderbird/52.3.0
MIME-Version: 1.0
In-Reply-To: <20171103171230-mutt-send-email-mst@kernel.org>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 8bit
X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16
 (mx1.redhat.com [10.5.110.38]); Fri, 03 Nov 2017 15:54:22 +0000 (UTC)
Subject: Re: [dpdk-dev] [PATCH v3 17/19] vhost-user: iommu: postpone device
 creation until ring are mapped
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Fri, 03 Nov 2017 15:54:23 -0000



On 11/03/2017 04:15 PM, Michael S. Tsirkin wrote:
> On Fri, Nov 03, 2017 at 09:25:58AM +0100, Maxime Coquelin wrote:
>>
>>
>> On 11/02/2017 05:02 PM, Maxime Coquelin wrote:
>>>
>>>
>>> On 11/02/2017 09:21 AM, Maxime Coquelin wrote:
>>>> Hi Lei,
>>>>
>>>> On 11/02/2017 08:21 AM, Yao, Lei A wrote:
>>>>>
>>>> ...
>>>>> Hi, Maxime
>>>>> I met one issue with your patch set during the v17.11 test.
>>>>
>>>> Is it with v17.11-rc2 or -rc1?
>>>>
>>>>> The test scenario is as follows:
>>>>> 1.    Bind one NIC, use testpmd to set up vhost-user with 2 queues
>>>>> usertools/dpdk-devbind.py --bind=igb_uio 0000:05:00.0
>>>>> ./x86_64-native-linuxapp-gcc/app/testpmd -c 0xe -n 4
>>>>> --socket-mem 1024,1024 \
>>>>> --vdev 'net_vhost0,iface=vhost-net,queues=2' -- -i --rxq=2
>>>>> --txq=2 --nb-cores=2 --rss-ip
>>>>> 2.    Launch qemu with a virtio device which has 2 queues
>>>>> 3.    In the VM, launch testpmd with virtio-pmd using only 1 queue.
>>>>> x86_64-native-linuxapp-gcc/app/testpmd -c 0x07 -n 3 -- -i
>>>>> --txqflags=0xf01 \
>>>>> --rxq=1 --txq=1 --rss-ip --nb-cores=1
>>>>>
>>>>> First,
>>>>> commit 09927b5249694bad1c094d3068124673722e6b8f
>>>>> vhost: translate ring addresses when IOMMU enabled
>>>>> This patch causes no traffic in the PVP test, but link status is
>>>>> still up in vhost-user.
>>>>>
>>>>> Second,
>>>>> eefac9536a901a1f0bb52aa3b6fec8f375f09190
>>>>> vhost: postpone device creation until rings are mapped
>>>>> The patch causes link status "down" in vhost-user.
>>>
>>> I reproduced this one, and understand why link status remains down.
>>> My series fixed a potential issue Michael raised: the vring
>>> addresses should only be interpreted once the ring is enabled.
>>> When VHOST_USER_F_PROTOCOL_FEATURES is negotiated, the ring addresses
>>> are translated when the ring is enabled via VHOST_USER_SET_VRING_ENABLE.
>>> When it is not negotiated, the ring is considered to start enabled, so
>>> translation is done at VHOST_USER_SET_VRING_KICK time.
>>>
>>> In your case, protocol features are negotiated, so the ring addresses
>>> are translated at enable time. The problem is that the code considers
>>> the device is ready once addresses for all the rings are translated.
>>> But since only the first pair of rings is used, it never happens, and
>>> the link remains down.
>>>
>>> One of the reasons this check is done is to avoid starting the PMD
>>> threads before the addresses are translated in case of NUMA
>>> reallocation, as virtqueues and virtio-net device structs can be
>>> reallocated on a different node.
>>>
>>> I think the right fix would be to only perform NUMA reallocation for
>>> vring 0, as today we would end up reallocating the virtio-net struct
>>> multiple times if VQs are on different NUMA nodes.
>>>
>>> Doing that, we could then consider the device is ready if vring 0 is
>>> enabled and its ring addresses are translated, and if other vrings have
>>> been kicked.
>>>
>>> I'll post a patch shortly implementing this idea.
>>
>> The proposed solution doesn't work, because disabled queues get accessed at
>> device start time:
>>
>> int
>> rte_vhost_enable_guest_notification(int vid, uint16_t queue_id, int enable)
>> {
>> ..
>> 	dev->virtqueue[queue_id]->used->flags = VRING_USED_F_NO_NOTIFY;
>> 	return 0;
>> }
>>
>> The above function is called in the Vhost PMD for every queue, enabled
>> or not. While we could fix the PMD, it could break other applications
>> using the Vhost lib API directly, so we cannot translate at enable
>> time reliably.
>>
>> I think we may be a bit less conservative, and postpone address
>> translation to kick time, whether VHOST_USER_F_PROTOCOL_FEATURES is
>> negotiated or not.
>>
>> Regards,
>> Maxime
>>
>>> Thanks,
>>> Maxime
> 
> I agree, enabling has nothing to do with it.
> 
> The spec is quite explicit:
> 
> Client must only process each ring when it is started.
> 
> and
> 
> Client must start ring upon receiving a kick (that is, detecting that file
> descriptor is readable) on the descriptor specified by
> VHOST_USER_SET_VRING_KICK, and stop ring upon receiving
> VHOST_USER_GET_VRING_BASE.
> 

Thanks for the confirmation, Michael. Fix posted.
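
For readers following the thread, here is a minimal sketch in C of the
per-ring behaviour discussed above: address translation happens at
VHOST_USER_SET_VRING_KICK time whether or not protocol features are
negotiated, VHOST_USER_SET_VRING_ENABLE has no effect on it, and
VHOST_USER_GET_VRING_BASE stops the ring. All names (ring_state,
ring_event, ring_handle_event, ring_may_process) are hypothetical and
this is not the code of the posted fix or of the DPDK vhost library.
As a simplification, it treats the SET_VRING_KICK message itself as the
start point, whereas the spec text quoted above ties the start to the
kick file descriptor becoming readable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event names mirroring the vhost-user requests above. */
enum ring_event {
	EV_SET_VRING_ENABLE, /* models VHOST_USER_SET_VRING_ENABLE (enable) */
	EV_SET_VRING_KICK,   /* models VHOST_USER_SET_VRING_KICK */
	EV_GET_VRING_BASE,   /* models VHOST_USER_GET_VRING_BASE */
};

/* Hypothetical per-ring state; not the DPDK vhost_virtqueue struct. */
struct ring_state {
	bool enabled;    /* set by the enable request only */
	bool translated; /* ring addresses have been translated */
	bool started;    /* ring may be processed by the backend */
};

/* Apply one event to a ring. */
static void ring_handle_event(struct ring_state *r, enum ring_event ev)
{
	assert(r != NULL);

	switch (ev) {
	case EV_SET_VRING_ENABLE:
		/* Enabling neither translates addresses nor starts the ring. */
		r->enabled = true;
		break;
	case EV_SET_VRING_KICK:
		/* Translate at kick time, regardless of negotiated features. */
		r->translated = true;
		r->started = true;
		break;
	case EV_GET_VRING_BASE:
		/* The ring is stopped and must no longer be processed. */
		r->started = false;
		break;
	}
}

/* A ring may only be processed while it is started. */
static bool ring_may_process(const struct ring_state *r)
{
	assert(r != NULL);
	return r->started;
}
```

With this model, a ring that has only been enabled is not processed and
its addresses are left untouched until it is kicked.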

Maxime