From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <maxime.coquelin@redhat.com>
Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28])
 by dpdk.org (Postfix) with ESMTP id 1CD3AF04
 for <dev@dpdk.org>; Wed,  6 Sep 2017 09:15:56 +0200 (CEST)
Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com
 [10.5.11.12])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by mx1.redhat.com (Postfix) with ESMTPS id 62CEB267C1;
 Wed,  6 Sep 2017 07:15:55 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com 62CEB267C1
Authentication-Results: ext-mx06.extmail.prod.ext.phx2.redhat.com;
 dmarc=none (p=none dis=none) header.from=redhat.com
Authentication-Results: ext-mx06.extmail.prod.ext.phx2.redhat.com;
 spf=fail smtp.mailfrom=maxime.coquelin@redhat.com
Received: from [10.36.112.20] (ovpn-112-20.ams2.redhat.com [10.36.112.20])
 by smtp.corp.redhat.com (Postfix) with ESMTPS id 318D26FAAB;
 Wed,  6 Sep 2017 07:15:49 +0000 (UTC)
To: Tiwei Bie <tiwei.bie@intel.com>
Cc: dev@dpdk.org, yliu@fridaylinux.org, jfreiman@redhat.com, mst@redhat.com,
 vkaplans@redhat.com, jasowang@redhat.com, lei.a.yao@intel.com,
 cunming.liang@intel.com
References: <20170831095023.21037-1-maxime.coquelin@redhat.com>
 <20170831095023.21037-4-maxime.coquelin@redhat.com>
 <20170905044516.GC31895@debian-ZGViaWFuCg>
 <68468145-5b45-5875-b37f-35df3482379a@redhat.com>
 <20170905100751.GA7290@debian-ZGViaWFuCg>
 <0362ed01-211f-d4fc-d4ae-11ea81ad5df1@redhat.com>
 <20170906011459.GA3965@debian-ZGViaWFuCg>
From: Maxime Coquelin <maxime.coquelin@redhat.com>
Message-ID: <06b50f42-9151-9b94-646e-e2e2b153a957@redhat.com>
Date: Wed, 6 Sep 2017 09:15:47 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
 Thunderbird/52.2.1
MIME-Version: 1.0
In-Reply-To: <20170906011459.GA3965@debian-ZGViaWFuCg>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 7bit
X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16
 (mx1.redhat.com [10.5.110.30]); Wed, 06 Sep 2017 07:15:55 +0000 (UTC)
Subject: Re: [dpdk-dev] [PATCH 03/21] vhost: protect virtio_net device struct
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Wed, 06 Sep 2017 07:15:56 -0000

Hi Tiwei,

On 09/06/2017 03:15 AM, Tiwei Bie wrote:
> On Tue, Sep 05, 2017 at 01:00:42PM +0200, Maxime Coquelin wrote:
>> On 09/05/2017 12:07 PM, Tiwei Bie wrote:
>>> On Tue, Sep 05, 2017 at 11:24:14AM +0200, Maxime Coquelin wrote:
>>>> On 09/05/2017 06:45 AM, Tiwei Bie wrote:
>>>>> On Thu, Aug 31, 2017 at 11:50:05AM +0200, Maxime Coquelin wrote:
>>>>>> The virtio_net device might be accessed while being reallocated
>>>>>> in case of NUMA awareness. This case might be theoretical,
>>>>>> but the protection will be needed anyway to guard the vring pages
>>>>>> against invalidation.
>>>>>>
>>>>>> The virtio_net devs are now protected with a readers/writers
>>>>>> lock, so that before reallocating the device, it is ensured
>>>>>> that it is not being referenced by the processing threads.
>>>>>>
>>>>> [...]
>>>>>> +struct virtio_net *
>>>>>> +get_device(int vid)
>>>>>> +{
>>>>>> +	struct virtio_net *dev;
>>>>>> +
>>>>>> +	rte_rwlock_read_lock(&vhost_devices[vid].lock);
>>>>>> +
>>>>>> +	dev = __get_device(vid);
>>>>>> +	if (unlikely(!dev))
>>>>>> +		rte_rwlock_read_unlock(&vhost_devices[vid].lock);
>>>>>> +
>>>>>> +	return dev;
>>>>>> +}
>>>>>> +
>>>>>> +void
>>>>>> +put_device(int vid)
>>>>>> +{
>>>>>> +	rte_rwlock_read_unlock(&vhost_devices[vid].lock);
>>>>>> +}
>>>>>> +
>>>>>
>>>>> This patch introduced a per-device rwlock which needs to be acquired
>>>>> unconditionally in the data path. So for each vhost device, the IO
>>>>> threads of different queues will need to acquire/release this lock
>>>>> during each enqueue and dequeue operation, which will cause cache
>>>>> contention when multiple queues are enabled and handled by different
>>>>> cores. With this patch alone, I saw a ~7% performance drop when
>>>>> enabling 6 queues in a 64-byte iofwd loopback test. Is there any way
>>>>> to avoid introducing this lock into the data path?
>>>>
>>>> First, I'd like to thank you for running the MQ test.
>>>> I agree it may have a performance impact in this case.
>>>>
>>>> This lock has currently two purposes:
>>>> 1. Prevent referencing a freed virtio_net struct in case of numa_realloc.
>>>> 2. Protect vring pages against invalidation.
>>>>
>>>> For 2., it can be fixed by using the per-vq IOTLB lock (this was not
>>>> possible in my early prototypes, which had a per-device IOTLB cache).
>>>>
>>>> For 1., this is an existing problem, so we might consider it acceptable
>>>> to keep the current state. Maybe it could be improved by only
>>>> reallocating when VQ0 is not on the right NUMA node, since the other
>>>> VQs are not initialized at this point.
>>>>
>>>> If we do this, we might be able to get rid of this lock, but I need
>>>> some more time to ensure I'm not missing something.
>>>>
>>>> What do you think?
>>>>
>>>
>>> Cool. So it's possible that the lock in the data path will be
>>> acquired only when the IOMMU feature is enabled. That would be
>>> great!
>>>
>>> Besides, I just did a very simple MQ test to verify my thoughts.
>>> Lei (CC'ed in this mail) may do a thorough performance test for
>>> this patch set to evaluate the performance impacts.
>>
>> I'll try to post the v2 this week, including the proposed change.
>> Maybe it'd be better if Lei waits for the v2.
>>
> 
> Cool. Sure. Thank you! :)
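
For reference, this is essentially the v1 data-path pattern that causes the
contention you measured: every enqueue/dequeue has to bracket the device
lookup with the new put_device() helper, so all queues of a given device
keep bouncing the same per-device lock. A simplified sketch (ignoring the
mergeable-buffers path, not the exact code in the tree):

uint16_t
rte_vhost_enqueue_burst(int vid, uint16_t queue_id,
		struct rte_mbuf **pkts, uint16_t count)
{
	struct virtio_net *dev;
	uint16_t nb_tx;

	/* Takes vhost_devices[vid].lock for read and keeps it held. */
	dev = get_device(vid);
	if (!dev)
		return 0; /* get_device() already dropped the lock here */

	nb_tx = virtio_dev_rx(dev, queue_id, pkts, count);

	/* Release the read lock once the burst is done. */
	put_device(vid);

	return nb_tx;
}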

I have done the changes; you can find the v2 in my GitLab repo:
https://gitlab.com/mcoquelin/dpdk-next-virtio/commits/vhost_iotlb_v2
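
The intent for the data path is the one we discussed: avoid the device-level
lock and rely on the per-vq IOTLB lock, taken only when the IOMMU feature
has been negotiated, so the other cases pay nothing. Roughly something along
these lines in the enqueue/dequeue paths (an illustrative sketch of the
idea, please check the branch for the actual code):

	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
		vhost_user_iotlb_rd_lock(vq);

	/* descriptor address translation and packet processing */

	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
		vhost_user_iotlb_rd_unlock(vq);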

I'm testing it right now, but if you'd like to run some early benchmarks
before I post the series, there it is!
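
For the numa_realloc side, the idea is to only consider reallocating the
device when VQ0 is being set up, since the other virtqueues are not
initialized at that point. A rough illustration of that idea (simplified,
not necessarily how the branch implements it):

static struct virtio_net *
numa_realloc(struct virtio_net *dev, int index)
{
	/* Only VQ0 may trigger a reallocation; later vrings must not
	 * move the struct under the processing threads. */
	if (index != 0)
		return dev;

	/* ... existing reallocation based on the NUMA node
	 *     backing the vring ... */
	return dev;
}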

Thanks,
Maxime
> Best regards,
> Tiwei Bie
>