From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <maxime.coquelin@redhat.com>
Received: from mx1.redhat.com (mx3-rdu2.redhat.com [66.187.233.73])
 by dpdk.org (Postfix) with ESMTP id 90A421BE0;
 Fri,  4 May 2018 17:48:10 +0200 (CEST)
Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.rdu2.redhat.com
 [10.11.54.5])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by mx1.redhat.com (Postfix) with ESMTPS id 031E68DC4E;
 Fri,  4 May 2018 15:48:10 +0000 (UTC)
Received: from [10.36.112.52] (ovpn-112-52.ams2.redhat.com [10.36.112.52])
 by smtp.corp.redhat.com (Postfix) with ESMTPS id AE7D383B85;
 Fri,  4 May 2018 15:48:06 +0000 (UTC)
To: Tiwei Bie <tiwei.bie@intel.com>
Cc: dev@dpdk.org, jianfeng.tan@intel.com, mst@redhat.com, stable@dpdk.org
References: <20180430155954.9939-1-maxime.coquelin@redhat.com>
 <20180503115634.feaimkzpnbodferd@debian>
From: Maxime Coquelin <maxime.coquelin@redhat.com>
Message-ID: <6125f044-d557-666a-8228-4930ead16089@redhat.com>
Date: Fri, 4 May 2018 17:48:05 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
 Thunderbird/52.7.0
MIME-Version: 1.0
In-Reply-To: <20180503115634.feaimkzpnbodferd@debian>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 7bit
X-Scanned-By: MIMEDefang 2.79 on 10.11.54.5
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16
 (mx1.redhat.com [10.11.55.2]); Fri, 04 May 2018 15:48:10 +0000 (UTC)
X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.2]);
 Fri, 04 May 2018 15:48:10 +0000 (UTC) for IP:'10.11.54.5'
 DOMAIN:'int-mx05.intmail.prod.int.rdu2.redhat.com'
 HELO:'smtp.corp.redhat.com' FROM:'maxime.coquelin@redhat.com' RCPT:''
Subject: Re: [dpdk-dev] [PATCH] vhost: improve dirty pages logging
	performance
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Fri, 04 May 2018 15:48:10 -0000

Hi Tiwei,

On 05/03/2018 01:56 PM, Tiwei Bie wrote:
> On Mon, Apr 30, 2018 at 05:59:54PM +0200, Maxime Coquelin wrote:
>> This patch caches all dirty pages logging until the used ring index
>> is updated. These dirty pages won't be accessed by the guest as
>> long as the host doesn't give them back to it by updating the
>> index.
> 
> Below sentence in above commit message isn't the reason why
> we can cache the dirty page logging. Right?
> 
> """
> These dirty pages won't be accessed by the guest as
> long as the host doesn't give them back to it by updating the
> index.
> """
> 
>>
>> The goal of this optimization is to fix a performance regression
>> introduced when the vhost library started to use atomic operations
>> to set bits in the shared dirty log map. While the fix was valid
>> as previous implementation wasn't safe against concurent accesses,
>> contention was induced.
>>
>> With this patch, during migration, we have:
>> 1. Less atomic operations as only a single atomic OR operation
>> per 32 pages.
> 
> Why not do it per 64 pages?

I wasn't sure it would be supported on 32bits CPU, but [0] seems do
indicate that only 128bits atomic operations aren't supported on all
architectures.

I will change to do it per 64 pages.

[0]: 
https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html#g_t_005f_005fatomic-Builtins
>> 2. Less atomic operations as during a burst, the same page will
>> be marked dirty only once.
>> 3. Less write memory barriers.
>>
>> Fixes: 897f13a1f726 ("vhost: make page logging atomic")
>>
>> Cc: stable@dpdk.org
>>
>> Suggested-by: Michael S. Tsirkin <mst@redhat.com>
>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>> ---
>>
>> Hi,
>>
>> This series was tested with migrating a guest while running PVP
>> benchmark at 1Mpps with both ovs-dpdk and testpmd as vswitch.
> 
> If the throughput is higher (e.g. by adding more cores
> and queues), will the live migration fail due to the
> higher dirty page generating speed?

I haven't done the check, but I agree that the higher is the throughput,
the longer will be the migration duration and the higher the risk it
never converges.

In this case of scenario, postcopy live-migration way be a better fit,
as the hot pages will be copied only once. Postcopy support for
vhost-user have been added to QEMU in v2.12, and I have a prototype for
DPDK that I plan to propose for next release.

> 
>>
>> With this patch we recover the packet drops regressions seen since
>> the use of atomic operations to log dirty pages.
> [...]
>>   
>> +static __rte_always_inline void
>> +vhost_log_cache_sync(struct virtio_net *dev, struct vhost_virtqueue *vq)
>> +{
>> +	uint32_t *log_base;
>> +	int i;
>> +
>> +	if (likely(((dev->features & (1ULL << VHOST_F_LOG_ALL)) == 0) ||
>> +		   !dev->log_base))
>> +		return;
>> +
>> +	log_base = (uint32_t *)(uintptr_t)dev->log_base;
>> +
>> +	/* To make sure guest memory updates are committed before logging */
>> +	rte_smp_wmb();
> 
> It seems that __sync_fetch_and_or() can be considered a full
> barrier [1]. So do we really need this rte_smp_wmb()?

That's a good point, thanks for the pointer.

> Besides, based on the same doc [1], it seems that the __sync_
> version is deprecated in favor of the __atomic_ one.

I will change to __atomic_. For the memory model, do you agree I should
use __ATOMIC_SEQ_CST?

> [1] https://gcc.gnu.org/onlinedocs/gcc/_005f_005fsync-Builtins.html
> 
>> +
>> +	for (i = 0; i < vq->log_cache_nb_elem; i++) {
>> +		struct log_cache_entry *elem = vq->log_cache + i;
>> +
>> +		__sync_fetch_and_or(log_base + elem->offset, elem->val);
>> +	}
>> +
>> +	vq->log_cache_nb_elem = 0;
>> +}
>> +
> [...]
> 

Thanks,
Maxime