From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by dpdk.org (Postfix) with ESMTP id E86D1160; Thu, 3 May 2018 13:55:51 +0200 (CEST) X-Amp-Result: UNKNOWN X-Amp-Original-Verdict: FILE UNKNOWN X-Amp-File-Uploaded: False Received: from orsmga007.jf.intel.com ([10.7.209.58]) by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 03 May 2018 04:55:50 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.49,358,1520924400"; d="scan'208";a="38127683" Received: from debian.sh.intel.com (HELO debian) ([10.67.104.164]) by orsmga007.jf.intel.com with ESMTP; 03 May 2018 04:55:49 -0700 Date: Thu, 3 May 2018 19:56:34 +0800 From: Tiwei Bie To: Maxime Coquelin Cc: dev@dpdk.org, jianfeng.tan@intel.com, mst@redhat.com, stable@dpdk.org Message-ID: <20180503115634.feaimkzpnbodferd@debian> References: <20180430155954.9939-1-maxime.coquelin@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20180430155954.9939-1-maxime.coquelin@redhat.com> User-Agent: NeoMutt/20170113 (1.7.2) Subject: Re: [dpdk-dev] [PATCH] vhost: improve dirty pages logging performance X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 03 May 2018 11:55:52 -0000 On Mon, Apr 30, 2018 at 05:59:54PM +0200, Maxime Coquelin wrote: > This patch caches all dirty pages logging until the used ring index > is updated. These dirty pages won't be accessed by the guest as > long as the host doesn't give them back to it by updating the > index. Below sentence in above commit message isn't the reason why we can cache the dirty page logging. Right? """ These dirty pages won't be accessed by the guest as long as the host doesn't give them back to it by updating the index. """ > > The goal of this optimization is to fix a performance regression > introduced when the vhost library started to use atomic operations > to set bits in the shared dirty log map. While the fix was valid > as previous implementation wasn't safe against concurent accesses, > contention was induced. > > With this patch, during migration, we have: > 1. Less atomic operations as only a single atomic OR operation > per 32 pages. Why not do it per 64 pages? > 2. Less atomic operations as during a burst, the same page will > be marked dirty only once. > 3. Less write memory barriers. > > Fixes: 897f13a1f726 ("vhost: make page logging atomic") > > Cc: stable@dpdk.org > > Suggested-by: Michael S. Tsirkin > Signed-off-by: Maxime Coquelin > --- > > Hi, > > This series was tested with migrating a guest while running PVP > benchmark at 1Mpps with both ovs-dpdk and testpmd as vswitch. If the throughput is higher (e.g. by adding more cores and queues), will the live migration fail due to the higher dirty page generating speed? > > With this patch we recover the packet drops regressions seen since > the use of atomic operations to log dirty pages. [...] > > +static __rte_always_inline void > +vhost_log_cache_sync(struct virtio_net *dev, struct vhost_virtqueue *vq) > +{ > + uint32_t *log_base; > + int i; > + > + if (likely(((dev->features & (1ULL << VHOST_F_LOG_ALL)) == 0) || > + !dev->log_base)) > + return; > + > + log_base = (uint32_t *)(uintptr_t)dev->log_base; > + > + /* To make sure guest memory updates are committed before logging */ > + rte_smp_wmb(); It seems that __sync_fetch_and_or() can be considered a full barrier [1]. So do we really need this rte_smp_wmb()? Besides, based on the same doc [1], it seems that the __sync_ version is deprecated in favor of the __atomic_ one. [1] https://gcc.gnu.org/onlinedocs/gcc/_005f_005fsync-Builtins.html > + > + for (i = 0; i < vq->log_cache_nb_elem; i++) { > + struct log_cache_entry *elem = vq->log_cache + i; > + > + __sync_fetch_and_or(log_base + elem->offset, elem->val); > + } > + > + vq->log_cache_nb_elem = 0; > +} > + [...]