From: Maxime Coquelin
To: "Michael S. Tsirkin"
Cc: dev@dpdk.org, i.maximets@samsung.com, tiwei.bie@intel.com, zhihong.wang@intel.com, jfreiman@redhat.com
Date: Thu, 20 Dec 2018 16:32:13 +0100
Message-ID: <4cb93c35-24d4-1574-79f2-889772879556@redhat.com>
In-Reply-To: <20181220092201-mutt-send-email-mst@kernel.org>
References: <20181220100022.3531-1-maxime.coquelin@redhat.com> <20181220092201-mutt-send-email-mst@kernel.org>
Subject: Re: [dpdk-dev] [PATCH v3] vhost: batch used descs chains write-back with packed ring

On 12/20/18 3:30 PM, Michael S. Tsirkin wrote:
> On Thu, Dec 20, 2018 at 11:00:22AM +0100, Maxime Coquelin wrote:
>> Instead of writing back descriptor chains in order, let's write the
>> first chain's flags last in order to improve batching.
>>
>> With the kernel's pktgen benchmark, a ~3% performance gain is measured.
>>
>> Signed-off-by: Maxime Coquelin
>> ---
>>  lib/librte_vhost/virtio_net.c | 19 +++++++++++++++++--
>>  1 file changed, 17 insertions(+), 2 deletions(-)
>>
>> diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
>> index 8c657a101..66ccd3c35 100644
>> --- a/lib/librte_vhost/virtio_net.c
>> +++ b/lib/librte_vhost/virtio_net.c
>> @@ -97,6 +97,8 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
>>  {
>>  	int i;
>>  	uint16_t used_idx = vq->last_used_idx;
>> +	uint16_t head_idx = vq->last_used_idx;
>> +	uint16_t head_flags = 0;
>>
>>  	/* Split loop in two to save memory barriers */
>>  	for (i = 0; i < vq->shadow_used_idx; i++) {
>> @@ -126,12 +128,17 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
>>  			flags &= ~VRING_DESC_F_AVAIL;
>>  		}
>>
>> -		vq->desc_packed[vq->last_used_idx].flags = flags;
>> +		if (i > 0) {
>> +			vq->desc_packed[vq->last_used_idx].flags = flags;
>>
>> -		vhost_log_cache_used_vring(dev, vq,
>> +			vhost_log_cache_used_vring(dev, vq,
>>  					vq->last_used_idx *
>>  					sizeof(struct vring_packed_desc),
>>  					sizeof(struct vring_packed_desc));
>> +		} else {
>> +			head_idx = vq->last_used_idx;
>> +			head_flags = flags;
>> +		}
>>
>>  		vq->last_used_idx += vq->shadow_used_packed[i].count;
>>  		if (vq->last_used_idx >= vq->size) {
>> @@ -140,7 +147,15 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
>>  		}
>>  	}
>>
>> +	vq->desc_packed[head_idx].flags = head_flags;
>> +
>>  	rte_smp_wmb();
>> +
>> +	vhost_log_cache_used_vring(dev, vq,
>> +				head_idx *
>> +				sizeof(struct vring_packed_desc),
>> +				sizeof(struct vring_packed_desc));
>> +
>>  	vq->shadow_used_idx = 0;
>>  	vhost_log_cache_sync(dev, vq);
>
> How about moving rte_smp_wmb() into the logging functions?
> This way it's free with logging disabled, even on ARM...

That's what I initially suggested in my reply to v2. The problem is that,
in the split ring case, we already have a barrier before the cache sync,
and we need it even if logging is disabled.

But I think you are right: it might be better to have the barrier twice
in the split ring case when logging is enabled, and none for the packed
ring when logging is disabled.

I'll post a v4.

Thanks,
Maxime

>>  }
>> --
>> 2.17.2
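
A side note for readers following the barrier discussion above: below is a
minimal, self-contained sketch of the idea Michael raises, i.e. issuing the
write barrier inside the logging helper, behind the "is dirty logging
enabled?" check, so a run without live-migration logging pays no barrier at
all. This is not the DPDK implementation and not necessarily what v4 ended
up doing; the names log_ctx and log_cache_sync are invented for illustration,
and the C11 release fence stands in for DPDK's rte_smp_wmb().

#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the vhost dirty-logging state. */
struct log_ctx {
	bool log_enabled;	/* true while dirty-page logging is active */
	/* ... cached dirty bits would live here ... */
};

/*
 * "Move the barrier into the logging function": the barrier is only
 * executed when logging is on, so the fast path with logging disabled
 * pays nothing, even on weakly ordered CPUs such as ARM.
 */
static inline void
log_cache_sync(struct log_ctx *ctx)
{
	if (!ctx->log_enabled)
		return;

	/* Order the ring/descriptor writes before publishing dirty bits. */
	atomic_thread_fence(memory_order_release);

	/* ... flush the cached dirty bits to the shared log here ... */
}

Maxime's caveat maps onto this shape: for the split ring, the barrier before
the sync is needed even with logging off, so it cannot simply be folded into
the helper there; hence his plan to accept a duplicated barrier in the split
path when logging is enabled.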