From: Maxime Coquelin
To: Tiwei Bie
Cc: dev@dpdk.org, i.maximets@samsung.com, zhihong.wang@intel.com, jfreiman@redhat.com, mst@redhat.com
Date: Thu, 20 Dec 2018 10:27:46 +0100
Message-ID: <543cf4ff-4712-da50-8a26-d51d6dfaa8d7@redhat.com>
References: <20181219092952.25728-1-maxime.coquelin@redhat.com> <20181220044446.GB21484@dpdk-tbie.sh.intel.com>
Subject: Re: [dpdk-dev] [PATCH v2] vhost: batch used descs chains write-back with packed ring
List-Id: DPDK patches and discussions

On 12/20/18 9:49 AM, Maxime Coquelin wrote:
>
>
> On 12/20/18 5:44 AM, Tiwei Bie wrote:
>> On Wed, Dec 19, 2018 at 10:29:52AM +0100, Maxime Coquelin wrote:
>>> Instead of writing back descriptor chains in order, let's
>>> write the first chain's flags last in order to improve batching.
>>>
>>> With the kernel's pktgen benchmark, a ~3% performance gain is measured.
>>>
>>> Signed-off-by: Maxime Coquelin
>>> ---
>>>
>>> V2:
>>> Revert back to the initial implementation to have a write
>>> barrier before every desc flags store, but still
>>> store the first desc flags last. (Missing barrier reported
>>> by Ilya)
>>>
>>>   lib/librte_vhost/virtio_net.c | 19 ++++++++++++++++---
>>>   1 file changed, 16 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
>>> index 8c657a101..de436af79 100644
>>> --- a/lib/librte_vhost/virtio_net.c
>>> +++ b/lib/librte_vhost/virtio_net.c
>>> @@ -97,6 +97,8 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
>>>   {
>>>       int i;
>>>       uint16_t used_idx = vq->last_used_idx;
>>> +    uint16_t head_idx = vq->last_used_idx;
>>> +    uint16_t head_flags = 0;
>>>
>>>       /* Split loop in two to save memory barriers */
>>>       for (i = 0; i < vq->shadow_used_idx; i++) {
>>> @@ -126,12 +128,17 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
>>>               flags &= ~VRING_DESC_F_AVAIL;
>>>           }
>>>
>>> -        vq->desc_packed[vq->last_used_idx].flags = flags;
>>> +        if (i > 0) {
>>> +            vq->desc_packed[vq->last_used_idx].flags = flags;
>>>
>>> -        vhost_log_cache_used_vring(dev, vq,
>>> +            vhost_log_cache_used_vring(dev, vq,
>>>                       vq->last_used_idx *
>>>                       sizeof(struct vring_packed_desc),
>>>                       sizeof(struct vring_packed_desc));
>>> +        } else {
>>> +            head_idx = vq->last_used_idx;
>>> +            head_flags = flags;
>>> +        }
>>>
>>>           vq->last_used_idx += vq->shadow_used_packed[i].count;
>>>           if (vq->last_used_idx >= vq->size) {
>>> @@ -140,7 +147,13 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
>>>           }
>>>       }
>>>
>>> -    rte_smp_wmb();
>>> +    vq->desc_packed[head_idx].flags = head_flags;
>>> +
>>> +    vhost_log_cache_used_vring(dev, vq,
>>> +                vq->last_used_idx *
>>
>> Should be head_idx.
>
> Oh yes, thanks for spotting this.
>
>>
>>> +                sizeof(struct vring_packed_desc),
>>> +                sizeof(struct vring_packed_desc));
>>> +
>>>       vq->shadow_used_idx = 0;
>>
>> A wmb() is needed before log_cache_sync?
>
> I think you're right. I was wrong, but thought we had a barrier in the
> cache sync function.
> That's not very important for x86, but I think it would be preferable
> to do it in vhost_log_cache_sync(), if logging is enabled.
>
> What do you think?

I'll keep it in this function for now, as I think we cannot remove the
one in the split variant, so it would mean having two barriers in that
case.

>>>       vhost_log_cache_sync(dev, vq);
>>>   }
>>> --
>>> 2.17.2
>>>