From: "Xie, Huawei"
To: Yuanhan Liu
CC: "dev@dpdk.org", "Michael S. Tsirkin"
Date: Fri, 3 Jun 2016 08:18:23 +0000
Subject: Re: [dpdk-dev] [PATCH 1/3] vhost: pre update used ring for Tx and Rx
References: <1462236378-7604-1-git-send-email-yuanhan.liu@linux.intel.com>
 <1462236378-7604-2-git-send-email-yuanhan.liu@linux.intel.com>
 <20160601065557.GB10038@yliu-dev.sh.intel.com>

On 6/1/2016 2:53 PM, Yuanhan Liu wrote:
> On Wed, Jun 01, 2016 at 06:40:41AM +0000, Xie, Huawei wrote:
>>>     /* Retrieve all of the head indexes first to avoid caching issues. */
>>>     for (i = 0; i < count; i++) {
>>> -           desc_indexes[i] = vq->avail->ring[(vq->last_used_idx + i) &
>>> -                                   (vq->size - 1)];
>>> +           used_idx = (vq->last_used_idx + i) & (vq->size - 1);
>>> +           desc_indexes[i] = vq->avail->ring[used_idx];
>>> +
>>> +           vq->used->ring[used_idx].id  = desc_indexes[i];
>>> +           vq->used->ring[used_idx].len = 0;
>>> +           vhost_log_used_vring(dev, vq,
>>> +                   offsetof(struct vring_used, ring[used_idx]),
>>> +                   sizeof(vq->used->ring[used_idx]));
>>>     }
>>>
>>>     /* Prefetch descriptor index. */
>>>     rte_prefetch0(&vq->desc[desc_indexes[0]]);
>>> -   rte_prefetch0(&vq->used->ring[vq->last_used_idx & (vq->size - 1)]);
>>> -
>>>     for (i = 0; i < count; i++) {
>>>             int err;
>>>
>>> -           if (likely(i + 1 < count)) {
>>> +           if (likely(i + 1 < count))
>>>                     rte_prefetch0(&vq->desc[desc_indexes[i + 1]]);
>>> -                   rte_prefetch0(&vq->used->ring[(used_idx + 1) &
>>> -                                   (vq->size - 1)]);
>>> -           }
>>>
>>>             pkts[i] = rte_pktmbuf_alloc(mbuf_pool);
>>>             if (unlikely(pkts[i] == NULL)) {
>>> @@ -916,18 +920,12 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
>>>                     rte_pktmbuf_free(pkts[i]);
>>>                     break;
>>>             }
>>> -
>>> -           used_idx = vq->last_used_idx++ & (vq->size - 1);
>>> -           vq->used->ring[used_idx].id  = desc_indexes[i];
>>> -           vq->used->ring[used_idx].len = 0;
>>> -           vhost_log_used_vring(dev, vq,
>>> -                   offsetof(struct vring_used, ring[used_idx]),
>>> -                   sizeof(vq->used->ring[used_idx]));
>>>     }
>> Had tried post-updating the used ring in batch, but I forget the perf change.
> I would assume pre-updating gives a better performance gain, as we are
> fiddling with the avail and used rings together, which would be more
> cache friendly.

The distance between an entry in the avail ring and the corresponding
entry in the used ring is at least 8 cache lines.
The benefit comes from the batched updates, if applicable.

>
>> One optimization would be on vhost_log_used_vring.
>> I have two ideas:
>> a) On the QEMU side, always assume the used ring will be changed, so
>> that we don't need to log the used ring in vhost.
>>
>> Michael: feasible in QEMU? Comments on this?
>>
>> b) We could always mark the whole used ring as modified, rather than
>> entry by entry.
> I doubt it's worthwhile. One fact is that vhost_log_used_vring is
> a no-op most of the time: it takes action only during the short
> window of live migration.
>
> And FYI, I even tried with all the vhost_log_xxx calls removed, and it
> showed no performance boost at all. Therefore, it's not a factor that
> will impact performance.

I knew this.

> --yliu