From: "Xie, Huawei"
To: Stephen Hemminger
Cc: "dev@dpdk.org", "ms >> Michael S. Tsirkin"
Date: Tue, 8 Sep 2015 15:52:35 +0000
Subject: Re: [dpdk-dev] virtio optimization idea

On 9/8/2015 11:39 PM, Stephen Hemminger wrote:
> On Fri, 4 Sep 2015 08:25:05 +0000
> "Xie, Huawei" wrote:
>
>> Hi:
>>
>> Recently I did a virtio optimization proof of concept.
>> The optimization includes two parts:
>> 1) avail ring set with fixed descriptors
>> 2) RX vectorization
>> With these optimizations, we could get a several-fold performance boost
>> in pure vhost-virtio throughput.
>>
>> Here I will only cover the first part, which is a prerequisite for the
>> second.
>> Let us take RX as an example first. Currently, when we fill the avail
>> ring with guest mbufs, we need to:
>> a) allocate one descriptor (for a non-SG mbuf) from the free descriptors
>> b) set the index of that desc in the avail ring entry
>> c) set the addr/len fields of the descriptor to point to the guest's
>>    blank mbuf data area
>>
>> These operations take time, and step b in particular leaves the avail
>> ring's cache line in the Modified (M) state on the virtio processing
>> core. When vhost processes the avail ring, transferring that cache line
>> from the virtio processing core to the vhost processing core costs a
>> lot of CPU cycles.
>> To solve this problem, the RX ring of the DPDK PMD (for the
>> non-mergeable case) is arranged as follows:
>>
>>  avail
>>   idx
>>    +
>>    |
>> +-----+-----+-----+-----+-----+-----+
>> |  0  |  1  |  2  | ... | 254 | 255 |  avail ring
>> +--+--+--+--+--+--+-----+--+--+--+--+
>>    |     |     |           |     |
>>    |     |     |           |     |
>>    v     v     v           v     v
>> +--+--+--+--+--+--+-----+--+--+--+--+
>> |  0  |  1  |  2  | ... | 254 | 255 |  desc ring
>> +-----+-----+-----+-----+-----+-----+
>>    |
>>    |
>> +-----+-----+-----+-----+-----+-----+
>> |  0  |  1  |  2  |     | 254 | 255 |  used ring
>> +-----+-----+-----+-----+-----+-----+
>>    |
>>    +
>>
>> The avail ring is initialized with fixed descriptors and never changes,
>> i.e., the nth avail ring entry always holds descriptor index n. This
>> means the virtio PMD actually refills only the desc ring, without
>> having to modify the avail ring entries.
>> When vhost fetches the avail ring, it is therefore always in vhost's
>> first-level cache unless it has been evicted.
>>
>> When RX receives packets from the used ring, we use the used ring index
>> (used->idx) as the desc idx: the used ring slot position directly gives
>> the descriptor index. This requires that vhost process descs and return
>> them from the avail ring to the used ring in order, which is true for
>> both the current DPDK vhost and kernel vhost implementations. In my
>> understanding, there is no need for vhost net to process descriptors
>> out of order (OOO). One possible exception is zero copy: for example, if
>> a descriptor doesn't meet the zero-copy requirements, vhost could return
>> it to the used ring directly, ahead of the descriptors in front of it.
>> To enforce this, I want to use a reserved bit to indicate in-order
>> processing of descriptors.
>>
>> For the TX ring, the arrangement is shown below. Each transmitted mbuf
>> needs a desc for the virtio_net_hdr, so we actually have only 128 free
>> slots.
>>
>>                         ++
>>                         ||
>>                         ||
>> +-----+-----+-----+-----++-----+-----+-----+-----+
>> |  0  |  1  | ... | 127 || 128 | 129 | ... | 255 |  avail ring with fixed descriptors
>> +--+--+--+--+-----+--+--++--+--+--+--+-----+--+--+
>>    |     |           |  ||  |     |           |
>>    v     v           v  ||  v     v           v
>> +--+--+--+--+-----+--+--++--+--+--+--+-----+--+--+
>> | 128 | 129 | ... | 255 || 128 | 129 | ... | 255 |  desc ring for virtio_net_hdr
>> +--+--+--+--+-----+--+--++--+--+--+--+-----+--+--+
>>    |     |           |  ||  |     |           |
>>    v     v           v  ||  v     v           v
>> +--+--+--+--+-----+--+--++--+--+--+--+-----+--+--+
>> |  0  |  1  | ... | 127 ||  0  |  1  | ... | 127 |  desc ring for tx data
>> +-----+-----+-----+-----++-----+-----+-----+-----+
>>
> Does this still work with Linux (or BSD) guest/host?
> If you are assuming both virtio/vhost are DPDK, this is never going
> to be usable.
It works with both the DPDK vhost and kernel vhost implementations.
But to enforce this, we had better add a new feature bit.
>
> On a related note, have you looked at getting virtio to support the
> new standard (not legacy) mode?
Yes, we have added virtio 1.0 support to our plan.
>
>
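The RX refill with a fixed avail ring, as described in the quoted proposal, can be sketched in C as follows. This is a minimal illustration, not the author's PMD code: the vring structures are simplified versions of the legacy virtio split-ring layout, and `rx_vq`, `rx_ring_init_fixed`, `rx_refill_one` and `rx_publish` are hypothetical names. It shows that once avail entry n permanently holds descriptor n, a refill writes only the descriptor (step c) plus the free-running avail index, so the avail ring entries are never rewritten after setup.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 256u           /* queue size; must be a power of two */
#define VRING_DESC_F_WRITE 2u    /* buffer is device-writable (RX) */

/* Simplified split-ring layouts modeled on the legacy virtio spec. */
struct vring_desc  { uint64_t addr; uint32_t len; uint16_t flags; uint16_t next; };
struct vring_avail { uint16_t flags; uint16_t idx; uint16_t ring[RING_SIZE]; };

/* Hypothetical driver-side RX queue state. */
struct rx_vq {
    struct vring_desc  desc[RING_SIZE];
    struct vring_avail avail;
    uint16_t           avail_idx;   /* free-running producer index */
};

/* One-time setup: avail entry n always names descriptor n. */
static void rx_ring_init_fixed(struct rx_vq *vq)
{
    assert((RING_SIZE & (RING_SIZE - 1)) == 0); /* masking needs 2^k */
    for (uint16_t n = 0; n < RING_SIZE; n++)
        vq->avail.ring[n] = n;
    vq->avail.idx = 0;
    vq->avail_idx = 0;
}

/* Refill one slot. Steps a) and b) vanish: the slot number is the
 * descriptor index, so only the descriptor itself (step c) is written. */
static void rx_refill_one(struct rx_vq *vq, uint64_t buf_addr, uint32_t buf_len)
{
    uint16_t slot = vq->avail_idx & (RING_SIZE - 1);

    vq->desc[slot].addr  = buf_addr;
    vq->desc[slot].len   = buf_len;
    vq->desc[slot].flags = VRING_DESC_F_WRITE;
    vq->avail_idx++;                 /* wraps modulo 2^16 */
}

/* Publish the refilled buffers with a single store to avail.idx.
 * A real driver must issue a write barrier before this store so vhost
 * sees the descriptor writes first; omitted in this single-threaded sketch. */
static void rx_publish(struct rx_vq *vq)
{
    vq->avail.idx = vq->avail_idx;
}
```

In this sketch a burst of refills is followed by one `rx_publish` call, so the only shared avail-ring field written per burst is `avail.idx`.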
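The in-order requirement in the quoted proposal is what lets the RX path take the descriptor index from its own position in the used ring. Below is a minimal sketch, assuming vhost completes descriptors strictly in order (the property the proposed bit would indicate) and using simplified structures modeled on the legacy virtio used ring. `rx_cons`, `sw_ring` and `rx_dequeue_in_order` are hypothetical names, and the `assert` on `ring[].id` is only a sanity check: under the in-order assumption the driver would not need that load at all.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 256u   /* must be a power of two */

/* Simplified used-ring layout modeled on the legacy virtio spec. */
struct vring_used_elem { uint32_t id; uint32_t len; };
struct vring_used {
    uint16_t flags;
    uint16_t idx;                           /* written by vhost */
    struct vring_used_elem ring[RING_SIZE];
};

/* Hypothetical driver-side consumer state. */
struct rx_cons {
    void    *sw_ring[RING_SIZE];   /* buffer posted in descriptor n */
    uint16_t used_cons_idx;        /* free-running consumer index */
};

/* Dequeue up to max completed buffers, assuming in-order completion.
 * Returns the number of buffers dequeued. */
static unsigned rx_dequeue_in_order(struct rx_cons *c, const struct vring_used *used,
                                    void **out, uint32_t *lens, unsigned max)
{
    /* A real driver needs a read barrier after loading used->idx. */
    uint16_t ready = (uint16_t)(used->idx - c->used_cons_idx);
    unsigned n = ready < max ? ready : max;

    for (unsigned i = 0; i < n; i++) {
        /* In order: the kth used entry completes descriptor k (mod size),
         * so the desc idx comes from our position, not from ring[].id. */
        uint16_t desc_idx = (uint16_t)(c->used_cons_idx + i) & (RING_SIZE - 1);

        assert(used->ring[desc_idx].id == desc_idx); /* sanity check only */
        out[i]  = c->sw_ring[desc_idx];
        lens[i] = used->ring[desc_idx].len;
    }
    c->used_cons_idx = (uint16_t)(c->used_cons_idx + n);
    return n;
}
```

Both indices are free-running 16-bit counters, so the subtraction and masking stay correct across wrap-around.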
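The TX arrangement in the quoted diagram can be set up once at queue initialization. Below is a minimal sketch under the same simplified-structure assumptions. Both halves of the avail ring point at header descriptors 128..255, header descriptor 128+n chains to data descriptor n, and only the data descriptor is rewritten per packet. `tx_vq`, `hdr_base`, `tx_ring_init_fixed` and `tx_enqueue_one` are hypothetical names. The 10-byte header size assumes the legacy `struct virtio_net_hdr` without mergeable RX buffers.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE   256u
#define HALF        (RING_SIZE / 2)  /* only 128 usable TX slots */
#define VRING_DESC_F_NEXT 1u         /* descriptor chains to .next */
#define NET_HDR_LEN 10u              /* legacy virtio_net_hdr, no mergeable rx */

struct vring_desc  { uint64_t addr; uint32_t len; uint16_t flags; uint16_t next; };
struct vring_avail { uint16_t flags; uint16_t idx; uint16_t ring[RING_SIZE]; };

/* Hypothetical driver-side TX queue state. */
struct tx_vq {
    struct vring_desc  desc[RING_SIZE];
    struct vring_avail avail;
    uint16_t           avail_idx;   /* free-running producer index */
};

/* One-time setup. hdr_base is a hypothetical address of an array of
 * HALF pre-zeroed virtio_net_hdr structures. */
static void tx_ring_init_fixed(struct tx_vq *vq, uint64_t hdr_base)
{
    assert((RING_SIZE & (RING_SIZE - 1)) == 0);
    for (uint16_t n = 0; n < HALF; n++) {
        struct vring_desc *hdr = &vq->desc[HALF + n];

        hdr->addr  = hdr_base + (uint64_t)n * NET_HDR_LEN;
        hdr->len   = NET_HDR_LEN;
        hdr->flags = VRING_DESC_F_NEXT;
        hdr->next  = n;                 /* header desc 128+n chains to data desc n */
        vq->desc[n].flags = 0;          /* data desc ends the chain */

        /* Both halves of the avail ring name the same header descs. */
        vq->avail.ring[n]        = (uint16_t)(HALF + n);
        vq->avail.ring[HALF + n] = (uint16_t)(HALF + n);
    }
    vq->avail.idx = 0;
    vq->avail_idx = 0;
}

/* Queue one packet: only the data descriptor is written. Reclaiming
 * completed slots (at most HALF in flight) and publishing avail.idx
 * behind a write barrier are not shown. */
static void tx_enqueue_one(struct tx_vq *vq, uint64_t pkt_addr, uint32_t pkt_len)
{
    uint16_t data = vq->avail_idx & (HALF - 1);

    vq->desc[data].addr = pkt_addr;
    vq->desc[data].len  = pkt_len;
    vq->avail_idx++;
}
```

With this layout, avail slot k always resolves to header descriptor 128 + (k mod 128) and then to data descriptor k mod 128. That is the same slot `tx_enqueue_one` fills, so the avail ring entries never change.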