From: "Xie, Huawei"
To: Stephen Hemminger
Cc: "dev@dpdk.org", "virtualization@lists.linux-foundation.org", "Michael S. Tsirkin"
Subject: Re: [dpdk-dev] virtio optimization idea
Date: Thu, 17 Sep 2015 15:41:36 +0000

On 9/8/2015 11:54 PM, Xie, Huawei wrote:
> On 9/8/2015 11:39 PM, Stephen Hemminger wrote:
>> On Fri, 4 Sep 2015 08:25:05 +0000
>> "Xie, Huawei" wrote:
>>
>>> Hi:
>>>
>>> Recently I have done a virtio optimization proof of concept. The
>>> optimization includes two parts:
>>> 1) avail ring set with fixed descriptors
>>> 2) RX vectorization
>>> With these optimizations, we see a performance boost of several times
>>> for pure vhost-virtio throughput.
>>>
>>> Here I will only cover the first part, which is the prerequisite for
>>> the second part.
>>> Let us take RX as an example first. Currently, when we fill the avail
>>> ring with guest mbufs, we need to:
>>> a) allocate one descriptor (for a non-sg mbuf) from the free descriptors
>>> b) set the idx of the desc into the entry of the avail ring
>>> c) set the addr/len fields of the descriptor to point to the blank
>>>    guest mbuf data area
>>>
>>> Those operations take time, and especially step b) puts the cache line
>>> of the avail ring into the modified (M) state on the virtio processing
>>> core. When vhost processes the avail ring, the cache line transfer from
>>> the virtio processing core to the vhost processing core costs quite a
>>> few CPU cycles.
>>> To solve this problem, this is the arrangement of the RX ring for the
>>> DPDK PMD (for the non-mergeable case):
>>>
>>>     avail
>>>     idx
>>>      +
>>>      |
>>>   +----+----+---+---------+-----+-----+
>>>   | 0  | 1  | 2 | ...     | 254 | 255 |  avail ring
>>>   +-+--+-+--+-+-+---------+--+--+--+--+
>>>     |    |    |              |     |
>>>     v    v    v              v     v
>>>   +-+--+-+--+-+-+---------+--+--+--+--+
>>>   | 0  | 1  | 2 | ...     | 254 | 255 |  desc ring
>>>   +----+----+---+---------+-----+-----+
>>>      |
>>>      |
>>>   +----+----+---+---------+-----+-----+
>>>   | 0  | 1  | 2 | ...     | 254 | 255 |  used ring
>>>   +----+----+---+---------+-----+-----+
>>>      |
>>>      +
>>>
>>> The avail ring is initialized with fixed descriptors and is never
>>> changed, i.e., the index value of the nth avail ring entry is always n,
>>> which means the virtio PMD is actually refilling the desc ring only,
>>> without having to change the avail ring.
>>> When vhost fetches the avail ring, if not evicted, it is always in its
>>> first-level cache.
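To illustrate the difference, here is a minimal sketch in C, assuming the
standard vring layout from linux/virtio_ring.h; the function names are
illustrative only, not the actual PMD code. Conventional refill writes an
avail ring entry per buffer (step b); the fixed scheme writes only the desc
ring, plus the single avail->idx field:

    #include <stdint.h>
    #include <linux/virtio_ring.h>  /* struct vring, vring_desc, vring_avail */

    /* Conventional refill: every buffer dirties an avail ring entry
     * (step b), forcing a cache line transfer to the vhost core later. */
    static void refill_conventional(struct vring *vr, uint16_t desc_idx,
                                    uint64_t addr, uint32_t len)
    {
        vr->desc[desc_idx].addr  = addr;
        vr->desc[desc_idx].len   = len;
        vr->desc[desc_idx].flags = VRING_DESC_F_WRITE; /* device-writable RX buf */
        vr->avail->ring[vr->avail->idx & (vr->num - 1)] = desc_idx; /* step b */
        vr->avail->idx++;
    }

    /* Fixed avail ring: avail->ring[n] == n, set once at init and never
     * rewritten.  Refill touches only the desc ring; apart from the idx
     * field, the avail ring cache lines stay clean, so vhost usually
     * finds them in its own L1. */
    static void refill_fixed(struct vring *vr, uint64_t addr, uint32_t len)
    {
        uint16_t slot = vr->avail->idx & (vr->num - 1); /* == desc index */

        vr->desc[slot].addr  = addr;
        vr->desc[slot].len   = len;
        vr->desc[slot].flags = VRING_DESC_F_WRITE;
        vr->avail->idx++;   /* write barriers omitted for brevity */
    }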
>>>
>>> When RX receives packets from the used ring, we use used->idx as the
>>> desc idx. This requires that vhost processes and returns descs from the
>>> avail ring to the used ring in order, which is true for both the
>>> current DPDK vhost and the kernel vhost implementation. In my
>>> understanding, there is no necessity for vhost-net to process
>>> descriptors out of order. One case could be zero copy: for example, if
>>> one descriptor doesn't meet the zero-copy requirement, we could
>>> directly return it to the used ring, earlier than the descriptors in
>>> front of it.
>>> To enforce this, I want to use a reserved bit to indicate in-order
>>> processing of descriptors.
>>>
>>> For the TX ring, the arrangement is like below. Each transmitted mbuf
>>> needs a desc for the virtio_net_hdr, so actually we have only 128 free
>>> slots:
>>>
>>>                             ++
>>>                             ||
>>>                             ||
>>>   +-----+-----+-----+------++------+------+------+------+
>>>   |  0  |  1  | ... | 127  || 128  | 129  | ...  | 255  |  avail ring
>>>   +--+--+--+--+-----+--+---++--+---+--+---+------+--+---+
>>>      |     |           |       |      |             |
>>>      v     v           v       v      v             v
>>>   +--+--+--+--+-----+--+---++--+---+--+---+------+--+---+
>>>   | 128 | 129 | ... | 255  || 128  | 129  | ...  | 255  |  desc ring for virtio_net_hdr
>>>   +--+--+--+--+-----+--+---++--+---+--+---+------+--+---+
>>>      |     |           |       |      |             |
>>>      v     v           v       v      v             v
>>>   +--+--+--+--+-----+--+---++--+---+--+---+------+--+---+
>>>   |  0  |  1  | ... | 127  ||  0   |  1   | ...  | 127  |  desc ring for tx data
>>>   +--+--+--+--+-----+--+---++--+---+--+---+------+--+---+
>>>
>> Does this still work with Linux (or BSD) guest/host?
>> If you are assuming both virtio/vhost are DPDK this is never going
>> to be usable.
> It works with both the DPDK vhost and kernel vhost implementations.
> But to enforce this, we had better add a new feature bit.
Hi Stephen, some update about compatibility:
this optimization is in theory compliant with the current kernel vhost,
qemu, and DPDK vhost implementations.
Today I ran the DPDK virtio PMD with qemu and kernel vhost, and it works
fine.

>> On a related note, have you looked at getting virtio to support the
>> new standard (not legacy) mode?
> Yes, we have added it to our plan to support virtio 1.0.
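P.S. For reference, a minimal sketch of the in-order RX path described
above, again with hypothetical helper names (mbuf_of_desc, mbuf_set_len),
not the actual PMD code. Because vhost completes descriptors in order, the
id of the nth used element is implied, so the guest only has to read
used->idx and the element length:

    #include <stdint.h>
    #include <linux/virtio_ring.h>

    struct rte_mbuf;                                       /* from rte_mbuf.h */
    extern struct rte_mbuf *mbuf_of_desc(uint16_t slot);   /* hypothetical */
    extern void mbuf_set_len(struct rte_mbuf *m, uint32_t len); /* hypothetical */

    /* In-order receive: the nth completed descriptor is simply
     * (last_used + n) & (num - 1), so used->ring[i].id never has to be
     * read; only used->idx and the element length are loaded. */
    static uint16_t recv_in_order(struct vring *vr, uint16_t *last_used,
                                  struct rte_mbuf **pkts, uint16_t max)
    {
        uint16_t used_idx = vr->used->idx;  /* written by vhost; read
                                             * barrier omitted for brevity */
        uint16_t n = 0;

        while (*last_used != used_idx && n < max) {
            uint16_t slot = *last_used & (vr->num - 1);
            struct rte_mbuf *m = mbuf_of_desc(slot);

            mbuf_set_len(m, vr->used->ring[slot].len);
            pkts[n++] = m;
            (*last_used)++;
        }
        return n;
    }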