From: "Xie, Huawei"
To: Tetsuya Mukawa, "dev@dpdk.org", Thomas Monjalon, Linhaifeng
Cc: "ms >> Michael S. Tsirkin"
Date: Tue, 8 Sep 2015 09:42:27 +0000
Subject: Re: [dpdk-dev] virtio optimization idea
References: <55EE9A75.7020306@igel.co.jp>

On 9/8/2015 4:21 PM, Tetsuya Mukawa wrote:
> On 2015/09/05 1:50, Xie, Huawei wrote:
>> There is a formatting issue with the ASCII chart of the TX ring.
>> Here is an updated chart.
>> Sorry for the trouble.
> Hi Xie,
>
> Thanks for sharing a way to optimize virtio.
> I have a few questions.
>
>> On 9/4/2015 4:25 PM, Xie, Huawei wrote:
>>> Hi:
>>>
>>> Recently I did a proof of concept for a virtio optimization. The
>>> optimization includes two parts:
>>> 1) avail ring set with fixed descriptors
>>> 2) RX vectorization
>>> With these optimizations, we could get a severalfold performance boost
>>> in pure vhost-virtio throughput.
> When you measured performance, did you optimize only the virtio-net
> driver? If so, can we also optimize the vhost backend (librte_vhost)
> using your approach?

We could do some optimization in vhost based on the same vring layout,
but since vhost needs to support legacy virtio as well, it can't make
this assumption.
>>> Here I will only cover the first part, which is a prerequisite for the
>>> second part.
>>> Let us first take RX as an example. Currently, when we fill the avail
>>> ring with guest mbufs, we need to:
>>> a) allocate one descriptor (for a non-SG mbuf) from the free descriptors
>>> b) write the index of that descriptor into an entry of the avail ring
>>> c) set the addr/len fields of the descriptor to point to the blank data
>>>    area of the guest mbuf
>>>
>>> These operations take time; step b in particular leaves the avail ring
>>> cache line in the Modified (M) state in the cache of the virtio
>>> processing core. When vhost processes the avail ring, transferring that
>>> cache line from the virtio core to the vhost core costs a significant
>>> number of CPU cycles.
>>> To solve this problem, this is the arrangement of the RX ring for the
>>> DPDK PMD (for the non-mergeable case):
>>>
>>>                     avail
>>>                      idx
>>>                       +
>>>                       |
>>> +----+----+----+-------------+-----+-----+
>>> | 0  | 1  | 2  |     ...     | 254 | 255 |  avail ring
>>> +-+--+-+--+-+--+------+------+--+--+--+--+
>>>   |    |    |         |         |     |
>>>   |    |    |         |         |     |
>>>   v    v    v         |         v     v
>>> +-+--+-+--+-+--+------+------+--+--+--+--+
>>> | 0  | 1  | 2  |     ...     | 254 | 255 |  desc ring
>>> +----+----+----+-------------+-----+-----+
>>>                       |
>>>                       |
>>> +----+----+----+-------------+-----+-----+
>>> | 0  | 1  | 2  |             | 254 | 255 |  used ring
>>> +----+----+----+-------------+-----+-----+
>>>                       |
>>>                       +
>>>
>>> The avail ring is initialized with fixed descriptors and is never
>>> changed, i.e., the nth avail ring entry always holds the value n. This
>>> means the virtio PMD only refills the desc ring, without having to
>>> change the avail ring.
> For example, the avail ring looks like this:
> struct vring_avail {
>         uint16_t flags;
>         uint16_t idx;
>         uint16_t ring[QUEUE_SIZE];
> };
>
> My understanding is that the virtio-net driver still needs to update
> avail_ring.idx, but doesn't need to change avail_ring.ring[].
> Is this correct?

Yes. The avail ring entries are initialized once and never updated; only
idx changes. It is as if the virtio frontend were using only the
descriptor ring.
>
> Tetsuya
>
>>> When vhost fetches the avail ring, the ring stays in the first-level
>>> cache of the vhost core unless it is evicted, because the frontend
>>> never writes to it.
>>>
>>> When RX receives packets from the used ring, we use used->idx (modulo
>>> the ring size) as the desc index. This requires vhost to process descs
>>> from the avail ring and return them to the used ring in order, which is
>>> true for both the current DPDK vhost and the kernel vhost
>>> implementations. In my understanding, there is no need for vhost-net to
>>> process descriptors out of order (OOO).
>>> One possible exception is zero copy: for example, if a descriptor
>>> doesn't meet the zero-copy requirements, vhost could return it to the
>>> used ring directly, ahead of the descriptors in front of it.
>>> To enforce this, I want to use a reserved bit to indicate in-order
>>> processing of descriptors.
>>>
>>> For the TX ring, the arrangement is shown below. Each transmitted mbuf
>>> needs an extra desc for virtio_net_hdr, so with 256 descriptors we
>>> actually have only 128 free slots.
>>>
>>>                         ++
>>>                         ||
>>>                         ||
>>> +-----+-----+-----+-----++-----+-----+-----+-----+
>>> |  0  |  1  | ... | 127 || 128 | 129 | ... | 255 |  avail ring
>>> +--+--+--+--+-----+--+--++--+--+--+--+-----+--+--+
>>>    |     |           |  ||  |     |           |
>>>    v     v           v  ||  v     v           v
>>> +--+--+--+--+-----+--+--++--+--+--+--+-----+--+--+
>>> | 127 | 128 | ... | 255 || 127 | 128 | ... | 255 |  desc ring for virtio_net_hdr
>>> +--+--+--+--+-----+--+--++--+--+--+--+-----+--+--+
>>>    |     |           |  ||  |     |           |
>>>    v     v           v  ||  v     v           v
>>> +--+--+--+--+-----+--+--++--+--+--+--+-----+--+--+
>>> |  0  |  1  | ... | 127 ||  0  |  1  | ... | 127 |  desc ring for tx data
>>> +-----+-----+-----+-----++-----+-----+-----+-----+
>>>
>>> /huawei
>>>
>
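The RX scheme described in this thread (avail->ring[n] fixed to n, refill only the descriptors and then bump avail->idx, and consume the used ring in order by treating the used index as the descriptor index) can be sketched in C roughly as follows. This is a minimal single-threaded sketch under stated assumptions, not the DPDK virtio PMD code: the simplified vring structs, BUF_LEN, struct rx_queue and the helpers rx_ring_init(), rx_refill(), fake_vhost_rx() and rx_receive() are all hypothetical, buffer addresses are plain host pointers rather than guest-physical addresses, and fake_vhost_rx() only stands in for a vhost backend that completes descriptors in order.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Sketch only: every name below is hypothetical, not a DPDK API. */
#define QUEUE_SIZE 256          /* ring size used in the mail */
#define BUF_LEN 2048            /* hypothetical mbuf data room size */
#define VRING_DESC_F_WRITE 2    /* buffer is device-writable */

/* Simplified legacy split-ring structures (no guest-physical translation). */
struct vring_desc { uint64_t addr; uint32_t len; uint16_t flags; uint16_t next; };
struct vring_avail { uint16_t flags; uint16_t idx; uint16_t ring[QUEUE_SIZE]; };
struct vring_used_elem { uint32_t id; uint32_t len; };
struct vring_used { uint16_t flags; uint16_t idx; struct vring_used_elem ring[QUEUE_SIZE]; };

/* Hypothetical RX queue; a real PMD keeps the rings in shared guest memory. */
struct rx_queue {
    struct vring_desc desc[QUEUE_SIZE];
    struct vring_avail avail;
    struct vring_used used;
    uint16_t last_used_idx;     /* next used entry the driver will consume */
};

/* Done once: avail.ring[n] = n; the ring entries are never written again. */
static void rx_ring_init(struct rx_queue *q)
{
    memset(q, 0, sizeof(*q));
    for (uint16_t i = 0; i < QUEUE_SIZE; i++)
        q->avail.ring[i] = i;
}

/* Refill n buffers: write only desc[] and avail.idx, never avail.ring[].
   The caller must not post more buffers than there are free slots. */
static void rx_refill(struct rx_queue *q, uint8_t *bufs[], uint16_t n)
{
    for (uint16_t i = 0; i < n; i++) {
        uint16_t slot = (uint16_t)(q->avail.idx + i) & (QUEUE_SIZE - 1);
        q->desc[slot].addr = (uint64_t)(uintptr_t)bufs[i];
        q->desc[slot].len = BUF_LEN;
        q->desc[slot].flags = VRING_DESC_F_WRITE;
        q->desc[slot].next = 0;
    }
    atomic_thread_fence(memory_order_release);  /* descs visible before idx */
    q->avail.idx = (uint16_t)(q->avail.idx + n);
}

/* Hypothetical stand-in for vhost: consumes avail entries strictly in order,
   stamps the descriptor index into the first byte of each buffer, and
   returns the descriptor through the used ring with length pkt_len. */
static uint16_t fake_vhost_rx(struct rx_queue *q, uint16_t *last_avail,
                              uint32_t pkt_len, uint16_t n)
{
    uint16_t done = 0;
    while (done < n && *last_avail != q->avail.idx) {
        uint16_t head = q->avail.ring[*last_avail & (QUEUE_SIZE - 1)];
        uint16_t uslot = q->used.idx & (QUEUE_SIZE - 1);
        ((uint8_t *)(uintptr_t)q->desc[head].addr)[0] = (uint8_t)head;
        q->used.ring[uslot].id = head;
        q->used.ring[uslot].len = pkt_len;
        atomic_thread_fence(memory_order_release);
        q->used.idx++;
        (*last_avail)++;
        done++;
    }
    return done;
}

/* Receive up to max packets. With in-order completion the used slot equals
   the descriptor index, so used.ring[].id is only checked by an assert. */
static uint16_t rx_receive(struct rx_queue *q, uint8_t *out[], uint32_t lens[],
                           uint16_t max)
{
    uint16_t pending = (uint16_t)(q->used.idx - q->last_used_idx);
    atomic_thread_fence(memory_order_acquire);   /* read idx before entries */
    uint16_t n = pending < max ? pending : max;
    for (uint16_t i = 0; i < n; i++) {
        uint16_t slot = (uint16_t)(q->last_used_idx + i) & (QUEUE_SIZE - 1);
        assert(q->used.ring[slot].id == slot);    /* in-order assumption */
        out[i] = (uint8_t *)(uintptr_t)q->desc[slot].addr;
        lens[i] = q->used.ring[slot].len;
    }
    q->last_used_idx = (uint16_t)(q->last_used_idx + n);
    return n;
}
```

In this sketch avail.ring[] is written only in rx_ring_init(); in steady state the driver touches just the descriptors and avail.idx, which is the property the mail relies on to keep the avail ring cache lines clean on the vhost side.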
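The TX layout in the second chart can be sketched in C as below. This is a hedged reading of the chart, not the author's code: it assumes the virtio_net_hdr row means descriptors 128..255, so header descriptor 128 + k always chains to data descriptor k, and every one of the 256 avail entries permanently points at a header descriptor, with entries i and i + 128 sharing one slot. struct tx_queue, tx_ring_init() and tx_xmit() are hypothetical, the header is left zeroed (no offloads), and completion handling on the used ring is omitted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Sketch only: every name below is hypothetical, not a DPDK API. */
#define QUEUE_SIZE 256
#define TX_SLOTS (QUEUE_SIZE / 2)   /* two descriptors per packet -> 128 */
#define VRING_DESC_F_NEXT 1         /* descriptor continues via 'next' */

struct vring_desc { uint64_t addr; uint32_t len; uint16_t flags; uint16_t next; };
struct vring_avail { uint16_t flags; uint16_t idx; uint16_t ring[QUEUE_SIZE]; };

/* Legacy virtio_net_hdr layout (10 bytes, no num_buffers field). */
struct virtio_net_hdr {
    uint8_t flags;
    uint8_t gso_type;
    uint16_t hdr_len;
    uint16_t gso_size;
    uint16_t csum_start;
    uint16_t csum_offset;
};

/* Hypothetical TX queue: descs 0..127 carry packet data, descs 128..255
   each carry one preallocated header. */
struct tx_queue {
    struct vring_desc desc[QUEUE_SIZE];
    struct vring_avail avail;
    struct virtio_net_hdr hdrs[TX_SLOTS];
};

/* Done once: header desc TX_SLOTS + k chains to data desc k, and
   avail.ring[i] points at header desc TX_SLOTS + (i % TX_SLOTS). */
static void tx_ring_init(struct tx_queue *q)
{
    memset(q, 0, sizeof(*q));
    for (uint16_t k = 0; k < TX_SLOTS; k++) {
        struct vring_desc *h = &q->desc[TX_SLOTS + k];
        h->addr = (uint64_t)(uintptr_t)&q->hdrs[k];
        h->len = sizeof(struct virtio_net_hdr);
        h->flags = VRING_DESC_F_NEXT;
        h->next = k;
    }
    for (uint16_t i = 0; i < QUEUE_SIZE; i++)
        q->avail.ring[i] = (uint16_t)(TX_SLOTS + i % TX_SLOTS);
}

/* Transmit one packet: only data desc k and avail.idx are written.
   The caller must have fewer than TX_SLOTS packets in flight. */
static void tx_xmit(struct tx_queue *q, const uint8_t *pkt, uint32_t len)
{
    uint16_t k = q->avail.idx % TX_SLOTS;
    q->desc[k].addr = (uint64_t)(uintptr_t)pkt;
    q->desc[k].len = len;
    q->desc[k].flags = 0;          /* end of chain, device-readable */
    q->desc[k].next = 0;
    atomic_thread_fence(memory_order_release);  /* desc visible before idx */
    q->avail.idx++;
}
```

Because avail entries i and i + 128 resolve to the same header/data pair, the avail ring never needs rewriting even though its 256 entries outnumber the 128 usable slots.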