From: "Xie, Huawei"
To: dev@dpdk.org, Thomas Monjalon, Linhaifeng, Tetsuya Mukawa
Cc: Michael S. Tsirkin
Date: Fri, 4 Sep 2015 16:50:37 +0000
Subject: Re: [dpdk-dev] virtio optimization idea

There was a formatting issue with the ASCII chart of the TX ring, so I
have updated that chart.
Sorry for the trouble.


On 9/4/2015 4:25 PM, Xie, Huawei wrote:
> Hi:
>
> Recently I did a proof of concept of a virtio optimization.
> The optimization includes two parts:
> 1) avail ring set with fixed descriptors
> 2) RX vectorization
> With these optimizations, we could see a several-fold performance boost
> for pure vhost-virtio throughput.
>
> Here I will only cover the first part, which is the prerequisite for the
> second part.
> Take RX as an example. Currently, when we fill the avail ring with guest
> mbufs, for each mbuf we need to:
> a) allocate one descriptor (for a non-scatter-gather mbuf) from the free
>    descriptors;
> b) write the index of that descriptor into an avail ring entry;
> c) set the addr/len fields of the descriptor to point to the data area of
>    the blank guest mbuf.
>
> These operations take time. In particular, step b leaves the cache line
> holding the avail ring in the Modified (M) state on the virtio processing
> core. When vhost processes the avail ring, transferring that cache line
> from the virtio processing core to the vhost processing core costs a
> significant number of CPU cycles.
> To solve this problem, this is the arrangement of the RX ring for the
> DPDK PMD (non-mergeable case):
>
>                    avail
>                     idx
>                      +
>                      |
> +-----+-----+-----+-----+-----+-----+
> |   0 |   1 |   2 | ... | 254 | 255 |  avail ring
> +-----+-----+-----+-----+-----+-----+
>    |     |     |           |     |
>    v     v     v           v     v
> +-----+-----+-----+-----+-----+-----+
> |   0 |   1 |   2 | ... | 254 | 255 |  desc ring
> +-----+-----+-----+-----+-----+-----+
>                      |
>                      |
> +-----+-----+-----+-----+-----+-----+
> |   0 |   1 |   2 | ... | 254 | 255 |  used ring
> +-----+-----+-----+-----+-----+-----+
>                      |
>                      +
>
> The avail ring is initialized with fixed descriptors and is never
> changed, i.e., the nth avail ring entry always holds the value n. This
> means the virtio PMD only refills the desc ring, without ever having to
> change the avail ring.
> When vhost fetches the avail ring, the ring entries are always in its
> first-level cache unless they have been evicted.
>
> When RX receives packets from the used ring, we use the used->idx as the
> desc index. This requires vhost to process descriptors from the avail
> ring and return them to the used ring in order, which is true for both
> the current DPDK vhost and the kernel vhost implementations. In my
> understanding, there is no need for vhost-net to process descriptors out
> of order (OOO). One possible case is zero copy: for example, if a
> descriptor doesn't meet the zero-copy requirements, we could return it to
> the used ring directly, ahead of the descriptors in front of it.
> To enforce this, I want to use a reserved bit to indicate in-order
> processing of descriptors.
>
> For the TX ring, the arrangement is shown below. Each transmitted mbuf
> needs a desc for its virtio_net_hdr, so we actually have only 128 free
> slots.
>
>                         ++
>                         ||
>                         ||
> +-----+-----+-----+-----++-----+-----+-----+-----+
> |   0 |   1 | ... | 127 || 128 | 129 | ... | 255 |  avail ring
> +-----+-----+-----+-----++-----+-----+-----+-----+
>    |     |           |  ||  |     |           |
>    v     v           v  ||  v     v           v
> +-----+-----+-----+-----++-----+-----+-----+-----+
> | 128 | 129 | ... | 255 || 128 | 129 | ... | 255 |  desc ring for virtio_net_hdr
> +-----+-----+-----+-----++-----+-----+-----+-----+
>    |     |           |  ||  |     |           |
>    v     v           v  ||  v     v           v
> +-----+-----+-----+-----++-----+-----+-----+-----+
> |   0 |   1 | ... | 127 ||   0 |   1 | ... | 127 |  desc ring for tx data
> +-----+-----+-----+-----++-----+-----+-----+-----+
>
>
> /huawei
>
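The fixed-avail-ring RX arrangement and the in-order requirement described above can be sketched as a small C simulation. This is a minimal sketch under stated assumptions, not the DPDK PMD code: the structures are simplified versions of the virtio split-ring layout, and the names `rx_vq`, `rx_refill`, `host_process_in_order` and `rx_recv`, as well as the 256-entry ring size, are hypothetical. It shows that a refill writes only the descriptor (never `avail.ring`), and that with in-order completion the guest can take the descriptor index directly from its used-ring position.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical ring size; must be a power of two. */
#define RING_SIZE 256u
#define RING_MASK (RING_SIZE - 1u)
#define DESC_F_WRITE 2u /* VIRTQ_DESC_F_WRITE in the virtio spec */

/* Simplified split-ring layout (assumed; not the DPDK definitions). */
struct desc { uint64_t addr; uint32_t len; uint16_t flags; uint16_t next; };
struct avail { uint16_t flags; uint16_t idx; uint16_t ring[RING_SIZE]; };
struct used_elem { uint32_t id; uint32_t len; };
struct used { uint16_t flags; uint16_t idx; struct used_elem ring[RING_SIZE]; };

struct rx_vq {
    struct desc desc[RING_SIZE];
    struct avail avail;
    struct used used;
    uint16_t last_used;  /* guest: next used entry to consume */
    uint16_t last_avail; /* simulated host: next avail entry to consume */
};

/* One-time setup: avail entry n permanently holds descriptor n. */
static void rx_vq_init(struct rx_vq *vq)
{
    memset(vq, 0, sizeof(*vq));
    for (uint16_t i = 0; i < RING_SIZE; i++)
        vq->avail.ring[i] = i;
}

/* Refill one blank buffer: only the descriptor is written, avail.ring is
 * never modified. No full-ring check, to keep the sketch short. */
static void rx_refill(struct rx_vq *vq, uint64_t buf_addr, uint32_t buf_len)
{
    uint16_t slot = vq->avail.idx & RING_MASK;
    vq->desc[slot].addr = buf_addr;
    vq->desc[slot].len = buf_len;
    vq->desc[slot].flags = DESC_F_WRITE;
    /* A real driver needs a write barrier before publishing avail.idx. */
    vq->avail.idx++;
}

/* Simulated in-order vhost: returns avail entries to the used ring in the
 * same order it consumed them, reporting pkt_len bytes written. */
static void host_process_in_order(struct rx_vq *vq, uint32_t pkt_len)
{
    while (vq->last_avail != vq->avail.idx) {
        struct used_elem *e = &vq->used.ring[vq->used.idx & RING_MASK];
        e->id = vq->avail.ring[vq->last_avail & RING_MASK];
        e->len = pkt_len;
        vq->used.idx++;
        vq->last_avail++;
    }
}

/* Guest RX: with in-order completion the descriptor index equals the
 * used-ring position; the assert only checks that assumption. */
static unsigned rx_recv(struct rx_vq *vq, uint64_t *out, unsigned max)
{
    unsigned n = 0;
    while (n < max && vq->last_used != vq->used.idx) {
        uint16_t slot = vq->last_used & RING_MASK;
        assert(vq->used.ring[slot].id == slot);
        out[n++] = vq->desc[slot].addr;
        vq->last_used++;
    }
    return n;
}
```

For example, after `rx_vq_init`, four `rx_refill` calls followed by `host_process_in_order` let `rx_recv` return the four buffer addresses in refill order, while `avail.ring` still holds 0..255 exactly as initialized. If the host completed descriptors out of order, the assert in `rx_recv` would fire, which is why the in-order guarantee matters for this layout.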
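The TX ring layout in the updated chart can be sketched the same way. This is a minimal sketch under stated assumptions, not the DPDK code: the structures are simplified, the names `tx_vq`, `tx_vq_init`, `tx_xmit` and `tx_complete` are hypothetical, and the header length assumes a 10-byte `struct virtio_net_hdr` (no mergeable RX buffers). Descriptors 128..255 are pre-filled once as header descriptors chained via `next` to data descriptors 0..127, and avail entries j and j+128 both point to header descriptor 128 + (j mod 128), so transmitting a packet writes only the data descriptor's addr/len.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RING_SIZE 256u
#define RING_MASK (RING_SIZE - 1u)
#define HALF (RING_SIZE / 2u) /* 128 usable TX slots */
#define DESC_F_NEXT 1u        /* VIRTQ_DESC_F_NEXT in the virtio spec */
#define NET_HDR_LEN 10u       /* assumed: struct virtio_net_hdr, no mergeable buffers */

/* Simplified split-ring layout (assumed; not the DPDK definitions). */
struct desc { uint64_t addr; uint32_t len; uint16_t flags; uint16_t next; };
struct avail { uint16_t flags; uint16_t idx; uint16_t ring[RING_SIZE]; };

struct tx_vq {
    struct desc desc[RING_SIZE];
    struct avail avail;
    uint16_t completed; /* packets the (simulated) host has returned */
};

/* One-time setup: header desc HALF+i chains to data desc i, and avail
 * entries j and j+HALF both point to header desc HALF + (j % HALF). */
static void tx_vq_init(struct tx_vq *vq, uint64_t hdr_base)
{
    memset(vq, 0, sizeof(*vq));
    for (uint16_t i = 0; i < HALF; i++) {
        struct desc *h = &vq->desc[HALF + i];
        /* per-slot header in guest memory, assumed zero-filled */
        h->addr = hdr_base + (uint64_t)i * NET_HDR_LEN;
        h->len = NET_HDR_LEN;
        h->flags = DESC_F_NEXT;
        h->next = i;
        vq->desc[i].flags = 0; /* data desc ends the chain */
    }
    for (uint16_t j = 0; j < RING_SIZE; j++)
        vq->avail.ring[j] = (uint16_t)(HALF + (j & (HALF - 1u)));
}

/* Record n completions reported by the (simulated) host. */
static void tx_complete(struct tx_vq *vq, uint16_t n)
{
    vq->completed += n;
}

/* Queue one packet: only the data descriptor's addr/len are written.
 * Returns 0 on success, -1 if all HALF header/data pairs are in flight. */
static int tx_xmit(struct tx_vq *vq, uint64_t pkt_addr, uint32_t pkt_len)
{
    if ((uint16_t)(vq->avail.idx - vq->completed) >= HALF)
        return -1;
    uint16_t data = vq->avail.idx & (HALF - 1u); /* avail slot % 128 */
    vq->desc[data].addr = pkt_addr;
    vq->desc[data].len = pkt_len;
    /* A real driver needs a write barrier before publishing avail.idx. */
    vq->avail.idx++;
    return 0;
}
```

Because each packet occupies a header/data pair, at most 128 packets can be in flight even though the avail ring has 256 entries; by the time avail entry j+128 is reused, the chain it shares with entry j must already have been completed, which is what the check in `tx_xmit` enforces.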