From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <huawei.xie@intel.com>
Received: from mga02.intel.com (mga02.intel.com [134.134.136.20])
 by dpdk.org (Postfix) with ESMTP id 483B358D8
 for <dev@dpdk.org>; Tue,  8 Sep 2015 17:52:42 +0200 (CEST)
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
 by orsmga101.jf.intel.com with ESMTP; 08 Sep 2015 08:52:40 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.17,490,1437462000"; d="scan'208";a="557639516"
Received: from pgsmsx106.gar.corp.intel.com ([10.221.44.98])
 by FMSMGA003.fm.intel.com with ESMTP; 08 Sep 2015 08:52:37 -0700
Received: from shsmsx151.ccr.corp.intel.com (10.239.6.50) by
 PGSMSX106.gar.corp.intel.com (10.221.44.98) with Microsoft SMTP Server (TLS)
 id 14.3.224.2; Tue, 8 Sep 2015 23:52:36 +0800
Received: from shsmsx101.ccr.corp.intel.com ([169.254.1.171]) by
 SHSMSX151.ccr.corp.intel.com ([169.254.3.101]) with mapi id 14.03.0224.002;
 Tue, 8 Sep 2015 23:52:35 +0800
From: "Xie, Huawei" <huawei.xie@intel.com>
To: Stephen Hemminger <stephen@networkplumber.org>
Thread-Topic: [dpdk-dev] virtio optimization idea
Thread-Index: AdDm6zPdM5XrIXmIQz2JKkVwrZfYFQ==
Date: Tue, 8 Sep 2015 15:52:35 +0000
Message-ID: <C37D651A908B024F974696C65296B57B2BDC0872@SHSMSX101.ccr.corp.intel.com>
References: <C37D651A908B024F974696C65296B57B2BDB8C06@SHSMSX101.ccr.corp.intel.com>
 <20150908083926.3f2f409f@urahara>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
x-originating-ip: [10.239.127.40]
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Cc: "dev@dpdk.org" <dev@dpdk.org>, "ms >> Michael S. Tsirkin" <mst@redhat.com>
Subject: Re: [dpdk-dev] virtio optimization idea
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: patches and discussions about DPDK <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Tue, 08 Sep 2015 15:52:43 -0000

On 9/8/2015 11:39 PM, Stephen Hemminger wrote:
> On Fri, 4 Sep 2015 08:25:05 +0000
> "Xie, Huawei" <huawei.xie@intel.com> wrote:
>
>> Hi:
>>
>> Recently I have done a proof of concept of a virtio optimization. The
>> optimization includes two parts:
>> 1) avail ring set with fixed descriptors
>> 2) RX vectorization
>> With these optimizations, we could get a severalfold performance boost
>> for pure vhost-virtio throughput.
>>
>> Here I will only cover the first part, which is the prerequisite for the
>> second part.
>> Let us first take RX as an example. Currently, when we fill the avail
>> ring with guest mbufs, we need to:
>> a) allocate one descriptor (for a non-sg mbuf) from the free descriptors
>> b) set the idx of the desc into the entry of the avail ring
>> c) set the addr/len fields of the descriptor to point to the data area
>> of the blank guest mbuf
>>
>> Those operations take time, and step b in particular leaves the cache
>> line for the avail ring in the modified (M) state in the virtio
>> processing core. When vhost processes the avail ring, the cache line
>> transfer from the virtio processing core to the vhost processing core
>> takes a considerable number of CPU cycles.
>> To solve this problem, this is the arrangement of the RX ring for the
>> DPDK pmd (for the non-mergeable case).
>>
>>                     avail
>>                     idx
>>                     +
>>                     |
>> +----+----+---+-------------+------+
>> | 0  | 1  | 2 | ... |  254  | 255  |  avail ring
>> +-+--+-+--+-+-+---------+---+--+---+
>>   |    |    |       |   |      |
>>   |    |    |       |   |      |
>>   v    v    v       |   v      v
>> +-+--+-+--+-+-+---------+---+--+---+
>> | 0  | 1  | 2 | ... |  254  | 255  |  desc ring
>> +----+----+---+-------------+------+
>>                     |
>>                     |
>> +----+----+---+-------------+------+
>> | 0  | 1  | 2 |     |  254  | 255  |  used ring
>> +----+----+---+-------------+------+
>>                     |
>>                     +
>>
>> The avail ring is initialized with fixed descriptors and is never
>> changed, i.e., the index value of the nth avail ring entry is always n,
>> which means the virtio PMD actually refills only the desc ring, without
>> having to change the avail ring.
>> When vhost fetches the avail ring, it is always in its first-level
>> cache, unless it has been evicted.
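
A minimal C sketch of the fixed avail ring described above, for illustration only; it is not the proof-of-concept code. The struct layouts are simplified from the virtio split ring, the names avail_ring_init_fixed and rx_refill are hypothetical, and a 256-entry ring is assumed. Memory barriers and real mbuf allocation are left out.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE    256u  /* assumed: the 256-entry ring from the diagram */
#define DESC_F_WRITE 2u    /* device-writable buffer (VRING_DESC_F_WRITE) */

/* Simplified split-ring layouts, modeled on the virtio vring. */
struct desc  { uint64_t addr; uint32_t len; uint16_t flags; uint16_t next; };
struct avail { uint16_t flags; uint16_t idx; uint16_t ring[RING_SIZE]; };

/* One-time setup: entry n of the avail ring always names descriptor n. */
static void avail_ring_init_fixed(struct avail *av)
{
    for (uint16_t i = 0; i < RING_SIZE; i++)
        av->ring[i] = i;
    av->flags = 0;
    av->idx = 0;
}

/*
 * Refill `count` RX slots with blank buffers. Only the descriptors and
 * av->idx are written; av->ring[] is never touched after initialization.
 */
static void rx_refill(struct desc *descs, struct avail *av,
                      const uint64_t *buf_addrs, uint32_t buf_len,
                      uint16_t count)
{
    assert(count <= RING_SIZE);
    for (uint16_t i = 0; i < count; i++) {
        uint16_t slot = (uint16_t)((av->idx + i) & (RING_SIZE - 1));
        descs[slot].addr  = buf_addrs[i];
        descs[slot].len   = buf_len;
        descs[slot].flags = DESC_F_WRITE;
        descs[slot].next  = 0;
    }
    /* A real driver needs a write barrier here before publishing idx. */
    av->idx = (uint16_t)(av->idx + count);
}
```

After avail_ring_init_fixed() runs once, every later rx_refill() call stores only to the descriptor table and to avail->idx, so the cache lines holding ring[] are not dirtied by the virtio side.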
>>
>> When RX receives packets from the used ring, we use used->idx as the
>> desc idx. This requires that vhost processes and returns descs from the
>> avail ring to the used ring in order, which is true for both the current
>> dpdk vhost and kernel vhost implementations. In my understanding, there
>> is no need for vhost net to process descriptors out of order (OOO). One
>> possible case is zero copy: for example, if one descriptor doesn't meet
>> the zero copy requirement, we could return it to the used ring directly,
>> earlier than the descriptors in front of it.
>> To enforce this, I want to use a reserved bit to indicate in-order
>> processing of descriptors.
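
A minimal C sketch of how the RX side could derive descriptor indexes from the used index alone once in-order completion is guaranteed. This is an illustration under assumptions: the function name rx_harvest_in_order and the last_used counter are hypothetical, the used ring layout is simplified from the virtio split ring, and the read barrier a real driver needs after loading used->idx is omitted.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 256u  /* assumed ring size, must be a power of two */

struct used_elem { uint32_t id; uint32_t len; };
struct used { uint16_t flags; uint16_t idx; struct used_elem ring[RING_SIZE]; };

/*
 * Collect up to `max` completed RX buffers. Because the avail ring is
 * fixed (entry n names desc n) and the device completes in order, the
 * k-th completion always belongs to descriptor k mod RING_SIZE, so the
 * id field of the used element does not have to be loaded.
 * Returns the number of buffers collected.
 */
static uint16_t rx_harvest_in_order(const struct used *u, uint16_t *last_used,
                                    uint16_t *desc_idx, uint32_t *lens,
                                    uint16_t max)
{
    assert((RING_SIZE & (RING_SIZE - 1)) == 0);
    /* Free-running 16-bit counters: the difference wraps correctly. */
    uint16_t pending = (uint16_t)(u->idx - *last_used);
    uint16_t n = pending < max ? pending : max;

    for (uint16_t i = 0; i < n; i++) {
        uint16_t slot = (uint16_t)((*last_used + i) & (RING_SIZE - 1));
        desc_idx[i] = slot;              /* derived, not read from the ring */
        lens[i] = u->ring[slot].len;     /* packet length still comes from vhost */
    }
    *last_used = (uint16_t)(*last_used + n);
    return n;
}
```

If the device were allowed to complete out of order, desc_idx[i] would no longer match u->ring[slot].id, which is why the scheme needs the in-order guarantee.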
>>
>> For the tx ring, the arrangement is shown below. Each transmitted mbuf
>> needs a desc for virtio_net_hdr, so we actually have only 128 free slots.
>>
>>                              ++
>>                              ||
>>                              ||
>>     +-----+-----+-----+--------------+------+------+------+
>>     |  0  |  1  | ... |  127 || 128  | 129  | ...  | 255  |   avail ring with fixed descriptor
>>     +--+--+--+--+-----+---+------+---+--+---+------+--+---+
>>        |     |            |  ||  |      |             |
>>        v     v            v  ||  v      v             v
>>     +--+--+--+--+-----+---+------+---+--+---+------+--+---+
>>     | 127 | 128 | ... |  255 || 127  | 128  | ...  | 255  |   desc ring for virtio_net_hdr
>>     +--+--+--+--+-----+---+------+---+--+---+------+--+---+
>>        |     |            |  ||  |      |             |
>>        v     v            v  ||  v      v             v
>>     +--+--+--+--+-----+---+------+---+--+---+------+--+---+
>>     |  0  |  1  | ... |  127 ||  0   |  1   | ...  | 127  |   desc ring for tx data
>>     +-----+-----+-----+--------------+------+------+------+
>>
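
A minimal C sketch of one way the TX arrangement in the diagram could be set up, for illustration only. It assumes the 128 virtio_net_hdr descriptors occupy indexes 128..255 and each chains to the data descriptor 128 below it; the diagram's labels leave the exact header range open, so this numbering is an assumption. The names tx_ring_init_fixed and tx_enqueue, the hdr_base/hdr_len parameters, and the contiguous header area are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE   256u
#define TX_SLOTS    (RING_SIZE / 2)  /* 128 usable slots: hdr desc + data desc each */
#define DESC_F_NEXT 1u               /* chain continues in `next` (VRING_DESC_F_NEXT) */

struct desc  { uint64_t addr; uint32_t len; uint16_t flags; uint16_t next; };
struct avail { uint16_t flags; uint16_t idx; uint16_t ring[RING_SIZE]; };

/*
 * One-time setup. Avail entry i names header descriptor 128 + (i mod 128);
 * header descriptor 128 + k points at a preallocated virtio_net_hdr and
 * chains to data descriptor k.
 */
static void tx_ring_init_fixed(struct desc *descs, struct avail *av,
                               uint64_t hdr_base, uint32_t hdr_len)
{
    for (uint16_t i = 0; i < RING_SIZE; i++)
        av->ring[i] = (uint16_t)(TX_SLOTS + (i & (TX_SLOTS - 1)));

    for (uint16_t k = 0; k < TX_SLOTS; k++) {
        struct desc *hdr = &descs[TX_SLOTS + k];
        hdr->addr  = hdr_base + (uint64_t)k * hdr_len;
        hdr->len   = hdr_len;
        hdr->flags = DESC_F_NEXT;
        hdr->next  = k;
        descs[k].flags = 0;          /* data descriptor ends the chain */
        descs[k].next  = 0;
    }
    av->flags = 0;
    av->idx = 0;
}

/*
 * Queue one packet: only the data descriptor and av->idx are written.
 * The caller must keep at most TX_SLOTS packets in flight.
 */
static void tx_enqueue(struct desc *descs, struct avail *av,
                       uint64_t pkt_addr, uint32_t pkt_len)
{
    uint16_t k = (uint16_t)(av->idx & (TX_SLOTS - 1));
    descs[k].addr = pkt_addr;
    descs[k].len  = pkt_len;
    /* A real driver needs a write barrier here before publishing idx. */
    av->idx = (uint16_t)(av->idx + 1);
}
```

As on the RX side, the avail ring entries and the header descriptors are written once at setup, so the per-packet cost on the virtio side is one data descriptor update plus the index update.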
> Does this still work with a Linux (or BSD) guest/host?
> If you are assuming both virtio/vhost are DPDK this is never going
> to be usable.
It works with both the dpdk vhost and kernel vhost implementations.
But to enforce this, we had better add a new feature bit.
>
> On a related note, have you looked at getting virtio to support the
> new standard (not legacy) mode?
Yes, supporting virtio 1.0 is already in our plan.
>
>