From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <huawei.xie@intel.com>
Received: from mga14.intel.com (mga14.intel.com [192.55.52.115])
 by dpdk.org (Postfix) with ESMTP id 164E78D3C
 for <dev@dpdk.org>; Fri,  4 Sep 2015 10:25:11 +0200 (CEST)
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
 by fmsmga103.fm.intel.com with ESMTP; 04 Sep 2015 01:25:10 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.17,468,1437462000"; d="scan'208";a="555096012"
Received: from pgsmsx101.gar.corp.intel.com ([10.221.44.78])
 by FMSMGA003.fm.intel.com with ESMTP; 04 Sep 2015 01:25:09 -0700
Received: from shsmsx102.ccr.corp.intel.com (10.239.4.154) by
 PGSMSX101.gar.corp.intel.com (10.221.44.78) with Microsoft SMTP Server (TLS)
 id 14.3.224.2; Fri, 4 Sep 2015 16:25:08 +0800
Received: from shsmsx101.ccr.corp.intel.com ([169.254.1.171]) by
 shsmsx102.ccr.corp.intel.com ([169.254.2.206]) with mapi id 14.03.0224.002;
 Fri, 4 Sep 2015 16:25:06 +0800
From: "Xie, Huawei" <huawei.xie@intel.com>
To: "dev@dpdk.org" <dev@dpdk.org>, Thomas Monjalon
 <thomas.monjalon@6wind.com>, Linhaifeng <haifeng.lin@huawei.com>, "Tetsuya
 Mukawa" <mukawa@igel.co.jp>
Thread-Topic: virtio optimization idea
Thread-Index: AdDm6zPdM5XrIXmIQz2JKkVwrZfYFQ==
Date: Fri, 4 Sep 2015 08:25:05 +0000
Message-ID: <C37D651A908B024F974696C65296B57B2BDB8C06@SHSMSX101.ccr.corp.intel.com>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
x-originating-ip: [10.239.127.40]
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Cc: "ms >> Michael S. Tsirkin" <mst@redhat.com>
Subject: [dpdk-dev] virtio optimization idea
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: patches and discussions about DPDK <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Fri, 04 Sep 2015 08:25:12 -0000

Hi,

I recently built a proof of concept for a virtio optimization. It has
two parts:
1) an avail ring populated with fixed descriptors
2) RX vectorization
With both optimizations, pure vhost-virtio throughput improves
several-fold.

Here I will only cover the first part, which is a prerequisite for the
second.
Take RX as the first example. Currently, when we fill the avail ring
with guest mbufs, we need to:
a) allocate one descriptor (for a non-sg mbuf) from the free descriptors
b) write the index of that descriptor into the avail ring entry
c) set the addr/len fields of the descriptor to point to the data area
of the blank guest mbuf
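
As a rough sketch of steps a) to c) (not the actual PMD code), the
refill path looks roughly like this. The struct fields follow the virtio
split-ring layout; struct rx_queue, rxq_init(), rxq_refill_one() and the
ring size of 256 are hypothetical names and values used only for
illustration, and memory barriers are omitted.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 256                 /* assumed ring size */
#define VRING_DESC_F_WRITE 2          /* device writes into the buffer */

static_assert((RING_SIZE & (RING_SIZE - 1)) == 0,
              "ring size must be a power of two");

/* Simplified split-ring structures. */
struct vring_desc  { uint64_t addr; uint32_t len; uint16_t flags; uint16_t next; };
struct vring_avail { uint16_t flags; uint16_t idx; uint16_t ring[RING_SIZE]; };

/* Hypothetical per-queue state; desc[].next chains the free descriptors. */
struct rx_queue {
    struct vring_desc  desc[RING_SIZE];
    struct vring_avail avail;
    uint16_t free_head;               /* first free descriptor */
    uint16_t num_free;                /* number of free descriptors */
};

/* Chain all descriptors into one free list. */
static void rxq_init(struct rx_queue *q)
{
    for (uint16_t i = 0; i < RING_SIZE; i++)
        q->desc[i].next = (uint16_t)(i + 1);
    q->free_head = 0;
    q->num_free = RING_SIZE;
    q->avail.flags = 0;
    q->avail.idx = 0;
}

/* Post one empty buffer; returns the descriptor used, or -1 if none is free. */
static int rxq_refill_one(struct rx_queue *q, uint64_t buf_addr, uint32_t buf_len)
{
    if (q->num_free == 0)
        return -1;

    /* a) allocate one descriptor from the free list */
    uint16_t d = q->free_head;
    q->free_head = q->desc[d].next;
    q->num_free--;

    /* b) store its index in the avail ring; this write dirties the
     *    avail ring cache line on the driver core */
    q->avail.ring[q->avail.idx & (RING_SIZE - 1)] = d;

    /* c) point the descriptor at the empty buffer */
    q->desc[d].addr = buf_addr;
    q->desc[d].len = buf_len;
    q->desc[d].flags = VRING_DESC_F_WRITE;

    /* a real driver issues a write barrier before publishing the index */
    q->avail.idx++;
    return d;
}
```

Every call stores into avail.ring[], which is the write the fixed layout
is meant to avoid.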

These operations take time. Step b in particular leaves the avail ring
cache line in the Modified (M) state on the core that runs the virtio
driver. When vhost then processes the avail ring, transferring that
cache line from the virtio core to the vhost core costs a significant
number of CPU cycles.
To solve this problem, the DPDK PMD arranges the RX ring as follows (for
the non-mergeable case):
                    avail
                    idx
                    +
                    |
+----+----+---+-------------+------+
| 0  | 1  | 2 | ... |  254  | 255  |  avail ring
+-+--+-+--+-+-+---------+---+--+---+
  |    |    |       |   |      |
  |    |    |       |   |      |
  v    v    v       |   v      v
+-+--+-+--+-+-+---------+---+--+---+
| 0  | 1  | 2 | ... |  254  | 255  |  desc ring
+----+----+---+-------------+------+
                    |
                    |
+----+----+---+-------------+------+
| 0  | 1  | 2 |     |  254  | 255  |  used ring
+----+----+---+-------------+------+
                    |
                    +
The avail ring is initialized with fixed descriptors and never changes
afterwards, i.e., the nth avail ring entry always holds the index n. The
virtio PMD therefore refills only the desc ring and never has to write
to the avail ring.
When vhost fetches the avail ring, the cache line is always in its
first-level cache unless it has been evicted.
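
A minimal sketch of the fixed layout, under the same assumptions as the
arrangement above (256 entries, non-mergeable RX). rxq_init_fixed() and
rxq_refill_fixed() are hypothetical names, barriers are omitted, and the
caller is assumed to refill a slot only after its previous buffer has
come back through the used ring.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 256                 /* assumed ring size */
#define VRING_DESC_F_WRITE 2          /* device writes into the buffer */

static_assert((RING_SIZE & (RING_SIZE - 1)) == 0,
              "ring size must be a power of two");

struct vring_desc  { uint64_t addr; uint32_t len; uint16_t flags; uint16_t next; };
struct vring_avail { uint16_t flags; uint16_t idx; uint16_t ring[RING_SIZE]; };

struct rx_queue {
    struct vring_desc  desc[RING_SIZE];
    struct vring_avail avail;
};

/* One-time setup: avail entry n permanently names descriptor n. */
static void rxq_init_fixed(struct rx_queue *q)
{
    for (uint16_t i = 0; i < RING_SIZE; i++) {
        q->avail.ring[i] = i;
        q->desc[i].flags = VRING_DESC_F_WRITE;
        q->desc[i].next = 0;
    }
    q->avail.flags = 0;
    q->avail.idx = 0;
}

/* Refill one slot: only the descriptor is written. */
static uint16_t rxq_refill_fixed(struct rx_queue *q, uint64_t buf_addr,
                                 uint32_t buf_len)
{
    uint16_t slot = q->avail.idx & (RING_SIZE - 1);

    q->desc[slot].addr = buf_addr;
    q->desc[slot].len = buf_len;
    /* avail.ring[slot] already equals slot, so the ring array is not touched */

    q->avail.idx++;                   /* the new index must still be published */
    return slot;
}
```

Compared with the allocate-and-store path, there is no free list and no
store into avail.ring[] after initialization.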

When RX receives packets from the used ring, we use used->idx as the
desc idx. This requires vhost to process descs and return them from the
avail ring to the used ring in order, which holds for both the current
DPDK vhost and the kernel vhost implementations. As far as I understand,
vhost-net has no need to process descriptors out of order (OOO). One
possible exception is zero copy: if a descriptor doesn't meet the
zero-copy requirements, vhost could return it to the used ring directly,
ahead of the descriptors in front of it.
To enforce in-order processing, I want to use a reserved bit to indicate
it.
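
To illustrate the receive side, here is a sketch of a poll loop that
takes the descriptor index from the used ring position when in-order
processing is guaranteed, and falls back to the id field otherwise.
The in_order flag stands in for the proposed reserved bit, whose actual
number is not decided here; rxq_poll() and struct rx_queue are
hypothetical, and the read barrier after loading used.idx is omitted.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 256                 /* assumed ring size */

static_assert((RING_SIZE & (RING_SIZE - 1)) == 0,
              "ring size must be a power of two");

struct vring_used_elem { uint32_t id; uint32_t len; };
struct vring_used {
    uint16_t flags;
    uint16_t idx;
    struct vring_used_elem ring[RING_SIZE];
};

struct rx_queue {
    struct vring_used used;
    uint16_t last_used;               /* next used slot the driver consumes */
    int in_order;                     /* hypothetical: in-order bit negotiated */
};

/* Collect up to max completions; returns how many were found. */
static uint16_t rxq_poll(struct rx_queue *q, uint16_t *desc_idx,
                         uint32_t *len, uint16_t max)
{
    /* both counters are free-running, so the difference wraps correctly */
    uint16_t n = (uint16_t)(q->used.idx - q->last_used);

    if (n > max)
        n = max;
    for (uint16_t i = 0; i < n; i++) {
        uint16_t slot = q->last_used & (RING_SIZE - 1);

        /* in order: the used slot number is the descriptor index */
        desc_idx[i] = q->in_order ? slot : (uint16_t)q->used.ring[slot].id;
        len[i] = q->used.ring[slot].len;
        q->last_used++;
    }
    return n;
}
```

With in_order set, the loop reads only the len field of each used
element, and consecutive slots map to consecutive descriptors.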

For the TX ring, the arrangement is shown below. Each transmitted mbuf
needs an extra desc for its virtio_net_hdr, so only 128 slots are
actually free.
                            ++
                            ||
                            ||
   +-----+-----+-----+--------------+------+------+------+
   |  0  |  1  | ... |  127 || 128  | 129  | ...  | 255  |   avail ring with fixed descriptor
   +--+--+--+--+-----+---+------+---+--+---+------+--+---+
      |     |            |  ||  |      |             |
      v     v            v  ||  v      v             v
   +--+--+--+--+-----+---+------+---+--+---+------+--+---+
   | 128 | 129 | ... |  255 || 128  | 129  | ...  | 255  |   desc ring for virtio_net_hdr
   +--+--+--+--+-----+---+------+---+--+---+------+--+---+
      |     |            |  ||  |      |             |
      v     v            v  ||  v      v             v
   +--+--+--+--+-----+---+------+---+--+---+------+--+---+
   |  0  |  1  | ... |  127 ||  0   |  1   | ...  | 127  |   desc ring for tx data
   +-----+-----+-----+--------------+------+------+------+

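One way to read the TX arrangement, as a sketch only: it assumes the
header descriptors occupy the upper half of the desc ring (128..255),
that both halves of the avail ring name the same header descriptors, and
that each header descriptor chains to the data descriptor in the lower
half. txq_init_fixed(), txq_enqueue(), hdr_base and hdr_len are
hypothetical; barriers are omitted, and the caller is assumed to keep at
most 128 packets in flight.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 256                 /* assumed ring size */
#define HALF (RING_SIZE / 2)          /* 128 usable TX slots */
#define VRING_DESC_F_NEXT 1           /* descriptor chains to desc[].next */

static_assert((HALF & (HALF - 1)) == 0, "half ring must be a power of two");

struct vring_desc  { uint64_t addr; uint32_t len; uint16_t flags; uint16_t next; };
struct vring_avail { uint16_t flags; uint16_t idx; uint16_t ring[RING_SIZE]; };

struct tx_queue {
    struct vring_desc  desc[RING_SIZE];
    struct vring_avail avail;
};

/* One-time setup of the fixed avail ring and the header -> data chains. */
static void txq_init_fixed(struct tx_queue *q, uint64_t hdr_base,
                           uint32_t hdr_len)
{
    for (uint16_t i = 0; i < HALF; i++) {
        uint16_t hdr = (uint16_t)(HALF + i);

        /* both halves of the avail ring name the same header descriptor */
        q->avail.ring[i] = hdr;
        q->avail.ring[HALF + i] = hdr;

        /* header descriptor i chains to data descriptor i */
        q->desc[hdr].addr = hdr_base + (uint64_t)i * hdr_len;
        q->desc[hdr].len = hdr_len;
        q->desc[hdr].flags = VRING_DESC_F_NEXT;
        q->desc[hdr].next = i;

        q->desc[i].flags = 0;         /* data descriptor ends the chain */
        q->desc[i].next = 0;
    }
    q->avail.flags = 0;
    q->avail.idx = 0;
}

/* Queue one packet: only the data descriptor is written. */
static uint16_t txq_enqueue(struct tx_queue *q, uint64_t data_addr,
                            uint32_t data_len)
{
    uint16_t slot = q->avail.idx & (RING_SIZE - 1);
    uint16_t data = slot & (HALF - 1);

    q->desc[data].addr = data_addr;
    q->desc[data].len = data_len;
    q->avail.idx++;                   /* avail.ring[] itself is never rewritten */
    return data;
}
```

As on the RX side, the transmit path touches one descriptor and the
avail index, and leaves the avail ring entries alone.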
/huawei