From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <hxie5@shecgisg003.sh.intel.com>
Received: from mga02.intel.com (mga02.intel.com [134.134.136.20])
 by dpdk.org (Postfix) with ESMTP id 339215936
 for <dev@dpdk.org>; Sun, 18 Oct 2015 08:28:45 +0200 (CEST)
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
 by orsmga101.jf.intel.com with ESMTP; 17 Oct 2015 23:28:45 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.17,696,1437462000"; d="scan'208";a="583064446"
Received: from shvmail01.sh.intel.com ([10.239.29.42])
 by FMSMGA003.fm.intel.com with ESMTP; 17 Oct 2015 23:28:43 -0700
Received: from shecgisg003.sh.intel.com (shecgisg003.sh.intel.com
 [10.239.29.90])
 by shvmail01.sh.intel.com with ESMTP id t9I6SgPq023384
 for <dev@dpdk.org>; Sun, 18 Oct 2015 14:28:42 +0800
Received: from shecgisg003.sh.intel.com (localhost [127.0.0.1])
 by shecgisg003.sh.intel.com (8.13.6/8.13.6/SuSE Linux 0.8) with ESMTP id
 t9I6Senv003210 for <dev@dpdk.org>; Sun, 18 Oct 2015 14:28:42 +0800
Received: (from hxie5@localhost)
 by shecgisg003.sh.intel.com (8.13.6/8.13.6/Submit) id t9I6SeVO003206
 for dev@dpdk.org; Sun, 18 Oct 2015 14:28:40 +0800
From: Huawei Xie <huawei.xie@intel.com>
To: dev@dpdk.org
Date: Sun, 18 Oct 2015 14:28:25 +0800
Message-Id: <1445149720-3172-1-git-send-email-huawei.xie@intel.com>
X-Mailer: git-send-email 1.7.4.1
In-Reply-To: <1443537953-23917-1-git-send-email-huawei.xie@intel.com>
References: <1443537953-23917-1-git-send-email-huawei.xie@intel.com>
Subject: [dpdk-dev] [PATCH v2 0/7] virtio ring layout optimization and
	simple rx/tx processing
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: patches and discussions about DPDK <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Sun, 18 Oct 2015 06:28:45 -0000

In a DPDK-based switching environment, vhost mostly runs on a dedicated core
while virtio processing in the guest VMs runs on different cores.
Take RX as an example. With the generic implementation, for each guest buffer:
a) the virtio driver allocates a descriptor from the free descriptor list
b) it modifies the avail ring entry to point to the allocated descriptor
c) after the packet is received, it frees the descriptor

When vhost fetches the avail ring, it needs to fetch the modified L1 cache line
from the virtio core, which is costly on current CPU implementations.
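
To make the cost concrete, here is a minimal sketch of the generic path in
steps (a) to (c). It is not the driver code: struct generic_vq, DESC_NONE and
the avail_writes counter are hypothetical names used only for illustration.
The point is that every posted buffer stores a new value into the avail ring,
so the cache line holding that entry is modified on the virtio core.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 256u        /* assumed ring size, power of two */
#define DESC_NONE 0xFFFFu     /* hypothetical end-of-free-list marker */

/* Hypothetical, simplified stand-in for a split virtqueue. */
struct generic_vq {
    uint16_t next[RING_SIZE];        /* free-list links between descriptors */
    uint16_t free_head;              /* first free descriptor, or DESC_NONE */
    uint16_t avail_ring[RING_SIZE];  /* descriptor indices offered to vhost */
    uint16_t avail_idx;              /* free-running producer index */
    unsigned avail_writes;           /* counts stores into avail_ring[] */
};

/* Chain all descriptors into one free list: 0 -> 1 -> ... -> 255. */
static void generic_vq_init(struct generic_vq *vq)
{
    for (uint16_t i = 0; i < RING_SIZE; i++)
        vq->next[i] = (uint16_t)(i + 1 < RING_SIZE ? i + 1 : DESC_NONE);
    vq->free_head = 0;
    vq->avail_idx = 0;
    vq->avail_writes = 0;
}

/* Steps (a) and (b): take a descriptor from the free list and publish it.
 * Returns the descriptor index, or -1 if no descriptor is free. */
static int generic_vq_post(struct generic_vq *vq)
{
    uint16_t d = vq->free_head;

    if (d == DESC_NONE)
        return -1;
    vq->free_head = vq->next[d];                               /* (a) */
    vq->avail_ring[vq->avail_idx & (RING_SIZE - 1)] = d;       /* (b) */
    vq->avail_idx++;
    vq->avail_writes++;     /* this store is what dirties the cache line */
    return d;
}

/* Step (c): put the descriptor back on the free list after the packet
 * has been received. */
static void generic_vq_free(struct generic_vq *vq, uint16_t d)
{
    assert(d < RING_SIZE);
    vq->next[d] = vq->free_head;
    vq->free_head = d;
}
```

Because descriptors come back in completion order, the value written to a
given avail entry can differ from one lap of the ring to the next, so the
entries cannot be treated as constant.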

The idea of this optimization is to allocate a fixed descriptor for each entry
of the avail ring, so that the avail ring contents stay the same for the whole
run.
This removes the L1 cache line transfer from the virtio core to the vhost core
for the avail ring.
In addition, no descriptor allocation or free is needed.

Most importantly, this makes vector processing possible, which further
accelerates the processing.

This is the layout of the avail ring (taking 256 ring entries as an example),
with each entry pointing to the descriptor with the same index.
                    avail
                    idx
                    +
                    |
+----+----+---+-------------+------+
| 0  | 1  | 2 | ... |  254  | 255  |  avail ring
+-+--+-+--+-+-+---------+---+--+---+
  |    |    |       |   |      |
  |    |    |       |   |      |
  v    v    v       |   v      v
+-+--+-+--+-+-+---------+---+--+---+
| 0  | 1  | 2 | ... |  254  | 255  |  desc ring
+----+----+---+-------------+------+
                    |
                    |
+----+----+---+-------------+------+
| 0  | 1  | 2 |     |  254  | 255  |  used ring
+----+----+---+-------------+------+
                    |
                    +
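
The RX layout above amounts to an identity mapping that is written once. The
following is a minimal sketch under that assumption; struct rx_ring_sketch,
rx_ring_init() and rx_ring_refill() are hypothetical names, not the functions
added by this series.

```c
#include <assert.h>
#include <stdint.h>

#define RX_RING_SIZE 256u   /* assumed; matches the 256-entry example */

/* Hypothetical, simplified view of the RX avail ring. */
struct rx_ring_sketch {
    uint16_t avail_ring[RX_RING_SIZE];
    uint16_t avail_idx;     /* free-running producer index */
};

/* One-time setup: avail entry i permanently names descriptor i. */
static void rx_ring_init(struct rx_ring_sketch *r)
{
    for (uint16_t i = 0; i < RX_RING_SIZE; i++)
        r->avail_ring[i] = i;
    r->avail_idx = 0;
}

/* Offer n more buffers to the backend. Only the index advances; the
 * avail_ring[] entries are never rewritten. Returns the first descriptor
 * slot the caller should fill with buffer addresses. */
static uint16_t rx_ring_refill(struct rx_ring_sketch *r, uint16_t n)
{
    uint16_t first = (uint16_t)(r->avail_idx & (RX_RING_SIZE - 1));

    assert(n <= RX_RING_SIZE);
    r->avail_idx = (uint16_t)(r->avail_idx + n);
    return first;
}
```

Since slot k always maps to descriptor k, a batch of consecutive slots maps to
a contiguous run of descriptors, which is the property a vectorized refill
loop can rely on.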

This is the ring layout for TX.
As we need one virtio header for each transmitted packet, each packet uses two
descriptors (header and data), so a 256-entry ring has 128 slots available.

                         ++
                         ||
                         ||
+-----+-----+-----+--------------+------+------+------+
|  0  |  1  | ... |  127 || 128  | 129  | ...  | 255  |   avail ring
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
   |     |            |  ||  |      |             |
   v     v            v  ||  v      v             v
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
| 128 | 129 | ... |  255 || 128  | 129  | ...  | 255  |   desc ring for virtio_net_hdr
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
   |     |            |  ||  |      |             |
   v     v            v  ||  v      v             v
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
|  0  |  1  | ... |  127 ||  0   |  1   | ...  | 127  |   desc ring for tx data
+-----+-----+-----+--------------+------+------+------+
                         ||
                         ||
                         ++
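
One way to set up a TX layout of this shape is sketched below. It assumes the
header descriptors occupy the upper half of the descriptor table (indices 128
to 255 for a 256-entry ring), which is what 128 usable slots implies, and that
each header descriptor chains to the data descriptor in the lower half. The
struct and function names are hypothetical; only the value of the NEXT flag is
taken from the virtio specification.

```c
#include <assert.h>
#include <stdint.h>

#define TX_RING_SIZE 256u                 /* assumed ring size */
#define TX_SLOTS     (TX_RING_SIZE / 2)   /* header + data per packet */
#define DESC_F_NEXT  1u   /* same value as VRING_DESC_F_NEXT in the spec */

/* Hypothetical descriptor: only the fields needed for the layout. */
struct tx_desc_sketch {
    uint16_t flags;
    uint16_t next;
};

struct tx_ring_sketch {
    struct tx_desc_sketch desc[TX_RING_SIZE];
    uint16_t avail_ring[TX_RING_SIZE];
};

/* One-time setup of the fixed TX layout. */
static void tx_ring_init(struct tx_ring_sketch *r)
{
    for (uint16_t i = 0; i < TX_SLOTS; i++) {
        uint16_t hdr = (uint16_t)(i + TX_SLOTS);

        /* Header descriptor chains to data descriptor i. */
        r->desc[hdr].flags = DESC_F_NEXT;
        r->desc[hdr].next = i;
        /* Data descriptor terminates the chain. */
        r->desc[i].flags = 0;
        r->desc[i].next = 0;
        /* Both halves of the avail ring name the same header descriptor,
         * so the entries stay valid as the avail index wraps around. */
        r->avail_ring[i] = hdr;
        r->avail_ring[hdr] = hdr;
    }
}

/* Data descriptor reached from a given avail ring position. */
static uint16_t tx_data_desc(const struct tx_ring_sketch *r,
                             uint16_t avail_pos)
{
    uint16_t hdr = r->avail_ring[avail_pos & (TX_RING_SIZE - 1)];

    return r->desc[hdr].next;
}
```

With this setup, transmitting a packet only needs to fill in the address and
length of the data descriptor for the slot; the chain structure and the avail
ring entries are left untouched.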

A performance boost can be observed only if the virtio backend isn't the
bottleneck, or in the VM2VM case.
Several vhost optimization patches will also be submitted later.

Changes in v2:
- Remove the configure macro
- Enable simple RX/TX processing when the user specifies simple txq flags
- Reword some comments and commit messages

Huawei Xie (7):
  virtio: add virtio_rxtx.h header file
  virtio: add software rx ring, fake_buf into virtqueue
  virtio: rx/tx ring layout optimization
  virtio: fill RX avail ring with blank mbufs
  virtio: virtio vec rx
  virtio: simple tx routine
  virtio: pick simple rx/tx func

 drivers/net/virtio/Makefile             |   2 +-
 drivers/net/virtio/virtio_ethdev.c      |  13 ++
 drivers/net/virtio/virtio_ethdev.h      |   5 +
 drivers/net/virtio/virtio_rxtx.c        |  53 ++++-
 drivers/net/virtio/virtio_rxtx.h        |  39 ++++
 drivers/net/virtio/virtio_rxtx_simple.c | 403 ++++++++++++++++++++++++++++++++
 drivers/net/virtio/virtqueue.h          |   5 +
 7 files changed, 517 insertions(+), 3 deletions(-)
 create mode 100644 drivers/net/virtio/virtio_rxtx.h
 create mode 100644 drivers/net/virtio/virtio_rxtx_simple.c

-- 
1.8.1.4