DPDK patches and discussions
* [dpdk-dev] [PATCH 0/8] virtio: virtio ring layout optimization and RX vector processing
@ 2015-09-29 14:45 Huawei Xie
  2015-09-29 14:45 ` [dpdk-dev] [PATCH 1/8] virtio: add configure for simple virtio rx/tx Huawei Xie
                   ` (14 more replies)
  0 siblings, 15 replies; 92+ messages in thread
From: Huawei Xie @ 2015-09-29 14:45 UTC (permalink / raw)
  To: dev

Part of this message is copied from patch 4.
In a DPDK-based switching environment, vhost mostly runs on a dedicated core
while virtio processing in guest VMs runs on different cores.
Take RX for example: with the generic implementation, for each guest buffer,
a) the virtio driver allocates a descriptor from the free descriptor list
b) modifies the avail ring entry to point to the allocated descriptor
c) frees the descriptor after the packet is received

When vhost fetches the avail ring, it needs to fetch the modified L1 cache line
from the virtio core, which is a heavy cost on current CPU implementations.

The idea of this optimization is to
    allocate a fixed descriptor for each entry of the avail ring,
so that the avail ring stays the same for the whole run.
This removes the L1 cache transfer from the virtio core to the vhost core for
the avail ring.
Besides, no descriptor allocation or free is needed.

Most importantly, this makes vector processing possible, which further
accelerates the processing.

This is the layout for the avail ring (taking 256 ring entries as an example),
with each entry pointing to the descriptor with the same index.
                    avail
                    idx
                    +
                    |
+----+----+---+-------------+------+
| 0  | 1  | 2 | ... |  254  | 255  |  avail ring
+-+--+-+--+-+-+---------+---+--+---+
  |    |    |       |   |      |
  |    |    |       |   |      |
  v    v    v       |   v      v
+-+--+-+--+-+-+---------+---+--+---+
| 0  | 1  | 2 | ... |  254  | 255  |  desc ring
+----+----+---+-------------+------+
                    |
                    |
+----+----+---+-------------+------+
| 0  | 1  | 2 |     |  254  | 255  |  used ring
+----+----+---+-------------+------+
                    |
                    +
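
As a rough illustration of the RX layout above, here is a minimal standalone
sketch. It does not use the DPDK headers: the sketch_* types and names are
hypothetical stand-ins for struct vring_desc and the virtqueue, and the ring
size of 256 is just the example value from the diagram.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_RING_SIZE 256u      /* example ring size, must be a power of two */
#define SKETCH_DESC_F_WRITE 2u     /* same value as VRING_DESC_F_WRITE */

struct sketch_desc {               /* mirrors the fields of struct vring_desc */
	uint64_t addr;
	uint32_t len;
	uint16_t flags;
	uint16_t next;
};

struct sketch_rx_ring {
	struct sketch_desc desc[SKETCH_RING_SIZE];
	uint16_t avail_ring[SKETCH_RING_SIZE];
	uint16_t avail_idx;
};

/* One-time setup: avail entry i always names descriptor i. */
static void sketch_rx_ring_init(struct sketch_rx_ring *r)
{
	uint16_t i;

	for (i = 0; i < SKETCH_RING_SIZE; i++) {
		r->avail_ring[i] = i;
		r->desc[i].flags = SKETCH_DESC_F_WRITE;
	}
	r->avail_idx = 0;
}

/*
 * Per-buffer refill: only the descriptor and the avail index change.
 * The avail ring entries themselves are never written again.
 */
static uint16_t sketch_rx_refill(struct sketch_rx_ring *r,
	uint64_t buf_addr, uint32_t buf_len)
{
	uint16_t slot = r->avail_idx & (SKETCH_RING_SIZE - 1);

	assert((SKETCH_RING_SIZE & (SKETCH_RING_SIZE - 1)) == 0);
	r->desc[slot].addr = buf_addr;
	r->desc[slot].len = buf_len;
	r->avail_idx++;
	return slot;
}
```

The point of the sketch is that sketch_rx_refill() never stores to
avail_ring[], so the backend can keep those cache lines in a shared state.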

This is the ring layout for TX.
As one virtio header is needed for each xmit packet, we have 128 slots available.

                         ++
                         ||
                         ||
+-----+-----+-----+--------------+------+------+------+
|  0  |  1  | ... |  127 || 128  | 129  | ...  | 255  |   avail ring
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
   |     |            |  ||  |      |             |
   v     v            v  ||  v      v             v
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
| 128 | 129 | ... |  255 || 128  | 129  | ...  | 255  |   desc ring for virtio_net_hdr
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
   |     |            |  ||  |      |             |
   v     v            v  ||  v      v             v
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
|  0  |  1  | ... |  127 ||  0   |  1   | ...  | 127  |   desc ring for tx data
+-----+-----+-----+--------------+------+------+------+
                         ||
                         ||
                         ++
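
The TX layout can be sketched the same way. This is only a standalone
illustration that follows the initialization loop in patch 4; the sketch_*
names are hypothetical, and hdr_mem / hdr_size stand in for
vq->virtio_net_hdr_mem and vq->hw->vtnet_hdr_size.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_TX_RING_SIZE 256u   /* example ring size from the diagram */
#define SKETCH_DESC_F_NEXT 1u      /* same value as VRING_DESC_F_NEXT */

struct sketch_tx_desc {            /* mirrors the fields of struct vring_desc */
	uint64_t addr;
	uint32_t len;
	uint16_t flags;
	uint16_t next;
};

struct sketch_tx_ring {
	struct sketch_tx_desc desc[SKETCH_TX_RING_SIZE];
	uint16_t avail_ring[SKETCH_TX_RING_SIZE];
};

/*
 * Upper half of the desc ring holds the virtio_net_hdr descriptors, lower
 * half holds the data descriptors. Header descriptor (i + mid) chains to
 * data descriptor i, and the avail ring always names the header descriptor.
 */
static void sketch_tx_ring_init(struct sketch_tx_ring *r,
	uint64_t hdr_mem, uint32_t hdr_size)
{
	uint16_t mid = SKETCH_TX_RING_SIZE >> 1;
	uint16_t i;

	assert(hdr_size > 0);
	for (i = 0; i < mid; i++) {
		r->avail_ring[i] = (uint16_t)(i + mid);
		r->desc[i + mid].next = i;
		/* Same address for every header descriptor, as in the patch. */
		r->desc[i + mid].addr = hdr_mem + (uint64_t)mid * hdr_size;
		r->desc[i + mid].len = hdr_size;
		r->desc[i + mid].flags = SKETCH_DESC_F_NEXT;
		r->desc[i].flags = 0;
	}
	for (i = mid; i < SKETCH_TX_RING_SIZE; i++)
		r->avail_ring[i] = i;
}
```

With 256 entries this gives 128 usable TX slots, since every packet consumes
one header descriptor and one data descriptor.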

A performance boost can be observed if the virtio backend isn't the bottleneck,
or in the VM2VM case.
Several vhost optimization patches will also be submitted later.

Huawei Xie (8):
  virtio: add configure for simple virtio rx/tx
  virtio: add virtio_rxtx.h header file
  virtio: add software rx ring, fake_buf, simple_rxtx into virtqueue
  virtio: rx/tx ring layout optimization
  virtio: fill RX avail ring with blank mbufs
  virtio: virtio vec rx
  virtio: simple tx routine
  virtio: rxtx_func_get

 config/common_linuxapp                  |   1 +
 drivers/net/virtio/Makefile             |   2 +-
 drivers/net/virtio/virtio_ethdev.c      |  29 ++-
 drivers/net/virtio/virtio_ethdev.h      |   5 +
 drivers/net/virtio/virtio_rxtx.c        |  70 +++++-
 drivers/net/virtio/virtio_rxtx.h        |  39 ++++
 drivers/net/virtio/virtio_rxtx_simple.c | 403 ++++++++++++++++++++++++++++++++
 drivers/net/virtio/virtqueue.h          |   7 +
 8 files changed, 550 insertions(+), 6 deletions(-)
 create mode 100644 drivers/net/virtio/virtio_rxtx.h
 create mode 100644 drivers/net/virtio/virtio_rxtx_simple.c

-- 
1.8.1.4


* [dpdk-dev] [PATCH 1/8] virtio: add configure for simple virtio rx/tx
  2015-09-29 14:45 [dpdk-dev] [PATCH 0/8] virtio: virtio ring layout optimization and RX vector processing Huawei Xie
@ 2015-09-29 14:45 ` Huawei Xie
  2015-09-29 14:45 ` [dpdk-dev] [PATCH 2/8] virtio: add virtio_rxtx.h header file Huawei Xie
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 92+ messages in thread
From: Huawei Xie @ 2015-09-29 14:45 UTC (permalink / raw)
  To: dev

Turned off by default. This is a development feature.
The macro will be removed once this optimization is widely accepted.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
---
 config/common_linuxapp | 1 +
 1 file changed, 1 insertion(+)

diff --git a/config/common_linuxapp b/config/common_linuxapp
index 0de43d5..b70c5d7 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -241,6 +241,7 @@ CONFIG_RTE_LIBRTE_ENIC_DEBUG=n
 # Compile burst-oriented VIRTIO PMD driver
 #
 CONFIG_RTE_LIBRTE_VIRTIO_PMD=y
+CONFIG_RTE_LIBRTE_VIRTIO_SIMPLE=n
 CONFIG_RTE_LIBRTE_VIRTIO_DEBUG_INIT=n
 CONFIG_RTE_LIBRTE_VIRTIO_DEBUG_RX=n
 CONFIG_RTE_LIBRTE_VIRTIO_DEBUG_TX=n
-- 
1.8.1.4


* [dpdk-dev] [PATCH 2/8] virtio: add virtio_rxtx.h header file
  2015-09-29 14:45 [dpdk-dev] [PATCH 0/8] virtio: virtio ring layout optimization and RX vector processing Huawei Xie
  2015-09-29 14:45 ` [dpdk-dev] [PATCH 1/8] virtio: add configure for simple virtio rx/tx Huawei Xie
@ 2015-09-29 14:45 ` Huawei Xie
  2015-09-29 14:45 ` [dpdk-dev] [PATCH 3/8] virtio: add software rx ring, fake_buf, simple_rxtx into virtqueue Huawei Xie
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 92+ messages in thread
From: Huawei Xie @ 2015-09-29 14:45 UTC (permalink / raw)
  To: dev

All rx/tx related code will be moved into this header file in the future.
Add RTE_PMD_VIRTIO_RX_MAX_BURST.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
---
 drivers/net/virtio/virtio_ethdev.c |  1 +
 drivers/net/virtio/virtio_rxtx.c   |  1 +
 drivers/net/virtio/virtio_rxtx.h   | 34 ++++++++++++++++++++++++++++++++++
 3 files changed, 36 insertions(+)
 create mode 100644 drivers/net/virtio/virtio_rxtx.h

diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index 465d3cd..79a3640 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -61,6 +61,7 @@
 #include "virtio_pci.h"
 #include "virtio_logs.h"
 #include "virtqueue.h"
+#include "virtio_rxtx.h"
 
 
 static int eth_virtio_dev_init(struct rte_eth_dev *eth_dev);
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index c5b53bb..9324f7f 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -54,6 +54,7 @@
 #include "virtio_logs.h"
 #include "virtio_ethdev.h"
 #include "virtqueue.h"
+#include "virtio_rxtx.h"
 
 #ifdef RTE_LIBRTE_VIRTIO_DEBUG_DUMP
 #define VIRTIO_DUMP_PACKET(m, len) rte_pktmbuf_dump(stdout, m, len)
diff --git a/drivers/net/virtio/virtio_rxtx.h b/drivers/net/virtio/virtio_rxtx.h
new file mode 100644
index 0000000..a10aa69
--- /dev/null
+++ b/drivers/net/virtio/virtio_rxtx.h
@@ -0,0 +1,34 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#define RTE_PMD_VIRTIO_RX_MAX_BURST 64
-- 
1.8.1.4


* [dpdk-dev] [PATCH 3/8] virtio: add software rx ring, fake_buf, simple_rxtx into virtqueue
  2015-09-29 14:45 [dpdk-dev] [PATCH 0/8] virtio: virtio ring layout optimization and RX vector processing Huawei Xie
  2015-09-29 14:45 ` [dpdk-dev] [PATCH 1/8] virtio: add configure for simple virtio rx/tx Huawei Xie
  2015-09-29 14:45 ` [dpdk-dev] [PATCH 2/8] virtio: add virtio_rxtx.h header file Huawei Xie
@ 2015-09-29 14:45 ` Huawei Xie
  2015-09-29 16:15   ` Stephen Hemminger
  2015-09-29 14:45 ` [dpdk-dev] [PATCH 4/8] virtio: rx/tx ring layout optimization Huawei Xie
                   ` (11 subsequent siblings)
  14 siblings, 1 reply; 92+ messages in thread
From: Huawei Xie @ 2015-09-29 14:45 UTC (permalink / raw)
  To: dev

Add a software RX ring to the virtqueue.
Add fake_mbuf to the virtqueue for wraparound processing.
Use use_simple_rxtx to indicate whether simple rxtx is enabled.
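
A minimal standalone sketch of the padded software ring, assuming the padding
is there so that a burst which runs past the last real slot still reads valid
pointers. The sketch_* types are hypothetical stand-ins for struct rte_mbuf
and struct virtqueue, and calloc() replaces rte_zmalloc_socket().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_MAX_BURST 64u   /* plays the role of RTE_PMD_VIRTIO_RX_MAX_BURST */

struct sketch_mbuf {           /* stand-in for struct rte_mbuf */
	unsigned int data_len;
};

struct sketch_rxq {
	struct sketch_mbuf **sw_ring;  /* nentries real slots + burst padding */
	struct sketch_mbuf fake_mbuf;  /* dummy target for the padding slots */
	unsigned int nentries;
};

/* Allocate the ring and point every padding slot at the dummy mbuf. */
static int sketch_rxq_init(struct sketch_rxq *q, unsigned int nentries)
{
	unsigned int i;

	assert(nentries != 0 && (nentries & (nentries - 1)) == 0);
	q->sw_ring = calloc(nentries + SKETCH_MAX_BURST, sizeof(q->sw_ring[0]));
	if (q->sw_ring == NULL)
		return -1;      /* check the allocation before touching it */

	q->nentries = nentries;
	q->fake_mbuf.data_len = 0;
	for (i = 0; i < SKETCH_MAX_BURST; i++)
		q->sw_ring[nentries + i] = &q->fake_mbuf;
	return 0;
}

static void sketch_rxq_free(struct sketch_rxq *q)
{
	free(q->sw_ring);
	q->sw_ring = NULL;
}
```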

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
---
 drivers/net/virtio/virtio_ethdev.c | 12 ++++++++++++
 drivers/net/virtio/virtio_rxtx.c   |  5 +++++
 drivers/net/virtio/virtqueue.h     |  6 ++++++
 3 files changed, 23 insertions(+)

diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index 79a3640..3b7b841 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -247,6 +247,9 @@ virtio_dev_queue_release(struct virtqueue *vq) {
 		VIRTIO_WRITE_REG_2(hw, VIRTIO_PCI_QUEUE_SEL, vq->queue_id);
 		VIRTIO_WRITE_REG_4(hw, VIRTIO_PCI_QUEUE_PFN, 0);
 
+		if (vq->sw_ring)
+			rte_free(vq->sw_ring);
+
 		rte_free(vq);
 		vq = NULL;
 	}
@@ -292,6 +295,9 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
 			dev->data->port_id, queue_idx);
 		vq = rte_zmalloc(vq_name, sizeof(struct virtqueue) +
 			vq_size * sizeof(struct vq_desc_extra), RTE_CACHE_LINE_SIZE);
+		vq->sw_ring = rte_zmalloc_socket("rxq->sw_ring",
+			(RTE_PMD_VIRTIO_RX_MAX_BURST + vq_size) *
+			sizeof(vq->sw_ring[0]), RTE_CACHE_LINE_SIZE, socket_id);
 	} else if (queue_type == VTNET_TQ) {
 		snprintf(vq_name, sizeof(vq_name), "port%d_tvq%d",
 			dev->data->port_id, queue_idx);
@@ -308,6 +314,12 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
 		PMD_INIT_LOG(ERR, "%s: Can not allocate virtqueue", __func__);
 		return (-ENOMEM);
 	}
+	if (queue_type == VTNET_RQ && vq->sw_ring == NULL) {
+		PMD_INIT_LOG(ERR, "%s: Can not allocate RX soft ring",
+			__func__);
+		rte_free(vq);
+		return -ENOMEM;
+	}
 
 	vq->hw = hw;
 	vq->port_id = dev->data->port_id;
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 9324f7f..dcc4524 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -299,6 +299,11 @@ virtio_dev_vring_start(struct virtqueue *vq, int queue_type)
 		/* Allocate blank mbufs for the each rx descriptor */
 		nbufs = 0;
 		error = ENOSPC;
+
+		memset(&vq->fake_mbuf, 0, sizeof(vq->fake_mbuf));
+		for (i = 0; i < RTE_PMD_VIRTIO_RX_MAX_BURST; i++)
+			vq->sw_ring[vq->vq_nentries + i] = &vq->fake_mbuf;
+
 		while (!virtqueue_full(vq)) {
 			m = rte_rxmbuf_alloc(vq->mpool);
 			if (m == NULL)
diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h
index 7789411..dd63285 100644
--- a/drivers/net/virtio/virtqueue.h
+++ b/drivers/net/virtio/virtqueue.h
@@ -190,11 +190,17 @@ struct virtqueue {
 	uint16_t vq_avail_idx;
 	phys_addr_t virtio_net_hdr_mem; /**< hdr for each xmit packet */
 
+	struct rte_mbuf **sw_ring; /**< RX software ring. */
+	/* dummy mbuf, for wraparound when processing RX ring. */
+	struct rte_mbuf fake_mbuf;
+
 	/* Statistics */
 	uint64_t	packets;
 	uint64_t	bytes;
 	uint64_t	errors;
 
+	int use_simple_rxtx;
+
 	struct vq_desc_extra {
 		void              *cookie;
 		uint16_t          ndescs;
-- 
1.8.1.4


* [dpdk-dev] [PATCH 4/8] virtio: rx/tx ring layout optimization
  2015-09-29 14:45 [dpdk-dev] [PATCH 0/8] virtio: virtio ring layout optimization and RX vector processing Huawei Xie
                   ` (2 preceding siblings ...)
  2015-09-29 14:45 ` [dpdk-dev] [PATCH 3/8] virtio: add software rx ring, fake_buf, simple_rxtx into virtqueue Huawei Xie
@ 2015-09-29 14:45 ` Huawei Xie
  2015-09-29 14:45 ` [dpdk-dev] [PATCH 5/8] virtio: fill RX avail ring with blank mbufs Huawei Xie
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 92+ messages in thread
From: Huawei Xie @ 2015-09-29 14:45 UTC (permalink / raw)
  To: dev

In a DPDK-based switching environment, vhost mostly runs on a dedicated core
while virtio processing in guest VMs runs on different cores.
Take RX for example: with the generic implementation, for each guest buffer,
a) the virtio driver allocates a descriptor from the free descriptor list
b) modifies the avail ring entry to point to the allocated descriptor
c) frees the descriptor after the packet is received

When vhost fetches the avail ring, it needs to fetch the modified L1 cache line
from the virtio core, which is a heavy cost on current CPU implementations.

The idea of this optimization is to
    allocate a fixed descriptor for each entry of the avail ring,
so that the avail ring stays the same for the whole run.
This removes the L1 cache transfer from the virtio core to the vhost core for
the avail ring.
Besides, no descriptor allocation or free is needed.
This also makes vector processing possible, which further accelerates the
processing.

This is the layout for the avail ring (taking 256 ring entries as an example),
with each entry pointing to the descriptor with the same index.
                    avail
                    idx
                    +
                    |
+----+----+---+-------------+------+
| 0  | 1  | 2 | ... |  254  | 255  |  avail ring
+-+--+-+--+-+-+---------+---+--+---+
  |    |    |       |   |      |
  |    |    |       |   |      |
  v    v    v       |   v      v
+-+--+-+--+-+-+---------+---+--+---+
| 0  | 1  | 2 | ... |  254  | 255  |  desc ring
+----+----+---+-------------+------+
                    |
                    |
+----+----+---+-------------+------+
| 0  | 1  | 2 |     |  254  | 255  |  used ring
+----+----+---+-------------+------+
                    |
                    +

This is the ring layout for TX.
As we need one virtio header for each xmit packet, we have 128 slots available.

                         ++
                         ||
                         ||
+-----+-----+-----+--------------+------+------+------+
|  0  |  1  | ... |  127 || 128  | 129  | ...  | 255  |   avail ring
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
   |     |            |  ||  |      |             |
   v     v            v  ||  v      v             v
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
| 128 | 129 | ... |  255 || 128  | 129  | ...  | 255  |   desc ring for virtio_net_hdr
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
   |     |            |  ||  |      |             |
   v     v            v  ||  v      v             v
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
|  0  |  1  | ... |  127 ||  0   |  1   | ...  | 127  |   desc ring for tx data
+-----+-----+-----+--------------+------+------+------+
                         ||
                         ||
                         ++

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
---
 drivers/net/virtio/virtio_rxtx.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index dcc4524..b4d268d 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -300,6 +300,12 @@ virtio_dev_vring_start(struct virtqueue *vq, int queue_type)
 		nbufs = 0;
 		error = ENOSPC;
 
+		if (vq->use_simple_rxtx)
+			for (i = 0; i < vq->vq_nentries; i++) {
+				vq->vq_ring.avail->ring[i] = i;
+				vq->vq_ring.desc[i].flags = VRING_DESC_F_WRITE;
+			}
+
 		memset(&vq->fake_mbuf, 0, sizeof(vq->fake_mbuf));
 		for (i = 0; i < RTE_PMD_VIRTIO_RX_MAX_BURST; i++)
 			vq->sw_ring[vq->vq_nentries + i] = &vq->fake_mbuf;
@@ -330,6 +336,24 @@ virtio_dev_vring_start(struct virtqueue *vq, int queue_type)
 		VIRTIO_WRITE_REG_4(vq->hw, VIRTIO_PCI_QUEUE_PFN,
 			vq->mz->phys_addr >> VIRTIO_PCI_QUEUE_ADDR_SHIFT);
 	} else if (queue_type == VTNET_TQ) {
+		if (vq->use_simple_rxtx) {
+			int mid_idx  = vq->vq_nentries >> 1;
+			for (i = 0; i < mid_idx; i++) {
+				vq->vq_ring.avail->ring[i] = i + mid_idx;
+				vq->vq_ring.desc[i + mid_idx].next = i;
+				vq->vq_ring.desc[i + mid_idx].addr =
+					vq->virtio_net_hdr_mem +
+						mid_idx * vq->hw->vtnet_hdr_size;
+				vq->vq_ring.desc[i + mid_idx].len =
+					vq->hw->vtnet_hdr_size;
+				vq->vq_ring.desc[i + mid_idx].flags =
+					VRING_DESC_F_NEXT;
+				vq->vq_ring.desc[i].flags = 0;
+			}
+			for (i = mid_idx; i < vq->vq_nentries; i++)
+				vq->vq_ring.avail->ring[i] = i;
+		}
+
 		VIRTIO_WRITE_REG_2(vq->hw, VIRTIO_PCI_QUEUE_SEL,
 			vq->vq_queue_index);
 		VIRTIO_WRITE_REG_4(vq->hw, VIRTIO_PCI_QUEUE_PFN,
-- 
1.8.1.4


* [dpdk-dev] [PATCH 5/8] virtio: fill RX avail ring with blank mbufs
  2015-09-29 14:45 [dpdk-dev] [PATCH 0/8] virtio: virtio ring layout optimization and RX vector processing Huawei Xie
                   ` (3 preceding siblings ...)
  2015-09-29 14:45 ` [dpdk-dev] [PATCH 4/8] virtio: rx/tx ring layout optimization Huawei Xie
@ 2015-09-29 14:45 ` Huawei Xie
  2015-09-29 14:45 ` [dpdk-dev] [PATCH 6/8] virtio: virtio vec rx Huawei Xie
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 92+ messages in thread
From: Huawei Xie @ 2015-09-29 14:45 UTC (permalink / raw)
  To: dev

Fill the RX avail ring with blank mbufs in virtio_dev_vring_start.
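
The descriptor address and length arithmetic used by the simple refill path
can be checked with a small standalone sketch. The constants are assumptions
for illustration only: 128 for RTE_PKTMBUF_HEADROOM and 10 for
sizeof(struct virtio_net_hdr); sketch_buf is a hypothetical stand-in for the
two mbuf fields involved.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_HEADROOM 128u      /* assumed RTE_PKTMBUF_HEADROOM */
#define SKETCH_NET_HDR_SIZE 10u   /* assumed sizeof(struct virtio_net_hdr) */

struct sketch_buf {               /* stand-in for the mbuf fields used here */
	uint64_t buf_physaddr;
	uint16_t buf_len;
};

/*
 * The descriptor starts just before the packet data so that the backend
 * writes the virtio_net_hdr into the tail of the headroom.
 */
static uint64_t sketch_desc_addr(const struct sketch_buf *b)
{
	assert(SKETCH_HEADROOM >= SKETCH_NET_HDR_SIZE);
	return b->buf_physaddr + SKETCH_HEADROOM - SKETCH_NET_HDR_SIZE;
}

/* Usable length: everything after the headroom, plus room for the header. */
static uint32_t sketch_desc_len(const struct sketch_buf *b)
{
	assert(b->buf_len >= SKETCH_HEADROOM);
	return (uint32_t)b->buf_len - SKETCH_HEADROOM + SKETCH_NET_HDR_SIZE;
}
```

For a 2176-byte buffer at physical address 0x10000 this gives a descriptor
address of 0x10000 + 118 and a length of 2058 under the assumed constants.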

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
---
 drivers/net/virtio/Makefile             |  2 +-
 drivers/net/virtio/virtio_rxtx.c        | 14 +++++-
 drivers/net/virtio/virtio_rxtx.h        |  3 ++
 drivers/net/virtio/virtio_rxtx_simple.c | 84 +++++++++++++++++++++++++++++++++
 4 files changed, 100 insertions(+), 3 deletions(-)
 create mode 100644 drivers/net/virtio/virtio_rxtx_simple.c

diff --git a/drivers/net/virtio/Makefile b/drivers/net/virtio/Makefile
index 930b60f..40f8e8d 100644
--- a/drivers/net/virtio/Makefile
+++ b/drivers/net/virtio/Makefile
@@ -50,7 +50,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtqueue.c
 SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_pci.c
 SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx.c
 SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_ethdev.c
-
+SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_SIMPLE)     += virtio_rxtx_simple.c
 
 # this lib depends upon:
 DEPDIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += lib/librte_eal lib/librte_ether
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index b4d268d..aab6724 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -318,8 +318,10 @@ virtio_dev_vring_start(struct virtqueue *vq, int queue_type)
 			/******************************************
 			*         Enqueue allocated buffers        *
 			*******************************************/
-			error = virtqueue_enqueue_recv_refill(vq, m);
-
+			if (vq->use_simple_rxtx)
+				error = virtqueue_enqueue_recv_refill_simple(vq, m);
+			else
+				error = virtqueue_enqueue_recv_refill(vq, m);
 			if (error) {
 				rte_pktmbuf_free(m);
 				break;
@@ -855,3 +857,11 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 	return nb_tx;
 }
+
+
+int __attribute__((weak))
+virtqueue_enqueue_recv_refill_simple(struct virtqueue __rte_unused *vq,
+	struct rte_mbuf __rte_unused *m)
+{
+	return -1;
+}
diff --git a/drivers/net/virtio/virtio_rxtx.h b/drivers/net/virtio/virtio_rxtx.h
index a10aa69..7d2d8fe 100644
--- a/drivers/net/virtio/virtio_rxtx.h
+++ b/drivers/net/virtio/virtio_rxtx.h
@@ -32,3 +32,6 @@
  */
 
 #define RTE_PMD_VIRTIO_RX_MAX_BURST 64
+
+int virtqueue_enqueue_recv_refill_simple(struct virtqueue *vq,
+	struct rte_mbuf *m);
diff --git a/drivers/net/virtio/virtio_rxtx_simple.c b/drivers/net/virtio/virtio_rxtx_simple.c
new file mode 100644
index 0000000..cac5b9f
--- /dev/null
+++ b/drivers/net/virtio/virtio_rxtx_simple.c
@@ -0,0 +1,84 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <errno.h>
+
+#include <tmmintrin.h>
+
+#include <rte_cycles.h>
+#include <rte_memory.h>
+#include <rte_memzone.h>
+#include <rte_branch_prediction.h>
+#include <rte_mempool.h>
+#include <rte_malloc.h>
+#include <rte_mbuf.h>
+#include <rte_ether.h>
+#include <rte_ethdev.h>
+#include <rte_prefetch.h>
+#include <rte_string_fns.h>
+#include <rte_errno.h>
+#include <rte_byteorder.h>
+
+#include "virtio_logs.h"
+#include "virtio_ethdev.h"
+#include "virtqueue.h"
+#include "virtio_rxtx.h"
+
+int __attribute__((cold))
+virtqueue_enqueue_recv_refill_simple(struct virtqueue *vq,
+	struct rte_mbuf *cookie)
+{
+	struct vq_desc_extra *dxp;
+	struct vring_desc *start_dp;
+	uint16_t desc_idx;
+
+	desc_idx = vq->vq_avail_idx & (vq->vq_nentries - 1);
+	dxp = &vq->vq_descx[desc_idx];
+	dxp->cookie = (void *)cookie;
+	vq->sw_ring[desc_idx] = cookie;
+
+	start_dp = vq->vq_ring.desc;
+	start_dp[desc_idx].addr = (uint64_t)((uintptr_t)cookie->buf_physaddr +
+		RTE_PKTMBUF_HEADROOM - sizeof(struct virtio_net_hdr));
+	start_dp[desc_idx].len = cookie->buf_len -
+		RTE_PKTMBUF_HEADROOM + sizeof(struct virtio_net_hdr);
+
+	vq->vq_free_cnt--;
+	vq->vq_avail_idx++;
+
+	return 0;
+}
-- 
1.8.1.4


* [dpdk-dev] [PATCH 6/8] virtio: virtio vec rx
  2015-09-29 14:45 [dpdk-dev] [PATCH 0/8] virtio: virtio ring layout optimization and RX vector processing Huawei Xie
                   ` (4 preceding siblings ...)
  2015-09-29 14:45 ` [dpdk-dev] [PATCH 5/8] virtio: fill RX avail ring with blank mbufs Huawei Xie
@ 2015-09-29 14:45 ` Huawei Xie
  2015-09-29 14:45 ` [dpdk-dev] [PATCH 7/8] virtio: simple tx routine Huawei Xie
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 92+ messages in thread
From: Huawei Xie @ 2015-09-29 14:45 UTC (permalink / raw)
  To: dev

With a fixed avail ring, we don't need to get the desc idx from the avail ring.
The virtio driver only has to deal with the desc ring.
This patch uses vector instructions to accelerate desc ring processing.
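
To make the shuffle masks in this patch easier to follow, here is a scalar
model of what _mm_shuffle_epi8 does with shuf_msk1. It is only a sketch: the
sketch_* helpers are hypothetical, the 10-byte header size is an assumption,
and the byte positions follow the field comments in the patch (packet type,
pkt len, dat len, vlan tci). Note that _mm_set_epi8 takes its arguments from
byte 15 down to byte 0, so the table below lists the same mask from byte 0
upward.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_HDR_SIZE 10u   /* assumed sizeof(struct virtio_net_hdr) */

/*
 * Scalar model of _mm_shuffle_epi8: a mask byte with bit 7 set produces 0,
 * otherwise its low 4 bits select a byte of the source.
 */
static void sketch_pshufb(uint8_t dst[16], const uint8_t src[16],
	const uint8_t mask[16])
{
	int i;

	for (i = 0; i < 16; i++)
		dst[i] = (mask[i] & 0x80) ? 0 : src[mask[i] & 0x0F];
}

/* shuf_msk1 from the patch, listed from byte 0 to byte 15. */
static const uint8_t sketch_msk1[16] = {
	0xFF, 0xFF, 0xFF, 0xFF,   /* bytes 0-3: packet type, zeroed */
	4, 5, 0xFF, 0xFF,         /* bytes 4-7: pkt len, low 16 bits of used len */
	4, 5,                     /* bytes 8-9: dat len */
	0xFF, 0xFF,               /* bytes 10-11: vlan tci, zeroed */
	0xFF, 0xFF, 0xFF, 0xFF    /* bytes 12-15: zeroed */
};

/* Read a little-endian 16-bit value. */
static uint16_t sketch_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * used_pair holds two used ring elements (id, len), 8 bytes each.
 * Returns the data length for the first one after the header adjustment,
 * which len_adjust applies to the 16-bit lane at bytes 8-9.
 */
static uint16_t sketch_first_data_len(const uint8_t used_pair[16])
{
	uint8_t out[16];

	assert(used_pair != NULL);
	sketch_pshufb(out, used_pair, sketch_msk1);
	return (uint16_t)(sketch_le16(&out[8]) - SKETCH_HDR_SIZE);
}
```

shuf_msk2 does the same for the second element of the pair by selecting
bytes 12 and 13 instead of 4 and 5.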

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
---
 drivers/net/virtio/virtio_ethdev.h      |   2 +
 drivers/net/virtio/virtio_rxtx.c        |  17 +++
 drivers/net/virtio/virtio_rxtx.h        |   2 +
 drivers/net/virtio/virtio_rxtx_simple.c | 224 ++++++++++++++++++++++++++++++++
 drivers/net/virtio/virtqueue.h          |   1 +
 5 files changed, 246 insertions(+)

diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h
index 9026d42..d7797ab 100644
--- a/drivers/net/virtio/virtio_ethdev.h
+++ b/drivers/net/virtio/virtio_ethdev.h
@@ -108,6 +108,8 @@ uint16_t virtio_recv_mergeable_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 uint16_t virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		uint16_t nb_pkts);
 
+uint16_t virtio_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
+		uint16_t nb_pkts);
 
 /*
  * The VIRTIO_NET_F_GUEST_TSO[46] features permit the host to send us
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index aab6724..b721336 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -430,6 +430,9 @@ virtio_dev_rx_queue_setup(struct rte_eth_dev *dev,
 	vq->mpool = mp;
 
 	dev->data->rx_queues[queue_idx] = vq;
+
+	virtio_rxq_vec_setup(vq);
+
 	return 0;
 }
 
@@ -858,6 +861,20 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 	return nb_tx;
 }
 
+uint16_t __attribute__((weak))
+virtio_recv_pkts_vec(
+	void __rte_unused *rx_queue,
+	struct rte_mbuf __rte_unused **rx_pkts,
+	uint16_t __rte_unused nb_pkts)
+{
+	return 0;
+}
+
+int __attribute__((weak))
+virtio_rxq_vec_setup(struct virtqueue __rte_unused *rxq)
+{
+	return -1;
+}
 
 int __attribute__((weak))
 virtqueue_enqueue_recv_refill_simple(struct virtqueue __rte_unused *vq,
diff --git a/drivers/net/virtio/virtio_rxtx.h b/drivers/net/virtio/virtio_rxtx.h
index 7d2d8fe..19c871c 100644
--- a/drivers/net/virtio/virtio_rxtx.h
+++ b/drivers/net/virtio/virtio_rxtx.h
@@ -33,5 +33,7 @@
 
 #define RTE_PMD_VIRTIO_RX_MAX_BURST 64
 
+int virtio_rxq_vec_setup(struct virtqueue __rte_unused *rxq);
+
 int virtqueue_enqueue_recv_refill_simple(struct virtqueue *vq,
 	struct rte_mbuf *m);
diff --git a/drivers/net/virtio/virtio_rxtx_simple.c b/drivers/net/virtio/virtio_rxtx_simple.c
index cac5b9f..3d57038 100644
--- a/drivers/net/virtio/virtio_rxtx_simple.c
+++ b/drivers/net/virtio/virtio_rxtx_simple.c
@@ -58,6 +58,10 @@
 #include "virtqueue.h"
 #include "virtio_rxtx.h"
 
+#define RTE_VIRTIO_VPMD_RX_BURST 32
+#define RTE_VIRTIO_DESC_PER_LOOP 8
+#define RTE_VIRTIO_VPMD_RX_REARM_THRESH RTE_VIRTIO_VPMD_RX_BURST
+
 int __attribute__((cold))
 virtqueue_enqueue_recv_refill_simple(struct virtqueue *vq,
 	struct rte_mbuf *cookie)
@@ -82,3 +86,223 @@ virtqueue_enqueue_recv_refill_simple(struct virtqueue *vq,
 
 	return 0;
 }
+
+static inline void
+virtio_rxq_rearm_vec(struct virtqueue *rxvq)
+{
+	int i;
+	uint16_t desc_idx;
+	struct rte_mbuf **sw_ring;
+	struct vring_desc *start_dp;
+	int ret;
+
+	desc_idx = rxvq->vq_avail_idx & (rxvq->vq_nentries - 1);
+	sw_ring = &rxvq->sw_ring[desc_idx];
+	start_dp = &rxvq->vq_ring.desc[desc_idx];
+
+	ret = rte_mempool_get_bulk(rxvq->mpool, (void **)sw_ring,
+		RTE_VIRTIO_VPMD_RX_REARM_THRESH);
+	if (unlikely(ret)) {
+		rte_eth_devices[rxvq->port_id].data->rx_mbuf_alloc_failed +=
+			RTE_VIRTIO_VPMD_RX_REARM_THRESH;
+		return;
+	}
+
+	for (i = 0; i < RTE_VIRTIO_VPMD_RX_REARM_THRESH; i++) {
+		uintptr_t p;
+
+		p = (uintptr_t)&sw_ring[i]->rearm_data;
+		*(uint64_t *)p = rxvq->mbuf_initializer;
+
+		start_dp[i].addr =
+			(uint64_t)((uintptr_t)sw_ring[i]->buf_physaddr +
+			RTE_PKTMBUF_HEADROOM - sizeof(struct virtio_net_hdr));
+		start_dp[i].len = sw_ring[i]->buf_len -
+			RTE_PKTMBUF_HEADROOM + sizeof(struct virtio_net_hdr);
+	}
+
+	rxvq->vq_avail_idx += RTE_VIRTIO_VPMD_RX_REARM_THRESH;
+	rxvq->vq_free_cnt -= RTE_VIRTIO_VPMD_RX_REARM_THRESH;
+	vq_update_avail_idx(rxvq);
+}
+
+/*
+ * virtio vPMD receive routine, only accepts (nb_pkts >= RTE_VIRTIO_DESC_PER_LOOP)
+ *
+ * This routine is for non-mergeable RX, one desc for each guest buffer.
+ * This routine is based on the RX ring layout optimization. Each entry in the
+ * avail ring points to the desc with the same index in the desc ring and this
+ * will never be changed in the driver.
+ *
+ * - nb_pkts < RTE_VIRTIO_DESC_PER_LOOP, just return no packet
+ */
+uint16_t
+virtio_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
+	uint16_t nb_pkts)
+{
+	struct virtqueue *rxvq = rx_queue;
+	uint16_t nb_used;
+	uint16_t desc_idx;
+	struct vring_used_elem *rused;
+	struct rte_mbuf **sw_ring;
+	struct rte_mbuf **sw_ring_end;
+	uint16_t nb_pkts_received;
+	__m128i shuf_msk1, shuf_msk2, len_adjust;
+
+	shuf_msk1 = _mm_set_epi8(
+		0xFF, 0xFF, 0xFF, 0xFF,
+		0xFF, 0xFF,		/* vlan tci */
+		5, 4,			/* dat len */
+		0xFF, 0xFF, 5, 4,	/* pkt len */
+		0xFF, 0xFF, 0xFF, 0xFF	/* packet type */
+
+	);
+
+	shuf_msk2 = _mm_set_epi8(
+		0xFF, 0xFF, 0xFF, 0xFF,
+		0xFF, 0xFF,		/* vlan tci */
+		13, 12,			/* dat len */
+		0xFF, 0xFF, 13, 12,	/* pkt len */
+		0xFF, 0xFF, 0xFF, 0xFF	/* packet type */
+	);
+
+	/* Subtract the header length.
+	*  In which case do we need the header length in used->len ? */
+	len_adjust = _mm_set_epi16(
+		0, 0,
+		0,
+		(uint16_t) -sizeof(struct virtio_net_hdr),
+		0, (uint16_t) -sizeof(struct virtio_net_hdr),
+		0, 0);
+
+	if (unlikely(nb_pkts < RTE_VIRTIO_DESC_PER_LOOP))
+		return 0;
+
+	nb_used = *(volatile uint16_t *)&rxvq->vq_ring.used->idx -
+		rxvq->vq_used_cons_idx;
+
+	rte_compiler_barrier();
+
+	if (unlikely(nb_used == 0))
+		return 0;
+
+	nb_pkts = RTE_ALIGN_FLOOR(nb_pkts, RTE_VIRTIO_DESC_PER_LOOP);
+	nb_used = RTE_MIN(nb_used, nb_pkts);
+
+	desc_idx = (uint16_t)(rxvq->vq_used_cons_idx & (rxvq->vq_nentries - 1));
+	rused = &rxvq->vq_ring.used->ring[desc_idx];
+	sw_ring  = &rxvq->sw_ring[desc_idx];
+	sw_ring_end = &rxvq->sw_ring[rxvq->vq_nentries];
+
+	_mm_prefetch((const void *)rused, _MM_HINT_T0);
+
+	if (rxvq->vq_free_cnt >= RTE_VIRTIO_VPMD_RX_REARM_THRESH) {
+		virtio_rxq_rearm_vec(rxvq);
+		if (unlikely(virtqueue_kick_prepare(rxvq)))
+			virtqueue_notify(rxvq);
+	}
+
+	for (nb_pkts_received = 0;
+		nb_pkts_received < nb_used;) {
+		__m128i desc[RTE_VIRTIO_DESC_PER_LOOP / 2];
+		__m128i mbp[RTE_VIRTIO_DESC_PER_LOOP / 2];
+		__m128i pkt_mb[RTE_VIRTIO_DESC_PER_LOOP];
+
+		mbp[0] = _mm_loadu_si128((__m128i *)(sw_ring + 0));
+		desc[0] = _mm_loadu_si128((__m128i *)(rused + 0));
+		_mm_storeu_si128((__m128i *)&rx_pkts[0], mbp[0]);
+
+		mbp[1] = _mm_loadu_si128((__m128i *)(sw_ring + 2));
+		desc[1] = _mm_loadu_si128((__m128i *)(rused + 2));
+		_mm_storeu_si128((__m128i *)&rx_pkts[2], mbp[1]);
+
+		mbp[2] = _mm_loadu_si128((__m128i *)(sw_ring + 4));
+		desc[2] = _mm_loadu_si128((__m128i *)(rused + 4));
+		_mm_storeu_si128((__m128i *)&rx_pkts[4], mbp[2]);
+
+		mbp[3] = _mm_loadu_si128((__m128i *)(sw_ring + 6));
+		desc[3] = _mm_loadu_si128((__m128i *)(rused + 6));
+		_mm_storeu_si128((__m128i *)&rx_pkts[6], mbp[3]);
+
+		pkt_mb[1] = _mm_shuffle_epi8(desc[0], shuf_msk2);
+		pkt_mb[0] = _mm_shuffle_epi8(desc[0], shuf_msk1);
+		pkt_mb[1] = _mm_add_epi16(pkt_mb[1], len_adjust);
+		pkt_mb[0] = _mm_add_epi16(pkt_mb[0], len_adjust);
+		_mm_storeu_si128((void *)&rx_pkts[1]->rx_descriptor_fields1,
+			pkt_mb[1]);
+		_mm_storeu_si128((void *)&rx_pkts[0]->rx_descriptor_fields1,
+			pkt_mb[0]);
+
+		pkt_mb[3] = _mm_shuffle_epi8(desc[1], shuf_msk2);
+		pkt_mb[2] = _mm_shuffle_epi8(desc[1], shuf_msk1);
+		pkt_mb[3] = _mm_add_epi16(pkt_mb[3], len_adjust);
+		pkt_mb[2] = _mm_add_epi16(pkt_mb[2], len_adjust);
+		_mm_storeu_si128((void *)&rx_pkts[3]->rx_descriptor_fields1,
+			pkt_mb[3]);
+		_mm_storeu_si128((void *)&rx_pkts[2]->rx_descriptor_fields1,
+			pkt_mb[2]);
+
+		pkt_mb[5] = _mm_shuffle_epi8(desc[2], shuf_msk2);
+		pkt_mb[4] = _mm_shuffle_epi8(desc[2], shuf_msk1);
+		pkt_mb[5] = _mm_add_epi16(pkt_mb[5], len_adjust);
+		pkt_mb[4] = _mm_add_epi16(pkt_mb[4], len_adjust);
+		_mm_storeu_si128((void *)&rx_pkts[5]->rx_descriptor_fields1,
+			pkt_mb[5]);
+		_mm_storeu_si128((void *)&rx_pkts[4]->rx_descriptor_fields1,
+			pkt_mb[4]);
+
+		pkt_mb[7] = _mm_shuffle_epi8(desc[3], shuf_msk2);
+		pkt_mb[6] = _mm_shuffle_epi8(desc[3], shuf_msk1);
+		pkt_mb[7] = _mm_add_epi16(pkt_mb[7], len_adjust);
+		pkt_mb[6] = _mm_add_epi16(pkt_mb[6], len_adjust);
+		_mm_storeu_si128((void *)&rx_pkts[7]->rx_descriptor_fields1,
+			pkt_mb[7]);
+		_mm_storeu_si128((void *)&rx_pkts[6]->rx_descriptor_fields1,
+			pkt_mb[6]);
+
+		if (unlikely(nb_used <= RTE_VIRTIO_DESC_PER_LOOP)) {
+			if (sw_ring + nb_used <= sw_ring_end)
+				nb_pkts_received += nb_used;
+			else
+				nb_pkts_received += sw_ring_end - sw_ring;
+			break;
+		} else {
+			if (unlikely(sw_ring + RTE_VIRTIO_DESC_PER_LOOP >=
+				sw_ring_end)) {
+				nb_pkts_received += sw_ring_end - sw_ring;
+				break;
+			} else {
+				nb_pkts_received += RTE_VIRTIO_DESC_PER_LOOP;
+
+				rx_pkts += RTE_VIRTIO_DESC_PER_LOOP;
+				sw_ring += RTE_VIRTIO_DESC_PER_LOOP;
+				rused   += RTE_VIRTIO_DESC_PER_LOOP;
+				nb_used -= RTE_VIRTIO_DESC_PER_LOOP;
+			}
+		}
+	}
+
+	rxvq->vq_used_cons_idx += nb_pkts_received;
+	rxvq->vq_free_cnt += nb_pkts_received;
+	rxvq->packets += nb_pkts_received;
+	return nb_pkts_received;
+}
+
+int __attribute__((cold))
+virtio_rxq_vec_setup(struct virtqueue *rxq)
+{
+	uintptr_t p;
+	struct rte_mbuf mb_def = { .buf_addr = 0 }; /* zeroed mbuf */
+
+	mb_def.nb_segs = 1;
+	mb_def.data_off = RTE_PKTMBUF_HEADROOM;
+	mb_def.port = rxq->port_id;
+	rte_mbuf_refcnt_set(&mb_def, 1);
+
+	/* prevent compiler reordering: rearm_data covers previous fields */
+	rte_compiler_barrier();
+	p = (uintptr_t)&mb_def.rearm_data;
+	rxq->mbuf_initializer = *(uint64_t *)p;
+
+	return 0;
+}
diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h
index dd63285..363fb99 100644
--- a/drivers/net/virtio/virtqueue.h
+++ b/drivers/net/virtio/virtqueue.h
@@ -188,6 +188,7 @@ struct virtqueue {
 	 */
 	uint16_t vq_used_cons_idx;
 	uint16_t vq_avail_idx;
+	uint64_t mbuf_initializer; /**< value to init mbufs. */
 	phys_addr_t virtio_net_hdr_mem; /**< hdr for each xmit packet */
 
 	struct rte_mbuf **sw_ring; /**< RX software ring. */
-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [dpdk-dev] [PATCH 7/8] virtio: simple tx routine
  2015-09-29 14:45 [dpdk-dev] [PATCH 0/8] virtio: virtio ring layout optimization and RX vector processing Huawei Xie
                   ` (5 preceding siblings ...)
  2015-09-29 14:45 ` [dpdk-dev] [PATCH 6/8] virtio: virtio vec rx Huawei Xie
@ 2015-09-29 14:45 ` Huawei Xie
  2015-09-29 14:45 ` [dpdk-dev] [PATCH 8/8] virtio: rxtx_func_get Huawei Xie
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 92+ messages in thread
From: Huawei Xie @ 2015-09-29 14:45 UTC (permalink / raw)
  To: dev

Bulk free mbufs when cleaning the used ring.
The shift operation on the index could be avoided if vq_free_cnt counted
free slots rather than free descriptors.
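
To illustrate the bulk free described above, here is a minimal, self-contained
sketch. It is not the driver code: sketch_pool, sketch_buf and
sketch_pool_put_bulk are hypothetical stand-ins for the mempool, the mbuf and
the bulk-put call, and the pool here only counts calls. It shows the idea of
collecting consecutive buffers that share a pool and returning each run with a
single call.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_BATCH 32 /* hypothetical batch size for this sketch */

/* Hypothetical pool that only counts how it is used. */
struct sketch_pool {
	unsigned int nb_bulk_calls;
	unsigned int nb_returned;
};

/* Hypothetical buffer that remembers its owning pool. */
struct sketch_buf {
	struct sketch_pool *pool;
};

/* Stand-in for a bulk "put" operation on a pool. */
static void
sketch_pool_put_bulk(struct sketch_pool *p, struct sketch_buf **bufs,
	unsigned int n)
{
	(void)bufs;
	p->nb_bulk_calls++;
	p->nb_returned += n;
}

/* Free up to SKETCH_BATCH buffers, one bulk call per run of same-pool bufs. */
static void
sketch_free_batch(struct sketch_buf **bufs, unsigned int n)
{
	struct sketch_buf *run[SKETCH_BATCH];
	unsigned int nb_run = 0;
	unsigned int i;

	assert(n <= SKETCH_BATCH);

	for (i = 0; i < n; i++) {
		/* Pool changed: flush the current run before starting a new one. */
		if (nb_run != 0 && bufs[i]->pool != run[0]->pool) {
			sketch_pool_put_bulk(run[0]->pool, run, nb_run);
			nb_run = 0;
		}
		run[nb_run++] = bufs[i];
	}

	if (nb_run != 0)
		sketch_pool_put_bulk(run[0]->pool, run, nb_run);
}
```

When all buffers in a batch come from the same pool, which is the expected
common case, the whole batch is returned with one call.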

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
---
 drivers/net/virtio/virtio_ethdev.h      |  3 ++
 drivers/net/virtio/virtio_rxtx.c        |  9 ++++
 drivers/net/virtio/virtio_rxtx_simple.c | 95 +++++++++++++++++++++++++++++++++
 3 files changed, 107 insertions(+)

diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h
index d7797ab..ae2d47d 100644
--- a/drivers/net/virtio/virtio_ethdev.h
+++ b/drivers/net/virtio/virtio_ethdev.h
@@ -111,6 +111,9 @@ uint16_t virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 uint16_t virtio_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
 		uint16_t nb_pkts);
 
+uint16_t virtio_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
+
 /*
  * The VIRTIO_NET_F_GUEST_TSO[46] features permit the host to send us
  * frames larger than 1514 bytes. We do not yet support software LRO
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index b721336..328bb7d 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -870,6 +870,15 @@ virtio_recv_pkts_vec(
 	return 0;
 }
 
+uint16_t __attribute__((weak))
+virtio_xmit_pkts_simple(
+	void __rte_unused *tx_queue,
+	struct rte_mbuf __rte_unused **tx_pkts,
+	uint16_t __rte_unused nb_pkts)
+{
+	return 0;
+}
+
 int __attribute__((weak))
 virtio_rxq_vec_setup(struct virtqueue __rte_unused *rxq)
 {
diff --git a/drivers/net/virtio/virtio_rxtx_simple.c b/drivers/net/virtio/virtio_rxtx_simple.c
index 3d57038..d1eed79 100644
--- a/drivers/net/virtio/virtio_rxtx_simple.c
+++ b/drivers/net/virtio/virtio_rxtx_simple.c
@@ -288,6 +288,101 @@ virtio_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
 	return nb_pkts_received;
 }
 
+#define VIRTIO_TX_FREE_THRESH 32
+#define VIRTIO_TX_MAX_FREE_BUF_SZ 32
+#define VIRTIO_TX_FREE_NR 32
+/* TODO: vq->tx_free_cnt could mean num of free slots so we could avoid shift */
+static inline void __attribute__((always_inline))
+virtio_xmit_cleanup(struct virtqueue *vq)
+{
+	uint16_t i, desc_idx;
+	int nb_free = 0;
+	struct rte_mbuf *m, *free[VIRTIO_TX_MAX_FREE_BUF_SZ];
+
+	desc_idx = (uint16_t)(vq->vq_used_cons_idx &
+		((vq->vq_nentries >> 1) - 1));
+	free[0] = (struct rte_mbuf *)vq->vq_descx[desc_idx++].cookie;
+	nb_free = 1;
+
+	for (i = 1; i < VIRTIO_TX_FREE_NR; i++) {
+		m = (struct rte_mbuf *)vq->vq_descx[desc_idx++].cookie;
+		if (likely(m->pool == free[0]->pool))
+			free[nb_free++] = m;
+		else {
+			rte_mempool_put_bulk(free[0]->pool, (void **)free,
+				nb_free);
+			free[0] = m;
+			nb_free = 1;
+		}
+	}
+
+	rte_mempool_put_bulk(free[0]->pool, (void **)free, nb_free);
+	vq->vq_used_cons_idx += VIRTIO_TX_FREE_NR;
+	vq->vq_free_cnt += (VIRTIO_TX_FREE_NR << 1);
+
+	return;
+}
+
+uint16_t
+virtio_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
+	uint16_t nb_pkts)
+{
+	struct virtqueue *txvq = tx_queue;
+	uint16_t nb_used;
+	uint16_t desc_idx;
+	struct vring_desc *start_dp;
+	uint16_t nb_tail, nb_commit;
+	int i;
+	uint16_t desc_idx_max = (txvq->vq_nentries >> 1) - 1;
+
+	nb_used = VIRTQUEUE_NUSED(txvq);
+	rte_compiler_barrier();
+
+	nb_commit = nb_pkts = RTE_MIN((txvq->vq_free_cnt >> 1), nb_pkts);
+	desc_idx = (uint16_t) (txvq->vq_avail_idx & desc_idx_max);
+	start_dp = txvq->vq_ring.desc;
+	nb_tail = (uint16_t) (desc_idx_max + 1 - desc_idx);
+
+	if (nb_used >= VIRTIO_TX_FREE_THRESH)
+		virtio_xmit_cleanup(tx_queue);
+
+	if (nb_commit >= nb_tail) {
+		for (i = 0; i < nb_tail; i++)
+			txvq->vq_descx[desc_idx + i].cookie = tx_pkts[i];
+		for (i = 0; i < nb_tail; i++) {
+			start_dp[desc_idx].addr =
+				RTE_MBUF_DATA_DMA_ADDR(*tx_pkts);
+			start_dp[desc_idx].len = (*tx_pkts)->pkt_len;
+			tx_pkts++;
+			desc_idx++;
+		}
+		nb_commit -= nb_tail;
+		desc_idx = 0;
+	}
+	for (i = 0; i < nb_commit; i++)
+		txvq->vq_descx[desc_idx + i].cookie = tx_pkts[i];
+	for (i = 0; i < nb_commit; i++) {
+		start_dp[desc_idx].addr = RTE_MBUF_DATA_DMA_ADDR(*tx_pkts);
+		start_dp[desc_idx].len = (*tx_pkts)->pkt_len;
+		tx_pkts++;
+		desc_idx++;
+	}
+
+	rte_compiler_barrier();
+
+	txvq->vq_free_cnt -= (uint16_t)(nb_pkts << 1);
+	txvq->vq_avail_idx += nb_pkts;
+	txvq->vq_ring.avail->idx = txvq->vq_avail_idx;
+	txvq->packets += nb_pkts;
+
+	if (likely(nb_pkts)) {
+		if (unlikely(virtqueue_kick_prepare(txvq)))
+			virtqueue_notify(txvq);
+	}
+
+	return nb_pkts;
+}
+
 int __attribute__((cold))
 virtio_rxq_vec_setup(struct virtqueue *rxq)
 {
-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [dpdk-dev] [PATCH 8/8] virtio: rxtx_func_get
  2015-09-29 14:45 [dpdk-dev] [PATCH 0/8] virtio: virtio ring layout optimization and RX vector processing Huawei Xie
                   ` (6 preceding siblings ...)
  2015-09-29 14:45 ` [dpdk-dev] [PATCH 7/8] virtio: simple tx routine Huawei Xie
@ 2015-09-29 14:45 ` Huawei Xie
  2015-09-29 15:41 ` [dpdk-dev] [PATCH 0/8] virtio: virtio ring layout optimization and RX vector processing Xie, Huawei
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 92+ messages in thread
From: Huawei Xie @ 2015-09-29 14:45 UTC (permalink / raw)
  To: dev

Select the simplified rx/tx functions when mergeable RX buffers aren't
enabled and no offload flags are specified.
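
The selection rule stated above could be sketched as follows. This is only an
illustration with hypothetical names (the sketch_* enums and function); the
offloads_requested parameter is an assumption taken from the commit message,
since the hunk in this patch tests only the mergeable feature bit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the RX/TX burst functions in this series. */
enum sketch_rx_path { SKETCH_RX_MERGEABLE, SKETCH_RX_GENERIC, SKETCH_RX_VEC };
enum sketch_tx_path { SKETCH_TX_GENERIC, SKETCH_TX_SIMPLE };

struct sketch_paths {
	enum sketch_rx_path rx;
	enum sketch_tx_path tx;
};

/* Pick the generic paths first, then upgrade to the simple ones if allowed. */
static struct sketch_paths
sketch_rxtx_func_get(bool mrg_rxbuf, bool offloads_requested)
{
	struct sketch_paths p;

	p.rx = mrg_rxbuf ? SKETCH_RX_MERGEABLE : SKETCH_RX_GENERIC;
	p.tx = SKETCH_TX_GENERIC;

	/* Simple paths need non-mergeable buffers and no offloads. */
	if (!mrg_rxbuf && !offloads_requested) {
		p.rx = SKETCH_RX_VEC;
		p.tx = SKETCH_TX_SIMPLE;
	}

	return p;
}
```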

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
---
 drivers/net/virtio/virtio_ethdev.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index 3b7b841..a08529c 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -369,6 +369,8 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
 	vq->virtio_net_hdr_mz  = NULL;
 	vq->virtio_net_hdr_mem = 0;
 
+	vq->use_simple_rxtx = (dev->rx_pkt_burst == virtio_recv_pkts_vec);
+
 	if (queue_type == VTNET_TQ) {
 		/*
 		 * For each xmit packet, allocate a virtio_net_hdr
@@ -1156,13 +1158,21 @@ virtio_interrupt_handler(__rte_unused struct rte_intr_handle *handle,
 }
 
 static void
-rx_func_get(struct rte_eth_dev *eth_dev)
+rxtx_func_get(struct rte_eth_dev *eth_dev)
 {
 	struct virtio_hw *hw = eth_dev->data->dev_private;
+
 	if (vtpci_with_feature(hw, VIRTIO_NET_F_MRG_RXBUF))
 		eth_dev->rx_pkt_burst = &virtio_recv_mergeable_pkts;
 	else
 		eth_dev->rx_pkt_burst = &virtio_recv_pkts;
+
+#ifdef RTE_LIBRTE_VIRTIO_SIMPLE
+	if (!vtpci_with_feature(hw, VIRTIO_NET_F_MRG_RXBUF)) {
+		eth_dev->rx_pkt_burst = &virtio_recv_pkts_vec;
+		eth_dev->tx_pkt_burst = &virtio_xmit_pkts_simple;
+	}
+#endif
 }
 
 /*
@@ -1184,7 +1194,7 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
 	eth_dev->tx_pkt_burst = &virtio_xmit_pkts;
 
 	if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
-		rx_func_get(eth_dev);
+		rxtx_func_get(eth_dev);
 		return 0;
 	}
 
@@ -1214,7 +1224,7 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
 	vtpci_set_status(hw, VIRTIO_CONFIG_STATUS_DRIVER);
 	virtio_negotiate_features(hw);
 
-	rx_func_get(eth_dev);
+	rxtx_func_get(eth_dev);
 
 	/* Setting up rx_header size for the device */
 	if (vtpci_with_feature(hw, VIRTIO_NET_F_MRG_RXBUF))
-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [dpdk-dev] [PATCH 0/8] virtio: virtio ring layout optimization and RX vector processing
  2015-09-29 14:45 [dpdk-dev] [PATCH 0/8] virtio: virtio ring layout optimization and RX vector processing Huawei Xie
                   ` (7 preceding siblings ...)
  2015-09-29 14:45 ` [dpdk-dev] [PATCH 8/8] virtio: rxtx_func_get Huawei Xie
@ 2015-09-29 15:41 ` Xie, Huawei
  2015-10-18  6:28 ` [dpdk-dev] [PATCH v2 0/7] virtio ring layout optimization and simple rx/tx processing Huawei Xie
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 92+ messages in thread
From: Xie, Huawei @ 2015-09-29 15:41 UTC (permalink / raw)
  To: dev, Thomas Monjalon

Thomas:
Let us review first, then discuss the macro after that.
My preference is to use the configure macro and then fix this before the
next release. We could give some development features or aggressive changes
a time buffer.


On 9/29/2015 10:46 PM, Huawei Xie wrote:
> Copied some message from patch 4.
> In a DPDK-based switching environment, vhost mostly runs on a dedicated core
> while virtio processing in guest VMs runs on different cores.
> Take RX for example. With the generic implementation, for each guest buffer:
> a) the virtio driver allocates a descriptor from the free descriptor list
> b) it modifies the avail ring entry to point to the allocated descriptor
> c) after the packet is received, it frees the descriptor
>
> When vhost fetches the avail ring, it has to fetch the modified cache line
> from the virtio core's L1 cache, which is costly on current CPUs.
>
> The idea of this optimization is to allocate a fixed descriptor for each
> entry of the avail ring, so that the avail ring stays the same during the run.
> This removes the L1 cache transfer from the virtio core to the vhost core for
> the avail ring. Besides, no descriptor allocation or free is needed.
>
> Most importantly, this makes vector processing possible, which further
> accelerates the processing.
>
> This is the layout for the avail ring (taking 256 ring entries as an example),
> with each entry pointing to the descriptor with the same index.
>                     avail
>                     idx
>                     +
>                     |
> +----+----+---+-------------+------+
> | 0  | 1  | 2 | ... |  254  | 255  |  avail ring
> +-+--+-+--+-+-+---------+---+--+---+
>   |    |    |       |   |      |
>   |    |    |       |   |      |
>   v    v    v       |   v      v
> +-+--+-+--+-+-+---------+---+--+---+
> | 0  | 1  | 2 | ... |  254  | 255  |  desc ring
> +----+----+---+-------------+------+
>                     |
>                     |
> +----+----+---+-------------+------+
> | 0  | 1  | 2 |     |  254  | 255  |  used ring
> +----+----+---+-------------+------+
>                     |
>                     +
>
> This is the ring layout for TX.
> As one virtio header is needed for each xmit packet, we have 128 slots available.
>
>                          ++
>                          ||
>                          ||
> +-----+-----+-----+--------------+------+------+------+
> |  0  |  1  | ... |  127 || 128  | 129  | ...  | 255  |   avail ring
> +--+--+--+--+-----+---+------+---+--+---+------+--+---+
>    |     |            |  ||  |      |             |
>    v     v            v  ||  v      v             v
> +--+--+--+--+-----+---+------+---+--+---+------+--+---+
> | 127 | 128 | ... |  255 || 127  | 128  | ...  | 255  |   desc ring for virtio_net_hdr
> +--+--+--+--+-----+---+------+---+--+---+------+--+---+
>    |     |            |  ||  |      |             |
>    v     v            v  ||  v      v             v
> +--+--+--+--+-----+---+------+---+--+---+------+--+---+
> |  0  |  1  | ... |  127 ||  0   |  1   | ...  | 127  |   desc ring for tx dat
> +-----+-----+-----+--------------+------+------+------+
>                          ||
>                          ||
>                          ++
>
> A performance boost can be observed if the virtio backend isn't the
> bottleneck, or in the VM2VM case.
> There are also several vhost optimization patches to be submitted later.
>
> Huawei Xie (8):
>   virtio: add configure for simple virtio rx/tx
>   virtio: add virtio_rxtx.h header file
>   virtio: add software rx ring, fake_buf, simple_rxtx into virtqueue
>   virtio: rx/tx ring layout optimization
>   virtio: fill RX avail ring with blank mbufs
>   virtio: virtio vec rx
>   virtio: simple tx routine
>   virtio: rxtx_func_get
>
>  config/common_linuxapp                  |   1 +
>  drivers/net/virtio/Makefile             |   2 +-
>  drivers/net/virtio/virtio_ethdev.c      |  29 ++-
>  drivers/net/virtio/virtio_ethdev.h      |   5 +
>  drivers/net/virtio/virtio_rxtx.c        |  70 +++++-
>  drivers/net/virtio/virtio_rxtx.h        |  39 ++++
>  drivers/net/virtio/virtio_rxtx_simple.c | 403 ++++++++++++++++++++++++++++++++
>  drivers/net/virtio/virtqueue.h          |   7 +
>  8 files changed, 550 insertions(+), 6 deletions(-)
>  create mode 100644 drivers/net/virtio/virtio_rxtx.h
>  create mode 100644 drivers/net/virtio/virtio_rxtx_simple.c
>


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [dpdk-dev] [PATCH 3/8] virtio: add software rx ring, fake_buf, simple_rxtx into virtqueue
  2015-09-29 14:45 ` [dpdk-dev] [PATCH 3/8] virtio: add software rx ring, fake_buf, simple_rxtx into virtqueue Huawei Xie
@ 2015-09-29 16:15   ` Stephen Hemminger
  0 siblings, 0 replies; 92+ messages in thread
From: Stephen Hemminger @ 2015-09-29 16:15 UTC (permalink / raw)
  To: Huawei Xie; +Cc: dev

On Tue, 29 Sep 2015 22:45:48 +0800
Huawei Xie <huawei.xie@intel.com> wrote:

> +		if (vq->sw_ring)
> +			rte_free(vq->sw_ring);

	rte_free of NULL is a nop so conditional here is unnecessary

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [dpdk-dev] [PATCH v2 0/7] virtio ring layout optimization and simple rx/tx processing
  2015-09-29 14:45 [dpdk-dev] [PATCH 0/8] virtio: virtio ring layout optimization and RX vector processing Huawei Xie
                   ` (8 preceding siblings ...)
  2015-09-29 15:41 ` [dpdk-dev] [PATCH 0/8] virtio: virtio ring layout optimization and RX vector processing Xie, Huawei
@ 2015-10-18  6:28 ` Huawei Xie
  2015-10-18  6:28 ` Huawei Xie
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 92+ messages in thread
From: Huawei Xie @ 2015-10-18  6:28 UTC (permalink / raw)
  To: dev

In a DPDK-based switching environment, vhost mostly runs on a dedicated core
while virtio processing in guest VMs runs on different cores.
Take RX for example. With the generic implementation, for each guest buffer:
a) the virtio driver allocates a descriptor from the free descriptor list
b) it modifies the avail ring entry to point to the allocated descriptor
c) after the packet is received, it frees the descriptor

When vhost fetches the avail ring, it has to fetch the modified cache line
from the virtio core's L1 cache, which is costly on current CPUs.

The idea of this optimization is to allocate a fixed descriptor for each
entry of the avail ring, so that the avail ring stays the same during the run.
This removes the L1 cache transfer from the virtio core to the vhost core for
the avail ring. Besides, no descriptor allocation or free is needed.

Most importantly, this makes vector processing possible, which further
accelerates the processing.

This is the layout for the avail ring (taking 256 ring entries as an example),
with each entry pointing to the descriptor with the same index.
                    avail
                    idx
                    +
                    |
+----+----+---+-------------+------+
| 0  | 1  | 2 | ... |  254  | 255  |  avail ring
+-+--+-+--+-+-+---------+---+--+---+
  |    |    |       |   |      |
  |    |    |       |   |      |
  v    v    v       |   v      v
+-+--+-+--+-+-+---------+---+--+---+
| 0  | 1  | 2 | ... |  254  | 255  |  desc ring
+----+----+---+-------------+------+
                    |
                    |
+----+----+---+-------------+------+
| 0  | 1  | 2 |     |  254  | 255  |  used ring
+----+----+---+-------------+------+
                    |
                    +
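
The fixed RX layout in the diagram above can be sketched in a few lines of C.
This is a simplified illustration, not the driver code: struct sketch_avail
and the sketch_* functions are hypothetical, and the ring size of 256 is the
example size used in the text. The point it shows is that the avail ring
entries are written once, and a refill afterwards only advances the index.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_RING_SIZE 256 /* example size from the text; power of two */

/* Hypothetical, simplified avail ring; not the real struct vring_avail. */
struct sketch_avail {
	uint16_t idx;                    /* free-running producer index */
	uint16_t ring[SKETCH_RING_SIZE]; /* slot -> descriptor index */
};

/* One-time setup: slot i always refers to descriptor i. */
static void
sketch_avail_init(struct sketch_avail *avail)
{
	uint16_t i;

	assert((SKETCH_RING_SIZE & (SKETCH_RING_SIZE - 1)) == 0);

	avail->idx = 0;
	for (i = 0; i < SKETCH_RING_SIZE; i++)
		avail->ring[i] = i;
}

/*
 * Refill: ring[] is never rewritten, only idx moves forward.
 * Returns the first descriptor index whose buffer must be re-armed.
 */
static uint16_t
sketch_avail_refill(struct sketch_avail *avail, uint16_t nb_bufs)
{
	uint16_t first_desc = avail->idx & (SKETCH_RING_SIZE - 1);

	avail->idx = (uint16_t)(avail->idx + nb_bufs);
	return first_desc;
}
```

Because ring[] is read-only after setup, the only avail ring data the other
core has to re-read after a refill is the index.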

This is the ring layout for TX.
As we need one virtio header for each xmit packet, we have 128 slots available.

                         ++
                         ||
                         ||
+-----+-----+-----+--------------+------+------+------+
|  0  |  1  | ... |  127 || 128  | 129  | ...  | 255  |   avail ring
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
   |     |            |  ||  |      |             |
   v     v            v  ||  v      v             v
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
| 127 | 128 | ... |  255 || 127  | 128  | ...  | 255  |   desc ring for virtio_net_hdr
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
   |     |            |  ||  |      |             |
   v     v            v  ||  v      v             v
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
|  0  |  1  | ... |  127 ||  0   |  1   | ...  | 127  |   desc ring for tx dat
+-----+-----+-----+--------------+------+------+------+
                         ||
                         ||
                         ++
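
The TX layout in the diagram above can be sketched in the same way. The code
below is an illustration under stated assumptions, not the code of this
series: the sketch_* names are hypothetical, SKETCH_DESC_F_NEXT mirrors the
"chained descriptor" flag of the virtio ring, and it is assumed that the
header descriptors occupy the upper half of the descriptor ring (indices N/2
to N-1) and the data descriptors the lower half.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_DESC_F_NEXT 1 /* "another descriptor follows" flag */

/* Hypothetical, simplified descriptor: only the chaining fields. */
struct sketch_desc {
	uint16_t flags;
	uint16_t next;
};

/*
 * Build the fixed TX layout for a ring of n entries (n a power of two):
 * each slot points to a header descriptor in the upper half, which is
 * chained to the data descriptor with the same offset in the lower half.
 */
static void
sketch_tx_layout_init(struct sketch_desc *desc, uint16_t *avail_ring,
	uint16_t n)
{
	uint16_t mid = n >> 1;
	uint16_t i;

	assert(n >= 2 && (n & (n - 1)) == 0);

	for (i = 0; i < mid; i++) {
		/* Both halves of the avail ring refer to header descriptors. */
		avail_ring[i] = (uint16_t)(mid + i);
		avail_ring[mid + i] = (uint16_t)(mid + i);

		/* Header descriptor chains to its data descriptor. */
		desc[mid + i].flags = SKETCH_DESC_F_NEXT;
		desc[mid + i].next = i;

		/* Data descriptor ends the chain. */
		desc[i].flags = 0;
		desc[i].next = 0;
	}
}
```

With 256 entries this gives the 128 usable TX slots mentioned above, since
every packet consumes one header descriptor and one data descriptor.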

A performance boost can be observed only if the virtio backend isn't the
bottleneck, or in the VM2VM case.
There are also several vhost optimization patches to be submitted later.

Changes in v2:
- Remove the configure macro
- Enable simple RX/TX processing when the user specifies simple txq flags
- Reword some comments and commit messages

Huawei Xie (7):
  virtio: add virtio_rxtx.h header file
  virtio: add software rx ring, fake_buf into virtqueue
  virtio: rx/tx ring layout optimization
  virtio: fill RX avail ring with blank mbufs
  virtio: virtio vec rx
  virtio: simple tx routine
  virtio: pick simple rx/tx func

 drivers/net/virtio/Makefile             |   2 +-
 drivers/net/virtio/virtio_ethdev.c      |  13 ++
 drivers/net/virtio/virtio_ethdev.h      |   5 +
 drivers/net/virtio/virtio_rxtx.c        |  53 ++++-
 drivers/net/virtio/virtio_rxtx.h        |  39 ++++
 drivers/net/virtio/virtio_rxtx_simple.c | 403 ++++++++++++++++++++++++++++++++
 drivers/net/virtio/virtqueue.h          |   5 +
 7 files changed, 517 insertions(+), 3 deletions(-)
 create mode 100644 drivers/net/virtio/virtio_rxtx.h
 create mode 100644 drivers/net/virtio/virtio_rxtx_simple.c

-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [dpdk-dev] [PATCH v2 0/7] virtio ring layout optimization and simple rx/tx processing
  2015-09-29 14:45 [dpdk-dev] [PATCH 0/8] virtio: virtio ring layout optimization and RX vector processing Huawei Xie
                   ` (9 preceding siblings ...)
  2015-10-18  6:28 ` [dpdk-dev] [PATCH v2 0/7] virtio ring layout optimization and simple rx/tx processing Huawei Xie
@ 2015-10-18  6:28 ` Huawei Xie
  2015-10-18  6:28   ` [dpdk-dev] [PATCH v2 1/7] virtio: add virtio_rxtx.h header file Huawei Xie
                     ` (6 more replies)
  2015-10-20 15:30 ` [dpdk-dev] [PATCH v3 0/7] virtio ring layout optimization and simple rx/tx processing Huawei Xie
                   ` (3 subsequent siblings)
  14 siblings, 7 replies; 92+ messages in thread
From: Huawei Xie @ 2015-10-18  6:28 UTC (permalink / raw)
  To: dev

In a DPDK-based switching environment, vhost mostly runs on a dedicated core
while virtio processing in guest VMs runs on different cores.
Take RX for example. With the generic implementation, for each guest buffer:
a) the virtio driver allocates a descriptor from the free descriptor list
b) it modifies the avail ring entry to point to the allocated descriptor
c) after the packet is received, it frees the descriptor

When vhost fetches the avail ring, it has to fetch the modified cache line
from the virtio core's L1 cache, which is costly on current CPUs.

The idea of this optimization is to allocate a fixed descriptor for each
entry of the avail ring, so that the avail ring stays the same during the run.
This removes the L1 cache transfer from the virtio core to the vhost core for
the avail ring. Besides, no descriptor allocation or free is needed.

Most importantly, this makes vector processing possible, which further
accelerates the processing.

This is the layout for the avail ring (taking 256 ring entries as an example),
with each entry pointing to the descriptor with the same index.
                    avail
                    idx
                    +
                    |
+----+----+---+-------------+------+
| 0  | 1  | 2 | ... |  254  | 255  |  avail ring
+-+--+-+--+-+-+---------+---+--+---+
  |    |    |       |   |      |
  |    |    |       |   |      |
  v    v    v       |   v      v
+-+--+-+--+-+-+---------+---+--+---+
| 0  | 1  | 2 | ... |  254  | 255  |  desc ring
+----+----+---+-------------+------+
                    |
                    |
+----+----+---+-------------+------+
| 0  | 1  | 2 |     |  254  | 255  |  used ring
+----+----+---+-------------+------+
                    |
                    +

This is the ring layout for TX.
As we need one virtio header for each xmit packet, we have 128 slots available.

                         ++
                         ||
                         ||
+-----+-----+-----+--------------+------+------+------+
|  0  |  1  | ... |  127 || 128  | 129  | ...  | 255  |   avail ring
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
   |     |            |  ||  |      |             |
   v     v            v  ||  v      v             v
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
| 127 | 128 | ... |  255 || 127  | 128  | ...  | 255  |   desc ring for virtio_net_hdr
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
   |     |            |  ||  |      |             |
   v     v            v  ||  v      v             v
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
|  0  |  1  | ... |  127 ||  0   |  1   | ...  | 127  |   desc ring for tx dat
+-----+-----+-----+--------------+------+------+------+
                         ||
                         ||
                         ++

A performance boost can be observed only if the virtio backend isn't the
bottleneck, or in the VM2VM case.
There are also several vhost optimization patches to be submitted later.

Changes in v2:
- Remove the configure macro
- Enable simple RX/TX processing when the user specifies simple txq flags
- Reword some comments and commit messages

Huawei Xie (7):
  virtio: add virtio_rxtx.h header file
  virtio: add software rx ring, fake_buf into virtqueue
  virtio: rx/tx ring layout optimization
  virtio: fill RX avail ring with blank mbufs
  virtio: virtio vec rx
  virtio: simple tx routine
  virtio: pick simple rx/tx func

 drivers/net/virtio/Makefile             |   2 +-
 drivers/net/virtio/virtio_ethdev.c      |  13 ++
 drivers/net/virtio/virtio_ethdev.h      |   5 +
 drivers/net/virtio/virtio_rxtx.c        |  53 ++++-
 drivers/net/virtio/virtio_rxtx.h        |  39 ++++
 drivers/net/virtio/virtio_rxtx_simple.c | 403 ++++++++++++++++++++++++++++++++
 drivers/net/virtio/virtqueue.h          |   5 +
 7 files changed, 517 insertions(+), 3 deletions(-)
 create mode 100644 drivers/net/virtio/virtio_rxtx.h
 create mode 100644 drivers/net/virtio/virtio_rxtx_simple.c

-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [dpdk-dev] [PATCH v2 1/7] virtio: add virtio_rxtx.h header file
  2015-10-18  6:28 ` Huawei Xie
@ 2015-10-18  6:28   ` Huawei Xie
  2015-10-18  6:28   ` [dpdk-dev] [PATCH v2 2/7] virtio: add software rx ring, fake_buf into virtqueue Huawei Xie
                     ` (5 subsequent siblings)
  6 siblings, 0 replies; 92+ messages in thread
From: Huawei Xie @ 2015-10-18  6:28 UTC (permalink / raw)
  To: dev

All rx/tx related code will be moved into this header file in the future.
Add RTE_PMD_VIRTIO_RX_MAX_BURST.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
---
 drivers/net/virtio/virtio_ethdev.c |  1 +
 drivers/net/virtio/virtio_rxtx.c   |  1 +
 drivers/net/virtio/virtio_rxtx.h   | 34 ++++++++++++++++++++++++++++++++++
 3 files changed, 36 insertions(+)
 create mode 100644 drivers/net/virtio/virtio_rxtx.h

diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index 465d3cd..79a3640 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -61,6 +61,7 @@
 #include "virtio_pci.h"
 #include "virtio_logs.h"
 #include "virtqueue.h"
+#include "virtio_rxtx.h"
 
 
 static int eth_virtio_dev_init(struct rte_eth_dev *eth_dev);
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index c5b53bb..9324f7f 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -54,6 +54,7 @@
 #include "virtio_logs.h"
 #include "virtio_ethdev.h"
 #include "virtqueue.h"
+#include "virtio_rxtx.h"
 
 #ifdef RTE_LIBRTE_VIRTIO_DEBUG_DUMP
 #define VIRTIO_DUMP_PACKET(m, len) rte_pktmbuf_dump(stdout, m, len)
diff --git a/drivers/net/virtio/virtio_rxtx.h b/drivers/net/virtio/virtio_rxtx.h
new file mode 100644
index 0000000..a10aa69
--- /dev/null
+++ b/drivers/net/virtio/virtio_rxtx.h
@@ -0,0 +1,34 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#define RTE_PMD_VIRTIO_RX_MAX_BURST 64
-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [dpdk-dev] [PATCH v2 2/7] virtio: add software rx ring, fake_buf into virtqueue
  2015-10-18  6:28 ` Huawei Xie
  2015-10-18  6:28   ` [dpdk-dev] [PATCH v2 1/7] virtio: add virtio_rxtx.h header file Huawei Xie
@ 2015-10-18  6:28   ` Huawei Xie
  2015-10-19  4:20     ` Stephen Hemminger
  2015-10-18  6:29   ` [dpdk-dev] [PATCH v2 3/7] virtio: rx/tx ring layout optimization Huawei Xie
                     ` (4 subsequent siblings)
  6 siblings, 1 reply; 92+ messages in thread
From: Huawei Xie @ 2015-10-18  6:28 UTC (permalink / raw)
  To: dev

Add software RX ring in virtqueue.
Add fake_mbuf in virtqueue for wraparound processing.
Use the global use_simple_rxtx to indicate whether simple rxtx is enabled.
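
The wraparound handling mentioned above can be illustrated with a small,
self-contained sketch. The sketch_* types and functions are hypothetical and
use plain calloc/free instead of the DPDK allocators; SKETCH_MAX_BURST mirrors
RTE_PMD_VIRTIO_RX_MAX_BURST from this series. The software ring gets extra
tail entries that all point at one dummy mbuf, so a burst that reads past the
last real entry still sees valid pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_MAX_BURST 64 /* mirrors RTE_PMD_VIRTIO_RX_MAX_BURST */

/* Hypothetical stand-in for struct rte_mbuf. */
struct sketch_mbuf {
	int dummy;
};

/* Hypothetical, reduced RX queue: only what the padding needs. */
struct sketch_rxq {
	unsigned int nentries;
	struct sketch_mbuf **sw_ring; /* nentries + SKETCH_MAX_BURST slots */
	struct sketch_mbuf fake_mbuf; /* target of all padding slots */
};

/* Allocate the software ring and point the padding at fake_mbuf. */
static int
sketch_rxq_init(struct sketch_rxq *q, unsigned int nentries)
{
	unsigned int i;

	q->nentries = nentries;
	q->fake_mbuf.dummy = 0;
	q->sw_ring = calloc(nentries + SKETCH_MAX_BURST,
		sizeof(q->sw_ring[0]));
	if (q->sw_ring == NULL)
		return -1;

	for (i = 0; i < SKETCH_MAX_BURST; i++)
		q->sw_ring[nentries + i] = &q->fake_mbuf;

	return 0;
}

/* Release the software ring; free(NULL) is a no-op, so no check is needed. */
static void
sketch_rxq_fini(struct sketch_rxq *q)
{
	free(q->sw_ring);
	q->sw_ring = NULL;
}
```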

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
---
 drivers/net/virtio/virtio_ethdev.c | 12 ++++++++++++
 drivers/net/virtio/virtio_rxtx.c   |  7 +++++++
 drivers/net/virtio/virtqueue.h     |  4 ++++
 3 files changed, 23 insertions(+)

diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index 79a3640..3b7b841 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -247,6 +247,9 @@ virtio_dev_queue_release(struct virtqueue *vq) {
 		VIRTIO_WRITE_REG_2(hw, VIRTIO_PCI_QUEUE_SEL, vq->queue_id);
 		VIRTIO_WRITE_REG_4(hw, VIRTIO_PCI_QUEUE_PFN, 0);
 
+		if (vq->sw_ring)
+			rte_free(vq->sw_ring);
+
 		rte_free(vq);
 		vq = NULL;
 	}
@@ -292,6 +295,9 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
 			dev->data->port_id, queue_idx);
 		vq = rte_zmalloc(vq_name, sizeof(struct virtqueue) +
 			vq_size * sizeof(struct vq_desc_extra), RTE_CACHE_LINE_SIZE);
+		vq->sw_ring = rte_zmalloc_socket("rxq->sw_ring",
+			(RTE_PMD_VIRTIO_RX_MAX_BURST + vq_size) *
+			sizeof(vq->sw_ring[0]), RTE_CACHE_LINE_SIZE, socket_id);
 	} else if (queue_type == VTNET_TQ) {
 		snprintf(vq_name, sizeof(vq_name), "port%d_tvq%d",
 			dev->data->port_id, queue_idx);
@@ -308,6 +314,12 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
 		PMD_INIT_LOG(ERR, "%s: Can not allocate virtqueue", __func__);
 		return (-ENOMEM);
 	}
+	if (queue_type == VTNET_RQ && vq->sw_ring == NULL) {
+		PMD_INIT_LOG(ERR, "%s: Can not allocate RX soft ring",
+			__func__);
+		rte_free(vq);
+		return -ENOMEM;
+	}
 
 	vq->hw = hw;
 	vq->port_id = dev->data->port_id;
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 9324f7f..5c00e9d 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -62,6 +62,8 @@
 #define  VIRTIO_DUMP_PACKET(m, len) do { } while (0)
 #endif
 
+static int use_simple_rxtx;
+
 static void
 vq_ring_free_chain(struct virtqueue *vq, uint16_t desc_idx)
 {
@@ -299,6 +301,11 @@ virtio_dev_vring_start(struct virtqueue *vq, int queue_type)
 		/* Allocate blank mbufs for the each rx descriptor */
 		nbufs = 0;
 		error = ENOSPC;
+
+		memset(&vq->fake_mbuf, 0, sizeof(vq->fake_mbuf));
+		for (i = 0; i < RTE_PMD_VIRTIO_RX_MAX_BURST; i++)
+			vq->sw_ring[vq->vq_nentries + i] = &vq->fake_mbuf;
+
 		while (!virtqueue_full(vq)) {
 			m = rte_rxmbuf_alloc(vq->mpool);
 			if (m == NULL)
diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h
index 7789411..6a1ec48 100644
--- a/drivers/net/virtio/virtqueue.h
+++ b/drivers/net/virtio/virtqueue.h
@@ -190,6 +190,10 @@ struct virtqueue {
 	uint16_t vq_avail_idx;
 	phys_addr_t virtio_net_hdr_mem; /**< hdr for each xmit packet */
 
+	struct rte_mbuf **sw_ring; /**< RX software ring. */
+	/* dummy mbuf, for wraparound when processing RX ring. */
+	struct rte_mbuf fake_mbuf;
+
 	/* Statistics */
 	uint64_t	packets;
 	uint64_t	bytes;
-- 
1.8.1.4

* [dpdk-dev] [PATCH v2 3/7] virtio: rx/tx ring layout optimization
  2015-10-18  6:28 ` Huawei Xie
  2015-10-18  6:28   ` [dpdk-dev] [PATCH v2 1/7] virtio: add virtio_rxtx.h header file Huawei Xie
  2015-10-18  6:28   ` [dpdk-dev] [PATCH v2 2/7] virtio: add software rx ring, fake_buf into virtqueue Huawei Xie
@ 2015-10-18  6:29   ` Huawei Xie
  2015-10-18  6:29   ` [dpdk-dev] [PATCH v2 4/7] virtio: fill RX avail ring with blank mbufs Huawei Xie
                     ` (3 subsequent siblings)
  6 siblings, 0 replies; 92+ messages in thread
From: Huawei Xie @ 2015-10-18  6:29 UTC (permalink / raw)
  To: dev

In a DPDK-based switching environment, vhost mostly runs on a dedicated core
while virtio processing in the guest VMs runs on different cores.
Take RX for example. With the generic implementation, for each guest buffer
the virtio driver:
a) allocates a descriptor from the free descriptor list,
b) modifies the avail ring entry to point to the allocated descriptor,
c) frees the descriptor after the packet is received.

When vhost fetches the avail ring, it has to fetch the modified cache line
from the virtio core's L1 cache, which is costly on current CPUs.

The idea of this optimization is to allocate a fixed descriptor for each entry
of the avail ring, so that the avail ring never changes during the run.
This removes the L1 cache transfer from the virtio core to the vhost core for
the avail ring. Besides, no descriptor allocation or free is needed.
This also makes vector processing possible, which further accelerates the
processing.

This is the layout for the avail ring (taking 256 ring entries as an example),
with each entry pointing to the descriptor with the same index.
                    avail
                    idx
                    +
                    |
+----+----+---+-------------+------+
| 0  | 1  | 2 | ... |  254  | 255  |  avail ring
+-+--+-+--+-+-+---------+---+--+---+
  |    |    |       |   |      |
  |    |    |       |   |      |
  v    v    v       |   v      v
+-+--+-+--+-+-+---------+---+--+---+
| 0  | 1  | 2 | ... |  254  | 255  |  desc ring
+----+----+---+-------------+------+
                    |
                    |
+----+----+---+-------------+------+
| 0  | 1  | 2 |     |  254  | 255  |  used ring
+----+----+---+-------------+------+
                    |
                    +
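
The fixed RX layout can be sketched roughly as follows. This is an
illustration only: struct desc, struct rx_ring and both helpers are
hypothetical simplifications of the vring structures. The point it shows is
that, after the one-time setup, a refill writes only the descriptor and the
avail index, never the avail ring entries:

```c
#include <stdint.h>

#define RING_SIZE 256
#define DESC_F_WRITE 2 /* same value as VRING_DESC_F_WRITE */

struct desc {
	uint64_t addr;
	uint32_t len;
	uint16_t flags;
	uint16_t next;
};

struct rx_ring {
	struct desc desc[RING_SIZE];
	uint16_t avail_ring[RING_SIZE];
	uint16_t avail_idx;
};

/* One-time setup: avail entry i permanently refers to descriptor i. */
static void rx_ring_init_fixed(struct rx_ring *r)
{
	uint16_t i;

	for (i = 0; i < RING_SIZE; i++) {
		r->avail_ring[i] = i;
		r->desc[i].flags = DESC_F_WRITE; /* device writes into the buffer */
	}
	r->avail_idx = 0;
}

/* Refill one slot: only the descriptor and avail_idx are modified. */
static void rx_ring_refill(struct rx_ring *r, uint64_t buf_addr, uint32_t buf_len)
{
	uint16_t slot = r->avail_idx & (RING_SIZE - 1);

	r->desc[slot].addr = buf_addr;
	r->desc[slot].len = buf_len;
	r->avail_idx++;
}
```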

This is the ring layout for TX.
As each xmit packet needs one descriptor for its virtio header and one for its
data, a 256-entry ring has 128 slots available.

                         ++
                         ||
                         ||
+-----+-----+-----+--------------+------+------+------+
|  0  |  1  | ... |  127 || 128  | 129  | ...  | 255  |   avail ring
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
   |     |            |  ||  |      |             |
   v     v            v  ||  v      v             v
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
| 128 | 129 | ... |  255 || 128  | 129  | ...  | 255  |   desc ring for virtio_net_hdr
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
   |     |            |  ||  |      |             |
   v     v            v  ||  v      v             v
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
|  0  |  1  | ... |  127 ||  0   |  1   | ...  | 127  |   desc ring for tx data
+-----+-----+-----+--------------+------+------+------+
                         ||
                         ||
                         ++
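
A rough sketch of the TX layout in the diagram, again with hypothetical
simplified types (struct desc, struct tx_ring, tx_ring_init_fixed). It
assumes all header descriptors share a single header buffer at hdr_addr. The
upper half of the descriptor ring holds header descriptors, each chained to a
data descriptor in the lower half:

```c
#include <stdint.h>

#define RING_SIZE 256
#define DESC_F_NEXT 1 /* same value as VRING_DESC_F_NEXT */

struct desc {
	uint64_t addr;
	uint32_t len;
	uint16_t flags;
	uint16_t next;
};

struct tx_ring {
	struct desc desc[RING_SIZE];
	uint16_t avail_ring[RING_SIZE];
};

/* Pre-link header descriptor (mid + i) to data descriptor i, once at start. */
static void tx_ring_init_fixed(struct tx_ring *r, uint64_t hdr_addr,
	uint32_t hdr_size)
{
	uint16_t mid = RING_SIZE >> 1;
	uint16_t i;

	for (i = 0; i < mid; i++) {
		r->avail_ring[i] = i + mid;           /* slot i starts at a header desc */
		r->desc[i + mid].addr = hdr_addr;     /* assumed shared header buffer */
		r->desc[i + mid].len = hdr_size;
		r->desc[i + mid].flags = DESC_F_NEXT; /* header is followed by data */
		r->desc[i + mid].next = i;            /* data descriptor for this slot */
		r->desc[i].flags = 0;                 /* data desc ends the chain */
	}
	for (i = mid; i < RING_SIZE; i++)
		r->avail_ring[i] = i;                 /* second half maps to the same headers */
}
```

With this layout the transmit path only has to fill in addr and len of the
data descriptor for each packet.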

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
---
 drivers/net/virtio/virtio_rxtx.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 5c00e9d..7c82a6a 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -302,6 +302,12 @@ virtio_dev_vring_start(struct virtqueue *vq, int queue_type)
 		nbufs = 0;
 		error = ENOSPC;
 
+		if (use_simple_rxtx)
+			for (i = 0; i < vq->vq_nentries; i++) {
+				vq->vq_ring.avail->ring[i] = i;
+				vq->vq_ring.desc[i].flags = VRING_DESC_F_WRITE;
+			}
+
 		memset(&vq->fake_mbuf, 0, sizeof(vq->fake_mbuf));
 		for (i = 0; i < RTE_PMD_VIRTIO_RX_MAX_BURST; i++)
 			vq->sw_ring[vq->vq_nentries + i] = &vq->fake_mbuf;
@@ -332,6 +338,24 @@ virtio_dev_vring_start(struct virtqueue *vq, int queue_type)
 		VIRTIO_WRITE_REG_4(vq->hw, VIRTIO_PCI_QUEUE_PFN,
 			vq->mz->phys_addr >> VIRTIO_PCI_QUEUE_ADDR_SHIFT);
 	} else if (queue_type == VTNET_TQ) {
+		if (use_simple_rxtx) {
+			int mid_idx  = vq->vq_nentries >> 1;
+			for (i = 0; i < mid_idx; i++) {
+				vq->vq_ring.avail->ring[i] = i + mid_idx;
+				vq->vq_ring.desc[i + mid_idx].next = i;
+				vq->vq_ring.desc[i + mid_idx].addr =
+					vq->virtio_net_hdr_mem +
+						mid_idx * vq->hw->vtnet_hdr_size;
+				vq->vq_ring.desc[i + mid_idx].len =
+					vq->hw->vtnet_hdr_size;
+				vq->vq_ring.desc[i + mid_idx].flags =
+					VRING_DESC_F_NEXT;
+				vq->vq_ring.desc[i].flags = 0;
+			}
+			for (i = mid_idx; i < vq->vq_nentries; i++)
+				vq->vq_ring.avail->ring[i] = i;
+		}
+
 		VIRTIO_WRITE_REG_2(vq->hw, VIRTIO_PCI_QUEUE_SEL,
 			vq->vq_queue_index);
 		VIRTIO_WRITE_REG_4(vq->hw, VIRTIO_PCI_QUEUE_PFN,
-- 
1.8.1.4

* [dpdk-dev] [PATCH v2 4/7] virtio: fill RX avail ring with blank mbufs
  2015-10-18  6:28 ` Huawei Xie
                     ` (2 preceding siblings ...)
  2015-10-18  6:29   ` [dpdk-dev] [PATCH v2 3/7] virtio: rx/tx ring layout optimization Huawei Xie
@ 2015-10-18  6:29   ` Huawei Xie
  2015-10-18  6:29   ` [dpdk-dev] [PATCH v2 5/7] virtio: virtio vec rx Huawei Xie
                     ` (2 subsequent siblings)
  6 siblings, 0 replies; 92+ messages in thread
From: Huawei Xie @ 2015-10-18  6:29 UTC (permalink / raw)
  To: dev

Fill the avail ring with blank mbufs in virtio_dev_vring_start().

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
---
 drivers/net/virtio/Makefile             |  2 +-
 drivers/net/virtio/virtio_rxtx.c        |  6 ++-
 drivers/net/virtio/virtio_rxtx.h        |  3 ++
 drivers/net/virtio/virtio_rxtx_simple.c | 84 +++++++++++++++++++++++++++++++++
 4 files changed, 92 insertions(+), 3 deletions(-)
 create mode 100644 drivers/net/virtio/virtio_rxtx_simple.c

diff --git a/drivers/net/virtio/Makefile b/drivers/net/virtio/Makefile
index 930b60f..43835ba 100644
--- a/drivers/net/virtio/Makefile
+++ b/drivers/net/virtio/Makefile
@@ -50,7 +50,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtqueue.c
 SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_pci.c
 SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx.c
 SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_ethdev.c
-
+SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx_simple.c
 
 # this lib depends upon:
 DEPDIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += lib/librte_eal lib/librte_ether
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 7c82a6a..5162ce6 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -320,8 +320,10 @@ virtio_dev_vring_start(struct virtqueue *vq, int queue_type)
 			/******************************************
 			*         Enqueue allocated buffers        *
 			*******************************************/
-			error = virtqueue_enqueue_recv_refill(vq, m);
-
+			if (use_simple_rxtx)
+				error = virtqueue_enqueue_recv_refill_simple(vq, m);
+			else
+				error = virtqueue_enqueue_recv_refill(vq, m);
 			if (error) {
 				rte_pktmbuf_free(m);
 				break;
diff --git a/drivers/net/virtio/virtio_rxtx.h b/drivers/net/virtio/virtio_rxtx.h
index a10aa69..7d2d8fe 100644
--- a/drivers/net/virtio/virtio_rxtx.h
+++ b/drivers/net/virtio/virtio_rxtx.h
@@ -32,3 +32,6 @@
  */
 
 #define RTE_PMD_VIRTIO_RX_MAX_BURST 64
+
+int virtqueue_enqueue_recv_refill_simple(struct virtqueue *vq,
+	struct rte_mbuf *m);
diff --git a/drivers/net/virtio/virtio_rxtx_simple.c b/drivers/net/virtio/virtio_rxtx_simple.c
new file mode 100644
index 0000000..cac5b9f
--- /dev/null
+++ b/drivers/net/virtio/virtio_rxtx_simple.c
@@ -0,0 +1,84 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <errno.h>
+
+#include <tmmintrin.h>
+
+#include <rte_cycles.h>
+#include <rte_memory.h>
+#include <rte_memzone.h>
+#include <rte_branch_prediction.h>
+#include <rte_mempool.h>
+#include <rte_malloc.h>
+#include <rte_mbuf.h>
+#include <rte_ether.h>
+#include <rte_ethdev.h>
+#include <rte_prefetch.h>
+#include <rte_string_fns.h>
+#include <rte_errno.h>
+#include <rte_byteorder.h>
+
+#include "virtio_logs.h"
+#include "virtio_ethdev.h"
+#include "virtqueue.h"
+#include "virtio_rxtx.h"
+
+int __attribute__((cold))
+virtqueue_enqueue_recv_refill_simple(struct virtqueue *vq,
+	struct rte_mbuf *cookie)
+{
+	struct vq_desc_extra *dxp;
+	struct vring_desc *start_dp;
+	uint16_t desc_idx;
+
+	desc_idx = vq->vq_avail_idx & (vq->vq_nentries - 1);
+	dxp = &vq->vq_descx[desc_idx];
+	dxp->cookie = (void *)cookie;
+	vq->sw_ring[desc_idx] = cookie;
+
+	start_dp = vq->vq_ring.desc;
+	start_dp[desc_idx].addr = (uint64_t)((uintptr_t)cookie->buf_physaddr +
+		RTE_PKTMBUF_HEADROOM - sizeof(struct virtio_net_hdr));
+	start_dp[desc_idx].len = cookie->buf_len -
+		RTE_PKTMBUF_HEADROOM + sizeof(struct virtio_net_hdr);
+
+	vq->vq_free_cnt--;
+	vq->vq_avail_idx++;
+
+	return 0;
+}
-- 
1.8.1.4

* [dpdk-dev] [PATCH v2 5/7] virtio: virtio vec rx
  2015-10-18  6:28 ` Huawei Xie
                     ` (3 preceding siblings ...)
  2015-10-18  6:29   ` [dpdk-dev] [PATCH v2 4/7] virtio: fill RX avail ring with blank mbufs Huawei Xie
@ 2015-10-18  6:29   ` Huawei Xie
  2015-10-18  6:29   ` [dpdk-dev] [PATCH v2 6/7] virtio: simple tx routine Huawei Xie
  2015-10-18  6:29   ` [dpdk-dev] [PATCH v2 7/7] virtio: pick simple rx/tx func Huawei Xie
  6 siblings, 0 replies; 92+ messages in thread
From: Huawei Xie @ 2015-10-18  6:29 UTC (permalink / raw)
  To: dev

With a fixed avail ring, we don't need to fetch the desc idx from the avail ring;
the virtio driver only has to deal with the desc ring.
This patch uses vector instructions to accelerate processing of the desc ring.
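
For readers less familiar with the shuffle masks in the diff below, here is a
scalar sketch of the per-packet length fix-up that the vector code appears to
perform two used-ring entries at a time. The types and the NET_HDR_SIZE
constant are assumptions for illustration (10 is taken to be
sizeof(struct virtio_net_hdr)); this is not the driver's code:

```c
#include <stdint.h>

#define NET_HDR_SIZE 10 /* assumed sizeof(struct virtio_net_hdr) */

struct used_elem {       /* simplified stand-in for struct vring_used_elem */
	uint32_t id;
	uint32_t len;
};

struct rx_fields {       /* the two mbuf length fields the shuffle targets */
	uint32_t pkt_len;
	uint16_t data_len;
};

/*
 * Copy the low 16 bits of each used-ring length into pkt_len and data_len,
 * minus the virtio net header, mirroring the shuffle + 16-bit add.
 */
static void fill_rx_lengths(const struct used_elem *used,
	struct rx_fields *out, unsigned int n)
{
	unsigned int i;

	for (i = 0; i < n; i++) {
		uint16_t len = (uint16_t)(used[i].len - NET_HDR_SIZE);

		out[i].pkt_len = len;
		out[i].data_len = len;
	}
}
```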

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
---
 drivers/net/virtio/virtio_ethdev.h      |   2 +
 drivers/net/virtio/virtio_rxtx.c        |   3 +
 drivers/net/virtio/virtio_rxtx.h        |   2 +
 drivers/net/virtio/virtio_rxtx_simple.c | 224 ++++++++++++++++++++++++++++++++
 drivers/net/virtio/virtqueue.h          |   1 +
 5 files changed, 232 insertions(+)

diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h
index 9026d42..d7797ab 100644
--- a/drivers/net/virtio/virtio_ethdev.h
+++ b/drivers/net/virtio/virtio_ethdev.h
@@ -108,6 +108,8 @@ uint16_t virtio_recv_mergeable_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 uint16_t virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		uint16_t nb_pkts);
 
+uint16_t virtio_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
+		uint16_t nb_pkts);
 
 /*
  * The VIRTIO_NET_F_GUEST_TSO[46] features permit the host to send us
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 5162ce6..947fc46 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -432,6 +432,9 @@ virtio_dev_rx_queue_setup(struct rte_eth_dev *dev,
 	vq->mpool = mp;
 
 	dev->data->rx_queues[queue_idx] = vq;
+
+	virtio_rxq_vec_setup(vq);
+
 	return 0;
 }
 
diff --git a/drivers/net/virtio/virtio_rxtx.h b/drivers/net/virtio/virtio_rxtx.h
index 7d2d8fe..831e492 100644
--- a/drivers/net/virtio/virtio_rxtx.h
+++ b/drivers/net/virtio/virtio_rxtx.h
@@ -33,5 +33,7 @@
 
 #define RTE_PMD_VIRTIO_RX_MAX_BURST 64
 
+int virtio_rxq_vec_setup(struct virtqueue *rxq);
+
 int virtqueue_enqueue_recv_refill_simple(struct virtqueue *vq,
 	struct rte_mbuf *m);
diff --git a/drivers/net/virtio/virtio_rxtx_simple.c b/drivers/net/virtio/virtio_rxtx_simple.c
index cac5b9f..ef17562 100644
--- a/drivers/net/virtio/virtio_rxtx_simple.c
+++ b/drivers/net/virtio/virtio_rxtx_simple.c
@@ -58,6 +58,10 @@
 #include "virtqueue.h"
 #include "virtio_rxtx.h"
 
+#define RTE_VIRTIO_VPMD_RX_BURST 32
+#define RTE_VIRTIO_DESC_PER_LOOP 8
+#define RTE_VIRTIO_VPMD_RX_REARM_THRESH RTE_VIRTIO_VPMD_RX_BURST
+
 int __attribute__((cold))
 virtqueue_enqueue_recv_refill_simple(struct virtqueue *vq,
 	struct rte_mbuf *cookie)
@@ -82,3 +86,223 @@ virtqueue_enqueue_recv_refill_simple(struct virtqueue *vq,
 
 	return 0;
 }
+
+static inline void
+virtio_rxq_rearm_vec(struct virtqueue *rxvq)
+{
+	int i;
+	uint16_t desc_idx;
+	struct rte_mbuf **sw_ring;
+	struct vring_desc *start_dp;
+	int ret;
+
+	desc_idx = rxvq->vq_avail_idx & (rxvq->vq_nentries - 1);
+	sw_ring = &rxvq->sw_ring[desc_idx];
+	start_dp = &rxvq->vq_ring.desc[desc_idx];
+
+	ret = rte_mempool_get_bulk(rxvq->mpool, (void **)sw_ring,
+		RTE_VIRTIO_VPMD_RX_REARM_THRESH);
+	if (unlikely(ret)) {
+		rte_eth_devices[rxvq->port_id].data->rx_mbuf_alloc_failed +=
+			RTE_VIRTIO_VPMD_RX_REARM_THRESH;
+		return;
+	}
+
+	for (i = 0; i < RTE_VIRTIO_VPMD_RX_REARM_THRESH; i++) {
+		uintptr_t p;
+
+		p = (uintptr_t)&sw_ring[i]->rearm_data;
+		*(uint64_t *)p = rxvq->mbuf_initializer;
+
+		start_dp[i].addr =
+			(uint64_t)((uintptr_t)sw_ring[i]->buf_physaddr +
+			RTE_PKTMBUF_HEADROOM - sizeof(struct virtio_net_hdr));
+		start_dp[i].len = sw_ring[i]->buf_len -
+			RTE_PKTMBUF_HEADROOM + sizeof(struct virtio_net_hdr);
+	}
+
+	rxvq->vq_avail_idx += RTE_VIRTIO_VPMD_RX_REARM_THRESH;
+	rxvq->vq_free_cnt -= RTE_VIRTIO_VPMD_RX_REARM_THRESH;
+	vq_update_avail_idx(rxvq);
+}
+
+/* virtio vPMD receive routine; only accepts nb_pkts >= RTE_VIRTIO_DESC_PER_LOOP
+ *
+ * This routine is for non-mergeable RX, one desc for each guest buffer.
+ * This routine is based on the RX ring layout optimization. Each entry in the
+ * avail ring points to the desc with the same index in the desc ring and this
+ * will never be changed in the driver.
+ *
+ * - if nb_pkts < RTE_VIRTIO_DESC_PER_LOOP, return no packets
+ */
+uint16_t
+virtio_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
+	uint16_t nb_pkts)
+{
+	struct virtqueue *rxvq = rx_queue;
+	uint16_t nb_used;
+	uint16_t desc_idx;
+	struct vring_used_elem *rused;
+	struct rte_mbuf **sw_ring;
+	struct rte_mbuf **sw_ring_end;
+	uint16_t nb_pkts_received;
+	__m128i shuf_msk1, shuf_msk2, len_adjust;
+
+	shuf_msk1 = _mm_set_epi8(
+		0xFF, 0xFF, 0xFF, 0xFF,
+		0xFF, 0xFF,		/* vlan tci */
+		5, 4,			/* dat len */
+		0xFF, 0xFF, 5, 4,	/* pkt len */
+		0xFF, 0xFF, 0xFF, 0xFF	/* packet type */
+
+	);
+
+	shuf_msk2 = _mm_set_epi8(
+		0xFF, 0xFF, 0xFF, 0xFF,
+		0xFF, 0xFF,		/* vlan tci */
+		13, 12,			/* dat len */
+		0xFF, 0xFF, 13, 12,	/* pkt len */
+		0xFF, 0xFF, 0xFF, 0xFF	/* packet type */
+	);
+
+	/* Subtract the header length.
+	*  In which case do we need the header length in used->len?
+	*/
+	len_adjust = _mm_set_epi16(
+		0, 0,
+		0,
+		(uint16_t) -sizeof(struct virtio_net_hdr),
+		0, (uint16_t) -sizeof(struct virtio_net_hdr),
+		0, 0);
+
+	if (unlikely(nb_pkts < RTE_VIRTIO_DESC_PER_LOOP))
+		return 0;
+
+	nb_used = *(volatile uint16_t *)&rxvq->vq_ring.used->idx -
+		rxvq->vq_used_cons_idx;
+
+	rte_compiler_barrier();
+
+	if (unlikely(nb_used == 0))
+		return 0;
+
+	nb_pkts = RTE_ALIGN_FLOOR(nb_pkts, RTE_VIRTIO_DESC_PER_LOOP);
+	nb_used = RTE_MIN(nb_used, nb_pkts);
+
+	desc_idx = (uint16_t)(rxvq->vq_used_cons_idx & (rxvq->vq_nentries - 1));
+	rused = &rxvq->vq_ring.used->ring[desc_idx];
+	sw_ring  = &rxvq->sw_ring[desc_idx];
+	sw_ring_end = &rxvq->sw_ring[rxvq->vq_nentries];
+
+	_mm_prefetch((const void *)rused, _MM_HINT_T0);
+
+	if (rxvq->vq_free_cnt >= RTE_VIRTIO_VPMD_RX_REARM_THRESH) {
+		virtio_rxq_rearm_vec(rxvq);
+		if (unlikely(virtqueue_kick_prepare(rxvq)))
+			virtqueue_notify(rxvq);
+	}
+
+	for (nb_pkts_received = 0;
+		nb_pkts_received < nb_used;) {
+		__m128i desc[RTE_VIRTIO_DESC_PER_LOOP / 2];
+		__m128i mbp[RTE_VIRTIO_DESC_PER_LOOP / 2];
+		__m128i pkt_mb[RTE_VIRTIO_DESC_PER_LOOP];
+
+		mbp[0] = _mm_loadu_si128((__m128i *)(sw_ring + 0));
+		desc[0] = _mm_loadu_si128((__m128i *)(rused + 0));
+		_mm_storeu_si128((__m128i *)&rx_pkts[0], mbp[0]);
+
+		mbp[1] = _mm_loadu_si128((__m128i *)(sw_ring + 2));
+		desc[1] = _mm_loadu_si128((__m128i *)(rused + 2));
+		_mm_storeu_si128((__m128i *)&rx_pkts[2], mbp[1]);
+
+		mbp[2] = _mm_loadu_si128((__m128i *)(sw_ring + 4));
+		desc[2] = _mm_loadu_si128((__m128i *)(rused + 4));
+		_mm_storeu_si128((__m128i *)&rx_pkts[4], mbp[2]);
+
+		mbp[3] = _mm_loadu_si128((__m128i *)(sw_ring + 6));
+		desc[3] = _mm_loadu_si128((__m128i *)(rused + 6));
+		_mm_storeu_si128((__m128i *)&rx_pkts[6], mbp[3]);
+
+		pkt_mb[1] = _mm_shuffle_epi8(desc[0], shuf_msk2);
+		pkt_mb[0] = _mm_shuffle_epi8(desc[0], shuf_msk1);
+		pkt_mb[1] = _mm_add_epi16(pkt_mb[1], len_adjust);
+		pkt_mb[0] = _mm_add_epi16(pkt_mb[0], len_adjust);
+		_mm_storeu_si128((void *)&rx_pkts[1]->rx_descriptor_fields1,
+			pkt_mb[1]);
+		_mm_storeu_si128((void *)&rx_pkts[0]->rx_descriptor_fields1,
+			pkt_mb[0]);
+
+		pkt_mb[3] = _mm_shuffle_epi8(desc[1], shuf_msk2);
+		pkt_mb[2] = _mm_shuffle_epi8(desc[1], shuf_msk1);
+		pkt_mb[3] = _mm_add_epi16(pkt_mb[3], len_adjust);
+		pkt_mb[2] = _mm_add_epi16(pkt_mb[2], len_adjust);
+		_mm_storeu_si128((void *)&rx_pkts[3]->rx_descriptor_fields1,
+			pkt_mb[3]);
+		_mm_storeu_si128((void *)&rx_pkts[2]->rx_descriptor_fields1,
+			pkt_mb[2]);
+
+		pkt_mb[5] = _mm_shuffle_epi8(desc[2], shuf_msk2);
+		pkt_mb[4] = _mm_shuffle_epi8(desc[2], shuf_msk1);
+		pkt_mb[5] = _mm_add_epi16(pkt_mb[5], len_adjust);
+		pkt_mb[4] = _mm_add_epi16(pkt_mb[4], len_adjust);
+		_mm_storeu_si128((void *)&rx_pkts[5]->rx_descriptor_fields1,
+			pkt_mb[5]);
+		_mm_storeu_si128((void *)&rx_pkts[4]->rx_descriptor_fields1,
+			pkt_mb[4]);
+
+		pkt_mb[7] = _mm_shuffle_epi8(desc[3], shuf_msk2);
+		pkt_mb[6] = _mm_shuffle_epi8(desc[3], shuf_msk1);
+		pkt_mb[7] = _mm_add_epi16(pkt_mb[7], len_adjust);
+		pkt_mb[6] = _mm_add_epi16(pkt_mb[6], len_adjust);
+		_mm_storeu_si128((void *)&rx_pkts[7]->rx_descriptor_fields1,
+			pkt_mb[7]);
+		_mm_storeu_si128((void *)&rx_pkts[6]->rx_descriptor_fields1,
+			pkt_mb[6]);
+
+		if (unlikely(nb_used <= RTE_VIRTIO_DESC_PER_LOOP)) {
+			if (sw_ring + nb_used <= sw_ring_end)
+				nb_pkts_received += nb_used;
+			else
+				nb_pkts_received += sw_ring_end - sw_ring;
+			break;
+		} else {
+			if (unlikely(sw_ring + RTE_VIRTIO_DESC_PER_LOOP >=
+				sw_ring_end)) {
+				nb_pkts_received += sw_ring_end - sw_ring;
+				break;
+			} else {
+				nb_pkts_received += RTE_VIRTIO_DESC_PER_LOOP;
+
+				rx_pkts += RTE_VIRTIO_DESC_PER_LOOP;
+				sw_ring += RTE_VIRTIO_DESC_PER_LOOP;
+				rused   += RTE_VIRTIO_DESC_PER_LOOP;
+				nb_used -= RTE_VIRTIO_DESC_PER_LOOP;
+			}
+		}
+	}
+
+	rxvq->vq_used_cons_idx += nb_pkts_received;
+	rxvq->vq_free_cnt += nb_pkts_received;
+	rxvq->packets += nb_pkts_received;
+	return nb_pkts_received;
+}
+
+int __attribute__((cold))
+virtio_rxq_vec_setup(struct virtqueue *rxq)
+{
+	uintptr_t p;
+	struct rte_mbuf mb_def = { .buf_addr = 0 }; /* zeroed mbuf */
+
+	mb_def.nb_segs = 1;
+	mb_def.data_off = RTE_PKTMBUF_HEADROOM;
+	mb_def.port = rxq->port_id;
+	rte_mbuf_refcnt_set(&mb_def, 1);
+
+	/* prevent compiler reordering: rearm_data covers previous fields */
+	rte_compiler_barrier();
+	p = (uintptr_t)&mb_def.rearm_data;
+	rxq->mbuf_initializer = *(uint64_t *)p;
+
+	return 0;
+}
diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h
index 6a1ec48..98a77d5 100644
--- a/drivers/net/virtio/virtqueue.h
+++ b/drivers/net/virtio/virtqueue.h
@@ -188,6 +188,7 @@ struct virtqueue {
 	 */
 	uint16_t vq_used_cons_idx;
 	uint16_t vq_avail_idx;
+	uint64_t mbuf_initializer; /**< value to init mbufs. */
 	phys_addr_t virtio_net_hdr_mem; /**< hdr for each xmit packet */
 
 	struct rte_mbuf **sw_ring; /**< RX software ring. */
-- 
1.8.1.4

* [dpdk-dev] [PATCH v2 6/7] virtio: simple tx routine
  2015-10-18  6:28 ` Huawei Xie
                     ` (4 preceding siblings ...)
  2015-10-18  6:29   ` [dpdk-dev] [PATCH v2 5/7] virtio: virtio vec rx Huawei Xie
@ 2015-10-18  6:29   ` Huawei Xie
  2015-10-19  4:16     ` Stephen Hemminger
                       ` (2 more replies)
  2015-10-18  6:29   ` [dpdk-dev] [PATCH v2 7/7] virtio: pick simple rx/tx func Huawei Xie
  6 siblings, 3 replies; 92+ messages in thread
From: Huawei Xie @ 2015-10-18  6:29 UTC (permalink / raw)
  To: dev

Bulk-free mbufs when cleaning the used ring.
The shift operation on idx could also be avoided if vq_free_cnt counted
free slots rather than free descriptors.
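
The cleanup strategy can be sketched as batching consecutive buffers that
share a pool and returning each batch with one bulk call. The names below
(struct pool, struct buf, pool_put_bulk, free_grouped) are hypothetical
stand-ins, with pool_put_bulk playing the role of rte_mempool_put_bulk and
merely counting what it receives:

```c
#include <stddef.h>

#define FREE_NR 32 /* maximum batch size, as in VIRTIO_TX_FREE_NR */

struct pool {
	unsigned int returned; /* total buffers handed back */
	unsigned int calls;    /* number of bulk operations */
};

struct buf {
	struct pool *pool;
};

/* Stand-in for a bulk free: just records how much was returned. */
static void pool_put_bulk(struct pool *p, struct buf **bufs, unsigned int n)
{
	(void)bufs;
	p->returned += n;
	p->calls++;
}

/* Return n buffers, flushing the batch whenever the pool changes or it fills. */
static void free_grouped(struct buf **bufs, unsigned int n)
{
	struct buf *batch[FREE_NR];
	unsigned int nb = 0;
	unsigned int i;

	for (i = 0; i < n; i++) {
		if (nb > 0 &&
		    (bufs[i]->pool != batch[0]->pool || nb == FREE_NR)) {
			pool_put_bulk(batch[0]->pool, batch, nb);
			nb = 0;
		}
		batch[nb++] = bufs[i];
	}
	if (nb > 0)
		pool_put_bulk(batch[0]->pool, batch, nb);
}
```

When every buffer comes from one pool this costs a single bulk call per
cleanup; mixed pools still work but need one call per run of same-pool
buffers.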

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
---
 drivers/net/virtio/virtio_ethdev.h      |  3 ++
 drivers/net/virtio/virtio_rxtx_simple.c | 95 +++++++++++++++++++++++++++++++++
 2 files changed, 98 insertions(+)

diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h
index d7797ab..ae2d47d 100644
--- a/drivers/net/virtio/virtio_ethdev.h
+++ b/drivers/net/virtio/virtio_ethdev.h
@@ -111,6 +111,9 @@ uint16_t virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 uint16_t virtio_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
 		uint16_t nb_pkts);
 
+uint16_t virtio_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
+
 /*
  * The VIRTIO_NET_F_GUEST_TSO[46] features permit the host to send us
  * frames larger than 1514 bytes. We do not yet support software LRO
diff --git a/drivers/net/virtio/virtio_rxtx_simple.c b/drivers/net/virtio/virtio_rxtx_simple.c
index ef17562..3339a24 100644
--- a/drivers/net/virtio/virtio_rxtx_simple.c
+++ b/drivers/net/virtio/virtio_rxtx_simple.c
@@ -288,6 +288,101 @@ virtio_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
 	return nb_pkts_received;
 }
 
+#define VIRTIO_TX_FREE_THRESH 32
+#define VIRTIO_TX_MAX_FREE_BUF_SZ 32
+#define VIRTIO_TX_FREE_NR 32
+/* TODO: vq->vq_free_cnt could count free slots so we could avoid the shift */
+static inline void __attribute__((always_inline))
+virtio_xmit_cleanup(struct virtqueue *vq)
+{
+	uint16_t i, desc_idx;
+	int nb_free = 0;
+	struct rte_mbuf *m, *free[VIRTIO_TX_MAX_FREE_BUF_SZ];
+
+	desc_idx = (uint16_t)(vq->vq_used_cons_idx &
+		((vq->vq_nentries >> 1) - 1));
+	free[0] = (struct rte_mbuf *)vq->vq_descx[desc_idx++].cookie;
+	nb_free = 1;
+
+	for (i = 1; i < VIRTIO_TX_FREE_NR; i++) {
+		m = (struct rte_mbuf *)vq->vq_descx[desc_idx++].cookie;
+		if (likely(m->pool == free[0]->pool))
+			free[nb_free++] = m;
+		else {
+			rte_mempool_put_bulk(free[0]->pool, (void **)free,
+				nb_free);
+			free[0] = m;
+			nb_free = 1;
+		}
+	}
+
+	rte_mempool_put_bulk(free[0]->pool, (void **)free, nb_free);
+	vq->vq_used_cons_idx += VIRTIO_TX_FREE_NR;
+	vq->vq_free_cnt += (VIRTIO_TX_FREE_NR << 1);
+
+	return;
+}
+
+uint16_t
+virtio_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
+	uint16_t nb_pkts)
+{
+	struct virtqueue *txvq = tx_queue;
+	uint16_t nb_used;
+	uint16_t desc_idx;
+	struct vring_desc *start_dp;
+	uint16_t nb_tail, nb_commit;
+	int i;
+	uint16_t desc_idx_max = (txvq->vq_nentries >> 1) - 1;
+
+	nb_used = VIRTQUEUE_NUSED(txvq);
+	rte_compiler_barrier();
+
+	nb_commit = nb_pkts = RTE_MIN((txvq->vq_free_cnt >> 1), nb_pkts);
+	desc_idx = (uint16_t) (txvq->vq_avail_idx & desc_idx_max);
+	start_dp = txvq->vq_ring.desc;
+	nb_tail = (uint16_t) (desc_idx_max + 1 - desc_idx);
+
+	if (nb_used >= VIRTIO_TX_FREE_THRESH)
+		virtio_xmit_cleanup(tx_queue);
+
+	if (nb_commit >= nb_tail) {
+		for (i = 0; i < nb_tail; i++)
+			txvq->vq_descx[desc_idx + i].cookie = tx_pkts[i];
+		for (i = 0; i < nb_tail; i++) {
+			start_dp[desc_idx].addr =
+				RTE_MBUF_DATA_DMA_ADDR(*tx_pkts);
+			start_dp[desc_idx].len = (*tx_pkts)->pkt_len;
+			tx_pkts++;
+			desc_idx++;
+		}
+		nb_commit -= nb_tail;
+		desc_idx = 0;
+	}
+	for (i = 0; i < nb_commit; i++)
+		txvq->vq_descx[desc_idx + i].cookie = tx_pkts[i];
+	for (i = 0; i < nb_commit; i++) {
+		start_dp[desc_idx].addr = RTE_MBUF_DATA_DMA_ADDR(*tx_pkts);
+		start_dp[desc_idx].len = (*tx_pkts)->pkt_len;
+		tx_pkts++;
+		desc_idx++;
+	}
+
+	rte_compiler_barrier();
+
+	txvq->vq_free_cnt -= (uint16_t)(nb_pkts << 1);
+	txvq->vq_avail_idx += nb_pkts;
+	txvq->vq_ring.avail->idx = txvq->vq_avail_idx;
+	txvq->packets += nb_pkts;
+
+	if (likely(nb_pkts)) {
+		if (unlikely(virtqueue_kick_prepare(txvq)))
+			virtqueue_notify(txvq);
+	}
+
+	return nb_pkts;
+}
+
 int __attribute__((cold))
 virtio_rxq_vec_setup(struct virtqueue *rxq)
 {
-- 
1.8.1.4

* [dpdk-dev] [PATCH v2 7/7] virtio: pick simple rx/tx func
  2015-10-18  6:28 ` Huawei Xie
                     ` (5 preceding siblings ...)
  2015-10-18  6:29   ` [dpdk-dev] [PATCH v2 6/7] virtio: simple tx routine Huawei Xie
@ 2015-10-18  6:29   ` Huawei Xie
  6 siblings, 0 replies; 92+ messages in thread
From: Huawei Xie @ 2015-10-18  6:29 UTC (permalink / raw)
  To: dev

The simple rx/tx functions are enabled when the user specifies single-segment,
no-offload support in the TX queue flags.
Mergeable RX buffers should be disabled to use simple rx/tx.
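
The selection test is an "all of these bits set" mask check. A small
self-contained sketch follows; the flag names and numeric values are
illustrative assumptions standing in for ETH_TXQ_FLAGS_NOMULTSEGS and
ETH_TXQ_FLAGS_NOOFFLOADS, not values taken from this patch:

```c
#include <stdint.h>

/* Illustrative flag values; the real ones come from the ethdev header. */
#define TXQ_NOMULTSEGS 0x0001u
#define TXQ_NOOFFLOADS 0x0F00u
#define SIMPLE_FLAGS   (TXQ_NOMULTSEGS | TXQ_NOOFFLOADS)

/* Non-zero only when every bit in SIMPLE_FLAGS is set in txq_flags. */
static int can_use_simple_path(uint32_t txq_flags)
{
	return (txq_flags & SIMPLE_FLAGS) == SIMPLE_FLAGS;
}
```

Comparing against the mask itself, rather than testing for non-zero, matters
here: a queue that sets only some of the bits must stay on the generic path.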

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
---
 drivers/net/virtio/virtio_rxtx.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 947fc46..71f8cd4 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -62,6 +62,10 @@
 #define  VIRTIO_DUMP_PACKET(m, len) do { } while (0)
 #endif
 
+
+#define VIRTIO_SIMPLE_FLAGS ((uint32_t)ETH_TXQ_FLAGS_NOMULTSEGS | \
+	ETH_TXQ_FLAGS_NOOFFLOADS)
+
 static int use_simple_rxtx;
 
 static void
@@ -471,6 +475,14 @@ virtio_dev_tx_queue_setup(struct rte_eth_dev *dev,
 		return -EINVAL;
 	}
 
+	/* Use simple rx/tx func if single segment and no offloads */
+	if ((tx_conf->txq_flags & VIRTIO_SIMPLE_FLAGS) == VIRTIO_SIMPLE_FLAGS) {
+		PMD_INIT_LOG(INFO, "Using simple rx/tx path");
+		dev->tx_pkt_burst = virtio_xmit_pkts_simple;
+		dev->rx_pkt_burst = virtio_recv_pkts_vec;
+		use_simple_rxtx = 1;
+	}
+
 	ret = virtio_dev_queue_setup(dev, VTNET_TQ, queue_idx, vtpci_queue_idx,
 			nb_desc, socket_id, &vq);
 	if (ret < 0) {
-- 
1.8.1.4

* Re: [dpdk-dev] [PATCH v2 6/7] virtio: simple tx routine
  2015-10-18  6:29   ` [dpdk-dev] [PATCH v2 6/7] virtio: simple tx routine Huawei Xie
@ 2015-10-19  4:16     ` Stephen Hemminger
  2015-10-19  5:22       ` Xie, Huawei
  2015-10-19  4:18     ` Stephen Hemminger
  2015-10-19  4:19     ` Stephen Hemminger
  2 siblings, 1 reply; 92+ messages in thread
From: Stephen Hemminger @ 2015-10-19  4:16 UTC (permalink / raw)
  To: Huawei Xie; +Cc: dev

On Sun, 18 Oct 2015 14:29:03 +0800
Huawei Xie <huawei.xie@intel.com> wrote:

> bulk free of mbufs when clean used ring.
> shift operation of idx could be further saved if vq_free_cnt means
> free slots rather than free descriptors.
> 
> Signed-off-by: Huawei Xie <huawei.xie@intel.com>

Did you measure this? I finished my transmit optimizations and get a 25% performance
improvement without any of these restrictions.

* Re: [dpdk-dev] [PATCH v2 6/7] virtio: simple tx routine
  2015-10-18  6:29   ` [dpdk-dev] [PATCH v2 6/7] virtio: simple tx routine Huawei Xie
  2015-10-19  4:16     ` Stephen Hemminger
@ 2015-10-19  4:18     ` Stephen Hemminger
  2015-10-19  5:15       ` Xie, Huawei
  2015-10-19  4:19     ` Stephen Hemminger
  2 siblings, 1 reply; 92+ messages in thread
From: Stephen Hemminger @ 2015-10-19  4:18 UTC (permalink / raw)
  To: Huawei Xie; +Cc: dev

On Sun, 18 Oct 2015 14:29:03 +0800
Huawei Xie <huawei.xie@intel.com> wrote:

> +
> +	for (i = 1; i < VIRTIO_TX_FREE_NR; i++) {
> +		m = (struct rte_mbuf *)vq->vq_descx[desc_idx++].cookie;
> +		if (likely(m->pool == free[0]->pool))
> +			free[nb_free++] = m;
> +		else {
> +			rte_mempool_put_bulk(free[0]->pool, (void **)free,
> +				nb_free);
> +			free[0] = m;
> +			nb_free = 1;
> +		}
> +	}

This assumes all transmits are from the same pool, which is not necessarily true.

* Re: [dpdk-dev] [PATCH v2 6/7] virtio: simple tx routine
  2015-10-18  6:29   ` [dpdk-dev] [PATCH v2 6/7] virtio: simple tx routine Huawei Xie
  2015-10-19  4:16     ` Stephen Hemminger
  2015-10-19  4:18     ` Stephen Hemminger
@ 2015-10-19  4:19     ` Stephen Hemminger
  2015-10-19  5:12       ` Xie, Huawei
  2 siblings, 1 reply; 92+ messages in thread
From: Stephen Hemminger @ 2015-10-19  4:19 UTC (permalink / raw)
  To: Huawei Xie; +Cc: dev


+static inline void __attribute__((always_inline))
+virtio_xmit_cleanup(struct virtqueue *vq)
+{

Please don't use always inline, frustrating the compiler isn't going
to help.

+	uint16_t i, desc_idx;
+	int nb_free = 0;
+	struct rte_mbuf *m, *free[VIRTIO_TX_MAX_FREE_BUF_SZ];
+
+	desc_idx = (uint16_t)(vq->vq_used_cons_idx &
+		((vq->vq_nentries >> 1) - 1));
+	free[0] = (struct rte_mbuf *)vq->vq_descx[desc_idx++].cookie;
+	nb_free = 1;
+
+	for (i = 1; i < VIRTIO_TX_FREE_NR; i++) {
+		m = (struct rte_mbuf *)vq->vq_descx[desc_idx++].cookie;
+		if (likely(m->pool == free[0]->pool))
+			free[nb_free++] = m;
+		else {
+			rte_mempool_put_bulk(free[0]->pool, (void **)free,
+				nb_free);
+			free[0] = m;
+			nb_free = 1;
+		}
+	}
+
+	rte_mempool_put_bulk(free[0]->pool, (void **)free, nb_free);
+	vq->vq_used_cons_idx += VIRTIO_TX_FREE_NR;
+	vq->vq_free_cnt += (VIRTIO_TX_FREE_NR << 1);
+
+	return;
+}

Don't add return; at end of void functions. It only clutters
things for no reason.

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [dpdk-dev] [PATCH v2 2/7] virtio: add software rx ring, fake_buf into virtqueue
  2015-10-18  6:28   ` [dpdk-dev] [PATCH v2 2/7] virtio: add software rx ring, fake_buf into virtqueue Huawei Xie
@ 2015-10-19  4:20     ` Stephen Hemminger
  2015-10-19  5:06       ` Xie, Huawei
  0 siblings, 1 reply; 92+ messages in thread
From: Stephen Hemminger @ 2015-10-19  4:20 UTC (permalink / raw)
  To: Huawei Xie; +Cc: dev

On Sun, 18 Oct 2015 14:28:59 +0800
Huawei Xie <huawei.xie@intel.com> wrote:

> +		if (vq->sw_ring)
> +			rte_free(vq->sw_ring);
> +

Do not need to test for NULL before calling rte_free.
Better to just rely on the fact that rte_free(NULL) is documented
to be ok (no operation).

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [dpdk-dev] [PATCH v2 2/7] virtio: add software rx ring, fake_buf into virtqueue
  2015-10-19  4:20     ` Stephen Hemminger
@ 2015-10-19  5:06       ` Xie, Huawei
  2015-10-20 15:32         ` Xie, Huawei
  0 siblings, 1 reply; 92+ messages in thread
From: Xie, Huawei @ 2015-10-19  5:06 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev

On 10/19/2015 12:20 PM, Stephen Hemminger wrote:
> On Sun, 18 Oct 2015 14:28:59 +0800
> Huawei Xie <huawei.xie@intel.com> wrote:
>
>> +		if (vq->sw_ring)
>> +			rte_free(vq->sw_ring);
>> +
>
> Do not need to test for NULL before calling rte_free.
> Better to just rely on the fact that rte_free(NULL) is documented
> to be ok (no operation).

OK. By the way, in a previous commit, in the same function:

void virtio_dev_queue_release(vq)
[...]
		rte_free(vq);
		vq = NULL;

I think there is no need to set vq to NULL. I will submit a patch to fix it if you agree.

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [dpdk-dev] [PATCH v2 6/7] virtio: simple tx routine
  2015-10-19  4:19     ` Stephen Hemminger
@ 2015-10-19  5:12       ` Xie, Huawei
  0 siblings, 0 replies; 92+ messages in thread
From: Xie, Huawei @ 2015-10-19  5:12 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev

On 10/19/2015 12:19 PM, Stephen Hemminger wrote:
> +static inline void __attribute__((always_inline))
> +virtio_xmit_cleanup(struct virtqueue *vq)
> +{
>
> Please don't use always inline, frustrating the compiler isn't going
> to help.
always inline is scattered elsewhere in the dpdk code.
What is the negative effect? Should we remove all of them?
> +	uint16_t i, desc_idx;
> +	int nb_free = 0;
> +	struct rte_mbuf *m, *free[VIRTIO_TX_MAX_FREE_BUF_SZ];
> +
> +	desc_idx = (uint16_t)(vq->vq_used_cons_idx &
> +		((vq->vq_nentries >> 1) - 1));
> +	free[0] = (struct rte_mbuf *)vq->vq_descx[desc_idx++].cookie;
> +	nb_free = 1;
> +
> +	for (i = 1; i < VIRTIO_TX_FREE_NR; i++) {
> +		m = (struct rte_mbuf *)vq->vq_descx[desc_idx++].cookie;
> +		if (likely(m->pool == free[0]->pool))
> +			free[nb_free++] = m;
> +		else {
> +			rte_mempool_put_bulk(free[0]->pool, (void **)free,
> +				nb_free);
> +			free[0] = m;
> +			nb_free = 1;
> +		}
> +	}
> +
> +	rte_mempool_put_bulk(free[0]->pool, (void **)free, nb_free);
> +	vq->vq_used_cons_idx += VIRTIO_TX_FREE_NR;
> +	vq->vq_free_cnt += (VIRTIO_TX_FREE_NR << 1);
> +
> +	return;
> +}
>
> Don't add return; at end of void functions. It only clutters
> things for no reason.
Agree.
>


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [dpdk-dev] [PATCH v2 6/7] virtio: simple tx routine
  2015-10-19  4:18     ` Stephen Hemminger
@ 2015-10-19  5:15       ` Xie, Huawei
  0 siblings, 0 replies; 92+ messages in thread
From: Xie, Huawei @ 2015-10-19  5:15 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev

On 10/19/2015 12:17 PM, Stephen Hemminger wrote:
> On Sun, 18 Oct 2015 14:29:03 +0800
> Huawei Xie <huawei.xie@intel.com> wrote:
>
>> +
>> +	for (i = 1; i < VIRTIO_TX_FREE_NR; i++) {
>> +		m = (struct rte_mbuf *)vq->vq_descx[desc_idx++].cookie;
>> +		if (likely(m->pool == free[0]->pool))
>> +			free[nb_free++] = m;
>> +		else {
>> +			rte_mempool_put_bulk(free[0]->pool, (void **)free,
>> +				nb_free);
>> +			free[0] = m;
>> +			nb_free = 1;
>> +		}
>> +	}
> This assumes all transmits are from the same pool, which is not necessarily true.

I don't follow. It accumulates mbufs that come from the same pool and
frees them in bulk as soon as it meets one from a different pool.

	if (likely(m->pool == free[0]->pool))
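
For illustration, the grouping logic discussed here can be modeled outside
DPDK. This is only a minimal sketch, not the driver code: struct pool,
struct buf, pool_put_bulk() and free_grouped() are hypothetical stand-ins
for rte_mempool, rte_mbuf, rte_mempool_put_bulk() and the cleanup loop, and
the stand-in "free" merely counts what it is handed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for rte_mempool / rte_mbuf, for illustration only. */
struct pool { int id; unsigned bulk_calls; unsigned freed; };
struct buf { struct pool *pool; };

#define MAX_FREE 32

/* Stand-in for rte_mempool_put_bulk(): it only counts what it is given. */
static void pool_put_bulk(struct pool *p, struct buf **objs, unsigned n)
{
	(void)objs;
	p->bulk_calls++;
	p->freed += n;
}

/*
 * Free n buffers (1 <= n <= MAX_FREE). Consecutive buffers from the same
 * pool are batched; the batch is flushed whenever the pool changes.
 */
static void free_grouped(struct buf **bufs, unsigned n)
{
	struct buf *pending[MAX_FREE];
	unsigned nb = 0, i;

	assert(n >= 1 && n <= MAX_FREE);

	pending[nb++] = bufs[0];
	for (i = 1; i < n; i++) {
		if (bufs[i]->pool == pending[0]->pool) {
			pending[nb++] = bufs[i];
		} else {
			/* Pool changed: flush the batch, start a new one. */
			pool_put_bulk(pending[0]->pool, pending, nb);
			pending[0] = bufs[i];
			nb = 1;
		}
	}
	/* Flush whatever is left in the last batch. */
	pool_put_bulk(pending[0]->pool, pending, nb);
}
```

With buffers ordered a, a, b, b, a, this model issues three bulk frees (two
to pool a, one to pool b), so mixed pools are handled, at the cost of smaller
batches when pools alternate.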


>


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [dpdk-dev] [PATCH v2 6/7] virtio: simple tx routine
  2015-10-19  4:16     ` Stephen Hemminger
@ 2015-10-19  5:22       ` Xie, Huawei
  0 siblings, 0 replies; 92+ messages in thread
From: Xie, Huawei @ 2015-10-19  5:22 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev

On 10/19/2015 12:17 PM, Stephen Hemminger wrote:
> On Sun, 18 Oct 2015 14:29:03 +0800
> Huawei Xie <huawei.xie@intel.com> wrote:
>
>> bulk free of mbufs when clean used ring.
>> shift operation of idx could be further saved if vq_free_cnt means
>> free slots rather than free descriptors.
>>
>> Signed-off-by: Huawei Xie <huawei.xie@intel.com>
> Did you measure this? I finished my transmit optimizations and get a 25% performance improvement
> without any of these restrictions.
Which restriction do you mean? For the ring layout optimization, this
is the core idea. For the single-segment mbuf, this is what all other
PMDs assume for their fastest rx/tx path.
With all the vhost and virtio optimizations, it could achieve approximately
a 3~4 times performance improvement (depending on the workload).
Do you mean indirect feature support, or additional optimizations not yet
submitted? I will review your patch this week.
>


^ permalink raw reply	[flat|nested] 92+ messages in thread

* [dpdk-dev] [PATCH v3 0/7] virtio ring layout optimization and simple rx/tx processing
  2015-09-29 14:45 [dpdk-dev] [PATCH 0/8] virtio: virtio ring layout optimization and RX vector processing Huawei Xie
                   ` (10 preceding siblings ...)
  2015-10-18  6:28 ` Huawei Xie
@ 2015-10-20 15:30 ` Huawei Xie
  2015-10-20 15:30   ` [dpdk-dev] [PATCH v3 1/7] virtio: add virtio_rxtx.h header file Huawei Xie
                     ` (6 more replies)
  2015-10-22 12:09 ` [dpdk-dev] [PATCH v4 0/7] virtio ring layout optimization and simple rx/tx processing Huawei Xie
                   ` (2 subsequent siblings)
  14 siblings, 7 replies; 92+ messages in thread
From: Huawei Xie @ 2015-10-20 15:30 UTC (permalink / raw)
  To: dev

Changes in v2:
- Remove the configure macro
- Enable simple RX/TX processing when the user specifies simple txq flags
- Reword some comments and commit messages

Changes in v3:
- Remove unnecessary NULL test for rte_free
- Remove unnecessary assign of local var after free
- Remove return at the end of void function
- Remove always_inline attribute for virtio_xmit_cleanup
- Reword some commit messages
- Add TODO in the commit message of simple tx patch

In a DPDK based switching environment, vhost mostly runs on a dedicated core
while virtio processing in guest VMs runs on different cores.
Take RX for example. With the generic implementation, for each guest buffer:
a) the virtio driver allocates a descriptor from the free descriptor list
b) it modifies the avail ring entry to point to the allocated descriptor
c) after the packet is received, it frees the descriptor

When vhost fetches the avail ring, it needs to fetch the modified L1 cache line
from the virtio core, which is a heavy cost in current CPU implementations.

The idea of this optimization is:
    allocate a fixed descriptor for each entry of the avail ring, so the avail
ring stays the same during the run.
This removes the L1M cache transfer from the virtio core to the vhost core for the avail ring.
(Note we couldn't avoid the cache transfer for descriptors).
Besides, descriptor allocation and free operations are eliminated.
This also makes vector processing possible to further accelerate the processing.

This is the layout for the avail ring(take 256 ring entries for example), with
each entry pointing to the descriptor with the same index.
                    avail
                    idx
                    +
                    |
+----+----+---+-------------+------+
| 0  | 1  | 2 | ... |  254  | 255  |  avail ring
+-+--+-+--+-+-+---------+---+--+---+
  |    |    |       |   |      |
  |    |    |       |   |      |
  v    v    v       |   v      v
+-+--+-+--+-+-+---------+---+--+---+
| 0  | 1  | 2 | ... |  254  | 255  |  desc ring
+----+----+---+-------------+------+
                    |
                    |
+----+----+---+-------------+------+
| 0  | 1  | 2 |     |  254  | 255  |  used ring
+----+----+---+-------------+------+
                    |
                    +

This is the ring layout for TX.
As we need one virtio header for each xmit packet, we have 128 slots available.

                         ++
                         ||
                         ||
+-----+-----+-----+--------------+------+------+------+
|  0  |  1  | ... |  127 || 128  | 129  | ...  | 255  |   avail ring
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
   |     |            |  ||  |      |             |
   v     v            v  ||  v      v             v
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
| 128 | 129 | ... |  255 || 128  | 129  | ...  | 255  |   desc ring for virtio_net_hdr
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
   |     |            |  ||  |      |             |
   v     v            v  ||  v      v             v
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
|  0  |  1  | ... |  127 ||  0   |  1   | ...  | 127  |   desc ring for tx data
+-----+-----+-----+--------------+------+------+------+
                         ||
                         ||
                         ++
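
The two layouts above can be sketched as plain array initialization. This is
a simplified model under stated assumptions, not the patch itself: struct
desc, rx_layout_init() and tx_layout_init() are hypothetical names, only the
flags and next fields are modeled (addresses and lengths are left out), and
the flag values are the VRING_DESC_F_NEXT and VRING_DESC_F_WRITE values from
the virtio specification.

```c
#include <stdint.h>

#define RING_SIZE    256   /* ring size used in the diagrams above */
#define DESC_F_NEXT  1     /* value of VRING_DESC_F_NEXT */
#define DESC_F_WRITE 2     /* value of VRING_DESC_F_WRITE */

/* Simplified stand-in for a vring descriptor (addr/len omitted). */
struct desc {
	uint16_t flags;
	uint16_t next;
};

/* RX: avail entry i always refers to descriptor i, writable by the device. */
static void rx_layout_init(uint16_t *avail, struct desc *d)
{
	unsigned i;

	for (i = 0; i < RING_SIZE; i++) {
		avail[i] = (uint16_t)i;
		d[i].flags = DESC_F_WRITE;
	}
}

/*
 * TX: the upper half of the descriptors holds virtio_net_hdr entries, the
 * lower half holds packet data. Slot i starts at header descriptor
 * i + mid, which chains to data descriptor i.
 */
static void tx_layout_init(uint16_t *avail, struct desc *d)
{
	unsigned mid = RING_SIZE / 2;
	unsigned i;

	for (i = 0; i < mid; i++) {
		avail[i] = (uint16_t)(i + mid);
		d[i + mid].flags = DESC_F_NEXT;
		d[i + mid].next = (uint16_t)i;
		d[i].flags = 0;
	}
	/* The second half of the avail ring refers to the same headers. */
	for (i = mid; i < RING_SIZE; i++)
		avail[i] = (uint16_t)i;
}
```

Because the avail ring entries are written once here and never again, only
the avail index changes at run time, which is the point of the fixed layout.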


A performance boost can be observed only if the virtio backend isn't the bottleneck, or in the VM2VM
case.
Several vhost optimization patches will also be submitted later.

Huawei Xie (7):
  virtio: add virtio_rxtx.h header file
  virtio: add software rx ring, fake_buf into virtqueue
  virtio: rx/tx ring layout optimization
  virtio: fill RX avail ring with blank mbufs
  virtio: virtio vec rx
  virtio: simple tx routine
  virtio: choose simple rx/tx func

 drivers/net/virtio/Makefile             |   2 +-
 drivers/net/virtio/virtio_ethdev.c      |  12 +-
 drivers/net/virtio/virtio_ethdev.h      |   5 +
 drivers/net/virtio/virtio_rxtx.c        |  53 ++++-
 drivers/net/virtio/virtio_rxtx.h        |  39 ++++
 drivers/net/virtio/virtio_rxtx_simple.c | 401 ++++++++++++++++++++++++++++++++
 drivers/net/virtio/virtqueue.h          |   5 +
 7 files changed, 513 insertions(+), 4 deletions(-)
 create mode 100644 drivers/net/virtio/virtio_rxtx.h
 create mode 100644 drivers/net/virtio/virtio_rxtx_simple.c

-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [dpdk-dev] [PATCH v3 1/7] virtio: add virtio_rxtx.h header file
  2015-10-20 15:30 ` [dpdk-dev] [PATCH v3 0/7] virtio ring layout optimization and simple rx/tx processing Huawei Xie
@ 2015-10-20 15:30   ` Huawei Xie
  2015-10-20 15:30   ` [dpdk-dev] [PATCH v3 2/7] virtio: add software rx ring, fake_buf into virtqueue Huawei Xie
                     ` (5 subsequent siblings)
  6 siblings, 0 replies; 92+ messages in thread
From: Huawei Xie @ 2015-10-20 15:30 UTC (permalink / raw)
  To: dev

All rx/tx related declarations will be moved into this header file in the future.
Add RTE_PMD_VIRTIO_RX_MAX_BURST.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
---
 drivers/net/virtio/virtio_ethdev.c |  1 +
 drivers/net/virtio/virtio_rxtx.c   |  1 +
 drivers/net/virtio/virtio_rxtx.h   | 34 ++++++++++++++++++++++++++++++++++
 3 files changed, 36 insertions(+)
 create mode 100644 drivers/net/virtio/virtio_rxtx.h

diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index 465d3cd..79a3640 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -61,6 +61,7 @@
 #include "virtio_pci.h"
 #include "virtio_logs.h"
 #include "virtqueue.h"
+#include "virtio_rxtx.h"
 
 
 static int eth_virtio_dev_init(struct rte_eth_dev *eth_dev);
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index c5b53bb..9324f7f 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -54,6 +54,7 @@
 #include "virtio_logs.h"
 #include "virtio_ethdev.h"
 #include "virtqueue.h"
+#include "virtio_rxtx.h"
 
 #ifdef RTE_LIBRTE_VIRTIO_DEBUG_DUMP
 #define VIRTIO_DUMP_PACKET(m, len) rte_pktmbuf_dump(stdout, m, len)
diff --git a/drivers/net/virtio/virtio_rxtx.h b/drivers/net/virtio/virtio_rxtx.h
new file mode 100644
index 0000000..a10aa69
--- /dev/null
+++ b/drivers/net/virtio/virtio_rxtx.h
@@ -0,0 +1,34 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#define RTE_PMD_VIRTIO_RX_MAX_BURST 64
-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [dpdk-dev] [PATCH v3 2/7] virtio: add software rx ring, fake_buf into virtqueue
  2015-10-20 15:30 ` [dpdk-dev] [PATCH v3 0/7] virtio ring layout optimization and simple rx/tx processing Huawei Xie
  2015-10-20 15:30   ` [dpdk-dev] [PATCH v3 1/7] virtio: add virtio_rxtx.h header file Huawei Xie
@ 2015-10-20 15:30   ` Huawei Xie
  2015-10-20 15:30   ` [dpdk-dev] [PATCH v3 3/7] virtio: rx/tx ring layout optimization Huawei Xie
                     ` (4 subsequent siblings)
  6 siblings, 0 replies; 92+ messages in thread
From: Huawei Xie @ 2015-10-20 15:30 UTC (permalink / raw)
  To: dev

Changes in v3:
- Remove unnecessary NULL test for rte_free
- Remove unnecessary assign of local var vq after free

Add software RX ring in virtqueue.
Add fake_mbuf in virtqueue for wraparound processing.
Use the file-scope flag use_simple_rxtx to indicate whether simple rx/tx is enabled.
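
As a rough illustration of the wraparound padding, here is a small
standalone model. It reflects my reading of the fake_mbuf comment and the
setup loop in this patch, not a confirmed description of the RX path: the
ring is over-allocated by one burst and the extra slots point at a dummy
buffer, so a burst that starts near the end can read past the last real
entry without a bounds check. struct buf, struct sw_ring, sw_ring_pad() and
burst_len_sum() are hypothetical names, and the sizes are shrunk for
readability.

```c
#include <stddef.h>

#define RING_SIZE 8   /* illustrative; the patch uses vq_nentries */
#define MAX_BURST 4   /* illustrative; the patch uses RTE_PMD_VIRTIO_RX_MAX_BURST */

/* Hypothetical stand-in for struct rte_mbuf. */
struct buf { int len; };

struct sw_ring {
	struct buf *slot[RING_SIZE + MAX_BURST]; /* real slots + padding */
	struct buf fake;                         /* dummy entry, like fake_mbuf */
};

/* Point every padding slot past the ring end at the dummy buffer. */
static void sw_ring_pad(struct sw_ring *r)
{
	size_t i;

	r->fake.len = 0;
	for (i = 0; i < MAX_BURST; i++)
		r->slot[RING_SIZE + i] = &r->fake;
}

/*
 * Touch a full burst starting at 'start' (start < RING_SIZE) without any
 * end-of-ring check. Entries past the end resolve to the dummy buffer and
 * contribute a length of zero.
 */
static int burst_len_sum(const struct sw_ring *r, size_t start)
{
	int sum = 0;
	size_t i;

	for (i = 0; i < MAX_BURST; i++)
		sum += r->slot[start + i]->len;
	return sum;
}
```

The trade-off is a little extra memory per queue in exchange for a
branch-free inner loop.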

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
---
 drivers/net/virtio/virtio_ethdev.c | 11 ++++++++++-
 drivers/net/virtio/virtio_rxtx.c   |  7 +++++++
 drivers/net/virtio/virtqueue.h     |  4 ++++
 3 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index 79a3640..82676d3 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -247,8 +247,8 @@ virtio_dev_queue_release(struct virtqueue *vq) {
 		VIRTIO_WRITE_REG_2(hw, VIRTIO_PCI_QUEUE_SEL, vq->queue_id);
 		VIRTIO_WRITE_REG_4(hw, VIRTIO_PCI_QUEUE_PFN, 0);
 
+		rte_free(vq->sw_ring);
 		rte_free(vq);
-		vq = NULL;
 	}
 }
 
@@ -292,6 +292,9 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
 			dev->data->port_id, queue_idx);
 		vq = rte_zmalloc(vq_name, sizeof(struct virtqueue) +
 			vq_size * sizeof(struct vq_desc_extra), RTE_CACHE_LINE_SIZE);
+		vq->sw_ring = rte_zmalloc_socket("rxq->sw_ring",
+			(RTE_PMD_VIRTIO_RX_MAX_BURST + vq_size) *
+			sizeof(vq->sw_ring[0]), RTE_CACHE_LINE_SIZE, socket_id);
 	} else if (queue_type == VTNET_TQ) {
 		snprintf(vq_name, sizeof(vq_name), "port%d_tvq%d",
 			dev->data->port_id, queue_idx);
@@ -308,6 +311,12 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
 		PMD_INIT_LOG(ERR, "%s: Can not allocate virtqueue", __func__);
 		return (-ENOMEM);
 	}
+	if (queue_type == VTNET_RQ && vq->sw_ring == NULL) {
+		PMD_INIT_LOG(ERR, "%s: Can not allocate RX soft ring",
+			__func__);
+		rte_free(vq);
+		return -ENOMEM;
+	}
 
 	vq->hw = hw;
 	vq->port_id = dev->data->port_id;
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 9324f7f..5c00e9d 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -62,6 +62,8 @@
 #define  VIRTIO_DUMP_PACKET(m, len) do { } while (0)
 #endif
 
+static int use_simple_rxtx;
+
 static void
 vq_ring_free_chain(struct virtqueue *vq, uint16_t desc_idx)
 {
@@ -299,6 +301,11 @@ virtio_dev_vring_start(struct virtqueue *vq, int queue_type)
 		/* Allocate blank mbufs for the each rx descriptor */
 		nbufs = 0;
 		error = ENOSPC;
+
+		memset(&vq->fake_mbuf, 0, sizeof(vq->fake_mbuf));
+		for (i = 0; i < RTE_PMD_VIRTIO_RX_MAX_BURST; i++)
+			vq->sw_ring[vq->vq_nentries + i] = &vq->fake_mbuf;
+
 		while (!virtqueue_full(vq)) {
 			m = rte_rxmbuf_alloc(vq->mpool);
 			if (m == NULL)
diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h
index 7789411..6a1ec48 100644
--- a/drivers/net/virtio/virtqueue.h
+++ b/drivers/net/virtio/virtqueue.h
@@ -190,6 +190,10 @@ struct virtqueue {
 	uint16_t vq_avail_idx;
 	phys_addr_t virtio_net_hdr_mem; /**< hdr for each xmit packet */
 
+	struct rte_mbuf **sw_ring; /**< RX software ring. */
+	/* dummy mbuf, for wraparound when processing RX ring. */
+	struct rte_mbuf fake_mbuf;
+
 	/* Statistics */
 	uint64_t	packets;
 	uint64_t	bytes;
-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [dpdk-dev] [PATCH v3 3/7] virtio: rx/tx ring layout optimization
  2015-10-20 15:30 ` [dpdk-dev] [PATCH v3 0/7] virtio ring layout optimization and simple rx/tx processing Huawei Xie
  2015-10-20 15:30   ` [dpdk-dev] [PATCH v3 1/7] virtio: add virtio_rxtx.h header file Huawei Xie
  2015-10-20 15:30   ` [dpdk-dev] [PATCH v3 2/7] virtio: add software rx ring, fake_buf into virtqueue Huawei Xie
@ 2015-10-20 15:30   ` Huawei Xie
  2015-10-20 15:30   ` [dpdk-dev] [PATCH v3 4/7] virtio: fill RX avail ring with blank mbufs Huawei Xie
                     ` (3 subsequent siblings)
  6 siblings, 0 replies; 92+ messages in thread
From: Huawei Xie @ 2015-10-20 15:30 UTC (permalink / raw)
  To: dev

In a DPDK based switching environment, vhost mostly runs on a dedicated core
while virtio processing in guest VMs runs on different cores.
Take RX for example. With the generic implementation, for each guest buffer:
a) the virtio driver allocates a descriptor from the free descriptor list
b) it modifies the avail ring entry to point to the allocated descriptor
c) after the packet is received, it frees the descriptor

When vhost fetches the avail ring, it needs to fetch the modified L1 cache line
from the virtio core, which is a heavy cost in current CPU implementations.

The idea of this optimization is:
    allocate a fixed descriptor for each entry of the avail ring, so the avail
ring stays the same during the run.
This removes the L1M cache transfer from the virtio core to the vhost core for the avail ring.
(Note we couldn't avoid the cache transfer for descriptors).
Besides, descriptor allocation and free operations are eliminated.
This also makes vector processing possible to further accelerate the processing.

This is the layout for the avail ring(take 256 ring entries for example), with
each entry pointing to the descriptor with the same index.
                    avail
                    idx
                    +
                    |
+----+----+---+-------------+------+
| 0  | 1  | 2 | ... |  254  | 255  |  avail ring
+-+--+-+--+-+-+---------+---+--+---+
  |    |    |       |   |      |
  |    |    |       |   |      |
  v    v    v       |   v      v
+-+--+-+--+-+-+---------+---+--+---+
| 0  | 1  | 2 | ... |  254  | 255  |  desc ring
+----+----+---+-------------+------+
                    |
                    |
+----+----+---+-------------+------+
| 0  | 1  | 2 |     |  254  | 255  |  used ring
+----+----+---+-------------+------+
                    |
                    +

This is the ring layout for TX.
As we need one virtio header for each xmit packet, we have 128 slots available.

                         ++
                         ||
                         ||
+-----+-----+-----+--------------+------+------+------+
|  0  |  1  | ... |  127 || 128  | 129  | ...  | 255  |   avail ring
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
   |     |            |  ||  |      |             |
   v     v            v  ||  v      v             v
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
| 128 | 129 | ... |  255 || 128  | 129  | ...  | 255  |   desc ring for virtio_net_hdr
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
   |     |            |  ||  |      |             |
   v     v            v  ||  v      v             v
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
|  0  |  1  | ... |  127 ||  0   |  1   | ...  | 127  |   desc ring for tx data
+-----+-----+-----+--------------+------+------+------+
                         ||
                         ||
                         ++

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
---
 drivers/net/virtio/virtio_rxtx.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 5c00e9d..7c82a6a 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -302,6 +302,12 @@ virtio_dev_vring_start(struct virtqueue *vq, int queue_type)
 		nbufs = 0;
 		error = ENOSPC;
 
+		if (use_simple_rxtx)
+			for (i = 0; i < vq->vq_nentries; i++) {
+				vq->vq_ring.avail->ring[i] = i;
+				vq->vq_ring.desc[i].flags = VRING_DESC_F_WRITE;
+			}
+
 		memset(&vq->fake_mbuf, 0, sizeof(vq->fake_mbuf));
 		for (i = 0; i < RTE_PMD_VIRTIO_RX_MAX_BURST; i++)
 			vq->sw_ring[vq->vq_nentries + i] = &vq->fake_mbuf;
@@ -332,6 +338,24 @@ virtio_dev_vring_start(struct virtqueue *vq, int queue_type)
 		VIRTIO_WRITE_REG_4(vq->hw, VIRTIO_PCI_QUEUE_PFN,
 			vq->mz->phys_addr >> VIRTIO_PCI_QUEUE_ADDR_SHIFT);
 	} else if (queue_type == VTNET_TQ) {
+		if (use_simple_rxtx) {
+			int mid_idx  = vq->vq_nentries >> 1;
+			for (i = 0; i < mid_idx; i++) {
+				vq->vq_ring.avail->ring[i] = i + mid_idx;
+				vq->vq_ring.desc[i + mid_idx].next = i;
+				vq->vq_ring.desc[i + mid_idx].addr =
+					vq->virtio_net_hdr_mem +
+						mid_idx * vq->hw->vtnet_hdr_size;
+				vq->vq_ring.desc[i + mid_idx].len =
+					vq->hw->vtnet_hdr_size;
+				vq->vq_ring.desc[i + mid_idx].flags =
+					VRING_DESC_F_NEXT;
+				vq->vq_ring.desc[i].flags = 0;
+			}
+			for (i = mid_idx; i < vq->vq_nentries; i++)
+				vq->vq_ring.avail->ring[i] = i;
+		}
+
 		VIRTIO_WRITE_REG_2(vq->hw, VIRTIO_PCI_QUEUE_SEL,
 			vq->vq_queue_index);
 		VIRTIO_WRITE_REG_4(vq->hw, VIRTIO_PCI_QUEUE_PFN,
-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [dpdk-dev] [PATCH v3 4/7] virtio: fill RX avail ring with blank mbufs
  2015-10-20 15:30 ` [dpdk-dev] [PATCH v3 0/7] virtio ring layout optimization and simple rx/tx processing Huawei Xie
                     ` (2 preceding siblings ...)
  2015-10-20 15:30   ` [dpdk-dev] [PATCH v3 3/7] virtio: rx/tx ring layout optimization Huawei Xie
@ 2015-10-20 15:30   ` Huawei Xie
  2015-10-20 15:30   ` [dpdk-dev] [PATCH v3 5/7] virtio: virtio vec rx Huawei Xie
                     ` (2 subsequent siblings)
  6 siblings, 0 replies; 92+ messages in thread
From: Huawei Xie @ 2015-10-20 15:30 UTC (permalink / raw)
  To: dev

Fill the avail ring with blank mbufs in virtio_dev_vring_start.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
---
 drivers/net/virtio/Makefile             |  2 +-
 drivers/net/virtio/virtio_rxtx.c        |  6 ++-
 drivers/net/virtio/virtio_rxtx.h        |  3 ++
 drivers/net/virtio/virtio_rxtx_simple.c | 84 +++++++++++++++++++++++++++++++++
 4 files changed, 92 insertions(+), 3 deletions(-)
 create mode 100644 drivers/net/virtio/virtio_rxtx_simple.c

diff --git a/drivers/net/virtio/Makefile b/drivers/net/virtio/Makefile
index 930b60f..43835ba 100644
--- a/drivers/net/virtio/Makefile
+++ b/drivers/net/virtio/Makefile
@@ -50,7 +50,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtqueue.c
 SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_pci.c
 SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx.c
 SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_ethdev.c
-
+SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx_simple.c
 
 # this lib depends upon:
 DEPDIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += lib/librte_eal lib/librte_ether
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 7c82a6a..5162ce6 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -320,8 +320,10 @@ virtio_dev_vring_start(struct virtqueue *vq, int queue_type)
 			/******************************************
 			*         Enqueue allocated buffers        *
 			*******************************************/
-			error = virtqueue_enqueue_recv_refill(vq, m);
-
+			if (use_simple_rxtx)
+				error = virtqueue_enqueue_recv_refill_simple(vq, m);
+			else
+				error = virtqueue_enqueue_recv_refill(vq, m);
 			if (error) {
 				rte_pktmbuf_free(m);
 				break;
diff --git a/drivers/net/virtio/virtio_rxtx.h b/drivers/net/virtio/virtio_rxtx.h
index a10aa69..7d2d8fe 100644
--- a/drivers/net/virtio/virtio_rxtx.h
+++ b/drivers/net/virtio/virtio_rxtx.h
@@ -32,3 +32,6 @@
  */
 
 #define RTE_PMD_VIRTIO_RX_MAX_BURST 64
+
+int virtqueue_enqueue_recv_refill_simple(struct virtqueue *vq,
+	struct rte_mbuf *m);
diff --git a/drivers/net/virtio/virtio_rxtx_simple.c b/drivers/net/virtio/virtio_rxtx_simple.c
new file mode 100644
index 0000000..cac5b9f
--- /dev/null
+++ b/drivers/net/virtio/virtio_rxtx_simple.c
@@ -0,0 +1,84 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <errno.h>
+
+#include <tmmintrin.h>
+
+#include <rte_cycles.h>
+#include <rte_memory.h>
+#include <rte_memzone.h>
+#include <rte_branch_prediction.h>
+#include <rte_mempool.h>
+#include <rte_malloc.h>
+#include <rte_mbuf.h>
+#include <rte_ether.h>
+#include <rte_ethdev.h>
+#include <rte_prefetch.h>
+#include <rte_string_fns.h>
+#include <rte_errno.h>
+#include <rte_byteorder.h>
+
+#include "virtio_logs.h"
+#include "virtio_ethdev.h"
+#include "virtqueue.h"
+#include "virtio_rxtx.h"
+
+int __attribute__((cold))
+virtqueue_enqueue_recv_refill_simple(struct virtqueue *vq,
+	struct rte_mbuf *cookie)
+{
+	struct vq_desc_extra *dxp;
+	struct vring_desc *start_dp;
+	uint16_t desc_idx;
+
+	desc_idx = vq->vq_avail_idx & (vq->vq_nentries - 1);
+	dxp = &vq->vq_descx[desc_idx];
+	dxp->cookie = (void *)cookie;
+	vq->sw_ring[desc_idx] = cookie;
+
+	start_dp = vq->vq_ring.desc;
+	start_dp[desc_idx].addr = (uint64_t)((uintptr_t)cookie->buf_physaddr +
+		RTE_PKTMBUF_HEADROOM - sizeof(struct virtio_net_hdr));
+	start_dp[desc_idx].len = cookie->buf_len -
+		RTE_PKTMBUF_HEADROOM + sizeof(struct virtio_net_hdr);
+
+	vq->vq_free_cnt--;
+	vq->vq_avail_idx++;
+
+	return 0;
+}
-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [dpdk-dev] [PATCH v3 5/7] virtio: virtio vec rx
  2015-10-20 15:30 ` [dpdk-dev] [PATCH v3 0/7] virtio ring layout optimization and simple rx/tx processing Huawei Xie
                     ` (3 preceding siblings ...)
  2015-10-20 15:30   ` [dpdk-dev] [PATCH v3 4/7] virtio: fill RX avail ring with blank mbufs Huawei Xie
@ 2015-10-20 15:30   ` Huawei Xie
  2015-10-22  4:04     ` Wang, Zhihong
  2015-10-20 15:30   ` [dpdk-dev] [PATCH v3 6/7] virtio: simple tx routine Huawei Xie
  2015-10-20 15:30   ` [dpdk-dev] [PATCH v3 7/7] virtio: pick simple rx/tx func Huawei Xie
  6 siblings, 1 reply; 92+ messages in thread
From: Huawei Xie @ 2015-10-20 15:30 UTC (permalink / raw)
  To: dev

With a fixed avail ring, we don't need to get the desc idx from the avail ring;
the virtio driver only has to deal with the desc ring.
This patch uses vector instructions to accelerate desc ring processing.
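
As a small illustration of why the avail ring lookup becomes unnecessary:
once entry i always refers to descriptor i, the descriptor slot follows from
the running index alone, the way virtqueue_enqueue_recv_refill_simple in
patch 4/7 computes vq->vq_avail_idx & (vq->vq_nentries - 1). The helper
below is a hypothetical sketch of that masking and assumes the ring size is
a power of two.

```c
#include <stdint.h>

/*
 * Map a free-running 16-bit ring counter to a descriptor slot.
 * nentries must be a power of two so the mask acts as a modulo.
 */
static inline uint16_t ring_slot(uint16_t counter, uint16_t nentries)
{
	return (uint16_t)(counter & (nentries - 1));
}
```

The counter is allowed to overflow; since 65536 is a multiple of any
power-of-two ring size, the slot sequence stays continuous across the wrap.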

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
---
 drivers/net/virtio/virtio_ethdev.h      |   2 +
 drivers/net/virtio/virtio_rxtx.c        |   3 +
 drivers/net/virtio/virtio_rxtx.h        |   2 +
 drivers/net/virtio/virtio_rxtx_simple.c | 224 ++++++++++++++++++++++++++++++++
 drivers/net/virtio/virtqueue.h          |   1 +
 5 files changed, 232 insertions(+)

diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h
index 9026d42..d7797ab 100644
--- a/drivers/net/virtio/virtio_ethdev.h
+++ b/drivers/net/virtio/virtio_ethdev.h
@@ -108,6 +108,8 @@ uint16_t virtio_recv_mergeable_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 uint16_t virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		uint16_t nb_pkts);
 
+uint16_t virtio_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
+		uint16_t nb_pkts);
 
 /*
  * The VIRTIO_NET_F_GUEST_TSO[46] features permit the host to send us
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 5162ce6..947fc46 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -432,6 +432,9 @@ virtio_dev_rx_queue_setup(struct rte_eth_dev *dev,
 	vq->mpool = mp;
 
 	dev->data->rx_queues[queue_idx] = vq;
+
+	virtio_rxq_vec_setup(vq);
+
 	return 0;
 }
 
diff --git a/drivers/net/virtio/virtio_rxtx.h b/drivers/net/virtio/virtio_rxtx.h
index 7d2d8fe..831e492 100644
--- a/drivers/net/virtio/virtio_rxtx.h
+++ b/drivers/net/virtio/virtio_rxtx.h
@@ -33,5 +33,7 @@
 
 #define RTE_PMD_VIRTIO_RX_MAX_BURST 64
 
+int virtio_rxq_vec_setup(struct virtqueue *rxq);
+
 int virtqueue_enqueue_recv_refill_simple(struct virtqueue *vq,
 	struct rte_mbuf *m);
diff --git a/drivers/net/virtio/virtio_rxtx_simple.c b/drivers/net/virtio/virtio_rxtx_simple.c
index cac5b9f..ef17562 100644
--- a/drivers/net/virtio/virtio_rxtx_simple.c
+++ b/drivers/net/virtio/virtio_rxtx_simple.c
@@ -58,6 +58,10 @@
 #include "virtqueue.h"
 #include "virtio_rxtx.h"
 
+#define RTE_VIRTIO_VPMD_RX_BURST 32
+#define RTE_VIRTIO_DESC_PER_LOOP 8
+#define RTE_VIRTIO_VPMD_RX_REARM_THRESH RTE_VIRTIO_VPMD_RX_BURST
+
 int __attribute__((cold))
 virtqueue_enqueue_recv_refill_simple(struct virtqueue *vq,
 	struct rte_mbuf *cookie)
@@ -82,3 +86,223 @@ virtqueue_enqueue_recv_refill_simple(struct virtqueue *vq,
 
 	return 0;
 }
+
+static inline void
+virtio_rxq_rearm_vec(struct virtqueue *rxvq)
+{
+	int i;
+	uint16_t desc_idx;
+	struct rte_mbuf **sw_ring;
+	struct vring_desc *start_dp;
+	int ret;
+
+	desc_idx = rxvq->vq_avail_idx & (rxvq->vq_nentries - 1);
+	sw_ring = &rxvq->sw_ring[desc_idx];
+	start_dp = &rxvq->vq_ring.desc[desc_idx];
+
+	ret = rte_mempool_get_bulk(rxvq->mpool, (void **)sw_ring,
+		RTE_VIRTIO_VPMD_RX_REARM_THRESH);
+	if (unlikely(ret)) {
+		rte_eth_devices[rxvq->port_id].data->rx_mbuf_alloc_failed +=
+			RTE_VIRTIO_VPMD_RX_REARM_THRESH;
+		return;
+	}
+
+	for (i = 0; i < RTE_VIRTIO_VPMD_RX_REARM_THRESH; i++) {
+		uintptr_t p;
+
+		p = (uintptr_t)&sw_ring[i]->rearm_data;
+		*(uint64_t *)p = rxvq->mbuf_initializer;
+
+		start_dp[i].addr =
+			(uint64_t)((uintptr_t)sw_ring[i]->buf_physaddr +
+			RTE_PKTMBUF_HEADROOM - sizeof(struct virtio_net_hdr));
+		start_dp[i].len = sw_ring[i]->buf_len -
+			RTE_PKTMBUF_HEADROOM + sizeof(struct virtio_net_hdr);
+	}
+
+	rxvq->vq_avail_idx += RTE_VIRTIO_VPMD_RX_REARM_THRESH;
+	rxvq->vq_free_cnt -= RTE_VIRTIO_VPMD_RX_REARM_THRESH;
+	vq_update_avail_idx(rxvq);
+}
+
+/* virtio vPMD receive routine, only accept(nb_pkts >= RTE_VIRTIO_DESC_PER_LOOP)
+ *
+ * This routine is for non-mergeable RX, one desc for each guest buffer.
+ * This routine is based on the RX ring layout optimization. Each entry in the
+ * avail ring points to the desc with the same index in the desc ring and this
+ * will never be changed in the driver.
+ *
+ * - nb_pkts < RTE_VIRTIO_DESC_PER_LOOP, just return no packet
+ */
+uint16_t
+virtio_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
+	uint16_t nb_pkts)
+{
+	struct virtqueue *rxvq = rx_queue;
+	uint16_t nb_used;
+	uint16_t desc_idx;
+	struct vring_used_elem *rused;
+	struct rte_mbuf **sw_ring;
+	struct rte_mbuf **sw_ring_end;
+	uint16_t nb_pkts_received;
+	__m128i shuf_msk1, shuf_msk2, len_adjust;
+
+	shuf_msk1 = _mm_set_epi8(
+		0xFF, 0xFF, 0xFF, 0xFF,
+		0xFF, 0xFF,		/* vlan tci */
+		5, 4,			/* dat len */
+		0xFF, 0xFF, 5, 4,	/* pkt len */
+		0xFF, 0xFF, 0xFF, 0xFF	/* packet type */
+
+	);
+
+	shuf_msk2 = _mm_set_epi8(
+		0xFF, 0xFF, 0xFF, 0xFF,
+		0xFF, 0xFF,		/* vlan tci */
+		13, 12,			/* dat len */
+		0xFF, 0xFF, 13, 12,	/* pkt len */
+		0xFF, 0xFF, 0xFF, 0xFF	/* packet type */
+	);
+
+	/* Subtract the header length.
+	*  In which case do we need the header length in used->len ?
+	*/
+	len_adjust = _mm_set_epi16(
+		0, 0,
+		0,
+		(uint16_t) -sizeof(struct virtio_net_hdr),
+		0, (uint16_t) -sizeof(struct virtio_net_hdr),
+		0, 0);
+
+	if (unlikely(nb_pkts < RTE_VIRTIO_DESC_PER_LOOP))
+		return 0;
+
+	nb_used = *(volatile uint16_t *)&rxvq->vq_ring.used->idx -
+		rxvq->vq_used_cons_idx;
+
+	rte_compiler_barrier();
+
+	if (unlikely(nb_used == 0))
+		return 0;
+
+	nb_pkts = RTE_ALIGN_FLOOR(nb_pkts, RTE_VIRTIO_DESC_PER_LOOP);
+	nb_used = RTE_MIN(nb_used, nb_pkts);
+
+	desc_idx = (uint16_t)(rxvq->vq_used_cons_idx & (rxvq->vq_nentries - 1));
+	rused = &rxvq->vq_ring.used->ring[desc_idx];
+	sw_ring  = &rxvq->sw_ring[desc_idx];
+	sw_ring_end = &rxvq->sw_ring[rxvq->vq_nentries];
+
+	_mm_prefetch((const void *)rused, _MM_HINT_T0);
+
+	if (rxvq->vq_free_cnt >= RTE_VIRTIO_VPMD_RX_REARM_THRESH) {
+		virtio_rxq_rearm_vec(rxvq);
+		if (unlikely(virtqueue_kick_prepare(rxvq)))
+			virtqueue_notify(rxvq);
+	}
+
+	for (nb_pkts_received = 0;
+		nb_pkts_received < nb_used;) {
+		__m128i desc[RTE_VIRTIO_DESC_PER_LOOP / 2];
+		__m128i mbp[RTE_VIRTIO_DESC_PER_LOOP / 2];
+		__m128i pkt_mb[RTE_VIRTIO_DESC_PER_LOOP];
+
+		mbp[0] = _mm_loadu_si128((__m128i *)(sw_ring + 0));
+		desc[0] = _mm_loadu_si128((__m128i *)(rused + 0));
+		_mm_storeu_si128((__m128i *)&rx_pkts[0], mbp[0]);
+
+		mbp[1] = _mm_loadu_si128((__m128i *)(sw_ring + 2));
+		desc[1] = _mm_loadu_si128((__m128i *)(rused + 2));
+		_mm_storeu_si128((__m128i *)&rx_pkts[2], mbp[1]);
+
+		mbp[2] = _mm_loadu_si128((__m128i *)(sw_ring + 4));
+		desc[2] = _mm_loadu_si128((__m128i *)(rused + 4));
+		_mm_storeu_si128((__m128i *)&rx_pkts[4], mbp[2]);
+
+		mbp[3] = _mm_loadu_si128((__m128i *)(sw_ring + 6));
+		desc[3] = _mm_loadu_si128((__m128i *)(rused + 6));
+		_mm_storeu_si128((__m128i *)&rx_pkts[6], mbp[3]);
+
+		pkt_mb[1] = _mm_shuffle_epi8(desc[0], shuf_msk2);
+		pkt_mb[0] = _mm_shuffle_epi8(desc[0], shuf_msk1);
+		pkt_mb[1] = _mm_add_epi16(pkt_mb[1], len_adjust);
+		pkt_mb[0] = _mm_add_epi16(pkt_mb[0], len_adjust);
+		_mm_storeu_si128((void *)&rx_pkts[1]->rx_descriptor_fields1,
+			pkt_mb[1]);
+		_mm_storeu_si128((void *)&rx_pkts[0]->rx_descriptor_fields1,
+			pkt_mb[0]);
+
+		pkt_mb[3] = _mm_shuffle_epi8(desc[1], shuf_msk2);
+		pkt_mb[2] = _mm_shuffle_epi8(desc[1], shuf_msk1);
+		pkt_mb[3] = _mm_add_epi16(pkt_mb[3], len_adjust);
+		pkt_mb[2] = _mm_add_epi16(pkt_mb[2], len_adjust);
+		_mm_storeu_si128((void *)&rx_pkts[3]->rx_descriptor_fields1,
+			pkt_mb[3]);
+		_mm_storeu_si128((void *)&rx_pkts[2]->rx_descriptor_fields1,
+			pkt_mb[2]);
+
+		pkt_mb[5] = _mm_shuffle_epi8(desc[2], shuf_msk2);
+		pkt_mb[4] = _mm_shuffle_epi8(desc[2], shuf_msk1);
+		pkt_mb[5] = _mm_add_epi16(pkt_mb[5], len_adjust);
+		pkt_mb[4] = _mm_add_epi16(pkt_mb[4], len_adjust);
+		_mm_storeu_si128((void *)&rx_pkts[5]->rx_descriptor_fields1,
+			pkt_mb[5]);
+		_mm_storeu_si128((void *)&rx_pkts[4]->rx_descriptor_fields1,
+			pkt_mb[4]);
+
+		pkt_mb[7] = _mm_shuffle_epi8(desc[3], shuf_msk2);
+		pkt_mb[6] = _mm_shuffle_epi8(desc[3], shuf_msk1);
+		pkt_mb[7] = _mm_add_epi16(pkt_mb[7], len_adjust);
+		pkt_mb[6] = _mm_add_epi16(pkt_mb[6], len_adjust);
+		_mm_storeu_si128((void *)&rx_pkts[7]->rx_descriptor_fields1,
+			pkt_mb[7]);
+		_mm_storeu_si128((void *)&rx_pkts[6]->rx_descriptor_fields1,
+			pkt_mb[6]);
+
+		if (unlikely(nb_used <= RTE_VIRTIO_DESC_PER_LOOP)) {
+			if (sw_ring + nb_used <= sw_ring_end)
+				nb_pkts_received += nb_used;
+			else
+				nb_pkts_received += sw_ring_end - sw_ring;
+			break;
+		} else {
+			if (unlikely(sw_ring + RTE_VIRTIO_DESC_PER_LOOP >=
+				sw_ring_end)) {
+				nb_pkts_received += sw_ring_end - sw_ring;
+				break;
+			} else {
+				nb_pkts_received += RTE_VIRTIO_DESC_PER_LOOP;
+
+				rx_pkts += RTE_VIRTIO_DESC_PER_LOOP;
+				sw_ring += RTE_VIRTIO_DESC_PER_LOOP;
+				rused   += RTE_VIRTIO_DESC_PER_LOOP;
+				nb_used -= RTE_VIRTIO_DESC_PER_LOOP;
+			}
+		}
+	}
+
+	rxvq->vq_used_cons_idx += nb_pkts_received;
+	rxvq->vq_free_cnt += nb_pkts_received;
+	rxvq->packets += nb_pkts_received;
+	return nb_pkts_received;
+}
+
+int __attribute__((cold))
+virtio_rxq_vec_setup(struct virtqueue *rxq)
+{
+	uintptr_t p;
+	struct rte_mbuf mb_def = { .buf_addr = 0 }; /* zeroed mbuf */
+
+	mb_def.nb_segs = 1;
+	mb_def.data_off = RTE_PKTMBUF_HEADROOM;
+	mb_def.port = rxq->port_id;
+	rte_mbuf_refcnt_set(&mb_def, 1);
+
+	/* prevent compiler reordering: rearm_data covers previous fields */
+	rte_compiler_barrier();
+	p = (uintptr_t)&mb_def.rearm_data;
+	rxq->mbuf_initializer = *(uint64_t *)p;
+
+	return 0;
+}
diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h
index 6a1ec48..98a77d5 100644
--- a/drivers/net/virtio/virtqueue.h
+++ b/drivers/net/virtio/virtqueue.h
@@ -188,6 +188,7 @@ struct virtqueue {
 	 */
 	uint16_t vq_used_cons_idx;
 	uint16_t vq_avail_idx;
+	uint64_t mbuf_initializer; /**< value to init mbufs. */
 	phys_addr_t virtio_net_hdr_mem; /**< hdr for each xmit packet */
 
 	struct rte_mbuf **sw_ring; /**< RX software ring. */
-- 
1.8.1.4


* [dpdk-dev] [PATCH v3 6/7] virtio: simple tx routine
  2015-10-20 15:30 ` [dpdk-dev] [PATCH v3 0/7] virtio ring layout optimization and simple rx/tx processing Huawei Xie
                     ` (4 preceding siblings ...)
  2015-10-20 15:30   ` [dpdk-dev] [PATCH v3 5/7] virtio: virtio vec rx Huawei Xie
@ 2015-10-20 15:30   ` Huawei Xie
  2015-10-20 18:58     ` Stephen Hemminger
  2015-10-20 15:30   ` [dpdk-dev] [PATCH v3 7/7] virtio: pick simple rx/tx func Huawei Xie
  6 siblings, 1 reply; 92+ messages in thread
From: Huawei Xie @ 2015-10-20 15:30 UTC (permalink / raw)
  To: dev

Changes in v3:
- Remove return at the end of void function
- Remove always_inline attribute for virtio_xmit_cleanup

Bulk-free mbufs when cleaning the used ring.
The shift operation on idx could be saved if vq_free_cnt counted
free slots rather than free descriptors.

TODO: rearrange the vq data structure and pack the stats variables together so that
one vector instruction can update all of them.
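
To make the index arithmetic in the TX path concrete, here is a small sketch of how a burst is split at the end of the ring. It is an illustration rather than the patch's code: tx_split_burst, free_slots and struct tx_split are hypothetical names, and the assumption (suggested by the shifts in the patch) is that each packet accounts for two descriptors, so the number of packet slots is half the ring size.

```c
#include <assert.h>
#include <stdint.h>

/* Number of packets placed before and after the ring wraps. */
struct tx_split {
	uint16_t first;   /* packets written starting at the current slot */
	uint16_t second;  /* packets written starting again at slot 0 */
};

/* Assumed relation: two descriptors per packet, hence the shift. */
static uint16_t
free_slots(uint16_t free_descs)
{
	return (uint16_t)(free_descs >> 1);
}

static struct tx_split
tx_split_burst(uint16_t avail_idx, uint16_t nb_slots, uint16_t nb_pkts)
{
	struct tx_split s;
	/* nb_slots must be a power of two for the mask to work. */
	uint16_t idx = (uint16_t)(avail_idx & (nb_slots - 1));
	uint16_t nb_tail = (uint16_t)(nb_slots - idx);

	assert((nb_slots & (nb_slots - 1)) == 0);

	if (nb_pkts >= nb_tail) {
		/* Fill up to the end of the ring, then continue from slot 0. */
		s.first = nb_tail;
		s.second = (uint16_t)(nb_pkts - nb_tail);
	} else {
		s.first = nb_pkts;
		s.second = 0;
	}
	return s;
}
```

If vq_free_cnt already counted slots, the free_slots() conversion (and the matching left shift when the counter is updated) would disappear, which is the saving the commit message refers to.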

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
---
 drivers/net/virtio/virtio_ethdev.h      |  3 ++
 drivers/net/virtio/virtio_rxtx_simple.c | 93 +++++++++++++++++++++++++++++++++
 2 files changed, 96 insertions(+)

diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h
index d7797ab..ae2d47d 100644
--- a/drivers/net/virtio/virtio_ethdev.h
+++ b/drivers/net/virtio/virtio_ethdev.h
@@ -111,6 +111,9 @@ uint16_t virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 uint16_t virtio_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
 		uint16_t nb_pkts);
 
+uint16_t virtio_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
+
 /*
  * The VIRTIO_NET_F_GUEST_TSO[46] features permit the host to send us
  * frames larger than 1514 bytes. We do not yet support software LRO
diff --git a/drivers/net/virtio/virtio_rxtx_simple.c b/drivers/net/virtio/virtio_rxtx_simple.c
index ef17562..a53d462 100644
--- a/drivers/net/virtio/virtio_rxtx_simple.c
+++ b/drivers/net/virtio/virtio_rxtx_simple.c
@@ -288,6 +288,99 @@ virtio_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
 	return nb_pkts_received;
 }
 
+#define VIRTIO_TX_FREE_THRESH 32
+#define VIRTIO_TX_MAX_FREE_BUF_SZ 32
+#define VIRTIO_TX_FREE_NR 32
+/* TODO: vq->tx_free_cnt could mean num of free slots so we could avoid shift */
+static inline void
+virtio_xmit_cleanup(struct virtqueue *vq)
+{
+	uint16_t i, desc_idx;
+	int nb_free = 0;
+	struct rte_mbuf *m, *free[VIRTIO_TX_MAX_FREE_BUF_SZ];
+
+	desc_idx = (uint16_t)(vq->vq_used_cons_idx &
+		((vq->vq_nentries >> 1) - 1));
+	free[0] = (struct rte_mbuf *)vq->vq_descx[desc_idx++].cookie;
+	nb_free = 1;
+
+	for (i = 1; i < VIRTIO_TX_FREE_NR; i++) {
+		m = (struct rte_mbuf *)vq->vq_descx[desc_idx++].cookie;
+		if (likely(m->pool == free[0]->pool))
+			free[nb_free++] = m;
+		else {
+			rte_mempool_put_bulk(free[0]->pool, (void **)free,
+				nb_free);
+			free[0] = m;
+			nb_free = 1;
+		}
+	}
+
+	rte_mempool_put_bulk(free[0]->pool, (void **)free, nb_free);
+	vq->vq_used_cons_idx += VIRTIO_TX_FREE_NR;
+	vq->vq_free_cnt += (VIRTIO_TX_FREE_NR << 1);
+}
+
+uint16_t
+virtio_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
+	uint16_t nb_pkts)
+{
+	struct virtqueue *txvq = tx_queue;
+	uint16_t nb_used;
+	uint16_t desc_idx;
+	struct vring_desc *start_dp;
+	uint16_t nb_tail, nb_commit;
+	int i;
+	uint16_t desc_idx_max = (txvq->vq_nentries >> 1) - 1;
+
+	nb_used = VIRTQUEUE_NUSED(txvq);
+	rte_compiler_barrier();
+
+	nb_commit = nb_pkts = RTE_MIN((txvq->vq_free_cnt >> 1), nb_pkts);
+	desc_idx = (uint16_t) (txvq->vq_avail_idx & desc_idx_max);
+	start_dp = txvq->vq_ring.desc;
+	nb_tail = (uint16_t) (desc_idx_max + 1 - desc_idx);
+
+	if (nb_used >= VIRTIO_TX_FREE_THRESH)
+		virtio_xmit_cleanup(tx_queue);
+
+	if (nb_commit >= nb_tail) {
+		for (i = 0; i < nb_tail; i++)
+			txvq->vq_descx[desc_idx + i].cookie = tx_pkts[i];
+		for (i = 0; i < nb_tail; i++) {
+			start_dp[desc_idx].addr =
+				RTE_MBUF_DATA_DMA_ADDR(*tx_pkts);
+			start_dp[desc_idx].len = (*tx_pkts)->pkt_len;
+			tx_pkts++;
+			desc_idx++;
+		}
+		nb_commit -= nb_tail;
+		desc_idx = 0;
+	}
+	for (i = 0; i < nb_commit; i++)
+		txvq->vq_descx[desc_idx + i].cookie = tx_pkts[i];
+	for (i = 0; i < nb_commit; i++) {
+		start_dp[desc_idx].addr = RTE_MBUF_DATA_DMA_ADDR(*tx_pkts);
+		start_dp[desc_idx].len = (*tx_pkts)->pkt_len;
+		tx_pkts++;
+		desc_idx++;
+	}
+
+	rte_compiler_barrier();
+
+	txvq->vq_free_cnt -= (uint16_t)(nb_pkts << 1);
+	txvq->vq_avail_idx += nb_pkts;
+	txvq->vq_ring.avail->idx = txvq->vq_avail_idx;
+	txvq->packets += nb_pkts;
+
+	if (likely(nb_pkts)) {
+		if (unlikely(virtqueue_kick_prepare(txvq)))
+			virtqueue_notify(txvq);
+	}
+
+	return nb_pkts;
+}
+
 int __attribute__((cold))
 virtio_rxq_vec_setup(struct virtqueue *rxq)
 {
-- 
1.8.1.4


* [dpdk-dev] [PATCH v3 7/7] virtio: pick simple rx/tx func
  2015-10-20 15:30 ` [dpdk-dev] [PATCH v3 0/7] virtio ring layout optimization and simple rx/tx processing Huawei Xie
                     ` (5 preceding siblings ...)
  2015-10-20 15:30   ` [dpdk-dev] [PATCH v3 6/7] virtio: simple tx routine Huawei Xie
@ 2015-10-20 15:30   ` Huawei Xie
  2015-10-22  2:50     ` Tan, Jianfeng
  6 siblings, 1 reply; 92+ messages in thread
From: Huawei Xie @ 2015-10-20 15:30 UTC (permalink / raw)
  To: dev

The simple rx/tx functions are enabled when the user specifies single-segment and no-offload support.
Mergeable RX buffers must be disabled to use the simple rx/tx path.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
---
 drivers/net/virtio/virtio_rxtx.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 947fc46..71f8cd4 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -62,6 +62,10 @@
 #define  VIRTIO_DUMP_PACKET(m, len) do { } while (0)
 #endif
 
+
+#define VIRTIO_SIMPLE_FLAGS ((uint32_t)ETH_TXQ_FLAGS_NOMULTSEGS | \
+	ETH_TXQ_FLAGS_NOOFFLOADS)
+
 static int use_simple_rxtx;
 
 static void
@@ -471,6 +475,14 @@ virtio_dev_tx_queue_setup(struct rte_eth_dev *dev,
 		return -EINVAL;
 	}
 
+	/* Use simple rx/tx func if single segment and no offloads */
+	if ((tx_conf->txq_flags & VIRTIO_SIMPLE_FLAGS) == VIRTIO_SIMPLE_FLAGS) {
+		PMD_INIT_LOG(INFO, "Using simple rx/tx path");
+		dev->tx_pkt_burst = virtio_xmit_pkts_simple;
+		dev->rx_pkt_burst = virtio_recv_pkts_vec;
+		use_simple_rxtx = 1;
+	}
+
 	ret = virtio_dev_queue_setup(dev, VTNET_TQ, queue_idx, vtpci_queue_idx,
 			nb_desc, socket_id, &vq);
 	if (ret < 0) {
-- 
1.8.1.4


* Re: [dpdk-dev] [PATCH v2 2/7] virtio: add software rx ring, fake_buf into virtqueue
  2015-10-19  5:06       ` Xie, Huawei
@ 2015-10-20 15:32         ` Xie, Huawei
  0 siblings, 0 replies; 92+ messages in thread
From: Xie, Huawei @ 2015-10-20 15:32 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev

On 10/19/2015 1:07 PM, Xie, Huawei wrote:
> On 10/19/2015 12:20 PM, Stephen Hemminger wrote:
>> On Sun, 18 Oct 2015 14:28:59 +0800
>> Huawei Xie <huawei.xie@intel.com> wrote:
>>
>>> +               if (vq->sw_ring)
>>> +                       rte_free(vq->sw_ring);
>>> +
>>
>> Do not need to test for NULL before calling rte_free.
>> Better to just rely on the fact that rte_free(NULL) is documented
>> to be ok (no operation).
>
> ok, btw, in previous commit, just in the same function,
> void virtio_dev_queue_release(vq)
> [...]
>                            rte_free(vq);
>                            vq = NULL;
> I think there is no need to set NULL to vq. Will submit a patch to fix it if you agree.
In the v3 patch, I also removed the "vq = NULL".



* Re: [dpdk-dev] [PATCH v3 6/7] virtio: simple tx routine
  2015-10-20 15:30   ` [dpdk-dev] [PATCH v3 6/7] virtio: simple tx routine Huawei Xie
@ 2015-10-20 18:58     ` Stephen Hemminger
  2015-10-22  5:43       ` Xie, Huawei
  0 siblings, 1 reply; 92+ messages in thread
From: Stephen Hemminger @ 2015-10-20 18:58 UTC (permalink / raw)
  To: Huawei Xie; +Cc: dev

On Tue, 20 Oct 2015 23:30:06 +0800
Huawei Xie <huawei.xie@intel.com> wrote:

> +	desc_idx = (uint16_t)(vq->vq_used_cons_idx &
> +		((vq->vq_nentries >> 1) - 1));
> +	free[0] = (struct rte_mbuf *)vq->vq_descx[desc_idx++].cookie;
> +	nb_free = 1;
> +
> +	for (i = 1; i < VIRTIO_TX_FREE_NR; i++) {
> +		m = (struct rte_mbuf *)vq->vq_descx[desc_idx++].cookie;
> +		if (likely(m->pool == free[0]->pool))
> +			free[nb_free++] = m;
> +		else {
> +			rte_mempool_put_bulk(free[0]->pool, (void **)free,
> +				nb_free);
> +			free[0] = m;
> +			nb_free = 1;
> +		}
> +	}
> +
> +	rte_mempool_put_bulk(free[0]->pool, (void **)free, nb_free);

Might be better to introduce a function in rte_mbuf.h which
does this so other drivers can use same code?

rte_pktmbuf_free_bulk(pkts[], n)
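
A helper along the lines suggested here could look roughly like the sketch below. The types and pool_put_bulk are self-contained stand-ins, not DPDK's rte_mbuf/rte_mempool API, and the function name is hypothetical; a real version would also have to deal with reference counts and multi-segment mbufs, which this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>

#define FREE_BATCH 32u  /* hypothetical batch size */

/* Stand-in pool that only records how it was used. */
struct pool {
	unsigned int put_calls;
	unsigned int objs_returned;
};

/* Stand-in mbuf: only the owning pool matters here. */
struct mbuf {
	struct pool *pool;
};

static void
pool_put_bulk(struct pool *p, struct mbuf **objs, unsigned int n)
{
	(void)objs;
	p->put_calls++;
	p->objs_returned += n;
}

/* Return mbufs to their pools, one bulk call per run of same-pool mbufs. */
static void
pktmbuf_free_bulk_sketch(struct mbuf **pkts, unsigned int n)
{
	struct mbuf *batch[FREE_BATCH];
	unsigned int nb = 0;
	unsigned int i;

	for (i = 0; i < n; i++) {
		/* Flush when the pool changes or the batch is full. */
		if (nb > 0 &&
		    (pkts[i]->pool != batch[0]->pool || nb == FREE_BATCH)) {
			pool_put_bulk(batch[0]->pool, batch, nb);
			nb = 0;
		}
		batch[nb++] = pkts[i];
	}
	if (nb > 0)
		pool_put_bulk(batch[0]->pool, batch, nb);
}
```

In the common case where every mbuf in a burst comes from the same pool, this collapses to a single bulk put, which is the case the likely() branch in the patch optimizes for.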


* Re: [dpdk-dev] [PATCH v3 7/7] virtio: pick simple rx/tx func
  2015-10-20 15:30   ` [dpdk-dev] [PATCH v3 7/7] virtio: pick simple rx/tx func Huawei Xie
@ 2015-10-22  2:50     ` Tan, Jianfeng
  2015-10-22 11:40       ` Xie, Huawei
  0 siblings, 1 reply; 92+ messages in thread
From: Tan, Jianfeng @ 2015-10-22  2:50 UTC (permalink / raw)
  To: Xie, Huawei, dev

On 10/22/2015 10:45 AM, Jianfeng wrote:

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Huawei Xie
> Sent: Tuesday, October 20, 2015 11:30 PM
> To: dev@dpdk.org
> Subject: [dpdk-dev] [PATCH v3 7/7] virtio: pick simple rx/tx func
> 
> simple rx/tx func is enabled when user specifies single segment and no
> offload support.
> merge-able should be disabled to use simple rxtx.
> 
> Signed-off-by: Huawei Xie <huawei.xie@intel.com>
> ---
>  drivers/net/virtio/virtio_rxtx.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
> index 947fc46..71f8cd4 100644
> --- a/drivers/net/virtio/virtio_rxtx.c
> +++ b/drivers/net/virtio/virtio_rxtx.c
> @@ -62,6 +62,10 @@
>  #define  VIRTIO_DUMP_PACKET(m, len) do { } while (0)
>  #endif
> 
> +
> +#define VIRTIO_SIMPLE_FLAGS ((uint32_t)ETH_TXQ_FLAGS_NOMULTSEGS | \
> +	ETH_TXQ_FLAGS_NOOFFLOADS)
> +
>  static int use_simple_rxtx;
> 
>  static void
> @@ -471,6 +475,14 @@ virtio_dev_tx_queue_setup(struct rte_eth_dev *dev,
>  		return -EINVAL;
>  	}
> 
> +	/* Use simple rx/tx func if single segment and no offloads */
> +	if ((tx_conf->txq_flags & VIRTIO_SIMPLE_FLAGS) == VIRTIO_SIMPLE_FLAGS) {
> +		PMD_INIT_LOG(INFO, "Using simple rx/tx path");
> +		dev->tx_pkt_burst = virtio_xmit_pkts_simple;
> +		dev->rx_pkt_burst = virtio_recv_pkts_vec;

Whether mergeable RX buffers are supported is decided in virtio_negotiate_feature().
So "dev->rx_pkt_burst = virtio_recv_pkts_vec" should only be set when
hw->guest_features does not contain VIRTIO_NET_F_MRG_RXBUF, right?
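
The check being asked for might be sketched as follows. Everything here is a stand-alone stand-in rather than driver code: pick_rx_path, has_feature, the enum, and the TXQ_* flag values are hypothetical; only the bit number 15 for VIRTIO_NET_F_MRG_RXBUF comes from the virtio specification.

```c
#include <assert.h>
#include <stdint.h>

#define F_MRG_RXBUF_BIT 15u      /* VIRTIO_NET_F_MRG_RXBUF bit number */
#define TXQ_NOMULTSEGS  0x0001u  /* hypothetical stand-in flag values */
#define TXQ_NOOFFLOADS  0x0F00u

enum rx_path {
	RX_PATH_GENERIC,
	RX_PATH_MERGEABLE,
	RX_PATH_VEC
};

/* Feature words store one feature per bit position. */
static int
has_feature(uint32_t features, unsigned int bit)
{
	return (int)((features >> bit) & 1u);
}

static enum rx_path
pick_rx_path(uint32_t guest_features, uint32_t txq_flags)
{
	const uint32_t simple = TXQ_NOMULTSEGS | TXQ_NOOFFLOADS;

	/* Mergeable buffers rule out the one-desc-per-buffer vector path. */
	if (has_feature(guest_features, F_MRG_RXBUF_BIT))
		return RX_PATH_MERGEABLE;

	/* Vector RX only when the user asked for single segment, no offloads. */
	if ((txq_flags & simple) == simple)
		return RX_PATH_VEC;

	return RX_PATH_GENERIC;
}
```

The point of the ordering is that the negotiated feature takes precedence over the queue flags, so a user-supplied txq_flags value can never select the vector path while mergeable buffers are in use.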

> +		use_simple_rxtx = 1;
> +	}
> +
>  	ret = virtio_dev_queue_setup(dev, VTNET_TQ, queue_idx, vtpci_queue_idx,
>  			nb_desc, socket_id, &vq);
>  	if (ret < 0) {
> --
> 1.8.1.4


* Re: [dpdk-dev] [PATCH v3 5/7] virtio: virtio vec rx
  2015-10-20 15:30   ` [dpdk-dev] [PATCH v3 5/7] virtio: virtio vec rx Huawei Xie
@ 2015-10-22  4:04     ` Wang, Zhihong
  2015-10-22  5:48       ` Xie, Huawei
  0 siblings, 1 reply; 92+ messages in thread
From: Wang, Zhihong @ 2015-10-22  4:04 UTC (permalink / raw)
  To: Xie, Huawei; +Cc: dev



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Huawei Xie
> Sent: Tuesday, October 20, 2015 11:30 PM
> To: dev@dpdk.org
> Subject: [dpdk-dev] [PATCH v3 5/7] virtio: virtio vec rx
> 
> With fixed avail ring, we don't need to get desc idx from avail ring.
> virtio driver only has to deal with desc ring.
> This patch uses vector instruction to accelerate processing desc ring.
> 
> Signed-off-by: Huawei Xie <huawei.xie@intel.com>
> ---
>  drivers/net/virtio/virtio_ethdev.h      |   2 +
>  drivers/net/virtio/virtio_rxtx.c        |   3 +
>  drivers/net/virtio/virtio_rxtx.h        |   2 +
>  drivers/net/virtio/virtio_rxtx_simple.c | 224 ++++++++++++++++++++++++++++++++
>  drivers/net/virtio/virtqueue.h          |   1 +
>  5 files changed, 232 insertions(+)
> 
> diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h
> index 9026d42..d7797ab 100644
> --- a/drivers/net/virtio/virtio_ethdev.h
> +++ b/drivers/net/virtio/virtio_ethdev.h
> @@ -108,6 +108,8 @@ uint16_t virtio_recv_mergeable_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
>  uint16_t virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>  		uint16_t nb_pkts);
> 
> +uint16_t virtio_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
> +		uint16_t nb_pkts);
> 
>  /*
>   * The VIRTIO_NET_F_GUEST_TSO[46] features permit the host to send us
> diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
> index 5162ce6..947fc46 100644
> --- a/drivers/net/virtio/virtio_rxtx.c
> +++ b/drivers/net/virtio/virtio_rxtx.c
> @@ -432,6 +432,9 @@ virtio_dev_rx_queue_setup(struct rte_eth_dev *dev,
>  	vq->mpool = mp;
> 
>  	dev->data->rx_queues[queue_idx] = vq;
> +
> +	virtio_rxq_vec_setup(vq);
> +
>  	return 0;
>  }
> 
> diff --git a/drivers/net/virtio/virtio_rxtx.h b/drivers/net/virtio/virtio_rxtx.h
> index 7d2d8fe..831e492 100644
> --- a/drivers/net/virtio/virtio_rxtx.h
> +++ b/drivers/net/virtio/virtio_rxtx.h
> @@ -33,5 +33,7 @@
> 
>  #define RTE_PMD_VIRTIO_RX_MAX_BURST 64
> 
> +int virtio_rxq_vec_setup(struct virtqueue *rxq);
> +
>  int virtqueue_enqueue_recv_refill_simple(struct virtqueue *vq,
>  	struct rte_mbuf *m);
> diff --git a/drivers/net/virtio/virtio_rxtx_simple.c b/drivers/net/virtio/virtio_rxtx_simple.c
> index cac5b9f..ef17562 100644
> --- a/drivers/net/virtio/virtio_rxtx_simple.c
> +++ b/drivers/net/virtio/virtio_rxtx_simple.c
> @@ -58,6 +58,10 @@
>  #include "virtqueue.h"
>  #include "virtio_rxtx.h"
> 
> +#define RTE_VIRTIO_VPMD_RX_BURST 32
> +#define RTE_VIRTIO_DESC_PER_LOOP 8
> +#define RTE_VIRTIO_VPMD_RX_REARM_THRESH RTE_VIRTIO_VPMD_RX_BURST
> +
>  int __attribute__((cold))
>  virtqueue_enqueue_recv_refill_simple(struct virtqueue *vq,
>  	struct rte_mbuf *cookie)
> @@ -82,3 +86,223 @@ virtqueue_enqueue_recv_refill_simple(struct virtqueue *vq,
> 
>  	return 0;
>  }
> +
> +static inline void
> +virtio_rxq_rearm_vec(struct virtqueue *rxvq)
> +{
> +	int i;
> +	uint16_t desc_idx;
> +	struct rte_mbuf **sw_ring;
> +	struct vring_desc *start_dp;
> +	int ret;
> +
> +	desc_idx = rxvq->vq_avail_idx & (rxvq->vq_nentries - 1);
> +	sw_ring = &rxvq->sw_ring[desc_idx];
> +	start_dp = &rxvq->vq_ring.desc[desc_idx];
> +
> +	ret = rte_mempool_get_bulk(rxvq->mpool, (void **)sw_ring,
> +		RTE_VIRTIO_VPMD_RX_REARM_THRESH);
> +	if (unlikely(ret)) {
> +		rte_eth_devices[rxvq->port_id].data->rx_mbuf_alloc_failed +=
> +			RTE_VIRTIO_VPMD_RX_REARM_THRESH;
> +		return;
> +	}
> +
> +	for (i = 0; i < RTE_VIRTIO_VPMD_RX_REARM_THRESH; i++) {
> +		uintptr_t p;
> +
> +		p = (uintptr_t)&sw_ring[i]->rearm_data;
> +		*(uint64_t *)p = rxvq->mbuf_initializer;
> +
> +		start_dp[i].addr =
> +			(uint64_t)((uintptr_t)sw_ring[i]->buf_physaddr +
> +			RTE_PKTMBUF_HEADROOM - sizeof(struct virtio_net_hdr));
> +		start_dp[i].len = sw_ring[i]->buf_len -
> +			RTE_PKTMBUF_HEADROOM + sizeof(struct virtio_net_hdr);
> +	}
> +
> +	rxvq->vq_avail_idx += RTE_VIRTIO_VPMD_RX_REARM_THRESH;
> +	rxvq->vq_free_cnt -= RTE_VIRTIO_VPMD_RX_REARM_THRESH;
> +	vq_update_avail_idx(rxvq);
> +}
> +
> +/* virtio vPMD receive routine, only accept(nb_pkts >= RTE_VIRTIO_DESC_PER_LOOP)
> + *
> + * This routine is for non-mergeable RX, one desc for each guest buffer.
> + * This routine is based on the RX ring layout optimization. Each entry in the
> + * avail ring points to the desc with the same index in the desc ring and this
> + * will never be changed in the driver.
> + *
> + * - nb_pkts < RTE_VIRTIO_DESC_PER_LOOP, just return no packet
> + */
> +uint16_t
> +virtio_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
> +	uint16_t nb_pkts)
> +{
> +	struct virtqueue *rxvq = rx_queue;
> +	uint16_t nb_used;
> +	uint16_t desc_idx;
> +	struct vring_used_elem *rused;
> +	struct rte_mbuf **sw_ring;
> +	struct rte_mbuf **sw_ring_end;
> +	uint16_t nb_pkts_received;
> +	__m128i shuf_msk1, shuf_msk2, len_adjust;
> +
> +	shuf_msk1 = _mm_set_epi8(
> +		0xFF, 0xFF, 0xFF, 0xFF,
> +		0xFF, 0xFF,		/* vlan tci */
> +		5, 4,			/* dat len */
> +		0xFF, 0xFF, 5, 4,	/* pkt len */
> +		0xFF, 0xFF, 0xFF, 0xFF	/* packet type */
> +
> +	);
> +
> +	shuf_msk2 = _mm_set_epi8(
> +		0xFF, 0xFF, 0xFF, 0xFF,
> +		0xFF, 0xFF,		/* vlan tci */
> +		13, 12,			/* dat len */
> +		0xFF, 0xFF, 13, 12,	/* pkt len */
> +		0xFF, 0xFF, 0xFF, 0xFF	/* packet type */
> +	);
> +
> +	/* Subtract the header length.
> +	*  In which case do we need the header length in used->len ?
> +	*/
> +	len_adjust = _mm_set_epi16(
> +		0, 0,
> +		0,
> +		(uint16_t) -sizeof(struct virtio_net_hdr),
> +		0, (uint16_t) -sizeof(struct virtio_net_hdr),
> +		0, 0);
> +
> +	if (unlikely(nb_pkts < RTE_VIRTIO_DESC_PER_LOOP))
> +		return 0;
> +
> +	nb_used = *(volatile uint16_t *)&rxvq->vq_ring.used->idx -
> +		rxvq->vq_used_cons_idx;
> +
> +	rte_compiler_barrier();
> +
> +	if (unlikely(nb_used == 0))
> +		return 0;
> +
> +	nb_pkts = RTE_ALIGN_FLOOR(nb_pkts, RTE_VIRTIO_DESC_PER_LOOP);
> +	nb_used = RTE_MIN(nb_used, nb_pkts);
> +
> +	desc_idx = (uint16_t)(rxvq->vq_used_cons_idx & (rxvq->vq_nentries - 1));
> +	rused = &rxvq->vq_ring.used->ring[desc_idx];
> +	sw_ring  = &rxvq->sw_ring[desc_idx];
> +	sw_ring_end = &rxvq->sw_ring[rxvq->vq_nentries];
> +
> +	_mm_prefetch((const void *)rused, _MM_HINT_T0);


I wonder whether the prefetch actually helps here.
Would prefetching rx_pkts[i] be more helpful?


> +
> +	if (rxvq->vq_free_cnt >= RTE_VIRTIO_VPMD_RX_REARM_THRESH) {
> +		virtio_rxq_rearm_vec(rxvq);
> +		if (unlikely(virtqueue_kick_prepare(rxvq)))
> +			virtqueue_notify(rxvq);
> +	}
> +
> +	for (nb_pkts_received = 0;
> +		nb_pkts_received < nb_used;) {
> +		__m128i desc[RTE_VIRTIO_DESC_PER_LOOP / 2];
> +		__m128i mbp[RTE_VIRTIO_DESC_PER_LOOP / 2];
> +		__m128i pkt_mb[RTE_VIRTIO_DESC_PER_LOOP];
> +
> +		mbp[0] = _mm_loadu_si128((__m128i *)(sw_ring + 0));
> +		desc[0] = _mm_loadu_si128((__m128i *)(rused + 0));
> +		_mm_storeu_si128((__m128i *)&rx_pkts[0], mbp[0]);
> +
> +		mbp[1] = _mm_loadu_si128((__m128i *)(sw_ring + 2));
> +		desc[1] = _mm_loadu_si128((__m128i *)(rused + 2));
> +		_mm_storeu_si128((__m128i *)&rx_pkts[2], mbp[1]);
> +
> +		mbp[2] = _mm_loadu_si128((__m128i *)(sw_ring + 4));
> +		desc[2] = _mm_loadu_si128((__m128i *)(rused + 4));
> +		_mm_storeu_si128((__m128i *)&rx_pkts[4], mbp[2]);
> +
> +		mbp[3] = _mm_loadu_si128((__m128i *)(sw_ring + 6));
> +		desc[3] = _mm_loadu_si128((__m128i *)(rused + 6));
> +		_mm_storeu_si128((__m128i *)&rx_pkts[6], mbp[3]);
> +
> +		pkt_mb[1] = _mm_shuffle_epi8(desc[0], shuf_msk2);
> +		pkt_mb[0] = _mm_shuffle_epi8(desc[0], shuf_msk1);
> +		pkt_mb[1] = _mm_add_epi16(pkt_mb[1], len_adjust);
> +		pkt_mb[0] = _mm_add_epi16(pkt_mb[0], len_adjust);
> +		_mm_storeu_si128((void *)&rx_pkts[1]->rx_descriptor_fields1,
> +			pkt_mb[1]);
> +		_mm_storeu_si128((void *)&rx_pkts[0]->rx_descriptor_fields1,
> +			pkt_mb[0]);
> +
> +		pkt_mb[3] = _mm_shuffle_epi8(desc[1], shuf_msk2);
> +		pkt_mb[2] = _mm_shuffle_epi8(desc[1], shuf_msk1);
> +		pkt_mb[3] = _mm_add_epi16(pkt_mb[3], len_adjust);
> +		pkt_mb[2] = _mm_add_epi16(pkt_mb[2], len_adjust);
> +		_mm_storeu_si128((void *)&rx_pkts[3]->rx_descriptor_fields1,
> +			pkt_mb[3]);
> +		_mm_storeu_si128((void *)&rx_pkts[2]->rx_descriptor_fields1,
> +			pkt_mb[2]);
> +
> +		pkt_mb[5] = _mm_shuffle_epi8(desc[2], shuf_msk2);
> +		pkt_mb[4] = _mm_shuffle_epi8(desc[2], shuf_msk1);
> +		pkt_mb[5] = _mm_add_epi16(pkt_mb[5], len_adjust);
> +		pkt_mb[4] = _mm_add_epi16(pkt_mb[4], len_adjust);
> +		_mm_storeu_si128((void *)&rx_pkts[5]->rx_descriptor_fields1,
> +			pkt_mb[5]);
> +		_mm_storeu_si128((void *)&rx_pkts[4]->rx_descriptor_fields1,
> +			pkt_mb[4]);
> +
> +		pkt_mb[7] = _mm_shuffle_epi8(desc[3], shuf_msk2);
> +		pkt_mb[6] = _mm_shuffle_epi8(desc[3], shuf_msk1);
> +		pkt_mb[7] = _mm_add_epi16(pkt_mb[7], len_adjust);
> +		pkt_mb[6] = _mm_add_epi16(pkt_mb[6], len_adjust);
> +		_mm_storeu_si128((void *)&rx_pkts[7]->rx_descriptor_fields1,
> +			pkt_mb[7]);
> +		_mm_storeu_si128((void *)&rx_pkts[6]->rx_descriptor_fields1,
> +			pkt_mb[6]);
> +
> +		if (unlikely(nb_used <= RTE_VIRTIO_DESC_PER_LOOP)) {
> +			if (sw_ring + nb_used <= sw_ring_end)
> +				nb_pkts_received += nb_used;
> +			else
> +				nb_pkts_received += sw_ring_end - sw_ring;
> +			break;
> +		} else {
> +			if (unlikely(sw_ring + RTE_VIRTIO_DESC_PER_LOOP >=
> +				sw_ring_end)) {
> +				nb_pkts_received += sw_ring_end - sw_ring;
> +				break;
> +			} else {
> +				nb_pkts_received += RTE_VIRTIO_DESC_PER_LOOP;
> +
> +				rx_pkts += RTE_VIRTIO_DESC_PER_LOOP;
> +				sw_ring += RTE_VIRTIO_DESC_PER_LOOP;
> +				rused   += RTE_VIRTIO_DESC_PER_LOOP;
> +				nb_used -= RTE_VIRTIO_DESC_PER_LOOP;
> +			}
> +		}
> +	}
> +
> +	rxvq->vq_used_cons_idx += nb_pkts_received;
> +	rxvq->vq_free_cnt += nb_pkts_received;
> +	rxvq->packets += nb_pkts_received;
> +	return nb_pkts_received;
> +}
> +
> +int __attribute__((cold))
> +virtio_rxq_vec_setup(struct virtqueue *rxq)
> +{
> +	uintptr_t p;
> +	struct rte_mbuf mb_def = { .buf_addr = 0 }; /* zeroed mbuf */
> +
> +	mb_def.nb_segs = 1;
> +	mb_def.data_off = RTE_PKTMBUF_HEADROOM;
> +	mb_def.port = rxq->port_id;
> +	rte_mbuf_refcnt_set(&mb_def, 1);
> +
> +	/* prevent compiler reordering: rearm_data covers previous fields */
> +	rte_compiler_barrier();
> +	p = (uintptr_t)&mb_def.rearm_data;
> +	rxq->mbuf_initializer = *(uint64_t *)p;
> +
> +	return 0;
> +}
> diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h
> index 6a1ec48..98a77d5 100644
> --- a/drivers/net/virtio/virtqueue.h
> +++ b/drivers/net/virtio/virtqueue.h
> @@ -188,6 +188,7 @@ struct virtqueue {
>  	 */
>  	uint16_t vq_used_cons_idx;
>  	uint16_t vq_avail_idx;
> +	uint64_t mbuf_initializer; /**< value to init mbufs. */
>  	phys_addr_t virtio_net_hdr_mem; /**< hdr for each xmit packet */
> 
>  	struct rte_mbuf **sw_ring; /**< RX software ring. */
> --
> 1.8.1.4

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [dpdk-dev] [PATCH v3 6/7] virtio: simple tx routine
  2015-10-20 18:58     ` Stephen Hemminger
@ 2015-10-22  5:43       ` Xie, Huawei
  0 siblings, 0 replies; 92+ messages in thread
From: Xie, Huawei @ 2015-10-22  5:43 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev

On 10/21/2015 2:58 AM, Stephen Hemminger wrote:
> On Tue, 20 Oct 2015 23:30:06 +0800
> Huawei Xie <huawei.xie@intel.com> wrote:
>
>> +	desc_idx = (uint16_t)(vq->vq_used_cons_idx &
>> +		((vq->vq_nentries >> 1) - 1));
>> +	free[0] = (struct rte_mbuf *)vq->vq_descx[desc_idx++].cookie;
>> +	nb_free = 1;
>> +
>> +	for (i = 1; i < VIRTIO_TX_FREE_NR; i++) {
>> +		m = (struct rte_mbuf *)vq->vq_descx[desc_idx++].cookie;
>> +		if (likely(m->pool == free[0]->pool))
>> +			free[nb_free++] = m;
>> +		else {
>> +			rte_mempool_put_bulk(free[0]->pool, (void **)free,
>> +				nb_free);
>> +			free[0] = m;
>> +			nb_free = 1;
>> +		}
>> +	}
>> +
>> +	rte_mempool_put_bulk(free[0]->pool, (void **)free, nb_free);
> Might be better to introduce a function in rte_mbuf.h which
> does this so other drivers can use same code?
>
> rte_pktmbuf_free_bulk(pkts[], n)
Agreed. It would be good to have a generic rte_pktmbuf_free(/alloc)_bulk.
Several other drivers and future vhost patches use the same logic.
I prefer to implement this later, as it is an API change.
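
For reference, the grouping logic quoted above could be factored out roughly
as follows. This is only a sketch under stated assumptions: struct pool,
struct mbuf, pool_put_bulk() and pktmbuf_free_bulk_sketch() are hypothetical
stand-ins for rte_mempool, rte_mbuf, rte_mempool_put_bulk() and the proposed
helper, so that the batching can be shown without any DPDK dependency.

```c
#include <stddef.h>

/* Hypothetical stand-ins for rte_mempool and rte_mbuf. */
struct pool { size_t returned; size_t put_calls; };
struct mbuf { struct pool *pool; };

/* Stand-in for rte_mempool_put_bulk(): only counts what comes back. */
static void pool_put_bulk(struct pool *p, struct mbuf **objs, size_t n)
{
	(void)objs;
	p->returned += n;
	p->put_calls++;
}

#define FREE_BATCH 32	/* illustrative batch size */

/* Hypothetical helper: free n mbufs, batching runs that share a pool. */
static void pktmbuf_free_bulk_sketch(struct mbuf **pkts, size_t n)
{
	struct mbuf *batch[FREE_BATCH];
	size_t nb = 0, i;

	for (i = 0; i < n; i++) {
		/* Flush when the pool changes or the batch is full. */
		if (nb > 0 && (pkts[i]->pool != batch[0]->pool ||
				nb == FREE_BATCH)) {
			pool_put_bulk(batch[0]->pool, batch, nb);
			nb = 0;
		}
		batch[nb++] = pkts[i];
	}
	if (nb > 0)
		pool_put_bulk(batch[0]->pool, batch, nb);
}
```

When all mbufs come from one pool, which is the likely() case in the quoted
code, this results in a single bulk put per batch.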


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [dpdk-dev] [PATCH v3 5/7] virtio: virtio vec rx
  2015-10-22  4:04     ` Wang, Zhihong
@ 2015-10-22  5:48       ` Xie, Huawei
  0 siblings, 0 replies; 92+ messages in thread
From: Xie, Huawei @ 2015-10-22  5:48 UTC (permalink / raw)
  To: Wang, Zhihong; +Cc: dev

On 10/22/2015 12:04 PM, Wang, Zhihong wrote:
> Wonder if the prefetch will actually help here.
> Will prefetching rx_pkts[i] be more helpful?
What is your concern about prefetching the virtio ring?
rx_pkts is a local array. Why would we need to prefetch it?


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [dpdk-dev] [PATCH v3 7/7] virtio: pick simple rx/tx func
  2015-10-22  2:50     ` Tan, Jianfeng
@ 2015-10-22 11:40       ` Xie, Huawei
  0 siblings, 0 replies; 92+ messages in thread
From: Xie, Huawei @ 2015-10-22 11:40 UTC (permalink / raw)
  To: Tan, Jianfeng, dev

On 10/22/2015 10:50 AM, Tan, Jianfeng wrote:
> On 10/22/2015 10:45 AM, Jianfeng wrote:
>
>> -----Original Message-----
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Huawei Xie
>> Sent: Tuesday, October 20, 2015 11:30 PM
>> To: dev@dpdk.org
>> Subject: [dpdk-dev] [PATCH v3 7/7] virtio: pick simple rx/tx func
>>
>> simple rx/tx func is enabled when user specifies single segment and no
>> offload support.
>> merge-able should be disabled to use simple rxtx.
>>
>> Signed-off-by: Huawei Xie <huawei.xie@intel.com>
>> ---
>>  drivers/net/virtio/virtio_rxtx.c | 12 ++++++++++++
>>  1 file changed, 12 insertions(+)
>>
>> diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
>> index 947fc46..71f8cd4 100644
>> --- a/drivers/net/virtio/virtio_rxtx.c
>> +++ b/drivers/net/virtio/virtio_rxtx.c
>> @@ -62,6 +62,10 @@
>>  #define  VIRTIO_DUMP_PACKET(m, len) do { } while (0)
>>  #endif
>>
>> +
>> +#define VIRTIO_SIMPLE_FLAGS ((uint32_t)ETH_TXQ_FLAGS_NOMULTSEGS | \
>> +	ETH_TXQ_FLAGS_NOOFFLOADS)
>> +
>>  static int use_simple_rxtx;
>>
>>  static void
>> @@ -471,6 +475,14 @@ virtio_dev_tx_queue_setup(struct rte_eth_dev *dev,
>>  		return -EINVAL;
>>  	}
>>
>> +	/* Use simple rx/tx func if single segment and no offloads */
>> +	if ((tx_conf->txq_flags & VIRTIO_SIMPLE_FLAGS) == VIRTIO_SIMPLE_FLAGS) {
>> +		PMD_INIT_LOG(INFO, "Using simple rx/tx path");
>> +		dev->tx_pkt_burst = virtio_xmit_pkts_simple;
>> +		dev->rx_pkt_burst = virtio_recv_pkts_vec;
> Whether recv side mergeable is supported is controlled by virtio_negotiate_feature().
> So "dev->rx_pkt_burst = virtio_recv_pkts_vec" should be restricted by 
> hw->guest_features & VIRTIO_NET_F_MRG_RXBUF, right?
I will add this check in the next version. However, it will still be placed
here, as we want to keep the option of dynamically choosing between the
normal and the simple rx function.
>
>> +		use_simple_rxtx = 1;
>> +	}
>> +
>>  	ret = virtio_dev_queue_setup(dev, VTNET_TQ, queue_idx, vtpci_queue_idx,
>>  			nb_desc, socket_id, &vq);
>>  	if (ret < 0) {
>> --
>> 1.8.1.4
>
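
To make the selection rule discussed above concrete, here is a rough sketch
of the combined condition (simple txq flags set and mergeable RX buffers not
negotiated). The names below are hypothetical, and the flag values are
illustrative stand-ins; the real definitions live in rte_ethdev.h and the
virtio headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-in values, not taken from the patch. */
#define TXQ_NOMULTSEGS  0x0001u
#define TXQ_NOOFFLOADS  0x0F00u
#define SIMPLE_FLAGS    (TXQ_NOMULTSEGS | TXQ_NOOFFLOADS)
#define F_MRG_RXBUF_BIT 15	/* VIRTIO_NET_F_MRG_RXBUF feature bit */

/* Hypothetical helper: true if the simple rx/tx path may be selected. */
static bool can_use_simple_rxtx(uint32_t txq_flags, uint64_t guest_features)
{
	/* All simple flags must be set, not just some of them. */
	bool simple_flags = (txq_flags & SIMPLE_FLAGS) == SIMPLE_FLAGS;
	/* The vector RX routine assumes one descriptor per packet. */
	bool mergeable = (guest_features & (1ULL << F_MRG_RXBUF_BIT)) != 0;

	return simple_flags && !mergeable;
}
```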


^ permalink raw reply	[flat|nested] 92+ messages in thread

* [dpdk-dev] [PATCH v4 0/7] virtio ring layout optimization and simple rx/tx processing
  2015-09-29 14:45 [dpdk-dev] [PATCH 0/8] virtio: virtio ring layout optimization and RX vector processing Huawei Xie
                   ` (11 preceding siblings ...)
  2015-10-20 15:30 ` [dpdk-dev] [PATCH v3 0/7] virtio ring layout optimization and simple rx/tx processing Huawei Xie
@ 2015-10-22 12:09 ` Huawei Xie
  2015-10-22 12:09   ` [dpdk-dev] [PATCH v4 1/7] virtio: add virtio_rxtx.h header file Huawei Xie
                     ` (6 more replies)
  2015-10-25 15:34 ` [dpdk-dev] [PATCH v5 0/7] virtio ring layout optimization and simple rx/tx processing Huawei Xie
  2015-10-29 14:53 ` [dpdk-dev] [PATCH v6 0/8] " Huawei Xie
  14 siblings, 7 replies; 92+ messages in thread
From: Huawei Xie @ 2015-10-22 12:09 UTC (permalink / raw)
  To: dev

Changes in v2:
- Remove the configure macro
- Enable simple RX/TX processing when user specifies simple txq flags
- Reword some comments and commit messages

Changes in v3:
- Remove unnecessary NULL test for rte_free
- Remove unnecessary assignment to local var after free
- Remove return at the end of void function
- Remove always_inline attribute for virtio_xmit_cleanup
- Reword some commit messages
- Add TODO in the commit message of simple tx patch

Changes in v4:
- Fix the error in the virtio tx ring layout ascii chart in the commit message
- Move virtio_xmit_cleanup ahead to free descriptors earlier
- Test the mergeable feature when selecting simple rx/tx functions

In a DPDK based switching environment, vhost mostly runs on a dedicated core
while virtio processing in guest VMs runs on different cores.
Take RX for example. With the generic implementation, for each guest buffer,
a) the virtio driver allocates a descriptor from the free descriptor list
b) modifies the entry of the avail ring to point to the allocated descriptor
c) frees the descriptor after the packet is received

When vhost fetches the avail ring, it needs to fetch the modified L1 cache line
from the virtio core, which is a heavy cost on current CPU implementations.

The idea of this optimization is:
    allocate a fixed descriptor for each entry of the avail ring, so the avail
ring will always be the same during the run.
This removes the transfer of modified L1 cache lines from the virtio core to the
vhost core for the avail ring.
(Note that we cannot avoid the cache transfer for descriptors.)
Besides, descriptor allocation and free operations are eliminated.
This also makes vector processing possible, to further accelerate the processing.

This is the layout for the avail ring (taking 256 ring entries as an example),
with each entry pointing to the descriptor with the same index.
                    avail
                    idx
                    +
                    |
+----+----+---+-------------+------+
| 0  | 1  | 2 | ... |  254  | 255  |  avail ring
+-+--+-+--+-+-+---------+---+--+---+
  |    |    |       |   |      |
  |    |    |       |   |      |
  v    v    v       |   v      v
+-+--+-+--+-+-+---------+---+--+---+
| 0  | 1  | 2 | ... |  254  | 255  |  desc ring
+----+----+---+-------------+------+
                    |
                    |
+----+----+---+-------------+------+
| 0  | 1  | 2 |     |  254  | 255  |  used ring
+----+----+---+-------------+------+
                    |
                    +
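
As a rough illustration of the RX layout above, the sketch below sets up the
avail ring once and then refills buffers by writing only the descriptor and
the avail index. struct ring_model and both functions are simplified,
hypothetical stand-ins, not the driver's real structures.

```c
#include <stdint.h>

#define RING_SIZE 256	/* illustrative; must be a power of two */

/* Simplified, hypothetical model of a split virtio ring. */
struct desc { uint64_t addr; uint32_t len; uint16_t flags; uint16_t next; };
struct ring_model {
	struct desc desc[RING_SIZE];
	uint16_t avail_ring[RING_SIZE];
	uint16_t avail_idx;
};

/* One-time setup: avail entry i permanently names descriptor i. */
static void ring_init_fixed(struct ring_model *r)
{
	uint16_t i;

	for (i = 0; i < RING_SIZE; i++)
		r->avail_ring[i] = i;
	r->avail_idx = 0;
}

/*
 * Refill one buffer: only the descriptor and the index are written,
 * avail_ring[] is never touched again. Returns the slot that was used.
 */
static uint16_t ring_refill(struct ring_model *r, uint64_t buf_addr,
	uint32_t buf_len)
{
	uint16_t slot = r->avail_idx & (RING_SIZE - 1);

	r->desc[slot].addr = buf_addr;
	r->desc[slot].len = buf_len;
	r->avail_idx++;
	return slot;
}
```

Because avail_ring[] is written only during setup, the cache lines holding it
stay unmodified on the virtio core during the run.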

This is the ring layout for TX.
As each xmit packet needs one descriptor for its virtio header in addition to
its data descriptor, a 256-entry ring gives us 128 available slots.

                         ++
                         ||
                         ||
+-----+-----+-----+--------------+------+------+------+
|  0  |  1  | ... |  127 || 128  | 129  | ...  | 255  |   avail ring
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
   |     |            |  ||  |      |             |
   v     v            v  ||  v      v             v
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
| 128 | 129 | ... |  255 || 128  | 129  | ...  | 255  |   desc ring for virtio_net_hdr
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
   |     |            |  ||  |      |             |
   v     v            v  ||  v      v             v
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
|  0  |  1  | ... |  127 ||  0   |  1   | ...  | 127  |   desc ring for tx data
+-----+-----+-----+--------------+------+------+------+
                         ||
                         ||
                         ++
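
The index relationships in the TX chart can be sketched as below. This is a
simplified model that keeps only the chaining between descriptors and leaves
out addresses and lengths; struct tx_ring_model and tx_ring_init_fixed() are
hypothetical names, and DESC_F_NEXT stands in for VRING_DESC_F_NEXT.

```c
#include <stdint.h>

#define TX_RING_SIZE 256	/* illustrative ring size */
#define DESC_F_NEXT  1		/* stand-in for VRING_DESC_F_NEXT */

/* Simplified, hypothetical model: only the chaining fields are kept. */
struct tx_desc { uint16_t flags; uint16_t next; };
struct tx_ring_model {
	struct tx_desc desc[TX_RING_SIZE];
	uint16_t avail_ring[TX_RING_SIZE];
};

static void tx_ring_init_fixed(struct tx_ring_model *r)
{
	uint16_t mid = TX_RING_SIZE >> 1;
	uint16_t i;

	for (i = 0; i < mid; i++) {
		/* Avail entry i names header descriptor i + mid ... */
		r->avail_ring[i] = i + mid;
		/* ... which chains to data descriptor i. */
		r->desc[i + mid].flags = DESC_F_NEXT;
		r->desc[i + mid].next = i;
		/* The data descriptor ends the chain. */
		r->desc[i].flags = 0;
		r->desc[i].next = 0;
	}
	/* The upper half of the avail ring names the same header descs. */
	for (i = mid; i < TX_RING_SIZE; i++)
		r->avail_ring[i] = i;
}
```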


A performance boost can be observed only if the virtio backend isn't the
bottleneck, or in the VM2VM case.
There are also several vhost optimization patches to be submitted later.

Huawei Xie (7):
  virtio: add virtio_rxtx.h header file
  virtio: add software rx ring, fake_buf into virtqueue
  virtio: rx/tx ring layout optimization
  virtio: fill RX avail ring with blank mbufs
  virtio: virtio vec rx
  virtio: simple tx routine
  virtio: choose simple rx/tx func

 drivers/net/virtio/Makefile             |   2 +-
 drivers/net/virtio/virtio_ethdev.c      |  12 +-
 drivers/net/virtio/virtio_ethdev.h      |   5 +
 drivers/net/virtio/virtio_rxtx.c        |  56 ++++-
 drivers/net/virtio/virtio_rxtx.h        |  39 ++++
 drivers/net/virtio/virtio_rxtx_simple.c | 401 ++++++++++++++++++++++++++++++++
 drivers/net/virtio/virtqueue.h          |   5 +
 7 files changed, 516 insertions(+), 4 deletions(-)
 create mode 100644 drivers/net/virtio/virtio_rxtx.h
 create mode 100644 drivers/net/virtio/virtio_rxtx_simple.c

-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [dpdk-dev] [PATCH v4 1/7] virtio: add virtio_rxtx.h header file
  2015-10-22 12:09 ` [dpdk-dev] [PATCH v4 0/7] virtio ring layout optimization and simple rx/tx processing Huawei Xie
@ 2015-10-22 12:09   ` Huawei Xie
  2015-10-22 12:09   ` [dpdk-dev] [PATCH v4 2/7] virtio: add software rx ring, fake_buf into virtqueue Huawei Xie
                     ` (5 subsequent siblings)
  6 siblings, 0 replies; 92+ messages in thread
From: Huawei Xie @ 2015-10-22 12:09 UTC (permalink / raw)
  To: dev

All rx/tx related declarations will be moved into this header file in the future.
Add RTE_PMD_VIRTIO_RX_MAX_BURST.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
---
 drivers/net/virtio/virtio_ethdev.c |  1 +
 drivers/net/virtio/virtio_rxtx.c   |  1 +
 drivers/net/virtio/virtio_rxtx.h   | 34 ++++++++++++++++++++++++++++++++++
 3 files changed, 36 insertions(+)
 create mode 100644 drivers/net/virtio/virtio_rxtx.h

diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index 465d3cd..79a3640 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -61,6 +61,7 @@
 #include "virtio_pci.h"
 #include "virtio_logs.h"
 #include "virtqueue.h"
+#include "virtio_rxtx.h"
 
 
 static int eth_virtio_dev_init(struct rte_eth_dev *eth_dev);
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index c5b53bb..9324f7f 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -54,6 +54,7 @@
 #include "virtio_logs.h"
 #include "virtio_ethdev.h"
 #include "virtqueue.h"
+#include "virtio_rxtx.h"
 
 #ifdef RTE_LIBRTE_VIRTIO_DEBUG_DUMP
 #define VIRTIO_DUMP_PACKET(m, len) rte_pktmbuf_dump(stdout, m, len)
diff --git a/drivers/net/virtio/virtio_rxtx.h b/drivers/net/virtio/virtio_rxtx.h
new file mode 100644
index 0000000..a10aa69
--- /dev/null
+++ b/drivers/net/virtio/virtio_rxtx.h
@@ -0,0 +1,34 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#define RTE_PMD_VIRTIO_RX_MAX_BURST 64
-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [dpdk-dev] [PATCH v4 2/7] virtio: add software rx ring, fake_buf into virtqueue
  2015-10-22 12:09 ` [dpdk-dev] [PATCH v4 0/7] virtio ring layout optimization and simple rx/tx processing Huawei Xie
  2015-10-22 12:09   ` [dpdk-dev] [PATCH v4 1/7] virtio: add virtio_rxtx.h header file Huawei Xie
@ 2015-10-22 12:09   ` Huawei Xie
  2015-10-22 12:09   ` [dpdk-dev] [PATCH v4 3/7] virtio: rx/tx ring layout optimization Huawei Xie
                     ` (4 subsequent siblings)
  6 siblings, 0 replies; 92+ messages in thread
From: Huawei Xie @ 2015-10-22 12:09 UTC (permalink / raw)
  To: dev

Changes in v3:
- Remove unnecessary NULL test for rte_free
- Remove unnecessary assign of local var vq after free

Add a software RX ring to virtqueue.
Add fake_mbuf to virtqueue for wraparound processing.
Use the global use_simple_rxtx to indicate whether simple rxtx is enabled.
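
A rough sketch of the wraparound padding, using hypothetical stand-in types:
the software ring gets MAX_BURST extra slots past the last real entry, all
pointing at one dummy mbuf, so a burst that starts near the end of the ring
can read past it without an explicit wrap check.

```c
#include <stddef.h>

#define NENTRIES  256	/* illustrative ring size */
#define MAX_BURST 64	/* mirrors RTE_PMD_VIRTIO_RX_MAX_BURST */

struct mbuf_model { int len; };	/* hypothetical stand-in for rte_mbuf */

struct rxq_model {
	struct mbuf_model *sw_ring[NENTRIES + MAX_BURST];
	struct mbuf_model fake_mbuf;
};

/* Point the MAX_BURST padding slots at the dummy mbuf. */
static void rxq_pad_sw_ring(struct rxq_model *q)
{
	size_t i;

	q->fake_mbuf.len = 0;
	for (i = 0; i < MAX_BURST; i++)
		q->sw_ring[NENTRIES + i] = &q->fake_mbuf;
}
```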

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
---
 drivers/net/virtio/virtio_ethdev.c | 11 ++++++++++-
 drivers/net/virtio/virtio_rxtx.c   |  7 +++++++
 drivers/net/virtio/virtqueue.h     |  4 ++++
 3 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index 79a3640..82676d3 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -247,8 +247,8 @@ virtio_dev_queue_release(struct virtqueue *vq) {
 		VIRTIO_WRITE_REG_2(hw, VIRTIO_PCI_QUEUE_SEL, vq->queue_id);
 		VIRTIO_WRITE_REG_4(hw, VIRTIO_PCI_QUEUE_PFN, 0);
 
+		rte_free(vq->sw_ring);
 		rte_free(vq);
-		vq = NULL;
 	}
 }
 
@@ -292,6 +292,9 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
 			dev->data->port_id, queue_idx);
 		vq = rte_zmalloc(vq_name, sizeof(struct virtqueue) +
 			vq_size * sizeof(struct vq_desc_extra), RTE_CACHE_LINE_SIZE);
+		vq->sw_ring = rte_zmalloc_socket("rxq->sw_ring",
+			(RTE_PMD_VIRTIO_RX_MAX_BURST + vq_size) *
+			sizeof(vq->sw_ring[0]), RTE_CACHE_LINE_SIZE, socket_id);
 	} else if (queue_type == VTNET_TQ) {
 		snprintf(vq_name, sizeof(vq_name), "port%d_tvq%d",
 			dev->data->port_id, queue_idx);
@@ -308,6 +311,12 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
 		PMD_INIT_LOG(ERR, "%s: Can not allocate virtqueue", __func__);
 		return (-ENOMEM);
 	}
+	if (queue_type == VTNET_RQ && vq->sw_ring == NULL) {
+		PMD_INIT_LOG(ERR, "%s: Can not allocate RX soft ring",
+			__func__);
+		rte_free(vq);
+		return -ENOMEM;
+	}
 
 	vq->hw = hw;
 	vq->port_id = dev->data->port_id;
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 9324f7f..5c00e9d 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -62,6 +62,8 @@
 #define  VIRTIO_DUMP_PACKET(m, len) do { } while (0)
 #endif
 
+static int use_simple_rxtx;
+
 static void
 vq_ring_free_chain(struct virtqueue *vq, uint16_t desc_idx)
 {
@@ -299,6 +301,11 @@ virtio_dev_vring_start(struct virtqueue *vq, int queue_type)
 		/* Allocate blank mbufs for the each rx descriptor */
 		nbufs = 0;
 		error = ENOSPC;
+
+		memset(&vq->fake_mbuf, 0, sizeof(vq->fake_mbuf));
+		for (i = 0; i < RTE_PMD_VIRTIO_RX_MAX_BURST; i++)
+			vq->sw_ring[vq->vq_nentries + i] = &vq->fake_mbuf;
+
 		while (!virtqueue_full(vq)) {
 			m = rte_rxmbuf_alloc(vq->mpool);
 			if (m == NULL)
diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h
index 7789411..6a1ec48 100644
--- a/drivers/net/virtio/virtqueue.h
+++ b/drivers/net/virtio/virtqueue.h
@@ -190,6 +190,10 @@ struct virtqueue {
 	uint16_t vq_avail_idx;
 	phys_addr_t virtio_net_hdr_mem; /**< hdr for each xmit packet */
 
+	struct rte_mbuf **sw_ring; /**< RX software ring. */
+	/* dummy mbuf, for wraparound when processing RX ring. */
+	struct rte_mbuf fake_mbuf;
+
 	/* Statistics */
 	uint64_t	packets;
 	uint64_t	bytes;
-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [dpdk-dev] [PATCH v4 3/7] virtio: rx/tx ring layout optimization
  2015-10-22 12:09 ` [dpdk-dev] [PATCH v4 0/7] virtio ring layout optimization and simple rx/tx processing Huawei Xie
  2015-10-22 12:09   ` [dpdk-dev] [PATCH v4 1/7] virtio: add virtio_rxtx.h header file Huawei Xie
  2015-10-22 12:09   ` [dpdk-dev] [PATCH v4 2/7] virtio: add software rx ring, fake_buf into virtqueue Huawei Xie
@ 2015-10-22 12:09   ` Huawei Xie
  2015-10-22 12:09   ` [dpdk-dev] [PATCH v4 4/7] virtio: fill RX avail ring with blank mbufs Huawei Xie
                     ` (3 subsequent siblings)
  6 siblings, 0 replies; 92+ messages in thread
From: Huawei Xie @ 2015-10-22 12:09 UTC (permalink / raw)
  To: dev

Changes in V4:
- fix the error in tx ring layout chart in this commit message.

In a DPDK based switching environment, vhost mostly runs on a dedicated core
while virtio processing in guest VMs runs on different cores.
Take RX for example. With the generic implementation, for each guest buffer,
a) the virtio driver allocates a descriptor from the free descriptor list
b) modifies the entry of the avail ring to point to the allocated descriptor
c) frees the descriptor after the packet is received

When vhost fetches the avail ring, it needs to fetch the modified L1 cache line
from the virtio core, which is a heavy cost on current CPU implementations.

The idea of this optimization is:
    allocate a fixed descriptor for each entry of the avail ring, so the avail
ring will always be the same during the run.
This removes the transfer of modified L1 cache lines from the virtio core to the
vhost core for the avail ring.
(Note that we cannot avoid the cache transfer for descriptors.)
Besides, descriptor allocation and free operations are eliminated.
This also makes vector processing possible, to further accelerate the processing.

This is the layout for the avail ring (taking 256 ring entries as an example),
with each entry pointing to the descriptor with the same index.
                    avail
                    idx
                    +
                    |
+----+----+---+-------------+------+
| 0  | 1  | 2 | ... |  254  | 255  |  avail ring
+-+--+-+--+-+-+---------+---+--+---+
  |    |    |       |   |      |
  |    |    |       |   |      |
  v    v    v       |   v      v
+-+--+-+--+-+-+---------+---+--+---+
| 0  | 1  | 2 | ... |  254  | 255  |  desc ring
+----+----+---+-------------+------+
                    |
                    |
+----+----+---+-------------+------+
| 0  | 1  | 2 |     |  254  | 255  |  used ring
+----+----+---+-------------+------+
                    |
                    +

This is the ring layout for TX.
As each xmit packet needs one descriptor for its virtio header in addition to
its data descriptor, a 256-entry ring gives us 128 available slots.

                         ++
                         ||
                         ||
+-----+-----+-----+--------------+------+------+------+
|  0  |  1  | ... |  127 || 128  | 129  | ...  | 255  |   avail ring
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
   |     |            |  ||  |      |             |
   v     v            v  ||  v      v             v
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
| 128 | 129 | ... |  255 || 128  | 129  | ...  | 255  |   desc ring for virtio_net_hdr
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
   |     |            |  ||  |      |             |
   v     v            v  ||  v      v             v
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
|  0  |  1  | ... |  127 ||  0   |  1   | ...  | 127  |   desc ring for tx data
+-----+-----+-----+--------------+------+------+------+
                         ||
                         ||
                         ++

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
---
 drivers/net/virtio/virtio_rxtx.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 5c00e9d..7c82a6a 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -302,6 +302,12 @@ virtio_dev_vring_start(struct virtqueue *vq, int queue_type)
 		nbufs = 0;
 		error = ENOSPC;
 
+		if (use_simple_rxtx)
+			for (i = 0; i < vq->vq_nentries; i++) {
+				vq->vq_ring.avail->ring[i] = i;
+				vq->vq_ring.desc[i].flags = VRING_DESC_F_WRITE;
+			}
+
 		memset(&vq->fake_mbuf, 0, sizeof(vq->fake_mbuf));
 		for (i = 0; i < RTE_PMD_VIRTIO_RX_MAX_BURST; i++)
 			vq->sw_ring[vq->vq_nentries + i] = &vq->fake_mbuf;
@@ -332,6 +338,24 @@ virtio_dev_vring_start(struct virtqueue *vq, int queue_type)
 		VIRTIO_WRITE_REG_4(vq->hw, VIRTIO_PCI_QUEUE_PFN,
 			vq->mz->phys_addr >> VIRTIO_PCI_QUEUE_ADDR_SHIFT);
 	} else if (queue_type == VTNET_TQ) {
+		if (use_simple_rxtx) {
+			int mid_idx  = vq->vq_nentries >> 1;
+			for (i = 0; i < mid_idx; i++) {
+				vq->vq_ring.avail->ring[i] = i + mid_idx;
+				vq->vq_ring.desc[i + mid_idx].next = i;
+				vq->vq_ring.desc[i + mid_idx].addr =
+					vq->virtio_net_hdr_mem +
+						mid_idx * vq->hw->vtnet_hdr_size;
+				vq->vq_ring.desc[i + mid_idx].len =
+					vq->hw->vtnet_hdr_size;
+				vq->vq_ring.desc[i + mid_idx].flags =
+					VRING_DESC_F_NEXT;
+				vq->vq_ring.desc[i].flags = 0;
+			}
+			for (i = mid_idx; i < vq->vq_nentries; i++)
+				vq->vq_ring.avail->ring[i] = i;
+		}
+
 		VIRTIO_WRITE_REG_2(vq->hw, VIRTIO_PCI_QUEUE_SEL,
 			vq->vq_queue_index);
 		VIRTIO_WRITE_REG_4(vq->hw, VIRTIO_PCI_QUEUE_PFN,
-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [dpdk-dev] [PATCH v4 4/7] virtio: fill RX avail ring with blank mbufs
  2015-10-22 12:09 ` [dpdk-dev] [PATCH v4 0/7] virtio ring layout optimization and simple rx/tx processing Huawei Xie
                     ` (2 preceding siblings ...)
  2015-10-22 12:09   ` [dpdk-dev] [PATCH v4 3/7] virtio: rx/tx ring layout optimization Huawei Xie
@ 2015-10-22 12:09   ` Huawei Xie
  2015-10-23  5:56     ` Tan, Jianfeng
  2015-10-22 12:09   ` [dpdk-dev] [PATCH v4 5/7] virtio: virtio vec rx Huawei Xie
                     ` (2 subsequent siblings)
  6 siblings, 1 reply; 92+ messages in thread
From: Huawei Xie @ 2015-10-22 12:09 UTC (permalink / raw)
  To: dev

Fill the avail ring with blank mbufs in virtio_dev_vring_start.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
---
 drivers/net/virtio/Makefile             |  2 +-
 drivers/net/virtio/virtio_rxtx.c        |  6 ++-
 drivers/net/virtio/virtio_rxtx.h        |  3 ++
 drivers/net/virtio/virtio_rxtx_simple.c | 84 +++++++++++++++++++++++++++++++++
 4 files changed, 92 insertions(+), 3 deletions(-)
 create mode 100644 drivers/net/virtio/virtio_rxtx_simple.c

diff --git a/drivers/net/virtio/Makefile b/drivers/net/virtio/Makefile
index 930b60f..43835ba 100644
--- a/drivers/net/virtio/Makefile
+++ b/drivers/net/virtio/Makefile
@@ -50,7 +50,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtqueue.c
 SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_pci.c
 SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx.c
 SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_ethdev.c
-
+SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx_simple.c
 
 # this lib depends upon:
 DEPDIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += lib/librte_eal lib/librte_ether
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 7c82a6a..5162ce6 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -320,8 +320,10 @@ virtio_dev_vring_start(struct virtqueue *vq, int queue_type)
 			/******************************************
 			*         Enqueue allocated buffers        *
 			*******************************************/
-			error = virtqueue_enqueue_recv_refill(vq, m);
-
+			if (use_simple_rxtx)
+				error = virtqueue_enqueue_recv_refill_simple(vq, m);
+			else
+				error = virtqueue_enqueue_recv_refill(vq, m);
 			if (error) {
 				rte_pktmbuf_free(m);
 				break;
diff --git a/drivers/net/virtio/virtio_rxtx.h b/drivers/net/virtio/virtio_rxtx.h
index a10aa69..7d2d8fe 100644
--- a/drivers/net/virtio/virtio_rxtx.h
+++ b/drivers/net/virtio/virtio_rxtx.h
@@ -32,3 +32,6 @@
  */
 
 #define RTE_PMD_VIRTIO_RX_MAX_BURST 64
+
+int virtqueue_enqueue_recv_refill_simple(struct virtqueue *vq,
+	struct rte_mbuf *m);
diff --git a/drivers/net/virtio/virtio_rxtx_simple.c b/drivers/net/virtio/virtio_rxtx_simple.c
new file mode 100644
index 0000000..cac5b9f
--- /dev/null
+++ b/drivers/net/virtio/virtio_rxtx_simple.c
@@ -0,0 +1,84 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <errno.h>
+
+#include <tmmintrin.h>
+
+#include <rte_cycles.h>
+#include <rte_memory.h>
+#include <rte_memzone.h>
+#include <rte_branch_prediction.h>
+#include <rte_mempool.h>
+#include <rte_malloc.h>
+#include <rte_mbuf.h>
+#include <rte_ether.h>
+#include <rte_ethdev.h>
+#include <rte_prefetch.h>
+#include <rte_string_fns.h>
+#include <rte_errno.h>
+#include <rte_byteorder.h>
+
+#include "virtio_logs.h"
+#include "virtio_ethdev.h"
+#include "virtqueue.h"
+#include "virtio_rxtx.h"
+
+int __attribute__((cold))
+virtqueue_enqueue_recv_refill_simple(struct virtqueue *vq,
+	struct rte_mbuf *cookie)
+{
+	struct vq_desc_extra *dxp;
+	struct vring_desc *start_dp;
+	uint16_t desc_idx;
+
+	desc_idx = vq->vq_avail_idx & (vq->vq_nentries - 1);
+	dxp = &vq->vq_descx[desc_idx];
+	dxp->cookie = (void *)cookie;
+	vq->sw_ring[desc_idx] = cookie;
+
+	start_dp = vq->vq_ring.desc;
+	start_dp[desc_idx].addr = (uint64_t)((uintptr_t)cookie->buf_physaddr +
+		RTE_PKTMBUF_HEADROOM - sizeof(struct virtio_net_hdr));
+	start_dp[desc_idx].len = cookie->buf_len -
+		RTE_PKTMBUF_HEADROOM + sizeof(struct virtio_net_hdr);
+
+	vq->vq_free_cnt--;
+	vq->vq_avail_idx++;
+
+	return 0;
+}
-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [dpdk-dev] [PATCH v4 5/7] virtio: virtio vec rx
  2015-10-22 12:09 ` [dpdk-dev] [PATCH v4 0/7] virtio ring layout optimization and simple rx/tx processing Huawei Xie
                     ` (3 preceding siblings ...)
  2015-10-22 12:09   ` [dpdk-dev] [PATCH v4 4/7] virtio: fill RX avail ring with blank mbufs Huawei Xie
@ 2015-10-22 12:09   ` Huawei Xie
  2015-10-22 12:09   ` [dpdk-dev] [PATCH v4 6/7] virtio: simple tx routine Huawei Xie
  2015-10-22 12:09   ` [dpdk-dev] [PATCH v4 7/7] virtio: pick simple rx/tx func Huawei Xie
  6 siblings, 0 replies; 92+ messages in thread
From: Huawei Xie @ 2015-10-22 12:09 UTC (permalink / raw)
  To: dev

With a fixed avail ring, we don't need to get the desc idx from the avail ring.
The virtio driver only has to deal with the desc ring.
This patch uses vector instructions to accelerate processing of the desc ring.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
---
 drivers/net/virtio/virtio_ethdev.h      |   2 +
 drivers/net/virtio/virtio_rxtx.c        |   3 +
 drivers/net/virtio/virtio_rxtx.h        |   2 +
 drivers/net/virtio/virtio_rxtx_simple.c | 224 ++++++++++++++++++++++++++++++++
 drivers/net/virtio/virtqueue.h          |   1 +
 5 files changed, 232 insertions(+)

diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h
index 9026d42..d7797ab 100644
--- a/drivers/net/virtio/virtio_ethdev.h
+++ b/drivers/net/virtio/virtio_ethdev.h
@@ -108,6 +108,8 @@ uint16_t virtio_recv_mergeable_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 uint16_t virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		uint16_t nb_pkts);
 
+uint16_t virtio_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
+		uint16_t nb_pkts);
 
 /*
  * The VIRTIO_NET_F_GUEST_TSO[46] features permit the host to send us
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 5162ce6..947fc46 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -432,6 +432,9 @@ virtio_dev_rx_queue_setup(struct rte_eth_dev *dev,
 	vq->mpool = mp;
 
 	dev->data->rx_queues[queue_idx] = vq;
+
+	virtio_rxq_vec_setup(vq);
+
 	return 0;
 }
 
diff --git a/drivers/net/virtio/virtio_rxtx.h b/drivers/net/virtio/virtio_rxtx.h
index 7d2d8fe..831e492 100644
--- a/drivers/net/virtio/virtio_rxtx.h
+++ b/drivers/net/virtio/virtio_rxtx.h
@@ -33,5 +33,7 @@
 
 #define RTE_PMD_VIRTIO_RX_MAX_BURST 64
 
+int virtio_rxq_vec_setup(struct virtqueue *rxq);
+
 int virtqueue_enqueue_recv_refill_simple(struct virtqueue *vq,
 	struct rte_mbuf *m);
diff --git a/drivers/net/virtio/virtio_rxtx_simple.c b/drivers/net/virtio/virtio_rxtx_simple.c
index cac5b9f..ef17562 100644
--- a/drivers/net/virtio/virtio_rxtx_simple.c
+++ b/drivers/net/virtio/virtio_rxtx_simple.c
@@ -58,6 +58,10 @@
 #include "virtqueue.h"
 #include "virtio_rxtx.h"
 
+#define RTE_VIRTIO_VPMD_RX_BURST 32
+#define RTE_VIRTIO_DESC_PER_LOOP 8
+#define RTE_VIRTIO_VPMD_RX_REARM_THRESH RTE_VIRTIO_VPMD_RX_BURST
+
 int __attribute__((cold))
 virtqueue_enqueue_recv_refill_simple(struct virtqueue *vq,
 	struct rte_mbuf *cookie)
@@ -82,3 +86,223 @@ virtqueue_enqueue_recv_refill_simple(struct virtqueue *vq,
 
 	return 0;
 }
+
+static inline void
+virtio_rxq_rearm_vec(struct virtqueue *rxvq)
+{
+	int i;
+	uint16_t desc_idx;
+	struct rte_mbuf **sw_ring;
+	struct vring_desc *start_dp;
+	int ret;
+
+	desc_idx = rxvq->vq_avail_idx & (rxvq->vq_nentries - 1);
+	sw_ring = &rxvq->sw_ring[desc_idx];
+	start_dp = &rxvq->vq_ring.desc[desc_idx];
+
+	ret = rte_mempool_get_bulk(rxvq->mpool, (void **)sw_ring,
+		RTE_VIRTIO_VPMD_RX_REARM_THRESH);
+	if (unlikely(ret)) {
+		rte_eth_devices[rxvq->port_id].data->rx_mbuf_alloc_failed +=
+			RTE_VIRTIO_VPMD_RX_REARM_THRESH;
+		return;
+	}
+
+	for (i = 0; i < RTE_VIRTIO_VPMD_RX_REARM_THRESH; i++) {
+		uintptr_t p;
+
+		p = (uintptr_t)&sw_ring[i]->rearm_data;
+		*(uint64_t *)p = rxvq->mbuf_initializer;
+
+		start_dp[i].addr =
+			(uint64_t)((uintptr_t)sw_ring[i]->buf_physaddr +
+			RTE_PKTMBUF_HEADROOM - sizeof(struct virtio_net_hdr));
+		start_dp[i].len = sw_ring[i]->buf_len -
+			RTE_PKTMBUF_HEADROOM + sizeof(struct virtio_net_hdr);
+	}
+
+	rxvq->vq_avail_idx += RTE_VIRTIO_VPMD_RX_REARM_THRESH;
+	rxvq->vq_free_cnt -= RTE_VIRTIO_VPMD_RX_REARM_THRESH;
+	vq_update_avail_idx(rxvq);
+}
+
+/* virtio vPMD receive routine, only accept(nb_pkts >= RTE_VIRTIO_DESC_PER_LOOP)
+ *
+ * This routine is for non-mergable RX, one desc for each guest buffer.
+ * This routine is based on the RX ring layout optimization. Each entry in the
+ * avail ring points to the desc with the same index in the desc ring and this
+ * will never be changed in the driver.
+ *
+ * - nb_pkts < RTE_VIRTIO_DESC_PER_LOOP, just return no packet
+ */
+uint16_t
+virtio_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
+	uint16_t nb_pkts)
+{
+	struct virtqueue *rxvq = rx_queue;
+	uint16_t nb_used;
+	uint16_t desc_idx;
+	struct vring_used_elem *rused;
+	struct rte_mbuf **sw_ring;
+	struct rte_mbuf **sw_ring_end;
+	uint16_t nb_pkts_received;
+	__m128i shuf_msk1, shuf_msk2, len_adjust;
+
+	shuf_msk1 = _mm_set_epi8(
+		0xFF, 0xFF, 0xFF, 0xFF,
+		0xFF, 0xFF,		/* vlan tci */
+		5, 4,			/* dat len */
+		0xFF, 0xFF, 5, 4,	/* pkt len */
+		0xFF, 0xFF, 0xFF, 0xFF	/* packet type */
+
+	);
+
+	shuf_msk2 = _mm_set_epi8(
+		0xFF, 0xFF, 0xFF, 0xFF,
+		0xFF, 0xFF,		/* vlan tci */
+		13, 12,			/* dat len */
+		0xFF, 0xFF, 13, 12,	/* pkt len */
+		0xFF, 0xFF, 0xFF, 0xFF	/* packet type */
+	);
+
+	/* Subtract the header length.
+	*  In which case do we need the header length in used->len ?
+	*/
+	len_adjust = _mm_set_epi16(
+		0, 0,
+		0,
+		(uint16_t) -sizeof(struct virtio_net_hdr),
+		0, (uint16_t) -sizeof(struct virtio_net_hdr),
+		0, 0);
+
+	if (unlikely(nb_pkts < RTE_VIRTIO_DESC_PER_LOOP))
+		return 0;
+
+	nb_used = *(volatile uint16_t *)&rxvq->vq_ring.used->idx -
+		rxvq->vq_used_cons_idx;
+
+	rte_compiler_barrier();
+
+	if (unlikely(nb_used == 0))
+		return 0;
+
+	nb_pkts = RTE_ALIGN_FLOOR(nb_pkts, RTE_VIRTIO_DESC_PER_LOOP);
+	nb_used = RTE_MIN(nb_used, nb_pkts);
+
+	desc_idx = (uint16_t)(rxvq->vq_used_cons_idx & (rxvq->vq_nentries - 1));
+	rused = &rxvq->vq_ring.used->ring[desc_idx];
+	sw_ring  = &rxvq->sw_ring[desc_idx];
+	sw_ring_end = &rxvq->sw_ring[rxvq->vq_nentries];
+
+	_mm_prefetch((const void *)rused, _MM_HINT_T0);
+
+	if (rxvq->vq_free_cnt >= RTE_VIRTIO_VPMD_RX_REARM_THRESH) {
+		virtio_rxq_rearm_vec(rxvq);
+		if (unlikely(virtqueue_kick_prepare(rxvq)))
+			virtqueue_notify(rxvq);
+	}
+
+	for (nb_pkts_received = 0;
+		nb_pkts_received < nb_used;) {
+		__m128i desc[RTE_VIRTIO_DESC_PER_LOOP / 2];
+		__m128i mbp[RTE_VIRTIO_DESC_PER_LOOP / 2];
+		__m128i pkt_mb[RTE_VIRTIO_DESC_PER_LOOP];
+
+		mbp[0] = _mm_loadu_si128((__m128i *)(sw_ring + 0));
+		desc[0] = _mm_loadu_si128((__m128i *)(rused + 0));
+		_mm_storeu_si128((__m128i *)&rx_pkts[0], mbp[0]);
+
+		mbp[1] = _mm_loadu_si128((__m128i *)(sw_ring + 2));
+		desc[1] = _mm_loadu_si128((__m128i *)(rused + 2));
+		_mm_storeu_si128((__m128i *)&rx_pkts[2], mbp[1]);
+
+		mbp[2] = _mm_loadu_si128((__m128i *)(sw_ring + 4));
+		desc[2] = _mm_loadu_si128((__m128i *)(rused + 4));
+		_mm_storeu_si128((__m128i *)&rx_pkts[4], mbp[2]);
+
+		mbp[3] = _mm_loadu_si128((__m128i *)(sw_ring + 6));
+		desc[3] = _mm_loadu_si128((__m128i *)(rused + 6));
+		_mm_storeu_si128((__m128i *)&rx_pkts[6], mbp[3]);
+
+		pkt_mb[1] = _mm_shuffle_epi8(desc[0], shuf_msk2);
+		pkt_mb[0] = _mm_shuffle_epi8(desc[0], shuf_msk1);
+		pkt_mb[1] = _mm_add_epi16(pkt_mb[1], len_adjust);
+		pkt_mb[0] = _mm_add_epi16(pkt_mb[0], len_adjust);
+		_mm_storeu_si128((void *)&rx_pkts[1]->rx_descriptor_fields1,
+			pkt_mb[1]);
+		_mm_storeu_si128((void *)&rx_pkts[0]->rx_descriptor_fields1,
+			pkt_mb[0]);
+
+		pkt_mb[3] = _mm_shuffle_epi8(desc[1], shuf_msk2);
+		pkt_mb[2] = _mm_shuffle_epi8(desc[1], shuf_msk1);
+		pkt_mb[3] = _mm_add_epi16(pkt_mb[3], len_adjust);
+		pkt_mb[2] = _mm_add_epi16(pkt_mb[2], len_adjust);
+		_mm_storeu_si128((void *)&rx_pkts[3]->rx_descriptor_fields1,
+			pkt_mb[3]);
+		_mm_storeu_si128((void *)&rx_pkts[2]->rx_descriptor_fields1,
+			pkt_mb[2]);
+
+		pkt_mb[5] = _mm_shuffle_epi8(desc[2], shuf_msk2);
+		pkt_mb[4] = _mm_shuffle_epi8(desc[2], shuf_msk1);
+		pkt_mb[5] = _mm_add_epi16(pkt_mb[5], len_adjust);
+		pkt_mb[4] = _mm_add_epi16(pkt_mb[4], len_adjust);
+		_mm_storeu_si128((void *)&rx_pkts[5]->rx_descriptor_fields1,
+			pkt_mb[5]);
+		_mm_storeu_si128((void *)&rx_pkts[4]->rx_descriptor_fields1,
+			pkt_mb[4]);
+
+		pkt_mb[7] = _mm_shuffle_epi8(desc[3], shuf_msk2);
+		pkt_mb[6] = _mm_shuffle_epi8(desc[3], shuf_msk1);
+		pkt_mb[7] = _mm_add_epi16(pkt_mb[7], len_adjust);
+		pkt_mb[6] = _mm_add_epi16(pkt_mb[6], len_adjust);
+		_mm_storeu_si128((void *)&rx_pkts[7]->rx_descriptor_fields1,
+			pkt_mb[7]);
+		_mm_storeu_si128((void *)&rx_pkts[6]->rx_descriptor_fields1,
+			pkt_mb[6]);
+
+		if (unlikely(nb_used <= RTE_VIRTIO_DESC_PER_LOOP)) {
+			if (sw_ring + nb_used <= sw_ring_end)
+				nb_pkts_received += nb_used;
+			else
+				nb_pkts_received += sw_ring_end - sw_ring;
+			break;
+		} else {
+			if (unlikely(sw_ring + RTE_VIRTIO_DESC_PER_LOOP >=
+				sw_ring_end)) {
+				nb_pkts_received += sw_ring_end - sw_ring;
+				break;
+			} else {
+				nb_pkts_received += RTE_VIRTIO_DESC_PER_LOOP;
+
+				rx_pkts += RTE_VIRTIO_DESC_PER_LOOP;
+				sw_ring += RTE_VIRTIO_DESC_PER_LOOP;
+				rused   += RTE_VIRTIO_DESC_PER_LOOP;
+				nb_used -= RTE_VIRTIO_DESC_PER_LOOP;
+			}
+		}
+	}
+
+	rxvq->vq_used_cons_idx += nb_pkts_received;
+	rxvq->vq_free_cnt += nb_pkts_received;
+	rxvq->packets += nb_pkts_received;
+	return nb_pkts_received;
+}
+
+int __attribute__((cold))
+virtio_rxq_vec_setup(struct virtqueue *rxq)
+{
+	uintptr_t p;
+	struct rte_mbuf mb_def = { .buf_addr = 0 }; /* zeroed mbuf */
+
+	mb_def.nb_segs = 1;
+	mb_def.data_off = RTE_PKTMBUF_HEADROOM;
+	mb_def.port = rxq->port_id;
+	rte_mbuf_refcnt_set(&mb_def, 1);
+
+	/* prevent compiler reordering: rearm_data covers previous fields */
+	rte_compiler_barrier();
+	p = (uintptr_t)&mb_def.rearm_data;
+	rxq->mbuf_initializer = *(uint64_t *)p;
+
+	return 0;
+}
diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h
index 6a1ec48..98a77d5 100644
--- a/drivers/net/virtio/virtqueue.h
+++ b/drivers/net/virtio/virtqueue.h
@@ -188,6 +188,7 @@ struct virtqueue {
 	 */
 	uint16_t vq_used_cons_idx;
 	uint16_t vq_avail_idx;
+	uint64_t mbuf_initializer; /**< value to init mbufs. */
 	phys_addr_t virtio_net_hdr_mem; /**< hdr for each xmit packet */
 
 	struct rte_mbuf **sw_ring; /**< RX software ring. */
-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [dpdk-dev] [PATCH v4 6/7] virtio: simple tx routine
  2015-10-22 12:09 ` [dpdk-dev] [PATCH v4 0/7] virtio ring layout optimization and simple rx/tx processing Huawei Xie
                     ` (4 preceding siblings ...)
  2015-10-22 12:09   ` [dpdk-dev] [PATCH v4 5/7] virtio: virtio vec rx Huawei Xie
@ 2015-10-22 12:09   ` Huawei Xie
  2015-10-22 16:57     ` Stephen Hemminger
  2015-10-22 12:09   ` [dpdk-dev] [PATCH v4 7/7] virtio: pick simple rx/tx func Huawei Xie
  6 siblings, 1 reply; 92+ messages in thread
From: Huawei Xie @ 2015-10-22 12:09 UTC (permalink / raw)
  To: dev

Changes in v4:
- move virtio_xmit_cleanup ahead to free descriptors earlier

Changes in v3:
- Remove return at the end of void function
- Remove always_inline attribute for virtio_xmit_cleanup

Bulk free mbufs when cleaning the used ring.
The shift operation on idx could be saved if vq_free_cnt meant
free slots rather than free descriptors.

TODO: rearrange the vq data structure and pack the stats variables together
so that one vector instruction can update all of them.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
---
 drivers/net/virtio/virtio_ethdev.h      |  3 ++
 drivers/net/virtio/virtio_rxtx_simple.c | 93 +++++++++++++++++++++++++++++++++
 2 files changed, 96 insertions(+)

diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h
index d7797ab..ae2d47d 100644
--- a/drivers/net/virtio/virtio_ethdev.h
+++ b/drivers/net/virtio/virtio_ethdev.h
@@ -111,6 +111,9 @@ uint16_t virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 uint16_t virtio_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
 		uint16_t nb_pkts);
 
+uint16_t virtio_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
+
 /*
  * The VIRTIO_NET_F_GUEST_TSO[46] features permit the host to send us
  * frames larger than 1514 bytes. We do not yet support software LRO
diff --git a/drivers/net/virtio/virtio_rxtx_simple.c b/drivers/net/virtio/virtio_rxtx_simple.c
index ef17562..79b4f7f 100644
--- a/drivers/net/virtio/virtio_rxtx_simple.c
+++ b/drivers/net/virtio/virtio_rxtx_simple.c
@@ -288,6 +288,99 @@ virtio_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
 	return nb_pkts_received;
 }
 
+#define VIRTIO_TX_FREE_THRESH 32
+#define VIRTIO_TX_MAX_FREE_BUF_SZ 32
+#define VIRTIO_TX_FREE_NR 32
+/* TODO: vq->tx_free_cnt could mean num of free slots so we could avoid shift */
+static inline void
+virtio_xmit_cleanup(struct virtqueue *vq)
+{
+	uint16_t i, desc_idx;
+	int nb_free = 0;
+	struct rte_mbuf *m, *free[VIRTIO_TX_MAX_FREE_BUF_SZ];
+
+	desc_idx = (uint16_t)(vq->vq_used_cons_idx &
+		((vq->vq_nentries >> 1) - 1));
+	free[0] = (struct rte_mbuf *)vq->vq_descx[desc_idx++].cookie;
+	nb_free = 1;
+
+	for (i = 1; i < VIRTIO_TX_FREE_NR; i++) {
+		m = (struct rte_mbuf *)vq->vq_descx[desc_idx++].cookie;
+		if (likely(m->pool == free[0]->pool))
+			free[nb_free++] = m;
+		else {
+			rte_mempool_put_bulk(free[0]->pool, (void **)free,
+				nb_free);
+			free[0] = m;
+			nb_free = 1;
+		}
+	}
+
+	rte_mempool_put_bulk(free[0]->pool, (void **)free, nb_free);
+	vq->vq_used_cons_idx += VIRTIO_TX_FREE_NR;
+	vq->vq_free_cnt += (VIRTIO_TX_FREE_NR << 1);
+}
+
+uint16_t
+virtio_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
+	uint16_t nb_pkts)
+{
+	struct virtqueue *txvq = tx_queue;
+	uint16_t nb_used;
+	uint16_t desc_idx;
+	struct vring_desc *start_dp;
+	uint16_t nb_tail, nb_commit;
+	int i;
+	uint16_t desc_idx_max = (txvq->vq_nentries >> 1) - 1;
+
+	nb_used = VIRTQUEUE_NUSED(txvq);
+	rte_compiler_barrier();
+
+	if (nb_used >= VIRTIO_TX_FREE_THRESH)
+		virtio_xmit_cleanup(tx_queue);
+
+	nb_commit = nb_pkts = RTE_MIN((txvq->vq_free_cnt >> 1), nb_pkts);
+	desc_idx = (uint16_t) (txvq->vq_avail_idx & desc_idx_max);
+	start_dp = txvq->vq_ring.desc;
+	nb_tail = (uint16_t) (desc_idx_max + 1 - desc_idx);
+
+	if (nb_commit >= nb_tail) {
+		for (i = 0; i < nb_tail; i++)
+			txvq->vq_descx[desc_idx + i].cookie = tx_pkts[i];
+		for (i = 0; i < nb_tail; i++) {
+			start_dp[desc_idx].addr =
+				RTE_MBUF_DATA_DMA_ADDR(*tx_pkts);
+			start_dp[desc_idx].len = (*tx_pkts)->pkt_len;
+			tx_pkts++;
+			desc_idx++;
+		}
+		nb_commit -= nb_tail;
+		desc_idx = 0;
+	}
+	for (i = 0; i < nb_commit; i++)
+		txvq->vq_descx[desc_idx + i].cookie = tx_pkts[i];
+	for (i = 0; i < nb_commit; i++) {
+		start_dp[desc_idx].addr = RTE_MBUF_DATA_DMA_ADDR(*tx_pkts);
+		start_dp[desc_idx].len = (*tx_pkts)->pkt_len;
+		tx_pkts++;
+		desc_idx++;
+	}
+
+	rte_compiler_barrier();
+
+	txvq->vq_free_cnt -= (uint16_t)(nb_pkts << 1);
+	txvq->vq_avail_idx += nb_pkts;
+	txvq->vq_ring.avail->idx = txvq->vq_avail_idx;
+	txvq->packets += nb_pkts;
+
+	if (likely(nb_pkts)) {
+		if (unlikely(virtqueue_kick_prepare(txvq)))
+			virtqueue_notify(txvq);
+	}
+
+	return nb_pkts;
+}
+
 int __attribute__((cold))
 virtio_rxq_vec_setup(struct virtqueue *rxq)
 {
-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [dpdk-dev] [PATCH v4 7/7] virtio: pick simple rx/tx func
  2015-10-22 12:09 ` [dpdk-dev] [PATCH v4 0/7] virtio ring layout optimization and simple rx/tx processing Huawei Xie
                     ` (5 preceding siblings ...)
  2015-10-22 12:09   ` [dpdk-dev] [PATCH v4 6/7] virtio: simple tx routine Huawei Xie
@ 2015-10-22 12:09   ` Huawei Xie
  2015-10-22 16:58     ` Stephen Hemminger
  6 siblings, 1 reply; 92+ messages in thread
From: Huawei Xie @ 2015-10-22 12:09 UTC (permalink / raw)
  To: dev

Changes in v4:
- Check the mergeable feature when selecting the simple rx/tx functions.

The simple rx/tx functions are chosen when mergeable rx is disabled and the
user specifies single segment and no offload support.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
---
 drivers/net/virtio/virtio_rxtx.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 947fc46..0f1daf2 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -53,6 +53,7 @@
 
 #include "virtio_logs.h"
 #include "virtio_ethdev.h"
+#include "virtio_pci.h"
 #include "virtqueue.h"
 #include "virtio_rxtx.h"
 
@@ -62,6 +63,10 @@
 #define  VIRTIO_DUMP_PACKET(m, len) do { } while (0)
 #endif
 
+
+#define VIRTIO_SIMPLE_FLAGS ((uint32_t)ETH_TXQ_FLAGS_NOMULTSEGS | \
+	ETH_TXQ_FLAGS_NOOFFLOADS)
+
 static int use_simple_rxtx;
 
 static void
@@ -459,6 +464,7 @@ virtio_dev_tx_queue_setup(struct rte_eth_dev *dev,
 			const struct rte_eth_txconf *tx_conf)
 {
 	uint8_t vtpci_queue_idx = 2 * queue_idx + VTNET_SQ_TQ_QUEUE_IDX;
+	struct virtio_hw *hw = dev->data->dev_private;
 	struct virtqueue *vq;
 	uint16_t tx_free_thresh;
 	int ret;
@@ -471,6 +477,15 @@ virtio_dev_tx_queue_setup(struct rte_eth_dev *dev,
 		return -EINVAL;
 	}
 
+	/* Use simple rx/tx func if single segment and no offloads */
+	if ((tx_conf->txq_flags & VIRTIO_SIMPLE_FLAGS) == VIRTIO_SIMPLE_FLAGS &&
+	     !vtpci_with_feature(hw, VIRTIO_NET_F_MRG_RXBUF)) {
+		PMD_INIT_LOG(INFO, "Using simple rx/tx path");
+		dev->tx_pkt_burst = virtio_xmit_pkts_simple;
+		dev->rx_pkt_burst = virtio_recv_pkts_vec;
+		use_simple_rxtx = 1;
+	}
+
 	ret = virtio_dev_queue_setup(dev, VTNET_TQ, queue_idx, vtpci_queue_idx,
 			nb_desc, socket_id, &vq);
 	if (ret < 0) {
-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [dpdk-dev] [PATCH v4 6/7] virtio: simple tx routine
  2015-10-22 12:09   ` [dpdk-dev] [PATCH v4 6/7] virtio: simple tx routine Huawei Xie
@ 2015-10-22 16:57     ` Stephen Hemminger
  2015-10-23  2:17       ` Xie, Huawei
  0 siblings, 1 reply; 92+ messages in thread
From: Stephen Hemminger @ 2015-10-22 16:57 UTC (permalink / raw)
  To: Huawei Xie; +Cc: dev

On Thu, 22 Oct 2015 20:09:50 +0800
Huawei Xie <huawei.xie@intel.com> wrote:

> Changes in v4:
> - move virtio_xmit_cleanup ahead to free descriptors earlier
> 
> Changes in v3:
> - Remove return at the end of void function
> - Remove always_inline attribute for virtio_xmit_cleanup
> bulk free of mbufs when clean used ring.
> shift operation of idx could be saved if vq_free_cnt means
> free slots rather than free descriptors.
> 
> TODO: rearrange vq data structure, pack the stats var together so that we
> could use one vec instruction to update all of them.
> 
> Signed-off-by: Huawei Xie <huawei.xie@intel.com>
> ---
>  drivers/net/virtio/virtio_ethdev.h      |  3 ++
>  drivers/net/virtio/virtio_rxtx_simple.c | 93 +++++++++++++++++++++++++++++++++
>  2 files changed, 96 insertions(+)
> 
> diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h
> index d7797ab..ae2d47d 100644
> --- a/drivers/net/virtio/virtio_ethdev.h
> +++ b/drivers/net/virtio/virtio_ethdev.h
> @@ -111,6 +111,9 @@ uint16_t virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>  uint16_t virtio_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
>  		uint16_t nb_pkts);
>  
> +uint16_t virtio_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
> +		uint16_t nb_pkts);
> +
>  /*
>   * The VIRTIO_NET_F_GUEST_TSO[46] features permit the host to send us
>   * frames larger than 1514 bytes. We do not yet support software LRO
> diff --git a/drivers/net/virtio/virtio_rxtx_simple.c b/drivers/net/virtio/virtio_rxtx_simple.c
> index ef17562..79b4f7f 100644
> --- a/drivers/net/virtio/virtio_rxtx_simple.c
> +++ b/drivers/net/virtio/virtio_rxtx_simple.c
> @@ -288,6 +288,99 @@ virtio_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
>  	return nb_pkts_received;
>  }
>  
> +#define VIRTIO_TX_FREE_THRESH 32
> +#define VIRTIO_TX_MAX_FREE_BUF_SZ 32
> +#define VIRTIO_TX_FREE_NR 32
> +/* TODO: vq->tx_free_cnt could mean num of free slots so we could avoid shift */
> +static inline void
> +virtio_xmit_cleanup(struct virtqueue *vq)
> +{
> +	uint16_t i, desc_idx;
> +	int nb_free = 0;
> +	struct rte_mbuf *m, *free[VIRTIO_TX_MAX_FREE_BUF_SZ];
> +
> +	desc_idx = (uint16_t)(vq->vq_used_cons_idx &
> +		((vq->vq_nentries >> 1) - 1));
> +	free[0] = (struct rte_mbuf *)vq->vq_descx[desc_idx++].cookie;
> +	nb_free = 1;
> +
> +	for (i = 1; i < VIRTIO_TX_FREE_NR; i++) {
> +		m = (struct rte_mbuf *)vq->vq_descx[desc_idx++].cookie;
> +		if (likely(m->pool == free[0]->pool))
> +			free[nb_free++] = m;
> +		else {
> +			rte_mempool_put_bulk(free[0]->pool, (void **)free,
> +				nb_free);
> +			free[0] = m;
> +			nb_free = 1;
> +		}
> +	}
> +
> +	rte_mempool_put_bulk(free[0]->pool, (void **)free, nb_free);
> +	vq->vq_used_cons_idx += VIRTIO_TX_FREE_NR;
> +	vq->vq_free_cnt += (VIRTIO_TX_FREE_NR << 1);
> +}

I think you need to handle refcount, here is a similar patch
for ixgbe.

Subject: ixgbe: speed up transmit

Coalesce transmit buffers and put them back into the pool
in one burst.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

--- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
@@ -120,12 +120,16 @@ rte_rxmbuf_alloc(struct rte_mempool *mp)
  * Check for descriptors with their DD bit set and free mbufs.
  * Return the total number of buffers freed.
  */
+#define TX_FREE_BULK 32
+
 static inline int __attribute__((always_inline))
 ixgbe_tx_free_bufs(struct ixgbe_tx_queue *txq)
 {
 	struct ixgbe_tx_entry *txep;
 	uint32_t status;
-	int i;
+	int i, n = 0;
+	struct rte_mempool *txpool = NULL;
+	struct rte_mbuf *free_list[TX_FREE_BULK];
 
 	/* check DD bit on threshold descriptor */
 	status = txq->tx_ring[txq->tx_next_dd].wb.status;
@@ -138,20 +142,26 @@ ixgbe_tx_free_bufs(struct ixgbe_tx_queue
 	 */
 	txep = &(txq->sw_ring[txq->tx_next_dd - (txq->tx_rs_thresh - 1)]);
 
-	/* free buffers one at a time */
-	if ((txq->txq_flags & (uint32_t)ETH_TXQ_FLAGS_NOREFCOUNT) != 0) {
-		for (i = 0; i < txq->tx_rs_thresh; ++i, ++txep) {
-			txep->mbuf->next = NULL;
-			rte_mempool_put(txep->mbuf->pool, txep->mbuf);
-			txep->mbuf = NULL;
-		}
-	} else {
-		for (i = 0; i < txq->tx_rs_thresh; ++i, ++txep) {
-			rte_pktmbuf_free_seg(txep->mbuf);
-			txep->mbuf = NULL;
+	for (i = 0; i < txq->tx_rs_thresh; ++i, ++txep) {
+		struct rte_mbuf *m;
+
+		/* free buffers one at a time */
+		m = __rte_pktmbuf_prefree_seg(txep->mbuf);
+		txep->mbuf = NULL;
+
+		if (n >= TX_FREE_BULK  ||
+		    (n > 0 && m->pool != txpool)) {
+			rte_mempool_put_bulk(txpool, (void **)free_list, n);
+			n = 0;
 		}
+
+		txpool = m->pool;
+		free_list[n++] = m;
 	}
 
+	if (n > 0)
+		rte_mempool_put_bulk(txpool, (void **)free_list, n);
+
 	/* buffers were freed, update counters */
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + txq->tx_rs_thresh);
 	txq->tx_next_dd = (uint16_t)(txq->tx_next_dd + txq->tx_rs_thresh);

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [dpdk-dev] [PATCH v4 7/7] virtio: pick simple rx/tx func
  2015-10-22 12:09   ` [dpdk-dev] [PATCH v4 7/7] virtio: pick simple rx/tx func Huawei Xie
@ 2015-10-22 16:58     ` Stephen Hemminger
  2015-10-23  1:38       ` Xie, Huawei
  0 siblings, 1 reply; 92+ messages in thread
From: Stephen Hemminger @ 2015-10-22 16:58 UTC (permalink / raw)
  To: Huawei Xie; +Cc: dev

On Thu, 22 Oct 2015 20:09:51 +0800
Huawei Xie <huawei.xie@intel.com> wrote:

> +	/* Use simple rx/tx func if single segment and no offloads */
> +	if ((tx_conf->txq_flags & VIRTIO_SIMPLE_FLAGS) == VIRTIO_SIMPLE_FLAGS &&
> +	     !vtpci_with_feature(hw, VIRTIO_NET_F_MRG_RXBUF))

Since with QEMU/KVM the code will negotiate to use MRG_RXBUF, this
code path will not get used in common case anyway.

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [dpdk-dev] [PATCH v4 7/7] virtio: pick simple rx/tx func
  2015-10-22 16:58     ` Stephen Hemminger
@ 2015-10-23  1:38       ` Xie, Huawei
  0 siblings, 0 replies; 92+ messages in thread
From: Xie, Huawei @ 2015-10-23  1:38 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev

On 10/23/2015 12:59 AM, Stephen Hemminger wrote:
> On Thu, 22 Oct 2015 20:09:51 +0800
> Huawei Xie <huawei.xie@intel.com> wrote:
>
>> +	/* Use simple rx/tx func if single segment and no offloads */
>> +	if ((tx_conf->txq_flags & VIRTIO_SIMPLE_FLAGS) == VIRTIO_SIMPLE_FLAGS &&
>> +	     !vtpci_with_feature(hw, VIRTIO_NET_F_MRG_RXBUF))
> Since with QEMU/KVM the code will negotiate to use MRG_RXBUF, this
> code path will not get used in common case anyway.
>
Yes, the common configuration has mergeable buffers enabled.
For the non-mergeable case, we need to either add mrg_rxbuf=off (in the qemu
command line) or disable the mergeable feature (in the switch application)
through the vhost API.
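
The selection rule discussed above can be sketched in isolation as follows.
This is only an illustration, not the driver code: every SKETCH_* constant
and sketch_* function is a hypothetical stand-in, and the numeric values are
assumed for the example rather than taken from rte_ethdev.h or virtio_pci.h.
It shows that the simple path needs all of the "no multi-segment / no
offload" bits set and the mergeable RX feature not negotiated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in values for illustration only; the real definitions live in the
 * DPDK headers and may differ. */
#define SKETCH_TXQ_NOMULTSEGS 0x0001u
#define SKETCH_TXQ_NOOFFLOADS 0x0F00u
#define SKETCH_SIMPLE_FLAGS   (SKETCH_TXQ_NOMULTSEGS | SKETCH_TXQ_NOOFFLOADS)
#define SKETCH_F_MRG_RXBUF    (1u << 15)

/* True only when every simple-path flag is set in txq_flags and the
 * negotiated features do not include mergeable RX buffers. */
static bool
sketch_use_simple_rxtx(uint32_t txq_flags, uint32_t guest_features)
{
	return (txq_flags & SKETCH_SIMPLE_FLAGS) == SKETCH_SIMPLE_FLAGS &&
		(guest_features & SKETCH_F_MRG_RXBUF) == 0;
}

/* Usage: the three situations mentioned in this thread. */
static void
sketch_selection_examples(void)
{
	/* mrg_rxbuf=off and all flags set: simple path is picked */
	assert(sketch_use_simple_rxtx(SKETCH_SIMPLE_FLAGS, 0));
	/* default QEMU/KVM negotiation (mergeable on): normal path */
	assert(!sketch_use_simple_rxtx(SKETCH_SIMPLE_FLAGS, SKETCH_F_MRG_RXBUF));
	/* offloads still requested: normal path */
	assert(!sketch_use_simple_rxtx(SKETCH_TXQ_NOMULTSEGS, 0));
}
```

Comparing the masked flags against the full mask (rather than testing for
non-zero) is what makes a partially set flag word fall back to the normal
path.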



^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [dpdk-dev] [PATCH v4 6/7] virtio: simple tx routine
  2015-10-22 16:57     ` Stephen Hemminger
@ 2015-10-23  2:17       ` Xie, Huawei
  2015-10-23  2:20         ` Xie, Huawei
  0 siblings, 1 reply; 92+ messages in thread
From: Xie, Huawei @ 2015-10-23  2:17 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev

On 10/23/2015 12:57 AM, Stephen Hemminger wrote:
> On Thu, 22 Oct 2015 20:09:50 +0800
> Huawei Xie <huawei.xie@intel.com> wrote:
>
>> Changes in v4:
>> - move virtio_xmit_cleanup ahead to free descriptors earlier
>>
>> Changes in v3:
>> - Remove return at the end of void function
>> - Remove always_inline attribute for virtio_xmit_cleanup
>> bulk free of mbufs when clean used ring.
>> shift operation of idx could be saved if vq_free_cnt means
>> free slots rather than free descriptors.
>>
>> TODO: rearrange vq data structure, pack the stats var together so that we
>> could use one vec instruction to update all of them.
>>
>> Signed-off-by: Huawei Xie <huawei.xie@intel.com>
>> ---
>>  drivers/net/virtio/virtio_ethdev.h      |  3 ++
>>  drivers/net/virtio/virtio_rxtx_simple.c | 93 +++++++++++++++++++++++++++++++++
>>  2 files changed, 96 insertions(+)
>>
>> diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h
>> index d7797ab..ae2d47d 100644
>> --- a/drivers/net/virtio/virtio_ethdev.h
>> +++ b/drivers/net/virtio/virtio_ethdev.h
>> @@ -111,6 +111,9 @@ uint16_t virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>>  uint16_t virtio_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
>>  		uint16_t nb_pkts);
>>  
>> +uint16_t virtio_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
>> +		uint16_t nb_pkts);
>> +
>>  /*
>>   * The VIRTIO_NET_F_GUEST_TSO[46] features permit the host to send us
>>   * frames larger than 1514 bytes. We do not yet support software LRO
>> diff --git a/drivers/net/virtio/virtio_rxtx_simple.c b/drivers/net/virtio/virtio_rxtx_simple.c
>> index ef17562..79b4f7f 100644
>> --- a/drivers/net/virtio/virtio_rxtx_simple.c
>> +++ b/drivers/net/virtio/virtio_rxtx_simple.c
>> @@ -288,6 +288,99 @@ virtio_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
>>  	return nb_pkts_received;
>>  }
>>  
>> +#define VIRTIO_TX_FREE_THRESH 32
>> +#define VIRTIO_TX_MAX_FREE_BUF_SZ 32
>> +#define VIRTIO_TX_FREE_NR 32
>> +/* TODO: vq->tx_free_cnt could mean num of free slots so we could avoid shift */
>> +static inline void
>> +virtio_xmit_cleanup(struct virtqueue *vq)
>> +{
>> +	uint16_t i, desc_idx;
>> +	int nb_free = 0;
>> +	struct rte_mbuf *m, *free[VIRTIO_TX_MAX_FREE_BUF_SZ];
>> +
>> +	desc_idx = (uint16_t)(vq->vq_used_cons_idx &
>> +		((vq->vq_nentries >> 1) - 1));
>> +	free[0] = (struct rte_mbuf *)vq->vq_descx[desc_idx++].cookie;
>> +	nb_free = 1;
>> +
>> +	for (i = 1; i < VIRTIO_TX_FREE_NR; i++) {
>> +		m = (struct rte_mbuf *)vq->vq_descx[desc_idx++].cookie;
>> +		if (likely(m->pool == free[0]->pool))
>> +			free[nb_free++] = m;
>> +		else {
>> +			rte_mempool_put_bulk(free[0]->pool, (void **)free,
>> +				nb_free);
>> +			free[0] = m;
>> +			nb_free = 1;
>> +		}
>> +	}
>> +
>> +	rte_mempool_put_bulk(free[0]->pool, (void **)free, nb_free);
>> +	vq->vq_used_cons_idx += VIRTIO_TX_FREE_NR;
>> +	vq->vq_free_cnt += (VIRTIO_TX_FREE_NR << 1);
>> +}
> I think you need to handle refcount, here is a similar patch
> for ixgbe.
ok, like this:

m = (struct rte_mbuf *)vq->vq_descx[desc_idx++].cookie;
if (likely(m != NULL)) {
    ...

>
> Subject: ixgbe: speed up transmit
>
> Coalesce transmit buffers and put them back into the pool
> in one burst.
>
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
>
> --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> @@ -120,12 +120,16 @@ rte_rxmbuf_alloc(struct rte_mempool *mp)
>   * Check for descriptors with their DD bit set and free mbufs.
>   * Return the total number of buffers freed.
>   */
> +#define TX_FREE_BULK 32
> +
>  static inline int __attribute__((always_inline))
>  ixgbe_tx_free_bufs(struct ixgbe_tx_queue *txq)
>  {
>  	struct ixgbe_tx_entry *txep;
>  	uint32_t status;
> -	int i;
> +	int i, n = 0;
> +	struct rte_mempool *txpool = NULL;
> +	struct rte_mbuf *free_list[TX_FREE_BULK];
>  
>  	/* check DD bit on threshold descriptor */
>  	status = txq->tx_ring[txq->tx_next_dd].wb.status;
> @@ -138,20 +142,26 @@ ixgbe_tx_free_bufs(struct ixgbe_tx_queue
>  	 */
>  	txep = &(txq->sw_ring[txq->tx_next_dd - (txq->tx_rs_thresh - 1)]);
>  
> -	/* free buffers one at a time */
> -	if ((txq->txq_flags & (uint32_t)ETH_TXQ_FLAGS_NOREFCOUNT) != 0) {
> -		for (i = 0; i < txq->tx_rs_thresh; ++i, ++txep) {
> -			txep->mbuf->next = NULL;
> -			rte_mempool_put(txep->mbuf->pool, txep->mbuf);
> -			txep->mbuf = NULL;
> -		}
> -	} else {
> -		for (i = 0; i < txq->tx_rs_thresh; ++i, ++txep) {
> -			rte_pktmbuf_free_seg(txep->mbuf);
> -			txep->mbuf = NULL;
> +	for (i = 0; i < txq->tx_rs_thresh; ++i, ++txep) {
> +		struct rte_mbuf *m;
> +
> +		/* free buffers one at a time */
> +		m = __rte_pktmbuf_prefree_seg(txep->mbuf);
> +		txep->mbuf = NULL;
> +
> +		if (n >= TX_FREE_BULK  ||
check whether m is NULL here.
> +		    (n > 0 && m->pool != txpool)) {
> +			rte_mempool_put_bulk(txpool, (void **)free_list, n);
> +			n = 0;
>  		}
> +
> +		txpool = m->pool;
> +		free_list[n++] = m;
>  	}
>  
> +	if (n > 0)
> +		rte_mempool_put_bulk(txpool, (void **)free_list, n);
> +
>  	/* buffers were freed, update counters */
>  	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + txq->tx_rs_thresh);
>  	txq->tx_next_dd = (uint16_t)(txq->tx_next_dd + txq->tx_rs_thresh);
>
>


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [dpdk-dev] [PATCH v4 6/7] virtio: simple tx routine
  2015-10-23  2:17       ` Xie, Huawei
@ 2015-10-23  2:20         ` Xie, Huawei
  0 siblings, 0 replies; 92+ messages in thread
From: Xie, Huawei @ 2015-10-23  2:20 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev

On 10/23/2015 10:17 AM, Xie, Huawei wrote:
> On 10/23/2015 12:57 AM, Stephen Hemminger wrote:
>> On Thu, 22 Oct 2015 20:09:50 +0800
>> Huawei Xie <huawei.xie@intel.com> wrote:
>>
>>> Changes in v4:
>>> - move virtio_xmit_cleanup ahead to free descriptors earlier
>>>
>>> Changes in v3:
>>> - Remove return at the end of void function
>>> - Remove always_inline attribute for virtio_xmit_cleanup
>>> bulk free of mbufs when clean used ring.
>>> shift operation of idx could be saved if vq_free_cnt means
>>> free slots rather than free descriptors.
>>>
>>> TODO: rearrange vq data structure, pack the stats var together so that we
>>> could use one vec instruction to update all of them.
>>>
>>> Signed-off-by: Huawei Xie <huawei.xie@intel.com>
>>> ---
>>>  drivers/net/virtio/virtio_ethdev.h      |  3 ++
>>>  drivers/net/virtio/virtio_rxtx_simple.c | 93 +++++++++++++++++++++++++++++++++
>>>  2 files changed, 96 insertions(+)
>>>
>>> diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h
>>> index d7797ab..ae2d47d 100644
>>> --- a/drivers/net/virtio/virtio_ethdev.h
>>> +++ b/drivers/net/virtio/virtio_ethdev.h
>>> @@ -111,6 +111,9 @@ uint16_t virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>>>  uint16_t virtio_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
>>>  		uint16_t nb_pkts);
>>>  
>>> +uint16_t virtio_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
>>> +		uint16_t nb_pkts);
>>> +
>>>  /*
>>>   * The VIRTIO_NET_F_GUEST_TSO[46] features permit the host to send us
>>>   * frames larger than 1514 bytes. We do not yet support software LRO
>>> diff --git a/drivers/net/virtio/virtio_rxtx_simple.c b/drivers/net/virtio/virtio_rxtx_simple.c
>>> index ef17562..79b4f7f 100644
>>> --- a/drivers/net/virtio/virtio_rxtx_simple.c
>>> +++ b/drivers/net/virtio/virtio_rxtx_simple.c
>>> @@ -288,6 +288,99 @@ virtio_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
>>>  	return nb_pkts_received;
>>>  }
>>>  
>>> +#define VIRTIO_TX_FREE_THRESH 32
>>> +#define VIRTIO_TX_MAX_FREE_BUF_SZ 32
>>> +#define VIRTIO_TX_FREE_NR 32
>>> +/* TODO: vq->tx_free_cnt could mean num of free slots so we could avoid shift */
>>> +static inline void
>>> +virtio_xmit_cleanup(struct virtqueue *vq)
>>> +{
>>> +	uint16_t i, desc_idx;
>>> +	int nb_free = 0;
>>> +	struct rte_mbuf *m, *free[VIRTIO_TX_MAX_FREE_BUF_SZ];
>>> +
>>> +	desc_idx = (uint16_t)(vq->vq_used_cons_idx &
>>> +		((vq->vq_nentries >> 1) - 1));
>>> +	free[0] = (struct rte_mbuf *)vq->vq_descx[desc_idx++].cookie;
>>> +	nb_free = 1;
>>> +
>>> +	for (i = 1; i < VIRTIO_TX_FREE_NR; i++) {
>>> +		m = (struct rte_mbuf *)vq->vq_descx[desc_idx++].cookie;
>>> +		if (likely(m->pool == free[0]->pool))
>>> +			free[nb_free++] = m;
>>> +		else {
>>> +			rte_mempool_put_bulk(free[0]->pool, (void **)free,
>>> +				nb_free);
>>> +			free[0] = m;
>>> +			nb_free = 1;
>>> +		}
>>> +	}
>>> +
>>> +	rte_mempool_put_bulk(free[0]->pool, (void **)free, nb_free);
>>> +	vq->vq_used_cons_idx += VIRTIO_TX_FREE_NR;
>>> +	vq->vq_free_cnt += (VIRTIO_TX_FREE_NR << 1);
>>> +}
>> I think you need to handle refcount, here is a similar patch
>> for ixgbe.
> ok, like this:
>
> m = (struct rte_mbuf *)vq->vq_descx[desc_idx++].cookie;
Missed a line
  m = __rte_pktmbuf_prefree_seg(m)
> if (likely(m != NULL)) {
>     ...
>
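
Put together, the refcount-aware cleanup being discussed might look roughly
like the sketch below. It is self-contained and uses mock types: sketch_pool,
sketch_mbuf, sketch_prefree() and sketch_put_bulk() are hypothetical
stand-ins for the mempool, the mbuf, __rte_pktmbuf_prefree_seg() and
rte_mempool_put_bulk(), simplified for illustration. It is not the code that
will go into the next version of the patch.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_FREE_NR 32

/* Mock pool: only counts how many objects were returned to it. */
struct sketch_pool { unsigned int nb_returned; };
/* Mock mbuf: owning pool plus a plain reference counter. */
struct sketch_mbuf { struct sketch_pool *pool; int refcnt; };

/* Simplified stand-in for the prefree step: drop one reference and return
 * the mbuf only if that was the last one, otherwise NULL. */
static struct sketch_mbuf *
sketch_prefree(struct sketch_mbuf *m)
{
	return (--m->refcnt == 0) ? m : NULL;
}

/* Stand-in for a bulk put: records the number of objects returned. */
static void
sketch_put_bulk(struct sketch_pool *p, struct sketch_mbuf **objs,
	unsigned int n)
{
	(void)objs;
	p->nb_returned += n;
}

/* Release nb transmitted mbufs, batching consecutive mbufs that belong to
 * the same pool. Returns the number of mbufs actually given back. */
static unsigned int
sketch_xmit_cleanup(struct sketch_mbuf **cookies, unsigned int nb)
{
	struct sketch_mbuf *free_list[SKETCH_FREE_NR];
	unsigned int i, nb_free = 0, total = 0;

	assert(nb <= SKETCH_FREE_NR);
	for (i = 0; i < nb; i++) {
		struct sketch_mbuf *m = sketch_prefree(cookies[i]);

		cookies[i] = NULL;
		if (m == NULL)
			continue; /* still referenced elsewhere, do not free */
		/* flush the batch when the pool changes */
		if (nb_free > 0 && m->pool != free_list[0]->pool) {
			sketch_put_bulk(free_list[0]->pool, free_list, nb_free);
			total += nb_free;
			nb_free = 0;
		}
		free_list[nb_free++] = m;
	}
	if (nb_free > 0) {
		sketch_put_bulk(free_list[0]->pool, free_list, nb_free);
		total += nb_free;
	}
	return total;
}
```

The NULL check comes before the pool comparison, so an mbuf that is still
referenced is skipped without touching m->pool, and the final flush is
guarded in case every mbuf in the batch was skipped.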
>> Subject: ixgbe: speed up transmit
>>
>> Coalesce transmit buffers and put them back into the pool
>> in one burst.
>>
>> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
>>
>> --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
>> +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
>> @@ -120,12 +120,16 @@ rte_rxmbuf_alloc(struct rte_mempool *mp)
>>   * Check for descriptors with their DD bit set and free mbufs.
>>   * Return the total number of buffers freed.
>>   */
>> +#define TX_FREE_BULK 32
>> +
>>  static inline int __attribute__((always_inline))
>>  ixgbe_tx_free_bufs(struct ixgbe_tx_queue *txq)
>>  {
>>  	struct ixgbe_tx_entry *txep;
>>  	uint32_t status;
>> -	int i;
>> +	int i, n = 0;
>> +	struct rte_mempool *txpool = NULL;
>> +	struct rte_mbuf *free_list[TX_FREE_BULK];
>>  
>>  	/* check DD bit on threshold descriptor */
>>  	status = txq->tx_ring[txq->tx_next_dd].wb.status;
>> @@ -138,20 +142,26 @@ ixgbe_tx_free_bufs(struct ixgbe_tx_queue
>>  	 */
>>  	txep = &(txq->sw_ring[txq->tx_next_dd - (txq->tx_rs_thresh - 1)]);
>>  
>> -	/* free buffers one at a time */
>> -	if ((txq->txq_flags & (uint32_t)ETH_TXQ_FLAGS_NOREFCOUNT) != 0) {
>> -		for (i = 0; i < txq->tx_rs_thresh; ++i, ++txep) {
>> -			txep->mbuf->next = NULL;
>> -			rte_mempool_put(txep->mbuf->pool, txep->mbuf);
>> -			txep->mbuf = NULL;
>> -		}
>> -	} else {
>> -		for (i = 0; i < txq->tx_rs_thresh; ++i, ++txep) {
>> -			rte_pktmbuf_free_seg(txep->mbuf);
>> -			txep->mbuf = NULL;
>> +	for (i = 0; i < txq->tx_rs_thresh; ++i, ++txep) {
>> +		struct rte_mbuf *m;
>> +
>> +		/* free buffers one at a time */
>> +		m = __rte_pktmbuf_prefree_seg(txep->mbuf);
>> +		txep->mbuf = NULL;
>> +
>> +		if (n >= TX_FREE_BULK  ||
> check whether m is NULL here.
>> +		    (n > 0 && m->pool != txpool)) {
>> +			rte_mempool_put_bulk(txpool, (void **)free_list, n);
>> +			n = 0;
>>  		}
>> +
>> +		txpool = m->pool;
>> +		free_list[n++] = m;
>>  	}
>>  
>> +	if (n > 0)
>> +		rte_mempool_put_bulk(txpool, (void **)free_list, n);
>> +
>>  	/* buffers were freed, update counters */
>>  	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + txq->tx_rs_thresh);
>>  	txq->tx_next_dd = (uint16_t)(txq->tx_next_dd + txq->tx_rs_thresh);
>>
>>
>
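
Putting the two comments above together (call the prefree helper, then check
for NULL before touching m->pool), the free loop could look roughly like the
sketch below. It is a self-contained illustration only, not the driver code:
demo_buf, demo_pool, demo_prefree and demo_put_bulk are hypothetical,
simplified stand-ins for rte_mbuf, rte_mempool, __rte_pktmbuf_prefree_seg and
rte_mempool_put_bulk.

```c
#include <stddef.h>

#define DEMO_FREE_BULK 32	/* same batch size as TX_FREE_BULK in the quoted patch */

/* Hypothetical stand-ins for rte_mempool and rte_mbuf. */
struct demo_pool { unsigned int nb_returned; };
struct demo_buf { struct demo_pool *pool; int refcnt; };

/* Simplified stand-in for __rte_pktmbuf_prefree_seg(): drop one reference and
 * return the buffer only if that was the last one, otherwise return NULL. */
static struct demo_buf *
demo_prefree(struct demo_buf *b)
{
	if (--b->refcnt == 0)
		return b;
	return NULL;
}

/* Stand-in for rte_mempool_put_bulk(): only counts what is handed back. */
static void
demo_put_bulk(struct demo_pool *p, struct demo_buf **bufs, unsigned int n)
{
	(void)bufs;
	p->nb_returned += n;
}

/* Free nb buffers, batching consecutive buffers that share a pool. */
static void
demo_free_bufs(struct demo_buf **bufs, unsigned int nb)
{
	struct demo_buf *free_list[DEMO_FREE_BULK];
	struct demo_pool *pool = NULL;
	unsigned int i, n = 0;

	for (i = 0; i < nb; i++) {
		struct demo_buf *m = demo_prefree(bufs[i]);

		if (m == NULL)
			continue;	/* still referenced elsewhere, skip it */

		/* flush when the batch is full or the pool changes */
		if (n >= DEMO_FREE_BULK || (n > 0 && m->pool != pool)) {
			demo_put_bulk(pool, free_list, n);
			n = 0;
		}
		pool = m->pool;
		free_list[n++] = m;
	}

	if (n > 0)
		demo_put_bulk(pool, free_list, n);
}
```

The NULL check has to come before the pool comparison, because a buffer that
is still referenced must neither be dereferenced for its pool nor be put back.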


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [dpdk-dev] [PATCH v4 4/7] virtio: fill RX avail ring with blank mbufs
  2015-10-22 12:09   ` [dpdk-dev] [PATCH v4 4/7] virtio: fill RX avail ring with blank mbufs Huawei Xie
@ 2015-10-23  5:56     ` Tan, Jianfeng
  2015-10-25 15:40       ` Xie, Huawei
  0 siblings, 1 reply; 92+ messages in thread
From: Tan, Jianfeng @ 2015-10-23  5:56 UTC (permalink / raw)
  To: Xie, Huawei, dev

On 10/23/2015 1:51 PM, Jianfeng wrote:

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Huawei Xie
> Sent: Thursday, October 22, 2015 8:10 PM
> To: dev@dpdk.org
> Subject: [dpdk-dev] [PATCH v4 4/7] virtio: fill RX avail ring with blank mbufs
> +int __attribute__((cold))
> +virtqueue_enqueue_recv_refill_simple(struct virtqueue *vq,
> +	struct rte_mbuf *cookie)
> +{
> +	struct vq_desc_extra *dxp;
> +	struct vring_desc *start_dp;
> +	uint16_t desc_idx;
> +
> +	desc_idx = vq->vq_avail_idx & (vq->vq_nentries - 1);
> +	dxp = &vq->vq_descx[desc_idx];
> +	dxp->cookie = (void *)cookie;
> +	vq->sw_ring[desc_idx] = cookie;
> +
> +	start_dp = vq->vq_ring.desc;
> +	start_dp[desc_idx].addr = (uint64_t)((uintptr_t)cookie->buf_physaddr +
> +		RTE_PKTMBUF_HEADROOM - sizeof(struct virtio_net_hdr));

Please use RTE_MBUF_DATA_DMA_ADDR instead of "buf_physaddr + RTE_PKTMBUF_HEADROOM".

> +	start_dp[desc_idx].len = cookie->buf_len -
> +		RTE_PKTMBUF_HEADROOM + sizeof(struct virtio_net_hdr);
> +
> +	vq->vq_free_cnt--;
> +	vq->vq_avail_idx++;
> +
> +	return 0;
> +}
> --
> 1.8.1.4

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [dpdk-dev] [PATCH v5 0/7] virtio ring layout optimization and simple rx/tx processing
  2015-09-29 14:45 [dpdk-dev] [PATCH 0/8] virtio: virtio ring layout optimization and RX vector processing Huawei Xie
                   ` (12 preceding siblings ...)
  2015-10-22 12:09 ` [dpdk-dev] [PATCH v4 0/7] virtio ring layout optimization and simple rx/tx processing Huawei Xie
@ 2015-10-25 15:34 ` Huawei Xie
  2015-10-25 15:34   ` [dpdk-dev] [PATCH v5 1/7] virtio: add virtio_rxtx.h header file Huawei Xie
                     ` (7 more replies)
  2015-10-29 14:53 ` [dpdk-dev] [PATCH v6 0/8] " Huawei Xie
  14 siblings, 8 replies; 92+ messages in thread
From: Huawei Xie @ 2015-10-25 15:34 UTC (permalink / raw)
  To: dev

Changes in v5:
- Call __rte_pktmbuf_prefree_seg to check refcnt when freeing mbufs

Changes in v4:
- Fix the error in virtio tx ring layout ascii chart in the commit message
- Move virtio_xmit_cleanup ahead to free descriptors earlier
- Test the mergeable feature when selecting simple rx/tx functions

Changes in v3:
- Remove unnecessary NULL test for rte_free
- Remove unnecessary assign of local var after free
- Remove return at the end of void function
- Remove always_inline attribute for virtio_xmit_cleanup
- Reword some commit messages
- Add TODO in the commit message of simple tx patch

Changes in v2:
- Remove the configure macro
- Enable simple RX/TX processing when user specifies simple txq flags
- Reword some comments and commit messages

In a DPDK based switching environment, vhost mostly runs on a dedicated core
while virtio processing in guest VMs runs on different cores.
Take RX for example. With the generic implementation, for each guest buffer:
a) the virtio driver allocates a descriptor from the free descriptor list
b) it modifies the avail ring entry to point to the allocated descriptor
c) after the packet is received, it frees the descriptor

When vhost fetches the avail ring, it needs to fetch the modified L1 cache line
from the virtio core, which is a heavy cost in current CPU implementations.

The idea of this optimization is:
    allocate a fixed descriptor for each entry of the avail ring, so the avail
ring stays the same during the whole run.
This removes the transfer of the modified L1 cache line from the virtio core to
the vhost core for the avail ring.
(Note that we cannot avoid the cache transfer for descriptors.)
Besides, descriptor allocation and free operations are eliminated.
This also makes vector processing possible, which further accelerates the processing.

This is the layout for the avail ring (taking 256 ring entries as an example), with
each entry pointing to the descriptor with the same index.
                    avail
                    idx
                    +
                    |
+----+----+---+-------------+------+
| 0  | 1  | 2 | ... |  254  | 255  |  avail ring
+-+--+-+--+-+-+---------+---+--+---+
  |    |    |       |   |      |
  |    |    |       |   |      |
  v    v    v       |   v      v
+-+--+-+--+-+-+---------+---+--+---+
| 0  | 1  | 2 | ... |  254  | 255  |  desc ring
+----+----+---+-------------+------+
                    |
                    |
+----+----+---+-------------+------+
| 0  | 1  | 2 |     |  254  | 255  |  used ring
+----+----+---+-------------+------+
                    |
                    +
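
As a rough illustration of the RX layout above, the sketch below fills the
avail ring once with the identity mapping and afterwards derives the descriptor
index from the running avail index alone. The names demo_avail, demo_avail_init
and demo_next_desc are made up for this example and are not the driver's types
or functions; the ring size of 256 is the example size from the chart.

```c
#include <stdint.h>

#define RING_SIZE 256	/* example size from the chart; must be a power of two */

/* Hypothetical stand-in for the avail ring, not the driver's struct. */
struct demo_avail {
	uint16_t idx;			/* free-running producer index */
	uint16_t ring[RING_SIZE];	/* ring[i] holds a descriptor index */
};

/* One-time setup: entry i always names descriptor i. */
static void
demo_avail_init(struct demo_avail *avail)
{
	uint16_t i;

	avail->idx = 0;
	for (i = 0; i < RING_SIZE; i++)
		avail->ring[i] = i;
}

/* Refill path: no descriptor allocation, no write to ring[]. */
static uint16_t
demo_next_desc(struct demo_avail *avail)
{
	uint16_t desc_idx = avail->idx & (RING_SIZE - 1);

	avail->idx++;	/* only idx changes after setup */
	return desc_idx;
}
```

Since ring[] is written only during setup, the cache lines holding it are not
dirtied again by the virtio core at run time.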

This is the ring layout for TX.
As we need one virtio header for each xmit packet, we have 128 slots available.

                         ++
                         ||
                         ||
+-----+-----+-----+--------------+------+------+------+
|  0  |  1  | ... |  127 || 128  | 129  | ...  | 255  |   avail ring
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
   |     |            |  ||  |      |             |
   v     v            v  ||  v      v             v
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
| 128 | 129 | ... |  255 || 128  | 129  | ...  | 255  |   desc ring for virtio_net_hdr
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
   |     |            |  ||  |      |             |
   v     v            v  ||  v      v             v
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
|  0  |  1  | ... |  127 ||  0   |  1   | ...  | 127  |   desc ring for tx data
+-----+-----+-----+--------------+------+------+------+
                         ||
                         ||
                         ++
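
The TX layout can be sketched the same way. The example below is only an
illustration under simplified assumptions: demo_desc and demo_txq are
hypothetical reduced structures (no addr/len fields), and DEMO_DESC_F_NEXT
stands in for VRING_DESC_F_NEXT. The upper half of the desc ring holds the
virtio_net_hdr descriptors, each chained to a data descriptor in the lower
half, and both halves of the avail ring point at the header descriptors.

```c
#include <stdint.h>

#define RING_SIZE 256			/* example size from the chart */
#define MID_IDX (RING_SIZE / 2)		/* 128 usable TX slots */
#define DEMO_DESC_F_NEXT 1		/* stand-in for VRING_DESC_F_NEXT */

/* Hypothetical reduced descriptor: only the chaining fields are kept. */
struct demo_desc {
	uint16_t flags;
	uint16_t next;
};

struct demo_txq {
	uint16_t avail_ring[RING_SIZE];
	struct demo_desc desc[RING_SIZE];
};

static void
demo_tx_layout_init(struct demo_txq *q)
{
	uint16_t i;

	for (i = 0; i < MID_IDX; i++) {
		/* avail entry i names header descriptor i + MID_IDX */
		q->avail_ring[i] = i + MID_IDX;
		/* header descriptor chains to data descriptor i */
		q->desc[i + MID_IDX].flags = DEMO_DESC_F_NEXT;
		q->desc[i + MID_IDX].next = i;
		/* data descriptor ends the chain */
		q->desc[i].flags = 0;
		q->desc[i].next = 0;
	}

	/* the upper half of the avail ring names the same header descriptors */
	for (i = MID_IDX; i < RING_SIZE; i++)
		q->avail_ring[i] = i;
}
```

With 256 entries this gives 128 fixed two-descriptor chains, which is why only
128 slots are available for transmit.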


A performance boost can be observed only if the virtio backend isn't the bottleneck,
or in the VM2VM case.
Several vhost optimization patches will also be submitted later.


Huawei Xie (7):
  virtio: add virtio_rxtx.h header file
  virtio: add software rx ring, fake_buf into virtqueue
  virtio: rx/tx ring layout optimization
  virtio: fill RX avail ring with blank mbufs
  virtio: virtio vec rx
  virtio: simple tx routine
  virtio: pick simple rx/tx func

 drivers/net/virtio/Makefile             |   2 +-
 drivers/net/virtio/virtio_ethdev.c      |  12 +-
 drivers/net/virtio/virtio_ethdev.h      |   5 +
 drivers/net/virtio/virtio_rxtx.c        |  56 ++++-
 drivers/net/virtio/virtio_rxtx.h        |  39 +++
 drivers/net/virtio/virtio_rxtx_simple.c | 414 ++++++++++++++++++++++++++++++++
 drivers/net/virtio/virtqueue.h          |   5 +
 7 files changed, 529 insertions(+), 4 deletions(-)
 create mode 100644 drivers/net/virtio/virtio_rxtx.h
 create mode 100644 drivers/net/virtio/virtio_rxtx_simple.c

-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [dpdk-dev] [PATCH v5 1/7] virtio: add virtio_rxtx.h header file
  2015-10-25 15:34 ` [dpdk-dev] [PATCH v5 0/7] virtio ring layout optimization and simple rx/tx processing Huawei Xie
@ 2015-10-25 15:34   ` Huawei Xie
  2015-10-25 15:34   ` [dpdk-dev] [PATCH v5 2/7] virtio: add software rx ring, fake_buf into virtqueue Huawei Xie
                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 92+ messages in thread
From: Huawei Xie @ 2015-10-25 15:34 UTC (permalink / raw)
  To: dev

All rx/tx related declarations will be moved into this header file in the future.
Add RTE_PMD_VIRTIO_RX_MAX_BURST.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
---
 drivers/net/virtio/virtio_ethdev.c |  1 +
 drivers/net/virtio/virtio_rxtx.c   |  1 +
 drivers/net/virtio/virtio_rxtx.h   | 34 ++++++++++++++++++++++++++++++++++
 3 files changed, 36 insertions(+)
 create mode 100644 drivers/net/virtio/virtio_rxtx.h

diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index 465d3cd..79a3640 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -61,6 +61,7 @@
 #include "virtio_pci.h"
 #include "virtio_logs.h"
 #include "virtqueue.h"
+#include "virtio_rxtx.h"
 
 
 static int eth_virtio_dev_init(struct rte_eth_dev *eth_dev);
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index c5b53bb..9324f7f 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -54,6 +54,7 @@
 #include "virtio_logs.h"
 #include "virtio_ethdev.h"
 #include "virtqueue.h"
+#include "virtio_rxtx.h"
 
 #ifdef RTE_LIBRTE_VIRTIO_DEBUG_DUMP
 #define VIRTIO_DUMP_PACKET(m, len) rte_pktmbuf_dump(stdout, m, len)
diff --git a/drivers/net/virtio/virtio_rxtx.h b/drivers/net/virtio/virtio_rxtx.h
new file mode 100644
index 0000000..a10aa69
--- /dev/null
+++ b/drivers/net/virtio/virtio_rxtx.h
@@ -0,0 +1,34 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#define RTE_PMD_VIRTIO_RX_MAX_BURST 64
-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [dpdk-dev] [PATCH v5 2/7] virtio: add software rx ring, fake_buf into virtqueue
  2015-10-25 15:34 ` [dpdk-dev] [PATCH v5 0/7] virtio ring layout optimization and simple rx/tx processing Huawei Xie
  2015-10-25 15:34   ` [dpdk-dev] [PATCH v5 1/7] virtio: add virtio_rxtx.h header file Huawei Xie
@ 2015-10-25 15:34   ` Huawei Xie
  2015-10-25 15:35   ` [dpdk-dev] [PATCH v5 3/7] virtio: rx/tx ring layout optimization Huawei Xie
                     ` (5 subsequent siblings)
  7 siblings, 0 replies; 92+ messages in thread
From: Huawei Xie @ 2015-10-25 15:34 UTC (permalink / raw)
  To: dev

Changes in v3:
- Remove unnecessary NULL test for rte_free
- Remove unnecessary assign of local var vq after free

Add a software RX ring to the virtqueue.
Add fake_mbuf to the virtqueue for wraparound processing.
Use the global use_simple_rxtx to indicate whether simple rxtx is enabled.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
---
 drivers/net/virtio/virtio_ethdev.c | 11 ++++++++++-
 drivers/net/virtio/virtio_rxtx.c   |  7 +++++++
 drivers/net/virtio/virtqueue.h     |  4 ++++
 3 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index 79a3640..82676d3 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -247,8 +247,8 @@ virtio_dev_queue_release(struct virtqueue *vq) {
 		VIRTIO_WRITE_REG_2(hw, VIRTIO_PCI_QUEUE_SEL, vq->queue_id);
 		VIRTIO_WRITE_REG_4(hw, VIRTIO_PCI_QUEUE_PFN, 0);
 
+		rte_free(vq->sw_ring);
 		rte_free(vq);
-		vq = NULL;
 	}
 }
 
@@ -292,6 +292,9 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
 			dev->data->port_id, queue_idx);
 		vq = rte_zmalloc(vq_name, sizeof(struct virtqueue) +
 			vq_size * sizeof(struct vq_desc_extra), RTE_CACHE_LINE_SIZE);
+		vq->sw_ring = rte_zmalloc_socket("rxq->sw_ring",
+			(RTE_PMD_VIRTIO_RX_MAX_BURST + vq_size) *
+			sizeof(vq->sw_ring[0]), RTE_CACHE_LINE_SIZE, socket_id);
 	} else if (queue_type == VTNET_TQ) {
 		snprintf(vq_name, sizeof(vq_name), "port%d_tvq%d",
 			dev->data->port_id, queue_idx);
@@ -308,6 +311,12 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
 		PMD_INIT_LOG(ERR, "%s: Can not allocate virtqueue", __func__);
 		return (-ENOMEM);
 	}
+	if (queue_type == VTNET_RQ && vq->sw_ring == NULL) {
+		PMD_INIT_LOG(ERR, "%s: Can not allocate RX soft ring",
+			__func__);
+		rte_free(vq);
+		return -ENOMEM;
+	}
 
 	vq->hw = hw;
 	vq->port_id = dev->data->port_id;
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 9324f7f..5c00e9d 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -62,6 +62,8 @@
 #define  VIRTIO_DUMP_PACKET(m, len) do { } while (0)
 #endif
 
+static int use_simple_rxtx;
+
 static void
 vq_ring_free_chain(struct virtqueue *vq, uint16_t desc_idx)
 {
@@ -299,6 +301,11 @@ virtio_dev_vring_start(struct virtqueue *vq, int queue_type)
 		/* Allocate blank mbufs for the each rx descriptor */
 		nbufs = 0;
 		error = ENOSPC;
+
+		memset(&vq->fake_mbuf, 0, sizeof(vq->fake_mbuf));
+		for (i = 0; i < RTE_PMD_VIRTIO_RX_MAX_BURST; i++)
+			vq->sw_ring[vq->vq_nentries + i] = &vq->fake_mbuf;
+
 		while (!virtqueue_full(vq)) {
 			m = rte_rxmbuf_alloc(vq->mpool);
 			if (m == NULL)
diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h
index 7789411..6a1ec48 100644
--- a/drivers/net/virtio/virtqueue.h
+++ b/drivers/net/virtio/virtqueue.h
@@ -190,6 +190,10 @@ struct virtqueue {
 	uint16_t vq_avail_idx;
 	phys_addr_t virtio_net_hdr_mem; /**< hdr for each xmit packet */
 
+	struct rte_mbuf **sw_ring; /**< RX software ring. */
+	/* dummy mbuf, for wraparound when processing RX ring. */
+	struct rte_mbuf fake_mbuf;
+
 	/* Statistics */
 	uint64_t	packets;
 	uint64_t	bytes;
-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [dpdk-dev] [PATCH v5 3/7] virtio: rx/tx ring layout optimization
  2015-10-25 15:34 ` [dpdk-dev] [PATCH v5 0/7] virtio ring layout optimization and simple rx/tx processing Huawei Xie
  2015-10-25 15:34   ` [dpdk-dev] [PATCH v5 1/7] virtio: add virtio_rxtx.h header file Huawei Xie
  2015-10-25 15:34   ` [dpdk-dev] [PATCH v5 2/7] virtio: add software rx ring, fake_buf into virtqueue Huawei Xie
@ 2015-10-25 15:35   ` Huawei Xie
  2015-10-25 15:35   ` [dpdk-dev] [PATCH v5 4/7] virtio: fill RX avail ring with blank mbufs Huawei Xie
                     ` (4 subsequent siblings)
  7 siblings, 0 replies; 92+ messages in thread
From: Huawei Xie @ 2015-10-25 15:35 UTC (permalink / raw)
  To: dev

Changes in V4:
- fix the error in tx ring layout chart in this commit message.

In a DPDK based switching environment, vhost mostly runs on a dedicated core
while virtio processing in guest VMs runs on different cores.
Take RX for example. With the generic implementation, for each guest buffer:
a) the virtio driver allocates a descriptor from the free descriptor list
b) it modifies the avail ring entry to point to the allocated descriptor
c) after the packet is received, it frees the descriptor

When vhost fetches the avail ring, it needs to fetch the modified L1 cache line
from the virtio core, which is a heavy cost in current CPU implementations.

The idea of this optimization is:
    allocate a fixed descriptor for each entry of the avail ring, so the avail
ring stays the same during the whole run.
This removes the transfer of the modified L1 cache line from the virtio core to
the vhost core for the avail ring.
(Note that we cannot avoid the cache transfer for descriptors.)
Besides, descriptor allocation and free operations are eliminated.
This also makes vector processing possible, which further accelerates the processing.

This is the layout for the avail ring (taking 256 ring entries as an example), with
each entry pointing to the descriptor with the same index.
                    avail
                    idx
                    +
                    |
+----+----+---+-------------+------+
| 0  | 1  | 2 | ... |  254  | 255  |  avail ring
+-+--+-+--+-+-+---------+---+--+---+
  |    |    |       |   |      |
  |    |    |       |   |      |
  v    v    v       |   v      v
+-+--+-+--+-+-+---------+---+--+---+
| 0  | 1  | 2 | ... |  254  | 255  |  desc ring
+----+----+---+-------------+------+
                    |
                    |
+----+----+---+-------------+------+
| 0  | 1  | 2 |     |  254  | 255  |  used ring
+----+----+---+-------------+------+
                    |
                    +

This is the ring layout for TX.
As we need one virtio header for each xmit packet, we have 128 slots available.

                         ++
                         ||
                         ||
+-----+-----+-----+--------------+------+------+------+
|  0  |  1  | ... |  127 || 128  | 129  | ...  | 255  |   avail ring
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
   |     |            |  ||  |      |             |
   v     v            v  ||  v      v             v
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
| 128 | 129 | ... |  255 || 128  | 129  | ...  | 255  |   desc ring for virtio_net_hdr
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
   |     |            |  ||  |      |             |
   v     v            v  ||  v      v             v
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
|  0  |  1  | ... |  127 ||  0   |  1   | ...  | 127  |   desc ring for tx data
+-----+-----+-----+--------------+------+------+------+
                         ||
                         ||
                         ++

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
---
 drivers/net/virtio/virtio_rxtx.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 5c00e9d..7c82a6a 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -302,6 +302,12 @@ virtio_dev_vring_start(struct virtqueue *vq, int queue_type)
 		nbufs = 0;
 		error = ENOSPC;
 
+		if (use_simple_rxtx)
+			for (i = 0; i < vq->vq_nentries; i++) {
+				vq->vq_ring.avail->ring[i] = i;
+				vq->vq_ring.desc[i].flags = VRING_DESC_F_WRITE;
+			}
+
 		memset(&vq->fake_mbuf, 0, sizeof(vq->fake_mbuf));
 		for (i = 0; i < RTE_PMD_VIRTIO_RX_MAX_BURST; i++)
 			vq->sw_ring[vq->vq_nentries + i] = &vq->fake_mbuf;
@@ -332,6 +338,24 @@ virtio_dev_vring_start(struct virtqueue *vq, int queue_type)
 		VIRTIO_WRITE_REG_4(vq->hw, VIRTIO_PCI_QUEUE_PFN,
 			vq->mz->phys_addr >> VIRTIO_PCI_QUEUE_ADDR_SHIFT);
 	} else if (queue_type == VTNET_TQ) {
+		if (use_simple_rxtx) {
+			int mid_idx  = vq->vq_nentries >> 1;
+			for (i = 0; i < mid_idx; i++) {
+				vq->vq_ring.avail->ring[i] = i + mid_idx;
+				vq->vq_ring.desc[i + mid_idx].next = i;
+				vq->vq_ring.desc[i + mid_idx].addr =
+					vq->virtio_net_hdr_mem +
+						mid_idx * vq->hw->vtnet_hdr_size;
+				vq->vq_ring.desc[i + mid_idx].len =
+					vq->hw->vtnet_hdr_size;
+				vq->vq_ring.desc[i + mid_idx].flags =
+					VRING_DESC_F_NEXT;
+				vq->vq_ring.desc[i].flags = 0;
+			}
+			for (i = mid_idx; i < vq->vq_nentries; i++)
+				vq->vq_ring.avail->ring[i] = i;
+		}
+
 		VIRTIO_WRITE_REG_2(vq->hw, VIRTIO_PCI_QUEUE_SEL,
 			vq->vq_queue_index);
 		VIRTIO_WRITE_REG_4(vq->hw, VIRTIO_PCI_QUEUE_PFN,
-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [dpdk-dev] [PATCH v5 4/7] virtio: fill RX avail ring with blank mbufs
  2015-10-25 15:34 ` [dpdk-dev] [PATCH v5 0/7] virtio ring layout optimization and simple rx/tx processing Huawei Xie
                     ` (2 preceding siblings ...)
  2015-10-25 15:35   ` [dpdk-dev] [PATCH v5 3/7] virtio: rx/tx ring layout optimization Huawei Xie
@ 2015-10-25 15:35   ` Huawei Xie
  2015-10-25 15:35   ` [dpdk-dev] [PATCH v5 5/7] virtio: virtio vec rx Huawei Xie
                     ` (3 subsequent siblings)
  7 siblings, 0 replies; 92+ messages in thread
From: Huawei Xie @ 2015-10-25 15:35 UTC (permalink / raw)
  To: dev

Fill the avail ring with blank mbufs in virtio_dev_vring_start.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
---
 drivers/net/virtio/Makefile             |  2 +-
 drivers/net/virtio/virtio_rxtx.c        |  6 ++-
 drivers/net/virtio/virtio_rxtx.h        |  3 ++
 drivers/net/virtio/virtio_rxtx_simple.c | 84 +++++++++++++++++++++++++++++++++
 4 files changed, 92 insertions(+), 3 deletions(-)
 create mode 100644 drivers/net/virtio/virtio_rxtx_simple.c

diff --git a/drivers/net/virtio/Makefile b/drivers/net/virtio/Makefile
index 930b60f..43835ba 100644
--- a/drivers/net/virtio/Makefile
+++ b/drivers/net/virtio/Makefile
@@ -50,7 +50,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtqueue.c
 SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_pci.c
 SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx.c
 SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_ethdev.c
-
+SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx_simple.c
 
 # this lib depends upon:
 DEPDIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += lib/librte_eal lib/librte_ether
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 7c82a6a..5162ce6 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -320,8 +320,10 @@ virtio_dev_vring_start(struct virtqueue *vq, int queue_type)
 			/******************************************
 			*         Enqueue allocated buffers        *
 			*******************************************/
-			error = virtqueue_enqueue_recv_refill(vq, m);
-
+			if (use_simple_rxtx)
+				error = virtqueue_enqueue_recv_refill_simple(vq, m);
+			else
+				error = virtqueue_enqueue_recv_refill(vq, m);
 			if (error) {
 				rte_pktmbuf_free(m);
 				break;
diff --git a/drivers/net/virtio/virtio_rxtx.h b/drivers/net/virtio/virtio_rxtx.h
index a10aa69..7d2d8fe 100644
--- a/drivers/net/virtio/virtio_rxtx.h
+++ b/drivers/net/virtio/virtio_rxtx.h
@@ -32,3 +32,6 @@
  */
 
 #define RTE_PMD_VIRTIO_RX_MAX_BURST 64
+
+int virtqueue_enqueue_recv_refill_simple(struct virtqueue *vq,
+	struct rte_mbuf *m);
diff --git a/drivers/net/virtio/virtio_rxtx_simple.c b/drivers/net/virtio/virtio_rxtx_simple.c
new file mode 100644
index 0000000..cac5b9f
--- /dev/null
+++ b/drivers/net/virtio/virtio_rxtx_simple.c
@@ -0,0 +1,84 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <errno.h>
+
+#include <tmmintrin.h>
+
+#include <rte_cycles.h>
+#include <rte_memory.h>
+#include <rte_memzone.h>
+#include <rte_branch_prediction.h>
+#include <rte_mempool.h>
+#include <rte_malloc.h>
+#include <rte_mbuf.h>
+#include <rte_ether.h>
+#include <rte_ethdev.h>
+#include <rte_prefetch.h>
+#include <rte_string_fns.h>
+#include <rte_errno.h>
+#include <rte_byteorder.h>
+
+#include "virtio_logs.h"
+#include "virtio_ethdev.h"
+#include "virtqueue.h"
+#include "virtio_rxtx.h"
+
+int __attribute__((cold))
+virtqueue_enqueue_recv_refill_simple(struct virtqueue *vq,
+	struct rte_mbuf *cookie)
+{
+	struct vq_desc_extra *dxp;
+	struct vring_desc *start_dp;
+	uint16_t desc_idx;
+
+	desc_idx = vq->vq_avail_idx & (vq->vq_nentries - 1);
+	dxp = &vq->vq_descx[desc_idx];
+	dxp->cookie = (void *)cookie;
+	vq->sw_ring[desc_idx] = cookie;
+
+	start_dp = vq->vq_ring.desc;
+	start_dp[desc_idx].addr = (uint64_t)((uintptr_t)cookie->buf_physaddr +
+		RTE_PKTMBUF_HEADROOM - sizeof(struct virtio_net_hdr));
+	start_dp[desc_idx].len = cookie->buf_len -
+		RTE_PKTMBUF_HEADROOM + sizeof(struct virtio_net_hdr);
+
+	vq->vq_free_cnt--;
+	vq->vq_avail_idx++;
+
+	return 0;
+}
-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [dpdk-dev] [PATCH v5 5/7] virtio: virtio vec rx
  2015-10-25 15:34 ` [dpdk-dev] [PATCH v5 0/7] virtio ring layout optimization and simple rx/tx processing Huawei Xie
                     ` (3 preceding siblings ...)
  2015-10-25 15:35   ` [dpdk-dev] [PATCH v5 4/7] virtio: fill RX avail ring with blank mbufs Huawei Xie
@ 2015-10-25 15:35   ` Huawei Xie
  2015-10-26  8:34     ` Wang, Zhihong
  2015-10-25 15:35   ` [dpdk-dev] [PATCH v5 6/7] virtio: simple tx routine Huawei Xie
                     ` (2 subsequent siblings)
  7 siblings, 1 reply; 92+ messages in thread
From: Huawei Xie @ 2015-10-25 15:35 UTC (permalink / raw)
  To: dev

With a fixed avail ring, we don't need to get the desc idx from the avail ring.
The virtio driver only has to deal with the desc ring.
This patch uses vector instructions to accelerate processing of the desc ring.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
---
 drivers/net/virtio/virtio_ethdev.h      |   2 +
 drivers/net/virtio/virtio_rxtx.c        |   3 +
 drivers/net/virtio/virtio_rxtx.h        |   2 +
 drivers/net/virtio/virtio_rxtx_simple.c | 224 ++++++++++++++++++++++++++++++++
 drivers/net/virtio/virtqueue.h          |   1 +
 5 files changed, 232 insertions(+)

diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h
index 9026d42..d7797ab 100644
--- a/drivers/net/virtio/virtio_ethdev.h
+++ b/drivers/net/virtio/virtio_ethdev.h
@@ -108,6 +108,8 @@ uint16_t virtio_recv_mergeable_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 uint16_t virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		uint16_t nb_pkts);
 
+uint16_t virtio_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
+		uint16_t nb_pkts);
 
 /*
  * The VIRTIO_NET_F_GUEST_TSO[46] features permit the host to send us
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 5162ce6..947fc46 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -432,6 +432,9 @@ virtio_dev_rx_queue_setup(struct rte_eth_dev *dev,
 	vq->mpool = mp;
 
 	dev->data->rx_queues[queue_idx] = vq;
+
+	virtio_rxq_vec_setup(vq);
+
 	return 0;
 }
 
diff --git a/drivers/net/virtio/virtio_rxtx.h b/drivers/net/virtio/virtio_rxtx.h
index 7d2d8fe..831e492 100644
--- a/drivers/net/virtio/virtio_rxtx.h
+++ b/drivers/net/virtio/virtio_rxtx.h
@@ -33,5 +33,7 @@
 
 #define RTE_PMD_VIRTIO_RX_MAX_BURST 64
 
+int virtio_rxq_vec_setup(struct virtqueue *rxq);
+
 int virtqueue_enqueue_recv_refill_simple(struct virtqueue *vq,
 	struct rte_mbuf *m);
diff --git a/drivers/net/virtio/virtio_rxtx_simple.c b/drivers/net/virtio/virtio_rxtx_simple.c
index cac5b9f..ef17562 100644
--- a/drivers/net/virtio/virtio_rxtx_simple.c
+++ b/drivers/net/virtio/virtio_rxtx_simple.c
@@ -58,6 +58,10 @@
 #include "virtqueue.h"
 #include "virtio_rxtx.h"
 
+#define RTE_VIRTIO_VPMD_RX_BURST 32
+#define RTE_VIRTIO_DESC_PER_LOOP 8
+#define RTE_VIRTIO_VPMD_RX_REARM_THRESH RTE_VIRTIO_VPMD_RX_BURST
+
 int __attribute__((cold))
 virtqueue_enqueue_recv_refill_simple(struct virtqueue *vq,
 	struct rte_mbuf *cookie)
@@ -82,3 +86,223 @@ virtqueue_enqueue_recv_refill_simple(struct virtqueue *vq,
 
 	return 0;
 }
+
+static inline void
+virtio_rxq_rearm_vec(struct virtqueue *rxvq)
+{
+	int i;
+	uint16_t desc_idx;
+	struct rte_mbuf **sw_ring;
+	struct vring_desc *start_dp;
+	int ret;
+
+	desc_idx = rxvq->vq_avail_idx & (rxvq->vq_nentries - 1);
+	sw_ring = &rxvq->sw_ring[desc_idx];
+	start_dp = &rxvq->vq_ring.desc[desc_idx];
+
+	ret = rte_mempool_get_bulk(rxvq->mpool, (void **)sw_ring,
+		RTE_VIRTIO_VPMD_RX_REARM_THRESH);
+	if (unlikely(ret)) {
+		rte_eth_devices[rxvq->port_id].data->rx_mbuf_alloc_failed +=
+			RTE_VIRTIO_VPMD_RX_REARM_THRESH;
+		return;
+	}
+
+	for (i = 0; i < RTE_VIRTIO_VPMD_RX_REARM_THRESH; i++) {
+		uintptr_t p;
+
+		p = (uintptr_t)&sw_ring[i]->rearm_data;
+		*(uint64_t *)p = rxvq->mbuf_initializer;
+
+		start_dp[i].addr =
+			(uint64_t)((uintptr_t)sw_ring[i]->buf_physaddr +
+			RTE_PKTMBUF_HEADROOM - sizeof(struct virtio_net_hdr));
+		start_dp[i].len = sw_ring[i]->buf_len -
+			RTE_PKTMBUF_HEADROOM + sizeof(struct virtio_net_hdr);
+	}
+
+	rxvq->vq_avail_idx += RTE_VIRTIO_VPMD_RX_REARM_THRESH;
+	rxvq->vq_free_cnt -= RTE_VIRTIO_VPMD_RX_REARM_THRESH;
+	vq_update_avail_idx(rxvq);
+}
+
+/* virtio vPMD receive routine, only accepts nb_pkts >= RTE_VIRTIO_DESC_PER_LOOP
+ *
+ * This routine is for non-mergeable RX, one desc for each guest buffer.
+ * This routine is based on the RX ring layout optimization. Each entry in the
+ * avail ring points to the desc with the same index in the desc ring and this
+ * will never be changed in the driver.
+ *
+ * - nb_pkts < RTE_VIRTIO_DESC_PER_LOOP, just return no packet
+ */
+uint16_t
+virtio_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
+	uint16_t nb_pkts)
+{
+	struct virtqueue *rxvq = rx_queue;
+	uint16_t nb_used;
+	uint16_t desc_idx;
+	struct vring_used_elem *rused;
+	struct rte_mbuf **sw_ring;
+	struct rte_mbuf **sw_ring_end;
+	uint16_t nb_pkts_received;
+	__m128i shuf_msk1, shuf_msk2, len_adjust;
+
+	shuf_msk1 = _mm_set_epi8(
+		0xFF, 0xFF, 0xFF, 0xFF,
+		0xFF, 0xFF,		/* vlan tci */
+		5, 4,			/* dat len */
+		0xFF, 0xFF, 5, 4,	/* pkt len */
+		0xFF, 0xFF, 0xFF, 0xFF	/* packet type */
+
+	);
+
+	shuf_msk2 = _mm_set_epi8(
+		0xFF, 0xFF, 0xFF, 0xFF,
+		0xFF, 0xFF,		/* vlan tci */
+		13, 12,			/* dat len */
+		0xFF, 0xFF, 13, 12,	/* pkt len */
+		0xFF, 0xFF, 0xFF, 0xFF	/* packet type */
+	);
+
+	/* Subtract the header length.
+	*  In which case do we need the header length in used->len ?
+	*/
+	len_adjust = _mm_set_epi16(
+		0, 0,
+		0,
+		(uint16_t) -sizeof(struct virtio_net_hdr),
+		0, (uint16_t) -sizeof(struct virtio_net_hdr),
+		0, 0);
+
+	if (unlikely(nb_pkts < RTE_VIRTIO_DESC_PER_LOOP))
+		return 0;
+
+	nb_used = *(volatile uint16_t *)&rxvq->vq_ring.used->idx -
+		rxvq->vq_used_cons_idx;
+
+	rte_compiler_barrier();
+
+	if (unlikely(nb_used == 0))
+		return 0;
+
+	nb_pkts = RTE_ALIGN_FLOOR(nb_pkts, RTE_VIRTIO_DESC_PER_LOOP);
+	nb_used = RTE_MIN(nb_used, nb_pkts);
+
+	desc_idx = (uint16_t)(rxvq->vq_used_cons_idx & (rxvq->vq_nentries - 1));
+	rused = &rxvq->vq_ring.used->ring[desc_idx];
+	sw_ring  = &rxvq->sw_ring[desc_idx];
+	sw_ring_end = &rxvq->sw_ring[rxvq->vq_nentries];
+
+	_mm_prefetch((const void *)rused, _MM_HINT_T0);
+
+	if (rxvq->vq_free_cnt >= RTE_VIRTIO_VPMD_RX_REARM_THRESH) {
+		virtio_rxq_rearm_vec(rxvq);
+		if (unlikely(virtqueue_kick_prepare(rxvq)))
+			virtqueue_notify(rxvq);
+	}
+
+	for (nb_pkts_received = 0;
+		nb_pkts_received < nb_used;) {
+		__m128i desc[RTE_VIRTIO_DESC_PER_LOOP / 2];
+		__m128i mbp[RTE_VIRTIO_DESC_PER_LOOP / 2];
+		__m128i pkt_mb[RTE_VIRTIO_DESC_PER_LOOP];
+
+		mbp[0] = _mm_loadu_si128((__m128i *)(sw_ring + 0));
+		desc[0] = _mm_loadu_si128((__m128i *)(rused + 0));
+		_mm_storeu_si128((__m128i *)&rx_pkts[0], mbp[0]);
+
+		mbp[1] = _mm_loadu_si128((__m128i *)(sw_ring + 2));
+		desc[1] = _mm_loadu_si128((__m128i *)(rused + 2));
+		_mm_storeu_si128((__m128i *)&rx_pkts[2], mbp[1]);
+
+		mbp[2] = _mm_loadu_si128((__m128i *)(sw_ring + 4));
+		desc[2] = _mm_loadu_si128((__m128i *)(rused + 4));
+		_mm_storeu_si128((__m128i *)&rx_pkts[4], mbp[2]);
+
+		mbp[3] = _mm_loadu_si128((__m128i *)(sw_ring + 6));
+		desc[3] = _mm_loadu_si128((__m128i *)(rused + 6));
+		_mm_storeu_si128((__m128i *)&rx_pkts[6], mbp[3]);
+
+		pkt_mb[1] = _mm_shuffle_epi8(desc[0], shuf_msk2);
+		pkt_mb[0] = _mm_shuffle_epi8(desc[0], shuf_msk1);
+		pkt_mb[1] = _mm_add_epi16(pkt_mb[1], len_adjust);
+		pkt_mb[0] = _mm_add_epi16(pkt_mb[0], len_adjust);
+		_mm_storeu_si128((void *)&rx_pkts[1]->rx_descriptor_fields1,
+			pkt_mb[1]);
+		_mm_storeu_si128((void *)&rx_pkts[0]->rx_descriptor_fields1,
+			pkt_mb[0]);
+
+		pkt_mb[3] = _mm_shuffle_epi8(desc[1], shuf_msk2);
+		pkt_mb[2] = _mm_shuffle_epi8(desc[1], shuf_msk1);
+		pkt_mb[3] = _mm_add_epi16(pkt_mb[3], len_adjust);
+		pkt_mb[2] = _mm_add_epi16(pkt_mb[2], len_adjust);
+		_mm_storeu_si128((void *)&rx_pkts[3]->rx_descriptor_fields1,
+			pkt_mb[3]);
+		_mm_storeu_si128((void *)&rx_pkts[2]->rx_descriptor_fields1,
+			pkt_mb[2]);
+
+		pkt_mb[5] = _mm_shuffle_epi8(desc[2], shuf_msk2);
+		pkt_mb[4] = _mm_shuffle_epi8(desc[2], shuf_msk1);
+		pkt_mb[5] = _mm_add_epi16(pkt_mb[5], len_adjust);
+		pkt_mb[4] = _mm_add_epi16(pkt_mb[4], len_adjust);
+		_mm_storeu_si128((void *)&rx_pkts[5]->rx_descriptor_fields1,
+			pkt_mb[5]);
+		_mm_storeu_si128((void *)&rx_pkts[4]->rx_descriptor_fields1,
+			pkt_mb[4]);
+
+		pkt_mb[7] = _mm_shuffle_epi8(desc[3], shuf_msk2);
+		pkt_mb[6] = _mm_shuffle_epi8(desc[3], shuf_msk1);
+		pkt_mb[7] = _mm_add_epi16(pkt_mb[7], len_adjust);
+		pkt_mb[6] = _mm_add_epi16(pkt_mb[6], len_adjust);
+		_mm_storeu_si128((void *)&rx_pkts[7]->rx_descriptor_fields1,
+			pkt_mb[7]);
+		_mm_storeu_si128((void *)&rx_pkts[6]->rx_descriptor_fields1,
+			pkt_mb[6]);
+
+		if (unlikely(nb_used <= RTE_VIRTIO_DESC_PER_LOOP)) {
+			if (sw_ring + nb_used <= sw_ring_end)
+				nb_pkts_received += nb_used;
+			else
+				nb_pkts_received += sw_ring_end - sw_ring;
+			break;
+		} else {
+			if (unlikely(sw_ring + RTE_VIRTIO_DESC_PER_LOOP >=
+				sw_ring_end)) {
+				nb_pkts_received += sw_ring_end - sw_ring;
+				break;
+			} else {
+				nb_pkts_received += RTE_VIRTIO_DESC_PER_LOOP;
+
+				rx_pkts += RTE_VIRTIO_DESC_PER_LOOP;
+				sw_ring += RTE_VIRTIO_DESC_PER_LOOP;
+				rused   += RTE_VIRTIO_DESC_PER_LOOP;
+				nb_used -= RTE_VIRTIO_DESC_PER_LOOP;
+			}
+		}
+	}
+
+	rxvq->vq_used_cons_idx += nb_pkts_received;
+	rxvq->vq_free_cnt += nb_pkts_received;
+	rxvq->packets += nb_pkts_received;
+	return nb_pkts_received;
+}
+
+int __attribute__((cold))
+virtio_rxq_vec_setup(struct virtqueue *rxq)
+{
+	uintptr_t p;
+	struct rte_mbuf mb_def = { .buf_addr = 0 }; /* zeroed mbuf */
+
+	mb_def.nb_segs = 1;
+	mb_def.data_off = RTE_PKTMBUF_HEADROOM;
+	mb_def.port = rxq->port_id;
+	rte_mbuf_refcnt_set(&mb_def, 1);
+
+	/* prevent compiler reordering: rearm_data covers previous fields */
+	rte_compiler_barrier();
+	p = (uintptr_t)&mb_def.rearm_data;
+	rxq->mbuf_initializer = *(uint64_t *)p;
+
+	return 0;
+}
diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h
index 6a1ec48..98a77d5 100644
--- a/drivers/net/virtio/virtqueue.h
+++ b/drivers/net/virtio/virtqueue.h
@@ -188,6 +188,7 @@ struct virtqueue {
 	 */
 	uint16_t vq_used_cons_idx;
 	uint16_t vq_avail_idx;
+	uint64_t mbuf_initializer; /**< value to init mbufs. */
 	phys_addr_t virtio_net_hdr_mem; /**< hdr for each xmit packet */
 
 	struct rte_mbuf **sw_ring; /**< RX software ring. */
-- 
1.8.1.4


* [dpdk-dev] [PATCH v5 6/7] virtio: simple tx routine
  2015-10-25 15:34 ` [dpdk-dev] [PATCH v5 0/7] virtio ring layout optimization and simple rx/tx processing Huawei Xie
                     ` (4 preceding siblings ...)
  2015-10-25 15:35   ` [dpdk-dev] [PATCH v5 5/7] virtio: virtio vec rx Huawei Xie
@ 2015-10-25 15:35   ` Huawei Xie
  2015-10-25 15:35   ` [dpdk-dev] [PATCH v5 7/7] virtio: pick simple rx/tx func Huawei Xie
  2015-10-27  1:44   ` [dpdk-dev] [PATCH v5 0/7] virtio ring layout optimization and simple rx/tx processing Tan, Jianfeng
  7 siblings, 0 replies; 92+ messages in thread
From: Huawei Xie @ 2015-10-25 15:35 UTC (permalink / raw)
  To: dev

Changes in v5:
- call __rte_pktmbuf_prefree_seg to check refcnt when freeing mbufs

Changes in v4:
- move virtio_xmit_cleanup ahead to free descriptors earlier

Changes in v3:
- Remove return at the end of void function
- Remove always_inline attribute for virtio_xmit_cleanup

Bulk-free mbufs when cleaning the used ring.
The shift operation on the index could be saved if vq_free_cnt counted
free slots rather than free descriptors.
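
The bulk-free idea can be sketched as below. This is only an illustration
under stated assumptions, not the code from the patch: toy_pool, toy_buf and
the toy_* helpers are hypothetical stand-ins for the mempool and mbuf types,
and a NULL entry models an mbuf that must not be freed yet. Unlike the patch,
the sketch keeps batching even when the first entry is NULL.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_BATCH_MAX 32

/* Hypothetical stand-ins for a mempool and an mbuf. */
struct toy_pool {
	unsigned int put_calls;     /* number of bulk-put calls seen */
	unsigned int objs_returned; /* total objects handed back */
};

struct toy_buf {
	struct toy_pool *pool;
};

/* Models a bulk put: n objects go back to the pool in one call. */
static void
toy_pool_put_bulk(struct toy_pool *p, struct toy_buf **bufs, unsigned int n)
{
	(void)bufs;
	p->put_calls++;
	p->objs_returned += n;
}

/*
 * Free up to TOY_BATCH_MAX buffers. Buffers from the same pool are
 * collected into one batch; the batch is flushed when the pool changes
 * and once more at the end. NULL entries are skipped.
 */
static void
toy_free_batched(struct toy_buf **bufs, unsigned int n)
{
	struct toy_buf *batch[TOY_BATCH_MAX];
	unsigned int nb = 0;
	unsigned int i;

	assert(n <= TOY_BATCH_MAX);

	for (i = 0; i < n; i++) {
		struct toy_buf *m = bufs[i];

		if (m == NULL)
			continue;
		if (nb != 0 && m->pool != batch[0]->pool) {
			toy_pool_put_bulk(batch[0]->pool, batch, nb);
			nb = 0;
		}
		batch[nb++] = m;
	}
	if (nb != 0)
		toy_pool_put_bulk(batch[0]->pool, batch, nb);
}
```

In the common case where all transmitted mbufs come from one pool, this
issues a single bulk put per cleanup. The shift mentioned above comes from
each packet occupying two descriptors (header and data), so a counter kept in
descriptors has to be halved or doubled when converting to and from packets.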

TODO: rearrange vq data structure, pack the stats var together so that we
could use one vec instruction to update all of them.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
---
 drivers/net/virtio/virtio_ethdev.h      |   3 +
 drivers/net/virtio/virtio_rxtx_simple.c | 106 ++++++++++++++++++++++++++++++++
 2 files changed, 109 insertions(+)

diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h
index d7797ab..ae2d47d 100644
--- a/drivers/net/virtio/virtio_ethdev.h
+++ b/drivers/net/virtio/virtio_ethdev.h
@@ -111,6 +111,9 @@ uint16_t virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 uint16_t virtio_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
 		uint16_t nb_pkts);
 
+uint16_t virtio_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
+
 /*
  * The VIRTIO_NET_F_GUEST_TSO[46] features permit the host to send us
  * frames larger than 1514 bytes. We do not yet support software LRO
diff --git a/drivers/net/virtio/virtio_rxtx_simple.c b/drivers/net/virtio/virtio_rxtx_simple.c
index ef17562..624e789 100644
--- a/drivers/net/virtio/virtio_rxtx_simple.c
+++ b/drivers/net/virtio/virtio_rxtx_simple.c
@@ -288,6 +288,112 @@ virtio_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
 	return nb_pkts_received;
 }
 
+#define VIRTIO_TX_FREE_THRESH 32
+#define VIRTIO_TX_MAX_FREE_BUF_SZ 32
+#define VIRTIO_TX_FREE_NR 32
+/* TODO: vq->tx_free_cnt could mean num of free slots so we could avoid shift */
+static inline void
+virtio_xmit_cleanup(struct virtqueue *vq)
+{
+	uint16_t i, desc_idx;
+	int nb_free = 0;
+	struct rte_mbuf *m, *free[VIRTIO_TX_MAX_FREE_BUF_SZ];
+
+	desc_idx = (uint16_t)(vq->vq_used_cons_idx &
+		   ((vq->vq_nentries >> 1) - 1));
+	m = (struct rte_mbuf *)vq->vq_descx[desc_idx++].cookie;
+	m = __rte_pktmbuf_prefree_seg(m);
+	if (likely(m != NULL)) {
+		free[0] = m;
+		nb_free = 1;
+		for (i = 1; i < VIRTIO_TX_FREE_NR; i++) {
+			m = (struct rte_mbuf *)vq->vq_descx[desc_idx++].cookie;
+			m = __rte_pktmbuf_prefree_seg(m);
+			if (likely(m != NULL)) {
+				if (likely(m->pool == free[0]->pool))
+					free[nb_free++] = m;
+				else {
+					rte_mempool_put_bulk(free[0]->pool,
+						(void **)free, nb_free);
+					free[0] = m;
+					nb_free = 1;
+				}
+			}
+		}
+		rte_mempool_put_bulk(free[0]->pool, (void **)free, nb_free);
+	} else {
+		for (i = 1; i < VIRTIO_TX_FREE_NR; i++) {
+			m = (struct rte_mbuf *)vq->vq_descx[desc_idx++].cookie;
+			m = __rte_pktmbuf_prefree_seg(m);
+			if (m != NULL)
+				rte_mempool_put(m->pool, m);
+		}
+	}
+
+	vq->vq_used_cons_idx += VIRTIO_TX_FREE_NR;
+	vq->vq_free_cnt += (VIRTIO_TX_FREE_NR << 1);
+}
+
+uint16_t
+virtio_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
+	uint16_t nb_pkts)
+{
+	struct virtqueue *txvq = tx_queue;
+	uint16_t nb_used;
+	uint16_t desc_idx;
+	struct vring_desc *start_dp;
+	uint16_t nb_tail, nb_commit;
+	int i;
+	uint16_t desc_idx_max = (txvq->vq_nentries >> 1) - 1;
+
+	nb_used = VIRTQUEUE_NUSED(txvq);
+	rte_compiler_barrier();
+
+	if (nb_used >= VIRTIO_TX_FREE_THRESH)
+		virtio_xmit_cleanup(tx_queue);
+
+	nb_commit = nb_pkts = RTE_MIN((txvq->vq_free_cnt >> 1), nb_pkts);
+	desc_idx = (uint16_t) (txvq->vq_avail_idx & desc_idx_max);
+	start_dp = txvq->vq_ring.desc;
+	nb_tail = (uint16_t) (desc_idx_max + 1 - desc_idx);
+
+	if (nb_commit >= nb_tail) {
+		for (i = 0; i < nb_tail; i++)
+			txvq->vq_descx[desc_idx + i].cookie = tx_pkts[i];
+		for (i = 0; i < nb_tail; i++) {
+			start_dp[desc_idx].addr =
+				RTE_MBUF_DATA_DMA_ADDR(*tx_pkts);
+			start_dp[desc_idx].len = (*tx_pkts)->pkt_len;
+			tx_pkts++;
+			desc_idx++;
+		}
+		nb_commit -= nb_tail;
+		desc_idx = 0;
+	}
+	for (i = 0; i < nb_commit; i++)
+		txvq->vq_descx[desc_idx + i].cookie = tx_pkts[i];
+	for (i = 0; i < nb_commit; i++) {
+		start_dp[desc_idx].addr = RTE_MBUF_DATA_DMA_ADDR(*tx_pkts);
+		start_dp[desc_idx].len = (*tx_pkts)->pkt_len;
+		tx_pkts++;
+		desc_idx++;
+	}
+
+	rte_compiler_barrier();
+
+	txvq->vq_free_cnt -= (uint16_t)(nb_pkts << 1);
+	txvq->vq_avail_idx += nb_pkts;
+	txvq->vq_ring.avail->idx = txvq->vq_avail_idx;
+	txvq->packets += nb_pkts;
+
+	if (likely(nb_pkts)) {
+		if (unlikely(virtqueue_kick_prepare(txvq)))
+			virtqueue_notify(txvq);
+	}
+
+	return nb_pkts;
+}
+
 int __attribute__((cold))
 virtio_rxq_vec_setup(struct virtqueue *rxq)
 {
-- 
1.8.1.4


* [dpdk-dev] [PATCH v5 7/7] virtio: pick simple rx/tx func
  2015-10-25 15:34 ` [dpdk-dev] [PATCH v5 0/7] virtio ring layout optimization and simple rx/tx processing Huawei Xie
                     ` (5 preceding siblings ...)
  2015-10-25 15:35   ` [dpdk-dev] [PATCH v5 6/7] virtio: simple tx routine Huawei Xie
@ 2015-10-25 15:35   ` Huawei Xie
  2015-10-27  1:44   ` [dpdk-dev] [PATCH v5 0/7] virtio ring layout optimization and simple rx/tx processing Tan, Jianfeng
  7 siblings, 0 replies; 92+ messages in thread
From: Huawei Xie @ 2015-10-25 15:35 UTC (permalink / raw)
  To: dev

Changes in v4:
Check the mergeable feature when selecting the simple rx/tx functions.

The simple rx/tx functions are chosen when mergeable rx is disabled and the
user specifies single segment and no offload support.
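
The selection rule can be sketched as a mask test. This is a minimal
illustration, not the driver code: the TOY_* flag values and
toy_use_simple_path() are hypothetical, and the second argument stands for
"the mergeable RX buffer feature was negotiated".

```c
#include <stdint.h>

/* Hypothetical flag values, for illustration only. */
#define TOY_TXQ_NOMULTSEGS 0x0001u
#define TOY_TXQ_NOOFFLOADS 0x0f00u
#define TOY_SIMPLE_FLAGS   (TOY_TXQ_NOMULTSEGS | TOY_TXQ_NOOFFLOADS)

/*
 * Return 1 when the simple path may be used: every bit of
 * TOY_SIMPLE_FLAGS must be set by the user, and mergeable RX
 * buffers must not be in use.
 */
static int
toy_use_simple_path(uint32_t txq_flags, int mrg_rxbuf_negotiated)
{
	return (txq_flags & TOY_SIMPLE_FLAGS) == TOY_SIMPLE_FLAGS &&
		!mrg_rxbuf_negotiated;
}
```

Comparing the masked value against the full mask (rather than testing for
non-zero) matters here: setting only one of the two flags must not select
the simple path.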

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
---
 drivers/net/virtio/virtio_rxtx.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 947fc46..0f1daf2 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -53,6 +53,7 @@
 
 #include "virtio_logs.h"
 #include "virtio_ethdev.h"
+#include "virtio_pci.h"
 #include "virtqueue.h"
 #include "virtio_rxtx.h"
 
@@ -62,6 +63,10 @@
 #define  VIRTIO_DUMP_PACKET(m, len) do { } while (0)
 #endif
 
+
+#define VIRTIO_SIMPLE_FLAGS ((uint32_t)ETH_TXQ_FLAGS_NOMULTSEGS | \
+	ETH_TXQ_FLAGS_NOOFFLOADS)
+
 static int use_simple_rxtx;
 
 static void
@@ -459,6 +464,7 @@ virtio_dev_tx_queue_setup(struct rte_eth_dev *dev,
 			const struct rte_eth_txconf *tx_conf)
 {
 	uint8_t vtpci_queue_idx = 2 * queue_idx + VTNET_SQ_TQ_QUEUE_IDX;
+	struct virtio_hw *hw = dev->data->dev_private;
 	struct virtqueue *vq;
 	uint16_t tx_free_thresh;
 	int ret;
@@ -471,6 +477,15 @@ virtio_dev_tx_queue_setup(struct rte_eth_dev *dev,
 		return -EINVAL;
 	}
 
+	/* Use simple rx/tx func if single segment and no offloads */
+	if ((tx_conf->txq_flags & VIRTIO_SIMPLE_FLAGS) == VIRTIO_SIMPLE_FLAGS &&
+	     !vtpci_with_feature(hw, VIRTIO_NET_F_MRG_RXBUF)) {
+		PMD_INIT_LOG(INFO, "Using simple rx/tx path");
+		dev->tx_pkt_burst = virtio_xmit_pkts_simple;
+		dev->rx_pkt_burst = virtio_recv_pkts_vec;
+		use_simple_rxtx = 1;
+	}
+
 	ret = virtio_dev_queue_setup(dev, VTNET_TQ, queue_idx, vtpci_queue_idx,
 			nb_desc, socket_id, &vq);
 	if (ret < 0) {
-- 
1.8.1.4


* Re: [dpdk-dev] [PATCH v4 4/7] virtio: fill RX avail ring with blank mbufs
  2015-10-23  5:56     ` Tan, Jianfeng
@ 2015-10-25 15:40       ` Xie, Huawei
  0 siblings, 0 replies; 92+ messages in thread
From: Xie, Huawei @ 2015-10-25 15:40 UTC (permalink / raw)
  To: Tan, Jianfeng, dev

On 10/23/2015 1:56 PM, Tan, Jianfeng wrote:
> On 10/23/2015 1:51 PM, Jianfeng wrote:
>
>> -----Original Message-----
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Huawei Xie
>> Sent: Thursday, October 22, 2015 8:10 PM
>> To: dev@dpdk.org
>> Subject: [dpdk-dev] [PATCH v4 4/7] virtio: fill RX avail ring with blank mbufs
>> +int __attribute__((cold))
>> +virtqueue_enqueue_recv_refill_simple(struct virtqueue *vq,
>> +	struct rte_mbuf *cookie)
>> +{
>> +	struct vq_desc_extra *dxp;
>> +	struct vring_desc *start_dp;
>> +	uint16_t desc_idx;
>> +
>> +	desc_idx = vq->vq_avail_idx & (vq->vq_nentries - 1);
>> +	dxp = &vq->vq_descx[desc_idx];
>> +	dxp->cookie = (void *)cookie;
>> +	vq->sw_ring[desc_idx] = cookie;
>> +
>> +	start_dp = vq->vq_ring.desc;
>> +	start_dp[desc_idx].addr = (uint64_t)((uintptr_t)cookie-
>>> buf_physaddr +
>> +		RTE_PKTMBUF_HEADROOM - sizeof(struct virtio_net_hdr));
> Please use RTE_MBUF_DATA_DMA_ADDR instead of "buf_physaddr + RTE_PKTMBUF_HEADROOM".
RTE_MBUF_DATA_DMA_ADDR is used for tx. For rx, we should use
RTE_MBUF_DATA_DMA_DEFAULT.
We could use a separate patch to fix all of this in the virtio code.
I remember there is a patch to move the
RTE_MBUF_DATA_DMA_ADDR/RTE_MBUF_DATA_DMA_DEFAULT definitions into a
common header file.
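
To illustrate the tx/rx distinction, here is a small sketch. It assumes the
two macros behave the way similar macros in other PMDs do (tx uses the
mbuf's current data offset, rx uses the default headroom); toy_mbuf,
TOY_HEADROOM and the toy_* helpers are hypothetical names, not DPDK API.

```c
#include <stdint.h>

#define TOY_HEADROOM 128u /* stand-in for the default headroom */

/* Hypothetical, reduced mbuf: only the fields needed here. */
struct toy_mbuf {
	uint64_t buf_physaddr; /* physical address of the buffer start */
	uint16_t data_off;     /* current offset of packet data */
};

/* TX: the packet already exists, so use wherever the data starts. */
static uint64_t
toy_dma_addr(const struct toy_mbuf *m)
{
	return m->buf_physaddr + m->data_off;
}

/* RX: the buffer is blank, so data will start at the default headroom. */
static uint64_t
toy_dma_addr_default(const struct toy_mbuf *m)
{
	return m->buf_physaddr + TOY_HEADROOM;
}
```

The two only agree when data_off still equals the default headroom, which
is why the rx refill path should not depend on the tx variant.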
>
>> +	start_dp[desc_idx].len = cookie->buf_len -
>> +		RTE_PKTMBUF_HEADROOM + sizeof(struct virtio_net_hdr);
>> +
>> +	vq->vq_free_cnt--;
>> +	vq->vq_avail_idx++;
>> +
>> +	return 0;
>> +}
>> --
>> 1.8.1.4
>



* Re: [dpdk-dev] [PATCH v5 5/7] virtio: virtio vec rx
  2015-10-25 15:35   ` [dpdk-dev] [PATCH v5 5/7] virtio: virtio vec rx Huawei Xie
@ 2015-10-26  8:34     ` Wang, Zhihong
  0 siblings, 0 replies; 92+ messages in thread
From: Wang, Zhihong @ 2015-10-26  8:34 UTC (permalink / raw)
  To: Xie, Huawei, dev

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Huawei Xie
> Sent: Sunday, October 25, 2015 11:35 PM
> To: dev@dpdk.org
> Subject: [dpdk-dev] [PATCH v5 5/7] virtio: virtio vec rx
> 
> With fixed avail ring, we don't need to get desc idx from avail ring.
> virtio driver only has to deal with desc ring.
> This patch uses vector instruction to accelerate processing desc ring.
> 
> Signed-off-by: Huawei Xie <huawei.xie@intel.com>

Acked-by: Wang, Zhihong <zhihong.wang@intel.com>


* Re: [dpdk-dev] [PATCH v5 0/7] virtio ring layout optimization and simple rx/tx processing
  2015-10-25 15:34 ` [dpdk-dev] [PATCH v5 0/7] virtio ring layout optimization and simple rx/tx processing Huawei Xie
                     ` (6 preceding siblings ...)
  2015-10-25 15:35   ` [dpdk-dev] [PATCH v5 7/7] virtio: pick simple rx/tx func Huawei Xie
@ 2015-10-27  1:44   ` Tan, Jianfeng
  2015-10-27  2:15     ` Yuanhan Liu
  7 siblings, 1 reply; 92+ messages in thread
From: Tan, Jianfeng @ 2015-10-27  1:44 UTC (permalink / raw)
  To: Xie, Huawei, dev



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Huawei Xie
> Sent: Sunday, October 25, 2015 11:35 PM
> To: dev@dpdk.org
> Subject: [dpdk-dev] [PATCH v5 0/7] virtio ring layout optimization and simple
> rx/tx processing
> 
> Changes in v5:
> - Call __rte_pktmbuf_prefree_seg to check refcnt when free mbufs
> 
> Changes in v4:
> - Fix the error in virtio tx ring layout ascii chart in the commit message
> - Move virtio_xmit_cleanup ahead to free descriptors earlier
> - Test merge-able feature when select simple rx/tx functions
> 
> Changes in v3:
> - Remove unnecessary NULL test for rte_free
> - Remove unnecessary assign of local var after free
> - Remove return at the end of void function
> - Remove always_inline attribute for virtio_xmit_cleanup
> - Reword some commit messages
> - Add TODO in the commit message of simple tx patch
> 
> Changes in v2:
> - Remove the configure macro
> - Enable simple R/TX processing when user specifies simple txq flags
> - Reword some comments and commit messages
> 
> In DPDK based switching enviroment, mostly vhost runs on a dedicated core
> while virtio processing in guest VMs runs on other different cores.
> Take RX for example, with generic implementation, for each guest buffer,
> a) virtio driver allocates a descriptor from free descriptor list
> b) modify the entry of avail ring to point to allocated descriptor
> c) after packet is received, free the descriptor
> 
> When vhost fetches the avail ring, it need to fetch the modified L1 cache
> from virtio core, which is a heavy cost in current CPU implementation.
> 
> This idea of this optimization is:
>     allocate the fixed descriptor for each entry of avail ring, so avail ring will
> always be the same during the run.
> This removes L1M cache transfer from virtio core to vhost core for avail ring.
> (Note we couldn't avoid the cache transfer for descriptors).
> Besides, descriptor allocation and free operation is eliminated.
> This also makes vector procesing possible to further accelerate the
> processing.
> 
> This is the layout for the avail ring(take 256 ring entries for example), with
> each entry pointing to the descriptor with the same index.
>                     avail
>                     idx
>                     +
>                     |
> +----+----+---+-------------+------+
> | 0  | 1  | 2 | ... |  254  | 255  |  avail ring
> +-+--+-+--+-+-+---------+---+--+---+
>   |    |    |       |   |      |
>   |    |    |       |   |      |
>   v    v    v       |   v      v
> +-+--+-+--+-+-+---------+---+--+---+
> | 0  | 1  | 2 | ... |  254  | 255  |  desc ring
> +----+----+---+-------------+------+
>                     |
>                     |
> +----+----+---+-------------+------+
> | 0  | 1  | 2 |     |  254  | 255  |  used ring
> +----+----+---+-------------+------+
>                     |
>                     +
> 
> This is the ring layout for TX.
> As we need one virtio header for each xmit packet, we have 128 slots
> available.
> 
>                          ++
>                          ||
>                          ||
> +-----+-----+-----+--------------+------+------+------+
> |  0  |  1  | ... |  127 || 128  | 129  | ...  | 255  |   avail ring
> +--+--+--+--+-----+---+------+---+--+---+------+--+---+
>    |     |            |  ||  |      |             |
>    v     v            v  ||  v      v             v
> +--+--+--+--+-----+---+------+---+--+---+------+--+---+
> | 127 | 128 | ... |  255 || 127  | 128  | ...  | 255  |   desc ring for virtio_net_hdr
> +--+--+--+--+-----+---+------+---+--+---+------+--+---+
>    |     |            |  ||  |      |             |
>    v     v            v  ||  v      v             v
> +--+--+--+--+-----+---+------+---+--+---+------+--+---+
> |  0  |  1  | ... |  127 ||  0   |  1   | ...  | 127  |   desc ring for tx dat
> +-----+-----+-----+--------------+------+------+------+
>                          ||
>                          ||
>                          ++
> 
> 
> Performance boost could be observed only if the virtio backend isn't the
> bottleneck or in VM2VM case.
> There are also several vhost optimization patches to be submitted later.
> 
> 
> Huawei Xie (7):
>   virtio: add virtio_rxtx.h header file
>   virtio: add software rx ring, fake_buf into virtqueue
>   virtio: rx/tx ring layout optimization
>   virtio: fill RX avail ring with blank mbufs
>   virtio: virtio vec rx
>   virtio: simple tx routine
>   virtio: pick simple rx/tx func
> 
>  drivers/net/virtio/Makefile             |   2 +-
>  drivers/net/virtio/virtio_ethdev.c      |  12 +-
>  drivers/net/virtio/virtio_ethdev.h      |   5 +
>  drivers/net/virtio/virtio_rxtx.c        |  56 ++++-
>  drivers/net/virtio/virtio_rxtx.h        |  39 +++
>  drivers/net/virtio/virtio_rxtx_simple.c | 414
> ++++++++++++++++++++++++++++++++
>  drivers/net/virtio/virtqueue.h          |   5 +
>  7 files changed, 529 insertions(+), 4 deletions(-)  create mode 100644
> drivers/net/virtio/virtio_rxtx.h  create mode 100644
> drivers/net/virtio/virtio_rxtx_simple.c
> 
> --
> 1.8.1.4


Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>

Thanks,
Jianfeng


* Re: [dpdk-dev] [PATCH v5 0/7] virtio ring layout optimization and simple rx/tx processing
  2015-10-27  1:44   ` [dpdk-dev] [PATCH v5 0/7] virtio ring layout optimization and simple rx/tx processing Tan, Jianfeng
@ 2015-10-27  2:15     ` Yuanhan Liu
  2015-10-27 10:17       ` Bruce Richardson
  0 siblings, 1 reply; 92+ messages in thread
From: Yuanhan Liu @ 2015-10-27  2:15 UTC (permalink / raw)
  To: Tan, Jianfeng; +Cc: dev

On Tue, Oct 27, 2015 at 01:44:09AM +0000, Tan, Jianfeng wrote:
> 
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Huawei Xie
> > Sent: Sunday, October 25, 2015 11:35 PM
> > To: dev@dpdk.org
> > Subject: [dpdk-dev] [PATCH v5 0/7] virtio ring layout optimization and simple
> > rx/tx processing
> > 
> > Changes in v5:
> > - Call __rte_pktmbuf_prefree_seg to check refcnt when free mbufs
> > 
> > Changes in v4:
> > - Fix the error in virtio tx ring layout ascii chart in the commit message
> > - Move virtio_xmit_cleanup ahead to free descriptors earlier
> > - Test merge-able feature when select simple rx/tx functions

[...]

> 
> Acked-by Jianfeng Tan <jianfeng.tan@intel.com>

Jianfeng,

I often see a reply like this, just dropping an ACK at the end of a
long email and nothing more, which costs me (as well as others) some
time scrolling all the way to the bottom to see it.

TBH, it's always a bit frustrating that, after the scroll effort,
I just see such a reply, which could have been put at the top of
the email so that I could see it at a glance.

So, a top reply would be good for this case, or you could reply like
I did, removing the other context to make your reply fit in one
screen.

	--yliu


* Re: [dpdk-dev] [PATCH v5 0/7] virtio ring layout optimization and simple rx/tx processing
  2015-10-27  2:15     ` Yuanhan Liu
@ 2015-10-27 10:17       ` Bruce Richardson
  0 siblings, 0 replies; 92+ messages in thread
From: Bruce Richardson @ 2015-10-27 10:17 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: dev

On Tue, Oct 27, 2015 at 10:15:01AM +0800, Yuanhan Liu wrote:
> On Tue, Oct 27, 2015 at 01:44:09AM +0000, Tan, Jianfeng wrote:
> > 
> > 
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Huawei Xie
> > > Sent: Sunday, October 25, 2015 11:35 PM
> > > To: dev@dpdk.org
> > > Subject: [dpdk-dev] [PATCH v5 0/7] virtio ring layout optimization and simple
> > > rx/tx processing
> > > 
> > > Changes in v5:
> > > - Call __rte_pktmbuf_prefree_seg to check refcnt when free mbufs
> > > 
> > > Changes in v4:
> > > - Fix the error in virtio tx ring layout ascii chart in the commit message
> > > - Move virtio_xmit_cleanup ahead to free descriptors earlier
> > > - Test merge-able feature when select simple rx/tx functions
> 
> [...]
> 
> > 
> > Acked-by Jianfeng Tan <jianfeng.tan@intel.com>
> 
> Jianfeng,
> 
> I often see a reply like this, just dropping an ACK at the end of
> long email, and no more, which takes me (as well as others) some
> time to scroll it many times to the bottom till see that.
> 
> TBH, it's always a bit frustrating that, after the scroll effort,
> I just see such a reply that could have been put on the top of
> the email so that I can see it with a glimpse only.
> 
> So, top reply would be good for this case, or you could reply like
> what I did, removing other context to make your reply fit in one
> screen.
> 
> 	--yliu

+1 

When ack'ing patches, please place the ack on the line underneath the signoff
and delete the rest of the email below, as it's unneeded.

/Bruce


* [dpdk-dev] [PATCH v6 0/8] virtio ring layout optimization and simple rx/tx processing
  2015-09-29 14:45 [dpdk-dev] [PATCH 0/8] virtio: virtio ring layout optimization and RX vector processing Huawei Xie
                   ` (13 preceding siblings ...)
  2015-10-25 15:34 ` [dpdk-dev] [PATCH v5 0/7] virtio ring layout optimization and simple rx/tx processing Huawei Xie
@ 2015-10-29 14:53 ` Huawei Xie
  2015-10-29 14:53   ` [dpdk-dev] [PATCH v6 1/8] virtio: add virtio_rxtx.h header file Huawei Xie
                     ` (9 more replies)
  14 siblings, 10 replies; 92+ messages in thread
From: Huawei Xie @ 2015-10-29 14:53 UTC (permalink / raw)
  To: dev

Changes in v6:
- Update release notes
- Fix the error in virtio tx ring layout ascii chart in the cover-letter

Changes in v5:
- Call __rte_pktmbuf_prefree_seg to check refcnt when freeing mbufs

Changes in v4:
- Fix the error in virtio tx ring layout ascii chart in the commit message
- Move virtio_xmit_cleanup ahead to free descriptors earlier
- Test the mergeable feature when selecting simple rx/tx functions

Changes in v3:
- Remove unnecessary NULL test for rte_free
- Remove unnecessary assign of local var after free
- Remove return at the end of void function
- Remove always_inline attribute for virtio_xmit_cleanup
- Reword some commit messages
- Add TODO in the commit message of simple tx patch

Changes in v2:
- Remove the configure macro
- Enable simple R/TX processing when user specifies simple txq flags
- Reword some comments and commit messages

In a DPDK-based switching environment, vhost mostly runs on a dedicated core
while virtio processing in guest VMs runs on different cores.
Take RX for example. With the generic implementation, for each guest buffer
the virtio driver:
a) allocates a descriptor from the free descriptor list
b) modifies the avail ring entry to point to the allocated descriptor
c) frees the descriptor after the packet is received

When vhost fetches the avail ring, it needs to fetch the modified L1 cache
lines from the virtio core, which is costly on current CPU implementations.

The idea of this optimization is:
    allocate a fixed descriptor for each entry of the avail ring, so the avail
ring stays the same for the whole run.
This removes the transfer of modified L1 cache lines from the virtio core to
the vhost core for the avail ring.
(Note that we cannot avoid the cache transfer for descriptors.)
Besides, descriptor allocation and free operations are eliminated.
This also makes vector processing possible, which further accelerates the
processing.

This is the layout for the avail ring (taking 256 ring entries as an example),
with each entry pointing to the descriptor with the same index.
                    avail
                    idx
                    +
                    |
+----+----+---+-------------+------+
| 0  | 1  | 2 | ... |  254  | 255  |  avail ring
+-+--+-+--+-+-+---------+---+--+---+
  |    |    |       |   |      |
  |    |    |       |   |      |
  v    v    v       |   v      v
+-+--+-+--+-+-+---------+---+--+---+
| 0  | 1  | 2 | ... |  254  | 255  |  desc ring
+----+----+---+-------------+------+
                    |
                    |
+----+----+---+-------------+------+
| 0  | 1  | 2 |     |  254  | 255  |  used ring
+----+----+---+-------------+------+
                    |
                    +
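
As a rough sketch of the RX layout above (not the code from this series):
the avail ring is filled once with the identity mapping, and afterwards the
driver only advances the free-running avail index. toy_avail and the toy_*
helpers are hypothetical names; the ring size of 256 follows the example in
the chart.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_RING_SIZE 256u /* must be a power of two */

/* Hypothetical, reduced avail ring. */
struct toy_avail {
	uint16_t idx;                 /* free-running producer index */
	uint16_t ring[TOY_RING_SIZE]; /* descriptor index per entry */
};

/* One-time setup: entry i always refers to descriptor i. */
static void
toy_avail_init_identity(struct toy_avail *a)
{
	uint16_t i;

	assert((TOY_RING_SIZE & (TOY_RING_SIZE - 1)) == 0);
	for (i = 0; i < TOY_RING_SIZE; i++)
		a->ring[i] = i;
	a->idx = 0;
}

/* Map a free-running 16-bit index to a ring slot. */
static uint16_t
toy_slot(uint16_t free_running_idx)
{
	return (uint16_t)(free_running_idx & (TOY_RING_SIZE - 1));
}

/* Publish n refilled buffers: ring[] is never written again. */
static void
toy_avail_publish(struct toy_avail *a, uint16_t n)
{
	a->idx = (uint16_t)(a->idx + n);
}
```

Because ring[] is written only during setup, the cache lines holding it are
not dirtied by the guest during the run; only idx and the descriptors
themselves change.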

This is the ring layout for TX.
As we need one virtio header for each xmit packet, we have 128 slots available.

                         ++
                         ||
                         ||
+-----+-----+-----+--------------+------+------+------+
|  0  |  1  | ... |  127 || 128  | 129  | ...  | 255  |   avail ring
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
   |     |            |  ||  |      |             |
   v     v            v  ||  v      v             v
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
| 128 | 129 | ... |  255 || 128  | 129  | ...  | 255  |   desc ring for virtio_net_hdr
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
   |     |            |  ||  |      |             |
   v     v            v  ||  v      v             v
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
|  0  |  1  | ... |  127 ||  0   |  1   | ...  | 127  |   desc ring for tx dat
+-----+-----+-----+--------------+------+------+------+
                         ||
                         ||
                         ++
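
The left half of the TX chart can be sketched as follows, assuming 256
entries. This is an illustration only: toy_desc, toy_txq and the toy_*
helpers are hypothetical, and only the first 128 avail entries are set up.
Each avail entry i points at header descriptor i + 128, which chains to data
descriptor i.

```c
#include <stdint.h>

#define TOY_NENTRIES    256u
#define TOY_DESC_F_NEXT 1u /* "descriptor continues via next" flag */

/* Hypothetical, reduced descriptor and queue. */
struct toy_desc {
	uint64_t addr;
	uint32_t len;
	uint16_t flags;
	uint16_t next;
};

struct toy_txq {
	uint16_t avail_ring[TOY_NENTRIES];
	struct toy_desc desc[TOY_NENTRIES];
};

/* One-time setup of the fixed header -> data chains. */
static void
toy_txq_init(struct toy_txq *q)
{
	uint16_t mid = (uint16_t)(TOY_NENTRIES / 2);
	uint16_t i;

	for (i = 0; i < mid; i++) {
		q->avail_ring[i] = (uint16_t)(i + mid); /* header desc */
		q->desc[i + mid].flags = TOY_DESC_F_NEXT;
		q->desc[i + mid].next = i;              /* data desc */
		q->desc[i].flags = 0;                   /* end of chain */
	}
}

/* A packet slot is the avail index masked to half the ring. */
static uint16_t
toy_tx_slot(uint16_t avail_idx)
{
	return (uint16_t)(avail_idx & (TOY_NENTRIES / 2 - 1));
}
```

With the chains fixed at setup, transmitting a packet only needs the data
descriptor's addr and len to be filled in for its slot, which matches the
128 usable slots stated above.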


A performance boost can be observed only if the virtio backend isn't the
bottleneck, or in the VM2VM case.
Several vhost optimization patches will also be submitted later.


Huawei Xie (8):
  virtio: add virtio_rxtx.h header file
  virtio: add software rx ring, fake_buf into virtqueue
  virtio: rx/tx ring layout optimization
  virtio: fill RX avail ring with blank mbufs
  virtio: virtio vec rx
  virtio: simple tx routine
  virtio: pick simple rx/tx func
  doc: update release notes 2.2 about virtio performance optimization

 doc/guides/rel_notes/release_2_2.rst    |   3 +
 drivers/net/virtio/Makefile             |   2 +-
 drivers/net/virtio/virtio_ethdev.c      |  12 +-
 drivers/net/virtio/virtio_ethdev.h      |   5 +
 drivers/net/virtio/virtio_rxtx.c        |  56 ++++-
 drivers/net/virtio/virtio_rxtx.h        |  39 +++
 drivers/net/virtio/virtio_rxtx_simple.c | 414 ++++++++++++++++++++++++++++++++
 drivers/net/virtio/virtqueue.h          |   5 +
 8 files changed, 532 insertions(+), 4 deletions(-)
 create mode 100644 drivers/net/virtio/virtio_rxtx.h
 create mode 100644 drivers/net/virtio/virtio_rxtx_simple.c

-- 
1.8.1.4


* [dpdk-dev] [PATCH v6 1/8] virtio: add virtio_rxtx.h header file
  2015-10-29 14:53 ` [dpdk-dev] [PATCH v6 0/8] " Huawei Xie
@ 2015-10-29 14:53   ` Huawei Xie
  2015-10-29 14:53   ` [dpdk-dev] [PATCH v6 2/8] virtio: add software rx ring, fake_buf into virtqueue Huawei Xie
                     ` (8 subsequent siblings)
  9 siblings, 0 replies; 92+ messages in thread
From: Huawei Xie @ 2015-10-29 14:53 UTC (permalink / raw)
  To: dev

All rx/tx related declarations will be moved into this header file in the
future.
Add RTE_VIRTIO_PMD_MAX_BURST.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
---
 drivers/net/virtio/virtio_ethdev.c |  1 +
 drivers/net/virtio/virtio_rxtx.c   |  1 +
 drivers/net/virtio/virtio_rxtx.h   | 34 ++++++++++++++++++++++++++++++++++
 3 files changed, 36 insertions(+)
 create mode 100644 drivers/net/virtio/virtio_rxtx.h

diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index 465d3cd..79a3640 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -61,6 +61,7 @@
 #include "virtio_pci.h"
 #include "virtio_logs.h"
 #include "virtqueue.h"
+#include "virtio_rxtx.h"
 
 
 static int eth_virtio_dev_init(struct rte_eth_dev *eth_dev);
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index c5b53bb..9324f7f 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -54,6 +54,7 @@
 #include "virtio_logs.h"
 #include "virtio_ethdev.h"
 #include "virtqueue.h"
+#include "virtio_rxtx.h"
 
 #ifdef RTE_LIBRTE_VIRTIO_DEBUG_DUMP
 #define VIRTIO_DUMP_PACKET(m, len) rte_pktmbuf_dump(stdout, m, len)
diff --git a/drivers/net/virtio/virtio_rxtx.h b/drivers/net/virtio/virtio_rxtx.h
new file mode 100644
index 0000000..a10aa69
--- /dev/null
+++ b/drivers/net/virtio/virtio_rxtx.h
@@ -0,0 +1,34 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#define RTE_PMD_VIRTIO_RX_MAX_BURST 64
-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [dpdk-dev] [PATCH v6 2/8] virtio: add software rx ring, fake_buf into virtqueue
  2015-10-29 14:53 ` [dpdk-dev] [PATCH v6 0/8] " Huawei Xie
  2015-10-29 14:53   ` [dpdk-dev] [PATCH v6 1/8] virtio: add virtio_rxtx.h header file Huawei Xie
@ 2015-10-29 14:53   ` Huawei Xie
  2015-10-30 18:13     ` Thomas Monjalon
  2015-10-29 14:53   ` [dpdk-dev] [PATCH v6 3/8] virtio: rx/tx ring layout optimization Huawei Xie
                     ` (7 subsequent siblings)
  9 siblings, 1 reply; 92+ messages in thread
From: Huawei Xie @ 2015-10-29 14:53 UTC (permalink / raw)
  To: dev

Changes in v3:
- Remove unnecessary NULL test for rte_free
- Remove unnecessary assign of local var vq after free

Add software RX ring in virtqueue.
Add fake_mbuf in virtqueue for wraparound processing.
Use the file-scope flag use_simple_rxtx to indicate whether simple rxtx is enabled.
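
The wraparound padding described above can be sketched in plain C. This is only
an illustration under stated assumptions: struct buf, struct rxq, rxq_init and
rxq_free are hypothetical stand-ins (not DPDK names), and MAX_BURST plays the
role of RTE_PMD_VIRTIO_RX_MAX_BURST. The point is that a burst starting near
the end of the ring can read past the last real slot and still dereference a
valid, harmless buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_BURST 64            /* stands in for RTE_PMD_VIRTIO_RX_MAX_BURST */

struct buf { uint32_t len; };   /* hypothetical stand-in for struct rte_mbuf */

struct rxq {
	uint16_t nentries;      /* ring size, must be a power of two */
	struct buf **sw_ring;   /* nentries real slots + MAX_BURST padding slots */
	struct buf fake;        /* dummy buffer that every padding slot points at */
};

/* Allocate the software ring and point the padding at the dummy buffer. */
static int rxq_init(struct rxq *q, uint16_t nentries)
{
	size_t i;

	assert(nentries != 0 && (nentries & (nentries - 1)) == 0);

	q->nentries = nentries;
	q->fake.len = 0;
	q->sw_ring = calloc((size_t)nentries + MAX_BURST, sizeof(q->sw_ring[0]));
	if (q->sw_ring == NULL)
		return -1;

	/* Slots [nentries, nentries + MAX_BURST) never hold real buffers. */
	for (i = 0; i < MAX_BURST; i++)
		q->sw_ring[(size_t)nentries + i] = &q->fake;
	return 0;
}

/* Release the software ring; free(NULL) is a no-op, so no NULL test is needed. */
static void rxq_free(struct rxq *q)
{
	free(q->sw_ring);
	q->sw_ring = NULL;
}
```

With a 256-entry ring, slots 256..319 all alias the dummy buffer, so an
unrolled loop that starts at slot 250 does not need a per-element bounds check.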

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
---
 drivers/net/virtio/virtio_ethdev.c | 11 ++++++++++-
 drivers/net/virtio/virtio_rxtx.c   |  7 +++++++
 drivers/net/virtio/virtqueue.h     |  4 ++++
 3 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index 79a3640..82676d3 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -247,8 +247,8 @@ virtio_dev_queue_release(struct virtqueue *vq) {
 		VIRTIO_WRITE_REG_2(hw, VIRTIO_PCI_QUEUE_SEL, vq->queue_id);
 		VIRTIO_WRITE_REG_4(hw, VIRTIO_PCI_QUEUE_PFN, 0);
 
+		rte_free(vq->sw_ring);
 		rte_free(vq);
-		vq = NULL;
 	}
 }
 
@@ -292,6 +292,9 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
 			dev->data->port_id, queue_idx);
 		vq = rte_zmalloc(vq_name, sizeof(struct virtqueue) +
 			vq_size * sizeof(struct vq_desc_extra), RTE_CACHE_LINE_SIZE);
+		vq->sw_ring = rte_zmalloc_socket("rxq->sw_ring",
+			(RTE_PMD_VIRTIO_RX_MAX_BURST + vq_size) *
+			sizeof(vq->sw_ring[0]), RTE_CACHE_LINE_SIZE, socket_id);
 	} else if (queue_type == VTNET_TQ) {
 		snprintf(vq_name, sizeof(vq_name), "port%d_tvq%d",
 			dev->data->port_id, queue_idx);
@@ -308,6 +311,12 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
 		PMD_INIT_LOG(ERR, "%s: Can not allocate virtqueue", __func__);
 		return (-ENOMEM);
 	}
+	if (queue_type == VTNET_RQ && vq->sw_ring == NULL) {
+		PMD_INIT_LOG(ERR, "%s: Can not allocate RX soft ring",
+			__func__);
+		rte_free(vq);
+		return -ENOMEM;
+	}
 
 	vq->hw = hw;
 	vq->port_id = dev->data->port_id;
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 9324f7f..5c00e9d 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -62,6 +62,8 @@
 #define  VIRTIO_DUMP_PACKET(m, len) do { } while (0)
 #endif
 
+static int use_simple_rxtx;
+
 static void
 vq_ring_free_chain(struct virtqueue *vq, uint16_t desc_idx)
 {
@@ -299,6 +301,11 @@ virtio_dev_vring_start(struct virtqueue *vq, int queue_type)
 		/* Allocate blank mbufs for the each rx descriptor */
 		nbufs = 0;
 		error = ENOSPC;
+
+		memset(&vq->fake_mbuf, 0, sizeof(vq->fake_mbuf));
+		for (i = 0; i < RTE_PMD_VIRTIO_RX_MAX_BURST; i++)
+			vq->sw_ring[vq->vq_nentries + i] = &vq->fake_mbuf;
+
 		while (!virtqueue_full(vq)) {
 			m = rte_rxmbuf_alloc(vq->mpool);
 			if (m == NULL)
diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h
index 7789411..6a1ec48 100644
--- a/drivers/net/virtio/virtqueue.h
+++ b/drivers/net/virtio/virtqueue.h
@@ -190,6 +190,10 @@ struct virtqueue {
 	uint16_t vq_avail_idx;
 	phys_addr_t virtio_net_hdr_mem; /**< hdr for each xmit packet */
 
+	struct rte_mbuf **sw_ring; /**< RX software ring. */
+	/* dummy mbuf, for wraparound when processing RX ring. */
+	struct rte_mbuf fake_mbuf;
+
 	/* Statistics */
 	uint64_t	packets;
 	uint64_t	bytes;
-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [dpdk-dev] [PATCH v6 3/8] virtio: rx/tx ring layout optimization
  2015-10-29 14:53 ` [dpdk-dev] [PATCH v6 0/8] " Huawei Xie
  2015-10-29 14:53   ` [dpdk-dev] [PATCH v6 1/8] virtio: add virtio_rxtx.h header file Huawei Xie
  2015-10-29 14:53   ` [dpdk-dev] [PATCH v6 2/8] virtio: add software rx ring, fake_buf into virtqueue Huawei Xie
@ 2015-10-29 14:53   ` Huawei Xie
  2015-10-29 14:53   ` [dpdk-dev] [PATCH v6 4/8] virtio: fill RX avail ring with blank mbufs Huawei Xie
                     ` (6 subsequent siblings)
  9 siblings, 0 replies; 92+ messages in thread
From: Huawei Xie @ 2015-10-29 14:53 UTC (permalink / raw)
  To: dev

Changes in V4:
- fix the error in tx ring layout chart in this commit message.

In a DPDK-based switching environment, vhost mostly runs on a dedicated core
while virtio processing in guest VMs runs on different cores.
Take RX for example. With the generic implementation, for each guest buffer:
a) the virtio driver allocates a descriptor from the free descriptor list
b) it modifies the avail ring entry to point to the allocated descriptor
c) after the packet is received, it frees the descriptor

When vhost fetches the avail ring, it needs to fetch the modified L1 cache line
from the virtio core, which is costly on current CPU implementations.

The idea of this optimization is:
    allocate a fixed descriptor for each entry of the avail ring, so the avail
ring stays the same for the whole run.
This removes the transfer of the modified L1 cache line from the virtio core to
the vhost core for the avail ring.
(Note that we cannot avoid the cache transfer for descriptors.)
Besides, descriptor allocation and free operations are eliminated.
This also makes vector processing possible, which further accelerates the
processing.

This is the layout for the avail ring(take 256 ring entries for example), with
each entry pointing to the descriptor with the same index.
                    avail
                    idx
                    +
                    |
+----+----+---+-------------+------+
| 0  | 1  | 2 | ... |  254  | 255  |  avail ring
+-+--+-+--+-+-+---------+---+--+---+
  |    |    |       |   |      |
  |    |    |       |   |      |
  v    v    v       |   v      v
+-+--+-+--+-+-+---------+---+--+---+
| 0  | 1  | 2 | ... |  254  | 255  |  desc ring
+----+----+---+-------------+------+
                    |
                    |
+----+----+---+-------------+------+
| 0  | 1  | 2 |     |  254  | 255  |  used ring
+----+----+---+-------------+------+
                    |
                    +
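
A minimal sketch of the fixed RX layout drawn above, assuming simplified
stand-in types (struct ring, ring_init_fixed and ring_refill are hypothetical
names, not the driver's). The avail ring is written once at setup; a refill
afterwards touches only the descriptor and the producer index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 256
#define DESC_F_WRITE 2   /* same value as VRING_DESC_F_WRITE */

struct desc {
	uint64_t addr;
	uint32_t len;
	uint16_t flags;
	uint16_t next;
};

struct ring {
	uint16_t avail_idx;             /* free-running producer index */
	uint16_t avail[RING_SIZE];      /* avail ring entries */
	struct desc desc[RING_SIZE];    /* descriptor table */
};

/* One-time setup: avail entry i permanently names descriptor i. */
static void ring_init_fixed(struct ring *r)
{
	uint16_t i;

	assert(r != NULL);
	for (i = 0; i < RING_SIZE; i++) {
		r->avail[i] = i;
		r->desc[i].flags = DESC_F_WRITE;
	}
	r->avail_idx = 0;
}

/* Refill one buffer: avail[] is never written again after setup. */
static uint16_t ring_refill(struct ring *r, uint64_t addr, uint32_t len)
{
	uint16_t slot = (uint16_t)(r->avail_idx & (RING_SIZE - 1));

	r->desc[slot].addr = addr;
	r->desc[slot].len = len;
	r->avail_idx++;
	return slot;
}
```

Because avail[] stays read-only after ring_init_fixed(), the cache lines that
hold it can remain shared between the two cores; only the descriptor lines and
the index still move.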

This is the ring layout for TX.
As we need one virtio header for each xmit packet, we have 128 slots available.

                         ++
                         ||
                         ||
+-----+-----+-----+--------------+------+------+------+
|  0  |  1  | ... |  127 || 128  | 129  | ...  | 255  |   avail ring
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
   |     |            |  ||  |      |             |
   v     v            v  ||  v      v             v
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
| 128 | 129 | ... |  255 || 128  | 129  | ...  | 255  |   desc ring for virtio_net_hdr
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
   |     |            |  ||  |      |             |
   v     v            v  ||  v      v             v
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
|  0  |  1  | ... |  127 ||  0   |  1   | ...  | 127  |   desc ring for tx data
+-----+-----+-----+--------------+------+------+------+
                         ||
                         ||
                         ++
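
The TX chart can be read as the following sketch, again with hypothetical
stand-in types (struct tx_ring and tx_ring_init are not driver names). It
assumes, for illustration only, that every header descriptor points at one
shared header buffer passed in as hdr_addr. The upper half of the descriptor
table holds virtio_net_hdr descriptors, each chained to the data descriptor in
the lower half.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TX_RING_SIZE 256
#define DESC_F_NEXT 1   /* same value as VRING_DESC_F_NEXT */

struct tx_desc {
	uint64_t addr;
	uint32_t len;
	uint16_t flags;
	uint16_t next;
};

struct tx_ring {
	uint16_t avail[TX_RING_SIZE];
	struct tx_desc desc[TX_RING_SIZE];
};

/* Pre-link header descriptors (upper half) to data descriptors (lower half). */
static void tx_ring_init(struct tx_ring *r, uint64_t hdr_addr, uint32_t hdr_len)
{
	uint16_t mid = TX_RING_SIZE / 2;
	uint16_t i;

	assert(r != NULL);
	for (i = 0; i < mid; i++) {
		uint16_t hdr = (uint16_t)(i + mid);

		r->avail[i] = hdr;              /* chain head is the header desc */
		r->desc[hdr].addr = hdr_addr;   /* assumed shared header buffer */
		r->desc[hdr].len = hdr_len;
		r->desc[hdr].flags = DESC_F_NEXT;
		r->desc[hdr].next = i;          /* followed by data desc i */
		r->desc[i].flags = 0;           /* data desc ends the chain */
	}
	/* Second half of the avail ring repeats the same header indices. */
	for (i = mid; i < TX_RING_SIZE; i++)
		r->avail[i] = i;
}
```

With 256 descriptors this yields 128 usable slots: avail entries i and i + 128
both name header descriptor 128 + i, which chains to data descriptor i.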

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
---
 drivers/net/virtio/virtio_rxtx.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 5c00e9d..7c82a6a 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -302,6 +302,12 @@ virtio_dev_vring_start(struct virtqueue *vq, int queue_type)
 		nbufs = 0;
 		error = ENOSPC;
 
+		if (use_simple_rxtx)
+			for (i = 0; i < vq->vq_nentries; i++) {
+				vq->vq_ring.avail->ring[i] = i;
+				vq->vq_ring.desc[i].flags = VRING_DESC_F_WRITE;
+			}
+
 		memset(&vq->fake_mbuf, 0, sizeof(vq->fake_mbuf));
 		for (i = 0; i < RTE_PMD_VIRTIO_RX_MAX_BURST; i++)
 			vq->sw_ring[vq->vq_nentries + i] = &vq->fake_mbuf;
@@ -332,6 +338,24 @@ virtio_dev_vring_start(struct virtqueue *vq, int queue_type)
 		VIRTIO_WRITE_REG_4(vq->hw, VIRTIO_PCI_QUEUE_PFN,
 			vq->mz->phys_addr >> VIRTIO_PCI_QUEUE_ADDR_SHIFT);
 	} else if (queue_type == VTNET_TQ) {
+		if (use_simple_rxtx) {
+			int mid_idx  = vq->vq_nentries >> 1;
+			for (i = 0; i < mid_idx; i++) {
+				vq->vq_ring.avail->ring[i] = i + mid_idx;
+				vq->vq_ring.desc[i + mid_idx].next = i;
+				vq->vq_ring.desc[i + mid_idx].addr =
+					vq->virtio_net_hdr_mem +
+						mid_idx * vq->hw->vtnet_hdr_size;
+				vq->vq_ring.desc[i + mid_idx].len =
+					vq->hw->vtnet_hdr_size;
+				vq->vq_ring.desc[i + mid_idx].flags =
+					VRING_DESC_F_NEXT;
+				vq->vq_ring.desc[i].flags = 0;
+			}
+			for (i = mid_idx; i < vq->vq_nentries; i++)
+				vq->vq_ring.avail->ring[i] = i;
+		}
+
 		VIRTIO_WRITE_REG_2(vq->hw, VIRTIO_PCI_QUEUE_SEL,
 			vq->vq_queue_index);
 		VIRTIO_WRITE_REG_4(vq->hw, VIRTIO_PCI_QUEUE_PFN,
-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [dpdk-dev] [PATCH v6 4/8] virtio: fill RX avail ring with blank mbufs
  2015-10-29 14:53 ` [dpdk-dev] [PATCH v6 0/8] " Huawei Xie
                     ` (2 preceding siblings ...)
  2015-10-29 14:53   ` [dpdk-dev] [PATCH v6 3/8] virtio: rx/tx ring layout optimization Huawei Xie
@ 2015-10-29 14:53   ` Huawei Xie
  2015-10-29 14:53   ` [dpdk-dev] [PATCH v6 5/8] virtio: virtio vec rx Huawei Xie
                     ` (5 subsequent siblings)
  9 siblings, 0 replies; 92+ messages in thread
From: Huawei Xie @ 2015-10-29 14:53 UTC (permalink / raw)
  To: dev

Fill the avail ring with blank mbufs in virtio_dev_vring_start.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
---
 drivers/net/virtio/Makefile             |  2 +-
 drivers/net/virtio/virtio_rxtx.c        |  6 ++-
 drivers/net/virtio/virtio_rxtx.h        |  3 ++
 drivers/net/virtio/virtio_rxtx_simple.c | 84 +++++++++++++++++++++++++++++++++
 4 files changed, 92 insertions(+), 3 deletions(-)
 create mode 100644 drivers/net/virtio/virtio_rxtx_simple.c

diff --git a/drivers/net/virtio/Makefile b/drivers/net/virtio/Makefile
index 930b60f..43835ba 100644
--- a/drivers/net/virtio/Makefile
+++ b/drivers/net/virtio/Makefile
@@ -50,7 +50,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtqueue.c
 SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_pci.c
 SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx.c
 SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_ethdev.c
-
+SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx_simple.c
 
 # this lib depends upon:
 DEPDIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += lib/librte_eal lib/librte_ether
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 7c82a6a..5162ce6 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -320,8 +320,10 @@ virtio_dev_vring_start(struct virtqueue *vq, int queue_type)
 			/******************************************
 			*         Enqueue allocated buffers        *
 			*******************************************/
-			error = virtqueue_enqueue_recv_refill(vq, m);
-
+			if (use_simple_rxtx)
+				error = virtqueue_enqueue_recv_refill_simple(vq, m);
+			else
+				error = virtqueue_enqueue_recv_refill(vq, m);
 			if (error) {
 				rte_pktmbuf_free(m);
 				break;
diff --git a/drivers/net/virtio/virtio_rxtx.h b/drivers/net/virtio/virtio_rxtx.h
index a10aa69..7d2d8fe 100644
--- a/drivers/net/virtio/virtio_rxtx.h
+++ b/drivers/net/virtio/virtio_rxtx.h
@@ -32,3 +32,6 @@
  */
 
 #define RTE_PMD_VIRTIO_RX_MAX_BURST 64
+
+int virtqueue_enqueue_recv_refill_simple(struct virtqueue *vq,
+	struct rte_mbuf *m);
diff --git a/drivers/net/virtio/virtio_rxtx_simple.c b/drivers/net/virtio/virtio_rxtx_simple.c
new file mode 100644
index 0000000..cac5b9f
--- /dev/null
+++ b/drivers/net/virtio/virtio_rxtx_simple.c
@@ -0,0 +1,84 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <errno.h>
+
+#include <tmmintrin.h>
+
+#include <rte_cycles.h>
+#include <rte_memory.h>
+#include <rte_memzone.h>
+#include <rte_branch_prediction.h>
+#include <rte_mempool.h>
+#include <rte_malloc.h>
+#include <rte_mbuf.h>
+#include <rte_ether.h>
+#include <rte_ethdev.h>
+#include <rte_prefetch.h>
+#include <rte_string_fns.h>
+#include <rte_errno.h>
+#include <rte_byteorder.h>
+
+#include "virtio_logs.h"
+#include "virtio_ethdev.h"
+#include "virtqueue.h"
+#include "virtio_rxtx.h"
+
+int __attribute__((cold))
+virtqueue_enqueue_recv_refill_simple(struct virtqueue *vq,
+	struct rte_mbuf *cookie)
+{
+	struct vq_desc_extra *dxp;
+	struct vring_desc *start_dp;
+	uint16_t desc_idx;
+
+	desc_idx = vq->vq_avail_idx & (vq->vq_nentries - 1);
+	dxp = &vq->vq_descx[desc_idx];
+	dxp->cookie = (void *)cookie;
+	vq->sw_ring[desc_idx] = cookie;
+
+	start_dp = vq->vq_ring.desc;
+	start_dp[desc_idx].addr = (uint64_t)((uintptr_t)cookie->buf_physaddr +
+		RTE_PKTMBUF_HEADROOM - sizeof(struct virtio_net_hdr));
+	start_dp[desc_idx].len = cookie->buf_len -
+		RTE_PKTMBUF_HEADROOM + sizeof(struct virtio_net_hdr);
+
+	vq->vq_free_cnt--;
+	vq->vq_avail_idx++;
+
+	return 0;
+}
-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [dpdk-dev] [PATCH v6 5/8] virtio: virtio vec rx
  2015-10-29 14:53 ` [dpdk-dev] [PATCH v6 0/8] " Huawei Xie
                     ` (3 preceding siblings ...)
  2015-10-29 14:53   ` [dpdk-dev] [PATCH v6 4/8] virtio: fill RX avail ring with blank mbufs Huawei Xie
@ 2015-10-29 14:53   ` Huawei Xie
  2015-10-30 18:19     ` Thomas Monjalon
  2015-10-29 14:53   ` [dpdk-dev] [PATCH v6 6/8] virtio: simple tx routine Huawei Xie
                     ` (4 subsequent siblings)
  9 siblings, 1 reply; 92+ messages in thread
From: Huawei Xie @ 2015-10-29 14:53 UTC (permalink / raw)
  To: dev

With a fixed avail ring, we don't need to get the desc idx from the avail ring.
The virtio driver only has to deal with the desc ring.
This patch uses vector instructions to accelerate desc ring processing.
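
For readers less familiar with the intrinsics, here is a scalar sketch of the
per-packet work that the vector routine batches. It is not the patch's code:
struct used_elem, struct pkt and rx_burst_scalar are hypothetical names, and
NET_HDR_LEN assumes a 10-byte virtio_net_hdr (no mergeable buffers). Because
the layout is fixed, the buffer is found by ring position alone and the id
field of the used element is never read.

```c
#include <assert.h>
#include <stdint.h>

#define NET_HDR_LEN 10   /* assumed size of the non-mergeable virtio_net_hdr */

struct used_elem { uint32_t id; uint32_t len; };      /* one used ring entry */
struct pkt { uint32_t pkt_len; uint16_t data_len; };  /* stand-in for an mbuf */

/*
 * Hand out up to n received buffers starting at consumer index cons_idx.
 * used[] and sw_ring[] are indexed by the same slot number.
 */
static uint16_t rx_burst_scalar(const struct used_elem *used,
	struct pkt **sw_ring, uint16_t ring_size, uint16_t cons_idx,
	struct pkt **out, uint16_t n)
{
	uint16_t i;

	assert(ring_size != 0 && (ring_size & (ring_size - 1)) == 0);

	for (i = 0; i < n; i++) {
		uint16_t slot = (uint16_t)((cons_idx + i) & (ring_size - 1));
		uint32_t len = used[slot].len - NET_HDR_LEN; /* strip the header */

		out[i] = sw_ring[slot];
		out[i]->pkt_len = len;
		out[i]->data_len = (uint16_t)len;
	}
	return n;
}
```

The vector routine in the patch does the same length extraction for two used
elements per 128-bit load with a byte shuffle, then applies the header
adjustment with a 16-bit add.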

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
---
 drivers/net/virtio/virtio_ethdev.h      |   2 +
 drivers/net/virtio/virtio_rxtx.c        |   3 +
 drivers/net/virtio/virtio_rxtx.h        |   2 +
 drivers/net/virtio/virtio_rxtx_simple.c | 224 ++++++++++++++++++++++++++++++++
 drivers/net/virtio/virtqueue.h          |   1 +
 5 files changed, 232 insertions(+)

diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h
index 9026d42..d7797ab 100644
--- a/drivers/net/virtio/virtio_ethdev.h
+++ b/drivers/net/virtio/virtio_ethdev.h
@@ -108,6 +108,8 @@ uint16_t virtio_recv_mergeable_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 uint16_t virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		uint16_t nb_pkts);
 
+uint16_t virtio_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
+		uint16_t nb_pkts);
 
 /*
  * The VIRTIO_NET_F_GUEST_TSO[46] features permit the host to send us
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 5162ce6..947fc46 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -432,6 +432,9 @@ virtio_dev_rx_queue_setup(struct rte_eth_dev *dev,
 	vq->mpool = mp;
 
 	dev->data->rx_queues[queue_idx] = vq;
+
+	virtio_rxq_vec_setup(vq);
+
 	return 0;
 }
 
diff --git a/drivers/net/virtio/virtio_rxtx.h b/drivers/net/virtio/virtio_rxtx.h
index 7d2d8fe..831e492 100644
--- a/drivers/net/virtio/virtio_rxtx.h
+++ b/drivers/net/virtio/virtio_rxtx.h
@@ -33,5 +33,7 @@
 
 #define RTE_PMD_VIRTIO_RX_MAX_BURST 64
 
+int virtio_rxq_vec_setup(struct virtqueue *rxq);
+
 int virtqueue_enqueue_recv_refill_simple(struct virtqueue *vq,
 	struct rte_mbuf *m);
diff --git a/drivers/net/virtio/virtio_rxtx_simple.c b/drivers/net/virtio/virtio_rxtx_simple.c
index cac5b9f..ef17562 100644
--- a/drivers/net/virtio/virtio_rxtx_simple.c
+++ b/drivers/net/virtio/virtio_rxtx_simple.c
@@ -58,6 +58,10 @@
 #include "virtqueue.h"
 #include "virtio_rxtx.h"
 
+#define RTE_VIRTIO_VPMD_RX_BURST 32
+#define RTE_VIRTIO_DESC_PER_LOOP 8
+#define RTE_VIRTIO_VPMD_RX_REARM_THRESH RTE_VIRTIO_VPMD_RX_BURST
+
 int __attribute__((cold))
 virtqueue_enqueue_recv_refill_simple(struct virtqueue *vq,
 	struct rte_mbuf *cookie)
@@ -82,3 +86,223 @@ virtqueue_enqueue_recv_refill_simple(struct virtqueue *vq,
 
 	return 0;
 }
+
+static inline void
+virtio_rxq_rearm_vec(struct virtqueue *rxvq)
+{
+	int i;
+	uint16_t desc_idx;
+	struct rte_mbuf **sw_ring;
+	struct vring_desc *start_dp;
+	int ret;
+
+	desc_idx = rxvq->vq_avail_idx & (rxvq->vq_nentries - 1);
+	sw_ring = &rxvq->sw_ring[desc_idx];
+	start_dp = &rxvq->vq_ring.desc[desc_idx];
+
+	ret = rte_mempool_get_bulk(rxvq->mpool, (void **)sw_ring,
+		RTE_VIRTIO_VPMD_RX_REARM_THRESH);
+	if (unlikely(ret)) {
+		rte_eth_devices[rxvq->port_id].data->rx_mbuf_alloc_failed +=
+			RTE_VIRTIO_VPMD_RX_REARM_THRESH;
+		return;
+	}
+
+	for (i = 0; i < RTE_VIRTIO_VPMD_RX_REARM_THRESH; i++) {
+		uintptr_t p;
+
+		p = (uintptr_t)&sw_ring[i]->rearm_data;
+		*(uint64_t *)p = rxvq->mbuf_initializer;
+
+		start_dp[i].addr =
+			(uint64_t)((uintptr_t)sw_ring[i]->buf_physaddr +
+			RTE_PKTMBUF_HEADROOM - sizeof(struct virtio_net_hdr));
+		start_dp[i].len = sw_ring[i]->buf_len -
+			RTE_PKTMBUF_HEADROOM + sizeof(struct virtio_net_hdr);
+	}
+
+	rxvq->vq_avail_idx += RTE_VIRTIO_VPMD_RX_REARM_THRESH;
+	rxvq->vq_free_cnt -= RTE_VIRTIO_VPMD_RX_REARM_THRESH;
+	vq_update_avail_idx(rxvq);
+}
+
+/* virtio vPMD receive routine, only accept(nb_pkts >= RTE_VIRTIO_DESC_PER_LOOP)
+ *
+ * This routine is for non-mergeable RX, one desc for each guest buffer.
+ * This routine is based on the RX ring layout optimization. Each entry in the
+ * avail ring points to the desc with the same index in the desc ring and this
+ * will never be changed in the driver.
+ *
+ * - nb_pkts < RTE_VIRTIO_DESC_PER_LOOP, just return no packet
+ */
+uint16_t
+virtio_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
+	uint16_t nb_pkts)
+{
+	struct virtqueue *rxvq = rx_queue;
+	uint16_t nb_used;
+	uint16_t desc_idx;
+	struct vring_used_elem *rused;
+	struct rte_mbuf **sw_ring;
+	struct rte_mbuf **sw_ring_end;
+	uint16_t nb_pkts_received;
+	__m128i shuf_msk1, shuf_msk2, len_adjust;
+
+	shuf_msk1 = _mm_set_epi8(
+		0xFF, 0xFF, 0xFF, 0xFF,
+		0xFF, 0xFF,		/* vlan tci */
+		5, 4,			/* dat len */
+		0xFF, 0xFF, 5, 4,	/* pkt len */
+		0xFF, 0xFF, 0xFF, 0xFF	/* packet type */
+
+	);
+
+	shuf_msk2 = _mm_set_epi8(
+		0xFF, 0xFF, 0xFF, 0xFF,
+		0xFF, 0xFF,		/* vlan tci */
+		13, 12,			/* dat len */
+		0xFF, 0xFF, 13, 12,	/* pkt len */
+		0xFF, 0xFF, 0xFF, 0xFF	/* packet type */
+	);
+
+	/* Subtract the header length.
+	*  In which case do we need the header length in used->len ?
+	*/
+	len_adjust = _mm_set_epi16(
+		0, 0,
+		0,
+		(uint16_t) -sizeof(struct virtio_net_hdr),
+		0, (uint16_t) -sizeof(struct virtio_net_hdr),
+		0, 0);
+
+	if (unlikely(nb_pkts < RTE_VIRTIO_DESC_PER_LOOP))
+		return 0;
+
+	nb_used = *(volatile uint16_t *)&rxvq->vq_ring.used->idx -
+		rxvq->vq_used_cons_idx;
+
+	rte_compiler_barrier();
+
+	if (unlikely(nb_used == 0))
+		return 0;
+
+	nb_pkts = RTE_ALIGN_FLOOR(nb_pkts, RTE_VIRTIO_DESC_PER_LOOP);
+	nb_used = RTE_MIN(nb_used, nb_pkts);
+
+	desc_idx = (uint16_t)(rxvq->vq_used_cons_idx & (rxvq->vq_nentries - 1));
+	rused = &rxvq->vq_ring.used->ring[desc_idx];
+	sw_ring  = &rxvq->sw_ring[desc_idx];
+	sw_ring_end = &rxvq->sw_ring[rxvq->vq_nentries];
+
+	_mm_prefetch((const void *)rused, _MM_HINT_T0);
+
+	if (rxvq->vq_free_cnt >= RTE_VIRTIO_VPMD_RX_REARM_THRESH) {
+		virtio_rxq_rearm_vec(rxvq);
+		if (unlikely(virtqueue_kick_prepare(rxvq)))
+			virtqueue_notify(rxvq);
+	}
+
+	for (nb_pkts_received = 0;
+		nb_pkts_received < nb_used;) {
+		__m128i desc[RTE_VIRTIO_DESC_PER_LOOP / 2];
+		__m128i mbp[RTE_VIRTIO_DESC_PER_LOOP / 2];
+		__m128i pkt_mb[RTE_VIRTIO_DESC_PER_LOOP];
+
+		mbp[0] = _mm_loadu_si128((__m128i *)(sw_ring + 0));
+		desc[0] = _mm_loadu_si128((__m128i *)(rused + 0));
+		_mm_storeu_si128((__m128i *)&rx_pkts[0], mbp[0]);
+
+		mbp[1] = _mm_loadu_si128((__m128i *)(sw_ring + 2));
+		desc[1] = _mm_loadu_si128((__m128i *)(rused + 2));
+		_mm_storeu_si128((__m128i *)&rx_pkts[2], mbp[1]);
+
+		mbp[2] = _mm_loadu_si128((__m128i *)(sw_ring + 4));
+		desc[2] = _mm_loadu_si128((__m128i *)(rused + 4));
+		_mm_storeu_si128((__m128i *)&rx_pkts[4], mbp[2]);
+
+		mbp[3] = _mm_loadu_si128((__m128i *)(sw_ring + 6));
+		desc[3] = _mm_loadu_si128((__m128i *)(rused + 6));
+		_mm_storeu_si128((__m128i *)&rx_pkts[6], mbp[3]);
+
+		pkt_mb[1] = _mm_shuffle_epi8(desc[0], shuf_msk2);
+		pkt_mb[0] = _mm_shuffle_epi8(desc[0], shuf_msk1);
+		pkt_mb[1] = _mm_add_epi16(pkt_mb[1], len_adjust);
+		pkt_mb[0] = _mm_add_epi16(pkt_mb[0], len_adjust);
+		_mm_storeu_si128((void *)&rx_pkts[1]->rx_descriptor_fields1,
+			pkt_mb[1]);
+		_mm_storeu_si128((void *)&rx_pkts[0]->rx_descriptor_fields1,
+			pkt_mb[0]);
+
+		pkt_mb[3] = _mm_shuffle_epi8(desc[1], shuf_msk2);
+		pkt_mb[2] = _mm_shuffle_epi8(desc[1], shuf_msk1);
+		pkt_mb[3] = _mm_add_epi16(pkt_mb[3], len_adjust);
+		pkt_mb[2] = _mm_add_epi16(pkt_mb[2], len_adjust);
+		_mm_storeu_si128((void *)&rx_pkts[3]->rx_descriptor_fields1,
+			pkt_mb[3]);
+		_mm_storeu_si128((void *)&rx_pkts[2]->rx_descriptor_fields1,
+			pkt_mb[2]);
+
+		pkt_mb[5] = _mm_shuffle_epi8(desc[2], shuf_msk2);
+		pkt_mb[4] = _mm_shuffle_epi8(desc[2], shuf_msk1);
+		pkt_mb[5] = _mm_add_epi16(pkt_mb[5], len_adjust);
+		pkt_mb[4] = _mm_add_epi16(pkt_mb[4], len_adjust);
+		_mm_storeu_si128((void *)&rx_pkts[5]->rx_descriptor_fields1,
+			pkt_mb[5]);
+		_mm_storeu_si128((void *)&rx_pkts[4]->rx_descriptor_fields1,
+			pkt_mb[4]);
+
+		pkt_mb[7] = _mm_shuffle_epi8(desc[3], shuf_msk2);
+		pkt_mb[6] = _mm_shuffle_epi8(desc[3], shuf_msk1);
+		pkt_mb[7] = _mm_add_epi16(pkt_mb[7], len_adjust);
+		pkt_mb[6] = _mm_add_epi16(pkt_mb[6], len_adjust);
+		_mm_storeu_si128((void *)&rx_pkts[7]->rx_descriptor_fields1,
+			pkt_mb[7]);
+		_mm_storeu_si128((void *)&rx_pkts[6]->rx_descriptor_fields1,
+			pkt_mb[6]);
+
+		if (unlikely(nb_used <= RTE_VIRTIO_DESC_PER_LOOP)) {
+			if (sw_ring + nb_used <= sw_ring_end)
+				nb_pkts_received += nb_used;
+			else
+				nb_pkts_received += sw_ring_end - sw_ring;
+			break;
+		} else {
+			if (unlikely(sw_ring + RTE_VIRTIO_DESC_PER_LOOP >=
+				sw_ring_end)) {
+				nb_pkts_received += sw_ring_end - sw_ring;
+				break;
+			} else {
+				nb_pkts_received += RTE_VIRTIO_DESC_PER_LOOP;
+
+				rx_pkts += RTE_VIRTIO_DESC_PER_LOOP;
+				sw_ring += RTE_VIRTIO_DESC_PER_LOOP;
+				rused   += RTE_VIRTIO_DESC_PER_LOOP;
+				nb_used -= RTE_VIRTIO_DESC_PER_LOOP;
+			}
+		}
+	}
+
+	rxvq->vq_used_cons_idx += nb_pkts_received;
+	rxvq->vq_free_cnt += nb_pkts_received;
+	rxvq->packets += nb_pkts_received;
+	return nb_pkts_received;
+}
+
+int __attribute__((cold))
+virtio_rxq_vec_setup(struct virtqueue *rxq)
+{
+	uintptr_t p;
+	struct rte_mbuf mb_def = { .buf_addr = 0 }; /* zeroed mbuf */
+
+	mb_def.nb_segs = 1;
+	mb_def.data_off = RTE_PKTMBUF_HEADROOM;
+	mb_def.port = rxq->port_id;
+	rte_mbuf_refcnt_set(&mb_def, 1);
+
+	/* prevent compiler reordering: rearm_data covers previous fields */
+	rte_compiler_barrier();
+	p = (uintptr_t)&mb_def.rearm_data;
+	rxq->mbuf_initializer = *(uint64_t *)p;
+
+	return 0;
+}
diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h
index 6a1ec48..98a77d5 100644
--- a/drivers/net/virtio/virtqueue.h
+++ b/drivers/net/virtio/virtqueue.h
@@ -188,6 +188,7 @@ struct virtqueue {
 	 */
 	uint16_t vq_used_cons_idx;
 	uint16_t vq_avail_idx;
+	uint64_t mbuf_initializer; /**< value to init mbufs. */
 	phys_addr_t virtio_net_hdr_mem; /**< hdr for each xmit packet */
 
 	struct rte_mbuf **sw_ring; /**< RX software ring. */
-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [dpdk-dev] [PATCH v6 6/8] virtio: simple tx routine
  2015-10-29 14:53 ` [dpdk-dev] [PATCH v6 0/8] " Huawei Xie
                     ` (4 preceding siblings ...)
  2015-10-29 14:53   ` [dpdk-dev] [PATCH v6 5/8] virtio: virtio vec rx Huawei Xie
@ 2015-10-29 14:53   ` Huawei Xie
  2015-10-29 14:53   ` [dpdk-dev] [PATCH v6 7/8] virtio: pick simple rx/tx func Huawei Xie
                     ` (3 subsequent siblings)
  9 siblings, 0 replies; 92+ messages in thread
From: Huawei Xie @ 2015-10-29 14:53 UTC (permalink / raw)
  To: dev

Changes in v5:
- call __rte_pktmbuf_prefree_seg to check refcnt when freeing mbufs

Changes in v4:
- move virtio_xmit_cleanup ahead to free descriptors earlier

Changes in v3:
- Remove return at the end of void function
- Remove always_inline attribute for virtio_xmit_cleanup

Bulk free mbufs when cleaning the used ring.
The shift operation on idx could be saved if vq_free_cnt meant
free slots rather than free descriptors.

TODO: rearrange the vq data structure and pack the stats vars together so that we
could use one vec instruction to update all of them.
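
The shift mentioned above comes from the TX layout using two descriptors per
packet (header plus data). A small sketch of that accounting, using
hypothetical names (struct txq, tx_free_slots, tx_slot and tx_reserve are not
driver functions):

```c
#include <assert.h>
#include <stdint.h>

struct txq {
	uint16_t nentries;   /* descriptors in the ring, a power of two */
	uint16_t free_cnt;   /* free descriptors, two per packet slot */
	uint16_t avail_idx;  /* free-running producer index, in slots */
};

/* Free packet slots: each slot costs a header and a data descriptor. */
static uint16_t tx_free_slots(const struct txq *q)
{
	return (uint16_t)(q->free_cnt >> 1);
}

/* Data descriptor index for the next packet: wraps at nentries / 2. */
static uint16_t tx_slot(const struct txq *q)
{
	assert(q->nentries >= 2 && (q->nentries & (q->nentries - 1)) == 0);
	return (uint16_t)(q->avail_idx & ((q->nentries >> 1) - 1));
}

/* Reserve up to nb_pkts slots; returns how many were actually reserved. */
static uint16_t tx_reserve(struct txq *q, uint16_t nb_pkts)
{
	uint16_t room = tx_free_slots(q);
	uint16_t n = nb_pkts < room ? nb_pkts : room;

	q->free_cnt = (uint16_t)(q->free_cnt - (n << 1));
	q->avail_idx = (uint16_t)(q->avail_idx + n);
	return n;
}
```

If the counter tracked slots instead of descriptors, both shifts in
tx_free_slots() and tx_reserve() would disappear, which is the saving the TODO
refers to.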

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
---
 drivers/net/virtio/virtio_ethdev.h      |   3 +
 drivers/net/virtio/virtio_rxtx_simple.c | 106 ++++++++++++++++++++++++++++++++
 2 files changed, 109 insertions(+)

diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h
index d7797ab..ae2d47d 100644
--- a/drivers/net/virtio/virtio_ethdev.h
+++ b/drivers/net/virtio/virtio_ethdev.h
@@ -111,6 +111,9 @@ uint16_t virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 uint16_t virtio_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
 		uint16_t nb_pkts);
 
+uint16_t virtio_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
+
 /*
  * The VIRTIO_NET_F_GUEST_TSO[46] features permit the host to send us
  * frames larger than 1514 bytes. We do not yet support software LRO
diff --git a/drivers/net/virtio/virtio_rxtx_simple.c b/drivers/net/virtio/virtio_rxtx_simple.c
index ef17562..624e789 100644
--- a/drivers/net/virtio/virtio_rxtx_simple.c
+++ b/drivers/net/virtio/virtio_rxtx_simple.c
@@ -288,6 +288,112 @@ virtio_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
 	return nb_pkts_received;
 }
 
+#define VIRTIO_TX_FREE_THRESH 32
+#define VIRTIO_TX_MAX_FREE_BUF_SZ 32
+#define VIRTIO_TX_FREE_NR 32
+/* TODO: vq->tx_free_cnt could mean num of free slots so we could avoid shift */
+static inline void
+virtio_xmit_cleanup(struct virtqueue *vq)
+{
+	uint16_t i, desc_idx;
+	int nb_free = 0;
+	struct rte_mbuf *m, *free[VIRTIO_TX_MAX_FREE_BUF_SZ];
+
+	desc_idx = (uint16_t)(vq->vq_used_cons_idx &
+		   ((vq->vq_nentries >> 1) - 1));
+	m = (struct rte_mbuf *)vq->vq_descx[desc_idx++].cookie;
+	m = __rte_pktmbuf_prefree_seg(m);
+	if (likely(m != NULL)) {
+		free[0] = m;
+		nb_free = 1;
+		for (i = 1; i < VIRTIO_TX_FREE_NR; i++) {
+			m = (struct rte_mbuf *)vq->vq_descx[desc_idx++].cookie;
+			m = __rte_pktmbuf_prefree_seg(m);
+			if (likely(m != NULL)) {
+				if (likely(m->pool == free[0]->pool))
+					free[nb_free++] = m;
+				else {
+					rte_mempool_put_bulk(free[0]->pool,
+						(void **)free, nb_free);
+					free[0] = m;
+					nb_free = 1;
+				}
+			}
+		}
+		rte_mempool_put_bulk(free[0]->pool, (void **)free, nb_free);
+	} else {
+		for (i = 1; i < VIRTIO_TX_FREE_NR; i++) {
+			m = (struct rte_mbuf *)vq->vq_descx[desc_idx++].cookie;
+			m = __rte_pktmbuf_prefree_seg(m);
+			if (m != NULL)
+				rte_mempool_put(m->pool, m);
+		}
+	}
+
+	vq->vq_used_cons_idx += VIRTIO_TX_FREE_NR;
+	vq->vq_free_cnt += (VIRTIO_TX_FREE_NR << 1);
+}
+
+uint16_t
+virtio_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
+	uint16_t nb_pkts)
+{
+	struct virtqueue *txvq = tx_queue;
+	uint16_t nb_used;
+	uint16_t desc_idx;
+	struct vring_desc *start_dp;
+	uint16_t nb_tail, nb_commit;
+	int i;
+	uint16_t desc_idx_max = (txvq->vq_nentries >> 1) - 1;
+
+	nb_used = VIRTQUEUE_NUSED(txvq);
+	rte_compiler_barrier();
+
+	if (nb_used >= VIRTIO_TX_FREE_THRESH)
+		virtio_xmit_cleanup(tx_queue);
+
+	nb_commit = nb_pkts = RTE_MIN((txvq->vq_free_cnt >> 1), nb_pkts);
+	desc_idx = (uint16_t) (txvq->vq_avail_idx & desc_idx_max);
+	start_dp = txvq->vq_ring.desc;
+	nb_tail = (uint16_t) (desc_idx_max + 1 - desc_idx);
+
+	if (nb_commit >= nb_tail) {
+		for (i = 0; i < nb_tail; i++)
+			txvq->vq_descx[desc_idx + i].cookie = tx_pkts[i];
+		for (i = 0; i < nb_tail; i++) {
+			start_dp[desc_idx].addr =
+				RTE_MBUF_DATA_DMA_ADDR(*tx_pkts);
+			start_dp[desc_idx].len = (*tx_pkts)->pkt_len;
+			tx_pkts++;
+			desc_idx++;
+		}
+		nb_commit -= nb_tail;
+		desc_idx = 0;
+	}
+	for (i = 0; i < nb_commit; i++)
+		txvq->vq_descx[desc_idx + i].cookie = tx_pkts[i];
+	for (i = 0; i < nb_commit; i++) {
+		start_dp[desc_idx].addr = RTE_MBUF_DATA_DMA_ADDR(*tx_pkts);
+		start_dp[desc_idx].len = (*tx_pkts)->pkt_len;
+		tx_pkts++;
+		desc_idx++;
+	}
+
+	rte_compiler_barrier();
+
+	txvq->vq_free_cnt -= (uint16_t)(nb_pkts << 1);
+	txvq->vq_avail_idx += nb_pkts;
+	txvq->vq_ring.avail->idx = txvq->vq_avail_idx;
+	txvq->packets += nb_pkts;
+
+	if (likely(nb_pkts)) {
+		if (unlikely(virtqueue_kick_prepare(txvq)))
+			virtqueue_notify(txvq);
+	}
+
+	return nb_pkts;
+}
+
 int __attribute__((cold))
 virtio_rxq_vec_setup(struct virtqueue *rxq)
 {
-- 
1.8.1.4
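
Illustrative note on the enqueue loop in virtio_xmit_pkts_simple above: the burst is split at the end of the ring into a "tail" part and a part that restarts at index 0. The following is a minimal stand-alone sketch of that split only, not the driver code. The name ring_fill, the plain uint32_t slots and the return value are assumptions made for the example; the real routine writes descriptor addresses and lengths and also records mbuf cookies.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of the wrap-around split: 'ring' has 'ring_size'
 * slots (a power of two), 'avail_idx' is a free-running 16-bit counter,
 * and 'vals' holds 'n' items to store. Returns the number of items
 * written before the wrap (0 if the burst did not reach the ring end).
 */
static uint16_t
ring_fill(uint32_t *ring, uint16_t ring_size, uint16_t avail_idx,
	  const uint32_t *vals, uint16_t n)
{
	uint16_t mask = (uint16_t)(ring_size - 1);
	uint16_t idx = (uint16_t)(avail_idx & mask);
	uint16_t nb_tail = (uint16_t)(ring_size - idx);
	uint16_t written_tail = 0;
	uint16_t i;

	assert(ring_size != 0 && (ring_size & mask) == 0);
	assert(n <= ring_size);

	if (n >= nb_tail) {
		/* Fill up to the last slot, then continue from slot 0. */
		for (i = 0; i < nb_tail; i++)
			ring[idx + i] = vals[i];
		vals += nb_tail;
		n = (uint16_t)(n - nb_tail);
		written_tail = nb_tail;
		idx = 0;
	}
	/* Remaining items never cross the ring end. */
	for (i = 0; i < n; i++)
		ring[idx + i] = vals[i];

	return written_tail;
}
```

Splitting once up front keeps both inner loops free of per-iteration masking, which is presumably why the patch is structured this way.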

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [dpdk-dev] [PATCH v6 7/8] virtio: pick simple rx/tx func
  2015-10-29 14:53 ` [dpdk-dev] [PATCH v6 0/8] " Huawei Xie
                     ` (5 preceding siblings ...)
  2015-10-29 14:53   ` [dpdk-dev] [PATCH v6 6/8] virtio: simple tx routine Huawei Xie
@ 2015-10-29 14:53   ` Huawei Xie
  2015-10-29 14:53   ` [dpdk-dev] [PATCH v6 8/8] doc: update release notes 2.2 about virtio performance optimization Huawei Xie
                     ` (2 subsequent siblings)
  9 siblings, 0 replies; 92+ messages in thread
From: Huawei Xie @ 2015-10-29 14:53 UTC (permalink / raw)
  To: dev

Changes in v4:
Check the mergeable feature when selecting the simple rx/tx functions.

The simple rx/tx functions are chosen when mergeable rx is disabled and the
user specifies single segment and no offload support.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
---
 drivers/net/virtio/virtio_rxtx.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 947fc46..0f1daf2 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -53,6 +53,7 @@
 
 #include "virtio_logs.h"
 #include "virtio_ethdev.h"
+#include "virtio_pci.h"
 #include "virtqueue.h"
 #include "virtio_rxtx.h"
 
@@ -62,6 +63,10 @@
 #define  VIRTIO_DUMP_PACKET(m, len) do { } while (0)
 #endif
 
+
+#define VIRTIO_SIMPLE_FLAGS ((uint32_t)ETH_TXQ_FLAGS_NOMULTSEGS | \
+	ETH_TXQ_FLAGS_NOOFFLOADS)
+
 static int use_simple_rxtx;
 
 static void
@@ -459,6 +464,7 @@ virtio_dev_tx_queue_setup(struct rte_eth_dev *dev,
 			const struct rte_eth_txconf *tx_conf)
 {
 	uint8_t vtpci_queue_idx = 2 * queue_idx + VTNET_SQ_TQ_QUEUE_IDX;
+	struct virtio_hw *hw = dev->data->dev_private;
 	struct virtqueue *vq;
 	uint16_t tx_free_thresh;
 	int ret;
@@ -471,6 +477,15 @@ virtio_dev_tx_queue_setup(struct rte_eth_dev *dev,
 		return -EINVAL;
 	}
 
+	/* Use simple rx/tx func if single segment and no offloads */
+	if ((tx_conf->txq_flags & VIRTIO_SIMPLE_FLAGS) == VIRTIO_SIMPLE_FLAGS &&
+	     !vtpci_with_feature(hw, VIRTIO_NET_F_MRG_RXBUF)) {
+		PMD_INIT_LOG(INFO, "Using simple rx/tx path");
+		dev->tx_pkt_burst = virtio_xmit_pkts_simple;
+		dev->rx_pkt_burst = virtio_recv_pkts_vec;
+		use_simple_rxtx = 1;
+	}
+
 	ret = virtio_dev_queue_setup(dev, VTNET_TQ, queue_idx, vtpci_queue_idx,
 			nb_desc, socket_id, &vq);
 	if (ret < 0) {
-- 
1.8.1.4
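
Illustrative note on the selection test in the patch above: the check requires every bit of the combined flag mask to be set, not just any one of them, and additionally requires that mergeable rx buffers were not negotiated. A minimal sketch of that predicate follows. The SKETCH_* bit values and the function name use_simple_path are placeholders invented for this example; the real ETH_TXQ_FLAGS_* constants come from the ethdev header and are not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder bit values for this sketch only (not the DPDK values). */
#define SKETCH_TXQ_NOMULTSEGS 0x1u
#define SKETCH_TXQ_NOOFFLOADS 0x2u
#define SKETCH_SIMPLE_FLAGS (SKETCH_TXQ_NOMULTSEGS | SKETCH_TXQ_NOOFFLOADS)

/*
 * Returns 1 when the simple path may be used: all simple flags are set
 * and the mergeable rx buffer feature is off (0) rather than on (1).
 */
static int
use_simple_path(uint32_t txq_flags, int mrg_rxbuf_negotiated)
{
	assert(mrg_rxbuf_negotiated == 0 || mrg_rxbuf_negotiated == 1);

	/* Compare against the full mask so a partial match is rejected. */
	return (txq_flags & SKETCH_SIMPLE_FLAGS) == SKETCH_SIMPLE_FLAGS &&
	       !mrg_rxbuf_negotiated;
}
```

Comparing the masked value with the mask itself, instead of testing it for non-zero, is what rejects a queue that set only one of the two flags.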

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [dpdk-dev] [PATCH v6 8/8] doc: update release notes 2.2 about virtio performance optimization
  2015-10-29 14:53 ` [dpdk-dev] [PATCH v6 0/8] " Huawei Xie
                     ` (6 preceding siblings ...)
  2015-10-29 14:53   ` [dpdk-dev] [PATCH v6 7/8] virtio: pick simple rx/tx func Huawei Xie
@ 2015-10-29 14:53   ` Huawei Xie
  2015-10-30  2:05   ` [dpdk-dev] [PATCH v6 0/8] virtio ring layout optimization and simple rx/tx processing Tan, Jianfeng
  2015-11-27  6:03   ` Xu, Qian Q
  9 siblings, 0 replies; 92+ messages in thread
From: Huawei Xie @ 2015-10-29 14:53 UTC (permalink / raw)
  To: dev

Update release notes about virtio ring layout optimization,
vector rx and simple tx support

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
---
 doc/guides/rel_notes/release_2_2.rst | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/doc/guides/rel_notes/release_2_2.rst b/doc/guides/rel_notes/release_2_2.rst
index 682f468..5d0dd6f 100644
--- a/doc/guides/rel_notes/release_2_2.rst
+++ b/doc/guides/rel_notes/release_2_2.rst
@@ -4,6 +4,9 @@ DPDK Release 2.2
 New Features
 ------------
 
+* **Enhanced support for virtio driver.**
+
+  *  Virtio ring layout optimization(fixed avail ring), vector RX and simple tx support
 
 Resolved Issues
 ---------------
-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [dpdk-dev] [PATCH v6 0/8] virtio ring layout optimization and simple rx/tx processing
  2015-10-29 14:53 ` [dpdk-dev] [PATCH v6 0/8] " Huawei Xie
                     ` (7 preceding siblings ...)
  2015-10-29 14:53   ` [dpdk-dev] [PATCH v6 8/8] doc: update release notes 2.2 about virtio performance optimization Huawei Xie
@ 2015-10-30  2:05   ` Tan, Jianfeng
  2015-11-02 22:09     ` Thomas Monjalon
  2015-11-27  6:03   ` Xu, Qian Q
  9 siblings, 1 reply; 92+ messages in thread
From: Tan, Jianfeng @ 2015-10-30  2:05 UTC (permalink / raw)
  To: Xie, Huawei, dev



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Huawei Xie
> Sent: Thursday, October 29, 2015 10:53 PM
> To: dev@dpdk.org
> Subject: [dpdk-dev] [PATCH v6 0/8] virtio ring layout optimization and simple
> rx/tx processing
> 
> Changes in v6:
> - Update release notes
> - Fix the error in virtio tx ring layout ascii chart in the cover-letter
> 
......
> Huawei Xie (8):
>   virtio: add virtio_rxtx.h header file
>   virtio: add software rx ring, fake_buf into virtqueue
>   virtio: rx/tx ring layout optimization
>   virtio: fill RX avail ring with blank mbufs
>   virtio: virtio vec rx
>   virtio: simple tx routine
>   virtio: pick simple rx/tx func
>   doc: update release notes 2.2 about virtio performance optimization
> 
>  doc/guides/rel_notes/release_2_2.rst    |   3 +
>  drivers/net/virtio/Makefile             |   2 +-
>  drivers/net/virtio/virtio_ethdev.c      |  12 +-
>  drivers/net/virtio/virtio_ethdev.h      |   5 +
>  drivers/net/virtio/virtio_rxtx.c        |  56 ++++-
>  drivers/net/virtio/virtio_rxtx.h        |  39 +++
>  drivers/net/virtio/virtio_rxtx_simple.c | 414 ++++++++++++++++++++++++++++++++
>  drivers/net/virtio/virtqueue.h          |   5 +
>  8 files changed, 532 insertions(+), 4 deletions(-)
>  create mode 100644 drivers/net/virtio/virtio_rxtx.h
>  create mode 100644 drivers/net/virtio/virtio_rxtx_simple.c
> 
> --
> 1.8.1.4

Acked-by: Jianfeng Tan<jianfeng.tan@intel.com>

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [dpdk-dev] [PATCH v6 2/8] virtio: add software rx ring, fake_buf into virtqueue
  2015-10-29 14:53   ` [dpdk-dev] [PATCH v6 2/8] virtio: add software rx ring, fake_buf into virtqueue Huawei Xie
@ 2015-10-30 18:13     ` Thomas Monjalon
  0 siblings, 0 replies; 92+ messages in thread
From: Thomas Monjalon @ 2015-10-30 18:13 UTC (permalink / raw)
  To: Huawei Xie; +Cc: dev

2015-10-29 22:53, Huawei Xie:
> +static int use_simple_rxtx;

error: ‘use_simple_rxtx’ defined but not used

I'll try to move it in the next patch.

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [dpdk-dev] [PATCH v6 5/8] virtio: virtio vec rx
  2015-10-29 14:53   ` [dpdk-dev] [PATCH v6 5/8] virtio: virtio vec rx Huawei Xie
@ 2015-10-30 18:19     ` Thomas Monjalon
  2015-11-02  2:18       ` Xie, Huawei
  0 siblings, 1 reply; 92+ messages in thread
From: Thomas Monjalon @ 2015-10-30 18:19 UTC (permalink / raw)
  To: Huawei Xie; +Cc: dev

Sorry, there is a clang error.

2015-10-29 22:53, Huawei Xie:
> +	_mm_prefetch((const void *)rused, _MM_HINT_T0);

virtio_rxtx_simple.c:197:2: error: cast from 'const void *' to
      'void *' drops const qualifier

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [dpdk-dev] [PATCH v6 5/8] virtio: virtio vec rx
  2015-10-30 18:19     ` Thomas Monjalon
@ 2015-11-02  2:18       ` Xie, Huawei
  2015-11-02  7:28         ` Thomas Monjalon
  0 siblings, 1 reply; 92+ messages in thread
From: Xie, Huawei @ 2015-11-02  2:18 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev

On 10/31/2015 2:21 AM, Thomas Monjalon wrote:
> Sorry, there is a clang error.
>
> 2015-10-29 22:53, Huawei Xie:
>> +	_mm_prefetch((const void *)rused, _MM_HINT_T0);
> virtio_rxtx_simple.c:197:2: error: cast from 'const void *' to
>       'void *' drops const qualifier
This is weird. This conversion is actually from 'void *' to 'const void
*', not the opposite, so there should be no error.
I checked clang build, it doesn't report error.
    clang version 3.3 (tags/RELEASE_33/rc2)


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [dpdk-dev] [PATCH v6 5/8] virtio: virtio vec rx
  2015-11-02  2:18       ` Xie, Huawei
@ 2015-11-02  7:28         ` Thomas Monjalon
  2015-11-02  8:49           ` Xie, Huawei
  0 siblings, 1 reply; 92+ messages in thread
From: Thomas Monjalon @ 2015-11-02  7:28 UTC (permalink / raw)
  To: Xie, Huawei; +Cc: dev

2015-11-02 02:18, Xie, Huawei:
> On 10/31/2015 2:21 AM, Thomas Monjalon wrote:
> > Sorry, there is a clang error.
> >
> > 2015-10-29 22:53, Huawei Xie:
> >> +	_mm_prefetch((const void *)rused, _MM_HINT_T0);
> > virtio_rxtx_simple.c:197:2: error: cast from 'const void *' to
> >       'void *' drops const qualifier
> This is weird. This conversion is actually from 'void *' to 'const void
> *', not the opposite, so there should be no error.
> I checked clang build, it doesn't report error.
>     clang version 3.3 (tags/RELEASE_33/rc2)

I'm using clang 3.6.2.
Anybody else to check please?

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [dpdk-dev] [PATCH v6 5/8] virtio: virtio vec rx
  2015-11-02  7:28         ` Thomas Monjalon
@ 2015-11-02  8:49           ` Xie, Huawei
  2015-11-02  9:03             ` Thomas Monjalon
  0 siblings, 1 reply; 92+ messages in thread
From: Xie, Huawei @ 2015-11-02  8:49 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev

On 11/2/2015 3:31 PM, Thomas Monjalon wrote:
> 2015-11-02 02:18, Xie, Huawei:
>> On 10/31/2015 2:21 AM, Thomas Monjalon wrote:
>>> Sorry, there is a clang error.
>>>
>>> 2015-10-29 22:53, Huawei Xie:
>>>> +	_mm_prefetch((const void *)rused, _MM_HINT_T0);
>>> virtio_rxtx_simple.c:197:2: error: cast from 'const void *' to
>>>       'void *' drops const qualifier
>> This is weird. This conversion is actually from 'void *' to 'const void
>> *', not the opposite, so there should be no error.
>> I checked clang build, it doesn't report error.
>>     clang version 3.3 (tags/RELEASE_33/rc2)
> I'm using clang 3.6.2.
> Anybody else to check please?
Thomas:

I checked clang-3.5 on Fedora 22 and clang-3.6 on Ubuntu 15.04.
Clang-3.6 reports warnings, but the definition of this macro doesn't change.

The (const void *) cast is used in the code because, when __OPTIMIZE__ is
defined, GCC declares the first parameter as "const void *".

Could you add the following macro (used in other vec pmds as well) before
virtqueue_enqueue_recv_refill_simple, or do I need to submit a new patchset?

+#ifndef __INTEL_COMPILER
+#pragma GCC diagnostic ignored "-Wcast-qual"
+#endif
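
Illustrative note: the quoted diagnostic is what -Wcast-qual reports when a const-qualified pointer is cast to a non-const one, and the pragma above silences it for the rest of the file. A minimal stand-alone sketch of that situation follows. The functions legacy_peek and touch_first_byte are hypothetical stand-ins; the sketch does not reproduce how any particular compiler header defines _mm_prefetch.

```c
#include <assert.h>
#include <stddef.h>

/* Without this pragma, a build with -Wcast-qual and -Werror rejects
 * the cast in touch_first_byte() below. */
#ifndef __INTEL_COMPILER
#pragma GCC diagnostic ignored "-Wcast-qual"
#endif

/* Hypothetical callee whose prototype takes a non-const pointer. */
static int
legacy_peek(void *p)
{
	return *(unsigned char *)p;
}

/* Caller holds a const pointer and has to cast the qualifier away. */
static int
touch_first_byte(const void *buf)
{
	assert(buf != NULL);
	/* This cast is the construct -Wcast-qual reports. */
	return legacy_peek((void *)buf);
}
```

The pragma applies from the point where it appears to the end of the translation unit, so placing it just before the first affected function limits how much code loses the check.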



^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [dpdk-dev] [PATCH v6 5/8] virtio: virtio vec rx
  2015-11-02  8:49           ` Xie, Huawei
@ 2015-11-02  9:03             ` Thomas Monjalon
  0 siblings, 0 replies; 92+ messages in thread
From: Thomas Monjalon @ 2015-11-02  9:03 UTC (permalink / raw)
  To: Xie, Huawei; +Cc: dev

2015-11-02 08:49, Xie, Huawei:
> On 11/2/2015 3:31 PM, Thomas Monjalon wrote:
> > 2015-11-02 02:18, Xie, Huawei:
> >> On 10/31/2015 2:21 AM, Thomas Monjalon wrote:
> >>> Sorry, there is a clang error.
> >>>
> >>> 2015-10-29 22:53, Huawei Xie:
> >>>> +	_mm_prefetch((const void *)rused, _MM_HINT_T0);
> >>> virtio_rxtx_simple.c:197:2: error: cast from 'const void *' to
> >>>       'void *' drops const qualifier
> >> This is weird. This conversion is actually from 'void *' to 'const void
> >> *', not the opposite, so there should be no error.
> >> I checked clang build, it doesn't report error.
> >>     clang version 3.3 (tags/RELEASE_33/rc2)
> > I'm using clang 3.6.2.
> > Anybody else to check please?
> Thomas:
> 
> I checked clang-3.5 on Fedora 22 and clang-3.6 on Ubuntu 15.04.
> Clang-3.6 reports warnings, but the definition of this macro doesn't change.
> 
> The (const void *) cast is used in the code because, when __OPTIMIZE__ is
> defined, GCC declares the first parameter as "const void *".
> 
> Could you add the following macro (used in other vec pmds as well) before
> virtqueue_enqueue_recv_refill_simple, or do I need to submit a new patchset?
> 
> +#ifndef __INTEL_COMPILER
> +#pragma GCC diagnostic ignored "-Wcast-qual"
> +#endif

OK I'll try it.

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [dpdk-dev] [PATCH v6 0/8] virtio ring layout optimization and simple rx/tx processing
  2015-10-30  2:05   ` [dpdk-dev] [PATCH v6 0/8] virtio ring layout optimization and simple rx/tx processing Tan, Jianfeng
@ 2015-11-02 22:09     ` Thomas Monjalon
  2015-11-02 22:10       ` Thomas Monjalon
  0 siblings, 1 reply; 92+ messages in thread
From: Thomas Monjalon @ 2015-11-02 22:09 UTC (permalink / raw)
  To: Xie, Huawei; +Cc: dev

> Acked-by: Jianfeng Tan<jianfeng.tan@intel.com>

Applied with the modifications discussed in this thread, thanks.

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [dpdk-dev] [PATCH v6 0/8] virtio ring layout optimization and simple rx/tx processing
  2015-11-02 22:09     ` Thomas Monjalon
@ 2015-11-02 22:10       ` Thomas Monjalon
  2015-11-03 10:30         ` Xie, Huawei
  0 siblings, 1 reply; 92+ messages in thread
From: Thomas Monjalon @ 2015-11-02 22:10 UTC (permalink / raw)
  To: Xie, Huawei; +Cc: dev

2015-11-02 23:09, Thomas Monjalon:
> > Acked-by: Jianfeng Tan<jianfeng.tan@intel.com>
> 
> Applied with the modifications discussed in this thread, thanks.

Please Huawei,
Could you share some numbers for these optimizations?

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [dpdk-dev] [PATCH v6 0/8] virtio ring layout optimization and simple rx/tx processing
  2015-11-02 22:10       ` Thomas Monjalon
@ 2015-11-03 10:30         ` Xie, Huawei
  0 siblings, 0 replies; 92+ messages in thread
From: Xie, Huawei @ 2015-11-03 10:30 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev

On 11/3/2015 6:12 AM, Thomas Monjalon wrote:
> 2015-11-02 23:09, Thomas Monjalon:
>>> Acked-by: Jianfeng Tan<jianfeng.tan@intel.com>
>> Applied with the modifications discussed in this thread, thanks.
> Please Huawei,
> Could you share some numbers for these optimizations?
Thomas, thanks for the effort.
My optimization is based on a highly optimized dpdkvhost (currently vhost
is the bottleneck) from an old version of DPDK. I will prepare a newly
optimized dpdkvhost and measure the performance again.


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [dpdk-dev] [PATCH v6 0/8] virtio ring layout optimization and simple rx/tx processing
  2015-10-29 14:53 ` [dpdk-dev] [PATCH v6 0/8] " Huawei Xie
                     ` (8 preceding siblings ...)
  2015-10-30  2:05   ` [dpdk-dev] [PATCH v6 0/8] virtio ring layout optimization and simple rx/tx processing Tan, Jianfeng
@ 2015-11-27  6:03   ` Xu, Qian Q
  2015-12-17  5:22     ` Xie, Huawei
  9 siblings, 1 reply; 92+ messages in thread
From: Xu, Qian Q @ 2015-11-27  6:03 UTC (permalink / raw)
  To: Xie, Huawei, dev

Sharing some performance data for the virtio-pmd optimization:
1. Use a simplified vhost-sample that only dequeues and frees packets, so that virtio only does TX, to measure the virtio TX improvement. In the VM, one virtio port runs txonly, with the txonly file modified to remove the memory copy, and the virtio TX rate is checked. The optimized virtio-pmd reaches ~2x the performance of the non-optimized virtio-pmd.
2. Same as item 1, but with the default txonly file, so the memory copy is included. The optimized virtio-pmd shows a ~37% performance improvement over the non-optimized virtio-pmd.
3. In the OVS test scenario, with one physical NIC plus one virtio port in the VM, virtio does loopback (both RX and TX) with testpmd running in the VM. The optimized virtio-pmd shows a 60% performance improvement over the non-optimized virtio-pmd.



Thanks
Qian

-----Original Message-----
From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Huawei Xie
Sent: Thursday, October 29, 2015 10:53 PM
To: dev@dpdk.org
Subject: [dpdk-dev] [PATCH v6 0/8] virtio ring layout optimization and simple rx/tx processing

Changes in v6:
- Update release notes
- Fix the error in virtio tx ring layout ascii chart in the cover-letter

Changes in v5:
- Call __rte_pktmbuf_prefree_seg to check refcnt when free mbufs

Changes in v4:
- Fix the error in virtio tx ring layout ascii chart in the commit message
- Move virtio_xmit_cleanup ahead to free descriptors earlier
- Test merge-able feature when select simple rx/tx functions

Changes in v3:
- Remove unnecessary NULL test for rte_free
- Remove unnecessary assign of local var after free
- Remove return at the end of void function
- Remove always_inline attribute for virtio_xmit_cleanup
- Reword some commit messages
- Add TODO in the commit message of simple tx patch

Changes in v2:
- Remove the configure macro
- Enable simple R/TX processing when user specifies simple txq flags
- Reword some comments and commit messages

In a DPDK-based switching environment, vhost mostly runs on a dedicated core while virtio processing in guest VMs runs on different cores.
Take RX for example. With the generic implementation, for each guest buffer:
a) the virtio driver allocates a descriptor from the free descriptor list
b) it modifies the avail ring entry to point to the allocated descriptor
c) after the packet is received, it frees the descriptor

When vhost fetches the avail ring, it needs to fetch the modified L1 cache line from the virtio core, which is costly on current CPU implementations.

The idea of this optimization is:
    allocate a fixed descriptor for each entry of the avail ring, so the avail ring stays the same during the run.
This removes the transfer of modified L1 cache lines from the virtio core to the vhost core for the avail ring.
(Note we couldn't avoid the cache transfer for descriptors).
Besides, descriptor allocation and free operations are eliminated.
This also makes vector processing possible, to further accelerate the processing.

This is the layout for the avail ring(take 256 ring entries for example), with each entry pointing to the descriptor with the same index.
                    avail
                    idx
                    +
                    |
+----+----+---+-------------+------+
| 0  | 1  | 2 | ... |  254  | 255  |  avail ring
+-+--+-+--+-+-+---------+---+--+---+
  |    |    |       |   |      |
  |    |    |       |   |      |
  v    v    v       |   v      v
+-+--+-+--+-+-+---------+---+--+---+
| 0  | 1  | 2 | ... |  254  | 255  |  desc ring
+----+----+---+-------------+------+
                    |
                    |
+----+----+---+-------------+------+
| 0  | 1  | 2 |     |  254  | 255  |  used ring
+----+----+---+-------------+------+
                    |
                    +

This is the ring layout for TX.
As we need one virtio header for each xmit packet, we have 128 slots available.

                         ++
                         ||
                         ||
+-----+-----+-----+--------------+------+------+------+
|  0  |  1  | ... |  127 || 128  | 129  | ...  | 255  |   avail ring
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
   |     |            |  ||  |      |             |
   v     v            v  ||  v      v             v
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
| 128 | 129 | ... |  255 || 128  | 129  | ...  | 255  |   desc ring for virtio_net_hdr
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
   |     |            |  ||  |      |             |
   v     v            v  ||  v      v             v
+--+--+--+--+-----+---+------+---+--+---+------+--+---+
|  0  |  1  | ... |  127 ||  0   |  1   | ...  | 127  |   desc ring for tx data
+-----+-----+-----+--------------+------+------+------+
                         ||
                         ||
                         ++


A performance boost can be observed only if the virtio backend isn't the bottleneck, or in the VM2VM case.
There are also several vhost optimization patches to be submitted later.


Huawei Xie (8):
  virtio: add virtio_rxtx.h header file
  virtio: add software rx ring, fake_buf into virtqueue
  virtio: rx/tx ring layout optimization
  virtio: fill RX avail ring with blank mbufs
  virtio: virtio vec rx
  virtio: simple tx routine
  virtio: pick simple rx/tx func
  doc: update release notes 2.2 about virtio performance optimization

 doc/guides/rel_notes/release_2_2.rst    |   3 +
 drivers/net/virtio/Makefile             |   2 +-
 drivers/net/virtio/virtio_ethdev.c      |  12 +-
 drivers/net/virtio/virtio_ethdev.h      |   5 +
 drivers/net/virtio/virtio_rxtx.c        |  56 ++++-
 drivers/net/virtio/virtio_rxtx.h        |  39 +++
 drivers/net/virtio/virtio_rxtx_simple.c | 414 ++++++++++++++++++++++++++++++++
 drivers/net/virtio/virtqueue.h          |   5 +
 8 files changed, 532 insertions(+), 4 deletions(-)
 create mode 100644 drivers/net/virtio/virtio_rxtx.h
 create mode 100644 drivers/net/virtio/virtio_rxtx_simple.c

--
1.8.1.4

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [dpdk-dev] [PATCH v6 0/8] virtio ring layout optimization and simple rx/tx processing
  2015-11-27  6:03   ` Xu, Qian Q
@ 2015-12-17  5:22     ` Xie, Huawei
  2015-12-17  9:08       ` Thomas Monjalon
  0 siblings, 1 reply; 92+ messages in thread
From: Xie, Huawei @ 2015-12-17  5:22 UTC (permalink / raw)
  To: Xu, Qian Q, dev, Thomas Monjalon, Stephen Hemminger

On 11/27/2015 2:03 PM, Xu, Qian Q wrote:
> Sharing some performance data for the virtio-pmd optimization:
> 1. Use a simplified vhost-sample that only dequeues and frees packets, so that virtio only does TX, to measure the virtio TX improvement. In the VM, one virtio port runs txonly, with the txonly file modified to remove the memory copy, and the virtio TX rate is checked. The optimized virtio-pmd reaches ~2x the performance of the non-optimized virtio-pmd.
> 2. Same as item 1, but with the default txonly file, so the memory copy is included. The optimized virtio-pmd shows a ~37% performance improvement over the non-optimized virtio-pmd.
> 3. In the OVS test scenario, with one physical NIC plus one virtio port in the VM, virtio does loopback (both RX and TX) with testpmd running in the VM. The optimized virtio-pmd shows a 60% performance improvement over the non-optimized virtio-pmd.
Thomas:
You asked earlier about the performance data.
Another thing: how about adding a simple vhost performance example,
like the vring bench that is used to test virtio performance, so that
each time we have performance-related patches, we can use this
benchmark to report the performance difference?


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [dpdk-dev] [PATCH v6 0/8] virtio ring layout optimization and simple rx/tx processing
  2015-12-17  5:22     ` Xie, Huawei
@ 2015-12-17  9:08       ` Thomas Monjalon
  0 siblings, 0 replies; 92+ messages in thread
From: Thomas Monjalon @ 2015-12-17  9:08 UTC (permalink / raw)
  To: Xie, Huawei; +Cc: dev

2015-12-17 05:22, Xie, Huawei:
> You asked earlier about the performance data.
> Another thing: how about adding a simple vhost performance example,
> like the vring bench that is used to test virtio performance, so that
> each time we have performance-related patches, we can use this
> benchmark to report the performance difference?

The examples are part of the doc and should be simple enough to be used
in a tutorial.
The tool to test the DPDK drivers is testpmd. There are also the simpler
unit tests in app/test. If something more complex is needed (with qemu
options automated), maybe dts is a better choice.

^ permalink raw reply	[flat|nested] 92+ messages in thread

end of thread, other threads:[~2015-12-17  9:10 UTC | newest]

Thread overview: 92+ messages
2015-09-29 14:45 [dpdk-dev] [PATCH 0/8] virtio: virtio ring layout optimization and RX vector processing Huawei Xie
2015-09-29 14:45 ` [dpdk-dev] [PATCH 1/8] virtio: add configure for simple virtio rx/tx Huawei Xie
2015-09-29 14:45 ` [dpdk-dev] [PATCH 2/8] virtio: add virtio_rxtx.h header file Huawei Xie
2015-09-29 14:45 ` [dpdk-dev] [PATCH 3/8] virtio: add software rx ring, fake_buf, simple_rxtx into virtqueue Huawei Xie
2015-09-29 16:15   ` Stephen Hemminger
2015-09-29 14:45 ` [dpdk-dev] [PATCH 4/8] virtio: rx/tx ring layout optimization Huawei Xie
2015-09-29 14:45 ` [dpdk-dev] [PATCH 5/8] virtio: fill RX avail ring with blank mbufs Huawei Xie
2015-09-29 14:45 ` [dpdk-dev] [PATCH 6/8] virtio: virtio vec rx Huawei Xie
2015-09-29 14:45 ` [dpdk-dev] [PATCH 7/8] virtio: simple tx routine Huawei Xie
2015-09-29 14:45 ` [dpdk-dev] [PATCH 8/8] virtio: rxtx_func_get Huawei Xie
2015-09-29 15:41 ` [dpdk-dev] [PATCH 0/8] virtio: virtio ring layout optimization and RX vector processing Xie, Huawei
2015-10-18  6:28 ` [dpdk-dev] [PATCH v2 0/7] virtio ring layout optimization and simple rx/tx processing Huawei Xie
2015-10-18  6:28 ` Huawei Xie
2015-10-18  6:28   ` [dpdk-dev] [PATCH v2 1/7] virtio: add virtio_rxtx.h header file Huawei Xie
2015-10-18  6:28   ` [dpdk-dev] [PATCH v2 2/7] virtio: add software rx ring, fake_buf into virtqueue Huawei Xie
2015-10-19  4:20     ` Stephen Hemminger
2015-10-19  5:06       ` Xie, Huawei
2015-10-20 15:32         ` Xie, Huawei
2015-10-18  6:29   ` [dpdk-dev] [PATCH v2 3/7] virtio: rx/tx ring layout optimization Huawei Xie
2015-10-18  6:29   ` [dpdk-dev] [PATCH v2 4/7] virtio: fill RX avail ring with blank mbufs Huawei Xie
2015-10-18  6:29   ` [dpdk-dev] [PATCH v2 5/7] virtio: virtio vec rx Huawei Xie
2015-10-18  6:29   ` [dpdk-dev] [PATCH v2 6/7] virtio: simple tx routine Huawei Xie
2015-10-19  4:16     ` Stephen Hemminger
2015-10-19  5:22       ` Xie, Huawei
2015-10-19  4:18     ` Stephen Hemminger
2015-10-19  5:15       ` Xie, Huawei
2015-10-19  4:19     ` Stephen Hemminger
2015-10-19  5:12       ` Xie, Huawei
2015-10-18  6:29   ` [dpdk-dev] [PATCH v2 7/7] virtio: pick simple rx/tx func Huawei Xie
2015-10-20 15:30 ` [dpdk-dev] [PATCH v3 0/7] virtio ring layout optimization and simple rx/tx processing Huawei Xie
2015-10-20 15:30   ` [dpdk-dev] [PATCH v3 1/7] virtio: add virtio_rxtx.h header file Huawei Xie
2015-10-20 15:30   ` [dpdk-dev] [PATCH v3 2/7] virtio: add software rx ring, fake_buf into virtqueue Huawei Xie
2015-10-20 15:30   ` [dpdk-dev] [PATCH v3 3/7] virtio: rx/tx ring layout optimization Huawei Xie
2015-10-20 15:30   ` [dpdk-dev] [PATCH v3 4/7] virtio: fill RX avail ring with blank mbufs Huawei Xie
2015-10-20 15:30   ` [dpdk-dev] [PATCH v3 5/7] virtio: virtio vec rx Huawei Xie
2015-10-22  4:04     ` Wang, Zhihong
2015-10-22  5:48       ` Xie, Huawei
2015-10-20 15:30   ` [dpdk-dev] [PATCH v3 6/7] virtio: simple tx routine Huawei Xie
2015-10-20 18:58     ` Stephen Hemminger
2015-10-22  5:43       ` Xie, Huawei
2015-10-20 15:30   ` [dpdk-dev] [PATCH v3 7/7] virtio: pick simple rx/tx func Huawei Xie
2015-10-22  2:50     ` Tan, Jianfeng
2015-10-22 11:40       ` Xie, Huawei
2015-10-22 12:09 ` [dpdk-dev] [PATCH v4 0/7] virtio ring layout optimization and simple rx/tx processing Huawei Xie
2015-10-22 12:09   ` [dpdk-dev] [PATCH v4 1/7] virtio: add virtio_rxtx.h header file Huawei Xie
2015-10-22 12:09   ` [dpdk-dev] [PATCH v4 2/7] virtio: add software rx ring, fake_buf into virtqueue Huawei Xie
2015-10-22 12:09   ` [dpdk-dev] [PATCH v4 3/7] virtio: rx/tx ring layout optimization Huawei Xie
2015-10-22 12:09   ` [dpdk-dev] [PATCH v4 4/7] virtio: fill RX avail ring with blank mbufs Huawei Xie
2015-10-23  5:56     ` Tan, Jianfeng
2015-10-25 15:40       ` Xie, Huawei
2015-10-22 12:09   ` [dpdk-dev] [PATCH v4 5/7] virtio: virtio vec rx Huawei Xie
2015-10-22 12:09   ` [dpdk-dev] [PATCH v4 6/7] virtio: simple tx routine Huawei Xie
2015-10-22 16:57     ` Stephen Hemminger
2015-10-23  2:17       ` Xie, Huawei
2015-10-23  2:20         ` Xie, Huawei
2015-10-22 12:09   ` [dpdk-dev] [PATCH v4 7/7] virtio: pick simple rx/tx func Huawei Xie
2015-10-22 16:58     ` Stephen Hemminger
2015-10-23  1:38       ` Xie, Huawei
2015-10-25 15:34 ` [dpdk-dev] [PATCH v5 0/7] virtio ring layout optimization and simple rx/tx processing Huawei Xie
2015-10-25 15:34   ` [dpdk-dev] [PATCH v5 1/7] virtio: add virtio_rxtx.h header file Huawei Xie
2015-10-25 15:34   ` [dpdk-dev] [PATCH v5 2/7] virtio: add software rx ring, fake_buf into virtqueue Huawei Xie
2015-10-25 15:35   ` [dpdk-dev] [PATCH v5 3/7] virtio: rx/tx ring layout optimization Huawei Xie
2015-10-25 15:35   ` [dpdk-dev] [PATCH v5 4/7] virtio: fill RX avail ring with blank mbufs Huawei Xie
2015-10-25 15:35   ` [dpdk-dev] [PATCH v5 5/7] virtio: virtio vec rx Huawei Xie
2015-10-26  8:34     ` Wang, Zhihong
2015-10-25 15:35   ` [dpdk-dev] [PATCH v5 6/7] virtio: simple tx routine Huawei Xie
2015-10-25 15:35   ` [dpdk-dev] [PATCH v5 7/7] virtio: pick simple rx/tx func Huawei Xie
2015-10-27  1:44   ` [dpdk-dev] [PATCH v5 0/7] virtio ring layout optimization and simple rx/tx processing Tan, Jianfeng
2015-10-27  2:15     ` Yuanhan Liu
2015-10-27 10:17       ` Bruce Richardson
2015-10-29 14:53 ` [dpdk-dev] [PATCH v6 0/8] " Huawei Xie
2015-10-29 14:53   ` [dpdk-dev] [PATCH v6 1/8] virtio: add virtio_rxtx.h header file Huawei Xie
2015-10-29 14:53   ` [dpdk-dev] [PATCH v6 2/8] virtio: add software rx ring, fake_buf into virtqueue Huawei Xie
2015-10-30 18:13     ` Thomas Monjalon
2015-10-29 14:53   ` [dpdk-dev] [PATCH v6 3/8] virtio: rx/tx ring layout optimization Huawei Xie
2015-10-29 14:53   ` [dpdk-dev] [PATCH v6 4/8] virtio: fill RX avail ring with blank mbufs Huawei Xie
2015-10-29 14:53   ` [dpdk-dev] [PATCH v6 5/8] virtio: virtio vec rx Huawei Xie
2015-10-30 18:19     ` Thomas Monjalon
2015-11-02  2:18       ` Xie, Huawei
2015-11-02  7:28         ` Thomas Monjalon
2015-11-02  8:49           ` Xie, Huawei
2015-11-02  9:03             ` Thomas Monjalon
2015-10-29 14:53   ` [dpdk-dev] [PATCH v6 6/8] virtio: simple tx routine Huawei Xie
2015-10-29 14:53   ` [dpdk-dev] [PATCH v6 7/8] virtio: pick simple rx/tx func Huawei Xie
2015-10-29 14:53   ` [dpdk-dev] [PATCH v6 8/8] doc: update release notes 2.2 about virtio performance optimization Huawei Xie
2015-10-30  2:05   ` [dpdk-dev] [PATCH v6 0/8] virtio ring layout optimization and simple rx/tx processing Tan, Jianfeng
2015-11-02 22:09     ` Thomas Monjalon
2015-11-02 22:10       ` Thomas Monjalon
2015-11-03 10:30         ` Xie, Huawei
2015-11-27  6:03   ` Xu, Qian Q
2015-12-17  5:22     ` Xie, Huawei
2015-12-17  9:08       ` Thomas Monjalon
