DPDK patches and discussions
* [dpdk-dev] [PATCH v7 0/8] vhost-user multiple queues enabling
@ 2015-10-21  3:48 Yuanhan Liu
  2015-10-21  3:48 ` [dpdk-dev] [PATCH v7 1/8] vhost-user: add protocol features support Yuanhan Liu
                   ` (8 more replies)
  0 siblings, 9 replies; 66+ messages in thread
From: Yuanhan Liu @ 2015-10-21  3:48 UTC (permalink / raw)
  To: dev; +Cc: Michael S. Tsirkin, marcel

This patch set enables vhost-user multiple queues.

v7:

- Removed the vhost-user mq example from this patch set

  The example leverages the hardware VMDq feature to demonstrate
  the mq feature, which introduces too many limitations, and it
  turned out to be inelegant.

- Commit log fixes

- Dropped the patch to fix RESET_OWNER handling, as I found that
  Jerome's solution works as well, and it makes more sense to
  me:

  http://dpdk.org/dev/patchwork/project/dpdk/list/?submitter=354




Overview
========

This patch set depends on some QEMU patches that have already been merged
upstream. Those QEMU patches introduce new vhost-user messages for
negotiating vhost-user mq support. Here are the main negotiation steps
(QEMU as master, DPDK vhost-user as slave):

- Master queries features by VHOST_USER_GET_FEATURES from slave

- Master checks whether VHOST_USER_F_PROTOCOL_FEATURES is present. If not,
  mq is not supported. (See patch 1 for why VHOST_USER_F_PROTOCOL_FEATURES
  is introduced.)

- Master then sends another command, VHOST_USER_GET_QUEUE_NUM, to
  query how many queues the slave supports.

  Master compares the result with the requested queue number;
  QEMU exits if the former is smaller.

- Master then tries to initialize all queue pairs by sending a series of
  vhost-user commands, including VHOST_USER_SET_VRING_CALL, which triggers
  the slave to do the related vring setup, such as vring allocation.


At this point, all necessary initialization and negotiation are done. The
master can later send another message, VHOST_USER_SET_VRING_ENABLE, to
enable or disable a specific queue dynamically; a sketch of the resulting
decision logic follows.
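
The sketch below is a minimal, self-contained illustration of that
decision logic as seen from the master side. It is illustrative only:
check_mq_support() is a hypothetical helper, not a QEMU or DPDK function;
the only value taken from this series is the feature bit number
(VHOST_USER_F_PROTOCOL_FEATURES = 30).

    #include <stdbool.h>
    #include <stdint.h>

    #define VHOST_USER_F_PROTOCOL_FEATURES	30

    /* Returns true when the requested number of queue pairs can be used
     * with the slave, following the negotiation steps listed above. */
    static bool
    check_mq_support(uint64_t slave_features, uint32_t slave_max_queues,
                     uint32_t requested_queues)
    {
        /* Without VHOST_USER_F_PROTOCOL_FEATURES there is no protocol
         * feature negotiation, hence no mq support. */
        if (!(slave_features & (1ULL << VHOST_USER_F_PROTOCOL_FEATURES)))
            return requested_queues <= 1;

        /* The VHOST_USER_GET_QUEUE_NUM reply must cover the request;
         * otherwise QEMU exits. */
        return requested_queues <= slave_max_queues;
    }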


Patchset
========

Patches 1-5 are all preparation work for enabling mq; they are atomic
changes, made with "do not break anything" in mind.

Patch 6 actually enables the mq feature, by setting two key feature flags.

Patch 7 handles the VHOST_USER_SET_VRING_ENABLE message, which enables or
disables a specific virtqueue pair; only one queue pair is enabled by
default.


Test with OVS
=============

Marcel created a simple yet quite clear test guide with OVS at:

   http://wiki.qemu.org/Features/vhost-user-ovs-dpdk





Cc: Jerome Jutteau <jerome.jutteau@outscale.com> 

---
Changchun Ouyang (3):
  vhost: rxtx: use queue id instead of constant ring index
  virtio: fix deadloop due to reading virtio_net_config incorrectly
  vhost: add VHOST_USER_SET_VRING_ENABLE message

Yuanhan Liu (5):
  vhost-user: add protocol features support
  vhost-user: add VHOST_USER_GET_QUEUE_NUM message
  vhost: vring queue setup for multiple queue support
  vhost-user: enable vhost-user multiple queue
  doc: update release note for vhost-user mq support

 doc/guides/rel_notes/release_2_2.rst          |   4 +
 drivers/net/virtio/virtio_ethdev.c            |  16 ++-
 lib/librte_vhost/rte_virtio_net.h             |  13 ++-
 lib/librte_vhost/vhost_rxtx.c                 |  56 ++++++---
 lib/librte_vhost/vhost_user/vhost-net-user.c  |  25 ++++-
 lib/librte_vhost/vhost_user/vhost-net-user.h  |   4 +
 lib/librte_vhost/vhost_user/virtio-net-user.c |  83 ++++++++++----
 lib/librte_vhost/vhost_user/virtio-net-user.h |  10 ++
 lib/librte_vhost/virtio-net.c                 | 156 ++++++++++++++++----------
 9 files changed, 269 insertions(+), 98 deletions(-)

-- 
1.9.0


* [dpdk-dev] [PATCH v7 1/8] vhost-user: add protocol features support
  2015-10-21  3:48 [dpdk-dev] [PATCH v7 0/8] vhost-user multiple queues enabling Yuanhan Liu
@ 2015-10-21  3:48 ` Yuanhan Liu
  2015-10-22  9:52   ` Xie, Huawei
  2015-10-21  3:48 ` [dpdk-dev] [PATCH v7 2/8] vhost-user: add VHOST_USER_GET_QUEUE_NUM message Yuanhan Liu
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 66+ messages in thread
From: Yuanhan Liu @ 2015-10-21  3:48 UTC (permalink / raw)
  To: dev; +Cc: marcel, Michael S. Tsirkin

The two protocol features messages were introduced by the QEMU vhost
maintainer (Michael) for extending the vhost-user interface. Here is
an excerpt from the vhost-user spec:

    Any protocol extensions are gated by protocol feature bits,
    which allows full backwards compatibility on both master
    and slave.

The vhost-user multiple queue feature will be treated as a vhost-user
extension; hence, we have to implement the two messages first.

VHOST_USER_PROTOCOL_FEATURES is initialized to 0, as we don't support
any protocol features yet.
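
As a rough illustration (not the patch itself) of how the gating quoted
above works: a slave only accepts protocol feature bits it has advertised
and rejects anything else. accept_protocol_features() below is a
hypothetical helper written under that assumption.

    #include <stdint.h>

    #define VHOST_USER_PROTOCOL_FEATURES	0ULL	/* nothing supported yet */

    /* Hypothetical helper: keep only requests that stay within the
     * advertised protocol feature set. */
    static int
    accept_protocol_features(uint64_t *negotiated, uint64_t requested)
    {
        if (requested & ~VHOST_USER_PROTOCOL_FEATURES)
            return -1;	/* master asked for a bit we never advertised */

        *negotiated = requested;
        return 0;
    }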

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
---
 lib/librte_vhost/rte_virtio_net.h             |  1 +
 lib/librte_vhost/vhost_user/vhost-net-user.c  | 13 ++++++++++++-
 lib/librte_vhost/vhost_user/vhost-net-user.h  |  2 ++
 lib/librte_vhost/vhost_user/virtio-net-user.c | 13 +++++++++++++
 lib/librte_vhost/vhost_user/virtio-net-user.h |  5 +++++
 lib/librte_vhost/virtio-net.c                 |  5 ++++-
 6 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h
index a037c15..e3a21e5 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -99,6 +99,7 @@ struct virtio_net {
 	struct vhost_virtqueue	*virtqueue[VIRTIO_QNUM];	/**< Contains all virtqueue information. */
 	struct virtio_memory	*mem;		/**< QEMU memory and memory region information. */
 	uint64_t		features;	/**< Negotiated feature set. */
+	uint64_t		protocol_features;	/**< Negotiated protocol feature set. */
 	uint64_t		device_fh;	/**< device identifier. */
 	uint32_t		flags;		/**< Device flags. Only used to check if device is running on data core. */
 #define IF_NAME_SZ (PATH_MAX > IFNAMSIZ ? PATH_MAX : IFNAMSIZ)
diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.c b/lib/librte_vhost/vhost_user/vhost-net-user.c
index d1f8877..bc2ad24 100644
--- a/lib/librte_vhost/vhost_user/vhost-net-user.c
+++ b/lib/librte_vhost/vhost_user/vhost-net-user.c
@@ -95,7 +95,9 @@ static const char *vhost_message_str[VHOST_USER_MAX] = {
 	[VHOST_USER_GET_VRING_BASE] = "VHOST_USER_GET_VRING_BASE",
 	[VHOST_USER_SET_VRING_KICK] = "VHOST_USER_SET_VRING_KICK",
 	[VHOST_USER_SET_VRING_CALL] = "VHOST_USER_SET_VRING_CALL",
-	[VHOST_USER_SET_VRING_ERR]  = "VHOST_USER_SET_VRING_ERR"
+	[VHOST_USER_SET_VRING_ERR]  = "VHOST_USER_SET_VRING_ERR",
+	[VHOST_USER_GET_PROTOCOL_FEATURES]  = "VHOST_USER_GET_PROTOCOL_FEATURES",
+	[VHOST_USER_SET_PROTOCOL_FEATURES]  = "VHOST_USER_SET_PROTOCOL_FEATURES",
 };
 
 /**
@@ -363,6 +365,15 @@ vserver_message_handler(int connfd, void *dat, int *remove)
 		ops->set_features(ctx, &features);
 		break;
 
+	case VHOST_USER_GET_PROTOCOL_FEATURES:
+		msg.payload.u64 = VHOST_USER_PROTOCOL_FEATURES;
+		msg.size = sizeof(msg.payload.u64);
+		send_vhost_message(connfd, &msg);
+		break;
+	case VHOST_USER_SET_PROTOCOL_FEATURES:
+		user_set_protocol_features(ctx, msg.payload.u64);
+		break;
+
 	case VHOST_USER_SET_OWNER:
 		ops->set_owner(ctx);
 		break;
diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.h b/lib/librte_vhost/vhost_user/vhost-net-user.h
index 2e72f3c..4490d23 100644
--- a/lib/librte_vhost/vhost_user/vhost-net-user.h
+++ b/lib/librte_vhost/vhost_user/vhost-net-user.h
@@ -63,6 +63,8 @@ typedef enum VhostUserRequest {
 	VHOST_USER_SET_VRING_KICK = 12,
 	VHOST_USER_SET_VRING_CALL = 13,
 	VHOST_USER_SET_VRING_ERR = 14,
+	VHOST_USER_GET_PROTOCOL_FEATURES = 15,
+	VHOST_USER_SET_PROTOCOL_FEATURES = 16,
 	VHOST_USER_MAX
 } VhostUserRequest;
 
diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.c b/lib/librte_vhost/vhost_user/virtio-net-user.c
index 4689927..360254e 100644
--- a/lib/librte_vhost/vhost_user/virtio-net-user.c
+++ b/lib/librte_vhost/vhost_user/virtio-net-user.c
@@ -316,3 +316,16 @@ user_destroy_device(struct vhost_device_ctx ctx)
 		dev->mem = NULL;
 	}
 }
+
+void
+user_set_protocol_features(struct vhost_device_ctx ctx,
+			   uint64_t protocol_features)
+{
+	struct virtio_net *dev;
+
+	dev = get_device(ctx);
+	if (dev == NULL || protocol_features & ~VHOST_USER_PROTOCOL_FEATURES)
+		return;
+
+	dev->protocol_features = protocol_features;
+}
diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.h b/lib/librte_vhost/vhost_user/virtio-net-user.h
index df24860..e7a6ff4 100644
--- a/lib/librte_vhost/vhost_user/virtio-net-user.h
+++ b/lib/librte_vhost/vhost_user/virtio-net-user.h
@@ -37,12 +37,17 @@
 #include "vhost-net.h"
 #include "vhost-net-user.h"
 
+#define VHOST_USER_PROTOCOL_FEATURES	0ULL
+
 int user_set_mem_table(struct vhost_device_ctx, struct VhostUserMsg *);
 
 void user_set_vring_call(struct vhost_device_ctx, struct VhostUserMsg *);
 
 void user_set_vring_kick(struct vhost_device_ctx, struct VhostUserMsg *);
 
+void user_set_protocol_features(struct vhost_device_ctx ctx,
+				uint64_t protocol_features);
+
 int user_get_vring_base(struct vhost_device_ctx, struct vhost_vring_state *);
 
 void user_destroy_device(struct vhost_device_ctx);
diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index d0f1764..deac6b9 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -67,11 +67,14 @@ struct virtio_net_device_ops const *notify_ops;
 /* root address of the linked list of managed virtio devices */
 static struct virtio_net_config_ll *ll_root;
 
+#define VHOST_USER_F_PROTOCOL_FEATURES	30
+
 /* Features supported by this lib. */
 #define VHOST_SUPPORTED_FEATURES ((1ULL << VIRTIO_NET_F_MRG_RXBUF) | \
 				(1ULL << VIRTIO_NET_F_CTRL_VQ) | \
 				(1ULL << VIRTIO_NET_F_CTRL_RX) | \
-				(1ULL << VHOST_F_LOG_ALL))
+				(1ULL << VHOST_F_LOG_ALL)      | \
+				(1ULL << VHOST_USER_F_PROTOCOL_FEATURES))
 static uint64_t VHOST_FEATURES = VHOST_SUPPORTED_FEATURES;
 
 
-- 
1.9.0


* [dpdk-dev] [PATCH v7 2/8] vhost-user: add VHOST_USER_GET_QUEUE_NUM message
  2015-10-21  3:48 [dpdk-dev] [PATCH v7 0/8] vhost-user multiple queues enabling Yuanhan Liu
  2015-10-21  3:48 ` [dpdk-dev] [PATCH v7 1/8] vhost-user: add protocol features support Yuanhan Liu
@ 2015-10-21  3:48 ` Yuanhan Liu
  2015-10-22  9:38   ` Xie, Huawei
  2015-10-21  3:48 ` [dpdk-dev] [PATCH v7 3/8] vhost: vring queue setup for multiple queue support Yuanhan Liu
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 66+ messages in thread
From: Yuanhan Liu @ 2015-10-21  3:48 UTC (permalink / raw)
  To: dev; +Cc: marcel, Michael S. Tsirkin

This message tells the frontend (QEMU) how many queue pairs we support.

The reply is initialized to VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
---
 lib/librte_vhost/vhost_user/vhost-net-user.c | 7 +++++++
 lib/librte_vhost/vhost_user/vhost-net-user.h | 1 +
 2 files changed, 8 insertions(+)

diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.c b/lib/librte_vhost/vhost_user/vhost-net-user.c
index bc2ad24..8675cd4 100644
--- a/lib/librte_vhost/vhost_user/vhost-net-user.c
+++ b/lib/librte_vhost/vhost_user/vhost-net-user.c
@@ -98,6 +98,7 @@ static const char *vhost_message_str[VHOST_USER_MAX] = {
 	[VHOST_USER_SET_VRING_ERR]  = "VHOST_USER_SET_VRING_ERR",
 	[VHOST_USER_GET_PROTOCOL_FEATURES]  = "VHOST_USER_GET_PROTOCOL_FEATURES",
 	[VHOST_USER_SET_PROTOCOL_FEATURES]  = "VHOST_USER_SET_PROTOCOL_FEATURES",
+	[VHOST_USER_GET_QUEUE_NUM]  = "VHOST_USER_GET_QUEUE_NUM",
 };
 
 /**
@@ -421,6 +422,12 @@ vserver_message_handler(int connfd, void *dat, int *remove)
 		RTE_LOG(INFO, VHOST_CONFIG, "not implemented\n");
 		break;
 
+	case VHOST_USER_GET_QUEUE_NUM:
+		msg.payload.u64 = VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX;
+		msg.size = sizeof(msg.payload.u64);
+		send_vhost_message(connfd, &msg);
+		break;
+
 	default:
 		break;
 
diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.h b/lib/librte_vhost/vhost_user/vhost-net-user.h
index 4490d23..389d21d 100644
--- a/lib/librte_vhost/vhost_user/vhost-net-user.h
+++ b/lib/librte_vhost/vhost_user/vhost-net-user.h
@@ -65,6 +65,7 @@ typedef enum VhostUserRequest {
 	VHOST_USER_SET_VRING_ERR = 14,
 	VHOST_USER_GET_PROTOCOL_FEATURES = 15,
 	VHOST_USER_SET_PROTOCOL_FEATURES = 16,
+	VHOST_USER_GET_QUEUE_NUM = 17,
 	VHOST_USER_MAX
 } VhostUserRequest;
 
-- 
1.9.0


* [dpdk-dev] [PATCH v7 3/8] vhost: vring queue setup for multiple queue support
  2015-10-21  3:48 [dpdk-dev] [PATCH v7 0/8] vhost-user multiple queues enabling Yuanhan Liu
  2015-10-21  3:48 ` [dpdk-dev] [PATCH v7 1/8] vhost-user: add protocol features support Yuanhan Liu
  2015-10-21  3:48 ` [dpdk-dev] [PATCH v7 2/8] vhost-user: add VHOST_USER_GET_QUEUE_NUM message Yuanhan Liu
@ 2015-10-21  3:48 ` Yuanhan Liu
  2015-10-21  4:45   ` Stephen Hemminger
  2015-10-22  9:49   ` Xie, Huawei
  2015-10-21  3:48 ` [dpdk-dev] [PATCH v7 4/8] vhost: rxtx: use queue id instead of constant ring index Yuanhan Liu
                   ` (5 subsequent siblings)
  8 siblings, 2 replies; 66+ messages in thread
From: Yuanhan Liu @ 2015-10-21  3:48 UTC (permalink / raw)
  To: dev; +Cc: marcel, Michael S. Tsirkin

All queue pairs, including the default (first) queue pair, are
allocated dynamically, the first time a vring_call message is
received for a specific queue pair.

This is refactoring work for enabling vhost-user multiple queue;
it should not break anything, as it makes no functional changes:
we don't support setting mq yet, so there is at most one queue pair.

This patch is based on Changchun's patch.
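
For reference, the queue-pair to vring index mapping used throughout the
series is qp_idx * VIRTIO_QNUM + VIRTIO_RXQ for the RX ring and
qp_idx * VIRTIO_QNUM + VIRTIO_TXQ for the TX ring. A tiny standalone
illustration follows; the enum values mirror the existing DPDK
definitions.

    #include <stdint.h>
    #include <stdio.h>

    enum { VIRTIO_RXQ = 0, VIRTIO_TXQ = 1, VIRTIO_QNUM = 2 };

    int main(void)
    {
        uint32_t qp_idx;

        /* Print the vring indexes owned by the first few queue pairs. */
        for (qp_idx = 0; qp_idx < 3; qp_idx++)
            printf("pair %u: rxq %u, txq %u\n", qp_idx,
                   qp_idx * VIRTIO_QNUM + VIRTIO_RXQ,
                   qp_idx * VIRTIO_QNUM + VIRTIO_TXQ);
        return 0;
    }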

Signed-off-by: Ouyang Changchun <changchun.ouyang@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
---
 lib/librte_vhost/rte_virtio_net.h             |   3 +-
 lib/librte_vhost/vhost_user/virtio-net-user.c |  44 ++++----
 lib/librte_vhost/virtio-net.c                 | 144 ++++++++++++++++----------
 3 files changed, 114 insertions(+), 77 deletions(-)

diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h
index e3a21e5..5dd6493 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -96,7 +96,7 @@ struct vhost_virtqueue {
  * Device structure contains all configuration information relating to the device.
  */
 struct virtio_net {
-	struct vhost_virtqueue	*virtqueue[VIRTIO_QNUM];	/**< Contains all virtqueue information. */
+	struct vhost_virtqueue	*virtqueue[VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX];	/**< Contains all virtqueue information. */
 	struct virtio_memory	*mem;		/**< QEMU memory and memory region information. */
 	uint64_t		features;	/**< Negotiated feature set. */
 	uint64_t		protocol_features;	/**< Negotiated protocol feature set. */
@@ -104,6 +104,7 @@ struct virtio_net {
 	uint32_t		flags;		/**< Device flags. Only used to check if device is running on data core. */
 #define IF_NAME_SZ (PATH_MAX > IFNAMSIZ ? PATH_MAX : IFNAMSIZ)
 	char			ifname[IF_NAME_SZ];	/**< Name of the tap device or socket path. */
+	uint32_t		virt_qp_nb;	/**< number of queue pair we have allocated */
 	void			*priv;		/**< private context */
 } __rte_cache_aligned;
 
diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.c b/lib/librte_vhost/vhost_user/virtio-net-user.c
index 360254e..e83d279 100644
--- a/lib/librte_vhost/vhost_user/virtio-net-user.c
+++ b/lib/librte_vhost/vhost_user/virtio-net-user.c
@@ -206,25 +206,33 @@ err_mmap:
 }
 
 static int
+vq_is_ready(struct vhost_virtqueue *vq)
+{
+	return vq && vq->desc   &&
+	       vq->kickfd != -1 &&
+	       vq->callfd != -1;
+}
+
+static int
 virtio_is_ready(struct virtio_net *dev)
 {
 	struct vhost_virtqueue *rvq, *tvq;
+	uint32_t i;
 
-	/* mq support in future.*/
-	rvq = dev->virtqueue[VIRTIO_RXQ];
-	tvq = dev->virtqueue[VIRTIO_TXQ];
-	if (rvq && tvq && rvq->desc && tvq->desc &&
-		(rvq->kickfd != -1) &&
-		(rvq->callfd != -1) &&
-		(tvq->kickfd != -1) &&
-		(tvq->callfd != -1)) {
-		RTE_LOG(INFO, VHOST_CONFIG,
-			"virtio is now ready for processing.\n");
-		return 1;
+	for (i = 0; i < dev->virt_qp_nb; i++) {
+		rvq = dev->virtqueue[i * VIRTIO_QNUM + VIRTIO_RXQ];
+		tvq = dev->virtqueue[i * VIRTIO_QNUM + VIRTIO_TXQ];
+
+		if (!vq_is_ready(rvq) || !vq_is_ready(tvq)) {
+			RTE_LOG(INFO, VHOST_CONFIG,
+				"virtio is not ready for processing.\n");
+			return 0;
+		}
 	}
+
 	RTE_LOG(INFO, VHOST_CONFIG,
-		"virtio isn't ready for processing.\n");
-	return 0;
+		"virtio is now ready for processing.\n");
+	return 1;
 }
 
 void
@@ -290,13 +298,9 @@ user_get_vring_base(struct vhost_device_ctx ctx,
 	 * sent and only sent in vhost_vring_stop.
 	 * TODO: cleanup the vring, it isn't usable since here.
 	 */
-	if ((dev->virtqueue[VIRTIO_RXQ]->kickfd) >= 0) {
-		close(dev->virtqueue[VIRTIO_RXQ]->kickfd);
-		dev->virtqueue[VIRTIO_RXQ]->kickfd = -1;
-	}
-	if ((dev->virtqueue[VIRTIO_TXQ]->kickfd) >= 0) {
-		close(dev->virtqueue[VIRTIO_TXQ]->kickfd);
-		dev->virtqueue[VIRTIO_TXQ]->kickfd = -1;
+	if ((dev->virtqueue[state->index]->kickfd) >= 0) {
+		close(dev->virtqueue[state->index]->kickfd);
+		dev->virtqueue[state->index]->kickfd = -1;
 	}
 
 	return 0;
diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index deac6b9..57fb7b1 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -36,6 +36,7 @@
 #include <stddef.h>
 #include <stdint.h>
 #include <stdlib.h>
+#include <assert.h>
 #include <sys/mman.h>
 #include <unistd.h>
 #ifdef RTE_LIBRTE_VHOST_NUMA
@@ -178,6 +179,15 @@ add_config_ll_entry(struct virtio_net_config_ll *new_ll_dev)
 
 }
 
+static void
+cleanup_vq(struct vhost_virtqueue *vq)
+{
+	if (vq->callfd >= 0)
+		close(vq->callfd);
+	if (vq->kickfd >= 0)
+		close(vq->kickfd);
+}
+
 /*
  * Unmap any memory, close any file descriptors and
  * free any memory owned by a device.
@@ -185,6 +195,8 @@ add_config_ll_entry(struct virtio_net_config_ll *new_ll_dev)
 static void
 cleanup_device(struct virtio_net *dev)
 {
+	uint32_t i;
+
 	/* Unmap QEMU memory file if mapped. */
 	if (dev->mem) {
 		munmap((void *)(uintptr_t)dev->mem->mapped_address,
@@ -192,15 +204,10 @@ cleanup_device(struct virtio_net *dev)
 		free(dev->mem);
 	}
 
-	/* Close any event notifiers opened by device. */
-	if (dev->virtqueue[VIRTIO_RXQ]->callfd >= 0)
-		close(dev->virtqueue[VIRTIO_RXQ]->callfd);
-	if (dev->virtqueue[VIRTIO_RXQ]->kickfd >= 0)
-		close(dev->virtqueue[VIRTIO_RXQ]->kickfd);
-	if (dev->virtqueue[VIRTIO_TXQ]->callfd >= 0)
-		close(dev->virtqueue[VIRTIO_TXQ]->callfd);
-	if (dev->virtqueue[VIRTIO_TXQ]->kickfd >= 0)
-		close(dev->virtqueue[VIRTIO_TXQ]->kickfd);
+	for (i = 0; i < dev->virt_qp_nb; i++) {
+		cleanup_vq(dev->virtqueue[i * VIRTIO_QNUM + VIRTIO_RXQ]);
+		cleanup_vq(dev->virtqueue[i * VIRTIO_QNUM + VIRTIO_TXQ]);
+	}
 }
 
 /*
@@ -209,9 +216,11 @@ cleanup_device(struct virtio_net *dev)
 static void
 free_device(struct virtio_net_config_ll *ll_dev)
 {
-	/* Free any malloc'd memory */
-	rte_free(ll_dev->dev.virtqueue[VIRTIO_RXQ]);
-	rte_free(ll_dev->dev.virtqueue[VIRTIO_TXQ]);
+	uint32_t i;
+
+	for (i = 0; i < ll_dev->dev.virt_qp_nb; i++)
+		rte_free(ll_dev->dev.virtqueue[i * VIRTIO_QNUM]);
+
 	rte_free(ll_dev);
 }
 
@@ -244,6 +253,50 @@ rm_config_ll_entry(struct virtio_net_config_ll *ll_dev,
 	}
 }
 
+static void
+init_vring_queue(struct vhost_virtqueue *vq)
+{
+	memset(vq, 0, sizeof(struct vhost_virtqueue));
+
+	vq->kickfd = -1;
+	vq->callfd = -1;
+
+	/* Backends are set to -1 indicating an inactive device. */
+	vq->backend = -1;
+}
+
+static void
+init_vring_queue_pair(struct virtio_net *dev, uint32_t qp_idx)
+{
+	init_vring_queue(dev->virtqueue[qp_idx * VIRTIO_QNUM + VIRTIO_RXQ]);
+	init_vring_queue(dev->virtqueue[qp_idx * VIRTIO_QNUM + VIRTIO_TXQ]);
+}
+
+static int
+alloc_vring_queue_pair(struct virtio_net *dev, uint32_t qp_idx)
+{
+	struct vhost_virtqueue *virtqueue = NULL;
+	uint32_t virt_rx_q_idx = qp_idx * VIRTIO_QNUM + VIRTIO_RXQ;
+	uint32_t virt_tx_q_idx = qp_idx * VIRTIO_QNUM + VIRTIO_TXQ;
+
+	virtqueue = rte_malloc(NULL,
+			       sizeof(struct vhost_virtqueue) * VIRTIO_QNUM, 0);
+	if (virtqueue == NULL) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"Failed to allocate memory for virt qp:%d.\n", qp_idx);
+		return -1;
+	}
+
+	dev->virtqueue[virt_rx_q_idx] = virtqueue;
+	dev->virtqueue[virt_tx_q_idx] = virtqueue + VIRTIO_TXQ;
+
+	init_vring_queue_pair(dev, qp_idx);
+
+	dev->virt_qp_nb += 1;
+
+	return 0;
+}
+
 /*
  *  Initialise all variables in device structure.
  */
@@ -251,6 +304,7 @@ static void
 init_device(struct virtio_net *dev)
 {
 	uint64_t vq_offset;
+	uint32_t i;
 
 	/*
 	 * Virtqueues have already been malloced so
@@ -261,17 +315,9 @@ init_device(struct virtio_net *dev)
 	/* Set everything to 0. */
 	memset((void *)(uintptr_t)((uint64_t)(uintptr_t)dev + vq_offset), 0,
 		(sizeof(struct virtio_net) - (size_t)vq_offset));
-	memset(dev->virtqueue[VIRTIO_RXQ], 0, sizeof(struct vhost_virtqueue));
-	memset(dev->virtqueue[VIRTIO_TXQ], 0, sizeof(struct vhost_virtqueue));
-
-	dev->virtqueue[VIRTIO_RXQ]->kickfd = -1;
-	dev->virtqueue[VIRTIO_RXQ]->callfd = -1;
-	dev->virtqueue[VIRTIO_TXQ]->kickfd = -1;
-	dev->virtqueue[VIRTIO_TXQ]->callfd = -1;
 
-	/* Backends are set to -1 indicating an inactive device. */
-	dev->virtqueue[VIRTIO_RXQ]->backend = VIRTIO_DEV_STOPPED;
-	dev->virtqueue[VIRTIO_TXQ]->backend = VIRTIO_DEV_STOPPED;
+	for (i = 0; i < dev->virt_qp_nb; i++)
+		init_vring_queue_pair(dev, i);
 }
 
 /*
@@ -283,7 +329,6 @@ static int
 new_device(struct vhost_device_ctx ctx)
 {
 	struct virtio_net_config_ll *new_ll_dev;
-	struct vhost_virtqueue *virtqueue_rx, *virtqueue_tx;
 
 	/* Setup device and virtqueues. */
 	new_ll_dev = rte_malloc(NULL, sizeof(struct virtio_net_config_ll), 0);
@@ -294,28 +339,6 @@ new_device(struct vhost_device_ctx ctx)
 		return -1;
 	}
 
-	virtqueue_rx = rte_malloc(NULL, sizeof(struct vhost_virtqueue), 0);
-	if (virtqueue_rx == NULL) {
-		rte_free(new_ll_dev);
-		RTE_LOG(ERR, VHOST_CONFIG,
-			"(%"PRIu64") Failed to allocate memory for rxq.\n",
-			ctx.fh);
-		return -1;
-	}
-
-	virtqueue_tx = rte_malloc(NULL, sizeof(struct vhost_virtqueue), 0);
-	if (virtqueue_tx == NULL) {
-		rte_free(virtqueue_rx);
-		rte_free(new_ll_dev);
-		RTE_LOG(ERR, VHOST_CONFIG,
-			"(%"PRIu64") Failed to allocate memory for txq.\n",
-			ctx.fh);
-		return -1;
-	}
-
-	new_ll_dev->dev.virtqueue[VIRTIO_RXQ] = virtqueue_rx;
-	new_ll_dev->dev.virtqueue[VIRTIO_TXQ] = virtqueue_tx;
-
 	/* Initialise device and virtqueues. */
 	init_device(&new_ll_dev->dev);
 
@@ -437,6 +460,8 @@ static int
 set_features(struct vhost_device_ctx ctx, uint64_t *pu)
 {
 	struct virtio_net *dev;
+	uint16_t vhost_hlen;
+	uint16_t i;
 
 	dev = get_device(ctx);
 	if (dev == NULL)
@@ -444,27 +469,26 @@ set_features(struct vhost_device_ctx ctx, uint64_t *pu)
 	if (*pu & ~VHOST_FEATURES)
 		return -1;
 
-	/* Store the negotiated feature list for the device. */
 	dev->features = *pu;
-
-	/* Set the vhost_hlen depending on if VIRTIO_NET_F_MRG_RXBUF is set. */
 	if (dev->features & (1 << VIRTIO_NET_F_MRG_RXBUF)) {
 		LOG_DEBUG(VHOST_CONFIG,
 			"(%"PRIu64") Mergeable RX buffers enabled\n",
 			dev->device_fh);
-		dev->virtqueue[VIRTIO_RXQ]->vhost_hlen =
-			sizeof(struct virtio_net_hdr_mrg_rxbuf);
-		dev->virtqueue[VIRTIO_TXQ]->vhost_hlen =
-			sizeof(struct virtio_net_hdr_mrg_rxbuf);
+		vhost_hlen = sizeof(struct virtio_net_hdr_mrg_rxbuf);
 	} else {
 		LOG_DEBUG(VHOST_CONFIG,
 			"(%"PRIu64") Mergeable RX buffers disabled\n",
 			dev->device_fh);
-		dev->virtqueue[VIRTIO_RXQ]->vhost_hlen =
-			sizeof(struct virtio_net_hdr);
-		dev->virtqueue[VIRTIO_TXQ]->vhost_hlen =
-			sizeof(struct virtio_net_hdr);
+		vhost_hlen = sizeof(struct virtio_net_hdr);
+	}
+
+	for (i = 0; i < dev->virt_qp_nb; i++) {
+		uint16_t base_idx = i * VIRTIO_QNUM;
+
+		dev->virtqueue[base_idx + VIRTIO_RXQ]->vhost_hlen = vhost_hlen;
+		dev->virtqueue[base_idx + VIRTIO_TXQ]->vhost_hlen = vhost_hlen;
 	}
+
 	return 0;
 }
 
@@ -680,13 +704,21 @@ set_vring_call(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
 {
 	struct virtio_net *dev;
 	struct vhost_virtqueue *vq;
+	uint32_t cur_qp_idx = file->index / VIRTIO_QNUM;
 
 	dev = get_device(ctx);
 	if (dev == NULL)
 		return -1;
 
+	/* alloc vring queue pair if it is a new queue pair */
+	if (cur_qp_idx + 1 > dev->virt_qp_nb) {
+		if (alloc_vring_queue_pair(dev, cur_qp_idx) < 0)
+			return -1;
+	}
+
 	/* file->index refers to the queue index. The txq is 1, rxq is 0. */
 	vq = dev->virtqueue[file->index];
+	assert(vq != NULL);
 
 	if (vq->callfd >= 0)
 		close(vq->callfd);
-- 
1.9.0


* [dpdk-dev] [PATCH v7 4/8] vhost: rxtx: use queue id instead of constant ring index
  2015-10-21  3:48 [dpdk-dev] [PATCH v7 0/8] vhost-user multiple queues enabling Yuanhan Liu
                   ` (2 preceding siblings ...)
  2015-10-21  3:48 ` [dpdk-dev] [PATCH v7 3/8] vhost: vring queue setup for multiple queue support Yuanhan Liu
@ 2015-10-21  3:48 ` Yuanhan Liu
  2015-10-21  4:43   ` Stephen Hemminger
                     ` (2 more replies)
  2015-10-21  3:48 ` [dpdk-dev] [PATCH v7 5/8] virtio: fix deadloop due to reading virtio_net_config incorrectly Yuanhan Liu
                   ` (4 subsequent siblings)
  8 siblings, 3 replies; 66+ messages in thread
From: Yuanhan Liu @ 2015-10-21  3:48 UTC (permalink / raw)
  To: dev; +Cc: Michael S. Tsirkin, marcel, Changchun Ouyang

From: Changchun Ouyang <changchun.ouyang@intel.com>

Do not use VIRTIO_RXQ or VIRTIO_TXQ anymore; use the queue_id
instead, which will be set to the proper value for a specific queue
once multiple queue support is enabled.

For now, queue_id is still set to VIRTIO_RXQ or VIRTIO_TXQ,
so this should not break anything.
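
As a hedged usage sketch (not part of the patch): once mq is enabled, an
application addresses the queues with explicit ids following the
pair-to-ring mapping (RX ring = 2 * pair + VIRTIO_RXQ, TX ring =
2 * pair + VIRTIO_TXQ). forward_pair() below is a hypothetical helper;
mbuf cleanup and error handling are omitted.

    #include <rte_mbuf.h>
    #include <rte_virtio_net.h>

    /* Dequeue a burst from one pair's TX ring and feed it back into the
     * same pair's RX ring -- a trivial loopback, for illustration only. */
    static uint16_t
    forward_pair(struct virtio_net *dev, uint32_t pair_idx,
                 struct rte_mempool *pool)
    {
        struct rte_mbuf *pkts[32];
        uint16_t nb_rx;

        nb_rx = rte_vhost_dequeue_burst(dev,
                pair_idx * VIRTIO_QNUM + VIRTIO_TXQ, pool, pkts, 32);
        if (nb_rx == 0)
            return 0;

        return rte_vhost_enqueue_burst(dev,
                pair_idx * VIRTIO_QNUM + VIRTIO_RXQ, pkts, nb_rx);
    }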

Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>

---

v7: commit title fix
---
 lib/librte_vhost/vhost_rxtx.c | 46 ++++++++++++++++++++++++++++++-------------
 1 file changed, 32 insertions(+), 14 deletions(-)

diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
index 7026bfa..14e00ef 100644
--- a/lib/librte_vhost/vhost_rxtx.c
+++ b/lib/librte_vhost/vhost_rxtx.c
@@ -42,6 +42,16 @@
 
 #define MAX_PKT_BURST 32
 
+static inline int __attribute__((always_inline))
+is_valid_virt_queue_idx(uint32_t virtq_idx, int is_tx, uint32_t max_qp_idx)
+{
+	if ((is_tx ^ (virtq_idx & 0x1)) ||
+	    (virtq_idx >= max_qp_idx * VIRTIO_QNUM))
+		return 0;
+
+	return 1;
+}
+
 /**
  * This function adds buffers to the virtio devices RX virtqueue. Buffers can
  * be received from the physical port or from another virtio device. A packet
@@ -68,12 +78,14 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
 	uint8_t success = 0;
 
 	LOG_DEBUG(VHOST_DATA, "(%"PRIu64") virtio_dev_rx()\n", dev->device_fh);
-	if (unlikely(queue_id != VIRTIO_RXQ)) {
-		LOG_DEBUG(VHOST_DATA, "mq isn't supported in this version.\n");
+	if (unlikely(!is_valid_virt_queue_idx(queue_id, 0, dev->virt_qp_nb))) {
+		RTE_LOG(ERR, VHOST_DATA,
+			"%s (%"PRIu64"): virtqueue idx:%d invalid.\n",
+			__func__, dev->device_fh, queue_id);
 		return 0;
 	}
 
-	vq = dev->virtqueue[VIRTIO_RXQ];
+	vq = dev->virtqueue[queue_id];
 	count = (count > MAX_PKT_BURST) ? MAX_PKT_BURST : count;
 
 	/*
@@ -235,8 +247,9 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
 }
 
 static inline uint32_t __attribute__((always_inline))
-copy_from_mbuf_to_vring(struct virtio_net *dev, uint16_t res_base_idx,
-	uint16_t res_end_idx, struct rte_mbuf *pkt)
+copy_from_mbuf_to_vring(struct virtio_net *dev, uint32_t queue_id,
+			uint16_t res_base_idx, uint16_t res_end_idx,
+			struct rte_mbuf *pkt)
 {
 	uint32_t vec_idx = 0;
 	uint32_t entry_success = 0;
@@ -264,7 +277,7 @@ copy_from_mbuf_to_vring(struct virtio_net *dev, uint16_t res_base_idx,
 	 * Convert from gpa to vva
 	 * (guest physical addr -> vhost virtual addr)
 	 */
-	vq = dev->virtqueue[VIRTIO_RXQ];
+	vq = dev->virtqueue[queue_id];
 	vb_addr = gpa_to_vva(dev, vq->buf_vec[vec_idx].buf_addr);
 	vb_hdr_addr = vb_addr;
 
@@ -464,11 +477,14 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id,
 
 	LOG_DEBUG(VHOST_DATA, "(%"PRIu64") virtio_dev_merge_rx()\n",
 		dev->device_fh);
-	if (unlikely(queue_id != VIRTIO_RXQ)) {
-		LOG_DEBUG(VHOST_DATA, "mq isn't supported in this version.\n");
+	if (unlikely(!is_valid_virt_queue_idx(queue_id, 0, dev->virt_qp_nb))) {
+		RTE_LOG(ERR, VHOST_DATA,
+			"%s (%"PRIu64"): virtqueue idx:%d invalid.\n",
+			__func__, dev->device_fh, queue_id);
+		return 0;
 	}
 
-	vq = dev->virtqueue[VIRTIO_RXQ];
+	vq = dev->virtqueue[queue_id];
 	count = RTE_MIN((uint32_t)MAX_PKT_BURST, count);
 
 	if (count == 0)
@@ -509,8 +525,8 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id,
 							res_cur_idx);
 		} while (success == 0);
 
-		entry_success = copy_from_mbuf_to_vring(dev, res_base_idx,
-			res_cur_idx, pkts[pkt_idx]);
+		entry_success = copy_from_mbuf_to_vring(dev, queue_id,
+			res_base_idx, res_cur_idx, pkts[pkt_idx]);
 
 		rte_compiler_barrier();
 
@@ -562,12 +578,14 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id,
 	uint16_t free_entries, entry_success = 0;
 	uint16_t avail_idx;
 
-	if (unlikely(queue_id != VIRTIO_TXQ)) {
-		LOG_DEBUG(VHOST_DATA, "mq isn't supported in this version.\n");
+	if (unlikely(!is_valid_virt_queue_idx(queue_id, 1, dev->virt_qp_nb))) {
+		RTE_LOG(ERR, VHOST_DATA,
+			"%s (%"PRIu64"): virtqueue idx:%d invalid.\n",
+			__func__, dev->device_fh, queue_id);
 		return 0;
 	}
 
-	vq = dev->virtqueue[VIRTIO_TXQ];
+	vq = dev->virtqueue[queue_id];
 	avail_idx =  *((volatile uint16_t *)&vq->avail->idx);
 
 	/* If there are no available buffers then return. */
-- 
1.9.0


* [dpdk-dev] [PATCH v7 5/8] virtio: fix deadloop due to reading virtio_net_config incorrectly
  2015-10-21  3:48 [dpdk-dev] [PATCH v7 0/8] vhost-user multiple queues enabling Yuanhan Liu
                   ` (3 preceding siblings ...)
  2015-10-21  3:48 ` [dpdk-dev] [PATCH v7 4/8] vhost: rxtx: use queue id instead of constant ring index Yuanhan Liu
@ 2015-10-21  3:48 ` Yuanhan Liu
  2015-10-21  3:48 ` [dpdk-dev] [PATCH v7 6/8] vhost-user: enable vhost-user multiple queue Yuanhan Liu
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 66+ messages in thread
From: Yuanhan Liu @ 2015-10-21  3:48 UTC (permalink / raw)
  To: dev; +Cc: Michael S. Tsirkin, marcel, Changchun Ouyang

From: Changchun Ouyang <changchun.ouyang@intel.com>

The old code adjusts how many config bytes we want to read depending on
which features we have, but it later casts the entire buffer it read
to "struct virtio_net_config", which is obviously wrong.

The wrong config read results in a dead loop at virtio_send_command()
while starting testpmd.

The right way is to read the related config bytes only when the
corresponding feature is set, which is exactly what this patch does.

Fixes: 823ad647950a ("virtio: support multiple queues")

Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>

---

v7: commit log fixes

v6: read mac unconditionally.
---
 drivers/net/virtio/virtio_ethdev.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index 147aca1..a654168 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -1163,7 +1163,6 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
 	struct virtio_hw *hw = eth_dev->data->dev_private;
 	struct virtio_net_config *config;
 	struct virtio_net_config local_config;
-	uint32_t offset_conf = sizeof(config->mac);
 	struct rte_pci_device *pci_dev;
 
 	RTE_BUILD_BUG_ON(RTE_PKTMBUF_HEADROOM < sizeof(struct virtio_net_hdr));
@@ -1222,8 +1221,14 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
 	if (vtpci_with_feature(hw, VIRTIO_NET_F_CTRL_VQ)) {
 		config = &local_config;
 
+		vtpci_read_dev_config(hw,
+			offsetof(struct virtio_net_config, mac),
+			&config->mac, sizeof(config->mac));
+
 		if (vtpci_with_feature(hw, VIRTIO_NET_F_STATUS)) {
-			offset_conf += sizeof(config->status);
+			vtpci_read_dev_config(hw,
+				offsetof(struct virtio_net_config, status),
+				&config->status, sizeof(config->status));
 		} else {
 			PMD_INIT_LOG(DEBUG,
 				     "VIRTIO_NET_F_STATUS is not supported");
@@ -1231,15 +1236,16 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
 		}
 
 		if (vtpci_with_feature(hw, VIRTIO_NET_F_MQ)) {
-			offset_conf += sizeof(config->max_virtqueue_pairs);
+			vtpci_read_dev_config(hw,
+				offsetof(struct virtio_net_config, max_virtqueue_pairs),
+				&config->max_virtqueue_pairs,
+				sizeof(config->max_virtqueue_pairs));
 		} else {
 			PMD_INIT_LOG(DEBUG,
 				     "VIRTIO_NET_F_MQ is not supported");
 			config->max_virtqueue_pairs = 1;
 		}
 
-		vtpci_read_dev_config(hw, 0, (uint8_t *)config, offset_conf);
-
 		hw->max_rx_queues =
 			(VIRTIO_MAX_RX_QUEUES < config->max_virtqueue_pairs) ?
 			VIRTIO_MAX_RX_QUEUES : config->max_virtqueue_pairs;
-- 
1.9.0


* [dpdk-dev] [PATCH v7 6/8] vhost-user: enable vhost-user multiple queue
  2015-10-21  3:48 [dpdk-dev] [PATCH v7 0/8] vhost-user multiple queues enabling Yuanhan Liu
                   ` (4 preceding siblings ...)
  2015-10-21  3:48 ` [dpdk-dev] [PATCH v7 5/8] virtio: fix deadloop due to reading virtio_net_config incorrectly Yuanhan Liu
@ 2015-10-21  3:48 ` Yuanhan Liu
  2015-10-21  3:48 ` [dpdk-dev] [PATCH v7 7/8] vhost: add VHOST_USER_SET_VRING_ENABLE message Yuanhan Liu
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 66+ messages in thread
From: Yuanhan Liu @ 2015-10-21  3:48 UTC (permalink / raw)
  To: dev; +Cc: marcel, Michael S. Tsirkin

By setting the VHOST_USER_PROTOCOL_F_MQ protocol feature bit and
the VIRTIO_NET_F_MQ feature bit.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
---
 lib/librte_vhost/vhost_user/virtio-net-user.h | 4 +++-
 lib/librte_vhost/virtio-net.c                 | 1 +
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.h b/lib/librte_vhost/vhost_user/virtio-net-user.h
index e7a6ff4..5f6d667 100644
--- a/lib/librte_vhost/vhost_user/virtio-net-user.h
+++ b/lib/librte_vhost/vhost_user/virtio-net-user.h
@@ -37,7 +37,9 @@
 #include "vhost-net.h"
 #include "vhost-net-user.h"
 
-#define VHOST_USER_PROTOCOL_FEATURES	0ULL
+#define VHOST_USER_PROTOCOL_F_MQ	0
+
+#define VHOST_USER_PROTOCOL_FEATURES	(1ULL << VHOST_USER_PROTOCOL_F_MQ)
 
 int user_set_mem_table(struct vhost_device_ctx, struct VhostUserMsg *);
 
diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index 57fb7b1..d644022 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -74,6 +74,7 @@ static struct virtio_net_config_ll *ll_root;
 #define VHOST_SUPPORTED_FEATURES ((1ULL << VIRTIO_NET_F_MRG_RXBUF) | \
 				(1ULL << VIRTIO_NET_F_CTRL_VQ) | \
 				(1ULL << VIRTIO_NET_F_CTRL_RX) | \
+				(1ULL << VIRTIO_NET_F_MQ)      | \
 				(1ULL << VHOST_F_LOG_ALL)      | \
 				(1ULL << VHOST_USER_F_PROTOCOL_FEATURES))
 static uint64_t VHOST_FEATURES = VHOST_SUPPORTED_FEATURES;
-- 
1.9.0


* [dpdk-dev] [PATCH v7 7/8] vhost: add VHOST_USER_SET_VRING_ENABLE message
  2015-10-21  3:48 [dpdk-dev] [PATCH v7 0/8] vhost-user multiple queues enabling Yuanhan Liu
                   ` (5 preceding siblings ...)
  2015-10-21  3:48 ` [dpdk-dev] [PATCH v7 6/8] vhost-user: enable vhost-user multiple queue Yuanhan Liu
@ 2015-10-21  3:48 ` Yuanhan Liu
  2015-10-21  3:48 ` [dpdk-dev] [PATCH v7 8/8] doc: update release note for vhost-user mq support Yuanhan Liu
  2015-10-22 12:35 ` [dpdk-dev] [PATCH v8 0/8] vhost-user multiple queues enabling Yuanhan Liu
  8 siblings, 0 replies; 66+ messages in thread
From: Yuanhan Liu @ 2015-10-21  3:48 UTC (permalink / raw)
  To: dev; +Cc: Michael S. Tsirkin, marcel, Changchun Ouyang

From: Changchun Ouyang <changchun.ouyang@intel.com>

This message is used to enable/disable a specific vring queue pair.
The first queue pair is enabled by default.

Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>

---
v7: invoke vring_state_changed() callback once for each queue pair.

v6: add a vring state changed callback for informing the application
    that a specific vring is enabled/disabled. You could either flush
    packets that haven't been processed yet, or simply drop them.
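
A rough sketch (not part of the patch) of how an application might hook
this callback; the app_* names are hypothetical placeholders and the
handler bodies are illustrative only.

    #include <inttypes.h>
    #include <rte_log.h>
    #include <rte_virtio_net.h>

    static int
    app_new_device(struct virtio_net *dev)
    {
        (void)dev;	/* start polling this device's queues here */
        return 0;
    }

    static void
    app_destroy_device(volatile struct virtio_net *dev)
    {
        (void)dev;	/* stop polling */
    }

    static int
    app_vring_state_changed(struct virtio_net *dev, uint16_t queue_id,
                            int enable)
    {
        /* Flush or drop packets still pending for this queue pair when it
         * is disabled; resume using it when it is enabled again. */
        RTE_LOG(INFO, USER1, "(%" PRIu64 ") queue %u %s\n",
                dev->device_fh, (unsigned int)queue_id,
                enable ? "enabled" : "disabled");
        return 0;
    }

    static const struct virtio_net_device_ops app_ops = {
        .new_device          = app_new_device,
        .destroy_device      = app_destroy_device,
        .vring_state_changed = app_vring_state_changed,
    };

    /* registration: rte_vhost_driver_callback_register(&app_ops); */
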
---
 lib/librte_vhost/rte_virtio_net.h             |  9 ++++++++-
 lib/librte_vhost/vhost_rxtx.c                 | 10 ++++++++++
 lib/librte_vhost/vhost_user/vhost-net-user.c  |  5 +++++
 lib/librte_vhost/vhost_user/vhost-net-user.h  |  1 +
 lib/librte_vhost/vhost_user/virtio-net-user.c | 28 +++++++++++++++++++++++++++
 lib/librte_vhost/vhost_user/virtio-net-user.h |  3 +++
 lib/librte_vhost/virtio-net.c                 | 12 +++++++++---
 7 files changed, 64 insertions(+), 4 deletions(-)

diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h
index 5dd6493..fd87f01 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -89,6 +89,7 @@ struct vhost_virtqueue {
 	volatile uint16_t	last_used_idx_res;	/**< Used for multiple devices reserving buffers. */
 	int			callfd;			/**< Used to notify the guest (trigger interrupt). */
 	int			kickfd;			/**< Currently unused as polling mode is enabled. */
+	int			enabled;
 	struct buf_vector	buf_vec[BUF_VECTOR_MAX];	/**< for scatter RX. */
 } __rte_cache_aligned;
 
@@ -132,7 +133,7 @@ struct virtio_memory {
 };
 
 /**
- * Device operations to add/remove device.
+ * Device and vring operations.
  *
  * Make sure to set VIRTIO_DEV_RUNNING to the device flags in new_device and
  * remove it in destroy_device.
@@ -141,12 +142,18 @@ struct virtio_memory {
 struct virtio_net_device_ops {
 	int (*new_device)(struct virtio_net *);	/**< Add device. */
 	void (*destroy_device)(volatile struct virtio_net *);	/**< Remove device. */
+
+	int (*vring_state_changed)(struct virtio_net *dev, uint16_t queue_id, int enable);	/**< triggered when a vring is enabled or disabled */
 };
 
 static inline uint16_t __attribute__((always_inline))
 rte_vring_available_entries(struct virtio_net *dev, uint16_t queue_id)
 {
 	struct vhost_virtqueue *vq = dev->virtqueue[queue_id];
+
+	if (vq->enabled)
+		return 0;
+
 	return *(volatile uint16_t *)&vq->avail->idx - vq->last_used_idx_res;
 }
 
diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
index 14e00ef..400f263 100644
--- a/lib/librte_vhost/vhost_rxtx.c
+++ b/lib/librte_vhost/vhost_rxtx.c
@@ -86,6 +86,9 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
 	}
 
 	vq = dev->virtqueue[queue_id];
+	if (unlikely(vq->enabled == 0))
+		return 0;
+
 	count = (count > MAX_PKT_BURST) ? MAX_PKT_BURST : count;
 
 	/*
@@ -278,6 +281,7 @@ copy_from_mbuf_to_vring(struct virtio_net *dev, uint32_t queue_id,
 	 * (guest physical addr -> vhost virtual addr)
 	 */
 	vq = dev->virtqueue[queue_id];
+
 	vb_addr = gpa_to_vva(dev, vq->buf_vec[vec_idx].buf_addr);
 	vb_hdr_addr = vb_addr;
 
@@ -485,6 +489,9 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id,
 	}
 
 	vq = dev->virtqueue[queue_id];
+	if (unlikely(vq->enabled == 0))
+		return 0;
+
 	count = RTE_MIN((uint32_t)MAX_PKT_BURST, count);
 
 	if (count == 0)
@@ -586,6 +593,9 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id,
 	}
 
 	vq = dev->virtqueue[queue_id];
+	if (unlikely(vq->enabled == 0))
+		return 0;
+
 	avail_idx =  *((volatile uint16_t *)&vq->avail->idx);
 
 	/* If there are no available buffers then return. */
diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.c b/lib/librte_vhost/vhost_user/vhost-net-user.c
index 8675cd4..f681676 100644
--- a/lib/librte_vhost/vhost_user/vhost-net-user.c
+++ b/lib/librte_vhost/vhost_user/vhost-net-user.c
@@ -99,6 +99,7 @@ static const char *vhost_message_str[VHOST_USER_MAX] = {
 	[VHOST_USER_GET_PROTOCOL_FEATURES]  = "VHOST_USER_GET_PROTOCOL_FEATURES",
 	[VHOST_USER_SET_PROTOCOL_FEATURES]  = "VHOST_USER_SET_PROTOCOL_FEATURES",
 	[VHOST_USER_GET_QUEUE_NUM]  = "VHOST_USER_GET_QUEUE_NUM",
+	[VHOST_USER_SET_VRING_ENABLE]  = "VHOST_USER_SET_VRING_ENABLE",
 };
 
 /**
@@ -428,6 +429,10 @@ vserver_message_handler(int connfd, void *dat, int *remove)
 		send_vhost_message(connfd, &msg);
 		break;
 
+	case VHOST_USER_SET_VRING_ENABLE:
+		user_set_vring_enable(ctx, &msg.payload.state);
+		break;
+
 	default:
 		break;
 
diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.h b/lib/librte_vhost/vhost_user/vhost-net-user.h
index 389d21d..38637cc 100644
--- a/lib/librte_vhost/vhost_user/vhost-net-user.h
+++ b/lib/librte_vhost/vhost_user/vhost-net-user.h
@@ -66,6 +66,7 @@ typedef enum VhostUserRequest {
 	VHOST_USER_GET_PROTOCOL_FEATURES = 15,
 	VHOST_USER_SET_PROTOCOL_FEATURES = 16,
 	VHOST_USER_GET_QUEUE_NUM = 17,
+	VHOST_USER_SET_VRING_ENABLE = 18,
 	VHOST_USER_MAX
 } VhostUserRequest;
 
diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.c b/lib/librte_vhost/vhost_user/virtio-net-user.c
index e83d279..dfddc43 100644
--- a/lib/librte_vhost/vhost_user/virtio-net-user.c
+++ b/lib/librte_vhost/vhost_user/virtio-net-user.c
@@ -306,6 +306,34 @@ user_get_vring_base(struct vhost_device_ctx ctx,
 	return 0;
 }
 
+/*
+ * when virtio queues are ready to work, qemu will send us to
+ * enable the virtio queue pair.
+ */
+int
+user_set_vring_enable(struct vhost_device_ctx ctx,
+		      struct vhost_vring_state *state)
+{
+	struct virtio_net *dev = get_device(ctx);
+	uint16_t base_idx = state->index;
+	int enable = (int)state->num;
+
+	RTE_LOG(INFO, VHOST_CONFIG,
+		"set queue enable: %d to qp idx: %d\n",
+		enable, state->index);
+
+	if (dev->protocol_features & (1 << VHOST_USER_PROTOCOL_F_MQ) &&
+	    notify_ops->vring_state_changed) {
+		notify_ops->vring_state_changed(dev, base_idx / VIRTIO_QNUM,
+						enable);
+	}
+
+	dev->virtqueue[base_idx + VIRTIO_RXQ]->enabled = enable;
+	dev->virtqueue[base_idx + VIRTIO_TXQ]->enabled = enable;
+
+	return 0;
+}
+
 void
 user_destroy_device(struct vhost_device_ctx ctx)
 {
diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.h b/lib/librte_vhost/vhost_user/virtio-net-user.h
index 5f6d667..b82108d 100644
--- a/lib/librte_vhost/vhost_user/virtio-net-user.h
+++ b/lib/librte_vhost/vhost_user/virtio-net-user.h
@@ -52,5 +52,8 @@ void user_set_protocol_features(struct vhost_device_ctx ctx,
 
 int user_get_vring_base(struct vhost_device_ctx, struct vhost_vring_state *);
 
+int user_set_vring_enable(struct vhost_device_ctx ctx,
+			  struct vhost_vring_state *state);
+
 void user_destroy_device(struct vhost_device_ctx);
 #endif
diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index d644022..b11fd61 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -255,7 +255,7 @@ rm_config_ll_entry(struct virtio_net_config_ll *ll_dev,
 }
 
 static void
-init_vring_queue(struct vhost_virtqueue *vq)
+init_vring_queue(struct vhost_virtqueue *vq, int qp_idx)
 {
 	memset(vq, 0, sizeof(struct vhost_virtqueue));
 
@@ -264,13 +264,19 @@ init_vring_queue(struct vhost_virtqueue *vq)
 
 	/* Backends are set to -1 indicating an inactive device. */
 	vq->backend = -1;
+
+	/* always set the default vq pair to enabled */
+	if (qp_idx == 0)
+		vq->enabled = 1;
 }
 
 static void
 init_vring_queue_pair(struct virtio_net *dev, uint32_t qp_idx)
 {
-	init_vring_queue(dev->virtqueue[qp_idx * VIRTIO_QNUM + VIRTIO_RXQ]);
-	init_vring_queue(dev->virtqueue[qp_idx * VIRTIO_QNUM + VIRTIO_TXQ]);
+	uint32_t base_idx = qp_idx * VIRTIO_QNUM;
+
+	init_vring_queue(dev->virtqueue[base_idx + VIRTIO_RXQ], qp_idx);
+	init_vring_queue(dev->virtqueue[base_idx + VIRTIO_TXQ], qp_idx);
 }
 
 static int
-- 
1.9.0


* [dpdk-dev] [PATCH v7 8/8] doc: update release note for vhost-user mq support
  2015-10-21  3:48 [dpdk-dev] [PATCH v7 0/8] vhost-user multiple queues enabling Yuanhan Liu
                   ` (6 preceding siblings ...)
  2015-10-21  3:48 ` [dpdk-dev] [PATCH v7 7/8] vhost: add VHOST_USER_SET_VRING_ENABLE message Yuanhan Liu
@ 2015-10-21  3:48 ` Yuanhan Liu
  2015-10-22 12:35 ` [dpdk-dev] [PATCH v8 0/8] vhost-user multiple queues enabling Yuanhan Liu
  8 siblings, 0 replies; 66+ messages in thread
From: Yuanhan Liu @ 2015-10-21  3:48 UTC (permalink / raw)
  To: dev; +Cc: marcel, Michael S. Tsirkin

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
---
 doc/guides/rel_notes/release_2_2.rst | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/doc/guides/rel_notes/release_2_2.rst b/doc/guides/rel_notes/release_2_2.rst
index 4f75cff..612ddd9 100644
--- a/doc/guides/rel_notes/release_2_2.rst
+++ b/doc/guides/rel_notes/release_2_2.rst
@@ -9,6 +9,10 @@ New Features
   *  Added support for Jumbo Frames.
   *  Optimize forwarding performance for Chelsio T5 40GbE cards.
 
+* **vhost: added vhost-user multiple queue support.**
+
+  Added vhost-user multiple queue support.
+
 
 Resolved Issues
 ---------------
-- 
1.9.0


* Re: [dpdk-dev] [PATCH v7 4/8] vhost: rxtx: use queue id instead of constant ring index
  2015-10-21  3:48 ` [dpdk-dev] [PATCH v7 4/8] vhost: rxtx: use queue id instead of constant ring index Yuanhan Liu
@ 2015-10-21  4:43   ` Stephen Hemminger
  2015-10-21  6:54     ` Yuanhan Liu
  2015-10-21  7:16     ` Xie, Huawei
  2015-10-21 10:31   ` Michael S. Tsirkin
  2015-10-22  7:26   ` Xie, Huawei
  2 siblings, 2 replies; 66+ messages in thread
From: Stephen Hemminger @ 2015-10-21  4:43 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: dev, marcel, Changchun Ouyang, Michael S. Tsirkin

On Wed, 21 Oct 2015 11:48:10 +0800
Yuanhan Liu <yuanhan.liu@linux.intel.com> wrote:

>  
> +static inline int __attribute__((always_inline))
> +is_valid_virt_queue_idx(uint32_t virtq_idx, int is_tx, uint32_t max_qp_idx)
> +{
> +	if ((is_tx ^ (virtq_idx & 0x1)) ||
> +	    (virtq_idx >= max_qp_idx * VIRTIO_QNUM))
> +		return 0;
> +
> +	return 1;
> +}

minor nits:
 * this doesn't need to be marked as always inline;
   that is, as they say in English, "shooting a fly with a bazooka"
 * prefer to just return the logical result rather than use a conditional
 * for booleans, prefer the <stdbool.h> bool type.

static bool
is_valid_virt_queue_idx(uint32_t virtq_idx, bool is_tx, uint32_t max_qp_idx)
{
	return (is_tx ^ (virtq_idx & 1)) || 
		virtq_idx >= max_qp_idx * VIRTIO_QNUM;
}


* Re: [dpdk-dev] [PATCH v7 3/8] vhost: vring queue setup for multiple queue support
  2015-10-21  3:48 ` [dpdk-dev] [PATCH v7 3/8] vhost: vring queue setup for multiple queue support Yuanhan Liu
@ 2015-10-21  4:45   ` Stephen Hemminger
  2015-10-21  6:52     ` Yuanhan Liu
  2015-10-22  9:49   ` Xie, Huawei
  1 sibling, 1 reply; 66+ messages in thread
From: Stephen Hemminger @ 2015-10-21  4:45 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: dev, marcel, Michael S. Tsirkin

On Wed, 21 Oct 2015 11:48:09 +0800
Yuanhan Liu <yuanhan.liu@linux.intel.com> wrote:

>  struct virtio_net {
> -	struct vhost_virtqueue	*virtqueue[VIRTIO_QNUM];	/**< Contains all virtqueue information. */
> +	struct vhost_virtqueue	*virtqueue[VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX];	/**< Contains all virtqueue information. */
>  	struct virtio_memory	*mem;		/**< QEMU memory and memory region information. */

Since vhost_virtqueue takes up space, why not put it at end of array,
that way offsets are smaller and all the early fields will be in
adjacent cache lines.


* Re: [dpdk-dev] [PATCH v7 3/8] vhost: vring queue setup for multiple queue support
  2015-10-21  4:45   ` Stephen Hemminger
@ 2015-10-21  6:52     ` Yuanhan Liu
  0 siblings, 0 replies; 66+ messages in thread
From: Yuanhan Liu @ 2015-10-21  6:52 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, marcel, Michael S. Tsirkin

On Tue, Oct 20, 2015 at 09:45:12PM -0700, Stephen Hemminger wrote:
> On Wed, 21 Oct 2015 11:48:09 +0800
> Yuanhan Liu <yuanhan.liu@linux.intel.com> wrote:
> 
> >  struct virtio_net {
> > -	struct vhost_virtqueue	*virtqueue[VIRTIO_QNUM];	/**< Contains all virtqueue information. */
> > +	struct vhost_virtqueue	*virtqueue[VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX];	/**< Contains all virtqueue information. */
> >  	struct virtio_memory	*mem;		/**< QEMU memory and memory region information. */
> 
> Since vhost_virtqueue takes up space, why not put it at end of array,
> that way offsets are smaller and all the early fields will be in
> adjacent cache lines.

Makes sense, and here it is:


-- >8 --
From 382d0abb744c577f60dbc8e1cdd766615d20b930 Mon Sep 17 00:00:00 2001
From: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Date: Fri, 18 Sep 2015 16:01:10 +0800
Subject: [PATCH] vhost: vring queue setup for multiple queue support

All queue pairs, including the default (first) queue pair, are
allocated dynamically, the first time a vring_call message is
received for a specific queue pair.

This is refactoring work for enabling vhost-user multiple queue;
it should not break anything, as it makes no functional changes:
we don't support setting mq yet, so there is at most one queue pair.

This patch is based on Changchun's patch.

Signed-off-by: Ouyang Changchun <changchun.ouyang@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>

---

v8: move virtuque field to the end of `virtio_net' struct.
---
 lib/librte_vhost/rte_virtio_net.h             |   3 +-
 lib/librte_vhost/vhost_user/virtio-net-user.c |  44 ++++----
 lib/librte_vhost/virtio-net.c                 | 153 +++++++++++++++-----------
 3 files changed, 117 insertions(+), 83 deletions(-)

diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h
index e3a21e5..9a32a95 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -96,7 +96,6 @@ struct vhost_virtqueue {
  * Device structure contains all configuration information relating to the device.
  */
 struct virtio_net {
-	struct vhost_virtqueue	*virtqueue[VIRTIO_QNUM];	/**< Contains all virtqueue information. */
 	struct virtio_memory	*mem;		/**< QEMU memory and memory region information. */
 	uint64_t		features;	/**< Negotiated feature set. */
 	uint64_t		protocol_features;	/**< Negotiated protocol feature set. */
@@ -104,7 +103,9 @@ struct virtio_net {
 	uint32_t		flags;		/**< Device flags. Only used to check if device is running on data core. */
 #define IF_NAME_SZ (PATH_MAX > IFNAMSIZ ? PATH_MAX : IFNAMSIZ)
 	char			ifname[IF_NAME_SZ];	/**< Name of the tap device or socket path. */
+	uint32_t		virt_qp_nb;	/**< number of queue pair we have allocated */
 	void			*priv;		/**< private context */
+	struct vhost_virtqueue	*virtqueue[VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX];	/**< Contains all virtqueue information. */
 } __rte_cache_aligned;
 
 /**
diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.c b/lib/librte_vhost/vhost_user/virtio-net-user.c
index 360254e..e83d279 100644
--- a/lib/librte_vhost/vhost_user/virtio-net-user.c
+++ b/lib/librte_vhost/vhost_user/virtio-net-user.c
@@ -206,25 +206,33 @@ err_mmap:
 }
 
 static int
+vq_is_ready(struct vhost_virtqueue *vq)
+{
+	return vq && vq->desc   &&
+	       vq->kickfd != -1 &&
+	       vq->callfd != -1;
+}
+
+static int
 virtio_is_ready(struct virtio_net *dev)
 {
 	struct vhost_virtqueue *rvq, *tvq;
+	uint32_t i;
 
-	/* mq support in future.*/
-	rvq = dev->virtqueue[VIRTIO_RXQ];
-	tvq = dev->virtqueue[VIRTIO_TXQ];
-	if (rvq && tvq && rvq->desc && tvq->desc &&
-		(rvq->kickfd != -1) &&
-		(rvq->callfd != -1) &&
-		(tvq->kickfd != -1) &&
-		(tvq->callfd != -1)) {
-		RTE_LOG(INFO, VHOST_CONFIG,
-			"virtio is now ready for processing.\n");
-		return 1;
+	for (i = 0; i < dev->virt_qp_nb; i++) {
+		rvq = dev->virtqueue[i * VIRTIO_QNUM + VIRTIO_RXQ];
+		tvq = dev->virtqueue[i * VIRTIO_QNUM + VIRTIO_TXQ];
+
+		if (!vq_is_ready(rvq) || !vq_is_ready(tvq)) {
+			RTE_LOG(INFO, VHOST_CONFIG,
+				"virtio is not ready for processing.\n");
+			return 0;
+		}
 	}
+
 	RTE_LOG(INFO, VHOST_CONFIG,
-		"virtio isn't ready for processing.\n");
-	return 0;
+		"virtio is now ready for processing.\n");
+	return 1;
 }
 
 void
@@ -290,13 +298,9 @@ user_get_vring_base(struct vhost_device_ctx ctx,
 	 * sent and only sent in vhost_vring_stop.
 	 * TODO: cleanup the vring, it isn't usable since here.
 	 */
-	if ((dev->virtqueue[VIRTIO_RXQ]->kickfd) >= 0) {
-		close(dev->virtqueue[VIRTIO_RXQ]->kickfd);
-		dev->virtqueue[VIRTIO_RXQ]->kickfd = -1;
-	}
-	if ((dev->virtqueue[VIRTIO_TXQ]->kickfd) >= 0) {
-		close(dev->virtqueue[VIRTIO_TXQ]->kickfd);
-		dev->virtqueue[VIRTIO_TXQ]->kickfd = -1;
+	if ((dev->virtqueue[state->index]->kickfd) >= 0) {
+		close(dev->virtqueue[state->index]->kickfd);
+		dev->virtqueue[state->index]->kickfd = -1;
 	}
 
 	return 0;
diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index deac6b9..decf740 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -36,6 +36,7 @@
 #include <stddef.h>
 #include <stdint.h>
 #include <stdlib.h>
+#include <assert.h>
 #include <sys/mman.h>
 #include <unistd.h>
 #ifdef RTE_LIBRTE_VHOST_NUMA
@@ -178,6 +179,15 @@ add_config_ll_entry(struct virtio_net_config_ll *new_ll_dev)
 
 }
 
+static void
+cleanup_vq(struct vhost_virtqueue *vq)
+{
+	if (vq->callfd >= 0)
+		close(vq->callfd);
+	if (vq->kickfd >= 0)
+		close(vq->kickfd);
+}
+
 /*
  * Unmap any memory, close any file descriptors and
  * free any memory owned by a device.
@@ -185,6 +195,8 @@ add_config_ll_entry(struct virtio_net_config_ll *new_ll_dev)
 static void
 cleanup_device(struct virtio_net *dev)
 {
+	uint32_t i;
+
 	/* Unmap QEMU memory file if mapped. */
 	if (dev->mem) {
 		munmap((void *)(uintptr_t)dev->mem->mapped_address,
@@ -192,15 +204,10 @@ cleanup_device(struct virtio_net *dev)
 		free(dev->mem);
 	}
 
-	/* Close any event notifiers opened by device. */
-	if (dev->virtqueue[VIRTIO_RXQ]->callfd >= 0)
-		close(dev->virtqueue[VIRTIO_RXQ]->callfd);
-	if (dev->virtqueue[VIRTIO_RXQ]->kickfd >= 0)
-		close(dev->virtqueue[VIRTIO_RXQ]->kickfd);
-	if (dev->virtqueue[VIRTIO_TXQ]->callfd >= 0)
-		close(dev->virtqueue[VIRTIO_TXQ]->callfd);
-	if (dev->virtqueue[VIRTIO_TXQ]->kickfd >= 0)
-		close(dev->virtqueue[VIRTIO_TXQ]->kickfd);
+	for (i = 0; i < dev->virt_qp_nb; i++) {
+		cleanup_vq(dev->virtqueue[i * VIRTIO_QNUM + VIRTIO_RXQ]);
+		cleanup_vq(dev->virtqueue[i * VIRTIO_QNUM + VIRTIO_TXQ]);
+	}
 }
 
 /*
@@ -209,9 +216,11 @@ cleanup_device(struct virtio_net *dev)
 static void
 free_device(struct virtio_net_config_ll *ll_dev)
 {
-	/* Free any malloc'd memory */
-	rte_free(ll_dev->dev.virtqueue[VIRTIO_RXQ]);
-	rte_free(ll_dev->dev.virtqueue[VIRTIO_TXQ]);
+	uint32_t i;
+
+	for (i = 0; i < ll_dev->dev.virt_qp_nb; i++)
+		rte_free(ll_dev->dev.virtqueue[i * VIRTIO_QNUM]);
+
 	rte_free(ll_dev);
 }
 
@@ -244,34 +253,68 @@ rm_config_ll_entry(struct virtio_net_config_ll *ll_dev,
 	}
 }
 
+static void
+init_vring_queue(struct vhost_virtqueue *vq)
+{
+	memset(vq, 0, sizeof(struct vhost_virtqueue));
+
+	vq->kickfd = -1;
+	vq->callfd = -1;
+
+	/* Backends are set to -1 indicating an inactive device. */
+	vq->backend = -1;
+}
+
+static void
+init_vring_queue_pair(struct virtio_net *dev, uint32_t qp_idx)
+{
+	init_vring_queue(dev->virtqueue[qp_idx * VIRTIO_QNUM + VIRTIO_RXQ]);
+	init_vring_queue(dev->virtqueue[qp_idx * VIRTIO_QNUM + VIRTIO_TXQ]);
+}
+
+static int
+alloc_vring_queue_pair(struct virtio_net *dev, uint32_t qp_idx)
+{
+	struct vhost_virtqueue *virtqueue = NULL;
+	uint32_t virt_rx_q_idx = qp_idx * VIRTIO_QNUM + VIRTIO_RXQ;
+	uint32_t virt_tx_q_idx = qp_idx * VIRTIO_QNUM + VIRTIO_TXQ;
+
+	virtqueue = rte_malloc(NULL,
+			       sizeof(struct vhost_virtqueue) * VIRTIO_QNUM, 0);
+	if (virtqueue == NULL) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"Failed to allocate memory for virt qp:%d.\n", qp_idx);
+		return -1;
+	}
+
+	dev->virtqueue[virt_rx_q_idx] = virtqueue;
+	dev->virtqueue[virt_tx_q_idx] = virtqueue + VIRTIO_TXQ;
+
+	init_vring_queue_pair(dev, qp_idx);
+
+	dev->virt_qp_nb += 1;
+
+	return 0;
+}
+
 /*
  *  Initialise all variables in device structure.
  */
 static void
 init_device(struct virtio_net *dev)
 {
-	uint64_t vq_offset;
+	int vq_offset;
+	uint32_t i;
 
 	/*
 	 * Virtqueues have already been malloced so
 	 * we don't want to set them to NULL.
 	 */
-	vq_offset = offsetof(struct virtio_net, mem);
+	vq_offset = offsetof(struct virtio_net, virtqueue);
+	memset(dev, 0, vq_offset);
 
-	/* Set everything to 0. */
-	memset((void *)(uintptr_t)((uint64_t)(uintptr_t)dev + vq_offset), 0,
-		(sizeof(struct virtio_net) - (size_t)vq_offset));
-	memset(dev->virtqueue[VIRTIO_RXQ], 0, sizeof(struct vhost_virtqueue));
-	memset(dev->virtqueue[VIRTIO_TXQ], 0, sizeof(struct vhost_virtqueue));
-
-	dev->virtqueue[VIRTIO_RXQ]->kickfd = -1;
-	dev->virtqueue[VIRTIO_RXQ]->callfd = -1;
-	dev->virtqueue[VIRTIO_TXQ]->kickfd = -1;
-	dev->virtqueue[VIRTIO_TXQ]->callfd = -1;
-
-	/* Backends are set to -1 indicating an inactive device. */
-	dev->virtqueue[VIRTIO_RXQ]->backend = VIRTIO_DEV_STOPPED;
-	dev->virtqueue[VIRTIO_TXQ]->backend = VIRTIO_DEV_STOPPED;
+	for (i = 0; i < dev->virt_qp_nb; i++)
+		init_vring_queue_pair(dev, i);
 }
 
 /*
@@ -283,7 +326,6 @@ static int
 new_device(struct vhost_device_ctx ctx)
 {
 	struct virtio_net_config_ll *new_ll_dev;
-	struct vhost_virtqueue *virtqueue_rx, *virtqueue_tx;
 
 	/* Setup device and virtqueues. */
 	new_ll_dev = rte_malloc(NULL, sizeof(struct virtio_net_config_ll), 0);
@@ -294,28 +336,6 @@ new_device(struct vhost_device_ctx ctx)
 		return -1;
 	}
 
-	virtqueue_rx = rte_malloc(NULL, sizeof(struct vhost_virtqueue), 0);
-	if (virtqueue_rx == NULL) {
-		rte_free(new_ll_dev);
-		RTE_LOG(ERR, VHOST_CONFIG,
-			"(%"PRIu64") Failed to allocate memory for rxq.\n",
-			ctx.fh);
-		return -1;
-	}
-
-	virtqueue_tx = rte_malloc(NULL, sizeof(struct vhost_virtqueue), 0);
-	if (virtqueue_tx == NULL) {
-		rte_free(virtqueue_rx);
-		rte_free(new_ll_dev);
-		RTE_LOG(ERR, VHOST_CONFIG,
-			"(%"PRIu64") Failed to allocate memory for txq.\n",
-			ctx.fh);
-		return -1;
-	}
-
-	new_ll_dev->dev.virtqueue[VIRTIO_RXQ] = virtqueue_rx;
-	new_ll_dev->dev.virtqueue[VIRTIO_TXQ] = virtqueue_tx;
-
 	/* Initialise device and virtqueues. */
 	init_device(&new_ll_dev->dev);
 
@@ -437,6 +457,8 @@ static int
 set_features(struct vhost_device_ctx ctx, uint64_t *pu)
 {
 	struct virtio_net *dev;
+	uint16_t vhost_hlen;
+	uint16_t i;
 
 	dev = get_device(ctx);
 	if (dev == NULL)
@@ -444,27 +466,26 @@ set_features(struct vhost_device_ctx ctx, uint64_t *pu)
 	if (*pu & ~VHOST_FEATURES)
 		return -1;
 
-	/* Store the negotiated feature list for the device. */
 	dev->features = *pu;
-
-	/* Set the vhost_hlen depending on if VIRTIO_NET_F_MRG_RXBUF is set. */
 	if (dev->features & (1 << VIRTIO_NET_F_MRG_RXBUF)) {
 		LOG_DEBUG(VHOST_CONFIG,
 			"(%"PRIu64") Mergeable RX buffers enabled\n",
 			dev->device_fh);
-		dev->virtqueue[VIRTIO_RXQ]->vhost_hlen =
-			sizeof(struct virtio_net_hdr_mrg_rxbuf);
-		dev->virtqueue[VIRTIO_TXQ]->vhost_hlen =
-			sizeof(struct virtio_net_hdr_mrg_rxbuf);
+		vhost_hlen = sizeof(struct virtio_net_hdr_mrg_rxbuf);
 	} else {
 		LOG_DEBUG(VHOST_CONFIG,
 			"(%"PRIu64") Mergeable RX buffers disabled\n",
 			dev->device_fh);
-		dev->virtqueue[VIRTIO_RXQ]->vhost_hlen =
-			sizeof(struct virtio_net_hdr);
-		dev->virtqueue[VIRTIO_TXQ]->vhost_hlen =
-			sizeof(struct virtio_net_hdr);
+		vhost_hlen = sizeof(struct virtio_net_hdr);
+	}
+
+	for (i = 0; i < dev->virt_qp_nb; i++) {
+		uint16_t base_idx = i * VIRTIO_QNUM;
+
+		dev->virtqueue[base_idx + VIRTIO_RXQ]->vhost_hlen = vhost_hlen;
+		dev->virtqueue[base_idx + VIRTIO_TXQ]->vhost_hlen = vhost_hlen;
 	}
+
 	return 0;
 }
 
@@ -680,13 +701,21 @@ set_vring_call(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
 {
 	struct virtio_net *dev;
 	struct vhost_virtqueue *vq;
+	uint32_t cur_qp_idx = file->index / VIRTIO_QNUM;
 
 	dev = get_device(ctx);
 	if (dev == NULL)
 		return -1;
 
+	/* alloc vring queue pair if it is a new queue pair */
+	if (cur_qp_idx + 1 > dev->virt_qp_nb) {
+		if (alloc_vring_queue_pair(dev, cur_qp_idx) < 0)
+			return -1;
+	}
+
 	/* file->index refers to the queue index. The txq is 1, rxq is 0. */
 	vq = dev->virtqueue[file->index];
+	assert(vq != NULL);
 
 	if (vq->callfd >= 0)
 		close(vq->callfd);
-- 
1.9.0
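
A side note on the allocation above, purely as an illustration and not part
of the patch: both rings of a pair live in one rte_malloc() block, which is
why free_device() only has to free the RX pointer of each pair. Roughly:

	/* after alloc_vring_queue_pair(dev, qp_idx), assuming
	 * VIRTIO_QNUM == 2, VIRTIO_RXQ == 0 and VIRTIO_TXQ == 1 */
	struct vhost_virtqueue *pair = dev->virtqueue[qp_idx * VIRTIO_QNUM];

	/* the TX ring is the second element of the same allocation */
	assert(dev->virtqueue[qp_idx * VIRTIO_QNUM + VIRTIO_TXQ] == pair + 1);

	rte_free(pair);		/* releases both RX and TX rings of the pair */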

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [dpdk-dev] [PATCH v7 4/8] vhost: rxtx: use queue id instead of constant ring index
  2015-10-21  4:43   ` Stephen Hemminger
@ 2015-10-21  6:54     ` Yuanhan Liu
  2015-10-21  7:16     ` Xie, Huawei
  1 sibling, 0 replies; 66+ messages in thread
From: Yuanhan Liu @ 2015-10-21  6:54 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, marcel, Michael S. Tsirkin

On Tue, Oct 20, 2015 at 09:43:54PM -0700, Stephen Hemminger wrote:
> On Wed, 21 Oct 2015 11:48:10 +0800
> Yuanhan Liu <yuanhan.liu@linux.intel.com> wrote:
> 
> >  
> > +static inline int __attribute__((always_inline))
> > +is_valid_virt_queue_idx(uint32_t virtq_idx, int is_tx, uint32_t max_qp_idx)
> > +{
> > +	if ((is_tx ^ (virtq_idx & 0x1)) ||
> > +	    (virtq_idx >= max_qp_idx * VIRTIO_QNUM))
> > +		return 0;
> > +
> > +	return 1;
> > +}
> 
> minor nits:
>  * this doesn't need to be marked as always inline, 
>     that is as they say in English "shooting a fly with a bazooka"
>  * prefer to just return logical result rather than have conditional:
>  * for booleans prefer the <stdbool.h> type boolean.
> 
> static bool
> is_valid_virt_queue_idx(uint32_t virtq_idx, bool is_tx, uint32_t max_qp_idx)
> {
> 	return (is_tx ^ (virtq_idx & 1)) || 
> 		virtq_idx >= max_qp_idx * VIRTIO_QNUM;
> }

Thanks for the review, and here you go:


-- >8 --
>From 0be45f37c63c86e93b9caf751ceaeec0a4e66fa5 Mon Sep 17 00:00:00 2001
From: Changchun Ouyang <changchun.ouyang@intel.com>
Date: Wed, 16 Sep 2015 15:40:32 +0800
Subject: [PATCH] vhost: rxtx: use queue id instead of constant ring index

Do not use VIRTIO_RXQ or VIRTIO_TXQ anymore; use the queue_id
instead, which will be set to a proper value for a specific queue
when we have multiple queue support enabled.

For now, queue_id is still set with VIRTIO_RXQ or VIRTIO_TXQ,
so it should not break anything.

Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>

---

v8: simplify is_valid_virt_queue_idx()

v7: commit title fix
---
 lib/librte_vhost/vhost_rxtx.c | 43 +++++++++++++++++++++++++++++--------------
 1 file changed, 29 insertions(+), 14 deletions(-)

diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
index 7026bfa..1ec8850 100644
--- a/lib/librte_vhost/vhost_rxtx.c
+++ b/lib/librte_vhost/vhost_rxtx.c
@@ -32,6 +32,7 @@
  */
 
 #include <stdint.h>
+#include <stdbool.h>
 #include <linux/virtio_net.h>
 
 #include <rte_mbuf.h>
@@ -42,6 +43,12 @@
 
 #define MAX_PKT_BURST 32
 
+static bool
+is_valid_virt_queue_idx(uint32_t idx, int is_tx, uint32_t qp_nb)
+{
+	return (is_tx ^ (idx & 1)) == 0 && idx < qp_nb * VIRTIO_QNUM;
+}
+
 /**
  * This function adds buffers to the virtio devices RX virtqueue. Buffers can
  * be received from the physical port or from another virtio device. A packet
@@ -68,12 +75,14 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
 	uint8_t success = 0;
 
 	LOG_DEBUG(VHOST_DATA, "(%"PRIu64") virtio_dev_rx()\n", dev->device_fh);
-	if (unlikely(queue_id != VIRTIO_RXQ)) {
-		LOG_DEBUG(VHOST_DATA, "mq isn't supported in this version.\n");
+	if (unlikely(!is_valid_virt_queue_idx(queue_id, 0, dev->virt_qp_nb))) {
+		RTE_LOG(ERR, VHOST_DATA,
+			"%s (%"PRIu64"): virtqueue idx:%d invalid.\n",
+			__func__, dev->device_fh, queue_id);
 		return 0;
 	}
 
-	vq = dev->virtqueue[VIRTIO_RXQ];
+	vq = dev->virtqueue[queue_id];
 	count = (count > MAX_PKT_BURST) ? MAX_PKT_BURST : count;
 
 	/*
@@ -235,8 +244,9 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
 }
 
 static inline uint32_t __attribute__((always_inline))
-copy_from_mbuf_to_vring(struct virtio_net *dev, uint16_t res_base_idx,
-	uint16_t res_end_idx, struct rte_mbuf *pkt)
+copy_from_mbuf_to_vring(struct virtio_net *dev, uint32_t queue_id,
+			uint16_t res_base_idx, uint16_t res_end_idx,
+			struct rte_mbuf *pkt)
 {
 	uint32_t vec_idx = 0;
 	uint32_t entry_success = 0;
@@ -264,7 +274,7 @@ copy_from_mbuf_to_vring(struct virtio_net *dev, uint16_t res_base_idx,
 	 * Convert from gpa to vva
 	 * (guest physical addr -> vhost virtual addr)
 	 */
-	vq = dev->virtqueue[VIRTIO_RXQ];
+	vq = dev->virtqueue[queue_id];
 	vb_addr = gpa_to_vva(dev, vq->buf_vec[vec_idx].buf_addr);
 	vb_hdr_addr = vb_addr;
 
@@ -464,11 +474,14 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id,
 
 	LOG_DEBUG(VHOST_DATA, "(%"PRIu64") virtio_dev_merge_rx()\n",
 		dev->device_fh);
-	if (unlikely(queue_id != VIRTIO_RXQ)) {
-		LOG_DEBUG(VHOST_DATA, "mq isn't supported in this version.\n");
+	if (unlikely(!is_valid_virt_queue_idx(queue_id, 0, dev->virt_qp_nb))) {
+		RTE_LOG(ERR, VHOST_DATA,
+			"%s (%"PRIu64"): virtqueue idx:%d invalid.\n",
+			__func__, dev->device_fh, queue_id);
+		return 0;
 	}
 
-	vq = dev->virtqueue[VIRTIO_RXQ];
+	vq = dev->virtqueue[queue_id];
 	count = RTE_MIN((uint32_t)MAX_PKT_BURST, count);
 
 	if (count == 0)
@@ -509,8 +522,8 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id,
 							res_cur_idx);
 		} while (success == 0);
 
-		entry_success = copy_from_mbuf_to_vring(dev, res_base_idx,
-			res_cur_idx, pkts[pkt_idx]);
+		entry_success = copy_from_mbuf_to_vring(dev, queue_id,
+			res_base_idx, res_cur_idx, pkts[pkt_idx]);
 
 		rte_compiler_barrier();
 
@@ -562,12 +575,14 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id,
 	uint16_t free_entries, entry_success = 0;
 	uint16_t avail_idx;
 
-	if (unlikely(queue_id != VIRTIO_TXQ)) {
-		LOG_DEBUG(VHOST_DATA, "mq isn't supported in this version.\n");
+	if (unlikely(!is_valid_virt_queue_idx(queue_id, 1, dev->virt_qp_nb))) {
+		RTE_LOG(ERR, VHOST_DATA,
+			"%s (%"PRIu64"): virtqueue idx:%d invalid.\n",
+			__func__, dev->device_fh, queue_id);
 		return 0;
 	}
 
-	vq = dev->virtqueue[VIRTIO_TXQ];
+	vq = dev->virtqueue[queue_id];
 	avail_idx =  *((volatile uint16_t *)&vq->avail->idx);
 
 	/* If there are no available buffers then return. */
-- 
1.9.0
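
To make the index arithmetic concrete (an illustration only, not part of
the patch): with VIRTIO_QNUM == 2, queue pair n uses ring 2*n for RX and
ring 2*n + 1 for TX, so the parity test plus the range test in
is_valid_virt_queue_idx() is all that's needed:

	/* assuming VIRTIO_RXQ == 0 and VIRTIO_TXQ == 1, two queue pairs */
	assert(is_valid_virt_queue_idx(0, 0, 2));	/* qp 0, RX ring */
	assert(is_valid_virt_queue_idx(3, 1, 2));	/* qp 1, TX ring */
	assert(!is_valid_virt_queue_idx(2, 1, 2));	/* ring 2 is an RX ring */
	assert(!is_valid_virt_queue_idx(4, 0, 2));	/* only 2 pairs => idx < 4 */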

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [dpdk-dev] [PATCH v7 4/8] vhost: rxtx: use queue id instead of constant ring index
  2015-10-21  4:43   ` Stephen Hemminger
  2015-10-21  6:54     ` Yuanhan Liu
@ 2015-10-21  7:16     ` Xie, Huawei
  2015-10-21  9:38       ` Ananyev, Konstantin
  1 sibling, 1 reply; 66+ messages in thread
From: Xie, Huawei @ 2015-10-21  7:16 UTC (permalink / raw)
  To: Stephen Hemminger, Yuanhan Liu
  Cc: dev, marcel, Michael S. Tsirkin, Changchun Ouyang

On 10/21/2015 12:44 PM, Stephen Hemminger wrote:
> On Wed, 21 Oct 2015 11:48:10 +0800
> Yuanhan Liu <yuanhan.liu@linux.intel.com> wrote:
>
>>  
>> +static inline int __attribute__((always_inline))
>> +is_valid_virt_queue_idx(uint32_t virtq_idx, int is_tx, uint32_t max_qp_idx)
>> +{
>> +	if ((is_tx ^ (virtq_idx & 0x1)) ||
>> +	    (virtq_idx >= max_qp_idx * VIRTIO_QNUM))
>> +		return 0;
>> +
>> +	return 1;
>> +}
> minor nits:
>  * this doesn't need to be marked as always inline, 
>     that is as they say in English "shooting a fly with a bazooka"
Stephen:
always_inline "forces" the compiler to inline this function, like a macro.
When should it be used or is it not preferred at all?

>  * prefer to just return logical result rather than have conditional:
>  * for booleans prefer the <stdbool.h> type boolean.
>
> static bool
> is_valid_virt_queue_idx(uint32_t virtq_idx, bool is_tx, uint32_t max_qp_idx)
> {
> 	return (is_tx ^ (virtq_idx & 1)) || 
> 		virtq_idx >= max_qp_idx * VIRTIO_QNUM;
> }
>


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [dpdk-dev] [PATCH v7 4/8] vhost: rxtx: use queue id instead of constant ring index
  2015-10-21  7:16     ` Xie, Huawei
@ 2015-10-21  9:38       ` Ananyev, Konstantin
  2015-10-21 15:47         ` Stephen Hemminger
  0 siblings, 1 reply; 66+ messages in thread
From: Ananyev, Konstantin @ 2015-10-21  9:38 UTC (permalink / raw)
  To: Xie, Huawei, Stephen Hemminger, Yuanhan Liu
  Cc: dev, marcel, Changchun Ouyang, Michael S. Tsirkin



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Xie, Huawei
> Sent: Wednesday, October 21, 2015 8:16 AM
> To: Stephen Hemminger; Yuanhan Liu
> Cc: dev@dpdk.org; marcel@redhat.com; Michael S. Tsirkin; Changchun Ouyang
> Subject: Re: [dpdk-dev] [PATCH v7 4/8] vhost: rxtx: use queue id instead of constant ring index
> 
> On 10/21/2015 12:44 PM, Stephen Hemminger wrote:
> > On Wed, 21 Oct 2015 11:48:10 +0800
> > Yuanhan Liu <yuanhan.liu@linux.intel.com> wrote:
> >
> >>
> >> +static inline int __attribute__((always_inline))
> >> +is_valid_virt_queue_idx(uint32_t virtq_idx, int is_tx, uint32_t max_qp_idx)
> >> +{
> >> +	if ((is_tx ^ (virtq_idx & 0x1)) ||
> >> +	    (virtq_idx >= max_qp_idx * VIRTIO_QNUM))
> >> +		return 0;
> >> +
> >> +	return 1;
> >> +}
> > minor nits:
> >  * this doesn't need to be marked as always inline,
> >     that is as they say in English "shooting a fly with a bazooka"
> Stephen:
> always_inline "forces" the compiler to inline this function, like a macro.
> When should it be used or is it not preferred at all?

I also don't understand what's wrong with using 'always_inline' here.
As I understand the author wants compiler to *always inline* that function.
So seems perfectly ok to use it here.
As I remember just 'inline' is sort of recommendation that compiler is free to ignore.
Konstantin 

> 
> >  * prefer to just return logical result rather than have conditional:
> >  * for booleans prefer the <stdbool.h> type boolean.
> >
> > static bool
> > is_valid_virt_queue_idx(uint32_t virtq_idx, bool is_tx, uint32_t max_qp_idx)
> > {
> > 	return (is_tx ^ (virtq_idx & 1)) ||
> > 		virtq_idx >= max_qp_idx * VIRTIO_QNUM;
> > }
> >

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [dpdk-dev] [PATCH v7 4/8] vhost: rxtx: use queue id instead of constant ring index
  2015-10-21  3:48 ` [dpdk-dev] [PATCH v7 4/8] vhost: rxtx: use queue id instead of constant ring index Yuanhan Liu
  2015-10-21  4:43   ` Stephen Hemminger
@ 2015-10-21 10:31   ` Michael S. Tsirkin
  2015-10-21 12:48     ` Yuanhan Liu
  2015-10-22  7:26   ` Xie, Huawei
  2 siblings, 1 reply; 66+ messages in thread
From: Michael S. Tsirkin @ 2015-10-21 10:31 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: dev, marcel, Changchun Ouyang

On Wed, Oct 21, 2015 at 11:48:10AM +0800, Yuanhan Liu wrote:
> From: Changchun Ouyang <changchun.ouyang@intel.com>
> 
> Do not use VIRTIO_RXQ or VIRTIO_TXQ anymore; use the queue_id
> instead, which will be set to a proper value for a specific queue
> when we have multiple queue support enabled.
> 
> For now, queue_id is still set with VIRTIO_RXQ or VIRTIO_TXQ,
> so it should not break anything.
> 
> Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
> Acked-by: Flavio Leitner <fbl@sysclose.org>

I tried to figure out how queue_id is set and I couldn't.
Please note that for virtio devices, guest is supposed to
control the placement of incoming packets in RX queues.


> ---
> 
> v7: commit title fix
> ---
>  lib/librte_vhost/vhost_rxtx.c | 46 ++++++++++++++++++++++++++++++-------------
>  1 file changed, 32 insertions(+), 14 deletions(-)
> 
> diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
> index 7026bfa..14e00ef 100644
> --- a/lib/librte_vhost/vhost_rxtx.c
> +++ b/lib/librte_vhost/vhost_rxtx.c
> @@ -42,6 +42,16 @@
>  
>  #define MAX_PKT_BURST 32
>  
> +static inline int __attribute__((always_inline))
> +is_valid_virt_queue_idx(uint32_t virtq_idx, int is_tx, uint32_t max_qp_idx)
> +{
> +	if ((is_tx ^ (virtq_idx & 0x1)) ||
> +	    (virtq_idx >= max_qp_idx * VIRTIO_QNUM))
> +		return 0;
> +
> +	return 1;
> +}
> +
>  /**
>   * This function adds buffers to the virtio devices RX virtqueue. Buffers can
>   * be received from the physical port or from another virtio device. A packet
> @@ -68,12 +78,14 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
>  	uint8_t success = 0;
>  
>  	LOG_DEBUG(VHOST_DATA, "(%"PRIu64") virtio_dev_rx()\n", dev->device_fh);
> -	if (unlikely(queue_id != VIRTIO_RXQ)) {
> -		LOG_DEBUG(VHOST_DATA, "mq isn't supported in this version.\n");
> +	if (unlikely(!is_valid_virt_queue_idx(queue_id, 0, dev->virt_qp_nb))) {
> +		RTE_LOG(ERR, VHOST_DATA,
> +			"%s (%"PRIu64"): virtqueue idx:%d invalid.\n",
> +			__func__, dev->device_fh, queue_id);
>  		return 0;
>  	}
>  
> -	vq = dev->virtqueue[VIRTIO_RXQ];
> +	vq = dev->virtqueue[queue_id];
>  	count = (count > MAX_PKT_BURST) ? MAX_PKT_BURST : count;
>  
>  	/*
> @@ -235,8 +247,9 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
>  }
>  
>  static inline uint32_t __attribute__((always_inline))
> -copy_from_mbuf_to_vring(struct virtio_net *dev, uint16_t res_base_idx,
> -	uint16_t res_end_idx, struct rte_mbuf *pkt)
> +copy_from_mbuf_to_vring(struct virtio_net *dev, uint32_t queue_id,
> +			uint16_t res_base_idx, uint16_t res_end_idx,
> +			struct rte_mbuf *pkt)
>  {
>  	uint32_t vec_idx = 0;
>  	uint32_t entry_success = 0;
> @@ -264,7 +277,7 @@ copy_from_mbuf_to_vring(struct virtio_net *dev, uint16_t res_base_idx,
>  	 * Convert from gpa to vva
>  	 * (guest physical addr -> vhost virtual addr)
>  	 */
> -	vq = dev->virtqueue[VIRTIO_RXQ];
> +	vq = dev->virtqueue[queue_id];
>  	vb_addr = gpa_to_vva(dev, vq->buf_vec[vec_idx].buf_addr);
>  	vb_hdr_addr = vb_addr;
>  
> @@ -464,11 +477,14 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id,
>  
>  	LOG_DEBUG(VHOST_DATA, "(%"PRIu64") virtio_dev_merge_rx()\n",
>  		dev->device_fh);
> -	if (unlikely(queue_id != VIRTIO_RXQ)) {
> -		LOG_DEBUG(VHOST_DATA, "mq isn't supported in this version.\n");
> +	if (unlikely(!is_valid_virt_queue_idx(queue_id, 0, dev->virt_qp_nb))) {
> +		RTE_LOG(ERR, VHOST_DATA,
> +			"%s (%"PRIu64"): virtqueue idx:%d invalid.\n",
> +			__func__, dev->device_fh, queue_id);
> +		return 0;
>  	}
>  
> -	vq = dev->virtqueue[VIRTIO_RXQ];
> +	vq = dev->virtqueue[queue_id];
>  	count = RTE_MIN((uint32_t)MAX_PKT_BURST, count);
>  
>  	if (count == 0)
> @@ -509,8 +525,8 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id,
>  							res_cur_idx);
>  		} while (success == 0);
>  
> -		entry_success = copy_from_mbuf_to_vring(dev, res_base_idx,
> -			res_cur_idx, pkts[pkt_idx]);
> +		entry_success = copy_from_mbuf_to_vring(dev, queue_id,
> +			res_base_idx, res_cur_idx, pkts[pkt_idx]);
>  
>  		rte_compiler_barrier();
>  
> @@ -562,12 +578,14 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id,
>  	uint16_t free_entries, entry_success = 0;
>  	uint16_t avail_idx;
>  
> -	if (unlikely(queue_id != VIRTIO_TXQ)) {
> -		LOG_DEBUG(VHOST_DATA, "mq isn't supported in this version.\n");
> +	if (unlikely(!is_valid_virt_queue_idx(queue_id, 1, dev->virt_qp_nb))) {
> +		RTE_LOG(ERR, VHOST_DATA,
> +			"%s (%"PRIu64"): virtqueue idx:%d invalid.\n",
> +			__func__, dev->device_fh, queue_id);
>  		return 0;
>  	}
>  
> -	vq = dev->virtqueue[VIRTIO_TXQ];
> +	vq = dev->virtqueue[queue_id];
>  	avail_idx =  *((volatile uint16_t *)&vq->avail->idx);
>  
>  	/* If there are no available buffers then return. */
> -- 
> 1.9.0

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [dpdk-dev] [PATCH v7 4/8] vhost: rxtx: use queue id instead of constant ring index
  2015-10-21 10:31   ` Michael S. Tsirkin
@ 2015-10-21 12:48     ` Yuanhan Liu
  2015-10-21 14:26       ` Michael S. Tsirkin
  0 siblings, 1 reply; 66+ messages in thread
From: Yuanhan Liu @ 2015-10-21 12:48 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: dev, marcel

On Wed, Oct 21, 2015 at 01:31:55PM +0300, Michael S. Tsirkin wrote:
> On Wed, Oct 21, 2015 at 11:48:10AM +0800, Yuanhan Liu wrote:
> > From: Changchun Ouyang <changchun.ouyang@intel.com>
> > 
> > Do not use VIRTIO_RXQ or VIRTIO_TXQ anymore; use the queue_id
> > instead, which will be set to a proper value for a specific queue
> > when we have multiple queue support enabled.
> > 
> > For now, queue_id is still set with VIRTIO_RXQ or VIRTIO_TXQ,
> > so it should not break anything.
> > 
> > Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
> > Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
> > Acked-by: Flavio Leitner <fbl@sysclose.org>
> 
> I tried to figure out how is queue_id set and I couldn't.

queue_id is set outside the DPDK library; it's up to the application
to select a queue. There was a demo (examples/vhost/vhost-switch)
before, but it was removed (check the cover letter for the reason).
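
For example, an application sitting on top of the vhost lib could do
something like below (just a sketch; pick_queue_for_flow() and the mbuf
variables are application-defined, and the burst calls assume the
existing rte_vhost_enqueue_burst()/rte_vhost_dequeue_burst() prototypes):

	uint16_t qp_idx = pick_queue_for_flow(m);		/* app's own policy */
	uint16_t rxq = qp_idx * VIRTIO_QNUM + VIRTIO_RXQ;	/* guest RX ring */
	uint16_t txq = qp_idx * VIRTIO_QNUM + VIRTIO_TXQ;	/* guest TX ring */

	/* host -> guest: enqueue to the guest RX ring of that pair */
	rte_vhost_enqueue_burst(dev, rxq, pkts, nb_pkts);

	/* guest -> host: dequeue from the guest TX ring of that pair */
	rte_vhost_dequeue_burst(dev, txq, mbuf_pool, pkts, MAX_PKT_BURST);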

> Please note that for virtio devices, guest is supposed to
> control the placement of incoming packets in RX queues.

I may not follow you.

Enqueuing packets to a RX queue is done at vhost lib, outside the
guest, how could the guest take the control here?

	--yliu

> > ---
> > 
> > v7: commit title fix
> > ---
> >  lib/librte_vhost/vhost_rxtx.c | 46 ++++++++++++++++++++++++++++++-------------
> >  1 file changed, 32 insertions(+), 14 deletions(-)
> > 
> > diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
> > index 7026bfa..14e00ef 100644
> > --- a/lib/librte_vhost/vhost_rxtx.c
> > +++ b/lib/librte_vhost/vhost_rxtx.c
> > @@ -42,6 +42,16 @@
> >  
> >  #define MAX_PKT_BURST 32
> >  
> > +static inline int __attribute__((always_inline))
> > +is_valid_virt_queue_idx(uint32_t virtq_idx, int is_tx, uint32_t max_qp_idx)
> > +{
> > +	if ((is_tx ^ (virtq_idx & 0x1)) ||
> > +	    (virtq_idx >= max_qp_idx * VIRTIO_QNUM))
> > +		return 0;
> > +
> > +	return 1;
> > +}
> > +
> >  /**
> >   * This function adds buffers to the virtio devices RX virtqueue. Buffers can
> >   * be received from the physical port or from another virtio device. A packet
> > @@ -68,12 +78,14 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
> >  	uint8_t success = 0;
> >  
> >  	LOG_DEBUG(VHOST_DATA, "(%"PRIu64") virtio_dev_rx()\n", dev->device_fh);
> > -	if (unlikely(queue_id != VIRTIO_RXQ)) {
> > -		LOG_DEBUG(VHOST_DATA, "mq isn't supported in this version.\n");
> > +	if (unlikely(!is_valid_virt_queue_idx(queue_id, 0, dev->virt_qp_nb))) {
> > +		RTE_LOG(ERR, VHOST_DATA,
> > +			"%s (%"PRIu64"): virtqueue idx:%d invalid.\n",
> > +			__func__, dev->device_fh, queue_id);
> >  		return 0;
> >  	}
> >  
> > -	vq = dev->virtqueue[VIRTIO_RXQ];
> > +	vq = dev->virtqueue[queue_id];
> >  	count = (count > MAX_PKT_BURST) ? MAX_PKT_BURST : count;
> >  
> >  	/*
> > @@ -235,8 +247,9 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
> >  }
> >  
> >  static inline uint32_t __attribute__((always_inline))
> > -copy_from_mbuf_to_vring(struct virtio_net *dev, uint16_t res_base_idx,
> > -	uint16_t res_end_idx, struct rte_mbuf *pkt)
> > +copy_from_mbuf_to_vring(struct virtio_net *dev, uint32_t queue_id,
> > +			uint16_t res_base_idx, uint16_t res_end_idx,
> > +			struct rte_mbuf *pkt)
> >  {
> >  	uint32_t vec_idx = 0;
> >  	uint32_t entry_success = 0;
> > @@ -264,7 +277,7 @@ copy_from_mbuf_to_vring(struct virtio_net *dev, uint16_t res_base_idx,
> >  	 * Convert from gpa to vva
> >  	 * (guest physical addr -> vhost virtual addr)
> >  	 */
> > -	vq = dev->virtqueue[VIRTIO_RXQ];
> > +	vq = dev->virtqueue[queue_id];
> >  	vb_addr = gpa_to_vva(dev, vq->buf_vec[vec_idx].buf_addr);
> >  	vb_hdr_addr = vb_addr;
> >  
> > @@ -464,11 +477,14 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id,
> >  
> >  	LOG_DEBUG(VHOST_DATA, "(%"PRIu64") virtio_dev_merge_rx()\n",
> >  		dev->device_fh);
> > -	if (unlikely(queue_id != VIRTIO_RXQ)) {
> > -		LOG_DEBUG(VHOST_DATA, "mq isn't supported in this version.\n");
> > +	if (unlikely(!is_valid_virt_queue_idx(queue_id, 0, dev->virt_qp_nb))) {
> > +		RTE_LOG(ERR, VHOST_DATA,
> > +			"%s (%"PRIu64"): virtqueue idx:%d invalid.\n",
> > +			__func__, dev->device_fh, queue_id);
> > +		return 0;
> >  	}
> >  
> > -	vq = dev->virtqueue[VIRTIO_RXQ];
> > +	vq = dev->virtqueue[queue_id];
> >  	count = RTE_MIN((uint32_t)MAX_PKT_BURST, count);
> >  
> >  	if (count == 0)
> > @@ -509,8 +525,8 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id,
> >  							res_cur_idx);
> >  		} while (success == 0);
> >  
> > -		entry_success = copy_from_mbuf_to_vring(dev, res_base_idx,
> > -			res_cur_idx, pkts[pkt_idx]);
> > +		entry_success = copy_from_mbuf_to_vring(dev, queue_id,
> > +			res_base_idx, res_cur_idx, pkts[pkt_idx]);
> >  
> >  		rte_compiler_barrier();
> >  
> > @@ -562,12 +578,14 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id,
> >  	uint16_t free_entries, entry_success = 0;
> >  	uint16_t avail_idx;
> >  
> > -	if (unlikely(queue_id != VIRTIO_TXQ)) {
> > -		LOG_DEBUG(VHOST_DATA, "mq isn't supported in this version.\n");
> > +	if (unlikely(!is_valid_virt_queue_idx(queue_id, 1, dev->virt_qp_nb))) {
> > +		RTE_LOG(ERR, VHOST_DATA,
> > +			"%s (%"PRIu64"): virtqueue idx:%d invalid.\n",
> > +			__func__, dev->device_fh, queue_id);
> >  		return 0;
> >  	}
> >  
> > -	vq = dev->virtqueue[VIRTIO_TXQ];
> > +	vq = dev->virtqueue[queue_id];
> >  	avail_idx =  *((volatile uint16_t *)&vq->avail->idx);
> >  
> >  	/* If there are no available buffers then return. */
> > -- 
> > 1.9.0

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [dpdk-dev] [PATCH v7 4/8] vhost: rxtx: use queue id instead of constant ring index
  2015-10-21 12:48     ` Yuanhan Liu
@ 2015-10-21 14:26       ` Michael S. Tsirkin
  2015-10-21 14:59         ` Yuanhan Liu
  2015-10-22  9:49         ` Yuanhan Liu
  0 siblings, 2 replies; 66+ messages in thread
From: Michael S. Tsirkin @ 2015-10-21 14:26 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: dev, marcel

On Wed, Oct 21, 2015 at 08:48:15PM +0800, Yuanhan Liu wrote:
> > Please note that for virtio devices, guest is supposed to
> > control the placement of incoming packets in RX queues.
> 
> I may not follow you.
> 
> Enqueuing packets to a RX queue is done at vhost lib, outside the
> guest, how could the guest take the control here?
> 
> 	--yliu

vhost should do what guest told it to.

See virtio spec:
	5.1.6.5.5 Automatic receive steering in multiqueue mode

-- 
MST

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [dpdk-dev] [PATCH v7 4/8] vhost: rxtx: use queue id instead of constant ring index
  2015-10-21 14:26       ` Michael S. Tsirkin
@ 2015-10-21 14:59         ` Yuanhan Liu
  2015-10-22  9:49         ` Yuanhan Liu
  1 sibling, 0 replies; 66+ messages in thread
From: Yuanhan Liu @ 2015-10-21 14:59 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: dev, marcel

On Wed, Oct 21, 2015 at 05:26:18PM +0300, Michael S. Tsirkin wrote:
> On Wed, Oct 21, 2015 at 08:48:15PM +0800, Yuanhan Liu wrote:
> > > Please note that for virtio devices, guest is supposed to
> > > control the placement of incoming packets in RX queues.
> > 
> > I may not follow you.
> > 
> > Enqueuing packets to a RX queue is done at vhost lib, outside the
> > guest, how could the guest take the control here?
> > 
> > 	--yliu
> 
> vhost should do what guest told it to.
> 
> See virtio spec:
> 	5.1.6.5.5 Automatic receive steering in multiqueue mode


Thanks for the info. I'll have a look tomorrow.

	--yliu

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [dpdk-dev] [PATCH v7 4/8] vhost: rxtx: use queue id instead of constant ring index
  2015-10-21  9:38       ` Ananyev, Konstantin
@ 2015-10-21 15:47         ` Stephen Hemminger
  2015-10-21 15:52           ` Thomas Monjalon
  2015-10-21 15:55           ` Bruce Richardson
  0 siblings, 2 replies; 66+ messages in thread
From: Stephen Hemminger @ 2015-10-21 15:47 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: Michael S. Tsirkin, dev, marcel, Changchun Ouyang

On Wed, 21 Oct 2015 09:38:37 +0000
"Ananyev, Konstantin" <konstantin.ananyev@intel.com> wrote:

> > > minor nits:
> > >  * this doesn't need to be marked as always inline,
> > >     that is as they say in English "shooting a fly with a bazooka"  
> > Stephen:
> > always_inline "forces" the compiler to inline this function, like a macro.
> > When should it be used or is it not preferred at all?  
> 
> I also don't understand what's wrong with using 'always_inline' here.
> As I understand the author wants compiler to *always inline* that function.
> So seems perfectly ok to use it here.
> As I remember just 'inline' is sort of recommendation that compiler is free to ignore.
> Konstantin 

I follow Linux/Linus advice and resist the urge to add strong inlining.
The compiler does a good job of deciding to inline, and many times
the reason it chooses for not inlining are quite good like:
  - the code is on an unlikely branch
  - register pressure means inlining would mean the code would be worse

Therefore my rules are:
  * only use inline for small functions. Let compiler decide on larger static funcs
  * write code where most functions are static (localized scope) where compiler
    can decide
  * reserve always inline for things that access hardware and would break if not inlined.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [dpdk-dev] [PATCH v7 4/8] vhost: rxtx: use queue id instead of constant ring index
  2015-10-21 15:47         ` Stephen Hemminger
@ 2015-10-21 15:52           ` Thomas Monjalon
  2015-10-21 15:57             ` Bruce Richardson
  2015-10-21 15:55           ` Bruce Richardson
  1 sibling, 1 reply; 66+ messages in thread
From: Thomas Monjalon @ 2015-10-21 15:52 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, marcel, Michael S. Tsirkin

2015-10-21 08:47, Stephen Hemminger:
> On Wed, 21 Oct 2015 09:38:37 +0000
> "Ananyev, Konstantin" <konstantin.ananyev@intel.com> wrote:
> > I also don't understand what's wrong with using 'always_inline' here.
> > As I understand the author wants compiler to *always inline* that function.
> > So seems perfectly ok to use it here.
> > As I remember just 'inline' is sort of recommendation that compiler is free to ignore.
> > Konstantin 
> 
> I follow Linux/Linus advice and resist the urge to add strong inlining.
> The compiler does a good job of deciding to inline, and many times
> the reason it chooses for not inlining are quite good like:
>   - the code is on an unlikely branch
>   - register pressure means inlining would mean the code would be worse
> 
> Therefore my rules are:
>   * only use inline for small functions. Let compiler decide on larger static funcs
>   * write code where most functions are static (localized scope) where compiler
>     can decide
>   * reserve always inline for things that access hardware and would break if not inlined.

It would be interesting to do some benchmarks with/without "always" keyword
and add these rules in the coding style guide.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [dpdk-dev] [PATCH v7 4/8] vhost: rxtx: use queue id instead of constant ring index
  2015-10-21 15:47         ` Stephen Hemminger
  2015-10-21 15:52           ` Thomas Monjalon
@ 2015-10-21 15:55           ` Bruce Richardson
  2015-10-21 16:29             ` Ananyev, Konstantin
  1 sibling, 1 reply; 66+ messages in thread
From: Bruce Richardson @ 2015-10-21 15:55 UTC (permalink / raw)
  To: Stephen Hemminger, Ananyev, Konstantin
  Cc: dev, marcel, Changchun Ouyang, Michael S. Tsirkin



On 21/10/2015 16:47, Stephen Hemminger wrote:
> On Wed, 21 Oct 2015 09:38:37 +0000
> "Ananyev, Konstantin" <konstantin.ananyev@intel.com> wrote:
>
>>>> minor nits:
>>>>   * this doesn't need to be marked as always inline,
>>>>      that is as they say in English "shooting a fly with a bazooka"
>>> Stephen:
>>> always_inline "forces" the compiler to inline this function, like a macro.
>>> When should it be used or is it not preferred at all?
>> I also don't understand what's wrong with using 'always_inline' here.
>> As I understand the author wants compiler to *always inline* that function.
>> So seems perfectly ok to use it here.
>> As I remember just 'inline' is sort of recommendation that compiler is free to ignore.
>> Konstantin
> I follow Linux/Linus advice and resist the urge to add strong inlining.
> The compiler does a good job of deciding to inline, and many times
> the reason it chooses for not inlining are quite good like:
>    - the code is on an unlikely branch
>    - register pressure means inlining would mean the code would be worse
>
> Therefore my rules are:
>    * only use inline for small functions. Let compiler decide on larger static funcs
>    * write code where most functions are static (localized scope) where compiler
>      can decide
>    * reserve always inline for things that access hardware and would break if not inlined.
>
On the other hand, there are cases where we know the compiler will 
likely inline, but we also know that not inlining could have a high 
performance penalty, and in that case marking as "always inline" would 
be appropriate - even though it is likely unnecessary for most 
compilers. In such a case, I would expect the verification check to be: 
explicitly mark the function as *not* to be inlined, and see what the 
perf drop is. If it's a noticeable drop, marking as always-inline is an 
ok precaution against future compiler changes.
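
Something like this would do for the check (a sketch only; FORCE_OUTLINE 
is a made-up benchmark switch, not an existing config option):

	#ifdef FORCE_OUTLINE	/* benchmark build: force a real call */
	#define VHOST_INLINE __attribute__((noinline))
	#else			/* normal build: keep the strong hint */
	#define VHOST_INLINE inline __attribute__((always_inline))
	#endif

	static VHOST_INLINE bool
	is_valid_virt_queue_idx(uint32_t idx, int is_tx, uint32_t qp_nb)
	{
		return (is_tx ^ (idx & 1)) == 0 && idx < qp_nb * VIRTIO_QNUM;
	}

Build once with each setting, compare packets/sec, and only keep 
always_inline if the gap is real.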

Also, we need to remember that compilers cannot know whether a function 
is data path or not, and also whether a function will be called 
per-packet or per-burst. That's only something the programmer will know, 
and functions called per-packet on the datapath generally need to be 
inlined for performance.

/Bruce

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [dpdk-dev] [PATCH v7 4/8] vhost: rxtx: use queue id instead of constant ring index
  2015-10-21 15:52           ` Thomas Monjalon
@ 2015-10-21 15:57             ` Bruce Richardson
  0 siblings, 0 replies; 66+ messages in thread
From: Bruce Richardson @ 2015-10-21 15:57 UTC (permalink / raw)
  To: dev



On 21/10/2015 16:52, Thomas Monjalon wrote:
> 2015-10-21 08:47, Stephen Hemminger:
>> On Wed, 21 Oct 2015 09:38:37 +0000
>> "Ananyev, Konstantin" <konstantin.ananyev@intel.com> wrote:
>>> I also don't understand what's wrong with using 'always_inline' here.
>>> As I understand the author wants compiler to *always inline* that function.
>>> So seems perfectly ok to use it here.
>>> As I remember just 'inline' is sort of recommendation that compiler is free to ignore.
>>> Konstantin
>> I follow Linux/Linus advice and resist the urge to add strong inlining.
>> The compiler does a good job of deciding to inline, and many times
>> the reason it chooses for not inlining are quite good like:
>>    - the code is on an unlikely branch
>>    - register pressure means inlining would mean the code would be worse
>>
>> Therefore my rules are:
>>    * only use inline for small functions. Let compiler decide on larger static funcs
>>    * write code where most functions are static (localized scope) where compiler
>>      can decide
>>    * reserve always inline for things that access hardware and would break if not inlined.
> It would be interesting to do some benchmarks with/without "always" keyword
> and add these rules in the coding style guide.
>
A better test would be to measure the hit from explicitly not having it 
inlined. You need to know the cost of the compiler making the wrong 
choice, even if it normally makes the right one.

Bruce

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [dpdk-dev] [PATCH v7 4/8] vhost: rxtx: use queue id instead of constant ring index
  2015-10-21 15:55           ` Bruce Richardson
@ 2015-10-21 16:29             ` Ananyev, Konstantin
  0 siblings, 0 replies; 66+ messages in thread
From: Ananyev, Konstantin @ 2015-10-21 16:29 UTC (permalink / raw)
  To: Richardson, Bruce, Stephen Hemminger
  Cc: dev, marcel, Changchun Ouyang, Michael S. Tsirkin



> -----Original Message-----
> From: Richardson, Bruce
> Sent: Wednesday, October 21, 2015 4:56 PM
> To: Stephen Hemminger; Ananyev, Konstantin
> Cc: Michael S. Tsirkin; dev@dpdk.org; marcel@redhat.com; Changchun Ouyang
> Subject: Re: [dpdk-dev] [PATCH v7 4/8] vhost: rxtx: use queue id instead of constant ring index
> 
> 
> 
> On 21/10/2015 16:47, Stephen Hemminger wrote:
> > On Wed, 21 Oct 2015 09:38:37 +0000
> > "Ananyev, Konstantin" <konstantin.ananyev@intel.com> wrote:
> >
> >>>> minor nits:
> >>>>   * this doesn't need to be marked as always inline,
> >>>>      that is as they say in English "shooting a fly with a bazooka"
> >>> Stephen:
> >>> always_inline "forces" the compiler to inline this function, like a macro.
> >>> When should it be used or is it not preferred at all?
> >> I also don't understand what's wrong with using 'always_inline' here.
> >> As I understand the author wants compiler to *always inline* that function.
> >> So seems perfectly ok to use it here.
> >> As I remember just 'inline' is sort of recommendation that compiler is free to ignore.
> >> Konstantin
> > I follow Linux/Linus advice and resist the urge to add strong inlining.
> > The compiler does a good job of deciding to inline, and many times
> > the reason it chooses for not inlining are quite good like:
> >    - the code is on an unlikely branch
> >    - register pressure means inlining would mean the code would be worse

Yep, that's all true, but as I remember the 'inline' keyword itself doesn't force
the compiler to always inline that function. It is more like a recommendation to
the compiler. Looking at any dpdk binary, there are plenty of places where a
function is declared as 'inline', but the compiler decided not to inline it and
followed the standard function call convention instead.
Again, from C spec:
"6. A function declared with an inline function specifier is an inline function. Making a
function an inline function suggests that calls to the function be as fast as possible.138)
7. The extent to which such suggestions are effective is implementation-defined.139) 
...
139) For example, an implementation might never perform inline substitution, or might only perform inline
substitutions to calls in the scope of an inline declaration."

> >
> > Therefore my rules are:
> >    * only use inline for small functions. Let compiler decide on larger static funcs

As I remember, the function we are talking about is really small.

> >    * write code where most functions are static (localized scope) where compiler
> >      can decide
> >    * reserve always inline for things that access hardware and would break if not inlined.

Sorry, but the last rule looks too restrictive to me.
I don't see any reason why we all have to follow it.
BTW, as I can see there are plenty of always_inline functions inside the
linux kernel (memory allocator, scheduler, etc.).

> >
> On the other hand, there are cases where we know the compiler will
> likely inline, but we also know that not inlining could have a high
> performance penalty, and in that case marking as "always inline" would
> be appropriate - even though it is likely unnecessary for most
> compilers.

Yep, totally agree here.
If memory serves me right - in the past we observed a few noticeable performance
drops because of that when switching from one compiler version to another.

Konstantin


> In such a case, I would expect the verification check to be:
> explicitly mark the function as *not* to be inlined, and see what the
> perf drop is. If it's a noticable drop, marking as always-inline is an
> ok precaution against future compiler changes.
> 
> Also, we need to remember that compilers cannot know whether a function
> is data path or not, and also whether a function will be called
> per-packet or per-burst. That's only something the programmer will know,
> and functions called per-packet on the datapath generally need to be
> inlined for performance.
> 
> /Bruce
> 
> /Bruce

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [dpdk-dev] [PATCH v7 4/8] vhost: rxtx: use queue id instead of constant ring index
  2015-10-21  3:48 ` [dpdk-dev] [PATCH v7 4/8] vhost: rxtx: use queue id instead of constant ring index Yuanhan Liu
  2015-10-21  4:43   ` Stephen Hemminger
  2015-10-21 10:31   ` Michael S. Tsirkin
@ 2015-10-22  7:26   ` Xie, Huawei
  2 siblings, 0 replies; 66+ messages in thread
From: Xie, Huawei @ 2015-10-22  7:26 UTC (permalink / raw)
  To: Yuanhan Liu, dev; +Cc: Adams, Steve, Michael S. Tsirkin, marcel

On 10/21/2015 11:48 AM, Yuanhan Liu wrote:

[...]
>  
>  #define MAX_PKT_BURST 32
>  
> +static inline int __attribute__((always_inline))
> +is_valid_virt_queue_idx(uint32_t virtq_idx, int is_tx, uint32_t max_qp_idx)
> +{
> +	if ((is_tx ^ (virtq_idx & 0x1)) ||
> +	    (virtq_idx >= max_qp_idx * VIRTIO_QNUM))
> +		return 0;
> +
> +	return 1;
> +}
> +
>  /**
>   * This function adds buffers to the virtio devices RX virtqueue. Buffers can
>   * be received from the physical port or from another virtio device. A packet
> @@ -68,12 +78,14 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
>  	uint8_t success = 0;
>  
>  	LOG_DEBUG(VHOST_DATA, "(%"PRIu64") virtio_dev_rx()\n", dev->device_fh);
> -	if (unlikely(queue_id != VIRTIO_RXQ)) {
> -		LOG_DEBUG(VHOST_DATA, "mq isn't supported in this version.\n");
> +	if (unlikely(!is_valid_virt_queue_idx(queue_id, 0, dev->virt_qp_nb))) {
> +		RTE_LOG(ERR, VHOST_DATA,
> +			"%s (%"PRIu64"): virtqueue idx:%d invalid.\n",
> +			__func__, dev->device_fh, queue_id);
>  		return 0;
>  	}
>  
> -	vq = dev->virtqueue[VIRTIO_RXQ];
> +	vq = dev->virtqueue[queue_id];
>  	count = (count > MAX_PKT_BURST) ? MAX_PKT_BURST : count;
>  
>  	/*
>
Besides the always_inline issue, I think we should remove the queue_id
check here in the "data" path. Callers should guarantee that they pass us
the correct queue idx.
We could add a VHOST_DEBUG macro to do the sanity check for debug purposes only.
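
Something along these lines, say (a rough sketch; reusing the existing
RTE_LIBRTE_VHOST_DEBUG build option is just one possibility):

	#ifdef RTE_LIBRTE_VHOST_DEBUG
	#define VHOST_ASSERT_QUEUE_IDX(dev, qid, is_tx) do {		\
		if (!is_valid_virt_queue_idx((qid), (is_tx),		\
					     (dev)->virt_qp_nb))	\
			rte_panic("invalid virtqueue idx %u\n",		\
				  (unsigned)(qid));			\
	} while (0)
	#else
	#define VHOST_ASSERT_QUEUE_IDX(dev, qid, is_tx) do { } while (0)
	#endif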

On the other hand, we currently lack enough checks against the guest,
because there could be malicious guests. We plan to fix this in the next release.

[...]


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [dpdk-dev] [PATCH v7 2/8] vhost-user: add VHOST_USER_GET_QUEUE_NUM message
  2015-10-21  3:48 ` [dpdk-dev] [PATCH v7 2/8] vhost-user: add VHOST_USER_GET_QUEUE_NUM message Yuanhan Liu
@ 2015-10-22  9:38   ` Xie, Huawei
  0 siblings, 0 replies; 66+ messages in thread
From: Xie, Huawei @ 2015-10-22  9:38 UTC (permalink / raw)
  To: Yuanhan Liu, dev; +Cc: marcel, Michael S. Tsirkin

On 10/21/2015 11:48 AM, Yuanhan Liu wrote:
> To tell the frontend (qemu) how many queue pairs we support.
>
> And it is initiated to VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX.
s/initiated/initialized/

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [dpdk-dev] [PATCH v7 4/8] vhost: rxtx: use queue id instead of constant ring index
  2015-10-21 14:26       ` Michael S. Tsirkin
  2015-10-21 14:59         ` Yuanhan Liu
@ 2015-10-22  9:49         ` Yuanhan Liu
  2015-10-22 11:32           ` Michael S. Tsirkin
  1 sibling, 1 reply; 66+ messages in thread
From: Yuanhan Liu @ 2015-10-22  9:49 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: dev, marcel

On Wed, Oct 21, 2015 at 05:26:18PM +0300, Michael S. Tsirkin wrote:
> On Wed, Oct 21, 2015 at 08:48:15PM +0800, Yuanhan Liu wrote:
> > > Please note that for virtio devices, guest is supposed to
> > > control the placement of incoming packets in RX queues.
> > 
> > I may not follow you.
> > 
> > Enqueuing packets to a RX queue is done at vhost lib, outside the
> > guest, how could the guest take the control here?
> > 
> > 	--yliu
> 
> vhost should do what guest told it to.
> 
> See virtio spec:
> 	5.1.6.5.5 Automatic receive steering in multiqueue mode

Spec says:

    After the driver transmitted a packet of a flow on transmitqX,
    the device SHOULD cause incoming packets for that flow to be
    steered to receiveqX.


Michael, I still have no idea how vhost could know the flow even
after discussion with Huawei. Could you be more specific about
this? Say, how could the guest know that? And how could the guest tell
vhost which RX queue it is going to use?

Thanks.

	--yliu

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [dpdk-dev] [PATCH v7 3/8] vhost: vring queue setup for multiple queue support
  2015-10-21  3:48 ` [dpdk-dev] [PATCH v7 3/8] vhost: vring queue setup for multiple queue support Yuanhan Liu
  2015-10-21  4:45   ` Stephen Hemminger
@ 2015-10-22  9:49   ` Xie, Huawei
  2015-10-22 11:30     ` Yuanhan Liu
  1 sibling, 1 reply; 66+ messages in thread
From: Xie, Huawei @ 2015-10-22  9:49 UTC (permalink / raw)
  To: Yuanhan Liu, dev; +Cc: marcel, Michael S. Tsirkin

On 10/21/2015 11:48 AM, Yuanhan Liu wrote:
> All queue pairs, including the default (the first) queue pair,
> are allocated dynamically, when a vring_call message is received
> first time for a specific queue pair.
>
> This is a refactor work for enabling vhost-user multiple queue;
> it should not break anything as it does no functional changes:
> we don't support mq set, so there is only one mq at max.
>
> This patch is based on Changchun's patch.
>
[...]
>  
>  void
> @@ -290,13 +298,9 @@ user_get_vring_base(struct vhost_device_ctx ctx,
>  	 * sent and only sent in vhost_vring_stop.
>  	 * TODO: cleanup the vring, it isn't usable since here.
>  	 */
> -	if ((dev->virtqueue[VIRTIO_RXQ]->kickfd) >= 0) {
> -		close(dev->virtqueue[VIRTIO_RXQ]->kickfd);
> -		dev->virtqueue[VIRTIO_RXQ]->kickfd = -1;
> -	}
> -	if ((dev->virtqueue[VIRTIO_TXQ]->kickfd) >= 0) {
> -		close(dev->virtqueue[VIRTIO_TXQ]->kickfd);
> -		dev->virtqueue[VIRTIO_TXQ]->kickfd = -1;
> +	if ((dev->virtqueue[state->index]->kickfd) >= 0) {
> +		close(dev->virtqueue[state->index]->kickfd);
> +		dev->virtqueue[state->index]->kickfd = -1;
>  	}
Since we change the behavior here, better list in the commit message as
well.

>  
>  
> @@ -680,13 +704,21 @@ set_vring_call(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
>  {
>  	struct virtio_net *dev;
>  	struct vhost_virtqueue *vq;
> +	uint32_t cur_qp_idx = file->index / VIRTIO_QNUM;
>  
>  	dev = get_device(ctx);
>  	if (dev == NULL)
>  		return -1;
>  
> +	/* alloc vring queue pair if it is a new queue pair */
> +	if (cur_qp_idx + 1 > dev->virt_qp_nb) {
> +		if (alloc_vring_queue_pair(dev, cur_qp_idx) < 0)
> +			return -1;
> +	}
> +
Here we rely on the fact that this set_vring_call message is sent in the
continuous ascending order of queue idx 0, 1, 2, ...

>  	/* file->index refers to the queue index. The txq is 1, rxq is 0. */
>  	vq = dev->virtqueue[file->index];
> +	assert(vq != NULL);
>  
If we don't allocate the queue until we receive the first vring message,
better to add a comment that we rely on this fact.
Could we add a vhost-user message to tell us the queue number QEMU
allocates, before the vring messages?
>  	if (vq->callfd >= 0)
>  		close(vq->callfd);


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [dpdk-dev] [PATCH v7 1/8] vhost-user: add protocol features support
  2015-10-21  3:48 ` [dpdk-dev] [PATCH v7 1/8] vhost-user: add protocol features support Yuanhan Liu
@ 2015-10-22  9:52   ` Xie, Huawei
  0 siblings, 0 replies; 66+ messages in thread
From: Xie, Huawei @ 2015-10-22  9:52 UTC (permalink / raw)
  To: Yuanhan Liu, dev; +Cc: marcel, Michael S. Tsirkin

On 10/21/2015 11:48 AM, Yuanhan Liu wrote:
[...]
>
> VHOST_USER_PROTOCOL_FEATURES is initated to 0, as we don't support
>
s/initiated/initialized/


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [dpdk-dev] [PATCH v7 3/8] vhost: vring queue setup for multiple queue support
  2015-10-22  9:49   ` Xie, Huawei
@ 2015-10-22 11:30     ` Yuanhan Liu
  0 siblings, 0 replies; 66+ messages in thread
From: Yuanhan Liu @ 2015-10-22 11:30 UTC (permalink / raw)
  To: Xie, Huawei; +Cc: dev, marcel, Michael S. Tsirkin

On Thu, Oct 22, 2015 at 09:49:58AM +0000, Xie, Huawei wrote:
> On 10/21/2015 11:48 AM, Yuanhan Liu wrote:
> > All queue pairs, including the default (the first) queue pair,
> > are allocated dynamically, when a vring_call message is received
> > first time for a specific queue pair.
> >
> > This is a refactor work for enabling vhost-user multiple queue;
> > it should not break anything as it does no functional changes:
> > we don't support mq set, so there is only one mq at max.
> >
> > This patch is based on Changchun's patch.
> >
> [...]
> >  
> >  void
> > @@ -290,13 +298,9 @@ user_get_vring_base(struct vhost_device_ctx ctx,
> >  	 * sent and only sent in vhost_vring_stop.
> >  	 * TODO: cleanup the vring, it isn't usable since here.
> >  	 */
> > -	if ((dev->virtqueue[VIRTIO_RXQ]->kickfd) >= 0) {
> > -		close(dev->virtqueue[VIRTIO_RXQ]->kickfd);
> > -		dev->virtqueue[VIRTIO_RXQ]->kickfd = -1;
> > -	}
> > -	if ((dev->virtqueue[VIRTIO_TXQ]->kickfd) >= 0) {
> > -		close(dev->virtqueue[VIRTIO_TXQ]->kickfd);
> > -		dev->virtqueue[VIRTIO_TXQ]->kickfd = -1;
> > +	if ((dev->virtqueue[state->index]->kickfd) >= 0) {
> > +		close(dev->virtqueue[state->index]->kickfd);
> > +		dev->virtqueue[state->index]->kickfd = -1;
> >  	}
> Since we change the behavior here, better list in the commit message as
> well.

I checked the code again, and found I should not change that:
GET_VRING_BASE is sent per virt queue pair.

BTW, it's wrong to do this kind of stuff here; we need to fix
it in the future.

> 
> >  
> >  
> > @@ -680,13 +704,21 @@ set_vring_call(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
> >  {
> >  	struct virtio_net *dev;
> >  	struct vhost_virtqueue *vq;
> > +	uint32_t cur_qp_idx = file->index / VIRTIO_QNUM;
> >  
> >  	dev = get_device(ctx);
> >  	if (dev == NULL)
> >  		return -1;
> >  
> > +	/* alloc vring queue pair if it is a new queue pair */
> > +	if (cur_qp_idx + 1 > dev->virt_qp_nb) {
> > +		if (alloc_vring_queue_pair(dev, cur_qp_idx) < 0)
> > +			return -1;
> > +	}
> > +
> Here we rely on the fact that this set_vring_call message is sent in the
> continuous ascending order of queue idx 0, 1, 2, ...

That's true.

> 
> >  	/* file->index refers to the queue index. The txq is 1, rxq is 0. */
> >  	vq = dev->virtqueue[file->index];
> > +	assert(vq != NULL);
> >  
> If we allocate the queue until the we receive the first vring message,
> better add comment that we rely on this fact.

Will do that.

> Could we add the vhost-user message to tell us the queue number QEMU
> allocates before vring message?

We may need to do that. But it's too late to make it into v2.2.

	--yliu

> >  	if (vq->callfd >= 0)
> >  		close(vq->callfd);
> 

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [dpdk-dev] [PATCH v7 4/8] vhost: rxtx: use queue id instead of constant ring index
  2015-10-22  9:49         ` Yuanhan Liu
@ 2015-10-22 11:32           ` Michael S. Tsirkin
  2015-10-22 14:07             ` Yuanhan Liu
  2015-10-24  2:34             ` Flavio Leitner
  0 siblings, 2 replies; 66+ messages in thread
From: Michael S. Tsirkin @ 2015-10-22 11:32 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: dev, marcel

On Thu, Oct 22, 2015 at 05:49:55PM +0800, Yuanhan Liu wrote:
> On Wed, Oct 21, 2015 at 05:26:18PM +0300, Michael S. Tsirkin wrote:
> > On Wed, Oct 21, 2015 at 08:48:15PM +0800, Yuanhan Liu wrote:
> > > > Please note that for virtio devices, guest is supposed to
> > > > control the placement of incoming packets in RX queues.
> > > 
> > > I may not follow you.
> > > 
> > > Enqueuing packets to a RX queue is done at vhost lib, outside the
> > > guest, how could the guest take the control here?
> > > 
> > > 	--yliu
> > 
> > vhost should do what guest told it to.
> > 
> > See virtio spec:
> > 	5.1.6.5.5 Automatic receive steering in multiqueue mode
> 
> Spec says:
> 
>     After the driver transmitted a packet of a flow on transmitqX,
>     the device SHOULD cause incoming packets for that flow to be
>     steered to receiveqX.
> 
> 
> Michael, I still have no idea how vhost could know the flow even
> after discussion with Huawei. Could you be more specific about
> this? Say, how could guest know that? And how could guest tell
> vhost which RX is gonna to use?
> 
> Thanks.
> 
> 	--yliu

I don't really understand the question.

When the guest transmits a packet, it makes a decision
about the flow to use, and maps that to a tx/rx pair of queues.

It sends packets out on the tx queue and expects the device to
return packets from the same flow on the rx queue.

During transmit, the device needs to figure out the flow
of packets as they are received from the guest, and track
which flows go on which tx queue.
When it selects the rx queue, it has to use the same table.
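
In vhost terms that would look roughly like the sketch below (purely
illustrative; flow_hash() and the table are placeholders, nothing like
this exists in this patch set):

	#define STEER_TBL_SIZE	256
	static uint16_t steer_tbl[STEER_TBL_SIZE];	/* flow hash -> qp index */

	/* guest -> host path: remember which pair the guest used for this flow */
	static void
	note_tx_flow(struct rte_mbuf *m, uint16_t qp_idx)
	{
		steer_tbl[flow_hash(m) % STEER_TBL_SIZE] = qp_idx;
	}

	/* host -> guest path: steer the packet to the pair the guest chose */
	static uint16_t
	pick_rx_qp(struct rte_mbuf *m)
	{
		return steer_tbl[flow_hash(m) % STEER_TBL_SIZE];
	}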

There is currently no provision for controlling
steering for uni-directional
flows which are possible e.g. with UDP.

We might solve this in a future spec - for example, notify the
guest that steering information is missing for a given flow (by setting
a flag in a packet, or using the command queue), and have the
guest send a dummy empty packet to set the steering rule for this flow.


-- 
MST

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [dpdk-dev] [PATCH v8 0/8] vhost-user multiple queues enabling
  2015-10-21  3:48 [dpdk-dev] [PATCH v7 0/8] vhost-user multiple queues enabling Yuanhan Liu
                   ` (7 preceding siblings ...)
  2015-10-21  3:48 ` [dpdk-dev] [PATCH v7 8/8] doc: update release note for vhost-user mq support Yuanhan Liu
@ 2015-10-22 12:35 ` Yuanhan Liu
  2015-10-22 12:35   ` [dpdk-dev] [PATCH v8 1/8] vhost-user: add protocol features support Yuanhan Liu
                     ` (8 more replies)
  8 siblings, 9 replies; 66+ messages in thread
From: Yuanhan Liu @ 2015-10-22 12:35 UTC (permalink / raw)
  To: dev; +Cc: marcel, Michael S. Tsirkin

This patch set enables the vhost-user multiple queue feature.


v8:

   - put the SET_VRING_ENABLE() patch before the patch that actually
     enables mq, since that makes more sense.

   - don't change the kickfd reset behavior for patch 3

   - move virt_queue field to the end of virtio_net struct.

   - comment and type fixes


v7:

   - Removed vhost-user mq examples in this patch set
   
     Because the example leverages the hardware VMDq feature to
     demonstrate the mq feature, which introduces too much 
     limitation, yet it's turned out to be not elegant.
   
   - Commit log fixes
   
   - Dropped the patch to fix RESET_OWNER handling, as I found
     Jerome's solution works as well, and it makes more sense to
     me:
   
     http://dpdk.org/dev/patchwork/project/dpdk/list/?submitter=354



Overview
========

It depends on some QEMU patches that have already been merged upstream.
Those qemu patches introduce some new vhost-user messages for vhost-user
mq enabling negotiation. Here are the main negotiation steps (Qemu
as master, and DPDK vhost-user as slave):

- Master queries features by VHOST_USER_GET_FEATURES from slave

- Check if VHOST_USER_F_PROTOCOL_FEATURES exist. If not, mq is not
  supported. (check patch 1 for why VHOST_USER_F_PROTOCOL_FEATURES
  is introduced)

- Master then sends another command, VHOST_USER_GET_QUEUE_NUM, for
  querying how many queues the slave supports.

  Master will compare the result with the requested queue number.
  Qemu exits if the former is smaller.

- Master then tries to initiate all queue pairs by sending some vhost
  user commands, including VHOST_USER_SET_VRING_CALL, which will
  trigger the slave to do related vring setup, such as vring allocation.


Till now, all necessary initiation and negotiation are done. And master
could send another message, VHOST_USER_SET_VRING_ENABLE, to enable/disable
a specific queue dynamically later.


Patchset
========

Patches 1-6 are all preparation work for enabling mq; they are all atomic
changes, made with "do not break anything" borne in mind.

Patch 7 actually enables the mq feature, by setting two key feature flags.


Test with OVS
=============

Marcel created a simple yet quite clear test guide with OVS at:

   http://wiki.qemu.org/Features/vhost-user-ovs-dpdk




---
Changchun Ouyang (3):
  vhost: rxtx: use queue id instead of constant ring index
  virtio: fix deadloop due to reading virtio_net_config incorrectly
  vhost: add VHOST_USER_SET_VRING_ENABLE message

Yuanhan Liu (5):
  vhost-user: add protocol features support
  vhost-user: add VHOST_USER_GET_QUEUE_NUM message
  vhost: vring queue setup for multiple queue support
  vhost-user: enable vhost-user multiple queue
  doc: update release note for vhost-user mq support

 doc/guides/rel_notes/release_2_2.rst          |   4 +
 drivers/net/virtio/virtio_ethdev.c            |  16 ++-
 lib/librte_vhost/rte_virtio_net.h             |  13 +-
 lib/librte_vhost/vhost_rxtx.c                 |  53 +++++---
 lib/librte_vhost/vhost_user/vhost-net-user.c  |  25 +++-
 lib/librte_vhost/vhost_user/vhost-net-user.h  |   4 +
 lib/librte_vhost/vhost_user/virtio-net-user.c |  86 ++++++++++---
 lib/librte_vhost/vhost_user/virtio-net-user.h |  10 ++
 lib/librte_vhost/virtio-net.c                 | 168 ++++++++++++++++----------
 9 files changed, 275 insertions(+), 104 deletions(-)

-- 
1.9.0

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [dpdk-dev] [PATCH v8 1/8] vhost-user: add protocol features support
  2015-10-22 12:35 ` [dpdk-dev] [PATCH v8 0/8] vhost-user multiple queues enabling Yuanhan Liu
@ 2015-10-22 12:35   ` Yuanhan Liu
  2015-10-22 12:35   ` [dpdk-dev] [PATCH v8 2/8] vhost-user: add VHOST_USER_GET_QUEUE_NUM message Yuanhan Liu
                     ` (7 subsequent siblings)
  8 siblings, 0 replies; 66+ messages in thread
From: Yuanhan Liu @ 2015-10-22 12:35 UTC (permalink / raw)
  To: dev; +Cc: marcel, Michael S. Tsirkin

The two protocol features messages were introduced by the QEMU vhost
maintainer (Michael) for extending the vhost-user interface. Here is
an excerpt from the vhost-user spec:

    Any protocol extensions are gated by protocol feature bits,
    which allows full backwards compatibility on both master
    and slave.

The vhost-user multiple queue feature will be treated as a vhost-user
extension; hence, we have to implement the two messages first.

VHOST_USER_PROTOCOL_FEATURES is initialized to 0, as we don't support
any protocol features yet.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
---
 lib/librte_vhost/rte_virtio_net.h             |  1 +
 lib/librte_vhost/vhost_user/vhost-net-user.c  | 13 ++++++++++++-
 lib/librte_vhost/vhost_user/vhost-net-user.h  |  2 ++
 lib/librte_vhost/vhost_user/virtio-net-user.c | 13 +++++++++++++
 lib/librte_vhost/vhost_user/virtio-net-user.h |  5 +++++
 lib/librte_vhost/virtio-net.c                 |  5 ++++-
 6 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h
index a037c15..e3a21e5 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -99,6 +99,7 @@ struct virtio_net {
 	struct vhost_virtqueue	*virtqueue[VIRTIO_QNUM];	/**< Contains all virtqueue information. */
 	struct virtio_memory	*mem;		/**< QEMU memory and memory region information. */
 	uint64_t		features;	/**< Negotiated feature set. */
+	uint64_t		protocol_features;	/**< Negotiated protocol feature set. */
 	uint64_t		device_fh;	/**< device identifier. */
 	uint32_t		flags;		/**< Device flags. Only used to check if device is running on data core. */
 #define IF_NAME_SZ (PATH_MAX > IFNAMSIZ ? PATH_MAX : IFNAMSIZ)
diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.c b/lib/librte_vhost/vhost_user/vhost-net-user.c
index d1f8877..bc2ad24 100644
--- a/lib/librte_vhost/vhost_user/vhost-net-user.c
+++ b/lib/librte_vhost/vhost_user/vhost-net-user.c
@@ -95,7 +95,9 @@ static const char *vhost_message_str[VHOST_USER_MAX] = {
 	[VHOST_USER_GET_VRING_BASE] = "VHOST_USER_GET_VRING_BASE",
 	[VHOST_USER_SET_VRING_KICK] = "VHOST_USER_SET_VRING_KICK",
 	[VHOST_USER_SET_VRING_CALL] = "VHOST_USER_SET_VRING_CALL",
-	[VHOST_USER_SET_VRING_ERR]  = "VHOST_USER_SET_VRING_ERR"
+	[VHOST_USER_SET_VRING_ERR]  = "VHOST_USER_SET_VRING_ERR",
+	[VHOST_USER_GET_PROTOCOL_FEATURES]  = "VHOST_USER_GET_PROTOCOL_FEATURES",
+	[VHOST_USER_SET_PROTOCOL_FEATURES]  = "VHOST_USER_SET_PROTOCOL_FEATURES",
 };
 
 /**
@@ -363,6 +365,15 @@ vserver_message_handler(int connfd, void *dat, int *remove)
 		ops->set_features(ctx, &features);
 		break;
 
+	case VHOST_USER_GET_PROTOCOL_FEATURES:
+		msg.payload.u64 = VHOST_USER_PROTOCOL_FEATURES;
+		msg.size = sizeof(msg.payload.u64);
+		send_vhost_message(connfd, &msg);
+		break;
+	case VHOST_USER_SET_PROTOCOL_FEATURES:
+		user_set_protocol_features(ctx, msg.payload.u64);
+		break;
+
 	case VHOST_USER_SET_OWNER:
 		ops->set_owner(ctx);
 		break;
diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.h b/lib/librte_vhost/vhost_user/vhost-net-user.h
index 2e72f3c..4490d23 100644
--- a/lib/librte_vhost/vhost_user/vhost-net-user.h
+++ b/lib/librte_vhost/vhost_user/vhost-net-user.h
@@ -63,6 +63,8 @@ typedef enum VhostUserRequest {
 	VHOST_USER_SET_VRING_KICK = 12,
 	VHOST_USER_SET_VRING_CALL = 13,
 	VHOST_USER_SET_VRING_ERR = 14,
+	VHOST_USER_GET_PROTOCOL_FEATURES = 15,
+	VHOST_USER_SET_PROTOCOL_FEATURES = 16,
 	VHOST_USER_MAX
 } VhostUserRequest;
 
diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.c b/lib/librte_vhost/vhost_user/virtio-net-user.c
index e0bc2a4..6da729d 100644
--- a/lib/librte_vhost/vhost_user/virtio-net-user.c
+++ b/lib/librte_vhost/vhost_user/virtio-net-user.c
@@ -318,3 +318,16 @@ user_destroy_device(struct vhost_device_ctx ctx)
 		dev->mem = NULL;
 	}
 }
+
+void
+user_set_protocol_features(struct vhost_device_ctx ctx,
+			   uint64_t protocol_features)
+{
+	struct virtio_net *dev;
+
+	dev = get_device(ctx);
+	if (dev == NULL || protocol_features & ~VHOST_USER_PROTOCOL_FEATURES)
+		return;
+
+	dev->protocol_features = protocol_features;
+}
diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.h b/lib/librte_vhost/vhost_user/virtio-net-user.h
index df24860..e7a6ff4 100644
--- a/lib/librte_vhost/vhost_user/virtio-net-user.h
+++ b/lib/librte_vhost/vhost_user/virtio-net-user.h
@@ -37,12 +37,17 @@
 #include "vhost-net.h"
 #include "vhost-net-user.h"
 
+#define VHOST_USER_PROTOCOL_FEATURES	0ULL
+
 int user_set_mem_table(struct vhost_device_ctx, struct VhostUserMsg *);
 
 void user_set_vring_call(struct vhost_device_ctx, struct VhostUserMsg *);
 
 void user_set_vring_kick(struct vhost_device_ctx, struct VhostUserMsg *);
 
+void user_set_protocol_features(struct vhost_device_ctx ctx,
+				uint64_t protocol_features);
+
 int user_get_vring_base(struct vhost_device_ctx, struct vhost_vring_state *);
 
 void user_destroy_device(struct vhost_device_ctx);
diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index a6ab245..830f22a 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -67,11 +67,14 @@ struct virtio_net_device_ops const *notify_ops;
 /* root address of the linked list of managed virtio devices */
 static struct virtio_net_config_ll *ll_root;
 
+#define VHOST_USER_F_PROTOCOL_FEATURES	30
+
 /* Features supported by this lib. */
 #define VHOST_SUPPORTED_FEATURES ((1ULL << VIRTIO_NET_F_MRG_RXBUF) | \
 				(1ULL << VIRTIO_NET_F_CTRL_VQ) | \
 				(1ULL << VIRTIO_NET_F_CTRL_RX) | \
-				(1ULL << VHOST_F_LOG_ALL))
+				(1ULL << VHOST_F_LOG_ALL)      | \
+				(1ULL << VHOST_USER_F_PROTOCOL_FEATURES))
 static uint64_t VHOST_FEATURES = VHOST_SUPPORTED_FEATURES;
 
 
-- 
1.9.0

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [dpdk-dev] [PATCH v8 2/8] vhost-user: add VHOST_USER_GET_QUEUE_NUM message
  2015-10-22 12:35 ` [dpdk-dev] [PATCH v8 0/8] vhost-user multiple queues enabling Yuanhan Liu
  2015-10-22 12:35   ` [dpdk-dev] [PATCH v8 1/8] vhost-user: add protocol features support Yuanhan Liu
@ 2015-10-22 12:35   ` Yuanhan Liu
  2015-10-22 12:35   ` [dpdk-dev] [PATCH v8 3/8] vhost: vring queue setup for multiple queue support Yuanhan Liu
                     ` (6 subsequent siblings)
  8 siblings, 0 replies; 66+ messages in thread
From: Yuanhan Liu @ 2015-10-22 12:35 UTC (permalink / raw)
  To: dev; +Cc: marcel, Michael S. Tsirkin

This message tells the frontend (QEMU) how many queue pairs we support.

The value is initialized to VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX.
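
(For reference, VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX comes from the virtio
spec's control-VQ MQ definitions; the values below are the ones in
linux/virtio_net.h, worth double-checking against your own headers:)

    #define VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MIN	1
    #define VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX	0x8000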

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
---
 lib/librte_vhost/vhost_user/vhost-net-user.c | 7 +++++++
 lib/librte_vhost/vhost_user/vhost-net-user.h | 1 +
 2 files changed, 8 insertions(+)

diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.c b/lib/librte_vhost/vhost_user/vhost-net-user.c
index bc2ad24..8675cd4 100644
--- a/lib/librte_vhost/vhost_user/vhost-net-user.c
+++ b/lib/librte_vhost/vhost_user/vhost-net-user.c
@@ -98,6 +98,7 @@ static const char *vhost_message_str[VHOST_USER_MAX] = {
 	[VHOST_USER_SET_VRING_ERR]  = "VHOST_USER_SET_VRING_ERR",
 	[VHOST_USER_GET_PROTOCOL_FEATURES]  = "VHOST_USER_GET_PROTOCOL_FEATURES",
 	[VHOST_USER_SET_PROTOCOL_FEATURES]  = "VHOST_USER_SET_PROTOCOL_FEATURES",
+	[VHOST_USER_GET_QUEUE_NUM]  = "VHOST_USER_GET_QUEUE_NUM",
 };
 
 /**
@@ -421,6 +422,12 @@ vserver_message_handler(int connfd, void *dat, int *remove)
 		RTE_LOG(INFO, VHOST_CONFIG, "not implemented\n");
 		break;
 
+	case VHOST_USER_GET_QUEUE_NUM:
+		msg.payload.u64 = VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX;
+		msg.size = sizeof(msg.payload.u64);
+		send_vhost_message(connfd, &msg);
+		break;
+
 	default:
 		break;
 
diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.h b/lib/librte_vhost/vhost_user/vhost-net-user.h
index 4490d23..389d21d 100644
--- a/lib/librte_vhost/vhost_user/vhost-net-user.h
+++ b/lib/librte_vhost/vhost_user/vhost-net-user.h
@@ -65,6 +65,7 @@ typedef enum VhostUserRequest {
 	VHOST_USER_SET_VRING_ERR = 14,
 	VHOST_USER_GET_PROTOCOL_FEATURES = 15,
 	VHOST_USER_SET_PROTOCOL_FEATURES = 16,
+	VHOST_USER_GET_QUEUE_NUM = 17,
 	VHOST_USER_MAX
 } VhostUserRequest;
 
-- 
1.9.0

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [dpdk-dev] [PATCH v8 3/8] vhost: vring queue setup for multiple queue support
  2015-10-22 12:35 ` [dpdk-dev] [PATCH v8 0/8] vhost-user multiple queues enabling Yuanhan Liu
  2015-10-22 12:35   ` [dpdk-dev] [PATCH v8 1/8] vhost-user: add protocol features support Yuanhan Liu
  2015-10-22 12:35   ` [dpdk-dev] [PATCH v8 2/8] vhost-user: add VHOST_USER_GET_QUEUE_NUM message Yuanhan Liu
@ 2015-10-22 12:35   ` Yuanhan Liu
  2015-10-26  5:24     ` Tetsuya Mukawa
  2015-10-22 12:35   ` [dpdk-dev] [PATCH v8 4/8] vhost: rxtx: use queue id instead of constant ring index Yuanhan Liu
                     ` (5 subsequent siblings)
  8 siblings, 1 reply; 66+ messages in thread
From: Yuanhan Liu @ 2015-10-22 12:35 UTC (permalink / raw)
  To: dev; +Cc: marcel, Michael S. Tsirkin

All queue pairs, including the default (the first) queue pair,
are allocated dynamically, when a vring_call message is received
for the first time for a specific queue pair.

This is refactoring work for enabling vhost-user multiple queue;
it should not break anything as it makes no functional changes:
we don't support mq yet, so there is at most one queue pair.

This patch is based on Changchun's patch.

Signed-off-by: Ouyang Changchun <changchun.ouyang@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>

---

v8: - move the virtqueue field to the end of the `virtio_net' struct.

    - Add a FIXME at set_vring_call() for doing vring queue pair
      allocation.
---
 lib/librte_vhost/rte_virtio_net.h             |   3 +-
 lib/librte_vhost/vhost_user/virtio-net-user.c |  46 ++++----
 lib/librte_vhost/virtio-net.c                 | 156 ++++++++++++++++----------
 3 files changed, 123 insertions(+), 82 deletions(-)

diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h
index e3a21e5..9a32a95 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -96,7 +96,6 @@ struct vhost_virtqueue {
  * Device structure contains all configuration information relating to the device.
  */
 struct virtio_net {
-	struct vhost_virtqueue	*virtqueue[VIRTIO_QNUM];	/**< Contains all virtqueue information. */
 	struct virtio_memory	*mem;		/**< QEMU memory and memory region information. */
 	uint64_t		features;	/**< Negotiated feature set. */
 	uint64_t		protocol_features;	/**< Negotiated protocol feature set. */
@@ -104,7 +103,9 @@ struct virtio_net {
 	uint32_t		flags;		/**< Device flags. Only used to check if device is running on data core. */
 #define IF_NAME_SZ (PATH_MAX > IFNAMSIZ ? PATH_MAX : IFNAMSIZ)
 	char			ifname[IF_NAME_SZ];	/**< Name of the tap device or socket path. */
+	uint32_t		virt_qp_nb;	/**< number of queue pair we have allocated */
 	void			*priv;		/**< private context */
+	struct vhost_virtqueue	*virtqueue[VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX];	/**< Contains all virtqueue information. */
 } __rte_cache_aligned;
 
 /**
diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.c b/lib/librte_vhost/vhost_user/virtio-net-user.c
index 6da729d..d62f3d7 100644
--- a/lib/librte_vhost/vhost_user/virtio-net-user.c
+++ b/lib/librte_vhost/vhost_user/virtio-net-user.c
@@ -206,25 +206,33 @@ err_mmap:
 }
 
 static int
+vq_is_ready(struct vhost_virtqueue *vq)
+{
+	return vq && vq->desc   &&
+	       vq->kickfd != -1 &&
+	       vq->callfd != -1;
+}
+
+static int
 virtio_is_ready(struct virtio_net *dev)
 {
 	struct vhost_virtqueue *rvq, *tvq;
+	uint32_t i;
 
-	/* mq support in future.*/
-	rvq = dev->virtqueue[VIRTIO_RXQ];
-	tvq = dev->virtqueue[VIRTIO_TXQ];
-	if (rvq && tvq && rvq->desc && tvq->desc &&
-		(rvq->kickfd != -1) &&
-		(rvq->callfd != -1) &&
-		(tvq->kickfd != -1) &&
-		(tvq->callfd != -1)) {
-		RTE_LOG(INFO, VHOST_CONFIG,
-			"virtio is now ready for processing.\n");
-		return 1;
+	for (i = 0; i < dev->virt_qp_nb; i++) {
+		rvq = dev->virtqueue[i * VIRTIO_QNUM + VIRTIO_RXQ];
+		tvq = dev->virtqueue[i * VIRTIO_QNUM + VIRTIO_TXQ];
+
+		if (!vq_is_ready(rvq) || !vq_is_ready(tvq)) {
+			RTE_LOG(INFO, VHOST_CONFIG,
+				"virtio is not ready for processing.\n");
+			return 0;
+		}
 	}
+
 	RTE_LOG(INFO, VHOST_CONFIG,
-		"virtio isn't ready for processing.\n");
-	return 0;
+		"virtio is now ready for processing.\n");
+	return 1;
 }
 
 void
@@ -292,13 +300,13 @@ user_get_vring_base(struct vhost_device_ctx ctx,
 	 * sent and only sent in vhost_vring_stop.
 	 * TODO: cleanup the vring, it isn't usable since here.
 	 */
-	if ((dev->virtqueue[VIRTIO_RXQ]->kickfd) >= 0) {
-		close(dev->virtqueue[VIRTIO_RXQ]->kickfd);
-		dev->virtqueue[VIRTIO_RXQ]->kickfd = -1;
+	if ((dev->virtqueue[state->index]->kickfd + VIRTIO_RXQ) >= 0) {
+		close(dev->virtqueue[state->index + VIRTIO_RXQ]->kickfd);
+		dev->virtqueue[state->index + VIRTIO_RXQ]->kickfd = -1;
 	}
-	if ((dev->virtqueue[VIRTIO_TXQ]->kickfd) >= 0) {
-		close(dev->virtqueue[VIRTIO_TXQ]->kickfd);
-		dev->virtqueue[VIRTIO_TXQ]->kickfd = -1;
+	if ((dev->virtqueue[state->index]->kickfd + VIRTIO_TXQ) >= 0) {
+		close(dev->virtqueue[state->index + VIRTIO_TXQ]->kickfd);
+		dev->virtqueue[state->index + VIRTIO_TXQ]->kickfd = -1;
 	}
 
 	return 0;
diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index 830f22a..772f835 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -36,6 +36,7 @@
 #include <stddef.h>
 #include <stdint.h>
 #include <stdlib.h>
+#include <assert.h>
 #include <sys/mman.h>
 #include <unistd.h>
 #ifdef RTE_LIBRTE_VHOST_NUMA
@@ -178,6 +179,15 @@ add_config_ll_entry(struct virtio_net_config_ll *new_ll_dev)
 
 }
 
+static void
+cleanup_vq(struct vhost_virtqueue *vq)
+{
+	if (vq->callfd >= 0)
+		close(vq->callfd);
+	if (vq->kickfd >= 0)
+		close(vq->kickfd);
+}
+
 /*
  * Unmap any memory, close any file descriptors and
  * free any memory owned by a device.
@@ -185,6 +195,8 @@ add_config_ll_entry(struct virtio_net_config_ll *new_ll_dev)
 static void
 cleanup_device(struct virtio_net *dev)
 {
+	uint32_t i;
+
 	/* Unmap QEMU memory file if mapped. */
 	if (dev->mem) {
 		munmap((void *)(uintptr_t)dev->mem->mapped_address,
@@ -192,15 +204,10 @@ cleanup_device(struct virtio_net *dev)
 		free(dev->mem);
 	}
 
-	/* Close any event notifiers opened by device. */
-	if (dev->virtqueue[VIRTIO_RXQ]->callfd >= 0)
-		close(dev->virtqueue[VIRTIO_RXQ]->callfd);
-	if (dev->virtqueue[VIRTIO_RXQ]->kickfd >= 0)
-		close(dev->virtqueue[VIRTIO_RXQ]->kickfd);
-	if (dev->virtqueue[VIRTIO_TXQ]->callfd >= 0)
-		close(dev->virtqueue[VIRTIO_TXQ]->callfd);
-	if (dev->virtqueue[VIRTIO_TXQ]->kickfd >= 0)
-		close(dev->virtqueue[VIRTIO_TXQ]->kickfd);
+	for (i = 0; i < dev->virt_qp_nb; i++) {
+		cleanup_vq(dev->virtqueue[i * VIRTIO_QNUM + VIRTIO_RXQ]);
+		cleanup_vq(dev->virtqueue[i * VIRTIO_QNUM + VIRTIO_TXQ]);
+	}
 }
 
 /*
@@ -209,9 +216,11 @@ cleanup_device(struct virtio_net *dev)
 static void
 free_device(struct virtio_net_config_ll *ll_dev)
 {
-	/* Free any malloc'd memory */
-	rte_free(ll_dev->dev.virtqueue[VIRTIO_RXQ]);
-	rte_free(ll_dev->dev.virtqueue[VIRTIO_TXQ]);
+	uint32_t i;
+
+	for (i = 0; i < ll_dev->dev.virt_qp_nb; i++)
+		rte_free(ll_dev->dev.virtqueue[i * VIRTIO_QNUM]);
+
 	rte_free(ll_dev);
 }
 
@@ -244,34 +253,68 @@ rm_config_ll_entry(struct virtio_net_config_ll *ll_dev,
 	}
 }
 
+static void
+init_vring_queue(struct vhost_virtqueue *vq)
+{
+	memset(vq, 0, sizeof(struct vhost_virtqueue));
+
+	vq->kickfd = -1;
+	vq->callfd = -1;
+
+	/* Backends are set to -1 indicating an inactive device. */
+	vq->backend = -1;
+}
+
+static void
+init_vring_queue_pair(struct virtio_net *dev, uint32_t qp_idx)
+{
+	init_vring_queue(dev->virtqueue[qp_idx * VIRTIO_QNUM + VIRTIO_RXQ]);
+	init_vring_queue(dev->virtqueue[qp_idx * VIRTIO_QNUM + VIRTIO_TXQ]);
+}
+
+static int
+alloc_vring_queue_pair(struct virtio_net *dev, uint32_t qp_idx)
+{
+	struct vhost_virtqueue *virtqueue = NULL;
+	uint32_t virt_rx_q_idx = qp_idx * VIRTIO_QNUM + VIRTIO_RXQ;
+	uint32_t virt_tx_q_idx = qp_idx * VIRTIO_QNUM + VIRTIO_TXQ;
+
+	virtqueue = rte_malloc(NULL,
+			       sizeof(struct vhost_virtqueue) * VIRTIO_QNUM, 0);
+	if (virtqueue == NULL) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"Failed to allocate memory for virt qp:%d.\n", qp_idx);
+		return -1;
+	}
+
+	dev->virtqueue[virt_rx_q_idx] = virtqueue;
+	dev->virtqueue[virt_tx_q_idx] = virtqueue + VIRTIO_TXQ;
+
+	init_vring_queue_pair(dev, qp_idx);
+
+	dev->virt_qp_nb += 1;
+
+	return 0;
+}
+
 /*
  *  Initialise all variables in device structure.
  */
 static void
 init_device(struct virtio_net *dev)
 {
-	uint64_t vq_offset;
+	int vq_offset;
+	uint32_t i;
 
 	/*
 	 * Virtqueues have already been malloced so
 	 * we don't want to set them to NULL.
 	 */
-	vq_offset = offsetof(struct virtio_net, mem);
-
-	/* Set everything to 0. */
-	memset((void *)(uintptr_t)((uint64_t)(uintptr_t)dev + vq_offset), 0,
-		(sizeof(struct virtio_net) - (size_t)vq_offset));
-	memset(dev->virtqueue[VIRTIO_RXQ], 0, sizeof(struct vhost_virtqueue));
-	memset(dev->virtqueue[VIRTIO_TXQ], 0, sizeof(struct vhost_virtqueue));
-
-	dev->virtqueue[VIRTIO_RXQ]->kickfd = -1;
-	dev->virtqueue[VIRTIO_RXQ]->callfd = -1;
-	dev->virtqueue[VIRTIO_TXQ]->kickfd = -1;
-	dev->virtqueue[VIRTIO_TXQ]->callfd = -1;
+	vq_offset = offsetof(struct virtio_net, virtqueue);
+	memset(dev, 0, vq_offset);
 
-	/* Backends are set to -1 indicating an inactive device. */
-	dev->virtqueue[VIRTIO_RXQ]->backend = VIRTIO_DEV_STOPPED;
-	dev->virtqueue[VIRTIO_TXQ]->backend = VIRTIO_DEV_STOPPED;
+	for (i = 0; i < dev->virt_qp_nb; i++)
+		init_vring_queue_pair(dev, i);
 }
 
 /*
@@ -283,7 +326,6 @@ static int
 new_device(struct vhost_device_ctx ctx)
 {
 	struct virtio_net_config_ll *new_ll_dev;
-	struct vhost_virtqueue *virtqueue_rx, *virtqueue_tx;
 
 	/* Setup device and virtqueues. */
 	new_ll_dev = rte_malloc(NULL, sizeof(struct virtio_net_config_ll), 0);
@@ -294,28 +336,6 @@ new_device(struct vhost_device_ctx ctx)
 		return -1;
 	}
 
-	virtqueue_rx = rte_malloc(NULL, sizeof(struct vhost_virtqueue), 0);
-	if (virtqueue_rx == NULL) {
-		rte_free(new_ll_dev);
-		RTE_LOG(ERR, VHOST_CONFIG,
-			"(%"PRIu64") Failed to allocate memory for rxq.\n",
-			ctx.fh);
-		return -1;
-	}
-
-	virtqueue_tx = rte_malloc(NULL, sizeof(struct vhost_virtqueue), 0);
-	if (virtqueue_tx == NULL) {
-		rte_free(virtqueue_rx);
-		rte_free(new_ll_dev);
-		RTE_LOG(ERR, VHOST_CONFIG,
-			"(%"PRIu64") Failed to allocate memory for txq.\n",
-			ctx.fh);
-		return -1;
-	}
-
-	new_ll_dev->dev.virtqueue[VIRTIO_RXQ] = virtqueue_rx;
-	new_ll_dev->dev.virtqueue[VIRTIO_TXQ] = virtqueue_tx;
-
 	/* Initialise device and virtqueues. */
 	init_device(&new_ll_dev->dev);
 
@@ -441,6 +461,8 @@ static int
 set_features(struct vhost_device_ctx ctx, uint64_t *pu)
 {
 	struct virtio_net *dev;
+	uint16_t vhost_hlen;
+	uint16_t i;
 
 	dev = get_device(ctx);
 	if (dev == NULL)
@@ -448,27 +470,26 @@ set_features(struct vhost_device_ctx ctx, uint64_t *pu)
 	if (*pu & ~VHOST_FEATURES)
 		return -1;
 
-	/* Store the negotiated feature list for the device. */
 	dev->features = *pu;
-
-	/* Set the vhost_hlen depending on if VIRTIO_NET_F_MRG_RXBUF is set. */
 	if (dev->features & (1 << VIRTIO_NET_F_MRG_RXBUF)) {
 		LOG_DEBUG(VHOST_CONFIG,
 			"(%"PRIu64") Mergeable RX buffers enabled\n",
 			dev->device_fh);
-		dev->virtqueue[VIRTIO_RXQ]->vhost_hlen =
-			sizeof(struct virtio_net_hdr_mrg_rxbuf);
-		dev->virtqueue[VIRTIO_TXQ]->vhost_hlen =
-			sizeof(struct virtio_net_hdr_mrg_rxbuf);
+		vhost_hlen = sizeof(struct virtio_net_hdr_mrg_rxbuf);
 	} else {
 		LOG_DEBUG(VHOST_CONFIG,
 			"(%"PRIu64") Mergeable RX buffers disabled\n",
 			dev->device_fh);
-		dev->virtqueue[VIRTIO_RXQ]->vhost_hlen =
-			sizeof(struct virtio_net_hdr);
-		dev->virtqueue[VIRTIO_TXQ]->vhost_hlen =
-			sizeof(struct virtio_net_hdr);
+		vhost_hlen = sizeof(struct virtio_net_hdr);
+	}
+
+	for (i = 0; i < dev->virt_qp_nb; i++) {
+		uint16_t base_idx = i * VIRTIO_QNUM;
+
+		dev->virtqueue[base_idx + VIRTIO_RXQ]->vhost_hlen = vhost_hlen;
+		dev->virtqueue[base_idx + VIRTIO_TXQ]->vhost_hlen = vhost_hlen;
 	}
+
 	return 0;
 }
 
@@ -684,13 +705,24 @@ set_vring_call(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
 {
 	struct virtio_net *dev;
 	struct vhost_virtqueue *vq;
+	uint32_t cur_qp_idx = file->index / VIRTIO_QNUM;
 
 	dev = get_device(ctx);
 	if (dev == NULL)
 		return -1;
 
+	/*
+	 * FIXME: VHOST_SET_VRING_CALL is the first per-vring message
+	 * we get, so we do vring queue pair allocation here.
+	 */
+	if (cur_qp_idx + 1 > dev->virt_qp_nb) {
+		if (alloc_vring_queue_pair(dev, cur_qp_idx) < 0)
+			return -1;
+	}
+
 	/* file->index refers to the queue index. The txq is 1, rxq is 0. */
 	vq = dev->virtqueue[file->index];
+	assert(vq != NULL);
 
 	if (vq->callfd >= 0)
 		close(vq->callfd);
-- 
1.9.0

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [dpdk-dev] [PATCH v8 4/8] vhost: rxtx: use queue id instead of constant ring index
  2015-10-22 12:35 ` [dpdk-dev] [PATCH v8 0/8] vhost-user multiple queues enabling Yuanhan Liu
                     ` (2 preceding siblings ...)
  2015-10-22 12:35   ` [dpdk-dev] [PATCH v8 3/8] vhost: vring queue setup for multiple queue support Yuanhan Liu
@ 2015-10-22 12:35   ` Yuanhan Liu
  2015-10-22 12:35   ` [dpdk-dev] [PATCH v8 5/8] virtio: fix deadloop due to reading virtio_net_config incorrectly Yuanhan Liu
                     ` (4 subsequent siblings)
  8 siblings, 0 replies; 66+ messages in thread
From: Yuanhan Liu @ 2015-10-22 12:35 UTC (permalink / raw)
  To: dev; +Cc: Michael S. Tsirkin, marcel, Changchun Ouyang

From: Changchun Ouyang <changchun.ouyang@intel.com>

Do not use VIRTIO_RXQ or VIRTIO_TXQ anymore; use the queue_id
instead, which will be set to a proper value for a specific queue
when we have multiple queue support enabled.

For now, queue_id is still set with VIRTIO_RXQ or VIRTIO_TXQ,
so it should not break anything.

Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>

---

v8: simplify is_valid_virt_queue_idx()

v7: commit title fix
---
 lib/librte_vhost/vhost_rxtx.c | 43 +++++++++++++++++++++++++++++--------------
 1 file changed, 29 insertions(+), 14 deletions(-)

diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
index 7026bfa..1ec8850 100644
--- a/lib/librte_vhost/vhost_rxtx.c
+++ b/lib/librte_vhost/vhost_rxtx.c
@@ -32,6 +32,7 @@
  */
 
 #include <stdint.h>
+#include <stdbool.h>
 #include <linux/virtio_net.h>
 
 #include <rte_mbuf.h>
@@ -42,6 +43,12 @@
 
 #define MAX_PKT_BURST 32
 
+static bool
+is_valid_virt_queue_idx(uint32_t idx, int is_tx, uint32_t qp_nb)
+{
+	return (is_tx ^ (idx & 1)) == 0 && idx < qp_nb * VIRTIO_QNUM;
+}
+
 /**
  * This function adds buffers to the virtio devices RX virtqueue. Buffers can
  * be received from the physical port or from another virtio device. A packet
@@ -68,12 +75,14 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
 	uint8_t success = 0;
 
 	LOG_DEBUG(VHOST_DATA, "(%"PRIu64") virtio_dev_rx()\n", dev->device_fh);
-	if (unlikely(queue_id != VIRTIO_RXQ)) {
-		LOG_DEBUG(VHOST_DATA, "mq isn't supported in this version.\n");
+	if (unlikely(!is_valid_virt_queue_idx(queue_id, 0, dev->virt_qp_nb))) {
+		RTE_LOG(ERR, VHOST_DATA,
+			"%s (%"PRIu64"): virtqueue idx:%d invalid.\n",
+			__func__, dev->device_fh, queue_id);
 		return 0;
 	}
 
-	vq = dev->virtqueue[VIRTIO_RXQ];
+	vq = dev->virtqueue[queue_id];
 	count = (count > MAX_PKT_BURST) ? MAX_PKT_BURST : count;
 
 	/*
@@ -235,8 +244,9 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
 }
 
 static inline uint32_t __attribute__((always_inline))
-copy_from_mbuf_to_vring(struct virtio_net *dev, uint16_t res_base_idx,
-	uint16_t res_end_idx, struct rte_mbuf *pkt)
+copy_from_mbuf_to_vring(struct virtio_net *dev, uint32_t queue_id,
+			uint16_t res_base_idx, uint16_t res_end_idx,
+			struct rte_mbuf *pkt)
 {
 	uint32_t vec_idx = 0;
 	uint32_t entry_success = 0;
@@ -264,7 +274,7 @@ copy_from_mbuf_to_vring(struct virtio_net *dev, uint16_t res_base_idx,
 	 * Convert from gpa to vva
 	 * (guest physical addr -> vhost virtual addr)
 	 */
-	vq = dev->virtqueue[VIRTIO_RXQ];
+	vq = dev->virtqueue[queue_id];
 	vb_addr = gpa_to_vva(dev, vq->buf_vec[vec_idx].buf_addr);
 	vb_hdr_addr = vb_addr;
 
@@ -464,11 +474,14 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id,
 
 	LOG_DEBUG(VHOST_DATA, "(%"PRIu64") virtio_dev_merge_rx()\n",
 		dev->device_fh);
-	if (unlikely(queue_id != VIRTIO_RXQ)) {
-		LOG_DEBUG(VHOST_DATA, "mq isn't supported in this version.\n");
+	if (unlikely(!is_valid_virt_queue_idx(queue_id, 0, dev->virt_qp_nb))) {
+		RTE_LOG(ERR, VHOST_DATA,
+			"%s (%"PRIu64"): virtqueue idx:%d invalid.\n",
+			__func__, dev->device_fh, queue_id);
+		return 0;
 	}
 
-	vq = dev->virtqueue[VIRTIO_RXQ];
+	vq = dev->virtqueue[queue_id];
 	count = RTE_MIN((uint32_t)MAX_PKT_BURST, count);
 
 	if (count == 0)
@@ -509,8 +522,8 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id,
 							res_cur_idx);
 		} while (success == 0);
 
-		entry_success = copy_from_mbuf_to_vring(dev, res_base_idx,
-			res_cur_idx, pkts[pkt_idx]);
+		entry_success = copy_from_mbuf_to_vring(dev, queue_id,
+			res_base_idx, res_cur_idx, pkts[pkt_idx]);
 
 		rte_compiler_barrier();
 
@@ -562,12 +575,14 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id,
 	uint16_t free_entries, entry_success = 0;
 	uint16_t avail_idx;
 
-	if (unlikely(queue_id != VIRTIO_TXQ)) {
-		LOG_DEBUG(VHOST_DATA, "mq isn't supported in this version.\n");
+	if (unlikely(!is_valid_virt_queue_idx(queue_id, 1, dev->virt_qp_nb))) {
+		RTE_LOG(ERR, VHOST_DATA,
+			"%s (%"PRIu64"): virtqueue idx:%d invalid.\n",
+			__func__, dev->device_fh, queue_id);
 		return 0;
 	}
 
-	vq = dev->virtqueue[VIRTIO_TXQ];
+	vq = dev->virtqueue[queue_id];
 	avail_idx =  *((volatile uint16_t *)&vq->avail->idx);
 
 	/* If there are no available buffers then return. */
-- 
1.9.0

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [dpdk-dev] [PATCH v8 5/8] virtio: fix deadloop due to reading virtio_net_config incorrectly
  2015-10-22 12:35 ` [dpdk-dev] [PATCH v8 0/8] vhost-user multiple queues enabling Yuanhan Liu
                     ` (3 preceding siblings ...)
  2015-10-22 12:35   ` [dpdk-dev] [PATCH v8 4/8] vhost: rxtx: use queue id instead of constant ring index Yuanhan Liu
@ 2015-10-22 12:35   ` Yuanhan Liu
  2015-10-22 12:35   ` [dpdk-dev] [PATCH v8 6/8] vhost: add VHOST_USER_SET_VRING_ENABLE message Yuanhan Liu
                     ` (3 subsequent siblings)
  8 siblings, 0 replies; 66+ messages in thread
From: Yuanhan Liu @ 2015-10-22 12:35 UTC (permalink / raw)
  To: dev; +Cc: Michael S. Tsirkin, marcel, Changchun Ouyang

From: Changchun Ouyang <changchun.ouyang@intel.com>

The old code adjusts the number of config bytes to read depending on
which features are present, but it later casts the entire buffer it
read to "struct virtio_net_config", which is obviously wrong.

The wrong config read results in a dead loop at virtio_send_command()
while starting testpmd.

The right way is to read the related config bytes only when the
corresponding feature is set, which is exactly what this patch does.

Fixes: 823ad647950a ("virtio: support multiple queues")

Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>

---

v7: commit log fixes

v6: read mac unconditionally.
---
 drivers/net/virtio/virtio_ethdev.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index 02f698a..12fcc23 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -1162,7 +1162,6 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
 	struct virtio_hw *hw = eth_dev->data->dev_private;
 	struct virtio_net_config *config;
 	struct virtio_net_config local_config;
-	uint32_t offset_conf = sizeof(config->mac);
 	struct rte_pci_device *pci_dev;
 
 	RTE_BUILD_BUG_ON(RTE_PKTMBUF_HEADROOM < sizeof(struct virtio_net_hdr));
@@ -1225,8 +1224,14 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
 	if (vtpci_with_feature(hw, VIRTIO_NET_F_CTRL_VQ)) {
 		config = &local_config;
 
+		vtpci_read_dev_config(hw,
+			offsetof(struct virtio_net_config, mac),
+			&config->mac, sizeof(config->mac));
+
 		if (vtpci_with_feature(hw, VIRTIO_NET_F_STATUS)) {
-			offset_conf += sizeof(config->status);
+			vtpci_read_dev_config(hw,
+				offsetof(struct virtio_net_config, status),
+				&config->status, sizeof(config->status));
 		} else {
 			PMD_INIT_LOG(DEBUG,
 				     "VIRTIO_NET_F_STATUS is not supported");
@@ -1234,15 +1239,16 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
 		}
 
 		if (vtpci_with_feature(hw, VIRTIO_NET_F_MQ)) {
-			offset_conf += sizeof(config->max_virtqueue_pairs);
+			vtpci_read_dev_config(hw,
+				offsetof(struct virtio_net_config, max_virtqueue_pairs),
+				&config->max_virtqueue_pairs,
+				sizeof(config->max_virtqueue_pairs));
 		} else {
 			PMD_INIT_LOG(DEBUG,
 				     "VIRTIO_NET_F_MQ is not supported");
 			config->max_virtqueue_pairs = 1;
 		}
 
-		vtpci_read_dev_config(hw, 0, (uint8_t *)config, offset_conf);
-
 		hw->max_rx_queues =
 			(VIRTIO_MAX_RX_QUEUES < config->max_virtqueue_pairs) ?
 			VIRTIO_MAX_RX_QUEUES : config->max_virtqueue_pairs;
-- 
1.9.0

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [dpdk-dev] [PATCH v8 6/8] vhost: add VHOST_USER_SET_VRING_ENABLE message
  2015-10-22 12:35 ` [dpdk-dev] [PATCH v8 0/8] vhost-user multiple queues enabling Yuanhan Liu
                     ` (4 preceding siblings ...)
  2015-10-22 12:35   ` [dpdk-dev] [PATCH v8 5/8] virtio: fix deadloop due to reading virtio_net_config incorrectly Yuanhan Liu
@ 2015-10-22 12:35   ` Yuanhan Liu
  2015-10-22 12:35   ` [dpdk-dev] [PATCH v8 7/8] vhost-user: enable vhost-user multiple queue Yuanhan Liu
                     ` (2 subsequent siblings)
  8 siblings, 0 replies; 66+ messages in thread
From: Yuanhan Liu @ 2015-10-22 12:35 UTC (permalink / raw)
  To: dev; +Cc: Michael S. Tsirkin, marcel, Changchun Ouyang

From: Changchun Ouyang <changchun.ouyang@intel.com>

This message is used to enable/disable a specific vring queue pair.
The first queue pair is enabled by default.

Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>

---
v7: invoke vring_state_changed() callback once for each queue pair.

v6: add a vring state changed callback to inform the application
    that a specific vring has been enabled/disabled. You could either
    flush packets that haven't been processed yet, or simply drop them.
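
A minimal sketch of how an application could hook this new callback
(the ops struct and the callback signature are the ones added by this
patch; new_device()/destroy_device() stand for the application's
existing handlers, and the log message is just illustrative):

#include <inttypes.h>
#include <rte_log.h>
#include <rte_virtio_net.h>

/* the application's existing add/remove handlers */
extern int  new_device(struct virtio_net *dev);
extern void destroy_device(volatile struct virtio_net *dev);

static int
vring_state_changed(struct virtio_net *dev, uint16_t queue_id, int enable)
{
	RTE_LOG(INFO, USER1, "(%"PRIu64") queue pair %u %s\n",
		dev->device_fh, queue_id,
		enable ? "enabled" : "disabled");

	/* e.g. flush or drop buffered packets, stop polling this pair */
	return 0;
}

static const struct virtio_net_device_ops virtio_net_ops = {
	.new_device          = new_device,
	.destroy_device      = destroy_device,
	.vring_state_changed = vring_state_changed,
};

/* registered once at startup, as before:
 *	rte_vhost_driver_callback_register(&virtio_net_ops);
 */
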
---
 lib/librte_vhost/rte_virtio_net.h             |  9 ++++++++-
 lib/librte_vhost/vhost_rxtx.c                 | 10 ++++++++++
 lib/librte_vhost/vhost_user/vhost-net-user.c  |  5 +++++
 lib/librte_vhost/vhost_user/vhost-net-user.h  |  1 +
 lib/librte_vhost/vhost_user/virtio-net-user.c | 27 +++++++++++++++++++++++++++
 lib/librte_vhost/vhost_user/virtio-net-user.h |  3 +++
 lib/librte_vhost/virtio-net.c                 | 12 +++++++++---
 7 files changed, 63 insertions(+), 4 deletions(-)

diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h
index 9a32a95..426a70d 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -89,6 +89,7 @@ struct vhost_virtqueue {
 	volatile uint16_t	last_used_idx_res;	/**< Used for multiple devices reserving buffers. */
 	int			callfd;			/**< Used to notify the guest (trigger interrupt). */
 	int			kickfd;			/**< Currently unused as polling mode is enabled. */
+	int			enabled;
 	struct buf_vector	buf_vec[BUF_VECTOR_MAX];	/**< for scatter RX. */
 } __rte_cache_aligned;
 
@@ -132,7 +133,7 @@ struct virtio_memory {
 };
 
 /**
- * Device operations to add/remove device.
+ * Device and vring operations.
  *
  * Make sure to set VIRTIO_DEV_RUNNING to the device flags in new_device and
  * remove it in destroy_device.
@@ -141,12 +142,18 @@ struct virtio_memory {
 struct virtio_net_device_ops {
 	int (*new_device)(struct virtio_net *);	/**< Add device. */
 	void (*destroy_device)(volatile struct virtio_net *);	/**< Remove device. */
+
+	int (*vring_state_changed)(struct virtio_net *dev, uint16_t queue_id, int enable);	/**< triggered when a vring is enabled or disabled */
 };
 
 static inline uint16_t __attribute__((always_inline))
 rte_vring_available_entries(struct virtio_net *dev, uint16_t queue_id)
 {
 	struct vhost_virtqueue *vq = dev->virtqueue[queue_id];
+
+	if (!vq->enabled)
+		return 0;
+
 	return *(volatile uint16_t *)&vq->avail->idx - vq->last_used_idx_res;
 }
 
diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
index 1ec8850..9322ce6 100644
--- a/lib/librte_vhost/vhost_rxtx.c
+++ b/lib/librte_vhost/vhost_rxtx.c
@@ -83,6 +83,9 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
 	}
 
 	vq = dev->virtqueue[queue_id];
+	if (unlikely(vq->enabled == 0))
+		return 0;
+
 	count = (count > MAX_PKT_BURST) ? MAX_PKT_BURST : count;
 
 	/*
@@ -275,6 +278,7 @@ copy_from_mbuf_to_vring(struct virtio_net *dev, uint32_t queue_id,
 	 * (guest physical addr -> vhost virtual addr)
 	 */
 	vq = dev->virtqueue[queue_id];
+
 	vb_addr = gpa_to_vva(dev, vq->buf_vec[vec_idx].buf_addr);
 	vb_hdr_addr = vb_addr;
 
@@ -482,6 +486,9 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id,
 	}
 
 	vq = dev->virtqueue[queue_id];
+	if (unlikely(vq->enabled == 0))
+		return 0;
+
 	count = RTE_MIN((uint32_t)MAX_PKT_BURST, count);
 
 	if (count == 0)
@@ -583,6 +590,9 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id,
 	}
 
 	vq = dev->virtqueue[queue_id];
+	if (unlikely(vq->enabled == 0))
+		return 0;
+
 	avail_idx =  *((volatile uint16_t *)&vq->avail->idx);
 
 	/* If there are no available buffers then return. */
diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.c b/lib/librte_vhost/vhost_user/vhost-net-user.c
index 8675cd4..f681676 100644
--- a/lib/librte_vhost/vhost_user/vhost-net-user.c
+++ b/lib/librte_vhost/vhost_user/vhost-net-user.c
@@ -99,6 +99,7 @@ static const char *vhost_message_str[VHOST_USER_MAX] = {
 	[VHOST_USER_GET_PROTOCOL_FEATURES]  = "VHOST_USER_GET_PROTOCOL_FEATURES",
 	[VHOST_USER_SET_PROTOCOL_FEATURES]  = "VHOST_USER_SET_PROTOCOL_FEATURES",
 	[VHOST_USER_GET_QUEUE_NUM]  = "VHOST_USER_GET_QUEUE_NUM",
+	[VHOST_USER_SET_VRING_ENABLE]  = "VHOST_USER_SET_VRING_ENABLE",
 };
 
 /**
@@ -428,6 +429,10 @@ vserver_message_handler(int connfd, void *dat, int *remove)
 		send_vhost_message(connfd, &msg);
 		break;
 
+	case VHOST_USER_SET_VRING_ENABLE:
+		user_set_vring_enable(ctx, &msg.payload.state);
+		break;
+
 	default:
 		break;
 
diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.h b/lib/librte_vhost/vhost_user/vhost-net-user.h
index 389d21d..38637cc 100644
--- a/lib/librte_vhost/vhost_user/vhost-net-user.h
+++ b/lib/librte_vhost/vhost_user/vhost-net-user.h
@@ -66,6 +66,7 @@ typedef enum VhostUserRequest {
 	VHOST_USER_GET_PROTOCOL_FEATURES = 15,
 	VHOST_USER_SET_PROTOCOL_FEATURES = 16,
 	VHOST_USER_GET_QUEUE_NUM = 17,
+	VHOST_USER_SET_VRING_ENABLE = 18,
 	VHOST_USER_MAX
 } VhostUserRequest;
 
diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.c b/lib/librte_vhost/vhost_user/virtio-net-user.c
index d62f3d7..6ebbd4f 100644
--- a/lib/librte_vhost/vhost_user/virtio-net-user.c
+++ b/lib/librte_vhost/vhost_user/virtio-net-user.c
@@ -312,6 +312,33 @@ user_get_vring_base(struct vhost_device_ctx ctx,
 	return 0;
 }
 
+/*
+ * when virtio queues are ready to work, qemu will send us to
+ * enable the virtio queue pair.
+ */
+int
+user_set_vring_enable(struct vhost_device_ctx ctx,
+		      struct vhost_vring_state *state)
+{
+	struct virtio_net *dev = get_device(ctx);
+	uint16_t base_idx = state->index;
+	int enable = (int)state->num;
+
+	RTE_LOG(INFO, VHOST_CONFIG,
+		"set queue enable: %d to qp idx: %d\n",
+		enable, state->index);
+
+	if (notify_ops->vring_state_changed) {
+		notify_ops->vring_state_changed(dev, base_idx / VIRTIO_QNUM,
+						enable);
+	}
+
+	dev->virtqueue[base_idx + VIRTIO_RXQ]->enabled = enable;
+	dev->virtqueue[base_idx + VIRTIO_TXQ]->enabled = enable;
+
+	return 0;
+}
+
 void
 user_destroy_device(struct vhost_device_ctx ctx)
 {
diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.h b/lib/librte_vhost/vhost_user/virtio-net-user.h
index e7a6ff4..d46057e 100644
--- a/lib/librte_vhost/vhost_user/virtio-net-user.h
+++ b/lib/librte_vhost/vhost_user/virtio-net-user.h
@@ -50,5 +50,8 @@ void user_set_protocol_features(struct vhost_device_ctx ctx,
 
 int user_get_vring_base(struct vhost_device_ctx, struct vhost_vring_state *);
 
+int user_set_vring_enable(struct vhost_device_ctx ctx,
+			  struct vhost_vring_state *state);
+
 void user_destroy_device(struct vhost_device_ctx);
 #endif
diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index 772f835..48629d0 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -254,7 +254,7 @@ rm_config_ll_entry(struct virtio_net_config_ll *ll_dev,
 }
 
 static void
-init_vring_queue(struct vhost_virtqueue *vq)
+init_vring_queue(struct vhost_virtqueue *vq, int qp_idx)
 {
 	memset(vq, 0, sizeof(struct vhost_virtqueue));
 
@@ -263,13 +263,19 @@ init_vring_queue(struct vhost_virtqueue *vq)
 
 	/* Backends are set to -1 indicating an inactive device. */
 	vq->backend = -1;
+
+	/* always set the default vq pair to enabled */
+	if (qp_idx == 0)
+		vq->enabled = 1;
 }
 
 static void
 init_vring_queue_pair(struct virtio_net *dev, uint32_t qp_idx)
 {
-	init_vring_queue(dev->virtqueue[qp_idx * VIRTIO_QNUM + VIRTIO_RXQ]);
-	init_vring_queue(dev->virtqueue[qp_idx * VIRTIO_QNUM + VIRTIO_TXQ]);
+	uint32_t base_idx = qp_idx * VIRTIO_QNUM;
+
+	init_vring_queue(dev->virtqueue[base_idx + VIRTIO_RXQ], qp_idx);
+	init_vring_queue(dev->virtqueue[base_idx + VIRTIO_TXQ], qp_idx);
 }
 
 static int
-- 
1.9.0

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [dpdk-dev] [PATCH v8 7/8] vhost-user: enable vhost-user multiple queue
  2015-10-22 12:35 ` [dpdk-dev] [PATCH v8 0/8] vhost-user multiple queues enabling Yuanhan Liu
                     ` (5 preceding siblings ...)
  2015-10-22 12:35   ` [dpdk-dev] [PATCH v8 6/8] vhost: add VHOST_USER_SET_VRING_ENABLE message Yuanhan Liu
@ 2015-10-22 12:35   ` Yuanhan Liu
  2015-10-22 12:35   ` [dpdk-dev] [PATCH v8 8/8] doc: update release note for vhost-user mq support Yuanhan Liu
  2015-10-26  1:36   ` [dpdk-dev] [PATCH v8 0/8] vhost-user multiple queues enabling Xie, Huawei
  8 siblings, 0 replies; 66+ messages in thread
From: Yuanhan Liu @ 2015-10-22 12:35 UTC (permalink / raw)
  To: dev; +Cc: marcel, Michael S. Tsirkin

This is done by setting the VHOST_USER_PROTOCOL_F_MQ protocol feature
bit and the VIRTIO_NET_F_MQ feature bit.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
---
 lib/librte_vhost/vhost_user/virtio-net-user.h | 4 +++-
 lib/librte_vhost/virtio-net.c                 | 1 +
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.h b/lib/librte_vhost/vhost_user/virtio-net-user.h
index d46057e..b82108d 100644
--- a/lib/librte_vhost/vhost_user/virtio-net-user.h
+++ b/lib/librte_vhost/vhost_user/virtio-net-user.h
@@ -37,7 +37,9 @@
 #include "vhost-net.h"
 #include "vhost-net-user.h"
 
-#define VHOST_USER_PROTOCOL_FEATURES	0ULL
+#define VHOST_USER_PROTOCOL_F_MQ	0
+
+#define VHOST_USER_PROTOCOL_FEATURES	(1ULL << VHOST_USER_PROTOCOL_F_MQ)
 
 int user_set_mem_table(struct vhost_device_ctx, struct VhostUserMsg *);
 
diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index 48629d0..97213c5 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -74,6 +74,7 @@ static struct virtio_net_config_ll *ll_root;
 #define VHOST_SUPPORTED_FEATURES ((1ULL << VIRTIO_NET_F_MRG_RXBUF) | \
 				(1ULL << VIRTIO_NET_F_CTRL_VQ) | \
 				(1ULL << VIRTIO_NET_F_CTRL_RX) | \
+				(1ULL << VIRTIO_NET_F_MQ)      | \
 				(1ULL << VHOST_F_LOG_ALL)      | \
 				(1ULL << VHOST_USER_F_PROTOCOL_FEATURES))
 static uint64_t VHOST_FEATURES = VHOST_SUPPORTED_FEATURES;
-- 
1.9.0

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [dpdk-dev] [PATCH v8 8/8] doc: update release note for vhost-user mq support
  2015-10-22 12:35 ` [dpdk-dev] [PATCH v8 0/8] vhost-user multiple queues enabling Yuanhan Liu
                     ` (6 preceding siblings ...)
  2015-10-22 12:35   ` [dpdk-dev] [PATCH v8 7/8] vhost-user: enable vhost-user multiple queue Yuanhan Liu
@ 2015-10-22 12:35   ` Yuanhan Liu
  2015-10-26 20:22     ` Thomas Monjalon
  2015-10-26  1:36   ` [dpdk-dev] [PATCH v8 0/8] vhost-user multiple queues enabling Xie, Huawei
  8 siblings, 1 reply; 66+ messages in thread
From: Yuanhan Liu @ 2015-10-22 12:35 UTC (permalink / raw)
  To: dev; +Cc: marcel, Michael S. Tsirkin

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
---
 doc/guides/rel_notes/release_2_2.rst | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/doc/guides/rel_notes/release_2_2.rst b/doc/guides/rel_notes/release_2_2.rst
index 4f75cff..612ddd9 100644
--- a/doc/guides/rel_notes/release_2_2.rst
+++ b/doc/guides/rel_notes/release_2_2.rst
@@ -9,6 +9,10 @@ New Features
   *  Added support for Jumbo Frames.
   *  Optimize forwarding performance for Chelsio T5 40GbE cards.
 
+* **vhost: added vhost-user mulitple queue support.**
+
+  Added vhost-user multiple queue support.
+
 
 Resolved Issues
 ---------------
-- 
1.9.0

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [dpdk-dev] [PATCH v7 4/8] vhost: rxtx: use queue id instead of constant ring index
  2015-10-22 11:32           ` Michael S. Tsirkin
@ 2015-10-22 14:07             ` Yuanhan Liu
  2015-10-22 14:19               ` Michael S. Tsirkin
  2015-10-24  2:34             ` Flavio Leitner
  1 sibling, 1 reply; 66+ messages in thread
From: Yuanhan Liu @ 2015-10-22 14:07 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: dev, marcel

On Thu, Oct 22, 2015 at 02:32:31PM +0300, Michael S. Tsirkin wrote:
> On Thu, Oct 22, 2015 at 05:49:55PM +0800, Yuanhan Liu wrote:
> > On Wed, Oct 21, 2015 at 05:26:18PM +0300, Michael S. Tsirkin wrote:
> > > On Wed, Oct 21, 2015 at 08:48:15PM +0800, Yuanhan Liu wrote:
> > > > > Please note that for virtio devices, guest is supposed to
> > > > > control the placement of incoming packets in RX queues.
> > > > 
> > > > I may not follow you.
> > > > 
> > > > Enqueuing packets to a RX queue is done at vhost lib, outside the
> > > > guest, how could the guest take the control here?
> > > > 
> > > > 	--yliu
> > > 
> > > vhost should do what guest told it to.
> > > 
> > > See virtio spec:
> > > 	5.1.6.5.5 Automatic receive steering in multiqueue mode
> > 
> > Spec says:
> > 
> >     After the driver transmitted a packet of a flow on transmitqX,
> >     the device SHOULD cause incoming packets for that flow to be
> >     steered to receiveqX.
> > 
> > 
> > Michael, I still have no idea how vhost could know the flow even
> > after discussion with Huawei. Could you be more specific about
> > this? Say, how could guest know that? And how could guest tell
> > vhost which RX is gonna to use?
> > 
> > Thanks.
> > 
> > 	--yliu
> 
> I don't really understand the question.
> 
> When guests transmits a packet, it makes a decision
> about the flow to use, and maps that to a tx/rx pair of queues.
> 
> It sends packets out on the tx queue and expects device to
> return packets from the same flow on the rx queue.
> 
> During transmit, device needs to figure out the flow
> of packets as they are received from guest, and track
> which flows go on which tx queue.
> When it selects the rx queue, it has to use the same table.

Thanks for the length explanation, Michael!

I guess the key is are we able to get the table inside vhost-user
lib? And, are you looking for something like following?

	static int rte_vhost_enqueue_burst(pkts)
	{
		for_each_pkts(pkt) {
			int rxq = get_rxq_from_table(pkt);
	
			queue_to_rxq(pkt, rxq);
		}
	}

BTW, there should be such implementation at some where, right?
If so, would you please point it to me?

In the meanwhile, I will read more doc/code to try to understand
it.

	--yliu

> 
> There is currently no provision for controlling
> steering for uni-directional
> flows which are possible e.g. with UDP.
> 
> We might solve this in a future spec - for example, set a flag notifying
> guest that steering information is missing for a given flow, for example
> by setting a flag in a packet, or using the command queue, and have
> guest send a dummy empty packet to set steering rule for this flow.
> 
> 
> -- 
> MST

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [dpdk-dev] [PATCH v7 4/8] vhost: rxtx: use queue id instead of constant ring index
  2015-10-22 14:07             ` Yuanhan Liu
@ 2015-10-22 14:19               ` Michael S. Tsirkin
  2015-10-23  8:02                 ` Yuanhan Liu
  0 siblings, 1 reply; 66+ messages in thread
From: Michael S. Tsirkin @ 2015-10-22 14:19 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: dev, marcel

On Thu, Oct 22, 2015 at 10:07:10PM +0800, Yuanhan Liu wrote:
> On Thu, Oct 22, 2015 at 02:32:31PM +0300, Michael S. Tsirkin wrote:
> > On Thu, Oct 22, 2015 at 05:49:55PM +0800, Yuanhan Liu wrote:
> > > On Wed, Oct 21, 2015 at 05:26:18PM +0300, Michael S. Tsirkin wrote:
> > > > On Wed, Oct 21, 2015 at 08:48:15PM +0800, Yuanhan Liu wrote:
> > > > > > Please note that for virtio devices, guest is supposed to
> > > > > > control the placement of incoming packets in RX queues.
> > > > > 
> > > > > I may not follow you.
> > > > > 
> > > > > Enqueuing packets to a RX queue is done at vhost lib, outside the
> > > > > guest, how could the guest take the control here?
> > > > > 
> > > > > 	--yliu
> > > > 
> > > > vhost should do what guest told it to.
> > > > 
> > > > See virtio spec:
> > > > 	5.1.6.5.5 Automatic receive steering in multiqueue mode
> > > 
> > > Spec says:
> > > 
> > >     After the driver transmitted a packet of a flow on transmitqX,
> > >     the device SHOULD cause incoming packets for that flow to be
> > >     steered to receiveqX.
> > > 
> > > 
> > > Michael, I still have no idea how vhost could know the flow even
> > > after discussion with Huawei. Could you be more specific about
> > > this? Say, how could guest know that? And how could guest tell
> > > vhost which RX is gonna to use?
> > > 
> > > Thanks.
> > > 
> > > 	--yliu
> > 
> > I don't really understand the question.
> > 
> > When guests transmits a packet, it makes a decision
> > about the flow to use, and maps that to a tx/rx pair of queues.
> > 
> > It sends packets out on the tx queue and expects device to
> > return packets from the same flow on the rx queue.
> > 
> > During transmit, device needs to figure out the flow
> > of packets as they are received from guest, and track
> > which flows go on which tx queue.
> > When it selects the rx queue, it has to use the same table.
> 
> Thanks for the length explanation, Michael!
> 
> I guess the key is are we able to get the table inside vhost-user
> lib? And, are you looking for something like following?
> 
> 	static int rte_vhost_enqueue_burst(pkts)
> 	{
> 		for_each_pkts(pkt) {
> 			int rxq = get_rxq_from_table(pkt);
> 	
> 			queue_to_rxq(pkt, rxq);
> 		}
> 	}
> 
> BTW, there should be such implementation at some where, right?
> If so, would you please point it to me?

See tun_flow_update in drivers/net/tun.c in Linux.


> In the meanwhile, I will read more doc/code to try to understand
> it.
> 
> 	--yliu
> 
> > 
> > There is currently no provision for controlling
> > steering for uni-directional
> > flows which are possible e.g. with UDP.
> > 
> > We might solve this in a future spec - for example, set a flag notifying
> > guest that steering information is missing for a given flow, for example
> > by setting a flag in a packet, or using the command queue, and have
> > guest send a dummy empty packet to set steering rule for this flow.
> > 
> > 
> > -- 
> > MST

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [dpdk-dev] [PATCH v7 4/8] vhost: rxtx: use queue id instead of constant ring index
  2015-10-22 14:19               ` Michael S. Tsirkin
@ 2015-10-23  8:02                 ` Yuanhan Liu
  0 siblings, 0 replies; 66+ messages in thread
From: Yuanhan Liu @ 2015-10-23  8:02 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: dev, marcel

On Thu, Oct 22, 2015 at 05:19:01PM +0300, Michael S. Tsirkin wrote:
> On Thu, Oct 22, 2015 at 10:07:10PM +0800, Yuanhan Liu wrote:
> > On Thu, Oct 22, 2015 at 02:32:31PM +0300, Michael S. Tsirkin wrote:
> > > On Thu, Oct 22, 2015 at 05:49:55PM +0800, Yuanhan Liu wrote:
> > > > On Wed, Oct 21, 2015 at 05:26:18PM +0300, Michael S. Tsirkin wrote:
> > > > > On Wed, Oct 21, 2015 at 08:48:15PM +0800, Yuanhan Liu wrote:
> > > > > > > Please note that for virtio devices, guest is supposed to
> > > > > > > control the placement of incoming packets in RX queues.
> > > > > > 
> > > > > > I may not follow you.
> > > > > > 
> > > > > > Enqueuing packets to a RX queue is done at vhost lib, outside the
> > > > > > guest, how could the guest take the control here?
> > > > > > 
> > > > > > 	--yliu
> > > > > 
> > > > > vhost should do what guest told it to.
> > > > > 
> > > > > See virtio spec:
> > > > > 	5.1.6.5.5 Automatic receive steering in multiqueue mode
> > > > 
> > > > Spec says:
> > > > 
> > > >     After the driver transmitted a packet of a flow on transmitqX,
> > > >     the device SHOULD cause incoming packets for that flow to be
> > > >     steered to receiveqX.
> > > > 
> > > > 
> > > > Michael, I still have no idea how vhost could know the flow even
> > > > after discussion with Huawei. Could you be more specific about
> > > > this? Say, how could guest know that? And how could guest tell
> > > > vhost which RX is gonna to use?
> > > > 
> > > > Thanks.
> > > > 
> > > > 	--yliu
> > > 
> > > I don't really understand the question.
> > > 
> > > When guests transmits a packet, it makes a decision
> > > about the flow to use, and maps that to a tx/rx pair of queues.
> > > 
> > > It sends packets out on the tx queue and expects device to
> > > return packets from the same flow on the rx queue.
> > > 
> > > During transmit, device needs to figure out the flow
> > > of packets as they are received from guest, and track
> > > which flows go on which tx queue.
> > > When it selects the rx queue, it has to use the same table.
> > 
> > Thanks for the length explanation, Michael!
> > 
> > I guess the key is are we able to get the table inside vhost-user
> > lib? And, are you looking for something like following?
> > 
> > 	static int rte_vhost_enqueue_burst(pkts)
> > 	{
> > 		for_each_pkts(pkt) {
> > 			int rxq = get_rxq_from_table(pkt);
> > 	
> > 			queue_to_rxq(pkt, rxq);
> > 		}
> > 	}
> > 
> > BTW, there should be such implementation at some where, right?
> > If so, would you please point it to me?
> 
> See tun_flow_update in drivers/net/tun.c in Linux.

Thanks.

We had a discussion today, and we need to implement that. However, the
v2.2 merge window is pretty near its end, so it's unlikely we could
make it in this release. We may add it in v2.3.


	--yliu
> 
> 
> > In the meanwhile, I will read more doc/code to try to understand
> > it.
> > 
> > 	--yliu
> > 
> > > 
> > > There is currently no provision for controlling
> > > steering for uni-directional
> > > flows which are possible e.g. with UDP.
> > > 
> > > We might solve this in a future spec - for example, set a flag notifying
> > > guest that steering information is missing for a given flow, for example
> > > by setting a flag in a packet, or using the command queue, and have
> > > guest send a dummy empty packet to set steering rule for this flow.
> > > 
> > > 
> > > -- 
> > > MST

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [dpdk-dev] [PATCH v7 4/8] vhost: rxtx: use queue id instead of constant ring index
  2015-10-22 11:32           ` Michael S. Tsirkin
  2015-10-22 14:07             ` Yuanhan Liu
@ 2015-10-24  2:34             ` Flavio Leitner
  2015-10-24 17:47               ` Michael S. Tsirkin
  1 sibling, 1 reply; 66+ messages in thread
From: Flavio Leitner @ 2015-10-24  2:34 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: dev, marcel

On Thu, Oct 22, 2015 at 02:32:31PM +0300, Michael S. Tsirkin wrote:
> On Thu, Oct 22, 2015 at 05:49:55PM +0800, Yuanhan Liu wrote:
> > On Wed, Oct 21, 2015 at 05:26:18PM +0300, Michael S. Tsirkin wrote:
> > > On Wed, Oct 21, 2015 at 08:48:15PM +0800, Yuanhan Liu wrote:
> > > > > Please note that for virtio devices, guest is supposed to
> > > > > control the placement of incoming packets in RX queues.
> > > > 
> > > > I may not follow you.
> > > > 
> > > > Enqueuing packets to a RX queue is done at vhost lib, outside the
> > > > guest, how could the guest take the control here?
> > > > 
> > > > 	--yliu
> > > 
> > > vhost should do what guest told it to.
> > > 
> > > See virtio spec:
> > > 	5.1.6.5.5 Automatic receive steering in multiqueue mode
> > 
> > Spec says:
> > 
> >     After the driver transmitted a packet of a flow on transmitqX,
> >     the device SHOULD cause incoming packets for that flow to be
> >     steered to receiveqX.
> > 
> > 
> > Michael, I still have no idea how vhost could know the flow even
> > after discussion with Huawei. Could you be more specific about
> > this? Say, how could guest know that? And how could guest tell
> > vhost which RX is gonna to use?
> > 
> > Thanks.
> > 
> > 	--yliu
> 
> I don't really understand the question.
> 
> When guests transmits a packet, it makes a decision
> about the flow to use, and maps that to a tx/rx pair of queues.
> 
> It sends packets out on the tx queue and expects device to
> return packets from the same flow on the rx queue.

Why? I can understand that there should be a mapping between
flows and queues in a way that there is no re-ordering, but
I can't see the relation of receiving a flow with a TX queue.

fbl
 
> During transmit, device needs to figure out the flow
> of packets as they are received from guest, and track
> which flows go on which tx queue.
> When it selects the rx queue, it has to use the same table.
> 
> There is currently no provision for controlling
> steering for uni-directional
> flows which are possible e.g. with UDP.
> 
> We might solve this in a future spec - for example, set a flag notifying
> guest that steering information is missing for a given flow, for example
> by setting a flag in a packet, or using the command queue, and have
> guest send a dummy empty packet to set steering rule for this flow.
> 
> 
> -- 
> MST
> 

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [dpdk-dev] [PATCH v7 4/8] vhost: rxtx: use queue id instead of constant ring index
  2015-10-24  2:34             ` Flavio Leitner
@ 2015-10-24 17:47               ` Michael S. Tsirkin
  2015-10-28 20:30                 ` Flavio Leitner
  0 siblings, 1 reply; 66+ messages in thread
From: Michael S. Tsirkin @ 2015-10-24 17:47 UTC (permalink / raw)
  To: Flavio Leitner; +Cc: dev, marcel

On Sat, Oct 24, 2015 at 12:34:08AM -0200, Flavio Leitner wrote:
> On Thu, Oct 22, 2015 at 02:32:31PM +0300, Michael S. Tsirkin wrote:
> > On Thu, Oct 22, 2015 at 05:49:55PM +0800, Yuanhan Liu wrote:
> > > On Wed, Oct 21, 2015 at 05:26:18PM +0300, Michael S. Tsirkin wrote:
> > > > On Wed, Oct 21, 2015 at 08:48:15PM +0800, Yuanhan Liu wrote:
> > > > > > Please note that for virtio devices, guest is supposed to
> > > > > > control the placement of incoming packets in RX queues.
> > > > > 
> > > > > I may not follow you.
> > > > > 
> > > > > Enqueuing packets to a RX queue is done at vhost lib, outside the
> > > > > guest, how could the guest take the control here?
> > > > > 
> > > > > 	--yliu
> > > > 
> > > > vhost should do what guest told it to.
> > > > 
> > > > See virtio spec:
> > > > 	5.1.6.5.5 Automatic receive steering in multiqueue mode
> > > 
> > > Spec says:
> > > 
> > >     After the driver transmitted a packet of a flow on transmitqX,
> > >     the device SHOULD cause incoming packets for that flow to be
> > >     steered to receiveqX.
> > > 
> > > 
> > > Michael, I still have no idea how vhost could know the flow even
> > > after discussion with Huawei. Could you be more specific about
> > > this? Say, how could guest know that? And how could guest tell
> > > vhost which RX is gonna to use?
> > > 
> > > Thanks.
> > > 
> > > 	--yliu
> > 
> > I don't really understand the question.
> > 
> > When guests transmits a packet, it makes a decision
> > about the flow to use, and maps that to a tx/rx pair of queues.
> > 
> > It sends packets out on the tx queue and expects device to
> > return packets from the same flow on the rx queue.
> 
> Why? I can understand that there should be a mapping between
> flows and queues in a way that there is no re-ordering, but
> I can't see the relation of receiving a flow with a TX queue.
> 
> fbl

That's the way virtio chose to program the rx steering logic.

It's low overhead (no special commands), and
works well for TCP when user is an endpoint since rx and tx
for tcp are generally tied (because of ack handling).

We can discuss other ways, e.g. special commands for guests to
program steering.
We'd have to first see some data showing the current scheme
is problematic somehow.
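
To illustrate what that spec text implies on the device side, here is a
minimal sketch -- not the vhost lib implementation, just the idea -- of a
steering table that the transmit path fills in and the receive path
consults. The hash function and table size are made-up assumptions:

#include <stdint.h>

#define STEER_TABLE_SIZE 1024

/* flow hash -> last queue pair the guest transmitted this flow on */
static uint16_t steer_table[STEER_TABLE_SIZE];

/* device side, when dequeuing a packet from the guest txq of pair 'qp_idx' */
static void
steer_note_tx(uint32_t flow_hash, uint16_t qp_idx)
{
	steer_table[flow_hash % STEER_TABLE_SIZE] = qp_idx;
}

/* device side, when enqueuing a packet towards the guest: put it on the
 * rxq of the pair the guest last used for this flow */
static uint16_t
steer_pick_rxq(uint32_t flow_hash)
{
	return steer_table[flow_hash % STEER_TABLE_SIZE];
}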


> > During transmit, device needs to figure out the flow
> > of packets as they are received from guest, and track
> > which flows go on which tx queue.
> > When it selects the rx queue, it has to use the same table.
> > 
> > There is currently no provision for controlling
> > steering for uni-directional
> > flows which are possible e.g. with UDP.
> > 
> > We might solve this in a future spec - for example, set a flag notifying
> > guest that steering information is missing for a given flow, for example
> > by setting a flag in a packet, or using the command queue, and have
> > guest send a dummy empty packet to set steering rule for this flow.
> > 
> > 
> > -- 
> > MST
> > 

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [dpdk-dev] [PATCH v8 0/8] vhost-user multiple queues enabling
  2015-10-22 12:35 ` [dpdk-dev] [PATCH v8 0/8] vhost-user multiple queues enabling Yuanhan Liu
                     ` (7 preceding siblings ...)
  2015-10-22 12:35   ` [dpdk-dev] [PATCH v8 8/8] doc: update release note for vhost-user mq support Yuanhan Liu
@ 2015-10-26  1:36   ` Xie, Huawei
  2015-10-26  3:09     ` Yuanhan Liu
  2015-10-26 20:26     ` Thomas Monjalon
  8 siblings, 2 replies; 66+ messages in thread
From: Xie, Huawei @ 2015-10-26  1:36 UTC (permalink / raw)
  To: Yuanhan Liu, dev; +Cc: marcel, Michael S. Tsirkin

On 10/22/2015 8:35 PM, Yuanhan Liu wrote:
> This patch set enables vhost-user multiple queue feature.
>
>
> ---
> Changchun Ouyang (3):
>   vhost: rxtx: use queue id instead of constant ring index
>   virtio: fix deadloop due to reading virtio_net_config incorrectly
>   vhost: add VHOST_USER_SET_VRING_ENABLE message
>
> Yuanhan Liu (5):
>   vhost-user: add protocol features support
>   vhost-user: add VHOST_USER_GET_QUEUE_NUM message
>   vhost: vring queue setup for multiple queue support
>   vhost-user: enable vhost-user multiple queue
>   doc: update release note for vhost-user mq support
>
>  doc/guides/rel_notes/release_2_2.rst          |   4 +
>  drivers/net/virtio/virtio_ethdev.c            |  16 ++-
>  lib/librte_vhost/rte_virtio_net.h             |  13 +-
>  lib/librte_vhost/vhost_rxtx.c                 |  53 +++++---
>  lib/librte_vhost/vhost_user/vhost-net-user.c  |  25 +++-
>  lib/librte_vhost/vhost_user/vhost-net-user.h  |   4 +
>  lib/librte_vhost/vhost_user/virtio-net-user.c |  86 ++++++++++---
>  lib/librte_vhost/vhost_user/virtio-net-user.h |  10 ++
>  lib/librte_vhost/virtio-net.c                 | 168 ++++++++++++++++----------
>  9 files changed, 275 insertions(+), 104 deletions(-)
>

Btw, Changchun's patch: "virtio: fix deadloop due to reading
virtio_net_config incorrectly" isn't included, so probably, you could
remove this from this cover-letter.

Acked-by: Huawei Xie <huawei.xie@intel.com>


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [dpdk-dev] [PATCH v8 0/8] vhost-user multiple queues enabling
  2015-10-26  1:36   ` [dpdk-dev] [PATCH v8 0/8] vhost-user multiple queues enabling Xie, Huawei
@ 2015-10-26  3:09     ` Yuanhan Liu
  2015-10-26 20:26     ` Thomas Monjalon
  1 sibling, 0 replies; 66+ messages in thread
From: Yuanhan Liu @ 2015-10-26  3:09 UTC (permalink / raw)
  To: Xie, Huawei; +Cc: dev, marcel, Michael S. Tsirkin

On Mon, Oct 26, 2015 at 01:36:10AM +0000, Xie, Huawei wrote:
> On 10/22/2015 8:35 PM, Yuanhan Liu wrote:
> > This patch set enables vhost-user multiple queue feature.
> >
> >
> > ---
> > Changchun Ouyang (3):
> >   vhost: rxtx: use queue id instead of constant ring index
> >   virtio: fix deadloop due to reading virtio_net_config incorrectly
> >   vhost: add VHOST_USER_SET_VRING_ENABLE message
> >
> > Yuanhan Liu (5):
> >   vhost-user: add protocol features support
> >   vhost-user: add VHOST_USER_GET_QUEUE_NUM message
> >   vhost: vring queue setup for multiple queue support
> >   vhost-user: enable vhost-user multiple queue
> >   doc: update release note for vhost-user mq support
> >
> >  doc/guides/rel_notes/release_2_2.rst          |   4 +
> >  drivers/net/virtio/virtio_ethdev.c            |  16 ++-
> >  lib/librte_vhost/rte_virtio_net.h             |  13 +-
> >  lib/librte_vhost/vhost_rxtx.c                 |  53 +++++---
> >  lib/librte_vhost/vhost_user/vhost-net-user.c  |  25 +++-
> >  lib/librte_vhost/vhost_user/vhost-net-user.h  |   4 +
> >  lib/librte_vhost/vhost_user/virtio-net-user.c |  86 ++++++++++---
> >  lib/librte_vhost/vhost_user/virtio-net-user.h |  10 ++
> >  lib/librte_vhost/virtio-net.c                 | 168 ++++++++++++++++----------
> >  9 files changed, 275 insertions(+), 104 deletions(-)
> >
> 
> Btw, Changchun's patch: "virtio: fix deadloop due to reading
> virtio_net_config incorrectly" isn't included, so probably, you could
> remove this from this cover-letter.

What do you mean by "isn't included"?

	[PATCH v8 5/8] virtio: fix deadloop due to reading virtio_net_config incorrectly

> Acked-by: Huawei Xie <huawei.xie@intel.com>

Thanks.

	--yliu

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [dpdk-dev] [PATCH v8 3/8] vhost: vring queue setup for multiple queue support
  2015-10-22 12:35   ` [dpdk-dev] [PATCH v8 3/8] vhost: vring queue setup for multiple queue support Yuanhan Liu
@ 2015-10-26  5:24     ` Tetsuya Mukawa
  2015-10-26  5:42       ` Yuanhan Liu
  0 siblings, 1 reply; 66+ messages in thread
From: Tetsuya Mukawa @ 2015-10-26  5:24 UTC (permalink / raw)
  To: Yuanhan Liu, dev; +Cc: marcel, Michael S. Tsirkin

On 2015/10/22 21:35, Yuanhan Liu wrote:
> All queue pairs, including the default (the first) queue pair,
> are allocated dynamically, when a vring_call message is received
> first time for a specific queue pair.
>
> This is a refactor work for enabling vhost-user multiple queue;
> it should not break anything as it does no functional changes:
> we don't support mq set, so there is only one mq at max.
>
> This patch is based on Changchun's patch.
>
> Signed-off-by: Ouyang Changchun <changchun.ouyang@intel.com>
> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
> Acked-by: Flavio Leitner <fbl@sysclose.org>
>
> ---
>
> v8: - move virtuque field to the end of `virtio_net' struct.
>
>     - Add a FIXME at set_vring_call() for doing vring queue pair
>       allocation.
> ---
>  lib/librte_vhost/rte_virtio_net.h             |   3 +-
>  lib/librte_vhost/vhost_user/virtio-net-user.c |  46 ++++----
>  lib/librte_vhost/virtio-net.c                 | 156 ++++++++++++++++----------
>  3 files changed, 123 insertions(+), 82 deletions(-)
>
> diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h
> index e3a21e5..9a32a95 100644
> --- a/lib/librte_vhost/rte_virtio_net.h
> +++ b/lib/librte_vhost/rte_virtio_net.h
> @@ -96,7 +96,6 @@ struct vhost_virtqueue {
>   * Device structure contains all configuration information relating to the device.
>   */
>  struct virtio_net {
> -	struct vhost_virtqueue	*virtqueue[VIRTIO_QNUM];	/**< Contains all virtqueue information. */
>  	struct virtio_memory	*mem;		/**< QEMU memory and memory region information. */
>  	uint64_t		features;	/**< Negotiated feature set. */
>  	uint64_t		protocol_features;	/**< Negotiated protocol feature set. */
> @@ -104,7 +103,9 @@ struct virtio_net {
>  	uint32_t		flags;		/**< Device flags. Only used to check if device is running on data core. */
>  #define IF_NAME_SZ (PATH_MAX > IFNAMSIZ ? PATH_MAX : IFNAMSIZ)
>  	char			ifname[IF_NAME_SZ];	/**< Name of the tap device or socket path. */
> +	uint32_t		virt_qp_nb;	/**< number of queue pair we have allocated */
>  	void			*priv;		/**< private context */
> +	struct vhost_virtqueue	*virtqueue[VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX];	/**< Contains all virtqueue information. */
>  } __rte_cache_aligned;
>  
>  /**
> diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.c b/lib/librte_vhost/vhost_user/virtio-net-user.c
> index 6da729d..d62f3d7 100644
> --- a/lib/librte_vhost/vhost_user/virtio-net-user.c
> +++ b/lib/librte_vhost/vhost_user/virtio-net-user.c
> @@ -206,25 +206,33 @@ err_mmap:
>  }
>  
>  static int
> +vq_is_ready(struct vhost_virtqueue *vq)
> +{
> +	return vq && vq->desc   &&
> +	       vq->kickfd != -1 &&
> +	       vq->callfd != -1;
> +}
> +
> +static int
>  virtio_is_ready(struct virtio_net *dev)
>  {
>  	struct vhost_virtqueue *rvq, *tvq;
> +	uint32_t i;
>  
> -	/* mq support in future.*/
> -	rvq = dev->virtqueue[VIRTIO_RXQ];
> -	tvq = dev->virtqueue[VIRTIO_TXQ];
> -	if (rvq && tvq && rvq->desc && tvq->desc &&
> -		(rvq->kickfd != -1) &&
> -		(rvq->callfd != -1) &&
> -		(tvq->kickfd != -1) &&
> -		(tvq->callfd != -1)) {
> -		RTE_LOG(INFO, VHOST_CONFIG,
> -			"virtio is now ready for processing.\n");
> -		return 1;
> +	for (i = 0; i < dev->virt_qp_nb; i++) {
> +		rvq = dev->virtqueue[i * VIRTIO_QNUM + VIRTIO_RXQ];
> +		tvq = dev->virtqueue[i * VIRTIO_QNUM + VIRTIO_TXQ];
> +
> +		if (!vq_is_ready(rvq) || !vq_is_ready(tvq)) {
> +			RTE_LOG(INFO, VHOST_CONFIG,
> +				"virtio is not ready for processing.\n");
> +			return 0;
> +		}
>  	}
> +
>  	RTE_LOG(INFO, VHOST_CONFIG,
> -		"virtio isn't ready for processing.\n");
> -	return 0;
> +		"virtio is now ready for processing.\n");
> +	return 1;
>  }
>  
>  void
> @@ -292,13 +300,13 @@ user_get_vring_base(struct vhost_device_ctx ctx,
>  	 * sent and only sent in vhost_vring_stop.
>  	 * TODO: cleanup the vring, it isn't usable since here.
>  	 */
> -	if ((dev->virtqueue[VIRTIO_RXQ]->kickfd) >= 0) {
> -		close(dev->virtqueue[VIRTIO_RXQ]->kickfd);
> -		dev->virtqueue[VIRTIO_RXQ]->kickfd = -1;
> +	if ((dev->virtqueue[state->index]->kickfd + VIRTIO_RXQ) >= 0) {
> +		close(dev->virtqueue[state->index + VIRTIO_RXQ]->kickfd);
> +		dev->virtqueue[state->index + VIRTIO_RXQ]->kickfd = -1;
>  	}

Hi Yuanhan,

Please let me make sure whether below is correct.
    if ((dev->virtqueue[state->index]->kickfd + VIRTIO_RXQ) >= 0) {

> -	if ((dev->virtqueue[VIRTIO_TXQ]->kickfd) >= 0) {
> -		close(dev->virtqueue[VIRTIO_TXQ]->kickfd);
> -		dev->virtqueue[VIRTIO_TXQ]->kickfd = -1;
> +	if ((dev->virtqueue[state->index]->kickfd + VIRTIO_TXQ) >= 0) {
> +		close(dev->virtqueue[state->index + VIRTIO_TXQ]->kickfd);
> +		dev->virtqueue[state->index + VIRTIO_TXQ]->kickfd = -1;

Also, same question here.

Thanks,
Tetsuya

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [dpdk-dev] [PATCH v8 3/8] vhost: vring queue setup for multiple queue support
  2015-10-26  5:24     ` Tetsuya Mukawa
@ 2015-10-26  5:42       ` Yuanhan Liu
  2015-10-27  6:20         ` Tetsuya Mukawa
  0 siblings, 1 reply; 66+ messages in thread
From: Yuanhan Liu @ 2015-10-26  5:42 UTC (permalink / raw)
  To: Tetsuya Mukawa; +Cc: dev, marcel, Michael S. Tsirkin

On Mon, Oct 26, 2015 at 02:24:08PM +0900, Tetsuya Mukawa wrote:
> On 2015/10/22 21:35, Yuanhan Liu wrote:
...
> > @@ -292,13 +300,13 @@ user_get_vring_base(struct vhost_device_ctx ctx,
> >  	 * sent and only sent in vhost_vring_stop.
> >  	 * TODO: cleanup the vring, it isn't usable since here.
> >  	 */
> > -	if ((dev->virtqueue[VIRTIO_RXQ]->kickfd) >= 0) {
> > -		close(dev->virtqueue[VIRTIO_RXQ]->kickfd);
> > -		dev->virtqueue[VIRTIO_RXQ]->kickfd = -1;
> > +	if ((dev->virtqueue[state->index]->kickfd + VIRTIO_RXQ) >= 0) {
> > +		close(dev->virtqueue[state->index + VIRTIO_RXQ]->kickfd);
> > +		dev->virtqueue[state->index + VIRTIO_RXQ]->kickfd = -1;
> >  	}
> 
> Hi Yuanhan,
> 
> Please let me make sure whether below is correct.
>     if ((dev->virtqueue[state->index]->kickfd + VIRTIO_RXQ) >= 0) {
> 
> > -	if ((dev->virtqueue[VIRTIO_TXQ]->kickfd) >= 0) {
> > -		close(dev->virtqueue[VIRTIO_TXQ]->kickfd);
> > -		dev->virtqueue[VIRTIO_TXQ]->kickfd = -1;
> > +	if ((dev->virtqueue[state->index]->kickfd + VIRTIO_TXQ) >= 0) {
> > +		close(dev->virtqueue[state->index + VIRTIO_TXQ]->kickfd);
> > +		dev->virtqueue[state->index + VIRTIO_TXQ]->kickfd = -1;
> 
> Also, same question here.

Oops, silly typos... Thanks for catching it out!

Here is an update patch (Thomas, please let me know if you prefer me
to send the whole patchset for you to apply):

-- >8 --
From 2b7d8155b6c9f37bffcbb220e87f7634f329acee Mon Sep 17 00:00:00 2001
From: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Date: Fri, 18 Sep 2015 16:01:10 +0800
Subject: [PATCH] vhost: vring queue setup for multiple queue support

All queue pairs, including the default (the first) queue pair,
are allocated dynamically, when a vring_call message is received
first time for a specific queue pair.

This is a refactor work for enabling vhost-user multiple queue;
it should not break anything as it does no functional changes:
we don't support mq set, so there is only one mq at max.

This patch is based on Changchun's patch.

Signed-off-by: Ouyang Changchun <changchun.ouyang@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>

---

v9: - fix silly error "dev->virtqueue[state->index]->kickfd + VIRTIO_RXQ"

v8: - move virtuque field to the end of `virtio_net' struct.

    - Add a FIXME at set_vring_call() for doing vring queue pair
      allocation.
---
 lib/librte_vhost/rte_virtio_net.h             |   3 +-
 lib/librte_vhost/vhost_user/virtio-net-user.c |  46 ++++----
 lib/librte_vhost/virtio-net.c                 | 156 ++++++++++++++++----------
 3 files changed, 123 insertions(+), 82 deletions(-)

diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h
index e3a21e5..9a32a95 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -96,7 +96,6 @@ struct vhost_virtqueue {
  * Device structure contains all configuration information relating to the device.
  */
 struct virtio_net {
-	struct vhost_virtqueue	*virtqueue[VIRTIO_QNUM];	/**< Contains all virtqueue information. */
 	struct virtio_memory	*mem;		/**< QEMU memory and memory region information. */
 	uint64_t		features;	/**< Negotiated feature set. */
 	uint64_t		protocol_features;	/**< Negotiated protocol feature set. */
@@ -104,7 +103,9 @@ struct virtio_net {
 	uint32_t		flags;		/**< Device flags. Only used to check if device is running on data core. */
 #define IF_NAME_SZ (PATH_MAX > IFNAMSIZ ? PATH_MAX : IFNAMSIZ)
 	char			ifname[IF_NAME_SZ];	/**< Name of the tap device or socket path. */
+	uint32_t		virt_qp_nb;	/**< number of queue pair we have allocated */
 	void			*priv;		/**< private context */
+	struct vhost_virtqueue	*virtqueue[VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX];	/**< Contains all virtqueue information. */
 } __rte_cache_aligned;
 
 /**
diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.c b/lib/librte_vhost/vhost_user/virtio-net-user.c
index 6da729d..7fc3805 100644
--- a/lib/librte_vhost/vhost_user/virtio-net-user.c
+++ b/lib/librte_vhost/vhost_user/virtio-net-user.c
@@ -206,25 +206,33 @@ err_mmap:
 }
 
 static int
+vq_is_ready(struct vhost_virtqueue *vq)
+{
+	return vq && vq->desc   &&
+	       vq->kickfd != -1 &&
+	       vq->callfd != -1;
+}
+
+static int
 virtio_is_ready(struct virtio_net *dev)
 {
 	struct vhost_virtqueue *rvq, *tvq;
+	uint32_t i;
 
-	/* mq support in future.*/
-	rvq = dev->virtqueue[VIRTIO_RXQ];
-	tvq = dev->virtqueue[VIRTIO_TXQ];
-	if (rvq && tvq && rvq->desc && tvq->desc &&
-		(rvq->kickfd != -1) &&
-		(rvq->callfd != -1) &&
-		(tvq->kickfd != -1) &&
-		(tvq->callfd != -1)) {
-		RTE_LOG(INFO, VHOST_CONFIG,
-			"virtio is now ready for processing.\n");
-		return 1;
+	for (i = 0; i < dev->virt_qp_nb; i++) {
+		rvq = dev->virtqueue[i * VIRTIO_QNUM + VIRTIO_RXQ];
+		tvq = dev->virtqueue[i * VIRTIO_QNUM + VIRTIO_TXQ];
+
+		if (!vq_is_ready(rvq) || !vq_is_ready(tvq)) {
+			RTE_LOG(INFO, VHOST_CONFIG,
+				"virtio is not ready for processing.\n");
+			return 0;
+		}
 	}
+
 	RTE_LOG(INFO, VHOST_CONFIG,
-		"virtio isn't ready for processing.\n");
-	return 0;
+		"virtio is now ready for processing.\n");
+	return 1;
 }
 
 void
@@ -292,13 +300,13 @@ user_get_vring_base(struct vhost_device_ctx ctx,
 	 * sent and only sent in vhost_vring_stop.
 	 * TODO: cleanup the vring, it isn't usable since here.
 	 */
-	if ((dev->virtqueue[VIRTIO_RXQ]->kickfd) >= 0) {
-		close(dev->virtqueue[VIRTIO_RXQ]->kickfd);
-		dev->virtqueue[VIRTIO_RXQ]->kickfd = -1;
+	if (dev->virtqueue[state->index + VIRTIO_RXQ]->kickfd >= 0) {
+		close(dev->virtqueue[state->index + VIRTIO_RXQ]->kickfd);
+		dev->virtqueue[state->index + VIRTIO_RXQ]->kickfd = -1;
 	}
-	if ((dev->virtqueue[VIRTIO_TXQ]->kickfd) >= 0) {
-		close(dev->virtqueue[VIRTIO_TXQ]->kickfd);
-		dev->virtqueue[VIRTIO_TXQ]->kickfd = -1;
+	if (dev->virtqueue[state->index + VIRTIO_TXQ]->kickfd >= 0) {
+		close(dev->virtqueue[state->index + VIRTIO_TXQ]->kickfd);
+		dev->virtqueue[state->index + VIRTIO_TXQ]->kickfd = -1;
 	}
 
 	return 0;
diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index 830f22a..772f835 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -36,6 +36,7 @@
 #include <stddef.h>
 #include <stdint.h>
 #include <stdlib.h>
+#include <assert.h>
 #include <sys/mman.h>
 #include <unistd.h>
 #ifdef RTE_LIBRTE_VHOST_NUMA
@@ -178,6 +179,15 @@ add_config_ll_entry(struct virtio_net_config_ll *new_ll_dev)
 
 }
 
+static void
+cleanup_vq(struct vhost_virtqueue *vq)
+{
+	if (vq->callfd >= 0)
+		close(vq->callfd);
+	if (vq->kickfd >= 0)
+		close(vq->kickfd);
+}
+
 /*
  * Unmap any memory, close any file descriptors and
  * free any memory owned by a device.
@@ -185,6 +195,8 @@ add_config_ll_entry(struct virtio_net_config_ll *new_ll_dev)
 static void
 cleanup_device(struct virtio_net *dev)
 {
+	uint32_t i;
+
 	/* Unmap QEMU memory file if mapped. */
 	if (dev->mem) {
 		munmap((void *)(uintptr_t)dev->mem->mapped_address,
@@ -192,15 +204,10 @@ cleanup_device(struct virtio_net *dev)
 		free(dev->mem);
 	}
 
-	/* Close any event notifiers opened by device. */
-	if (dev->virtqueue[VIRTIO_RXQ]->callfd >= 0)
-		close(dev->virtqueue[VIRTIO_RXQ]->callfd);
-	if (dev->virtqueue[VIRTIO_RXQ]->kickfd >= 0)
-		close(dev->virtqueue[VIRTIO_RXQ]->kickfd);
-	if (dev->virtqueue[VIRTIO_TXQ]->callfd >= 0)
-		close(dev->virtqueue[VIRTIO_TXQ]->callfd);
-	if (dev->virtqueue[VIRTIO_TXQ]->kickfd >= 0)
-		close(dev->virtqueue[VIRTIO_TXQ]->kickfd);
+	for (i = 0; i < dev->virt_qp_nb; i++) {
+		cleanup_vq(dev->virtqueue[i * VIRTIO_QNUM + VIRTIO_RXQ]);
+		cleanup_vq(dev->virtqueue[i * VIRTIO_QNUM + VIRTIO_TXQ]);
+	}
 }
 
 /*
@@ -209,9 +216,11 @@ cleanup_device(struct virtio_net *dev)
 static void
 free_device(struct virtio_net_config_ll *ll_dev)
 {
-	/* Free any malloc'd memory */
-	rte_free(ll_dev->dev.virtqueue[VIRTIO_RXQ]);
-	rte_free(ll_dev->dev.virtqueue[VIRTIO_TXQ]);
+	uint32_t i;
+
+	for (i = 0; i < ll_dev->dev.virt_qp_nb; i++)
+		rte_free(ll_dev->dev.virtqueue[i * VIRTIO_QNUM]);
+
 	rte_free(ll_dev);
 }
 
@@ -244,34 +253,68 @@ rm_config_ll_entry(struct virtio_net_config_ll *ll_dev,
 	}
 }
 
+static void
+init_vring_queue(struct vhost_virtqueue *vq)
+{
+	memset(vq, 0, sizeof(struct vhost_virtqueue));
+
+	vq->kickfd = -1;
+	vq->callfd = -1;
+
+	/* Backends are set to -1 indicating an inactive device. */
+	vq->backend = -1;
+}
+
+static void
+init_vring_queue_pair(struct virtio_net *dev, uint32_t qp_idx)
+{
+	init_vring_queue(dev->virtqueue[qp_idx * VIRTIO_QNUM + VIRTIO_RXQ]);
+	init_vring_queue(dev->virtqueue[qp_idx * VIRTIO_QNUM + VIRTIO_TXQ]);
+}
+
+static int
+alloc_vring_queue_pair(struct virtio_net *dev, uint32_t qp_idx)
+{
+	struct vhost_virtqueue *virtqueue = NULL;
+	uint32_t virt_rx_q_idx = qp_idx * VIRTIO_QNUM + VIRTIO_RXQ;
+	uint32_t virt_tx_q_idx = qp_idx * VIRTIO_QNUM + VIRTIO_TXQ;
+
+	virtqueue = rte_malloc(NULL,
+			       sizeof(struct vhost_virtqueue) * VIRTIO_QNUM, 0);
+	if (virtqueue == NULL) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"Failed to allocate memory for virt qp:%d.\n", qp_idx);
+		return -1;
+	}
+
+	dev->virtqueue[virt_rx_q_idx] = virtqueue;
+	dev->virtqueue[virt_tx_q_idx] = virtqueue + VIRTIO_TXQ;
+
+	init_vring_queue_pair(dev, qp_idx);
+
+	dev->virt_qp_nb += 1;
+
+	return 0;
+}
+
 /*
  *  Initialise all variables in device structure.
  */
 static void
 init_device(struct virtio_net *dev)
 {
-	uint64_t vq_offset;
+	int vq_offset;
+	uint32_t i;
 
 	/*
 	 * Virtqueues have already been malloced so
 	 * we don't want to set them to NULL.
 	 */
-	vq_offset = offsetof(struct virtio_net, mem);
-
-	/* Set everything to 0. */
-	memset((void *)(uintptr_t)((uint64_t)(uintptr_t)dev + vq_offset), 0,
-		(sizeof(struct virtio_net) - (size_t)vq_offset));
-	memset(dev->virtqueue[VIRTIO_RXQ], 0, sizeof(struct vhost_virtqueue));
-	memset(dev->virtqueue[VIRTIO_TXQ], 0, sizeof(struct vhost_virtqueue));
-
-	dev->virtqueue[VIRTIO_RXQ]->kickfd = -1;
-	dev->virtqueue[VIRTIO_RXQ]->callfd = -1;
-	dev->virtqueue[VIRTIO_TXQ]->kickfd = -1;
-	dev->virtqueue[VIRTIO_TXQ]->callfd = -1;
+	vq_offset = offsetof(struct virtio_net, virtqueue);
+	memset(dev, 0, vq_offset);
 
-	/* Backends are set to -1 indicating an inactive device. */
-	dev->virtqueue[VIRTIO_RXQ]->backend = VIRTIO_DEV_STOPPED;
-	dev->virtqueue[VIRTIO_TXQ]->backend = VIRTIO_DEV_STOPPED;
+	for (i = 0; i < dev->virt_qp_nb; i++)
+		init_vring_queue_pair(dev, i);
 }
 
 /*
@@ -283,7 +326,6 @@ static int
 new_device(struct vhost_device_ctx ctx)
 {
 	struct virtio_net_config_ll *new_ll_dev;
-	struct vhost_virtqueue *virtqueue_rx, *virtqueue_tx;
 
 	/* Setup device and virtqueues. */
 	new_ll_dev = rte_malloc(NULL, sizeof(struct virtio_net_config_ll), 0);
@@ -294,28 +336,6 @@ new_device(struct vhost_device_ctx ctx)
 		return -1;
 	}
 
-	virtqueue_rx = rte_malloc(NULL, sizeof(struct vhost_virtqueue), 0);
-	if (virtqueue_rx == NULL) {
-		rte_free(new_ll_dev);
-		RTE_LOG(ERR, VHOST_CONFIG,
-			"(%"PRIu64") Failed to allocate memory for rxq.\n",
-			ctx.fh);
-		return -1;
-	}
-
-	virtqueue_tx = rte_malloc(NULL, sizeof(struct vhost_virtqueue), 0);
-	if (virtqueue_tx == NULL) {
-		rte_free(virtqueue_rx);
-		rte_free(new_ll_dev);
-		RTE_LOG(ERR, VHOST_CONFIG,
-			"(%"PRIu64") Failed to allocate memory for txq.\n",
-			ctx.fh);
-		return -1;
-	}
-
-	new_ll_dev->dev.virtqueue[VIRTIO_RXQ] = virtqueue_rx;
-	new_ll_dev->dev.virtqueue[VIRTIO_TXQ] = virtqueue_tx;
-
 	/* Initialise device and virtqueues. */
 	init_device(&new_ll_dev->dev);
 
@@ -441,6 +461,8 @@ static int
 set_features(struct vhost_device_ctx ctx, uint64_t *pu)
 {
 	struct virtio_net *dev;
+	uint16_t vhost_hlen;
+	uint16_t i;
 
 	dev = get_device(ctx);
 	if (dev == NULL)
@@ -448,27 +470,26 @@ set_features(struct vhost_device_ctx ctx, uint64_t *pu)
 	if (*pu & ~VHOST_FEATURES)
 		return -1;
 
-	/* Store the negotiated feature list for the device. */
 	dev->features = *pu;
-
-	/* Set the vhost_hlen depending on if VIRTIO_NET_F_MRG_RXBUF is set. */
 	if (dev->features & (1 << VIRTIO_NET_F_MRG_RXBUF)) {
 		LOG_DEBUG(VHOST_CONFIG,
 			"(%"PRIu64") Mergeable RX buffers enabled\n",
 			dev->device_fh);
-		dev->virtqueue[VIRTIO_RXQ]->vhost_hlen =
-			sizeof(struct virtio_net_hdr_mrg_rxbuf);
-		dev->virtqueue[VIRTIO_TXQ]->vhost_hlen =
-			sizeof(struct virtio_net_hdr_mrg_rxbuf);
+		vhost_hlen = sizeof(struct virtio_net_hdr_mrg_rxbuf);
 	} else {
 		LOG_DEBUG(VHOST_CONFIG,
 			"(%"PRIu64") Mergeable RX buffers disabled\n",
 			dev->device_fh);
-		dev->virtqueue[VIRTIO_RXQ]->vhost_hlen =
-			sizeof(struct virtio_net_hdr);
-		dev->virtqueue[VIRTIO_TXQ]->vhost_hlen =
-			sizeof(struct virtio_net_hdr);
+		vhost_hlen = sizeof(struct virtio_net_hdr);
+	}
+
+	for (i = 0; i < dev->virt_qp_nb; i++) {
+		uint16_t base_idx = i * VIRTIO_QNUM;
+
+		dev->virtqueue[base_idx + VIRTIO_RXQ]->vhost_hlen = vhost_hlen;
+		dev->virtqueue[base_idx + VIRTIO_TXQ]->vhost_hlen = vhost_hlen;
 	}
+
 	return 0;
 }
 
@@ -684,13 +705,24 @@ set_vring_call(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
 {
 	struct virtio_net *dev;
 	struct vhost_virtqueue *vq;
+	uint32_t cur_qp_idx = file->index / VIRTIO_QNUM;
 
 	dev = get_device(ctx);
 	if (dev == NULL)
 		return -1;
 
+	/*
+	 * FIXME: VHOST_SET_VRING_CALL is the first per-vring message
+	 * we get, so we do vring queue pair allocation here.
+	 */
+	if (cur_qp_idx + 1 > dev->virt_qp_nb) {
+		if (alloc_vring_queue_pair(dev, cur_qp_idx) < 0)
+			return -1;
+	}
+
 	/* file->index refers to the queue index. The txq is 1, rxq is 0. */
 	vq = dev->virtqueue[file->index];
+	assert(vq != NULL);
 
 	if (vq->callfd >= 0)
 		close(vq->callfd);
-- 
1.9.0

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [dpdk-dev] [PATCH v8 8/8] doc: update release note for vhost-user mq support
  2015-10-22 12:35   ` [dpdk-dev] [PATCH v8 8/8] doc: update release note for vhost-user mq support Yuanhan Liu
@ 2015-10-26 20:22     ` Thomas Monjalon
  2015-10-27  9:38       ` Yuanhan Liu
  0 siblings, 1 reply; 66+ messages in thread
From: Thomas Monjalon @ 2015-10-26 20:22 UTC (permalink / raw)
  To: dev, Yuanhan Liu

2015-10-22 20:35, Yuanhan Liu:
> +* **vhost: added vhost-user mulitple queue support.**
> +
> +  Added vhost-user multiple queue support.

Except for the typo, it is the same sentence twice, so not needed.

General comment to all contributors: please avoid making a special commit
just to update the release notes.
There is no real log message here, which is understandable because it does not
deserve a commit of its own.
It will be merged with the previous one here.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [dpdk-dev] [PATCH v8 0/8] vhost-user multiple queues enabling
  2015-10-26  1:36   ` [dpdk-dev] [PATCH v8 0/8] vhost-user multiple queues enabling Xie, Huawei
  2015-10-26  3:09     ` Yuanhan Liu
@ 2015-10-26 20:26     ` Thomas Monjalon
  1 sibling, 0 replies; 66+ messages in thread
From: Thomas Monjalon @ 2015-10-26 20:26 UTC (permalink / raw)
  To: dev, Yuanhan Liu; +Cc: marcel, Michael S. Tsirkin

> > Changchun Ouyang (3):
> >   vhost: rxtx: use queue id instead of constant ring index
> >   virtio: fix deadloop due to reading virtio_net_config incorrectly
> >   vhost: add VHOST_USER_SET_VRING_ENABLE message
> >
> > Yuanhan Liu (5):
> >   vhost-user: add protocol features support
> >   vhost-user: add VHOST_USER_GET_QUEUE_NUM message
> >   vhost: vring queue setup for multiple queue support
> >   vhost-user: enable vhost-user multiple queue
> >   doc: update release note for vhost-user mq support
> 
> Acked-by: Huawei Xie <huawei.xie@intel.com>

Applied, thanks

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [dpdk-dev] [PATCH v8 3/8] vhost: vring queue setup for multiple queue support
  2015-10-26  5:42       ` Yuanhan Liu
@ 2015-10-27  6:20         ` Tetsuya Mukawa
  2015-10-27  9:17           ` Michael S. Tsirkin
  0 siblings, 1 reply; 66+ messages in thread
From: Tetsuya Mukawa @ 2015-10-27  6:20 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: dev, marcel, Michael S. Tsirkin

On 2015/10/26 14:42, Yuanhan Liu wrote:
> On Mon, Oct 26, 2015 at 02:24:08PM +0900, Tetsuya Mukawa wrote:
>> On 2015/10/22 21:35, Yuanhan Liu wrote:
> ...
>>> @@ -292,13 +300,13 @@ user_get_vring_base(struct vhost_device_ctx ctx,
>>>  	 * sent and only sent in vhost_vring_stop.
>>>  	 * TODO: cleanup the vring, it isn't usable since here.
>>>  	 */
>>> -	if ((dev->virtqueue[VIRTIO_RXQ]->kickfd) >= 0) {
>>> -		close(dev->virtqueue[VIRTIO_RXQ]->kickfd);
>>> -		dev->virtqueue[VIRTIO_RXQ]->kickfd = -1;
>>> +	if ((dev->virtqueue[state->index]->kickfd + VIRTIO_RXQ) >= 0) {
>>> +		close(dev->virtqueue[state->index + VIRTIO_RXQ]->kickfd);
>>> +		dev->virtqueue[state->index + VIRTIO_RXQ]->kickfd = -1;
>>>  	}
>> Hi Yuanhan,
>>
>> Please let me make sure whether below is correct.
>>     if ((dev->virtqueue[state->index]->kickfd + VIRTIO_RXQ) >= 0) {
>>
>>> -	if ((dev->virtqueue[VIRTIO_TXQ]->kickfd) >= 0) {
>>> -		close(dev->virtqueue[VIRTIO_TXQ]->kickfd);
>>> -		dev->virtqueue[VIRTIO_TXQ]->kickfd = -1;
>>> +	if ((dev->virtqueue[state->index]->kickfd + VIRTIO_TXQ) >= 0) {
>>> +		close(dev->virtqueue[state->index + VIRTIO_TXQ]->kickfd);
>>> +		dev->virtqueue[state->index + VIRTIO_TXQ]->kickfd = -1;
>> Also, same question here.
> Oops, silly typos... Thanks for catching it out!
>
> Here is an update patch (Thomas, please let me know if you prefer me
> to send the whole patchset for you to apply):

Hi Yuanhan,

I may miss one more issue here.
Could you please see below patch I've submitted today?
(I may find a similar issue, so I've fixed it also in below patch.)
 
- http://dpdk.org/dev/patchwork/patch/8038/
 
Thanks,
Tetsuya

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [dpdk-dev] [PATCH v8 3/8] vhost: vring queue setup for multiple queue support
  2015-10-27  6:20         ` Tetsuya Mukawa
@ 2015-10-27  9:17           ` Michael S. Tsirkin
  2015-10-27  9:30             ` Yuanhan Liu
  0 siblings, 1 reply; 66+ messages in thread
From: Michael S. Tsirkin @ 2015-10-27  9:17 UTC (permalink / raw)
  To: Tetsuya Mukawa; +Cc: dev, marcel

On Tue, Oct 27, 2015 at 03:20:40PM +0900, Tetsuya Mukawa wrote:
> On 2015/10/26 14:42, Yuanhan Liu wrote:
> > On Mon, Oct 26, 2015 at 02:24:08PM +0900, Tetsuya Mukawa wrote:
> >> On 2015/10/22 21:35, Yuanhan Liu wrote:
> > ...
> >>> @@ -292,13 +300,13 @@ user_get_vring_base(struct vhost_device_ctx ctx,
> >>>  	 * sent and only sent in vhost_vring_stop.
> >>>  	 * TODO: cleanup the vring, it isn't usable since here.
> >>>  	 */
> >>> -	if ((dev->virtqueue[VIRTIO_RXQ]->kickfd) >= 0) {
> >>> -		close(dev->virtqueue[VIRTIO_RXQ]->kickfd);
> >>> -		dev->virtqueue[VIRTIO_RXQ]->kickfd = -1;
> >>> +	if ((dev->virtqueue[state->index]->kickfd + VIRTIO_RXQ) >= 0) {
> >>> +		close(dev->virtqueue[state->index + VIRTIO_RXQ]->kickfd);
> >>> +		dev->virtqueue[state->index + VIRTIO_RXQ]->kickfd = -1;
> >>>  	}
> >> Hi Yuanhan,
> >>
> >> Please let me make sure whether below is correct.
> >>     if ((dev->virtqueue[state->index]->kickfd + VIRTIO_RXQ) >= 0) {
> >>
> >>> -	if ((dev->virtqueue[VIRTIO_TXQ]->kickfd) >= 0) {
> >>> -		close(dev->virtqueue[VIRTIO_TXQ]->kickfd);
> >>> -		dev->virtqueue[VIRTIO_TXQ]->kickfd = -1;
> >>> +	if ((dev->virtqueue[state->index]->kickfd + VIRTIO_TXQ) >= 0) {
> >>> +		close(dev->virtqueue[state->index + VIRTIO_TXQ]->kickfd);
> >>> +		dev->virtqueue[state->index + VIRTIO_TXQ]->kickfd = -1;
> >> Also, same question here.
> > Oops, silly typos... Thanks for catching it out!
> >
> > Here is an update patch (Thomas, please let me know if you prefer me
> > to send the whole patchset for you to apply):
> 
> Hi Yuanhan,
> 
> I may miss one more issue here.
> Could you please see below patch I've submitted today?
> (I may find a similar issue, so I've fixed it also in below patch.)
>  
> - http://dpdk.org/dev/patchwork/patch/8038/
>  
> Thanks,
> Tetsuya

Looking at that, at least when MQ is enabled, please don't key
stopping queues off GET_VRING_BASE.

There are ENABLE/DISABLE messages for that.
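
For illustration, a minimal sketch of what keying queue state off that
message could look like -- the type and function names below are made up
for the example and are not the actual vhost lib API:

struct demo_vring_state {
	unsigned int index;	/* vring index              */
	unsigned int num;	/* 1 = enable, 0 = disable  */
};

struct demo_virtqueue {
	volatile int enabled;	/* checked by the rx/tx data path */
};

/* only flip a flag: nothing is closed, no vring is torn down */
static int
demo_set_vring_enable(struct demo_virtqueue *vq_table,
		      const struct demo_vring_state *state)
{
	vq_table[state->index].enabled = state->num ? 1 : 0;
	return 0;
}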

Generally guys, don't take whatever QEMU happens to do for
granted! Look at the protocol spec under doc/specs directory,
if you are making more assumptions you must document them!

-- 
MST

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [dpdk-dev] [PATCH v8 3/8] vhost: vring queue setup for multiple queue support
  2015-10-27  9:17           ` Michael S. Tsirkin
@ 2015-10-27  9:30             ` Yuanhan Liu
  2015-10-27  9:42               ` Michael S. Tsirkin
  0 siblings, 1 reply; 66+ messages in thread
From: Yuanhan Liu @ 2015-10-27  9:30 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: dev, marcel

On Tue, Oct 27, 2015 at 11:17:16AM +0200, Michael S. Tsirkin wrote:
> On Tue, Oct 27, 2015 at 03:20:40PM +0900, Tetsuya Mukawa wrote:
> > On 2015/10/26 14:42, Yuanhan Liu wrote:
> > > On Mon, Oct 26, 2015 at 02:24:08PM +0900, Tetsuya Mukawa wrote:
> > >> On 2015/10/22 21:35, Yuanhan Liu wrote:
> > > ...
> > >>> @@ -292,13 +300,13 @@ user_get_vring_base(struct vhost_device_ctx ctx,
> > >>>  	 * sent and only sent in vhost_vring_stop.
> > >>>  	 * TODO: cleanup the vring, it isn't usable since here.
> > >>>  	 */
> > >>> -	if ((dev->virtqueue[VIRTIO_RXQ]->kickfd) >= 0) {
> > >>> -		close(dev->virtqueue[VIRTIO_RXQ]->kickfd);
> > >>> -		dev->virtqueue[VIRTIO_RXQ]->kickfd = -1;
> > >>> +	if ((dev->virtqueue[state->index]->kickfd + VIRTIO_RXQ) >= 0) {
> > >>> +		close(dev->virtqueue[state->index + VIRTIO_RXQ]->kickfd);
> > >>> +		dev->virtqueue[state->index + VIRTIO_RXQ]->kickfd = -1;
> > >>>  	}
> > >> Hi Yuanhan,
> > >>
> > >> Please let me make sure whether below is correct.
> > >>     if ((dev->virtqueue[state->index]->kickfd + VIRTIO_RXQ) >= 0) {
> > >>
> > >>> -	if ((dev->virtqueue[VIRTIO_TXQ]->kickfd) >= 0) {
> > >>> -		close(dev->virtqueue[VIRTIO_TXQ]->kickfd);
> > >>> -		dev->virtqueue[VIRTIO_TXQ]->kickfd = -1;
> > >>> +	if ((dev->virtqueue[state->index]->kickfd + VIRTIO_TXQ) >= 0) {
> > >>> +		close(dev->virtqueue[state->index + VIRTIO_TXQ]->kickfd);
> > >>> +		dev->virtqueue[state->index + VIRTIO_TXQ]->kickfd = -1;
> > >> Also, same question here.
> > > Oops, silly typos... Thanks for catching it out!
> > >
> > > Here is an update patch (Thomas, please let me know if you prefer me
> > > to send the whole patchset for you to apply):
> > 
> > Hi Yuanhan,
> > 
> > I may miss one more issue here.
> > Could you please see below patch I've submitted today?
> > (I may find a similar issue, so I've fixed it also in below patch.)
> >  
> > - http://dpdk.org/dev/patchwork/patch/8038/
> >  
> > Thanks,
> > Tetsuya
> 
> Looking at that, at least when MQ is enabled, please don't key
> stopping queues off GET_VRING_BASE.

Yes, that's only a workaround. I guess it has been there for quite a
while, maybe at the time qemu doesn't send RESET_OWNER message.

> There are ENABLE/DISABLE messages for that.

That's something new, though I have plan to use them instead, we still
need to make sure our code work with old qemu, without ENABLE/DISABLE
messages.

And I will think more while enabling live migration: I should have
more time to address issues like this at that time.

> Generally guys, don't take whatever QEMU happens to do for
> granted! Look at the protocol spec under doc/specs directory,
> if you are making more assumptions you must document them!

Indeed. And we will try to address them bit by bit in future.

	--yliu

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [dpdk-dev] [PATCH v8 8/8] doc: update release note for vhost-user mq support
  2015-10-26 20:22     ` Thomas Monjalon
@ 2015-10-27  9:38       ` Yuanhan Liu
  0 siblings, 0 replies; 66+ messages in thread
From: Yuanhan Liu @ 2015-10-27  9:38 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev

On Mon, Oct 26, 2015 at 09:22:10PM +0100, Thomas Monjalon wrote:
> 2015-10-22 20:35, Yuanhan Liu:
> > +* **vhost: added vhost-user mulitple queue support.**
> > +
> > +  Added vhost-user multiple queue support.
> 
> Except for the typo, it is the same sentence twice, so not needed.

Good to know; I thought it was a must, like a required style.
> 
> General comment to all contributors: please avoid making a special commit
> just to update the release notes.
> There is no real log message here, which is understandable because it does not
> deserve a commit of its own.

Got it.

> It will be merged with the previous one here.

And thank you!

	--yliu

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [dpdk-dev] [PATCH v8 3/8] vhost: vring queue setup for multiple queue support
  2015-10-27  9:30             ` Yuanhan Liu
@ 2015-10-27  9:42               ` Michael S. Tsirkin
  2015-10-27  9:51                 ` Thomas Monjalon
  2015-10-27  9:53                 ` Yuanhan Liu
  0 siblings, 2 replies; 66+ messages in thread
From: Michael S. Tsirkin @ 2015-10-27  9:42 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: dev, marcel

On Tue, Oct 27, 2015 at 05:30:41PM +0800, Yuanhan Liu wrote:
> On Tue, Oct 27, 2015 at 11:17:16AM +0200, Michael S. Tsirkin wrote:
> > On Tue, Oct 27, 2015 at 03:20:40PM +0900, Tetsuya Mukawa wrote:
> > > On 2015/10/26 14:42, Yuanhan Liu wrote:
> > > > On Mon, Oct 26, 2015 at 02:24:08PM +0900, Tetsuya Mukawa wrote:
> > > >> On 2015/10/22 21:35, Yuanhan Liu wrote:
> > > > ...
> > > >>> @@ -292,13 +300,13 @@ user_get_vring_base(struct vhost_device_ctx ctx,
> > > >>>  	 * sent and only sent in vhost_vring_stop.
> > > >>>  	 * TODO: cleanup the vring, it isn't usable since here.
> > > >>>  	 */
> > > >>> -	if ((dev->virtqueue[VIRTIO_RXQ]->kickfd) >= 0) {
> > > >>> -		close(dev->virtqueue[VIRTIO_RXQ]->kickfd);
> > > >>> -		dev->virtqueue[VIRTIO_RXQ]->kickfd = -1;
> > > >>> +	if ((dev->virtqueue[state->index]->kickfd + VIRTIO_RXQ) >= 0) {
> > > >>> +		close(dev->virtqueue[state->index + VIRTIO_RXQ]->kickfd);
> > > >>> +		dev->virtqueue[state->index + VIRTIO_RXQ]->kickfd = -1;
> > > >>>  	}
> > > >> Hi Yuanhan,
> > > >>
> > > >> Please let me make sure whether below is correct.
> > > >>     if ((dev->virtqueue[state->index]->kickfd + VIRTIO_RXQ) >= 0) {
> > > >>
> > > >>> -	if ((dev->virtqueue[VIRTIO_TXQ]->kickfd) >= 0) {
> > > >>> -		close(dev->virtqueue[VIRTIO_TXQ]->kickfd);
> > > >>> -		dev->virtqueue[VIRTIO_TXQ]->kickfd = -1;
> > > >>> +	if ((dev->virtqueue[state->index]->kickfd + VIRTIO_TXQ) >= 0) {
> > > >>> +		close(dev->virtqueue[state->index + VIRTIO_TXQ]->kickfd);
> > > >>> +		dev->virtqueue[state->index + VIRTIO_TXQ]->kickfd = -1;
> > > >> Also, same question here.
> > > > Oops, silly typos... Thanks for catching it out!
> > > >
> > > > Here is an update patch (Thomas, please let me know if you prefer me
> > > > to send the whole patchset for you to apply):
> > > 
> > > Hi Yuanhan,
> > > 
> > > I may miss one more issue here.
> > > Could you please see below patch I've submitted today?
> > > (I may find a similar issue, so I've fixed it also in below patch.)
> > >  
> > > - http://dpdk.org/dev/patchwork/patch/8038/
> > >  
> > > Thanks,
> > > Tetsuya
> > 
> > Looking at that, at least when MQ is enabled, please don't key
> > stopping queues off GET_VRING_BASE.
> 
> Yes, that's only a workaround. I guess it has been there for quite a
> while, maybe at the time qemu doesn't send RESET_OWNER message.

RESET_OWNER was a bad idea since it basically closes
everything.

> > There are ENABLE/DISABLE messages for that.
> 
> That's something new,

That's part of multiqueue support. If you ignore them,
nothing works properly.

> though I have plan to use them instead, we still
> need to make sure our code work with old qemu, without ENABLE/DISABLE
> messages.

OK but don't rely on this for new code.

> And I will think more while enabling live migration: I should have
> more time to address issues like this at that time.
> 
> > Generally guys, don't take whatever QEMU happens to do for
> > granted! Look at the protocol spec under doc/specs directory,
> > if you are making more assumptions you must document them!
> 
> Indeed. And we will try to address them bit by bit in future.
> 
> 	--yliu

But don't pile up these workarounds meanwhile.  I'm very worried.  The
way you are carrying on, each new QEMU is likely to break your
assumptions.

-- 
MST

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [dpdk-dev] [PATCH v8 3/8] vhost: vring queue setup for multiple queue support
  2015-10-27  9:42               ` Michael S. Tsirkin
@ 2015-10-27  9:51                 ` Thomas Monjalon
  2015-10-27  9:55                   ` Michael S. Tsirkin
  2015-10-27  9:53                 ` Yuanhan Liu
  1 sibling, 1 reply; 66+ messages in thread
From: Thomas Monjalon @ 2015-10-27  9:51 UTC (permalink / raw)
  To: Michael S. Tsirkin, Yuanhan Liu; +Cc: dev, marcel

2015-10-27 11:42, Michael S. Tsirkin:
> On Tue, Oct 27, 2015 at 05:30:41PM +0800, Yuanhan Liu wrote:
> > On Tue, Oct 27, 2015 at 11:17:16AM +0200, Michael S. Tsirkin wrote:
> > > Looking at that, at least when MQ is enabled, please don't key
> > > stopping queues off GET_VRING_BASE.
> > 
> > Yes, that's only a workaround. I guess it has been there for quite a
> > while, maybe at the time qemu doesn't send RESET_OWNER message.
> 
> RESET_OWNER was a bad idea since it basically closes
> everything.
> 
> > > There are ENABLE/DISABLE messages for that.
> > 
> > That's something new,
> 
> That's part of multiqueue support. If you ignore them,
> nothing works properly.
> 
> > though I have plan to use them instead, we still
> > need to make sure our code work with old qemu, without ENABLE/DISABLE
> > messages.
> 
> OK but don't rely on this for new code.
> 
> > And I will think more while enabling live migration: I should have
> > more time to address issues like this at that time.
> > 
> > > Generally guys, don't take whatever QEMU happens to do for
> > > granted! Look at the protocol spec under doc/specs directory,
> > > if you are making more assumptions you must document them!
> > 
> > Indeed. And we will try to address them bit by bit in future.
> > 
> > 	--yliu
> 
> But don't pile up these workarounds meanwhile.  I'm very worried.  The
> way you are carrying on, each new QEMU is likely to break your
> assumptions.

I think it may be saner to increase the minimum QEMU version supported in
each DPDK release, dropping old stuff progressively.
Michael, you are welcome to suggest how to move precisely.
Thanks

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [dpdk-dev] [PATCH v8 3/8] vhost: vring queue setup for multiple queue support
  2015-10-27  9:42               ` Michael S. Tsirkin
  2015-10-27  9:51                 ` Thomas Monjalon
@ 2015-10-27  9:53                 ` Yuanhan Liu
  1 sibling, 0 replies; 66+ messages in thread
From: Yuanhan Liu @ 2015-10-27  9:53 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: dev, marcel

On Tue, Oct 27, 2015 at 11:42:24AM +0200, Michael S. Tsirkin wrote:
...
> > > Looking at that, at least when MQ is enabled, please don't key
> > > stopping queues off GET_VRING_BASE.
> > 
> > Yes, that's only a workaround. I guess it has been there for quite a
> > while, maybe at the time qemu doesn't send RESET_OWNER message.
> 
> RESET_OWNER was a bad idea since it basically closes
> everything.
> 
> > > There are ENABLE/DISABLE messages for that.
> > 
> > That's something new,
> 
> That's part of multiqueue support. If you ignore them,
> nothing works properly.

I will handle them shortly. (well, it may still need weeks :(

> > though I have plan to use them instead, we still
> > need to make sure our code work with old qemu, without ENABLE/DISABLE
> > messages.
> 
> OK but don't rely on this for new code.

Yes.

> 
> > And I will think more while enabling live migration: I should have
> > more time to address issues like this at that time.
> > 
> > > Generally guys, don't take whatever QEMU happens to do for
> > > granted! Look at the protocol spec under doc/specs directory,
> > > if you are making more assumptions you must document them!
> > 
> > Indeed. And we will try to address them bit by bit in future.
> > 
> > 	--yliu
> 
> But don't pile up these workarounds meanwhile.  I'm very worried.  The
> way you are carrying on, each new QEMU is likely to break your
> assumptions.

Good point. I'll have more discussion with Huawei, to see if we can
fix them sooner.

	--yliu

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [dpdk-dev] [PATCH v8 3/8] vhost: vring queue setup for multiple queue support
  2015-10-27  9:51                 ` Thomas Monjalon
@ 2015-10-27  9:55                   ` Michael S. Tsirkin
  2015-10-27 10:41                     ` Xie, Huawei
  0 siblings, 1 reply; 66+ messages in thread
From: Michael S. Tsirkin @ 2015-10-27  9:55 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, marcel

On Tue, Oct 27, 2015 at 10:51:14AM +0100, Thomas Monjalon wrote:
> 2015-10-27 11:42, Michael S. Tsirkin:
> > On Tue, Oct 27, 2015 at 05:30:41PM +0800, Yuanhan Liu wrote:
> > > On Tue, Oct 27, 2015 at 11:17:16AM +0200, Michael S. Tsirkin wrote:
> > > > Looking at that, at least when MQ is enabled, please don't key
> > > > stopping queues off GET_VRING_BASE.
> > > 
> > > Yes, that's only a workaround. I guess it has been there for quite a
> > > while, maybe at the time qemu doesn't send RESET_OWNER message.
> > 
> > RESET_OWNER was a bad idea since it basically closes
> > everything.
> > 
> > > > There are ENABLE/DISABLE messages for that.
> > > 
> > > That's something new,
> > 
> > That's part of multiqueue support. If you ignore them,
> > nothing works properly.
> > 
> > > though I have plan to use them instead, we still
> > > need to make sure our code work with old qemu, without ENABLE/DISABLE
> > > messages.
> > 
> > OK but don't rely on this for new code.
> > 
> > > And I will think more while enabling live migration: I should have
> > > more time to address issues like this at that time.
> > > 
> > > > Generally guys, don't take whatever QEMU happens to do for
> > > > granted! Look at the protocol spec under doc/specs directory,
> > > > if you are making more assumptions you must document them!
> > > 
> > > Indeed. And we will try to address them bit by bit in future.
> > > 
> > > 	--yliu
> > 
> > But don't pile up these workarounds meanwhile.  I'm very worried.  The
> > way you are carrying on, each new QEMU is likely to break your
> > assumptions.
> 
> I think it may be saner to increase the minimum QEMU version supported in
> each DPDK release, dropping old stuff progressively.
> Michael, you are welcome to suggest how to move precisely.
> Thanks

This doesn't work for downstreams which need to backport fixes and
features.

Just go by the spec, and if you find issues, fix them at the
source instead of working around them - the code is open.

For new features, we have protocol feature bits.

-- 
MST

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [dpdk-dev] [PATCH v8 3/8] vhost: vring queue setup for multiple queue support
  2015-10-27  9:55                   ` Michael S. Tsirkin
@ 2015-10-27 10:41                     ` Xie, Huawei
  0 siblings, 0 replies; 66+ messages in thread
From: Xie, Huawei @ 2015-10-27 10:41 UTC (permalink / raw)
  To: Michael S. Tsirkin, Thomas Monjalon; +Cc: dev, marcel

On 10/27/2015 5:56 PM, Michael S. Tsirkin wrote:
> On Tue, Oct 27, 2015 at 10:51:14AM +0100, Thomas Monjalon wrote:
>> 2015-10-27 11:42, Michael S. Tsirkin:
>>> On Tue, Oct 27, 2015 at 05:30:41PM +0800, Yuanhan Liu wrote:
>>>> On Tue, Oct 27, 2015 at 11:17:16AM +0200, Michael S. Tsirkin wrote:
>>>>> Looking at that, at least when MQ is enabled, please don't key
>>>>> stopping queues off GET_VRING_BASE.
>>>> Yes, that's only a workaround. I guess it has been there for quite a
>>>> while, maybe at the time qemu doesn't send RESET_OWNER message.
>>> RESET_OWNER was a bad idea since it basically closes
>>> everything.
>>>
>>>>> There are ENABLE/DISABLE messages for that.
>>>> That's something new,
>>> That's part of multiqueue support. If you ignore them,
>>> nothing works properly.
>>>
>>>> though I have plan to use them instead, we still
>>>> need to make sure our code work with old qemu, without ENABLE/DISABLE
>>>> messages.
>>> OK but don't rely on this for new code.
>>>
>>>> And I will think more while enabling live migration: I should have
>>>> more time to address issues like this at that time.
>>>>
>>>>> Generally guys, don't take whatever QEMU happens to do for
>>>>> granted! Look at the protocol spec under doc/specs directory,
>>>>> if you are making more assumptions you must document them!
>>>> Indeed. And we will try to address them bit by bit in future.
>>>>
>>>> 	--yliu
>>> But don't pile up these workarounds meanwhile.  I'm very worried.  The
>>> way you are carrying on, each new QEMU is likely to break your
>>> assumptions.
>> I think it may be saner to increase the minimum QEMU version supported in
>> each DPDK release, dropping old stuff progressively.
>> Michael, you are welcome to suggest how to move precisely.
>> Thanks
> This doesn't work for downstreams which need to backport fixes and
> features.
>
> Just go by the spec, and if you find issues, fix them at the
> source instead of working around them - the code is open.
>
> For new features, we have protocol feature bits.
To me, one requirement is that we need a clear message (or spec) to know
when a virtio device (or, better, an individual queue) may be processed or
should be stopped from processing. We need to have a clear state machine
in mind.
Another requirement is that we hope QEMU could send vhost an ID so that
vhost-user has the ability to identify the connection. Let us discuss
this in another thread.
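
For illustration only, one possible shape of such a per-queue state
machine -- the states and transitions below are an assumption for
discussion, not something the vhost-user spec defines today:

enum vring_state {
	VRING_UNALLOCATED,	/* nothing received for this vring yet  */
	VRING_CONFIGURED,	/* SET_VRING_CALL/KICK/ADDR/NUM seen    */
	VRING_ENABLED,		/* SET_VRING_ENABLE(1): may be polled   */
	VRING_DISABLED,		/* SET_VRING_ENABLE(0): must not be     */
};

static enum vring_state
vring_next_state(enum vring_state cur, int is_enable_msg, int enable_val)
{
	if (is_enable_msg)
		return enable_val ? VRING_ENABLED : VRING_DISABLED;
	/* any other per-vring message only configures the queue */
	return cur == VRING_UNALLOCATED ? VRING_CONFIGURED : cur;
}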



^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [dpdk-dev] [PATCH v7 4/8] vhost: rxtx: use queue id instead of constant ring index
  2015-10-24 17:47               ` Michael S. Tsirkin
@ 2015-10-28 20:30                 ` Flavio Leitner
  2015-10-28 21:12                   ` Michael S. Tsirkin
  0 siblings, 1 reply; 66+ messages in thread
From: Flavio Leitner @ 2015-10-28 20:30 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: dev, marcel

On Sat, Oct 24, 2015 at 08:47:10PM +0300, Michael S. Tsirkin wrote:
> On Sat, Oct 24, 2015 at 12:34:08AM -0200, Flavio Leitner wrote:
> > On Thu, Oct 22, 2015 at 02:32:31PM +0300, Michael S. Tsirkin wrote:
> > > On Thu, Oct 22, 2015 at 05:49:55PM +0800, Yuanhan Liu wrote:
> > > > On Wed, Oct 21, 2015 at 05:26:18PM +0300, Michael S. Tsirkin wrote:
> > > > > On Wed, Oct 21, 2015 at 08:48:15PM +0800, Yuanhan Liu wrote:
> > > > > > > Please note that for virtio devices, guest is supposed to
> > > > > > > control the placement of incoming packets in RX queues.
> > > > > > 
> > > > > > I may not follow you.
> > > > > > 
> > > > > > Enqueuing packets to a RX queue is done at vhost lib, outside the
> > > > > > guest, how could the guest take the control here?
> > > > > > 
> > > > > > 	--yliu
> > > > > 
> > > > > vhost should do what guest told it to.
> > > > > 
> > > > > See virtio spec:
> > > > > 	5.1.6.5.5 Automatic receive steering in multiqueue mode
> > > > 
> > > > Spec says:
> > > > 
> > > >     After the driver transmitted a packet of a flow on transmitqX,
> > > >     the device SHOULD cause incoming packets for that flow to be
> > > >     steered to receiveqX.
> > > > 
> > > > 
> > > > Michael, I still have no idea how vhost could know the flow even
> > > > after discussion with Huawei. Could you be more specific about
> > > > this? Say, how could guest know that? And how could guest tell
> > > > vhost which RX is gonna to use?
> > > > 
> > > > Thanks.
> > > > 
> > > > 	--yliu
> > > 
> > > I don't really understand the question.
> > > 
> > > When guests transmits a packet, it makes a decision
> > > about the flow to use, and maps that to a tx/rx pair of queues.
> > > 
> > > It sends packets out on the tx queue and expects device to
> > > return packets from the same flow on the rx queue.
> > 
> > Why? I can understand that there should be a mapping between
> > flows and queues in a way that there is no re-ordering, but
> > I can't see the relation of receiving a flow with a TX queue.
> > 
> > fbl
> 
> That's the way virtio chose to program the rx steering logic.
> 
> It's low overhead (no special commands), and
> works well for TCP when user is an endpoint since rx and tx
> for tcp are generally tied (because of ack handling).
> 
> We can discuss other ways, e.g. special commands for guests to
> program steering.
> We'd have to first see some data showing the current scheme
> is problematic somehow.

The issue is that the restriction imposes operations to be done in the
data path.  For instance, Open vSwitch has N number of threads to manage
X RX queues. We distribute them in round-robin fashion.  So, the thread
polling one RX queue will do all the packet processing and push it to the
TX queue of the other device (vhost-user or not) using the same 'id'.

Doing so we can avoid locking between threads and TX queues and any other
extra computation while still keeping the packet ordering/distribution fine.

However, if vhost-user has to send packets according to the guest mapping,
it will require locking between queues and additional operations to select
the appropriate queue.  Those actions will cause performance issues.
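
To make the contrast concrete, here is a rough sketch of the two
forwarding models; the names are invented for the example, and this is
neither OVS nor vhost lib code:

#include <pthread.h>
#include <stdint.h>

#define NB_TXQ 4

static pthread_mutex_t txq_lock[NB_TXQ] = {
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER
};
static uint16_t guest_steer_table[256];		/* flow hash -> queue pair */

static void tx_burst(uint16_t txq, void *pkt) { (void)txq; (void)pkt; }

/* today: the thread polling rxq 'qid' always transmits on txq 'qid',
 * so no tx-side lock is needed */
static void
forward_fixed(uint16_t qid, void *pkt)
{
	tx_burst(qid, pkt);
}

/* strict guest steering: the txq comes from the per-flow mapping, so
 * several polling threads may pick the same txq and must serialize */
static void
forward_steered(uint32_t flow_hash, void *pkt)
{
	uint16_t txq = guest_steer_table[flow_hash & 255];

	pthread_mutex_lock(&txq_lock[txq]);
	tx_burst(txq, pkt);
	pthread_mutex_unlock(&txq_lock[txq]);
}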

I see no real benefit from enforcing the guest mapping outside to
justify all the computation cost, so my suggestion is to change the
spec to suggest that behavior, but not to require that to be compliant.

Does that make sense?

Thanks,
fbl

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [dpdk-dev] [PATCH v7 4/8] vhost: rxtx: use queue id instead of constant ring index
  2015-10-28 20:30                 ` Flavio Leitner
@ 2015-10-28 21:12                   ` Michael S. Tsirkin
  2015-11-16 22:20                     ` Flavio Leitner
  0 siblings, 1 reply; 66+ messages in thread
From: Michael S. Tsirkin @ 2015-10-28 21:12 UTC (permalink / raw)
  To: Flavio Leitner; +Cc: dev, marcel

On Wed, Oct 28, 2015 at 06:30:41PM -0200, Flavio Leitner wrote:
> On Sat, Oct 24, 2015 at 08:47:10PM +0300, Michael S. Tsirkin wrote:
> > On Sat, Oct 24, 2015 at 12:34:08AM -0200, Flavio Leitner wrote:
> > > On Thu, Oct 22, 2015 at 02:32:31PM +0300, Michael S. Tsirkin wrote:
> > > > On Thu, Oct 22, 2015 at 05:49:55PM +0800, Yuanhan Liu wrote:
> > > > > On Wed, Oct 21, 2015 at 05:26:18PM +0300, Michael S. Tsirkin wrote:
> > > > > > On Wed, Oct 21, 2015 at 08:48:15PM +0800, Yuanhan Liu wrote:
> > > > > > > > Please note that for virtio devices, guest is supposed to
> > > > > > > > control the placement of incoming packets in RX queues.
> > > > > > > 
> > > > > > > I may not follow you.
> > > > > > > 
> > > > > > > Enqueuing packets to a RX queue is done at vhost lib, outside the
> > > > > > > guest, how could the guest take the control here?
> > > > > > > 
> > > > > > > 	--yliu
> > > > > > 
> > > > > > vhost should do what guest told it to.
> > > > > > 
> > > > > > See virtio spec:
> > > > > > 	5.1.6.5.5 Automatic receive steering in multiqueue mode
> > > > > 
> > > > > Spec says:
> > > > > 
> > > > >     After the driver transmitted a packet of a flow on transmitqX,
> > > > >     the device SHOULD cause incoming packets for that flow to be
> > > > >     steered to receiveqX.
> > > > > 
> > > > > 
> > > > > Michael, I still have no idea how vhost could know the flow even
> > > > > after discussion with Huawei. Could you be more specific about
> > > > > this? Say, how could guest know that? And how could guest tell
> > > > > vhost which RX is gonna to use?
> > > > > 
> > > > > Thanks.
> > > > > 
> > > > > 	--yliu
> > > > 
> > > > I don't really understand the question.
> > > > 
> > > > When guests transmits a packet, it makes a decision
> > > > about the flow to use, and maps that to a tx/rx pair of queues.
> > > > 
> > > > It sends packets out on the tx queue and expects device to
> > > > return packets from the same flow on the rx queue.
> > > 
> > > Why? I can understand that there should be a mapping between
> > > flows and queues in a way that there is no re-ordering, but
> > > I can't see the relation of receiving a flow with a TX queue.
> > > 
> > > fbl
> > 
> > That's the way virtio chose to program the rx steering logic.
> > 
> > It's low overhead (no special commands), and
> > works well for TCP when user is an endpoint since rx and tx
> > for tcp are generally tied (because of ack handling).
> > 
> > We can discuss other ways, e.g. special commands for guests to
> > program steering.
> > We'd have to first see some data showing the current scheme
> > is problematic somehow.
> 
> The issue is that the restriction imposes operations to be done in the
> data path.  For instance, Open vSwitch has N number of threads to manage
> X RX queues. We distribute them in round-robin fashion.  So, the thread
> polling one RX queue will do all the packet processing and push it to the
> TX queue of the other device (vhost-user or not) using the same 'id'.
> 
> Doing so we can avoid locking between threads and TX queues and any other
> extra computation while still keeping the packet ordering/distribution fine.
> 
> However, if vhost-user has to send packets according with guest mapping,
> it will require locking between queues and additional operations to select
> the appropriate queue.  Those actions will cause performance issues.


You only need to send updates if guest moves a flow to another queue.
This is very rare since guest must avoid reordering.

Oh and you don't have to have locking.  Just update the table and make
the target pick up the new value at leisure; worst case, a packet ends up
in the wrong queue.
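
Just to illustrate what I mean, a minimal sketch of such a relaxed update,
assuming a flat table indexed by a flow hash (all names made up):

    #include <stdatomic.h>
    #include <stdint.h>

    #define FLOW_TABLE_SIZE 1024                 /* hypothetical size */

    /* flow hash -> rx queue index; written when the guest moves a flow,
     * read by the datapath without any lock. */
    static _Atomic uint16_t flow_to_queue[FLOW_TABLE_SIZE];

    static void
    steer_update(uint32_t flow_hash, uint16_t new_queue)
    {
        atomic_store_explicit(&flow_to_queue[flow_hash % FLOW_TABLE_SIZE],
                              new_queue, memory_order_relaxed);
    }

    /* May briefly return the old queue after an update; the only effect
     * is that a few packets land on the "wrong" queue. */
    static uint16_t
    steer_lookup(uint32_t flow_hash)
    {
        return atomic_load_explicit(&flow_to_queue[flow_hash % FLOW_TABLE_SIZE],
                                    memory_order_relaxed);
    }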


> I see no real benefit from enforcing the guest mapping outside to
> justify all the computation cost, so my suggestion is to change the
> spec to suggest that behavior, but not to require that to be compliant.
> 
> Does that make sense?
> 
> Thanks,
> fbl

It's not a question of what the spec says, it's a question of the
quality of implementation: guest needs to be able to balance load
between CPUs serving the queues, this means it needs a way to control
steering.

IMO having dpdk control it makes no sense in the scenario.

This is different from dpdk sending packets to real NIC
queues which all operate in parallel.

-- 
MST

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [dpdk-dev] [PATCH v7 4/8] vhost: rxtx: use queue id instead of constant ring index
  2015-10-28 21:12                   ` Michael S. Tsirkin
@ 2015-11-16 22:20                     ` Flavio Leitner
  2015-11-17  8:23                       ` Michael S. Tsirkin
  0 siblings, 1 reply; 66+ messages in thread
From: Flavio Leitner @ 2015-11-16 22:20 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: dev, marcel

On Wed, Oct 28, 2015 at 11:12:25PM +0200, Michael S. Tsirkin wrote:
> On Wed, Oct 28, 2015 at 06:30:41PM -0200, Flavio Leitner wrote:
> > On Sat, Oct 24, 2015 at 08:47:10PM +0300, Michael S. Tsirkin wrote:
> > > On Sat, Oct 24, 2015 at 12:34:08AM -0200, Flavio Leitner wrote:
> > > > On Thu, Oct 22, 2015 at 02:32:31PM +0300, Michael S. Tsirkin wrote:
> > > > > On Thu, Oct 22, 2015 at 05:49:55PM +0800, Yuanhan Liu wrote:
> > > > > > On Wed, Oct 21, 2015 at 05:26:18PM +0300, Michael S. Tsirkin wrote:
> > > > > > > On Wed, Oct 21, 2015 at 08:48:15PM +0800, Yuanhan Liu wrote:
> > > > > > > > > Please note that for virtio devices, guest is supposed to
> > > > > > > > > control the placement of incoming packets in RX queues.
> > > > > > > > 
> > > > > > > > I may not follow you.
> > > > > > > > 
> > > > > > > > Enqueuing packets to a RX queue is done at vhost lib, outside the
> > > > > > > > guest, how could the guest take the control here?
> > > > > > > > 
> > > > > > > > 	--yliu
> > > > > > > 
> > > > > > > vhost should do what guest told it to.
> > > > > > > 
> > > > > > > See virtio spec:
> > > > > > > 	5.1.6.5.5 Automatic receive steering in multiqueue mode
> > > > > > 
> > > > > > Spec says:
> > > > > > 
> > > > > >     After the driver transmitted a packet of a flow on transmitqX,
> > > > > >     the device SHOULD cause incoming packets for that flow to be
> > > > > >     steered to receiveqX.
> > > > > > 
> > > > > > 
> > > > > > Michael, I still have no idea how vhost could know the flow even
> > > > > > after discussion with Huawei. Could you be more specific about
> > > > > > this? Say, how could guest know that? And how could guest tell
> > > > > > vhost which RX is gonna to use?
> > > > > > 
> > > > > > Thanks.
> > > > > > 
> > > > > > 	--yliu
> > > > > 
> > > > > I don't really understand the question.
> > > > > 
> > > > > When guests transmits a packet, it makes a decision
> > > > > about the flow to use, and maps that to a tx/rx pair of queues.
> > > > > 
> > > > > It sends packets out on the tx queue and expects device to
> > > > > return packets from the same flow on the rx queue.
> > > > 
> > > > Why? I can understand that there should be a mapping between
> > > > flows and queues in a way that there is no re-ordering, but
> > > > I can't see the relation of receiving a flow with a TX queue.
> > > > 
> > > > fbl
> > > 
> > > That's the way virtio chose to program the rx steering logic.
> > > 
> > > It's low overhead (no special commands), and
> > > works well for TCP when user is an endpoint since rx and tx
> > > for tcp are generally tied (because of ack handling).

It is low overhead for the control plane, but not for the data plane.

> > > We can discuss other ways, e.g. special commands for guests to
> > > program steering.
> > > We'd have to first see some data showing the current scheme
> > > is problematic somehow.

The issue is that the spec assumes the packets are coming in
a serialized way and that the distribution will be made by vhost-user,
but that isn't necessarily true.


> > The issue is that the restriction imposes operations to be done in the
> > data path.  For instance, Open vSwitch has N number of threads to manage
> > X RX queues. We distribute them in round-robin fashion.  So, the thread
> > polling one RX queue will do all the packet processing and push it to the
> > TX queue of the other device (vhost-user or not) using the same 'id'.
> > 
> > Doing so we can avoid locking between threads and TX queues and any other
> > extra computation while still keeping the packet ordering/distribution fine.
> > 
> > However, if vhost-user has to send packets according with guest mapping,
> > it will require locking between queues and additional operations to select
> > the appropriate queue.  Those actions will cause performance issues.
> 
> You only need to send updates if guest moves a flow to another queue.
> This is very rare since guest must avoid reordering.

OK, maybe I missed something.  Could you point me to the spec talking
about the update?

 
> Oh and you don't have to have locking.  Just update the table and make
> the target pick up the new value at leasure, worst case a packet ends up
> in the wrong queue.

You do because packets are coming on different vswitch queues and they
could get mapped to the same virtio queue enforced by the guest, so some
sort of synchronization is needed.

That is one thing.  Another is that it will need some mapping between the
hash available in the vswitch (not necessarily L2~L4) and the hash/queue
mapping provided by the guest.  That doesn't require locking, but it's a
costly operation.  Alternatively, the vswitch could calculate a full L2-L4
hash, which is also a costly operation.
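
To illustrate that cost, here is a sketch of the kind of per-packet work
this implies; the tuple struct, hash and mapping helper are made up for
illustration, not OVS or DPDK code:

    #include <stddef.h>
    #include <stdint.h>

    struct tuple5 {                       /* hypothetical, already parsed */
        uint32_t src_ip, dst_ip;
        uint16_t src_port, dst_port;
        uint8_t  proto;
    };

    static uint32_t
    fnv1a(const void *data, size_t len, uint32_t h)
    {
        const uint8_t *p = data;

        while (len--) {
            h ^= *p++;
            h *= 16777619u;
        }
        return h;
    }

    /* The full software L3/L4 hash the vswitch would have to compute
     * per packet if its own RSS hash cannot be reused. */
    static uint32_t
    l4_hash(const struct tuple5 *t)
    {
        uint32_t h = 2166136261u;

        h = fnv1a(&t->src_ip, sizeof(t->src_ip), h);
        h = fnv1a(&t->dst_ip, sizeof(t->dst_ip), h);
        h = fnv1a(&t->src_port, sizeof(t->src_port), h);
        h = fnv1a(&t->dst_port, sizeof(t->dst_port), h);
        return fnv1a(&t->proto, sizeof(t->proto), h);
    }

    /* ...and the hash still has to be translated into the virtio RX
     * queue the guest expects, via a mapping learned from the guest's
     * TX side. */
    static uint16_t
    guest_rx_queue(uint32_t hash, const uint16_t *guest_map, uint16_t map_size)
    {
        return guest_map[hash % map_size];
    }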

A packet ending up in the wrong queue isn't that bad, but then we need to
enforce processing order because re-ordering is really bad.


> > I see no real benefit from enforcing the guest mapping outside to
> > justify all the computation cost, so my suggestion is to change the
> > spec to suggest that behavior, but not to require that to be compliant.
> > 
> > Does that make sense?
> > 
> > Thanks,
> > fbl
> 
> It's not a question of what the spec says, it's a question of the
> quality of implementation: guest needs to be able to balance load
> between CPUs serving the queues, this means it needs a way to control
> steering.
 
Indeed, a mapping could be provided by the guest to steer certain flows
to specific queues and of course the implementation must follow that.
However, it seems that the guest could also leave that mapping open.


> IMO having dpdk control it makes no sense in the scenario.

Why not? The only requirement should be that the implementation
should avoid re-ordering by keeping the mapping stable between
streams and queues.


> This is different from dpdk sending packets to real NIC
> queues which all operate in parallel.

The goal of multiqueue support is to have them working in parallel.

fbl

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [dpdk-dev] [PATCH v7 4/8] vhost: rxtx: use queue id instead of constant ring index
  2015-11-16 22:20                     ` Flavio Leitner
@ 2015-11-17  8:23                       ` Michael S. Tsirkin
  2015-11-17  9:24                         ` Jason Wang
  2015-11-17 22:49                         ` Flavio Leitner
  0 siblings, 2 replies; 66+ messages in thread
From: Michael S. Tsirkin @ 2015-11-17  8:23 UTC (permalink / raw)
  To: Flavio Leitner; +Cc: dev, marcel

On Mon, Nov 16, 2015 at 02:20:57PM -0800, Flavio Leitner wrote:
> On Wed, Oct 28, 2015 at 11:12:25PM +0200, Michael S. Tsirkin wrote:
> > On Wed, Oct 28, 2015 at 06:30:41PM -0200, Flavio Leitner wrote:
> > > On Sat, Oct 24, 2015 at 08:47:10PM +0300, Michael S. Tsirkin wrote:
> > > > On Sat, Oct 24, 2015 at 12:34:08AM -0200, Flavio Leitner wrote:
> > > > > On Thu, Oct 22, 2015 at 02:32:31PM +0300, Michael S. Tsirkin wrote:
> > > > > > On Thu, Oct 22, 2015 at 05:49:55PM +0800, Yuanhan Liu wrote:
> > > > > > > On Wed, Oct 21, 2015 at 05:26:18PM +0300, Michael S. Tsirkin wrote:
> > > > > > > > On Wed, Oct 21, 2015 at 08:48:15PM +0800, Yuanhan Liu wrote:
> > > > > > > > > > Please note that for virtio devices, guest is supposed to
> > > > > > > > > > control the placement of incoming packets in RX queues.
> > > > > > > > > 
> > > > > > > > > I may not follow you.
> > > > > > > > > 
> > > > > > > > > Enqueuing packets to a RX queue is done at vhost lib, outside the
> > > > > > > > > guest, how could the guest take the control here?
> > > > > > > > > 
> > > > > > > > > 	--yliu
> > > > > > > > 
> > > > > > > > vhost should do what guest told it to.
> > > > > > > > 
> > > > > > > > See virtio spec:
> > > > > > > > 	5.1.6.5.5 Automatic receive steering in multiqueue mode
> > > > > > > 
> > > > > > > Spec says:
> > > > > > > 
> > > > > > >     After the driver transmitted a packet of a flow on transmitqX,
> > > > > > >     the device SHOULD cause incoming packets for that flow to be
> > > > > > >     steered to receiveqX.
> > > > > > > 
> > > > > > > 
> > > > > > > Michael, I still have no idea how vhost could know the flow even
> > > > > > > after discussion with Huawei. Could you be more specific about
> > > > > > > this? Say, how could guest know that? And how could guest tell
> > > > > > > vhost which RX is gonna to use?
> > > > > > > 
> > > > > > > Thanks.
> > > > > > > 
> > > > > > > 	--yliu
> > > > > > 
> > > > > > I don't really understand the question.
> > > > > > 
> > > > > > When guests transmits a packet, it makes a decision
> > > > > > about the flow to use, and maps that to a tx/rx pair of queues.
> > > > > > 
> > > > > > It sends packets out on the tx queue and expects device to
> > > > > > return packets from the same flow on the rx queue.
> > > > > 
> > > > > Why? I can understand that there should be a mapping between
> > > > > flows and queues in a way that there is no re-ordering, but
> > > > > I can't see the relation of receiving a flow with a TX queue.
> > > > > 
> > > > > fbl
> > > > 
> > > > That's the way virtio chose to program the rx steering logic.
> > > > 
> > > > It's low overhead (no special commands), and
> > > > works well for TCP when user is an endpoint since rx and tx
> > > > for tcp are generally tied (because of ack handling).
> 
> It is low overhead for the control plane, but not for the data plane.

Well, there's zero data plane overhead within the guest.
You can't go lower :)

> > > > We can discuss other ways, e.g. special commands for guests to
> > > > program steering.
> > > > We'd have to first see some data showing the current scheme
> > > > is problematic somehow.
> 
> The issue is that the spec assumes the packets are coming in
> a serialized way and the distribution will be made by vhost-user
> but that isn't necessarily true.
> 

Making the distribution guest-controlled is obviously the right
thing to do if the guest is the endpoint: we need the guest scheduler to
make the decisions, since it's the only entity that knows
how tasks are distributed across VCPUs.

It's possible that this is not the right thing when the guest
is just doing bridging between two VNICs:
are you saying packets should just go from RX queue N
on eth0 to TX queue N on eth1, with the host making all
the queue selection decisions?

This sounds reasonable. Since there's a mix of local and
bridged traffic normally, does this mean we need
a per-packet flag that tells host to
ignore the packet for classification purposes?


> > > The issue is that the restriction imposes operations to be done in the
> > > data path.  For instance, Open vSwitch has N number of threads to manage
> > > X RX queues. We distribute them in round-robin fashion.  So, the thread
> > > polling one RX queue will do all the packet processing and push it to the
> > > TX queue of the other device (vhost-user or not) using the same 'id'.
> > > 
> > > Doing so we can avoid locking between threads and TX queues and any other
> > > extra computation while still keeping the packet ordering/distribution fine.
> > > 
> > > However, if vhost-user has to send packets according with guest mapping,
> > > it will require locking between queues and additional operations to select
> > > the appropriate queue.  Those actions will cause performance issues.
> > 
> > You only need to send updates if guest moves a flow to another queue.
> > This is very rare since guest must avoid reordering.
> 
> OK, maybe I missed something.  Could you point me to the spec talking
> about the update?
> 

It doesn't talk about that really - it's an implementation
detail. What I am saying is that you can have, e.g.,
a per-queue data structure with the flows using it.
If you find the flow there, then you know nothing changed
and there is no need to update other queues.
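
For example, something as simple as this (made-up names, a trivial linear
set just to show the idea):

    #include <stdbool.h>
    #include <stdint.h>

    #define FLOWS_PER_QUEUE 256              /* hypothetical */

    struct queue_flows {
        uint32_t flow[FLOWS_PER_QUEUE];      /* flow hashes seen on this queue */
        uint32_t count;
    };

    /* Returns true if the flow was already known on this queue, i.e.
     * nothing changed and no steering update is needed. */
    static bool
    queue_flow_seen(struct queue_flows *q, uint32_t flow_hash)
    {
        uint32_t i;

        for (i = 0; i < q->count; i++)
            if (q->flow[i] == flow_hash)
                return true;

        if (q->count < FLOWS_PER_QUEUE)
            q->flow[q->count++] = flow_hash;  /* remember it */
        return false;    /* new on this queue: update the steering table */
    }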



> > Oh and you don't have to have locking.  Just update the table and make
> > the target pick up the new value at leasure, worst case a packet ends up
> > in the wrong queue.
> 
> You do because packets are coming on different vswitch queues and they
> could get mapped to the same virtio queue enforced by the guest, so some
> sort of synchronization is needed.

Right. So to optimize that, you really need a 1:1 mapping, but this
optimization only makes sense if guest is not in the end processing
these packets in the application on the same CPU - otherwise you
are just causing IPIs.

With the per-packet flag to bypass the classifier as suggested above,
you would do a lookup, find flow is not classified and just forward
it 1:1 as you wanted to.
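
Note the flag itself does not exist in the virtio spec today; a sketch of
the idea, with invented names, would be something like:

    #include <stdint.h>

    /* Purely hypothetical: a flag the guest would set on packets it only
     * bridges, telling the device to skip receive-steering classification. */
    #define HDR_F_NO_STEER 0x1

    struct pkt {
        uint32_t flow_hash;
        uint8_t  hdr_flags;
    };

    static uint16_t
    pick_rx_queue(const struct pkt *p, uint16_t rxq_from_vswitch,
                  uint16_t (*classify)(uint32_t flow_hash))
    {
        if (p->hdr_flags & HDR_F_NO_STEER)
            return rxq_from_vswitch;   /* bridged: keep the 1:1 mapping */
        return classify(p->flow_hash); /* endpoint: follow guest steering */
    }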

> That is one thing.  Another is that it will need some mapping between the
> hash available in the vswitch (not necessary L2~L4) with the hash/queue
> mapping provided by the guest.  That doesn't require locking, but it's a
> costly operation.  Alternatively, vswitch could calculate full L2-L4 hash
> which is also a costly operation.
> 
> Packets ending in the wrong queue isn't that bad, but then we need to
> enforce processing order because re-ordering is really bad.
> 

Right. So if you consider a mix of packets with guest as endpoint
and guest as a bridge, then there's apparently no way out -
you need to identify the flow somehow in order to know
which is which.

I guess one solution is to give up and make it a global
decision.

But OTOH I think igb supports calculating the RX hash in hardware:
it sets NETIF_F_RXHASH on Linux.
If so, can't that be used for the initial lookup?


> > > I see no real benefit from enforcing the guest mapping outside to
> > > justify all the computation cost, so my suggestion is to change the
> > > spec to suggest that behavior, but not to require that to be compliant.
> > > 
> > > Does that make sense?
> > > 
> > > Thanks,
> > > fbl
> > 
> > It's not a question of what the spec says, it's a question of the
> > quality of implementation: guest needs to be able to balance load
> > between CPUs serving the queues, this means it needs a way to control
> > steering.
>  
> Indeed, a mapping could be provided by the guest to steer certain flows
> to specific queues and of course the implementation must follow that.
> However, it seems that guest could let that mapping simply open too.

Right, we can add such an option in the spec.


> 
> > IMO having dpdk control it makes no sense in the scenario.
> 
> Why not? The only requirement should be that the implemention
> should avoid re-ordering by keeping the mapping stable between
> streams and queues.

Well this depends on whether there's an application within
guest that consumes the flow and does something with
the data. If yes, then we need to be careful not to
compete with that application for CPU, otherwise
it won't be able to produce data.

I guess that's not the case for pcktgen or forwarding,
in these cases networking is all you care about.


> 
> > This is different from dpdk sending packets to real NIC
> > queues which all operate in parallel.
> 
> The goal of multiqueue support is to have them working in parallel.
> 
> fbl

What I meant is "in parallel with the application doing the
actual logic and producing the packets".

-- 
MST

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [dpdk-dev] [PATCH v7 4/8] vhost: rxtx: use queue id instead of constant ring index
  2015-11-17  8:23                       ` Michael S. Tsirkin
@ 2015-11-17  9:24                         ` Jason Wang
  2015-11-17 22:49                         ` Flavio Leitner
  1 sibling, 0 replies; 66+ messages in thread
From: Jason Wang @ 2015-11-17  9:24 UTC (permalink / raw)
  To: Michael S. Tsirkin, Flavio Leitner; +Cc: dev, marcel



On 11/17/2015 04:23 PM, Michael S. Tsirkin wrote:
> On Mon, Nov 16, 2015 at 02:20:57PM -0800, Flavio Leitner wrote:
>> > On Wed, Oct 28, 2015 at 11:12:25PM +0200, Michael S. Tsirkin wrote:
>>> > > On Wed, Oct 28, 2015 at 06:30:41PM -0200, Flavio Leitner wrote:
>>>> > > > On Sat, Oct 24, 2015 at 08:47:10PM +0300, Michael S. Tsirkin wrote:
>>>>> > > > > On Sat, Oct 24, 2015 at 12:34:08AM -0200, Flavio Leitner wrote:
>>>>>> > > > > > On Thu, Oct 22, 2015 at 02:32:31PM +0300, Michael S. Tsirkin wrote:
>>>>>>> > > > > > > On Thu, Oct 22, 2015 at 05:49:55PM +0800, Yuanhan Liu wrote:
>>>>>>>> > > > > > > > On Wed, Oct 21, 2015 at 05:26:18PM +0300, Michael S. Tsirkin wrote:
>>>>>>>>> > > > > > > > > On Wed, Oct 21, 2015 at 08:48:15PM +0800, Yuanhan Liu wrote:
>>>>>>>>>>> > > > > > > > > > > Please note that for virtio devices, guest is supposed to
>>>>>>>>>>> > > > > > > > > > > control the placement of incoming packets in RX queues.
>>>>>>>>>> > > > > > > > > > 
>>>>>>>>>> > > > > > > > > > I may not follow you.
>>>>>>>>>> > > > > > > > > > 
>>>>>>>>>> > > > > > > > > > Enqueuing packets to a RX queue is done at vhost lib, outside the
>>>>>>>>>> > > > > > > > > > guest, how could the guest take the control here?
>>>>>>>>>> > > > > > > > > > 
>>>>>>>>>> > > > > > > > > > 	--yliu
>>>>>>>>> > > > > > > > > 
>>>>>>>>> > > > > > > > > vhost should do what guest told it to.
>>>>>>>>> > > > > > > > > 
>>>>>>>>> > > > > > > > > See virtio spec:
>>>>>>>>> > > > > > > > > 	5.1.6.5.5 Automatic receive steering in multiqueue mode
>>>>>>>> > > > > > > > 
>>>>>>>> > > > > > > > Spec says:
>>>>>>>> > > > > > > > 
>>>>>>>> > > > > > > >     After the driver transmitted a packet of a flow on transmitqX,
>>>>>>>> > > > > > > >     the device SHOULD cause incoming packets for that flow to be
>>>>>>>> > > > > > > >     steered to receiveqX.
>>>>>>>> > > > > > > > 
>>>>>>>> > > > > > > > 
>>>>>>>> > > > > > > > Michael, I still have no idea how vhost could know the flow even
>>>>>>>> > > > > > > > after discussion with Huawei. Could you be more specific about
>>>>>>>> > > > > > > > this? Say, how could guest know that? And how could guest tell
>>>>>>>> > > > > > > > vhost which RX is gonna to use?
>>>>>>>> > > > > > > > 
>>>>>>>> > > > > > > > Thanks.
>>>>>>>> > > > > > > > 
>>>>>>>> > > > > > > > 	--yliu
>>>>>>> > > > > > > 
>>>>>>> > > > > > > I don't really understand the question.
>>>>>>> > > > > > > 
>>>>>>> > > > > > > When guests transmits a packet, it makes a decision
>>>>>>> > > > > > > about the flow to use, and maps that to a tx/rx pair of queues.
>>>>>>> > > > > > > 
>>>>>>> > > > > > > It sends packets out on the tx queue and expects device to
>>>>>>> > > > > > > return packets from the same flow on the rx queue.
>>>>>> > > > > > 
>>>>>> > > > > > Why? I can understand that there should be a mapping between
>>>>>> > > > > > flows and queues in a way that there is no re-ordering, but
>>>>>> > > > > > I can't see the relation of receiving a flow with a TX queue.
>>>>>> > > > > > 
>>>>>> > > > > > fbl
>>>>> > > > > 
>>>>> > > > > That's the way virtio chose to program the rx steering logic.
>>>>> > > > > 
>>>>> > > > > It's low overhead (no special commands), and
>>>>> > > > > works well for TCP when user is an endpoint since rx and tx
>>>>> > > > > for tcp are generally tied (because of ack handling).
>> > 
>> > It is low overhead for the control plane, but not for the data plane.
> Well, there's zero data plane overhead within the guest.
> You can't go lower :)
>
>>>>> > > > > We can discuss other ways, e.g. special commands for guests to
>>>>> > > > > program steering.
>>>>> > > > > We'd have to first see some data showing the current scheme
>>>>> > > > > is problematic somehow.
>> > 
>> > The issue is that the spec assumes the packets are coming in
>> > a serialized way and the distribution will be made by vhost-user
>> > but that isn't necessarily true.
>> > 
> Making the distribution guest controlled is obviously the right
> thing to do if guest is the endpoint: we need guest scheduler to
> make the decisions, it's the only entity that knows
> how are tasks distributed across VCPUs.
>
> It's possible that this is not the right thing for when guest
> is just doing bridging between two VNICs:
> are you saying packets should just go from RX queue N
> on eth0 to TX queue N on eth1, making host make all
> the queue selection decisions?

The problem looks like the current automatic steering policy is not flexible
enough for all kinds of workloads in the guest. So we could implement
ntuple filters and export the interfaces to let the guest/drivers decide.
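
To sketch what such an interface could look like (this is not an existing
virtio structure, just an ethtool-ntuple-like rule made up for
illustration):

    #include <stdint.h>

    /* Hypothetical steering rule the guest could program explicitly,
     * similar in spirit to ethtool's ntuple filters on a physical NIC. */
    struct steer_rule {
        uint32_t src_ip, src_ip_mask;
        uint32_t dst_ip, dst_ip_mask;
        uint16_t dst_port, dst_port_mask;
        uint8_t  proto;
        uint16_t rx_queue;        /* deliver matching packets here */
    };

    static int
    rule_match(const struct steer_rule *r, uint32_t sip, uint32_t dip,
               uint16_t dport, uint8_t proto)
    {
        return (sip & r->src_ip_mask) == (r->src_ip & r->src_ip_mask) &&
               (dip & r->dst_ip_mask) == (r->dst_ip & r->dst_ip_mask) &&
               (dport & r->dst_port_mask) == (r->dst_port & r->dst_port_mask) &&
               proto == r->proto;
    }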

>
> This sounds reasonable. Since there's a mix of local and
> bridged traffic normally, does this mean we need
> a per-packet flag that tells host to
> ignore the packet for classification purposes?

This may not work well for all workloads, e.g. short-lived connections.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [dpdk-dev] [PATCH v7 4/8] vhost: rxtx: use queue id instead of constant ring index
  2015-11-17  8:23                       ` Michael S. Tsirkin
  2015-11-17  9:24                         ` Jason Wang
@ 2015-11-17 22:49                         ` Flavio Leitner
  1 sibling, 0 replies; 66+ messages in thread
From: Flavio Leitner @ 2015-11-17 22:49 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: dev, marcel

On Tue, Nov 17, 2015 at 10:23:38AM +0200, Michael S. Tsirkin wrote:
> On Mon, Nov 16, 2015 at 02:20:57PM -0800, Flavio Leitner wrote:
> > On Wed, Oct 28, 2015 at 11:12:25PM +0200, Michael S. Tsirkin wrote:
> > > On Wed, Oct 28, 2015 at 06:30:41PM -0200, Flavio Leitner wrote:
> > > > On Sat, Oct 24, 2015 at 08:47:10PM +0300, Michael S. Tsirkin wrote:
> > > > > On Sat, Oct 24, 2015 at 12:34:08AM -0200, Flavio Leitner wrote:
> > > > > > On Thu, Oct 22, 2015 at 02:32:31PM +0300, Michael S. Tsirkin wrote:
> > > > > > > On Thu, Oct 22, 2015 at 05:49:55PM +0800, Yuanhan Liu wrote:
> > > > > > > > On Wed, Oct 21, 2015 at 05:26:18PM +0300, Michael S. Tsirkin wrote:
> > > > > > > > > On Wed, Oct 21, 2015 at 08:48:15PM +0800, Yuanhan Liu wrote:
> > > > > > > > > > > Please note that for virtio devices, guest is supposed to
> > > > > > > > > > > control the placement of incoming packets in RX queues.
> > > > > > > > > > 
> > > > > > > > > > I may not follow you.
> > > > > > > > > > 
> > > > > > > > > > Enqueuing packets to a RX queue is done at vhost lib, outside the
> > > > > > > > > > guest, how could the guest take the control here?
> > > > > > > > > > 
> > > > > > > > > > 	--yliu
> > > > > > > > > 
> > > > > > > > > vhost should do what guest told it to.
> > > > > > > > > 
> > > > > > > > > See virtio spec:
> > > > > > > > > 	5.1.6.5.5 Automatic receive steering in multiqueue mode
> > > > > > > > 
> > > > > > > > Spec says:
> > > > > > > > 
> > > > > > > >     After the driver transmitted a packet of a flow on transmitqX,
> > > > > > > >     the device SHOULD cause incoming packets for that flow to be
> > > > > > > >     steered to receiveqX.
> > > > > > > > 
> > > > > > > > 
> > > > > > > > Michael, I still have no idea how vhost could know the flow even
> > > > > > > > after discussion with Huawei. Could you be more specific about
> > > > > > > > this? Say, how could guest know that? And how could guest tell
> > > > > > > > vhost which RX is gonna to use?
> > > > > > > > 
> > > > > > > > Thanks.
> > > > > > > > 
> > > > > > > > 	--yliu
> > > > > > > 
> > > > > > > I don't really understand the question.
> > > > > > > 
> > > > > > > When guests transmits a packet, it makes a decision
> > > > > > > about the flow to use, and maps that to a tx/rx pair of queues.
> > > > > > > 
> > > > > > > It sends packets out on the tx queue and expects device to
> > > > > > > return packets from the same flow on the rx queue.
> > > > > > 
> > > > > > Why? I can understand that there should be a mapping between
> > > > > > flows and queues in a way that there is no re-ordering, but
> > > > > > I can't see the relation of receiving a flow with a TX queue.
> > > > > > 
> > > > > > fbl
> > > > > 
> > > > > That's the way virtio chose to program the rx steering logic.
> > > > > 
> > > > > It's low overhead (no special commands), and
> > > > > works well for TCP when user is an endpoint since rx and tx
> > > > > for tcp are generally tied (because of ack handling).
> > 
> > It is low overhead for the control plane, but not for the data plane.
> 
> Well, there's zero data plane overhead within the guest.
> You can't go lower :)

I agree, but I am talking about vhost-user or whatever means we use to
provide packets to the virtio backend. That will have to distribute
the packets according to the guest's mapping, which is not zero overhead.


> > > > > We can discuss other ways, e.g. special commands for guests to
> > > > > program steering.
> > > > > We'd have to first see some data showing the current scheme
> > > > > is problematic somehow.
> > 
> > The issue is that the spec assumes the packets are coming in
> > a serialized way and the distribution will be made by vhost-user
> > but that isn't necessarily true.
> > 
> 
> Making the distribution guest controlled is obviously the right
> thing to do if guest is the endpoint: we need guest scheduler to
> make the decisions, it's the only entity that knows
> how are tasks distributed across VCPUs.

Again, I agree.  My point is that it could also allow no mapping
at all, i.e. full freedom. I don't see that as an option now.

> It's possible that this is not the right thing for when guest
> is just doing bridging between two VNICs:
> are you saying packets should just go from RX queue N
> on eth0 to TX queue N on eth1, making host make all
> the queue selection decisions?

The idea is that the guest could TX on queue N and the host
would push packets from the same stream on RX queue Y. So,
the guest is free to send packets on any queue and the host is
free to send packets on any queue as long as both keep a stable
mapping to avoid re-ordering.

What if the guest is not trusted and the host has the requirement
to send priority packets to queue#0?  That is not possible if the
backend is forced to follow the guest mapping.

> This sounds reasonable. Since there's a mix of local and
> bridged traffic normally, does this mean we need
> a per-packet flag that tells host to
> ignore the packet for classification purposes?

Real NICs will apply a hash to each incoming packet and send it
to a specific queue, then a CPU is selected from there.  So, the
NIC driver or the OS doesn't change that.  The same rationale works
for virtio-net.

Of course, we can use ntuple to force specific streams to go to
specific queues, but that isn't the default policy.


> > > > The issue is that the restriction imposes operations to be done in the
> > > > data path.  For instance, Open vSwitch has N number of threads to manage
> > > > X RX queues. We distribute them in round-robin fashion.  So, the thread
> > > > polling one RX queue will do all the packet processing and push it to the
> > > > TX queue of the other device (vhost-user or not) using the same 'id'.
> > > > 
> > > > Doing so we can avoid locking between threads and TX queues and any other
> > > > extra computation while still keeping the packet ordering/distribution fine.
> > > > 
> > > > However, if vhost-user has to send packets according with guest mapping,
> > > > it will require locking between queues and additional operations to select
> > > > the appropriate queue.  Those actions will cause performance issues.
> > > 
> > > You only need to send updates if guest moves a flow to another queue.
> > > This is very rare since guest must avoid reordering.
> > 
> > OK, maybe I missed something.  Could you point me to the spec talking
> > about the update?
> > 
> 
> It doesn't talk about that really - it's an implementation
> detail. What I am saying is that you can have e.g.
> a per queue data structure with flows using it.
> If you find the flow there, then you know nothing changed
> and there is no need to update other queues.
> 
> 
> 
> > > Oh and you don't have to have locking.  Just update the table and make
> > > the target pick up the new value at leasure, worst case a packet ends up
> > > in the wrong queue.
> > 
> > You do because packets are coming on different vswitch queues and they
> > could get mapped to the same virtio queue enforced by the guest, so some
> > sort of synchronization is needed.
> 
> Right. So to optimize that, you really need a 1:1 mapping, but this
> optimization only makes sense if guest is not in the end processing
> these packets in the application on the same CPU - otherwise you
> are just causing IPIs.

The guest should move the apps to the CPU processing the queues. That's
what Linux does by default, and that's why I am saying the requirement
from the spec should be about maintaining a stable mapping.

> With the per-packet flag to bypass the classifier as suggested above,
> you would do a lookup, find flow is not classified and just forward
> it 1:1 as you wanted to.

That is heavy; we can't afford per-packet inspection.

> > That is one thing.  Another is that it will need some mapping between the
> > hash available in the vswitch (not necessary L2~L4) with the hash/queue
> > mapping provided by the guest.  That doesn't require locking, but it's a
> > costly operation.  Alternatively, vswitch could calculate full L2-L4 hash
> > which is also a costly operation.
> > 
> > Packets ending in the wrong queue isn't that bad, but then we need to
> > enforce processing order because re-ordering is really bad.
> > 
> 
> Right. So if you consider a mix of packets with guest as endpoint
> and guest as a bridge, then there's apparently no way out -
> you need to identify the flow somehow in order to know
> which is which.
> 
> I guess one solution is to give up and make it a global
> decision.

My proposal is to:
1) keep the flow-to-queue mapping stable by default;
2) respect the guest's request to map certain flows to specific queues
   (a minimal sketch of how the two combine follows below).
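
Sketched with made-up names and the guest rules stubbed out:

    #include <stdint.h>

    #define NO_RULE UINT16_MAX

    /* 2) an explicit guest-programmed mapping, if any, wins; stubbed out
     * here. */
    static uint16_t
    guest_rule_lookup(uint32_t flow_hash)
    {
        (void)flow_hash;
        return NO_RULE;                  /* no explicit rule programmed */
    }

    /* 1) otherwise fall back to a stable hash-based default, which is all
     * that is needed to avoid re-ordering within a stream. */
    static uint16_t
    select_rx_queue(uint32_t flow_hash, uint16_t nr_queues)
    {
        uint16_t q = guest_rule_lookup(flow_hash);

        if (q != NO_RULE && q < nr_queues)
            return q;
        return flow_hash % nr_queues;
    }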


> But OTOH I think igb supports calculating the RX hash in hardware:
> it sets NETIF_F_RXHASH on Linux.
> If so, can't that be used for the initial lookup?

Yes, it does. But I can't guarantee all vswitch ports or packets will
have a valid rxhash. Even if we decide to use that, we still need to
move each packet coming from different vswitch queues to specific
virtio queues. (packets crossing queues)


> > > > I see no real benefit from enforcing the guest mapping outside to
> > > > justify all the computation cost, so my suggestion is to change the
> > > > spec to suggest that behavior, but not to require that to be compliant.
> > > > 
> > > > Does that make sense?
> > > > 
> > > > Thanks,
> > > > fbl
> > > 
> > > It's not a question of what the spec says, it's a question of the
> > > quality of implementation: guest needs to be able to balance load
> > > between CPUs serving the queues, this means it needs a way to control
> > > steering.
> >  
> > Indeed, a mapping could be provided by the guest to steer certain flows
> > to specific queues and of course the implementation must follow that.
> > However, it seems that guest could let that mapping simply open too.
> 
> Right, we can add such an option in the spec.

:-)

> > > IMO having dpdk control it makes no sense in the scenario.
> > 
> > Why not? The only requirement should be that the implemention
> > should avoid re-ordering by keeping the mapping stable between
> > streams and queues.
> 
> Well this depends on whether there's an application within
> guest that consumes the flow and does something with
> the data. If yes, then we need to be careful not to
> compete with that application for CPU, otherwise
> it won't be able to produce data.

When you have multiple queues, ideally irqbalance will spread
their interrupts across the CPUs. So, when a specific queue receives
a packet, it will generate an interrupt, which runs a softirq
that puts the data into the app's socket and schedules the app to run
next.  So, in summary, the app by default will run on the CPU
processing its traffic.

> I guess that's not the case for pcktgen or forwarding,
> in these cases networking is all you care about.

Those use-cases will work regardless.

> > > This is different from dpdk sending packets to real NIC
> > > queues which all operate in parallel.
> > 
> > The goal of multiqueue support is to have them working in parallel.
> > 
> > fbl
> 
> What I meant is "in parallel with the application doing the
> actual logic and producing the packets".


fbl

^ permalink raw reply	[flat|nested] 66+ messages in thread

end of thread, other threads:[~2015-11-17 22:49 UTC | newest]

Thread overview: 66+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-10-21  3:48 [dpdk-dev] [PATCH v7 0/8] vhost-user multiple queues enabling Yuanhan Liu
2015-10-21  3:48 ` [dpdk-dev] [PATCH v7 1/8] vhost-user: add protocol features support Yuanhan Liu
2015-10-22  9:52   ` Xie, Huawei
2015-10-21  3:48 ` [dpdk-dev] [PATCH v7 2/8] vhost-user: add VHOST_USER_GET_QUEUE_NUM message Yuanhan Liu
2015-10-22  9:38   ` Xie, Huawei
2015-10-21  3:48 ` [dpdk-dev] [PATCH v7 3/8] vhost: vring queue setup for multiple queue support Yuanhan Liu
2015-10-21  4:45   ` Stephen Hemminger
2015-10-21  6:52     ` Yuanhan Liu
2015-10-22  9:49   ` Xie, Huawei
2015-10-22 11:30     ` Yuanhan Liu
2015-10-21  3:48 ` [dpdk-dev] [PATCH v7 4/8] vhost: rxtx: use queue id instead of constant ring index Yuanhan Liu
2015-10-21  4:43   ` Stephen Hemminger
2015-10-21  6:54     ` Yuanhan Liu
2015-10-21  7:16     ` Xie, Huawei
2015-10-21  9:38       ` Ananyev, Konstantin
2015-10-21 15:47         ` Stephen Hemminger
2015-10-21 15:52           ` Thomas Monjalon
2015-10-21 15:57             ` Bruce Richardson
2015-10-21 15:55           ` Bruce Richardson
2015-10-21 16:29             ` Ananyev, Konstantin
2015-10-21 10:31   ` Michael S. Tsirkin
2015-10-21 12:48     ` Yuanhan Liu
2015-10-21 14:26       ` Michael S. Tsirkin
2015-10-21 14:59         ` Yuanhan Liu
2015-10-22  9:49         ` Yuanhan Liu
2015-10-22 11:32           ` Michael S. Tsirkin
2015-10-22 14:07             ` Yuanhan Liu
2015-10-22 14:19               ` Michael S. Tsirkin
2015-10-23  8:02                 ` Yuanhan Liu
2015-10-24  2:34             ` Flavio Leitner
2015-10-24 17:47               ` Michael S. Tsirkin
2015-10-28 20:30                 ` Flavio Leitner
2015-10-28 21:12                   ` Michael S. Tsirkin
2015-11-16 22:20                     ` Flavio Leitner
2015-11-17  8:23                       ` Michael S. Tsirkin
2015-11-17  9:24                         ` Jason Wang
2015-11-17 22:49                         ` Flavio Leitner
2015-10-22  7:26   ` Xie, Huawei
2015-10-21  3:48 ` [dpdk-dev] [PATCH v7 5/8] virtio: fix deadloop due to reading virtio_net_config incorrectly Yuanhan Liu
2015-10-21  3:48 ` [dpdk-dev] [PATCH v7 6/8] vhost-user: enable vhost-user multiple queue Yuanhan Liu
2015-10-21  3:48 ` [dpdk-dev] [PATCH v7 7/8] vhost: add VHOST_USER_SET_VRING_ENABLE message Yuanhan Liu
2015-10-21  3:48 ` [dpdk-dev] [PATCH v7 8/8] doc: update release note for vhost-user mq support Yuanhan Liu
2015-10-22 12:35 ` [dpdk-dev] [PATCH v8 0/8] vhost-user multiple queues enabling Yuanhan Liu
2015-10-22 12:35   ` [dpdk-dev] [PATCH v8 1/8] vhost-user: add protocol features support Yuanhan Liu
2015-10-22 12:35   ` [dpdk-dev] [PATCH v8 2/8] vhost-user: add VHOST_USER_GET_QUEUE_NUM message Yuanhan Liu
2015-10-22 12:35   ` [dpdk-dev] [PATCH v8 3/8] vhost: vring queue setup for multiple queue support Yuanhan Liu
2015-10-26  5:24     ` Tetsuya Mukawa
2015-10-26  5:42       ` Yuanhan Liu
2015-10-27  6:20         ` Tetsuya Mukawa
2015-10-27  9:17           ` Michael S. Tsirkin
2015-10-27  9:30             ` Yuanhan Liu
2015-10-27  9:42               ` Michael S. Tsirkin
2015-10-27  9:51                 ` Thomas Monjalon
2015-10-27  9:55                   ` Michael S. Tsirkin
2015-10-27 10:41                     ` Xie, Huawei
2015-10-27  9:53                 ` Yuanhan Liu
2015-10-22 12:35   ` [dpdk-dev] [PATCH v8 4/8] vhost: rxtx: use queue id instead of constant ring index Yuanhan Liu
2015-10-22 12:35   ` [dpdk-dev] [PATCH v8 5/8] virtio: fix deadloop due to reading virtio_net_config incorrectly Yuanhan Liu
2015-10-22 12:35   ` [dpdk-dev] [PATCH v8 6/8] vhost: add VHOST_USER_SET_VRING_ENABLE message Yuanhan Liu
2015-10-22 12:35   ` [dpdk-dev] [PATCH v8 7/8] vhost-user: enable vhost-user multiple queue Yuanhan Liu
2015-10-22 12:35   ` [dpdk-dev] [PATCH v8 8/8] doc: update release note for vhost-user mq support Yuanhan Liu
2015-10-26 20:22     ` Thomas Monjalon
2015-10-27  9:38       ` Yuanhan Liu
2015-10-26  1:36   ` [dpdk-dev] [PATCH v8 0/8] vhost-user multiple queues enabling Xie, Huawei
2015-10-26  3:09     ` Yuanhan Liu
2015-10-26 20:26     ` Thomas Monjalon
