* [dpdk-dev] [PATCH v11 1/9] vhost: add the inflight description
2019-10-09 20:48 ` [dpdk-dev] [PATCH v11 0/9] vhost: support inflight share memory protocol feature Jin Yu
@ 2019-10-09 20:48 ` Jin Yu
2019-10-09 20:48 ` [dpdk-dev] [PATCH v11 2/9] vhost: add packed ring Jin Yu
` (8 subsequent siblings)
9 siblings, 0 replies; 27+ messages in thread
From: Jin Yu @ 2019-10-09 20:48 UTC (permalink / raw)
To: dev
Cc: changpeng.liu, maxime.coquelin, tiwei.bie, zhihong.wang, Jin Yu,
Lin Li, Xun Ni, Yu Zhang
This patch adds the inflight message description and
the inflight shared-fd protocol feature flag.
Signed-off-by: Lin Li <lilin24@baidu.com>
Signed-off-by: Xun Ni <nixun@baidu.com>
Signed-off-by: Yu Zhang <zhangyu31@baidu.com>
Signed-off-by: Jin Yu <jin.yu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
v1 - specify the APIs are split-ring only
v2 - fix APIs and judge split or packed
v3 - add rte_vhost_ prefix and fix one issue
v4 - add the packed ring support
v5 - revise get_vring_base func depend on Tiwei's suggestion
v6 - divide patch into small patches
v7 - updated base on Maxime's comments
v8 - updated base on Tiwei's comments
---
lib/librte_vhost/rte_vhost.h | 4 ++++
lib/librte_vhost/vhost_user.h | 8 ++++++++
2 files changed, 12 insertions(+)
diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h
index 0226b3eff..9943575ce 100644
--- a/lib/librte_vhost/rte_vhost.h
+++ b/lib/librte_vhost/rte_vhost.h
@@ -71,6 +71,10 @@ extern "C" {
#define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER 11
#endif
+#ifndef VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD
+#define VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD 12
+#endif
+
/** Indicate whether protocol features negotiation is supported. */
#ifndef VHOST_USER_F_PROTOCOL_FEATURES
#define VHOST_USER_F_PROTOCOL_FEATURES 30
diff --git a/lib/librte_vhost/vhost_user.h b/lib/librte_vhost/vhost_user.h
index 2a650fe4b..17a1d7bca 100644
--- a/lib/librte_vhost/vhost_user.h
+++ b/lib/librte_vhost/vhost_user.h
@@ -112,6 +112,13 @@ typedef struct VhostUserVringArea {
uint64_t offset;
} VhostUserVringArea;
+typedef struct VhostUserInflight {
+ uint64_t mmap_size;
+ uint64_t mmap_offset;
+ uint16_t num_queues;
+ uint16_t queue_size;
+} VhostUserInflight;
+
typedef struct VhostUserMsg {
union {
uint32_t master; /* a VhostUserRequest value */
@@ -134,6 +141,7 @@ typedef struct VhostUserMsg {
struct vhost_iotlb_msg iotlb;
VhostUserCryptoSessionParam crypto_session;
VhostUserVringArea area;
+ VhostUserInflight inflight;
} payload;
int fds[VHOST_MEMORY_MAX_NREGIONS];
int fd_num;
--
2.17.2
^ permalink raw reply [flat|nested] 27+ messages in thread
* [dpdk-dev] [PATCH v11 2/9] vhost: add packed ring
2019-10-09 20:48 ` [dpdk-dev] [PATCH v11 0/9] vhost: support inflight share memory protocol feature Jin Yu
2019-10-09 20:48 ` [dpdk-dev] [PATCH v11 1/9] vhost: add the inflight description Jin Yu
@ 2019-10-09 20:48 ` Jin Yu
2019-10-09 20:48 ` [dpdk-dev] [PATCH v11 3/9] vhost: add the inflight structure Jin Yu
` (7 subsequent siblings)
9 siblings, 0 replies; 27+ messages in thread
From: Jin Yu @ 2019-10-09 20:48 UTC (permalink / raw)
To: dev
Cc: changpeng.liu, maxime.coquelin, tiwei.bie, zhihong.wang, Jin Yu,
Lin Li, Xun Ni, Yu Zhang
This patch adds packed ring support to rte_vhost_vring.
Signed-off-by: Lin Li <lilin24@baidu.com>
Signed-off-by: Xun Ni <nixun@baidu.com>
Signed-off-by: Yu Zhang <zhangyu31@baidu.com>
Signed-off-by: Jin Yu <jin.yu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
lib/librte_vhost/rte_vhost.h | 15 ++++++++++++---
1 file changed, 12 insertions(+), 3 deletions(-)
diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h
index 9943575ce..7257f0965 100644
--- a/lib/librte_vhost/rte_vhost.h
+++ b/lib/librte_vhost/rte_vhost.h
@@ -103,9 +103,18 @@ struct rte_vhost_memory {
};
struct rte_vhost_vring {
- struct vring_desc *desc;
- struct vring_avail *avail;
- struct vring_used *used;
+ union {
+ struct vring_desc *desc;
+ struct vring_packed_desc *desc_packed;
+ };
+ union {
+ struct vring_avail *avail;
+ struct vring_packed_desc_event *driver_event;
+ };
+ union {
+ struct vring_used *used;
+ struct vring_packed_desc_event *device_event;
+ };
uint64_t log_guest_addr;
/** Deprecated, use rte_vhost_vring_call() instead. */
--
2.17.2
^ permalink raw reply [flat|nested] 27+ messages in thread
* [dpdk-dev] [PATCH v11 3/9] vhost: add the inflight structure
2019-10-09 20:48 ` [dpdk-dev] [PATCH v11 0/9] vhost: support inflight share memory protocol feature Jin Yu
2019-10-09 20:48 ` [dpdk-dev] [PATCH v11 1/9] vhost: add the inflight description Jin Yu
2019-10-09 20:48 ` [dpdk-dev] [PATCH v11 2/9] vhost: add packed ring Jin Yu
@ 2019-10-09 20:48 ` Jin Yu
2019-10-11 10:01 ` Maxime Coquelin
2019-10-09 20:48 ` [dpdk-dev] [PATCH v11 4/9] vhost: add two new messages to support a shared buffer Jin Yu
` (6 subsequent siblings)
9 siblings, 1 reply; 27+ messages in thread
From: Jin Yu @ 2019-10-09 20:48 UTC (permalink / raw)
To: dev
Cc: changpeng.liu, maxime.coquelin, tiwei.bie, zhihong.wang, Jin Yu,
Lin Li, Xun Ni, Yu Zhang
This patch adds the inflight queue region structures,
covering both the split and packed ring layouts.
Signed-off-by: Lin Li <lilin24@baidu.com>
Signed-off-by: Xun Ni <nixun@baidu.com>
Signed-off-by: Yu Zhang <zhangyu31@baidu.com>
Signed-off-by: Jin Yu <jin.yu@intel.com>
---
lib/librte_vhost/rte_vhost.h | 43 ++++++++++++++++++++++++++++++++++++
1 file changed, 43 insertions(+)
diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h
index 7257f0965..5241e36e8 100644
--- a/lib/librte_vhost/rte_vhost.h
+++ b/lib/librte_vhost/rte_vhost.h
@@ -102,6 +102,49 @@ struct rte_vhost_memory {
struct rte_vhost_mem_region regions[];
};
+struct rte_vhost_inflight_desc_split {
+ uint8_t inflight;
+ uint8_t padding[5];
+ uint16_t next;
+ uint64_t counter;
+};
+
+struct rte_vhost_inflight_info_split {
+ uint64_t features;
+ uint16_t version;
+ uint16_t desc_num;
+ uint16_t last_inflight_io;
+ uint16_t used_idx;
+ struct rte_vhost_inflight_desc_split desc[0];
+};
+
+struct rte_vhost_inflight_desc_packed {
+ uint8_t inflight;
+ uint8_t padding;
+ uint16_t next;
+ uint16_t last;
+ uint16_t num;
+ uint64_t counter;
+ uint16_t id;
+ uint16_t flags;
+ uint32_t len;
+ uint64_t addr;
+};
+
+struct rte_vhost_inflight_info_packed {
+ uint64_t features;
+ uint16_t version;
+ uint16_t desc_num;
+ uint16_t free_head;
+ uint16_t old_free_head;
+ uint16_t used_idx;
+ uint16_t old_used_idx;
+ uint8_t used_wrap_counter;
+ uint8_t old_used_wrap_counter;
+ uint8_t padding[7];
+ struct rte_vhost_inflight_desc_packed desc[0];
+};
+
struct rte_vhost_vring {
union {
struct vring_desc *desc;
--
2.17.2
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [dpdk-dev] [PATCH v11 3/9] vhost: add the inflight structure
2019-10-09 20:48 ` [dpdk-dev] [PATCH v11 3/9] vhost: add the inflight structure Jin Yu
@ 2019-10-11 10:01 ` Maxime Coquelin
0 siblings, 0 replies; 27+ messages in thread
From: Maxime Coquelin @ 2019-10-11 10:01 UTC (permalink / raw)
To: Jin Yu, dev
Cc: changpeng.liu, tiwei.bie, zhihong.wang, Lin Li, Xun Ni, Yu Zhang
On 10/9/19 10:48 PM, Jin Yu wrote:
> This patch adds the inflight queue region structures,
> covering both the split and packed ring layouts.
>
> Signed-off-by: Lin Li <lilin24@baidu.com>
> Signed-off-by: Xun Ni <nixun@baidu.com>
> Signed-off-by: Yu Zhang <zhangyu31@baidu.com>
> Signed-off-by: Jin Yu <jin.yu@intel.com>
> ---
> lib/librte_vhost/rte_vhost.h | 43 ++++++++++++++++++++++++++++++++++++
> 1 file changed, 43 insertions(+)
>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
^ permalink raw reply [flat|nested] 27+ messages in thread
* [dpdk-dev] [PATCH v11 4/9] vhost: add two new messages to support a shared buffer
2019-10-09 20:48 ` [dpdk-dev] [PATCH v11 0/9] vhost: support inflight share memory protocol feature Jin Yu
` (2 preceding siblings ...)
2019-10-09 20:48 ` [dpdk-dev] [PATCH v11 3/9] vhost: add the inflight structure Jin Yu
@ 2019-10-09 20:48 ` Jin Yu
2019-10-11 10:07 ` Maxime Coquelin
2019-10-09 20:48 ` [dpdk-dev] [PATCH v11 5/9] vhost: checkout the resubmit inflight information Jin Yu
` (5 subsequent siblings)
9 siblings, 1 reply; 27+ messages in thread
From: Jin Yu @ 2019-10-09 20:48 UTC (permalink / raw)
To: dev
Cc: changpeng.liu, maxime.coquelin, tiwei.bie, zhihong.wang, Jin Yu,
Lin Li, Xun Ni, Yu Zhang
This patch introduces two new messages, VHOST_USER_GET_INFLIGHT_FD
and VHOST_USER_SET_INFLIGHT_FD, to support transferring a shared
buffer between QEMU and the backend.
Signed-off-by: Lin Li <lilin24@baidu.com>
Signed-off-by: Xun Ni <nixun@baidu.com>
Signed-off-by: Yu Zhang <zhangyu31@baidu.com>
Signed-off-by: Jin Yu <jin.yu@intel.com>
---
lib/librte_vhost/vhost.h | 7 +
lib/librte_vhost/vhost_user.c | 243 +++++++++++++++++++++++++++++++++-
lib/librte_vhost/vhost_user.h | 4 +-
3 files changed, 252 insertions(+), 2 deletions(-)
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 884befa85..d67ba849a 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -286,6 +286,12 @@ struct guest_page {
uint64_t size;
};
+struct inflight_mem_info {
+ int fd;
+ void *addr;
+ uint64_t size;
+};
+
/**
* Device structure contains all configuration information relating
* to the device.
@@ -303,6 +309,7 @@ struct virtio_net {
uint32_t nr_vring;
int dequeue_zero_copy;
struct vhost_virtqueue *virtqueue[VHOST_MAX_QUEUE_PAIRS * 2];
+ struct inflight_mem_info *inflight_info;
#define IF_NAME_SZ (PATH_MAX > IFNAMSIZ ? PATH_MAX : IFNAMSIZ)
char ifname[IF_NAME_SZ];
uint64_t log_size;
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index c9e29ece8..a7bc42050 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -37,6 +37,10 @@
#ifdef RTE_LIBRTE_VHOST_POSTCOPY
#include <linux/userfaultfd.h>
#endif
+#ifdef F_ADD_SEALS /* if file sealing is supported, so is memfd */
+#include <linux/memfd.h>
+#define MEMFD_SUPPORTED
+#endif
#include <rte_common.h>
#include <rte_malloc.h>
@@ -49,6 +53,9 @@
#define VIRTIO_MIN_MTU 68
#define VIRTIO_MAX_MTU 65535
+#define INFLIGHT_ALIGNMENT 64
+#define INFLIGHT_VERSION 0x1
+
static const char *vhost_message_str[VHOST_USER_MAX] = {
[VHOST_USER_NONE] = "VHOST_USER_NONE",
[VHOST_USER_GET_FEATURES] = "VHOST_USER_GET_FEATURES",
@@ -78,6 +85,8 @@ static const char *vhost_message_str[VHOST_USER_MAX] = {
[VHOST_USER_POSTCOPY_ADVISE] = "VHOST_USER_POSTCOPY_ADVISE",
[VHOST_USER_POSTCOPY_LISTEN] = "VHOST_USER_POSTCOPY_LISTEN",
[VHOST_USER_POSTCOPY_END] = "VHOST_USER_POSTCOPY_END",
+ [VHOST_USER_GET_INFLIGHT_FD] = "VHOST_USER_GET_INFLIGHT_FD",
+ [VHOST_USER_SET_INFLIGHT_FD] = "VHOST_USER_SET_INFLIGHT_FD",
};
static int send_vhost_reply(int sockfd, struct VhostUserMsg *msg);
@@ -160,6 +169,22 @@ vhost_backend_cleanup(struct virtio_net *dev)
dev->log_addr = 0;
}
+ if (dev->inflight_info) {
+ if (dev->inflight_info->addr) {
+ munmap(dev->inflight_info->addr,
+ dev->inflight_info->size);
+ dev->inflight_info->addr = NULL;
+ }
+
+ if (dev->inflight_info->fd > 0) {
+ close(dev->inflight_info->fd);
+ dev->inflight_info->fd = -1;
+ }
+
+ free(dev->inflight_info);
+ dev->inflight_info = NULL;
+ }
+
if (dev->slave_req_fd >= 0) {
close(dev->slave_req_fd);
dev->slave_req_fd = -1;
@@ -1165,6 +1190,221 @@ virtio_is_ready(struct virtio_net *dev)
return 1;
}
+static void *
+inflight_mem_alloc(const char *name, size_t size, int *fd)
+{
+ void *ptr;
+ int mfd = -1;
+ char fname[20] = "/tmp/memfd-XXXXXX";
+
+ *fd = -1;
+#ifdef MEMFD_SUPPORTED
+ mfd = memfd_create(name, MFD_CLOEXEC);
+#else
+ RTE_SET_USED(name);
+#endif
+ if (mfd == -1) {
+ mfd = mkstemp(fname);
+ if (mfd == -1) {
+ RTE_LOG(ERR, VHOST_CONFIG,
+ "failed to get inflight buffer fd\n");
+ return NULL;
+ }
+
+ unlink(fname);
+ }
+
+ if (ftruncate(mfd, size) == -1) {
+ RTE_LOG(ERR, VHOST_CONFIG,
+ "failed to alloc inflight buffer\n");
+ close(mfd);
+ return NULL;
+ }
+
+ ptr = mmap(0, size, PROT_READ | PROT_WRITE, MAP_SHARED, mfd, 0);
+ if (ptr == MAP_FAILED) {
+ RTE_LOG(ERR, VHOST_CONFIG,
+ "failed to mmap inflight buffer\n");
+ close(mfd);
+ return NULL;
+ }
+
+ *fd = mfd;
+ return ptr;
+}
+
+static uint32_t
+get_pervq_shm_size_split(uint16_t queue_size)
+{
+ return RTE_ALIGN_MUL_CEIL(sizeof(struct rte_vhost_inflight_desc_split) *
+ queue_size + sizeof(uint64_t) +
+ sizeof(uint16_t) * 4, INFLIGHT_ALIGNMENT);
+}
+
+static uint32_t
+get_pervq_shm_size_packed(uint16_t queue_size)
+{
+ return RTE_ALIGN_MUL_CEIL(sizeof(struct rte_vhost_inflight_desc_packed)
+ * queue_size + sizeof(uint64_t) +
+ sizeof(uint16_t) * 6 + sizeof(uint8_t) * 9,
+ INFLIGHT_ALIGNMENT);
+}
+
+static int
+vhost_user_get_inflight_fd(struct virtio_net **pdev,
+ VhostUserMsg *msg,
+ int main_fd __rte_unused)
+{
+ struct rte_vhost_inflight_info_packed *inflight_packed;
+ uint64_t pervq_inflight_size, mmap_size;
+ uint16_t num_queues, queue_size;
+ struct virtio_net *dev = *pdev;
+ int fd, i, j;
+ void *addr;
+
+ if (msg->size != sizeof(msg->payload.inflight)) {
+ RTE_LOG(ERR, VHOST_CONFIG,
+ "invalid get_inflight_fd message size is %d\n",
+ msg->size);
+ return RTE_VHOST_MSG_RESULT_ERR;
+ }
+
+ if (dev->inflight_info == NULL) {
+ dev->inflight_info = calloc(1,
+ sizeof(struct inflight_mem_info));
+ if (!dev->inflight_info) {
+ RTE_LOG(ERR, VHOST_CONFIG,
+ "failed to alloc dev inflight area\n");
+ return RTE_VHOST_MSG_RESULT_ERR;
+ }
+ }
+
+ num_queues = msg->payload.inflight.num_queues;
+ queue_size = msg->payload.inflight.queue_size;
+
+ RTE_LOG(INFO, VHOST_CONFIG, "get_inflight_fd num_queues: %u\n",
+ msg->payload.inflight.num_queues);
+ RTE_LOG(INFO, VHOST_CONFIG, "get_inflight_fd queue_size: %u\n",
+ msg->payload.inflight.queue_size);
+
+ if (vq_is_packed(dev))
+ pervq_inflight_size = get_pervq_shm_size_packed(queue_size);
+ else
+ pervq_inflight_size = get_pervq_shm_size_split(queue_size);
+
+ mmap_size = num_queues * pervq_inflight_size;
+ addr = inflight_mem_alloc("vhost-inflight", mmap_size, &fd);
+ if (!addr) {
+ RTE_LOG(ERR, VHOST_CONFIG,
+ "failed to alloc vhost inflight area\n");
+ msg->payload.inflight.mmap_size = 0;
+ return RTE_VHOST_MSG_RESULT_ERR;
+ }
+ memset(addr, 0, mmap_size);
+
+ dev->inflight_info->addr = addr;
+ dev->inflight_info->size = msg->payload.inflight.mmap_size = mmap_size;
+ dev->inflight_info->fd = msg->fds[0] = fd;
+ msg->payload.inflight.mmap_offset = 0;
+ msg->fd_num = 1;
+
+ if (vq_is_packed(dev)) {
+ for (i = 0; i < num_queues; i++) {
+ inflight_packed =
+ (struct rte_vhost_inflight_info_packed *)addr;
+ inflight_packed->used_wrap_counter = 1;
+ inflight_packed->old_used_wrap_counter = 1;
+ for (j = 0; j < queue_size; j++)
+ inflight_packed->desc[j].next = j + 1;
+ addr = (void *)((char *)addr + pervq_inflight_size);
+ }
+ }
+
+ RTE_LOG(INFO, VHOST_CONFIG,
+ "send inflight mmap_size: %"PRIu64"\n",
+ msg->payload.inflight.mmap_size);
+ RTE_LOG(INFO, VHOST_CONFIG,
+ "send inflight mmap_offset: %"PRIu64"\n",
+ msg->payload.inflight.mmap_offset);
+ RTE_LOG(INFO, VHOST_CONFIG,
+ "send inflight fd: %d\n", msg->fds[0]);
+
+ return RTE_VHOST_MSG_RESULT_REPLY;
+}
+
+static int
+vhost_user_set_inflight_fd(struct virtio_net **pdev, VhostUserMsg *msg,
+ int main_fd __rte_unused)
+{
+ uint64_t mmap_size, mmap_offset;
+ uint16_t num_queues, queue_size;
+ uint32_t pervq_inflight_size;
+ struct virtio_net *dev = *pdev;
+ void *addr;
+ int fd;
+
+ fd = msg->fds[0];
+ if (msg->size != sizeof(msg->payload.inflight) || fd < 0) {
+ RTE_LOG(ERR, VHOST_CONFIG,
+ "invalid set_inflight_fd message size is %d,fd is %d\n",
+ msg->size, fd);
+ return RTE_VHOST_MSG_RESULT_ERR;
+ }
+
+ mmap_size = msg->payload.inflight.mmap_size;
+ mmap_offset = msg->payload.inflight.mmap_offset;
+ num_queues = msg->payload.inflight.num_queues;
+ queue_size = msg->payload.inflight.queue_size;
+
+ if (vq_is_packed(dev))
+ pervq_inflight_size = get_pervq_shm_size_packed(queue_size);
+ else
+ pervq_inflight_size = get_pervq_shm_size_split(queue_size);
+
+ RTE_LOG(INFO, VHOST_CONFIG,
+ "set_inflight_fd mmap_size: %"PRIu64"\n", mmap_size);
+ RTE_LOG(INFO, VHOST_CONFIG,
+ "set_inflight_fd mmap_offset: %"PRIu64"\n", mmap_offset);
+ RTE_LOG(INFO, VHOST_CONFIG,
+ "set_inflight_fd num_queues: %u\n", num_queues);
+ RTE_LOG(INFO, VHOST_CONFIG,
+ "set_inflight_fd queue_size: %u\n", queue_size);
+ RTE_LOG(INFO, VHOST_CONFIG,
+ "set_inflight_fd fd: %d\n", fd);
+ RTE_LOG(INFO, VHOST_CONFIG,
+ "set_inflight_fd pervq_inflight_size: %d\n",
+ pervq_inflight_size);
+
+ if (!dev->inflight_info) {
+ dev->inflight_info = calloc(1,
+ sizeof(struct inflight_mem_info));
+ if (dev->inflight_info == NULL) {
+ RTE_LOG(ERR, VHOST_CONFIG,
+ "failed to alloc dev inflight area\n");
+ return RTE_VHOST_MSG_RESULT_ERR;
+ }
+ }
+
+ if (dev->inflight_info->addr)
+ munmap(dev->inflight_info->addr, dev->inflight_info->size);
+
+ addr = mmap(0, mmap_size, PROT_READ | PROT_WRITE, MAP_SHARED,
+ fd, mmap_offset);
+ if (addr == MAP_FAILED) {
+ RTE_LOG(ERR, VHOST_CONFIG, "failed to mmap share memory.\n");
+ return RTE_VHOST_MSG_RESULT_ERR;
+ }
+
+ if (dev->inflight_info->fd)
+ close(dev->inflight_info->fd);
+
+ dev->inflight_info->fd = fd;
+ dev->inflight_info->addr = addr;
+ dev->inflight_info->size = mmap_size;
+
+ return RTE_VHOST_MSG_RESULT_OK;
+}
+
static int
vhost_user_set_vring_call(struct virtio_net **pdev, struct VhostUserMsg *msg,
int main_fd __rte_unused)
@@ -1762,9 +2002,10 @@ static vhost_message_handler_t vhost_message_handlers[VHOST_USER_MAX] = {
[VHOST_USER_POSTCOPY_ADVISE] = vhost_user_set_postcopy_advise,
[VHOST_USER_POSTCOPY_LISTEN] = vhost_user_set_postcopy_listen,
[VHOST_USER_POSTCOPY_END] = vhost_user_postcopy_end,
+ [VHOST_USER_GET_INFLIGHT_FD] = vhost_user_get_inflight_fd,
+ [VHOST_USER_SET_INFLIGHT_FD] = vhost_user_set_inflight_fd,
};
-
/* return bytes# of read on success or negative val on failure. */
static int
read_vhost_message(int sockfd, struct VhostUserMsg *msg)
diff --git a/lib/librte_vhost/vhost_user.h b/lib/librte_vhost/vhost_user.h
index 17a1d7bca..6563f7315 100644
--- a/lib/librte_vhost/vhost_user.h
+++ b/lib/librte_vhost/vhost_user.h
@@ -54,7 +54,9 @@ typedef enum VhostUserRequest {
VHOST_USER_POSTCOPY_ADVISE = 28,
VHOST_USER_POSTCOPY_LISTEN = 29,
VHOST_USER_POSTCOPY_END = 30,
- VHOST_USER_MAX = 31
+ VHOST_USER_GET_INFLIGHT_FD = 31,
+ VHOST_USER_SET_INFLIGHT_FD = 32,
+ VHOST_USER_MAX = 33
} VhostUserRequest;
typedef enum VhostUserSlaveRequest {
--
2.17.2
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [dpdk-dev] [PATCH v11 4/9] vhost: add two new messages to support a shared buffer
2019-10-09 20:48 ` [dpdk-dev] [PATCH v11 4/9] vhost: add two new messages to support a shared buffer Jin Yu
@ 2019-10-11 10:07 ` Maxime Coquelin
0 siblings, 0 replies; 27+ messages in thread
From: Maxime Coquelin @ 2019-10-11 10:07 UTC (permalink / raw)
To: Jin Yu, dev
Cc: changpeng.liu, tiwei.bie, zhihong.wang, Lin Li, Xun Ni, Yu Zhang
On 10/9/19 10:48 PM, Jin Yu wrote:
> This patch introduces two new messages, VHOST_USER_GET_INFLIGHT_FD
> and VHOST_USER_SET_INFLIGHT_FD, to support transferring a shared
> buffer between QEMU and the backend.
>
> Signed-off-by: Lin Li <lilin24@baidu.com>
> Signed-off-by: Xun Ni <nixun@baidu.com>
> Signed-off-by: Yu Zhang <zhangyu31@baidu.com>
> Signed-off-by: Jin Yu <jin.yu@intel.com>
> ---
> lib/librte_vhost/vhost.h | 7 +
> lib/librte_vhost/vhost_user.c | 243 +++++++++++++++++++++++++++++++++-
> lib/librte_vhost/vhost_user.h | 4 +-
> 3 files changed, 252 insertions(+), 2 deletions(-)
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
^ permalink raw reply [flat|nested] 27+ messages in thread
* [dpdk-dev] [PATCH v11 5/9] vhost: checkout the resubmit inflight information
2019-10-09 20:48 ` [dpdk-dev] [PATCH v11 0/9] vhost: support inflight share memory protocol feature Jin Yu
` (3 preceding siblings ...)
2019-10-09 20:48 ` [dpdk-dev] [PATCH v11 4/9] vhost: add two new messages to support a shared buffer Jin Yu
@ 2019-10-09 20:48 ` Jin Yu
2019-10-11 10:09 ` Maxime Coquelin
2019-10-09 20:48 ` [dpdk-dev] [PATCH v11 6/9] vhost: add the APIs to operate inflight ring Jin Yu
` (4 subsequent siblings)
9 siblings, 1 reply; 27+ messages in thread
From: Jin Yu @ 2019-10-09 20:48 UTC (permalink / raw)
To: dev
Cc: changpeng.liu, maxime.coquelin, tiwei.bie, zhihong.wang, Jin Yu,
Lin Li, Xun Ni, Yu Zhang
This patch shows how to check out the inflight ring and construct
the resubmit information; it also covers destroying the resubmit info.
Signed-off-by: Lin Li <lilin24@baidu.com>
Signed-off-by: Xun Ni <nixun@baidu.com>
Signed-off-by: Yu Zhang <zhangyu31@baidu.com>
Signed-off-by: Jin Yu <jin.yu@intel.com>
---
lib/librte_vhost/rte_vhost.h | 19 +++
lib/librte_vhost/vhost.c | 29 ++++-
lib/librte_vhost/vhost.h | 9 ++
lib/librte_vhost/vhost_user.c | 217 +++++++++++++++++++++++++++++++++-
4 files changed, 271 insertions(+), 3 deletions(-)
diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h
index 5241e36e8..95e4d720e 100644
--- a/lib/librte_vhost/rte_vhost.h
+++ b/lib/librte_vhost/rte_vhost.h
@@ -145,6 +145,25 @@ struct rte_vhost_inflight_info_packed {
struct rte_vhost_inflight_desc_packed desc[0];
};
+struct rte_vhost_resubmit_desc {
+ uint16_t index;
+ uint64_t counter;
+};
+
+struct rte_vhost_resubmit_info {
+ struct rte_vhost_resubmit_desc *resubmit_list;
+ uint16_t resubmit_num;
+};
+
+struct rte_vhost_ring_inflight {
+ union {
+ struct rte_vhost_inflight_info_split *inflight_split;
+ struct rte_vhost_inflight_info_packed *inflight_packed;
+ };
+
+ struct rte_vhost_resubmit_info *resubmit_inflight;
+};
+
struct rte_vhost_vring {
union {
struct vring_desc *desc;
diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index 981837b5d..f8660edbf 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -242,6 +242,31 @@ cleanup_vq(struct vhost_virtqueue *vq, int destroy)
close(vq->kickfd);
}
+void
+cleanup_vq_inflight(struct virtio_net *dev, struct vhost_virtqueue *vq)
+{
+ if (!(dev->protocol_features &
+ (1ULL << VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD)))
+ return;
+
+ if (vq_is_packed(dev)) {
+ if (vq->inflight_packed)
+ vq->inflight_packed = NULL;
+ } else {
+ if (vq->inflight_split)
+ vq->inflight_split = NULL;
+ }
+
+ if (vq->resubmit_inflight) {
+ if (vq->resubmit_inflight->resubmit_list) {
+ free(vq->resubmit_inflight->resubmit_list);
+ vq->resubmit_inflight->resubmit_list = NULL;
+ }
+ free(vq->resubmit_inflight);
+ vq->resubmit_inflight = NULL;
+ }
+}
+
/*
* Unmap any memory, close any file descriptors and
* free any memory owned by a device.
@@ -253,8 +278,10 @@ cleanup_device(struct virtio_net *dev, int destroy)
vhost_backend_cleanup(dev);
- for (i = 0; i < dev->nr_vring; i++)
+ for (i = 0; i < dev->nr_vring; i++) {
cleanup_vq(dev->virtqueue[i], destroy);
+ cleanup_vq_inflight(dev, dev->virtqueue[i]);
+ }
}
void
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index d67ba849a..ab95999c4 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -128,6 +128,14 @@ struct vhost_virtqueue {
/* Physical address of used ring, for logging */
uint64_t log_guest_addr;
+ /* inflight share memory info */
+ union {
+ struct rte_vhost_inflight_info_split *inflight_split;
+ struct rte_vhost_inflight_info_packed *inflight_packed;
+ };
+ struct rte_vhost_resubmit_info *resubmit_inflight;
+ uint64_t global_counter;
+
uint16_t nr_zmbuf;
uint16_t zmbuf_size;
uint16_t last_zmbuf_idx;
@@ -474,6 +482,7 @@ void vhost_destroy_device(int);
void vhost_destroy_device_notify(struct virtio_net *dev);
void cleanup_vq(struct vhost_virtqueue *vq, int destroy);
+void cleanup_vq_inflight(struct virtio_net *dev, struct vhost_virtqueue *vq);
void free_vq(struct virtio_net *dev, struct vhost_virtqueue *vq);
int alloc_vring_queue(struct virtio_net *dev, uint32_t vring_idx);
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index a7bc42050..5cc16cd91 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -331,6 +331,7 @@ vhost_user_set_features(struct virtio_net **pdev, struct VhostUserMsg *msg,
dev->virtqueue[dev->nr_vring] = NULL;
cleanup_vq(vq, 1);
+ cleanup_vq_inflight(dev, vq);
free_vq(dev, vq);
}
}
@@ -1338,10 +1339,11 @@ vhost_user_set_inflight_fd(struct virtio_net **pdev, VhostUserMsg *msg,
{
uint64_t mmap_size, mmap_offset;
uint16_t num_queues, queue_size;
- uint32_t pervq_inflight_size;
struct virtio_net *dev = *pdev;
+ uint32_t pervq_inflight_size;
+ struct vhost_virtqueue *vq;
void *addr;
- int fd;
+ int fd, i;
fd = msg->fds[0];
if (msg->size != sizeof(msg->payload.inflight) || fd < 0) {
@@ -1402,6 +1404,18 @@ vhost_user_set_inflight_fd(struct virtio_net **pdev, VhostUserMsg *msg,
dev->inflight_info->addr = addr;
dev->inflight_info->size = mmap_size;
+ for (i = 0; i < num_queues; i++) {
+ vq = dev->virtqueue[i];
+ if (vq_is_packed(dev)) {
+ vq->inflight_packed = addr;
+ vq->inflight_packed->desc_num = queue_size;
+ } else {
+ vq->inflight_split = addr;
+ vq->inflight_split->desc_num = queue_size;
+ }
+ addr = (void *)((char *)addr + pervq_inflight_size);
+ }
+
return RTE_VHOST_MSG_RESULT_OK;
}
@@ -1441,6 +1455,191 @@ static int vhost_user_set_vring_err(struct virtio_net **pdev __rte_unused,
return RTE_VHOST_MSG_RESULT_OK;
}
+static int
+resubmit_desc_compare(const void *a, const void *b)
+{
+ const struct rte_vhost_resubmit_desc *desc0 = a;
+ const struct rte_vhost_resubmit_desc *desc1 = b;
+
+ if (desc1->counter > desc0->counter)
+ return 1;
+
+ return -1;
+}
+
+static int
+vhost_check_queue_inflights_split(struct virtio_net *dev,
+ struct vhost_virtqueue *vq)
+{
+ uint16_t i;
+ uint16_t resubmit_num = 0, last_io, num;
+ struct vring_used *used = vq->used;
+ struct rte_vhost_resubmit_info *resubmit;
+ struct rte_vhost_inflight_info_split *inflight_split;
+
+ if (!(dev->protocol_features &
+ (1ULL << VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD)))
+ return RTE_VHOST_MSG_RESULT_OK;
+
+ if ((!vq->inflight_split))
+ return RTE_VHOST_MSG_RESULT_ERR;
+
+ if (!vq->inflight_split->version) {
+ vq->inflight_split->version = INFLIGHT_VERSION;
+ return RTE_VHOST_MSG_RESULT_OK;
+ }
+
+ if (vq->resubmit_inflight)
+ return RTE_VHOST_MSG_RESULT_OK;
+
+ inflight_split = vq->inflight_split;
+ vq->global_counter = 0;
+ last_io = inflight_split->last_inflight_io;
+
+ if (inflight_split->used_idx != used->idx) {
+ inflight_split->desc[last_io].inflight = 0;
+ rte_smp_mb();
+ inflight_split->used_idx = used->idx;
+ }
+
+ for (i = 0; i < inflight_split->desc_num; i++) {
+ if (inflight_split->desc[i].inflight == 1)
+ resubmit_num++;
+ }
+
+ vq->last_avail_idx += resubmit_num;
+
+ if (resubmit_num) {
+ resubmit = calloc(1, sizeof(struct rte_vhost_resubmit_info));
+ if (!resubmit) {
+ RTE_LOG(ERR, VHOST_CONFIG,
+ "failed to allocate memory for resubmit info.\n");
+ return RTE_VHOST_MSG_RESULT_ERR;
+ }
+
+ resubmit->resubmit_list = calloc(resubmit_num,
+ sizeof(struct rte_vhost_resubmit_desc));
+ if (!resubmit->resubmit_list) {
+ RTE_LOG(ERR, VHOST_CONFIG,
+ "failed to allocate memory for inflight desc.\n");
+ free(resubmit);
+ return RTE_VHOST_MSG_RESULT_ERR;
+ }
+
+ num = 0;
+ for (i = 0; i < vq->inflight_split->desc_num; i++) {
+ if (vq->inflight_split->desc[i].inflight == 1) {
+ resubmit->resubmit_list[num].index = i;
+ resubmit->resubmit_list[num].counter =
+ inflight_split->desc[i].counter;
+ num++;
+ }
+ }
+ resubmit->resubmit_num = num;
+
+ if (resubmit->resubmit_num > 1)
+ qsort(resubmit->resubmit_list, resubmit->resubmit_num,
+ sizeof(struct rte_vhost_resubmit_desc),
+ resubmit_desc_compare);
+
+ vq->global_counter = resubmit->resubmit_list[0].counter + 1;
+ vq->resubmit_inflight = resubmit;
+ }
+
+ return RTE_VHOST_MSG_RESULT_OK;
+}
+
+static int
+vhost_check_queue_inflights_packed(struct virtio_net *dev,
+ struct vhost_virtqueue *vq)
+{
+ uint16_t i;
+ uint16_t resubmit_num = 0, old_used_idx, num;
+ struct rte_vhost_resubmit_info *resubmit;
+ struct rte_vhost_inflight_info_packed *inflight_packed;
+
+ if (!(dev->protocol_features &
+ (1ULL << VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD)))
+ return RTE_VHOST_MSG_RESULT_OK;
+
+ if ((!vq->inflight_packed))
+ return RTE_VHOST_MSG_RESULT_ERR;
+
+ if (!vq->inflight_packed->version) {
+ vq->inflight_packed->version = INFLIGHT_VERSION;
+ return RTE_VHOST_MSG_RESULT_OK;
+ }
+
+ if (vq->resubmit_inflight)
+ return RTE_VHOST_MSG_RESULT_OK;
+
+ inflight_packed = vq->inflight_packed;
+ vq->global_counter = 0;
+ old_used_idx = inflight_packed->old_used_idx;
+
+ if (inflight_packed->used_idx != old_used_idx) {
+ if (inflight_packed->desc[old_used_idx].inflight == 0) {
+ inflight_packed->old_used_idx =
+ inflight_packed->used_idx;
+ inflight_packed->old_used_wrap_counter =
+ inflight_packed->used_wrap_counter;
+ inflight_packed->old_free_head =
+ inflight_packed->free_head;
+ } else {
+ inflight_packed->used_idx =
+ inflight_packed->old_used_idx;
+ inflight_packed->used_wrap_counter =
+ inflight_packed->old_used_wrap_counter;
+ inflight_packed->free_head =
+ inflight_packed->old_free_head;
+ }
+ }
+
+ for (i = 0; i < inflight_packed->desc_num; i++) {
+ if (inflight_packed->desc[i].inflight == 1)
+ resubmit_num++;
+ }
+
+ if (resubmit_num) {
+ resubmit = calloc(1, sizeof(struct rte_vhost_resubmit_info));
+ if (resubmit == NULL) {
+ RTE_LOG(ERR, VHOST_CONFIG,
+ "failed to allocate memory for resubmit info.\n");
+ return RTE_VHOST_MSG_RESULT_ERR;
+ }
+
+ resubmit->resubmit_list = calloc(resubmit_num,
+ sizeof(struct rte_vhost_resubmit_desc));
+ if (resubmit->resubmit_list == NULL) {
+ RTE_LOG(ERR, VHOST_CONFIG,
+ "failed to allocate memory for resubmit desc.\n");
+ free(resubmit);
+ return RTE_VHOST_MSG_RESULT_ERR;
+ }
+
+ num = 0;
+ for (i = 0; i < inflight_packed->desc_num; i++) {
+ if (vq->inflight_packed->desc[i].inflight == 1) {
+ resubmit->resubmit_list[num].index = i;
+ resubmit->resubmit_list[num].counter =
+ inflight_packed->desc[i].counter;
+ num++;
+ }
+ }
+ resubmit->resubmit_num = num;
+
+ if (resubmit->resubmit_num > 1)
+ qsort(resubmit->resubmit_list, resubmit->resubmit_num,
+ sizeof(struct rte_vhost_resubmit_desc),
+ resubmit_desc_compare);
+
+ vq->global_counter = resubmit->resubmit_list[0].counter + 1;
+ vq->resubmit_inflight = resubmit;
+ }
+
+ return RTE_VHOST_MSG_RESULT_OK;
+}
+
static int
vhost_user_set_vring_kick(struct virtio_net **pdev, struct VhostUserMsg *msg,
int main_fd __rte_unused)
@@ -1482,6 +1681,20 @@ vhost_user_set_vring_kick(struct virtio_net **pdev, struct VhostUserMsg *msg,
close(vq->kickfd);
vq->kickfd = file.fd;
+ if (vq_is_packed(dev)) {
+ if (vhost_check_queue_inflights_packed(dev, vq)) {
+ RTE_LOG(ERR, VHOST_CONFIG,
+ "failed to check inflights for vq: %d\n", file.index);
+ return RTE_VHOST_MSG_RESULT_ERR;
+ }
+ } else {
+ if (vhost_check_queue_inflights_split(dev, vq)) {
+ RTE_LOG(ERR, VHOST_CONFIG,
+ "failed to check inflights for vq: %d\n", file.index);
+ return RTE_VHOST_MSG_RESULT_ERR;
+ }
+ }
+
return RTE_VHOST_MSG_RESULT_OK;
}
--
2.17.2
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [dpdk-dev] [PATCH v11 5/9] vhost: checkout the resubmit inflight information
2019-10-09 20:48 ` [dpdk-dev] [PATCH v11 5/9] vhost: checkout the resubmit inflight information Jin Yu
@ 2019-10-11 10:09 ` Maxime Coquelin
0 siblings, 0 replies; 27+ messages in thread
From: Maxime Coquelin @ 2019-10-11 10:09 UTC (permalink / raw)
To: Jin Yu, dev
Cc: changpeng.liu, tiwei.bie, zhihong.wang, Lin Li, Xun Ni, Yu Zhang
On 10/9/19 10:48 PM, Jin Yu wrote:
> This patch shows how to check out the inflight ring and construct
> the resubmit information; it also covers destroying the resubmit info.
>
> Signed-off-by: Lin Li <lilin24@baidu.com>
> Signed-off-by: Xun Ni <nixun@baidu.com>
> Signed-off-by: Yu Zhang <zhangyu31@baidu.com>
> Signed-off-by: Jin Yu <jin.yu@intel.com>
> ---
> lib/librte_vhost/rte_vhost.h | 19 +++
> lib/librte_vhost/vhost.c | 29 ++++-
> lib/librte_vhost/vhost.h | 9 ++
> lib/librte_vhost/vhost_user.c | 217 +++++++++++++++++++++++++++++++++-
> 4 files changed, 271 insertions(+), 3 deletions(-)
>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
^ permalink raw reply [flat|nested] 27+ messages in thread
* [dpdk-dev] [PATCH v11 6/9] vhost: add the APIs to operate inflight ring
2019-10-09 20:48 ` [dpdk-dev] [PATCH v11 0/9] vhost: support inflight share memory protocol feature Jin Yu
` (4 preceding siblings ...)
2019-10-09 20:48 ` [dpdk-dev] [PATCH v11 5/9] vhost: checkout the resubmit inflight information Jin Yu
@ 2019-10-09 20:48 ` Jin Yu
2019-10-11 10:14 ` Maxime Coquelin
2019-10-09 20:48 ` [dpdk-dev] [PATCH v11 7/9] vhost: add APIs for user getting " Jin Yu
` (3 subsequent siblings)
9 siblings, 1 reply; 27+ messages in thread
From: Jin Yu @ 2019-10-09 20:48 UTC (permalink / raw)
To: dev
Cc: changpeng.liu, maxime.coquelin, tiwei.bie, zhihong.wang, Jin Yu,
Lin Li, Xun Ni, Yu Zhang
This patch introduces three APIs to operate the inflight
ring: set, set last, and clear. Both split and packed rings
are covered.
Signed-off-by: Lin Li <lilin24@baidu.com>
Signed-off-by: Xun Ni <nixun@baidu.com>
Signed-off-by: Yu Zhang <zhangyu31@baidu.com>
Signed-off-by: Jin Yu <jin.yu@intel.com>
---
lib/librte_vhost/rte_vhost.h | 116 +++++++++++
lib/librte_vhost/rte_vhost_version.map | 6 +
lib/librte_vhost/vhost.c | 273 +++++++++++++++++++++++++
3 files changed, 395 insertions(+)
diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h
index 95e4d720e..15d7e67cd 100644
--- a/lib/librte_vhost/rte_vhost.h
+++ b/lib/librte_vhost/rte_vhost.h
@@ -693,6 +693,122 @@ int rte_vhost_get_mem_table(int vid, struct rte_vhost_memory **mem);
int rte_vhost_get_vhost_vring(int vid, uint16_t vring_idx,
struct rte_vhost_vring *vring);
+/**
+ * Set split inflight descriptor.
+ *
+ * This function saves descriptors that have been consumed in the
+ * available ring.
+ *
+ * @param vid
+ * vhost device ID
+ * @param vring_idx
+ * vring index
+ * @param idx
+ * inflight entry index
+ * @return
+ * 0 on success, -1 on failure
+ */
+__rte_experimental
+int
+rte_vhost_set_inflight_desc_split(int vid, uint16_t vring_idx,
+ uint16_t idx);
+
+/**
+ * Set packed inflight descriptor and get corresponding inflight entry
+ *
+ * This function saves descriptors that have been consumed.
+ *
+ * @param vid
+ * vhost device ID
+ * @param vring_idx
+ * vring index
+ * @param head
+ * head of descriptors
+ * @param last
+ * last of descriptors
+ * @param inflight_entry
+ * corresponding inflight entry
+ * @return
+ * 0 on success, -1 on failure
+ */
+__rte_experimental
+int
+rte_vhost_set_inflight_desc_packed(int vid, uint16_t vring_idx,
+ uint16_t head, uint16_t last, uint16_t *inflight_entry);
+
+/**
+ * Save the head of the last batch of used descriptors.
+ *
+ * @param vid
+ * vhost device ID
+ * @param vring_idx
+ * vring index
+ * @param idx
+ * descriptor entry index
+ * @return
+ * 0 on success, -1 on failure
+ */
+__rte_experimental
+int
+rte_vhost_set_last_inflight_io_split(int vid,
+ uint16_t vring_idx, uint16_t idx);
+
+/**
+ * Update the inflight free_head, used_idx and used_wrap_counter.
+ *
+ * This function updates the inflight status before the descriptors
+ * are marked as used.
+ *
+ * @param vid
+ * vhost device ID
+ * @param vring_idx
+ * vring index
+ * @param head
+ * head of descriptors
+ * @return
+ * 0 on success, -1 on failure
+ */
+__rte_experimental
+int
+rte_vhost_set_last_inflight_io_packed(int vid,
+ uint16_t vring_idx, uint16_t head);
+
+/**
+ * Clear the split inflight status.
+ *
+ * @param vid
+ * vhost device ID
+ * @param vring_idx
+ * vring index
+ * @param last_used_idx
+ * last used idx of used ring
+ * @param idx
+ * inflight entry index
+ * @return
+ * 0 on success, -1 on failure
+ */
+__rte_experimental
+int
+rte_vhost_clr_inflight_desc_split(int vid, uint16_t vring_idx,
+ uint16_t last_used_idx, uint16_t idx);
+
+/**
+ * Clear the packed inflight status.
+ *
+ * @param vid
+ * vhost device ID
+ * @param vring_idx
+ * vring index
+ * @param head
+ * inflight entry index
+ * @return
+ * 0 on success, -1 on failure
+ */
+__rte_experimental
+int
+rte_vhost_clr_inflight_desc_packed(int vid, uint16_t vring_idx,
+ uint16_t head);
+
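Taken together, the split-ring calls follow a fixed lifecycle during request processing: mark a descriptor in flight when it is popped from the avail ring, record the head of the last batch, and clear the entry once the used ring has been updated. The sketch below models that bookkeeping on a simplified in-memory copy of the inflight region (field names follow the patch, but the layout and the standalone helpers are illustrative, not the real DPDK API):

```c
#include <stdint.h>

/* Simplified model of the split inflight region; field names mirror
 * the patch, sizes and layout are illustrative only. */
struct inflight_desc_split_model {
	uint64_t counter;
	uint8_t inflight;
};

struct inflight_info_split_model {
	uint16_t used_idx;
	uint16_t last_inflight_io;
	uint64_t global_counter;
	struct inflight_desc_split_model desc[256];
};

/* What rte_vhost_set_inflight_desc_split records: stamp the entry
 * with a monotonically increasing counter and flag it in flight. */
static void set_inflight_split(struct inflight_info_split_model *inf,
			       uint16_t idx)
{
	inf->desc[idx].counter = inf->global_counter++;
	inf->desc[idx].inflight = 1;
}

/* What rte_vhost_set_last_inflight_io_split records: the head of the
 * last batch pushed to the used ring. */
static void set_last_inflight_split(struct inflight_info_split_model *inf,
				    uint16_t idx)
{
	inf->last_inflight_io = idx;
}

/* What rte_vhost_clr_inflight_desc_split records: clear the in-flight
 * flag, then publish the new used index. */
static void clr_inflight_split(struct inflight_info_split_model *inf,
			       uint16_t last_used_idx, uint16_t idx)
{
	inf->desc[idx].inflight = 0;
	inf->used_idx = last_used_idx;
}
```

On a crash between `set_inflight_split` and `clr_inflight_split`, the entry's `inflight` flag stays set in the shared region, which is exactly what the resubmit logic in the earlier patches scans for.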
/**
* Notify the guest that used descriptors have been added to the vring. This
* function acts as a memory barrier.
diff --git a/lib/librte_vhost/rte_vhost_version.map b/lib/librte_vhost/rte_vhost_version.map
index 5f1d4a75c..bc70bfaa5 100644
--- a/lib/librte_vhost/rte_vhost_version.map
+++ b/lib/librte_vhost/rte_vhost_version.map
@@ -87,4 +87,10 @@ EXPERIMENTAL {
rte_vdpa_relay_vring_used;
rte_vhost_extern_callback_register;
rte_vhost_driver_set_protocol_features;
+ rte_vhost_set_inflight_desc_split;
+ rte_vhost_set_inflight_desc_packed;
+ rte_vhost_set_last_inflight_io_split;
+ rte_vhost_set_last_inflight_io_packed;
+ rte_vhost_clr_inflight_desc_split;
+ rte_vhost_clr_inflight_desc_packed;
};
diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index f8660edbf..b8c14a6ea 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -783,6 +783,279 @@ rte_vhost_get_vhost_vring(int vid, uint16_t vring_idx,
return 0;
}
+int
+rte_vhost_set_inflight_desc_split(int vid, uint16_t vring_idx,
+ uint16_t idx)
+{
+ struct vhost_virtqueue *vq;
+ struct virtio_net *dev;
+
+ dev = get_device(vid);
+ if (unlikely(!dev))
+ return -1;
+
+ if (unlikely(!(dev->protocol_features &
+ (1ULL << VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD))))
+ return 0;
+
+ if (unlikely(vq_is_packed(dev)))
+ return -1;
+
+ if (unlikely(vring_idx >= VHOST_MAX_VRING))
+ return -1;
+
+ vq = dev->virtqueue[vring_idx];
+ if (unlikely(!vq))
+ return -1;
+
+ if (unlikely(!vq->inflight_split))
+ return -1;
+
+ if (unlikely(idx >= vq->size))
+ return -1;
+
+ vq->inflight_split->desc[idx].counter = vq->global_counter++;
+ vq->inflight_split->desc[idx].inflight = 1;
+ return 0;
+}
+
+int
+rte_vhost_set_inflight_desc_packed(int vid, uint16_t vring_idx,
+ uint16_t head, uint16_t last,
+ uint16_t *inflight_entry)
+{
+ struct rte_vhost_inflight_info_packed *inflight_info;
+ struct virtio_net *dev;
+ struct vhost_virtqueue *vq;
+ struct vring_packed_desc *desc;
+ uint16_t old_free_head, free_head;
+
+ dev = get_device(vid);
+ if (unlikely(!dev))
+ return -1;
+
+ if (unlikely(!(dev->protocol_features &
+ (1ULL << VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD))))
+ return 0;
+
+ if (unlikely(!vq_is_packed(dev)))
+ return -1;
+
+ if (unlikely(vring_idx >= VHOST_MAX_VRING))
+ return -1;
+
+ vq = dev->virtqueue[vring_idx];
+ if (unlikely(!vq))
+ return -1;
+
+ inflight_info = vq->inflight_packed;
+ if (unlikely(!inflight_info))
+ return -1;
+
+ if (unlikely(head >= vq->size))
+ return -1;
+
+ desc = vq->desc_packed;
+ old_free_head = inflight_info->old_free_head;
+ if (unlikely(old_free_head >= vq->size))
+ return -1;
+
+ free_head = old_free_head;
+
+ /* init header descriptor */
+ inflight_info->desc[old_free_head].num = 0;
+ inflight_info->desc[old_free_head].counter = vq->global_counter++;
+ inflight_info->desc[old_free_head].inflight = 1;
+
+ /* save desc entry in flight entry */
+ while (head != ((last + 1) % vq->size)) {
+ inflight_info->desc[old_free_head].num++;
+ inflight_info->desc[free_head].addr = desc[head].addr;
+ inflight_info->desc[free_head].len = desc[head].len;
+ inflight_info->desc[free_head].flags = desc[head].flags;
+ inflight_info->desc[free_head].id = desc[head].id;
+
+ inflight_info->desc[old_free_head].last = free_head;
+ free_head = inflight_info->desc[free_head].next;
+ inflight_info->free_head = free_head;
+ head = (head + 1) % vq->size;
+ }
+
+ inflight_info->old_free_head = free_head;
+ *inflight_entry = old_free_head;
+
+ return 0;
+}
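The loop above threads each saved descriptor through the inflight free list: one slot per descriptor in the chain, linked via `next`, with the first slot acting as a header that records the chain length. A reduced model of just that free-list bookkeeping (omitting the `addr`/`len`/`flags`/`id` copies, with hypothetical names) looks like:

```c
#include <stdint.h>

#define INFLIGHT_SLOTS 8

/* One entry of a miniature inflight table; only the linking fields
 * of the real structure are modeled here. */
struct inflight_slot {
	uint16_t next;     /* free-list link */
	uint16_t last;     /* last slot of this chain (header only) */
	uint16_t num;      /* chain length (header only) */
	uint8_t inflight;
};

struct inflight_state {
	struct inflight_slot slot[INFLIGHT_SLOTS];
	uint16_t free_head;
	uint16_t old_free_head;
};

/* Consume n slots from the free list for an n-descriptor chain and
 * return the header slot index (the value the real API stores in
 * *inflight_entry). */
static uint16_t save_chain(struct inflight_state *st, uint16_t n)
{
	uint16_t head = st->old_free_head;
	uint16_t cur = head;
	uint16_t i;

	st->slot[head].num = n;
	st->slot[head].inflight = 1;
	for (i = 0; i < n; i++) {
		st->slot[head].last = cur;  /* track last consumed slot */
		cur = st->slot[cur].next;   /* walk the free list */
	}
	st->free_head = cur;
	st->old_free_head = cur;
	return head;
}
```

After a chain is completed, `rte_vhost_set_last_inflight_io_packed` returns the consumed slots to the free list by relinking `desc[last].next` back to `free_head`, as shown later in this patch.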
+
+int
+rte_vhost_clr_inflight_desc_split(int vid, uint16_t vring_idx,
+ uint16_t last_used_idx, uint16_t idx)
+{
+ struct virtio_net *dev;
+ struct vhost_virtqueue *vq;
+
+ dev = get_device(vid);
+ if (unlikely(!dev))
+ return -1;
+
+ if (unlikely(!(dev->protocol_features &
+ (1ULL << VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD))))
+ return 0;
+
+ if (unlikely(vq_is_packed(dev)))
+ return -1;
+
+ if (unlikely(vring_idx >= VHOST_MAX_VRING))
+ return -1;
+
+ vq = dev->virtqueue[vring_idx];
+ if (unlikely(!vq))
+ return -1;
+
+ if (unlikely(!vq->inflight_split))
+ return -1;
+
+ if (unlikely(idx >= vq->size))
+ return -1;
+
+ rte_smp_mb();
+
+ vq->inflight_split->desc[idx].inflight = 0;
+
+ rte_smp_mb();
+
+ vq->inflight_split->used_idx = last_used_idx;
+ return 0;
+}
+
+int
+rte_vhost_clr_inflight_desc_packed(int vid, uint16_t vring_idx,
+ uint16_t head)
+{
+ struct rte_vhost_inflight_info_packed *inflight_info;
+ struct virtio_net *dev;
+ struct vhost_virtqueue *vq;
+
+ dev = get_device(vid);
+ if (unlikely(!dev))
+ return -1;
+
+ if (unlikely(!(dev->protocol_features &
+ (1ULL << VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD))))
+ return 0;
+
+ if (unlikely(!vq_is_packed(dev)))
+ return -1;
+
+ if (unlikely(vring_idx >= VHOST_MAX_VRING))
+ return -1;
+
+ vq = dev->virtqueue[vring_idx];
+ if (unlikely(!vq))
+ return -1;
+
+ inflight_info = vq->inflight_packed;
+ if (unlikely(!inflight_info))
+ return -1;
+
+ if (unlikely(head >= vq->size))
+ return -1;
+
+ rte_smp_mb();
+
+ inflight_info->desc[head].inflight = 0;
+
+ rte_smp_mb();
+
+ inflight_info->old_free_head = inflight_info->free_head;
+ inflight_info->old_used_idx = inflight_info->used_idx;
+ inflight_info->old_used_wrap_counter = inflight_info->used_wrap_counter;
+
+ return 0;
+}
+
+int
+rte_vhost_set_last_inflight_io_split(int vid, uint16_t vring_idx,
+ uint16_t idx)
+{
+ struct virtio_net *dev;
+ struct vhost_virtqueue *vq;
+
+ dev = get_device(vid);
+ if (unlikely(!dev))
+ return -1;
+
+ if (unlikely(!(dev->protocol_features &
+ (1ULL << VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD))))
+ return 0;
+
+ if (unlikely(vq_is_packed(dev)))
+ return -1;
+
+ if (unlikely(vring_idx >= VHOST_MAX_VRING))
+ return -1;
+
+ vq = dev->virtqueue[vring_idx];
+ if (unlikely(!vq))
+ return -1;
+
+ if (unlikely(!vq->inflight_split))
+ return -1;
+
+ vq->inflight_split->last_inflight_io = idx;
+ return 0;
+}
+
+int
+rte_vhost_set_last_inflight_io_packed(int vid, uint16_t vring_idx,
+ uint16_t head)
+{
+ struct rte_vhost_inflight_info_packed *inflight_info;
+ struct virtio_net *dev;
+ struct vhost_virtqueue *vq;
+ uint16_t last;
+
+ dev = get_device(vid);
+ if (unlikely(!dev))
+ return -1;
+
+ if (unlikely(!(dev->protocol_features &
+ (1ULL << VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD))))
+ return 0;
+
+ if (unlikely(!vq_is_packed(dev)))
+ return -1;
+
+ if (unlikely(vring_idx >= VHOST_MAX_VRING))
+ return -1;
+
+ vq = dev->virtqueue[vring_idx];
+ if (unlikely(!vq))
+ return -1;
+
+ inflight_info = vq->inflight_packed;
+ if (unlikely(!inflight_info))
+ return -1;
+
+ if (unlikely(head >= vq->size))
+ return -1;
+
+ last = inflight_info->desc[head].last;
+ if (unlikely(last >= vq->size))
+ return -1;
+
+ inflight_info->desc[last].next = inflight_info->free_head;
+ inflight_info->free_head = head;
+ inflight_info->used_idx += inflight_info->desc[head].num;
+ if (inflight_info->used_idx >= inflight_info->desc_num) {
+ inflight_info->used_idx -= inflight_info->desc_num;
+ inflight_info->used_wrap_counter =
+ !inflight_info->used_wrap_counter;
+ }
+
+ return 0;
+}
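The final `used_idx` update above wraps modulo `desc_num` and flips the wrap counter, mirroring packed-ring semantics. Isolated into a small helper (names are illustrative):

```c
#include <stdint.h>
#include <stdbool.h>

struct packed_used_state {
	uint16_t used_idx;
	uint16_t desc_num;
	bool used_wrap_counter;
};

/* Advance used_idx by the chain length; on passing the end of the
 * ring, subtract desc_num and invert the wrap counter. */
static void advance_used_idx(struct packed_used_state *st, uint16_t num)
{
	st->used_idx += num;
	if (st->used_idx >= st->desc_num) {
		st->used_idx -= st->desc_num;
		st->used_wrap_counter = !st->used_wrap_counter;
	}
}
```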
+
int
rte_vhost_vring_call(int vid, uint16_t vring_idx)
{
--
2.17.2
* Re: [dpdk-dev] [PATCH v11 6/9] vhost: add the APIs to operate inflight ring
2019-10-09 20:48 ` [dpdk-dev] [PATCH v11 6/9] vhost: add the APIs to operate inflight ring Jin Yu
@ 2019-10-11 10:14 ` Maxime Coquelin
0 siblings, 0 replies; 27+ messages in thread
From: Maxime Coquelin @ 2019-10-11 10:14 UTC (permalink / raw)
To: Jin Yu, dev
Cc: changpeng.liu, tiwei.bie, zhihong.wang, Lin Li, Xun Ni, Yu Zhang
On 10/9/19 10:48 PM, Jin Yu wrote:
> This patch introduces three APIs to operate the inflight
> ring: set, set last, and clear. Both split and packed rings
> are covered.
>
> Signed-off-by: Lin Li <lilin24@baidu.com>
> Signed-off-by: Xun Ni <nixun@baidu.com>
> Signed-off-by: Yu Zhang <zhangyu31@baidu.com>
> Signed-off-by: Jin Yu <jin.yu@intel.com>
> ---
> lib/librte_vhost/rte_vhost.h | 116 +++++++++++
> lib/librte_vhost/rte_vhost_version.map | 6 +
> lib/librte_vhost/vhost.c | 273 +++++++++++++++++++++++++
> 3 files changed, 395 insertions(+)
>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
* [dpdk-dev] [PATCH v11 7/9] vhost: add APIs for user getting inflight ring
2019-10-09 20:48 ` [dpdk-dev] [PATCH v11 0/9] vhost: support inflight share memory protocol feature Jin Yu
` (5 preceding siblings ...)
2019-10-09 20:48 ` [dpdk-dev] [PATCH v11 6/9] vhost: add the APIs to operate inflight ring Jin Yu
@ 2019-10-09 20:48 ` Jin Yu
2019-10-11 10:18 ` Maxime Coquelin
2019-10-09 20:48 ` [dpdk-dev] [PATCH v11 8/9] vhost: add vring functions packed ring support Jin Yu
` (2 subsequent siblings)
9 siblings, 1 reply; 27+ messages in thread
From: Jin Yu @ 2019-10-09 20:48 UTC (permalink / raw)
To: dev
Cc: changpeng.liu, maxime.coquelin, tiwei.bie, zhihong.wang, Jin Yu,
Lin Li, Xun Ni, Yu Zhang
This patch introduces two APIs. one is for getting inflgiht
ring and the other is for getting base.
Signed-off-by: Lin Li <lilin24@baidu.com>
Signed-off-by: Xun Ni <nixun@baidu.com>
Signed-off-by: Yu Zhang <zhangyu31@baidu.com>
Signed-off-by: Jin Yu <jin.yu@intel.com>
---
lib/librte_vhost/rte_vhost.h | 40 +++++++++++++++++
lib/librte_vhost/rte_vhost_version.map | 2 +
lib/librte_vhost/vhost.c | 61 ++++++++++++++++++++++++++
3 files changed, 103 insertions(+)
diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h
index 15d7e67cd..1e1e13d71 100644
--- a/lib/librte_vhost/rte_vhost.h
+++ b/lib/librte_vhost/rte_vhost.h
@@ -693,6 +693,23 @@ int rte_vhost_get_mem_table(int vid, struct rte_vhost_memory **mem);
int rte_vhost_get_vhost_vring(int vid, uint16_t vring_idx,
struct rte_vhost_vring *vring);
+/**
+ * Get guest inflight vring info, including inflight ring and resubmit list.
+ *
+ * @param vid
+ * vhost device ID
+ * @param vring_idx
+ * vring index
+ * @param vring
+ * the structure to hold the requested inflight vring info
+ * @return
+ * 0 on success, -1 on failure
+ */
+__rte_experimental
+int
+rte_vhost_get_vhost_ring_inflight(int vid, uint16_t vring_idx,
+ struct rte_vhost_ring_inflight *vring);
+
/**
* Set split inflight descriptor.
*
@@ -867,6 +884,29 @@ int __rte_experimental
rte_vhost_get_vring_base(int vid, uint16_t queue_id,
uint16_t *last_avail_idx, uint16_t *last_used_idx);
+/**
+ * Get last_avail/last_used of the vhost virtqueue
+ *
+ * This function is designed for reconnection. It is specific to
+ * the packed ring, as the two indexes can be retrieved from the
+ * inflight queue region.
+ *
+ * @param vid
+ * vhost device ID
+ * @param queue_id
+ * vhost queue index
+ * @param last_avail_idx
+ * vhost last_avail_idx to get
+ * @param last_used_idx
+ * vhost last_used_idx to get
+ * @return
+ * 0 on success, -1 on failure
+ */
+__rte_experimental
+int
+rte_vhost_get_vring_base_from_inflight(int vid,
+ uint16_t queue_id, uint16_t *last_avail_idx, uint16_t *last_used_idx);
+
/**
* Set last_avail/used_idx of the vhost virtqueue
*
diff --git a/lib/librte_vhost/rte_vhost_version.map b/lib/librte_vhost/rte_vhost_version.map
index bc70bfaa5..ce517b127 100644
--- a/lib/librte_vhost/rte_vhost_version.map
+++ b/lib/librte_vhost/rte_vhost_version.map
@@ -93,4 +93,6 @@ EXPERIMENTAL {
rte_vhost_set_last_inflight_io_packed;
rte_vhost_clr_inflight_desc_split;
rte_vhost_clr_inflight_desc_packed;
+ rte_vhost_get_vhost_ring_inflight;
+ rte_vhost_get_vring_base_from_inflight;
};
diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index b8c14a6ea..f7ed37261 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -783,6 +783,41 @@ rte_vhost_get_vhost_vring(int vid, uint16_t vring_idx,
return 0;
}
+int
+rte_vhost_get_vhost_ring_inflight(int vid, uint16_t vring_idx,
+ struct rte_vhost_ring_inflight *vring)
+{
+ struct virtio_net *dev;
+ struct vhost_virtqueue *vq;
+
+ dev = get_device(vid);
+ if (unlikely(!dev))
+ return -1;
+
+ if (vring_idx >= VHOST_MAX_VRING)
+ return -1;
+
+ vq = dev->virtqueue[vring_idx];
+ if (unlikely(!vq))
+ return -1;
+
+ if (vq_is_packed(dev)) {
+ if (unlikely(!vq->inflight_packed))
+ return -1;
+
+ vring->inflight_packed = vq->inflight_packed;
+ } else {
+ if (unlikely(!vq->inflight_split))
+ return -1;
+
+ vring->inflight_split = vq->inflight_split;
+ }
+
+ vring->resubmit_inflight = vq->resubmit_inflight;
+
+ return 0;
+}
+
int
rte_vhost_set_inflight_desc_split(int vid, uint16_t vring_idx,
uint16_t idx)
@@ -1250,6 +1285,32 @@ int rte_vhost_get_vring_base(int vid, uint16_t queue_id,
return 0;
}
+int
+rte_vhost_get_vring_base_from_inflight(int vid,
+ uint16_t queue_id,
+ uint16_t *last_avail_idx,
+ uint16_t *last_used_idx)
+{
+ struct rte_vhost_inflight_info_packed *inflight_info;
+ struct virtio_net *dev = get_device(vid);
+
+ if (dev == NULL || last_avail_idx == NULL || last_used_idx == NULL)
+ return -1;
+
+ if (!vq_is_packed(dev))
+ return -1;
+
+ inflight_info = dev->virtqueue[queue_id]->inflight_packed;
+ if (!inflight_info)
+ return -1;
+
+ *last_avail_idx = (inflight_info->old_used_wrap_counter << 15) |
+ inflight_info->old_used_idx;
+ *last_used_idx = *last_avail_idx;
+
+ return 0;
+}
+
int rte_vhost_set_vring_base(int vid, uint16_t queue_id,
uint16_t last_avail_idx, uint16_t last_used_idx)
{
--
2.17.2
* Re: [dpdk-dev] [PATCH v11 7/9] vhost: add APIs for user getting inflight ring
2019-10-09 20:48 ` [dpdk-dev] [PATCH v11 7/9] vhost: add APIs for user getting " Jin Yu
@ 2019-10-11 10:18 ` Maxime Coquelin
0 siblings, 0 replies; 27+ messages in thread
From: Maxime Coquelin @ 2019-10-11 10:18 UTC (permalink / raw)
To: Jin Yu, dev
Cc: changpeng.liu, tiwei.bie, zhihong.wang, Lin Li, Xun Ni, Yu Zhang
On 10/9/19 10:48 PM, Jin Yu wrote:
> This patch introduces two APIs. one is for getting inflgiht
s/inflgiht/inflight/
> ring and the other is for getting base.
>
> Signed-off-by: Lin Li <lilin24@baidu.com>
> Signed-off-by: Xun Ni <nixun@baidu.com>
> Signed-off-by: Yu Zhang <zhangyu31@baidu.com>
> Signed-off-by: Jin Yu <jin.yu@intel.com>
> ---
> lib/librte_vhost/rte_vhost.h | 40 +++++++++++++++++
> lib/librte_vhost/rte_vhost_version.map | 2 +
> lib/librte_vhost/vhost.c | 61 ++++++++++++++++++++++++++
> 3 files changed, 103 insertions(+)
>
Other than the typo that I'll fix while applying:
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
* [dpdk-dev] [PATCH v11 8/9] vhost: add vring functions packed ring support
2019-10-09 20:48 ` [dpdk-dev] [PATCH v11 0/9] vhost: support inflight share memory protocol feature Jin Yu
` (6 preceding siblings ...)
2019-10-09 20:48 ` [dpdk-dev] [PATCH v11 7/9] vhost: add APIs for user getting " Jin Yu
@ 2019-10-09 20:48 ` Jin Yu
2019-10-11 10:19 ` Maxime Coquelin
2019-10-09 20:48 ` [dpdk-dev] [PATCH v11 9/9] vhost: add vhost-user-blk example which support inflight Jin Yu
2019-10-16 11:12 ` [dpdk-dev] [PATCH v11 0/9] vhost: support inflight share memory protocol feature Maxime Coquelin
9 siblings, 1 reply; 27+ messages in thread
From: Jin Yu @ 2019-10-09 20:48 UTC (permalink / raw)
To: dev
Cc: changpeng.liu, maxime.coquelin, tiwei.bie, zhihong.wang, Jin Yu,
Lin Li, Xun Ni, Yu Zhang
This patch adds packed ring support to two APIs
so the user can get the packed ring.
Signed-off-by: Lin Li <lilin24@baidu.com>
Signed-off-by: Xun Ni <nixun@baidu.com>
Signed-off-by: Yu Zhang <zhangyu31@baidu.com>
Signed-off-by: Jin Yu <jin.yu@intel.com>
---
lib/librte_vhost/vhost.c | 68 +++++++++++++++++++++++++++++-----------
1 file changed, 49 insertions(+), 19 deletions(-)
diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index f7ed37261..fd3445207 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -771,9 +771,15 @@ rte_vhost_get_vhost_vring(int vid, uint16_t vring_idx,
if (!vq)
return -1;
- vring->desc = vq->desc;
- vring->avail = vq->avail;
- vring->used = vq->used;
+ if (vq_is_packed(dev)) {
+ vring->desc_packed = vq->desc_packed;
+ vring->driver_event = vq->driver_event;
+ vring->device_event = vq->device_event;
+ } else {
+ vring->desc = vq->desc;
+ vring->avail = vq->avail;
+ vring->used = vq->used;
+ }
vring->log_guest_addr = vq->log_guest_addr;
vring->callfd = vq->callfd;
@@ -1274,13 +1280,51 @@ int rte_vhost_get_log_base(int vid, uint64_t *log_base,
int rte_vhost_get_vring_base(int vid, uint16_t queue_id,
uint16_t *last_avail_idx, uint16_t *last_used_idx)
{
+ struct vhost_virtqueue *vq;
struct virtio_net *dev = get_device(vid);
if (dev == NULL || last_avail_idx == NULL || last_used_idx == NULL)
return -1;
- *last_avail_idx = dev->virtqueue[queue_id]->last_avail_idx;
- *last_used_idx = dev->virtqueue[queue_id]->last_used_idx;
+ vq = dev->virtqueue[queue_id];
+ if (!vq)
+ return -1;
+
+ if (vq_is_packed(dev)) {
+ *last_avail_idx = (vq->avail_wrap_counter << 15) |
+ vq->last_avail_idx;
+ *last_used_idx = (vq->used_wrap_counter << 15) |
+ vq->last_used_idx;
+ } else {
+ *last_avail_idx = vq->last_avail_idx;
+ *last_used_idx = vq->last_used_idx;
+ }
+
+ return 0;
+}
+
+int rte_vhost_set_vring_base(int vid, uint16_t queue_id,
+ uint16_t last_avail_idx, uint16_t last_used_idx)
+{
+ struct vhost_virtqueue *vq;
+ struct virtio_net *dev = get_device(vid);
+
+ if (!dev)
+ return -1;
+
+ vq = dev->virtqueue[queue_id];
+ if (!vq)
+ return -1;
+
+ if (vq_is_packed(dev)) {
+ vq->last_avail_idx = last_avail_idx & 0x7fff;
+ vq->avail_wrap_counter = !!(last_avail_idx & (1 << 15));
+ vq->last_used_idx = last_used_idx & 0x7fff;
+ vq->used_wrap_counter = !!(last_used_idx & (1 << 15));
+ } else {
+ vq->last_avail_idx = last_avail_idx;
+ vq->last_used_idx = last_used_idx;
+ }
return 0;
}
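For packed rings, both `rte_vhost_get_vring_base` and `rte_vhost_set_vring_base` now fold the wrap counter into bit 15 of the 16-bit index, leaving 15 bits for the ring index itself. A minimal encode/decode pair (illustrative helpers, not part of the API) makes the round trip explicit:

```c
#include <stdint.h>
#include <stdbool.h>

/* Pack a 15-bit ring index and a wrap counter into one uint16_t,
 * matching the (wrap << 15) | idx convention used by the patch. */
static uint16_t encode_packed_idx(uint16_t idx, bool wrap)
{
	return (uint16_t)(((uint16_t)wrap << 15) | (idx & 0x7fff));
}

/* Recover the index and wrap counter, matching the & 0x7fff and
 * bit-15 extraction in rte_vhost_set_vring_base. */
static uint16_t decode_packed_idx(uint16_t v, bool *wrap)
{
	*wrap = !!(v & (1 << 15));
	return v & 0x7fff;
}
```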
@@ -1311,20 +1355,6 @@ rte_vhost_get_vring_base_from_inflight(int vid,
return 0;
}
-int rte_vhost_set_vring_base(int vid, uint16_t queue_id,
- uint16_t last_avail_idx, uint16_t last_used_idx)
-{
- struct virtio_net *dev = get_device(vid);
-
- if (!dev)
- return -1;
-
- dev->virtqueue[queue_id]->last_avail_idx = last_avail_idx;
- dev->virtqueue[queue_id]->last_used_idx = last_used_idx;
-
- return 0;
-}
-
int rte_vhost_extern_callback_register(int vid,
struct rte_vhost_user_extern_ops const * const ops, void *ctx)
{
--
2.17.2
* Re: [dpdk-dev] [PATCH v11 8/9] vhost: add vring functions packed ring support
2019-10-09 20:48 ` [dpdk-dev] [PATCH v11 8/9] vhost: add vring functions packed ring support Jin Yu
@ 2019-10-11 10:19 ` Maxime Coquelin
0 siblings, 0 replies; 27+ messages in thread
From: Maxime Coquelin @ 2019-10-11 10:19 UTC (permalink / raw)
To: Jin Yu, dev
Cc: changpeng.liu, tiwei.bie, zhihong.wang, Lin Li, Xun Ni, Yu Zhang
On 10/9/19 10:48 PM, Jin Yu wrote:
> This patch adds packed ring support to two APIs
> so the user can get the packed ring.
>
> Signed-off-by: Lin Li <lilin24@baidu.com>
> Signed-off-by: Xun Ni <nixun@baidu.com>
> Signed-off-by: Yu Zhang <zhangyu31@baidu.com>
> Signed-off-by: Jin Yu <jin.yu@intel.com>
> ---
> lib/librte_vhost/vhost.c | 68 +++++++++++++++++++++++++++++-----------
> 1 file changed, 49 insertions(+), 19 deletions(-)
>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
* [dpdk-dev] [PATCH v11 9/9] vhost: add vhost-user-blk example which support inflight
2019-10-09 20:48 ` [dpdk-dev] [PATCH v11 0/9] vhost: support inflight share memory protocol feature Jin Yu
` (7 preceding siblings ...)
2019-10-09 20:48 ` [dpdk-dev] [PATCH v11 8/9] vhost: add vring functions packed ring support Jin Yu
@ 2019-10-09 20:48 ` Jin Yu
2019-10-11 10:24 ` Maxime Coquelin
2019-10-28 19:37 ` [dpdk-dev] [PATCH] " Jin Yu
2019-10-16 11:12 ` [dpdk-dev] [PATCH v11 0/9] vhost: support inflight share memory protocol feature Maxime Coquelin
9 siblings, 2 replies; 27+ messages in thread
From: Jin Yu @ 2019-10-09 20:48 UTC (permalink / raw)
To: dev; +Cc: changpeng.liu, maxime.coquelin, tiwei.bie, zhihong.wang, Jin Yu
A vhost-user-blk example that supports the inflight feature. It uses
the new APIs introduced in the first patch to show how these APIs
work to support the inflight feature.
Signed-off-by: Jin Yu <jin.yu@intel.com>
---
V1 - add the case.
V2 - add the rte_vhost prefix.
V3 - add packed ring support
---
examples/vhost_blk/Makefile | 68 ++
examples/vhost_blk/blk.c | 125 +++
examples/vhost_blk/blk_spec.h | 95 +++
examples/vhost_blk/meson.build | 21 +
examples/vhost_blk/vhost_blk.c | 1092 +++++++++++++++++++++++++
examples/vhost_blk/vhost_blk.h | 128 +++
examples/vhost_blk/vhost_blk_compat.c | 195 +++++
7 files changed, 1724 insertions(+)
create mode 100644 examples/vhost_blk/Makefile
create mode 100644 examples/vhost_blk/blk.c
create mode 100644 examples/vhost_blk/blk_spec.h
create mode 100644 examples/vhost_blk/meson.build
create mode 100644 examples/vhost_blk/vhost_blk.c
create mode 100644 examples/vhost_blk/vhost_blk.h
create mode 100644 examples/vhost_blk/vhost_blk_compat.c
diff --git a/examples/vhost_blk/Makefile b/examples/vhost_blk/Makefile
new file mode 100644
index 000000000..a10a90071
--- /dev/null
+++ b/examples/vhost_blk/Makefile
@@ -0,0 +1,68 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2010-2014 Intel Corporation
+
+# binary name
+APP = vhost-blk
+
+# all source are stored in SRCS-y
+SRCS-y := blk.c vhost_blk.c vhost_blk_compat.c
+
+# Build using pkg-config variables if possible
+$(shell pkg-config --exists libdpdk)
+ifeq ($(.SHELLSTATUS),0)
+
+all: shared
+.PHONY: shared static
+shared: build/$(APP)-shared
+ ln -sf $(APP)-shared build/$(APP)
+static: build/$(APP)-static
+ ln -sf $(APP)-static build/$(APP)
+
+LDFLAGS += -pthread
+
+PC_FILE := $(shell pkg-config --path libdpdk)
+CFLAGS += -O3 $(shell pkg-config --cflags libdpdk)
+LDFLAGS_SHARED = $(shell pkg-config --libs libdpdk)
+LDFLAGS_STATIC = -Wl,-Bstatic $(shell pkg-config --static --libs libdpdk)
+
+CFLAGS += -DALLOW_EXPERIMENTAL_API
+
+build/$(APP)-shared: $(SRCS-y) Makefile $(PC_FILE) | build
+ $(CC) $(CFLAGS) $(SRCS-y) -o $@ $(LDFLAGS) $(LDFLAGS_SHARED)
+
+build/$(APP)-static: $(SRCS-y) Makefile $(PC_FILE) | build
+ $(CC) $(CFLAGS) $(SRCS-y) -o $@ $(LDFLAGS) $(LDFLAGS_STATIC)
+
+build:
+ @mkdir -p $@
+
+.PHONY: clean
+clean:
+ rm -f build/$(APP) build/$(APP)-static build/$(APP)-shared
+ test -d build && rmdir -p build || true
+
+else # Build using legacy build system
+
+ifeq ($(RTE_SDK),)
+$(error "Please define RTE_SDK environment variable")
+endif
+
+# Default target, detect a build directory, by looking for a path with a .config
+RTE_TARGET ?= $(notdir $(abspath $(dir $(firstword $(wildcard $(RTE_SDK)/*/.config)))))
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+ifneq ($(CONFIG_RTE_EXEC_ENV_LINUX),y)
+$(info This application can only operate in a linux environment, \
+please change the definition of the RTE_TARGET environment variable)
+all:
+else
+
+CFLAGS += -DALLOW_EXPERIMENTAL_API
+CFLAGS += -O2 -D_FILE_OFFSET_BITS=64
+CFLAGS += $(WERROR_FLAGS)
+
+include $(RTE_SDK)/mk/rte.extapp.mk
+
+endif
+endif
diff --git a/examples/vhost_blk/blk.c b/examples/vhost_blk/blk.c
new file mode 100644
index 000000000..424ed3015
--- /dev/null
+++ b/examples/vhost_blk/blk.c
@@ -0,0 +1,125 @@
+// SPDX-License-Identifier: BSD-3-Clause
+// Copyright(c) 2010-2019 Intel Corporation
+
+/**
+ * This work is largely based on the "vhost-user-blk" implementation by
+ * SPDK(https://github.com/spdk/spdk).
+ */
+
+#include <stdio.h>
+#include <stdint.h>
+#include <unistd.h>
+#include <assert.h>
+#include <ctype.h>
+#include <string.h>
+#include <stddef.h>
+
+#include <rte_atomic.h>
+#include <rte_cycles.h>
+#include <rte_log.h>
+#include <rte_malloc.h>
+#include <rte_byteorder.h>
+#include <rte_string_fns.h>
+
+#include "vhost_blk.h"
+#include "blk_spec.h"
+
+static void
+vhost_strcpy_pad(void *dst, const char *src, size_t size, int pad)
+{
+ size_t len;
+
+ len = strlen(src);
+ if (len < size) {
+ memcpy(dst, src, len);
+ memset((char *)dst + len, pad, size - len);
+ } else {
+ memcpy(dst, src, size);
+ }
+}
+
+static int
+vhost_bdev_blk_readwrite(struct vhost_block_dev *bdev,
+ struct vhost_blk_task *task,
+ uint64_t lba_512, __rte_unused uint32_t xfer_len)
+{
+ uint32_t i;
+ uint64_t offset;
+ uint32_t nbytes = 0;
+
+ offset = lba_512 * 512;
+
+ for (i = 0; i < task->iovs_cnt; i++) {
+ if (task->dxfer_dir == BLK_DIR_TO_DEV)
+ memcpy(bdev->data + offset, task->iovs[i].iov_base,
+ task->iovs[i].iov_len);
+ else
+ memcpy(task->iovs[i].iov_base, bdev->data + offset,
+ task->iovs[i].iov_len);
+ offset += task->iovs[i].iov_len;
+ nbytes += task->iovs[i].iov_len;
+ }
+
+ return nbytes;
+}
+
+int
+vhost_bdev_process_blk_commands(struct vhost_block_dev *bdev,
+ struct vhost_blk_task *task)
+{
+ int used_len;
+
+ if (unlikely(task->data_len > (bdev->blockcnt * bdev->blocklen))) {
+ fprintf(stderr, "read or write beyond capacity\n");
+ return VIRTIO_BLK_S_UNSUPP;
+ }
+
+ switch (task->req->type) {
+ case VIRTIO_BLK_T_IN:
+ if (unlikely(task->data_len == 0 ||
+ (task->data_len & (512 - 1)) != 0)) {
+ fprintf(stderr,
+ "%s - passed IO buffer is not multiple of 512b"
+ "(req_idx = %"PRIu16").\n",
+ task->req->type ? "WRITE" : "READ",
+ task->head_idx);
+ return VIRTIO_BLK_S_UNSUPP;
+ }
+
+ task->dxfer_dir = BLK_DIR_FROM_DEV;
+ vhost_bdev_blk_readwrite(bdev, task,
+ task->req->sector, task->data_len);
+ break;
+ case VIRTIO_BLK_T_OUT:
+ if (unlikely(task->data_len == 0 ||
+ (task->data_len & (512 - 1)) != 0)) {
+ fprintf(stderr,
+ "%s - passed IO buffer is not multiple of 512b"
+ "(req_idx = %"PRIu16").\n",
+ task->req->type ? "WRITE" : "READ",
+ task->head_idx);
+ return VIRTIO_BLK_S_UNSUPP;
+ }
+
+ if (task->readtype) {
+ fprintf(stderr, "type isn't right\n");
+ return VIRTIO_BLK_S_IOERR;
+ }
+ task->dxfer_dir = BLK_DIR_TO_DEV;
+ vhost_bdev_blk_readwrite(bdev, task,
+ task->req->sector, task->data_len);
+ break;
+ case VIRTIO_BLK_T_GET_ID:
+ if (!task->iovs_cnt || task->data_len)
+ return VIRTIO_BLK_S_UNSUPP;
+ used_len = min(VIRTIO_BLK_ID_BYTES, task->data_len);
+ vhost_strcpy_pad(task->iovs[0].iov_base,
+ bdev->product_name, used_len, ' ');
+ break;
+ default:
+ fprintf(stderr, "unsupported cmd\n");
+ return VIRTIO_BLK_S_UNSUPP;
+ }
+
+ return VIRTIO_BLK_S_OK;
+}
diff --git a/examples/vhost_blk/blk_spec.h b/examples/vhost_blk/blk_spec.h
new file mode 100644
index 000000000..5875e2f86
--- /dev/null
+++ b/examples/vhost_blk/blk_spec.h
@@ -0,0 +1,95 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Intel Corporation
+ */
+
+#ifndef _BLK_SPEC_H
+#define _BLK_SPEC_H
+
+#include <stdint.h>
+
+#ifndef VHOST_USER_MEMORY_MAX_NREGIONS
+#define VHOST_USER_MEMORY_MAX_NREGIONS 8
+#endif
+
+#ifndef VHOST_USER_MAX_CONFIG_SIZE
+#define VHOST_USER_MAX_CONFIG_SIZE 256
+#endif
+
+#ifndef VHOST_USER_PROTOCOL_F_CONFIG
+#define VHOST_USER_PROTOCOL_F_CONFIG 9
+#endif
+
+#ifndef VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD
+#define VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD 12
+#endif
+
+#define VIRTIO_BLK_ID_BYTES 20 /* ID string length */
+
+#define VIRTIO_BLK_T_IN 0
+#define VIRTIO_BLK_T_OUT 1
+#define VIRTIO_BLK_T_FLUSH 4
+#define VIRTIO_BLK_T_GET_ID 8
+#define VIRTIO_BLK_T_DISCARD 11
+#define VIRTIO_BLK_T_WRITE_ZEROES 13
+
+#define VIRTIO_BLK_S_OK 0
+#define VIRTIO_BLK_S_IOERR 1
+#define VIRTIO_BLK_S_UNSUPP 2
+
+enum vhost_user_request {
+ VHOST_USER_NONE = 0,
+ VHOST_USER_GET_FEATURES = 1,
+ VHOST_USER_SET_FEATURES = 2,
+ VHOST_USER_SET_OWNER = 3,
+ VHOST_USER_RESET_OWNER = 4,
+ VHOST_USER_SET_MEM_TABLE = 5,
+ VHOST_USER_SET_LOG_BASE = 6,
+ VHOST_USER_SET_LOG_FD = 7,
+ VHOST_USER_SET_VRING_NUM = 8,
+ VHOST_USER_SET_VRING_ADDR = 9,
+ VHOST_USER_SET_VRING_BASE = 10,
+ VHOST_USER_GET_VRING_BASE = 11,
+ VHOST_USER_SET_VRING_KICK = 12,
+ VHOST_USER_SET_VRING_CALL = 13,
+ VHOST_USER_SET_VRING_ERR = 14,
+ VHOST_USER_GET_PROTOCOL_FEATURES = 15,
+ VHOST_USER_SET_PROTOCOL_FEATURES = 16,
+ VHOST_USER_GET_QUEUE_NUM = 17,
+ VHOST_USER_SET_VRING_ENABLE = 18,
+ VHOST_USER_MAX
+};
+
+/** Get/set config msg payload */
+struct vhost_user_config {
+ uint32_t offset;
+ uint32_t size;
+ uint32_t flags;
+ uint8_t region[VHOST_USER_MAX_CONFIG_SIZE];
+};
+
+/** Fixed-size vhost_memory struct */
+struct vhost_memory_padded {
+ uint32_t nregions;
+ uint32_t padding;
+ struct vhost_memory_region regions[VHOST_USER_MEMORY_MAX_NREGIONS];
+};
+
+struct vhost_user_msg {
+ enum vhost_user_request request;
+
+#define VHOST_USER_VERSION_MASK 0x3
+#define VHOST_USER_REPLY_MASK (0x1 << 2)
+ uint32_t flags;
+ uint32_t size; /**< the following payload size */
+ union {
+#define VHOST_USER_VRING_IDX_MASK 0xff
+#define VHOST_USER_VRING_NOFD_MASK (0x1 << 8)
+ uint64_t u64;
+ struct vhost_vring_state state;
+ struct vhost_vring_addr addr;
+ struct vhost_memory_padded memory;
+ struct vhost_user_config cfg;
+ } payload;
+} __attribute__((packed));
+
+#endif
diff --git a/examples/vhost_blk/meson.build b/examples/vhost_blk/meson.build
new file mode 100644
index 000000000..857367192
--- /dev/null
+++ b/examples/vhost_blk/meson.build
@@ -0,0 +1,21 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2017 Intel Corporation
+
+# meson file, for building this example as part of a main DPDK build.
+#
+# To build this example as a standalone application with an already-installed
+# DPDK instance, use 'make'
+
+if not is_linux
+ build = false
+endif
+
+if not cc.has_header('linux/virtio_blk.h')
+ build = false
+endif
+
+deps += 'vhost'
+allow_experimental_apis = true
+sources = files(
+ 'blk.c', 'vhost_blk.c', 'vhost_blk_compat.c'
+)
diff --git a/examples/vhost_blk/vhost_blk.c b/examples/vhost_blk/vhost_blk.c
new file mode 100644
index 000000000..49888b0cc
--- /dev/null
+++ b/examples/vhost_blk/vhost_blk.c
@@ -0,0 +1,1092 @@
+// SPDX-License-Identifier: BSD-3-Clause
+// Copyright(c) 2010-2017 Intel Corporation
+
+#include <stdint.h>
+#include <unistd.h>
+#include <stdbool.h>
+#include <signal.h>
+#include <assert.h>
+#include <semaphore.h>
+#include <linux/virtio_blk.h>
+#include <linux/virtio_ring.h>
+
+#include <rte_atomic.h>
+#include <rte_cycles.h>
+#include <rte_log.h>
+#include <rte_malloc.h>
+#include <rte_vhost.h>
+
+#include "vhost_blk.h"
+#include "blk_spec.h"
+
+#define VIRTQ_DESC_F_NEXT 1
+#define VIRTQ_DESC_F_AVAIL (1 << 7)
+#define VIRTQ_DESC_F_USED (1 << 15)
+
+#define MAX_TASK 12
+
+#define VHOST_BLK_FEATURES ((1ULL << VIRTIO_F_RING_PACKED) | \
+ (1ULL << VIRTIO_F_VERSION_1) |\
+ (1ULL << VIRTIO_F_NOTIFY_ON_EMPTY) | \
+ (1ULL << VHOST_USER_F_PROTOCOL_FEATURES))
+
+/* Path of the vhost-user Unix domain socket. Set at controller construction. */
+static char dev_pathname[PATH_MAX] = "";
+static sem_t exit_sem;
+
+struct vhost_blk_ctrlr *
+vhost_blk_ctrlr_find(const char *ctrlr_name)
+{
+ if (ctrlr_name == NULL)
+ return NULL;
+
+ /* currently we only support 1 socket file fd */
+ return g_vhost_ctrlr;
+}
+
+static uint64_t gpa_to_vva(int vid, uint64_t gpa, uint64_t *len)
+{
+ char path[PATH_MAX];
+ struct vhost_blk_ctrlr *ctrlr;
+ int ret = 0;
+
+ ret = rte_vhost_get_ifname(vid, path, PATH_MAX);
+ if (ret) {
+ fprintf(stderr, "Cannot get socket name\n");
+ assert(ret == 0);
+ }
+
+ ctrlr = vhost_blk_ctrlr_find(path);
+ if (!ctrlr) {
+ fprintf(stderr, "Controller is not ready\n");
+ assert(ctrlr != NULL);
+ }
+
+ assert(ctrlr->mem != NULL);
+
+ return rte_vhost_va_from_guest_pa(ctrlr->mem, gpa, len);
+}
+
+static struct vring_packed_desc *
+descriptor_get_next_packed(struct rte_vhost_vring *vq,
+ uint16_t *idx)
+{
+ if (vq->desc_packed[*idx % vq->size].flags & VIRTQ_DESC_F_NEXT) {
+ *idx += 1;
+ return &vq->desc_packed[*idx % vq->size];
+ }
+
+ return NULL;
+}
+
+static bool
+descriptor_has_next_packed(struct vring_packed_desc *cur_desc)
+{
+ return !!(cur_desc->flags & VRING_DESC_F_NEXT);
+}
+
+static bool
+descriptor_is_wr_packed(struct vring_packed_desc *cur_desc)
+{
+ return !!(cur_desc->flags & VRING_DESC_F_WRITE);
+}
+
+static struct rte_vhost_inflight_desc_packed *
+inflight_desc_get_next(struct rte_vhost_inflight_info_packed *inflight_packed,
+ struct rte_vhost_inflight_desc_packed *cur_desc)
+{
+ if (!!(cur_desc->flags & VIRTQ_DESC_F_NEXT))
+ return &inflight_packed->desc[cur_desc->next];
+
+ return NULL;
+}
+
+static bool
+inflight_desc_has_next(struct rte_vhost_inflight_desc_packed *cur_desc)
+{
+ return !!(cur_desc->flags & VRING_DESC_F_NEXT);
+}
+
+static bool
+inflight_desc_is_wr(struct rte_vhost_inflight_desc_packed *cur_desc)
+{
+ return !!(cur_desc->flags & VRING_DESC_F_WRITE);
+}
+
+static void
+inflight_process_payload_chain_packed(struct inflight_blk_task *task)
+{
+ void *data;
+ uint64_t chunck_len;
+ struct vhost_blk_task *blk_task;
+ struct rte_vhost_inflight_desc_packed *desc;
+
+ blk_task = &task->blk_task;
+ blk_task->iovs_cnt = 0;
+
+ do {
+ desc = task->inflight_desc;
+ chunck_len = desc->len;
+ data = (void *)(uintptr_t)gpa_to_vva(blk_task->bdev->vid,
+ desc->addr,
+ &chunck_len);
+ if (!data || chunck_len != desc->len) {
+ fprintf(stderr, "failed to translate desc address.\n");
+ return;
+ }
+
+ blk_task->iovs[blk_task->iovs_cnt].iov_base = data;
+ blk_task->iovs[blk_task->iovs_cnt].iov_len = desc->len;
+ blk_task->data_len += desc->len;
+ blk_task->iovs_cnt++;
+ task->inflight_desc = inflight_desc_get_next(
+ task->inflight_packed, desc);
+ } while (inflight_desc_has_next(task->inflight_desc));
+
+ chunck_len = task->inflight_desc->len;
+ blk_task->status = (void *)(uintptr_t)gpa_to_vva(
+ blk_task->bdev->vid, task->inflight_desc->addr, &chunck_len);
+ if (!blk_task->status || chunck_len != task->inflight_desc->len)
+ fprintf(stderr, "failed to translate desc address.\n");
+}
+
+static void
+inflight_submit_completion_packed(struct inflight_blk_task *task,
+ uint32_t q_idx, uint16_t *used_id,
+ bool *used_wrap_counter)
+{
+ struct vhost_blk_ctrlr *ctrlr;
+ struct rte_vhost_vring *vq;
+ struct vring_packed_desc *desc;
+ int ret;
+
+ ctrlr = vhost_blk_ctrlr_find(dev_pathname);
+ vq = task->blk_task.vq;
+
+ ret = rte_vhost_set_last_inflight_io_packed(ctrlr->bdev->vid, q_idx,
+ task->blk_task.head_idx);
+ if (ret != 0)
+ fprintf(stderr, "failed to set last inflight io\n");
+
+ desc = &vq->desc_packed[*used_id];
+ desc->id = task->blk_task.buffer_id;
+ rte_smp_mb();
+ if (*used_wrap_counter)
+ desc->flags |= VIRTQ_DESC_F_AVAIL | VIRTQ_DESC_F_USED;
+ else
+ desc->flags &= ~(VIRTQ_DESC_F_AVAIL | VIRTQ_DESC_F_USED);
+ rte_smp_mb();
+
+ *used_id += task->blk_task.iovs_cnt + 2;
+ if (*used_id >= vq->size) {
+ *used_id -= vq->size;
+ *used_wrap_counter = !(*used_wrap_counter);
+ }
+
+ ret = rte_vhost_clr_inflight_desc_packed(ctrlr->bdev->vid, q_idx,
+ task->blk_task.head_idx);
+ if (ret != 0)
+ fprintf(stderr, "failed to clear inflight io\n");
+
+ /* Send an interrupt back to the guest VM so that it knows
+ * a completion is ready to be processed.
+ */
+ rte_vhost_vring_call(task->blk_task.bdev->vid, q_idx);
+}
+
+static void
+submit_completion_packed(struct vhost_blk_task *task, uint32_t q_idx,
+ uint16_t *used_id, bool *used_wrap_counter)
+{
+ struct vhost_blk_ctrlr *ctrlr;
+ struct rte_vhost_vring *vq;
+ struct vring_packed_desc *desc;
+ int ret;
+
+ ctrlr = vhost_blk_ctrlr_find(dev_pathname);
+ vq = task->vq;
+
+ ret = rte_vhost_set_last_inflight_io_packed(ctrlr->bdev->vid, q_idx,
+ task->inflight_idx);
+ if (ret != 0)
+ fprintf(stderr, "failed to set last inflight io\n");
+
+ desc = &vq->desc_packed[*used_id];
+ desc->id = task->buffer_id;
+ rte_smp_mb();
+ if (*used_wrap_counter)
+ desc->flags |= VIRTQ_DESC_F_AVAIL | VIRTQ_DESC_F_USED;
+ else
+ desc->flags &= ~(VIRTQ_DESC_F_AVAIL | VIRTQ_DESC_F_USED);
+ rte_smp_mb();
+
+ *used_id += task->iovs_cnt + 2;
+ if (*used_id >= vq->size) {
+ *used_id -= vq->size;
+ *used_wrap_counter = !(*used_wrap_counter);
+ }
+
+ ret = rte_vhost_clr_inflight_desc_packed(ctrlr->bdev->vid, q_idx,
+ task->inflight_idx);
+ if (ret != 0)
+ fprintf(stderr, "failed to clear inflight io\n");
+
+ /* Send an interrupt back to the guest VM so that it knows
+ * a completion is ready to be processed.
+ */
+ rte_vhost_vring_call(task->bdev->vid, q_idx);
+}
+
+static void
+vhost_process_payload_chain_packed(struct vhost_blk_task *task,
+ uint16_t *idx)
+{
+ void *data;
+ uint64_t chunck_len;
+
+ task->iovs_cnt = 0;
+
+ do {
+ chunck_len = task->desc_packed->len;
+ data = (void *)(uintptr_t)gpa_to_vva(task->bdev->vid,
+ task->desc_packed->addr,
+ &chunck_len);
+ if (!data || chunck_len != task->desc_packed->len) {
+ fprintf(stderr, "failed to translate desc address.\n");
+ return;
+ }
+
+ task->iovs[task->iovs_cnt].iov_base = data;
+ task->iovs[task->iovs_cnt].iov_len = task->desc_packed->len;
+ task->data_len += task->desc_packed->len;
+ task->iovs_cnt++;
+ task->desc_packed = descriptor_get_next_packed(task->vq, idx);
+ } while (descriptor_has_next_packed(task->desc_packed));
+
+ task->last_idx = *idx % task->vq->size;
+ chunck_len = task->desc_packed->len;
+ task->status = (void *)(uintptr_t)gpa_to_vva(task->bdev->vid,
+ task->desc_packed->addr,
+ &chunck_len);
+ if (!task->status || chunck_len != task->desc_packed->len)
+ fprintf(stderr, "failed to translate desc address.\n");
+}
+
+
+static int
+descriptor_is_available(struct rte_vhost_vring *vring, uint16_t idx,
+ bool avail_wrap_counter)
+{
+ uint16_t flags = vring->desc_packed[idx].flags;
+
+ return ((!!(flags & VIRTQ_DESC_F_AVAIL) == avail_wrap_counter) &&
+ (!!(flags & VIRTQ_DESC_F_USED) != avail_wrap_counter));
+}
+
+static void
+process_requestq_packed(struct vhost_blk_ctrlr *ctrlr, uint32_t q_idx)
+{
+ bool avail_wrap_counter, used_wrap_counter;
+ uint16_t avail_idx, used_idx;
+ int ret;
+ uint64_t chunck_len;
+ struct vhost_blk_queue *blk_vq;
+ struct rte_vhost_vring *vq;
+ struct vhost_blk_task *task;
+
+ blk_vq = &ctrlr->bdev->queues[q_idx];
+ vq = &blk_vq->vq;
+
+ avail_idx = blk_vq->last_avail_idx;
+ avail_wrap_counter = blk_vq->avail_wrap_counter;
+ used_idx = blk_vq->last_used_idx;
+ used_wrap_counter = blk_vq->used_wrap_counter;
+
+ task = rte_zmalloc(NULL, sizeof(*task), 0);
+ assert(task != NULL);
+ task->vq = vq;
+ task->bdev = ctrlr->bdev;
+
+ while (descriptor_is_available(vq, avail_idx, avail_wrap_counter)) {
+ task->head_idx = avail_idx;
+ task->desc_packed = &task->vq->desc_packed[task->head_idx];
+ task->iovs_cnt = 0;
+ task->data_len = 0;
+ task->req = NULL;
+ task->status = NULL;
+
+ /* does not support indirect descriptors */
+ assert((task->desc_packed->flags & VRING_DESC_F_INDIRECT) == 0);
+
+ chunck_len = task->desc_packed->len;
+ task->req = (void *)(uintptr_t)gpa_to_vva(task->bdev->vid,
+ task->desc_packed->addr, &chunck_len);
+ if (!task->req || chunck_len != task->desc_packed->len) {
+ fprintf(stderr, "failed to translate desc address.\n");
+ rte_free(task);
+ return;
+ }
+
+ task->desc_packed = descriptor_get_next_packed(task->vq,
+ &avail_idx);
+ assert(task->desc_packed != NULL);
+ if (!descriptor_has_next_packed(task->desc_packed)) {
+ task->dxfer_dir = BLK_DIR_NONE;
+ task->last_idx = avail_idx % vq->size;
+ chunck_len = task->desc_packed->len;
+ task->status = (void *)(uintptr_t)
+ gpa_to_vva(task->bdev->vid,
+ task->desc_packed->addr,
+ &chunck_len);
+ if (!task->status ||
+ chunck_len != task->desc_packed->len) {
+ fprintf(stderr,
+ "failed to translate desc address.\n");
+ rte_free(task);
+ return;
+ }
+ } else {
+ task->readtype = descriptor_is_wr_packed(
+ task->desc_packed);
+ vhost_process_payload_chain_packed(task, &avail_idx);
+ }
+ task->buffer_id = vq->desc_packed[task->last_idx].id;
+ rte_vhost_set_inflight_desc_packed(ctrlr->bdev->vid, q_idx,
+ task->head_idx,
+ task->last_idx,
+ &task->inflight_idx);
+
+ if (++avail_idx >= vq->size) {
+ avail_idx -= vq->size;
+ avail_wrap_counter = !avail_wrap_counter;
+ }
+ blk_vq->last_avail_idx = avail_idx;
+ blk_vq->avail_wrap_counter = avail_wrap_counter;
+
+ ret = vhost_bdev_process_blk_commands(ctrlr->bdev, task);
+ if (ret) {
+ /* invalid response */
+ *task->status = VIRTIO_BLK_S_IOERR;
+ } else {
+ /* success */
+ *task->status = VIRTIO_BLK_S_OK;
+ }
+
+ submit_completion_packed(task, q_idx, &used_idx,
+ &used_wrap_counter);
+ blk_vq->last_used_idx = used_idx;
+ blk_vq->used_wrap_counter = used_wrap_counter;
+ }
+
+ rte_free(task);
+}
+
+static void
+submit_inflight_vq_packed(struct vhost_blk_ctrlr *ctrlr,
+ uint16_t q_idx)
+{
+ bool used_wrap_counter;
+ int req_idx, ret;
+ uint16_t used_idx;
+ uint64_t chunck_len;
+ struct vhost_blk_queue *blk_vq;
+ struct rte_vhost_ring_inflight *inflight_vq;
+ struct rte_vhost_resubmit_info *resubmit_info;
+ struct rte_vhost_vring *vq;
+ struct inflight_blk_task *task;
+ struct vhost_blk_task *blk_task;
+ struct rte_vhost_inflight_info_packed *inflight_info;
+
+ blk_vq = &ctrlr->bdev->queues[q_idx];
+ vq = &blk_vq->vq;
+ inflight_vq = &blk_vq->inflight_vq;
+ resubmit_info = inflight_vq->resubmit_inflight;
+ inflight_info = inflight_vq->inflight_packed;
+ used_idx = blk_vq->last_used_idx;
+ used_wrap_counter = blk_vq->used_wrap_counter;
+
+ task = rte_malloc(NULL, sizeof(*task), 0);
+ if (!task) {
+ fprintf(stderr, "failed to allocate memory\n");
+ return;
+ }
+ blk_task = &task->blk_task;
+ blk_task->vq = vq;
+ blk_task->bdev = ctrlr->bdev;
+ task->inflight_packed = inflight_vq->inflight_packed;
+
+ while (resubmit_info->resubmit_num-- > 0) {
+ req_idx = resubmit_info->resubmit_num;
+ blk_task->head_idx =
+ resubmit_info->resubmit_list[req_idx].index;
+ task->inflight_desc =
+ &inflight_info->desc[blk_task->head_idx];
+ task->blk_task.iovs_cnt = 0;
+ task->blk_task.data_len = 0;
+ task->blk_task.req = NULL;
+ task->blk_task.status = NULL;
+
+ /* Update the avail idx as well, since its initial
+ * value equals the used idx.
+ */
+ blk_vq->last_avail_idx += task->inflight_desc->num;
+ if (blk_vq->last_avail_idx >= vq->size) {
+ blk_vq->last_avail_idx -= vq->size;
+ blk_vq->avail_wrap_counter =
+ !blk_vq->avail_wrap_counter;
+ }
+
+ /* does not support indirect descriptors */
+ assert(task->inflight_desc != NULL);
+ assert((task->inflight_desc->flags &
+ VRING_DESC_F_INDIRECT) == 0);
+
+ chunck_len = task->inflight_desc->len;
+ blk_task->req = (void *)(uintptr_t)
+ gpa_to_vva(blk_task->bdev->vid,
+ task->inflight_desc->addr,
+ &chunck_len);
+ if (!blk_task->req ||
+ chunck_len != task->inflight_desc->len) {
+ fprintf(stderr, "failed to translate desc address.\n");
+ rte_free(task);
+ return;
+ }
+
+ task->inflight_desc = inflight_desc_get_next(
+ task->inflight_packed, task->inflight_desc);
+ assert(task->inflight_desc != NULL);
+ if (!inflight_desc_has_next(task->inflight_desc)) {
+ blk_task->dxfer_dir = BLK_DIR_NONE;
+ chunck_len = task->inflight_desc->len;
+ blk_task->status = (void *)(uintptr_t)
+ gpa_to_vva(blk_task->bdev->vid,
+ task->inflight_desc->addr,
+ &chunck_len);
+ if (!blk_task->status ||
+ chunck_len != task->inflight_desc->len) {
+ fprintf(stderr,
+ "failed to translate desc address.\n");
+ rte_free(task);
+ return;
+ }
+ } else {
+ blk_task->readtype =
+ inflight_desc_is_wr(task->inflight_desc);
+ inflight_process_payload_chain_packed(task);
+ }
+
+ blk_task->buffer_id = task->inflight_desc->id;
+
+ ret = vhost_bdev_process_blk_commands(ctrlr->bdev, blk_task);
+ if (ret)
+ /* invalid response */
+ *blk_task->status = VIRTIO_BLK_S_IOERR;
+ else
+ /* success */
+ *blk_task->status = VIRTIO_BLK_S_OK;
+
+ inflight_submit_completion_packed(task, q_idx, &used_idx,
+ &used_wrap_counter);
+
+ blk_vq->last_used_idx = used_idx;
+ blk_vq->used_wrap_counter = used_wrap_counter;
+ }
+
+ rte_free(task);
+}
+
+static struct vring_desc *
+descriptor_get_next_split(struct vring_desc *vq_desc,
+ struct vring_desc *cur_desc)
+{
+ return &vq_desc[cur_desc->next];
+}
+
+static bool
+descriptor_has_next_split(struct vring_desc *cur_desc)
+{
+ return !!(cur_desc->flags & VRING_DESC_F_NEXT);
+}
+
+static bool
+descriptor_is_wr_split(struct vring_desc *cur_desc)
+{
+ return !!(cur_desc->flags & VRING_DESC_F_WRITE);
+}
+
+static void
+vhost_process_payload_chain_split(struct vhost_blk_task *task)
+{
+ void *data;
+ uint64_t chunck_len;
+
+ task->iovs_cnt = 0;
+
+ do {
+ chunck_len = task->desc_split->len;
+ data = (void *)(uintptr_t)gpa_to_vva(task->bdev->vid,
+ task->desc_split->addr,
+ &chunck_len);
+ if (!data || chunck_len != task->desc_split->len) {
+ fprintf(stderr, "failed to translate desc address.\n");
+ return;
+ }
+
+ task->iovs[task->iovs_cnt].iov_base = data;
+ task->iovs[task->iovs_cnt].iov_len = task->desc_split->len;
+ task->data_len += task->desc_split->len;
+ task->iovs_cnt++;
+ task->desc_split =
+ descriptor_get_next_split(task->vq->desc, task->desc_split);
+ } while (descriptor_has_next_split(task->desc_split));
+
+ chunck_len = task->desc_split->len;
+ task->status = (void *)(uintptr_t)gpa_to_vva(task->bdev->vid,
+ task->desc_split->addr,
+ &chunck_len);
+ if (!task->status || chunck_len != task->desc_split->len)
+ fprintf(stderr, "failed to translate desc address.\n");
+}
+
+static void
+submit_completion_split(struct vhost_blk_task *task, uint32_t vid,
+ uint32_t q_idx)
+{
+ struct rte_vhost_vring *vq;
+ struct vring_used *used;
+
+ vq = task->vq;
+ used = vq->used;
+
+ rte_vhost_set_last_inflight_io_split(vid, q_idx, task->req_idx);
+
+ /* Fill out the next entry in the "used" ring. id = the
+ * index of the descriptor that contained the blk request.
+ * len = the total amount of data transferred for the blk
+ * request. We must report the correct len for requests
+ * where we may return less data than was allocated by the
+ * guest VM.
+ */
+ used->ring[used->idx & (vq->size - 1)].id = task->req_idx;
+ used->ring[used->idx & (vq->size - 1)].len = task->data_len;
+ rte_smp_mb();
+ used->idx++;
+ rte_smp_mb();
+
+ rte_vhost_clr_inflight_desc_split(vid, q_idx, used->idx, task->req_idx);
+
+ /* Send an interrupt back to the guest VM so that it knows
+ * a completion is ready to be processed.
+ */
+ rte_vhost_vring_call(task->bdev->vid, q_idx);
+}
+
+static void
+submit_inflight_vq_split(struct vhost_blk_ctrlr *ctrlr,
+ uint32_t q_idx)
+{
+ struct vhost_blk_queue *blk_vq;
+ struct rte_vhost_ring_inflight *inflight_vq;
+ struct rte_vhost_resubmit_info *resubmit_inflight;
+ struct rte_vhost_resubmit_desc *resubmit_list;
+ struct vhost_blk_task *task;
+ int req_idx;
+ uint64_t chunck_len;
+ int ret;
+
+ blk_vq = &ctrlr->bdev->queues[q_idx];
+ inflight_vq = &blk_vq->inflight_vq;
+ resubmit_inflight = inflight_vq->resubmit_inflight;
+ resubmit_list = resubmit_inflight->resubmit_list;
+
+ task = rte_zmalloc(NULL, sizeof(*task), 0);
+ assert(task != NULL);
+
+ task->ctrlr = ctrlr;
+ task->bdev = ctrlr->bdev;
+ task->vq = &blk_vq->vq;
+
+ while (resubmit_inflight->resubmit_num-- > 0) {
+ req_idx = resubmit_list[resubmit_inflight->resubmit_num].index;
+ task->req_idx = req_idx;
+ task->desc_split = &task->vq->desc[task->req_idx];
+ task->iovs_cnt = 0;
+ task->data_len = 0;
+ task->req = NULL;
+ task->status = NULL;
+
+ /* does not support indirect descriptors */
+ assert(task->desc_split != NULL);
+ assert((task->desc_split->flags & VRING_DESC_F_INDIRECT) == 0);
+
+ chunck_len = task->desc_split->len;
+ task->req = (void *)(uintptr_t)gpa_to_vva(task->bdev->vid,
+ task->desc_split->addr, &chunck_len);
+ if (!task->req || chunck_len != task->desc_split->len) {
+ fprintf(stderr, "failed to translate desc address.\n");
+ rte_free(task);
+ return;
+ }
+
+ task->desc_split = descriptor_get_next_split(task->vq->desc,
+ task->desc_split);
+ if (!descriptor_has_next_split(task->desc_split)) {
+ task->dxfer_dir = BLK_DIR_NONE;
+ chunck_len = task->desc_split->len;
+ task->status = (void *)(uintptr_t)
+ gpa_to_vva(task->bdev->vid,
+ task->desc_split->addr,
+ &chunck_len);
+ if (!task->status ||
+ chunck_len != task->desc_split->len) {
+ fprintf(stderr,
+ "failed to translate desc address.\n");
+ rte_free(task);
+ return;
+ }
+ } else {
+ task->readtype =
+ descriptor_is_wr_split(task->desc_split);
+ vhost_process_payload_chain_split(task);
+ }
+
+ ret = vhost_bdev_process_blk_commands(ctrlr->bdev, task);
+ if (ret) {
+ /* invalid response */
+ *task->status = VIRTIO_BLK_S_IOERR;
+ } else {
+ /* success */
+ *task->status = VIRTIO_BLK_S_OK;
+ }
+ submit_completion_split(task, ctrlr->bdev->vid, q_idx);
+ }
+
+ rte_free(task);
+}
+
+static void
+process_requestq_split(struct vhost_blk_ctrlr *ctrlr, uint32_t q_idx)
+{
+ int ret;
+ int req_idx;
+ uint16_t last_idx;
+ uint64_t chunck_len;
+ struct vhost_blk_queue *blk_vq;
+ struct rte_vhost_vring *vq;
+ struct vhost_blk_task *task;
+
+ blk_vq = &ctrlr->bdev->queues[q_idx];
+ vq = &blk_vq->vq;
+
+ task = rte_zmalloc(NULL, sizeof(*task), 0);
+ assert(task != NULL);
+ task->ctrlr = ctrlr;
+ task->bdev = ctrlr->bdev;
+ task->vq = vq;
+
+ while (vq->avail->idx != blk_vq->last_avail_idx) {
+ last_idx = blk_vq->last_avail_idx & (vq->size - 1);
+ req_idx = vq->avail->ring[last_idx];
+ task->req_idx = req_idx;
+ task->desc_split = &task->vq->desc[task->req_idx];
+ task->iovs_cnt = 0;
+ task->data_len = 0;
+ task->req = NULL;
+ task->status = NULL;
+
+ rte_vhost_set_inflight_desc_split(ctrlr->bdev->vid, q_idx,
+ task->req_idx);
+
+ /* does not support indirect descriptors */
+ assert((task->desc_split->flags & VRING_DESC_F_INDIRECT) == 0);
+
+ chunck_len = task->desc_split->len;
+ task->req = (void *)(uintptr_t)gpa_to_vva(task->bdev->vid,
+ task->desc_split->addr, &chunck_len);
+ if (!task->req || chunck_len != task->desc_split->len) {
+ fprintf(stderr, "failed to translate desc address.\n");
+ rte_free(task);
+ return;
+ }
+
+ task->desc_split = descriptor_get_next_split(task->vq->desc,
+ task->desc_split);
+ if (!descriptor_has_next_split(task->desc_split)) {
+ task->dxfer_dir = BLK_DIR_NONE;
+ chunck_len = task->desc_split->len;
+ task->status = (void *)(uintptr_t)
+ gpa_to_vva(task->bdev->vid,
+ task->desc_split->addr,
+ &chunck_len);
+ if (!task->status ||
+ chunck_len != task->desc_split->len) {
+ fprintf(stderr,
+ "failed to translate desc address.\n");
+ rte_free(task);
+ return;
+ }
+ } else {
+ task->readtype =
+ descriptor_is_wr_split(task->desc_split);
+ vhost_process_payload_chain_split(task);
+ }
+ blk_vq->last_avail_idx++;
+
+ ret = vhost_bdev_process_blk_commands(ctrlr->bdev, task);
+ if (ret) {
+ /* invalid response */
+ *task->status = VIRTIO_BLK_S_IOERR;
+ } else {
+ /* success */
+ *task->status = VIRTIO_BLK_S_OK;
+ }
+
+ submit_completion_split(task, ctrlr->bdev->vid, q_idx);
+ }
+
+ rte_free(task);
+}
+
+static void *
+ctrlr_worker(void *arg)
+{
+ struct vhost_blk_ctrlr *ctrlr = (struct vhost_blk_ctrlr *)arg;
+ struct vhost_blk_queue *blk_vq;
+ struct rte_vhost_ring_inflight *inflight_vq;
+ cpu_set_t cpuset;
+ pthread_t thread;
+ int i;
+
+ fprintf(stdout, "Ctrlr Worker Thread start\n");
+
+ if (ctrlr == NULL || ctrlr->bdev == NULL) {
+ fprintf(stderr,
+ "%s: Error, invalid argument passed to worker thread\n",
+ __func__);
+ exit(1);
+ }
+
+ thread = pthread_self();
+ CPU_ZERO(&cpuset);
+ CPU_SET(0, &cpuset);
+ pthread_setaffinity_np(thread, sizeof(cpu_set_t), &cpuset);
+
+ for (i = 0; i < NUM_OF_BLK_QUEUES; i++) {
+ blk_vq = &ctrlr->bdev->queues[i];
+ inflight_vq = &blk_vq->inflight_vq;
+ if (inflight_vq->resubmit_inflight != NULL &&
+ inflight_vq->resubmit_inflight->resubmit_num != 0) {
+ if (ctrlr->packed_ring)
+ submit_inflight_vq_packed(ctrlr, i);
+ else
+ submit_inflight_vq_split(ctrlr, i);
+ }
+ }
+
+ while (!g_should_stop && ctrlr->bdev != NULL) {
+ for (i = 0; i < NUM_OF_BLK_QUEUES; i++) {
+ if (ctrlr->packed_ring)
+ process_requestq_packed(ctrlr, i);
+ else
+ process_requestq_split(ctrlr, i);
+ }
+ }
+
+ g_should_stop = 2;
+ fprintf(stdout, "Ctrlr Worker Thread Exiting\n");
+ sem_post(&exit_sem);
+ return NULL;
+}
+
+static int
+new_device(int vid)
+{
+ struct vhost_blk_ctrlr *ctrlr;
+ struct vhost_blk_queue *blk_vq;
+ struct rte_vhost_vring *vq;
+ uint64_t features;
+ pthread_t tid;
+ int i, ret;
+
+ ctrlr = vhost_blk_ctrlr_find(dev_pathname);
+ if (!ctrlr) {
+ fprintf(stderr, "Controller is not ready\n");
+ return -1;
+ }
+
+ if (ctrlr->started)
+ return 0;
+
+ ctrlr->bdev->vid = vid;
+ ret = rte_vhost_get_negotiated_features(vid, &features);
+ if (ret) {
+ fprintf(stderr, "failed to get the negotiated features\n");
+ return -1;
+ }
+ ctrlr->packed_ring = !!(features & (1ULL << VIRTIO_F_RING_PACKED));
+
+ ret = rte_vhost_get_mem_table(vid, &ctrlr->mem);
+ if (ret)
+ fprintf(stderr, "Get Controller memory region failed\n");
+ assert(ctrlr->mem != NULL);
+
+ /* Disable Notifications and init last idx */
+ for (i = 0; i < NUM_OF_BLK_QUEUES; i++) {
+ blk_vq = &ctrlr->bdev->queues[i];
+ vq = &blk_vq->vq;
+
+ ret = rte_vhost_get_vhost_vring(ctrlr->bdev->vid, i, vq);
+ assert(ret == 0);
+
+ ret = rte_vhost_get_vring_base(ctrlr->bdev->vid, i,
+ &blk_vq->last_avail_idx,
+ &blk_vq->last_used_idx);
+ assert(ret == 0);
+
+ ret = rte_vhost_get_vhost_ring_inflight(ctrlr->bdev->vid, i,
+ &blk_vq->inflight_vq);
+ assert(ret == 0);
+
+ if (ctrlr->packed_ring) {
+ /* for the reconnection */
+ ret = rte_vhost_get_vring_base_from_inflight(
+ ctrlr->bdev->vid, i,
+ &blk_vq->last_avail_idx,
+ &blk_vq->last_used_idx);
+ assert(ret == 0);
+
+ blk_vq->avail_wrap_counter = blk_vq->last_avail_idx &
+ (1 << 15);
+ blk_vq->last_avail_idx = blk_vq->last_avail_idx &
+ 0x7fff;
+ blk_vq->used_wrap_counter = blk_vq->last_used_idx &
+ (1 << 15);
+ blk_vq->last_used_idx = blk_vq->last_used_idx &
+ 0x7fff;
+ }
+
+ rte_vhost_enable_guest_notification(vid, i, 0);
+ }
+
+ /* start polling vring */
+ g_should_stop = 0;
+ fprintf(stdout, "New Device %s, Device ID %d\n", dev_pathname, vid);
+ if (pthread_create(&tid, NULL, &ctrlr_worker, ctrlr) < 0) {
+ fprintf(stderr, "Worker Thread Start Failed\n");
+ return -1;
+ }
+
+ /* device has been started */
+ ctrlr->started = 1;
+ pthread_detach(tid);
+ return 0;
+}
+
+static void
+destroy_device(int vid)
+{
+ char path[PATH_MAX];
+ struct vhost_blk_ctrlr *ctrlr;
+ struct vhost_blk_queue *blk_vq;
+ int i, ret;
+
+ ret = rte_vhost_get_ifname(vid, path, PATH_MAX);
+ if (ret) {
+ fprintf(stderr, "Destroy Ctrlr Failed\n");
+ return;
+ }
+ fprintf(stdout, "Destroy %s Device ID %d\n", path, vid);
+ ctrlr = vhost_blk_ctrlr_find(path);
+ if (!ctrlr) {
+ fprintf(stderr, "Destroy Ctrlr Failed\n");
+ return;
+ }
+
+ if (!ctrlr->started)
+ return;
+
+ g_should_stop = 1;
+ while (g_should_stop != 2)
+ ;
+
+ for (i = 0; i < NUM_OF_BLK_QUEUES; i++) {
+ blk_vq = &ctrlr->bdev->queues[i];
+ if (ctrlr->packed_ring) {
+ blk_vq->last_avail_idx |= (blk_vq->avail_wrap_counter <<
+ 15);
+ blk_vq->last_used_idx |= (blk_vq->used_wrap_counter <<
+ 15);
+ }
+ rte_vhost_set_vring_base(ctrlr->bdev->vid, i,
+ blk_vq->last_avail_idx,
+ blk_vq->last_used_idx);
+ }
+
+ free(ctrlr->mem);
+
+ ctrlr->started = 0;
+ sem_wait(&exit_sem);
+}
+
+static int
+new_connection(int vid)
+{
+ /* extend the proper features for block device */
+ vhost_session_install_rte_compat_hooks(vid);
+
+ return 0;
+}
+
+struct vhost_device_ops vhost_blk_device_ops = {
+ .new_device = new_device,
+ .destroy_device = destroy_device,
+ .new_connection = new_connection,
+};
+
+static struct vhost_block_dev *
+vhost_blk_bdev_construct(const char *bdev_name,
+ const char *bdev_serial, uint32_t blk_size, uint64_t blk_cnt,
+ bool wce_enable)
+{
+ struct vhost_block_dev *bdev;
+
+ bdev = rte_zmalloc(NULL, sizeof(*bdev), RTE_CACHE_LINE_SIZE);
+ if (!bdev)
+ return NULL;
+
+ snprintf(bdev->name, sizeof(bdev->name), "%s", bdev_name);
+ snprintf(bdev->product_name, sizeof(bdev->product_name), "%s",
+ bdev_serial);
+ bdev->blocklen = blk_size;
+ bdev->blockcnt = blk_cnt;
+ bdev->write_cache = wce_enable;
+
+ fprintf(stdout, "blocklen=%u, blockcnt=%llu\n", bdev->blocklen,
+ (unsigned long long)bdev->blockcnt);
+
+ /* use memory as disk storage space */
+ bdev->data = rte_zmalloc(NULL, blk_cnt * blk_size, 0);
+ if (!bdev->data) {
+ fprintf(stderr, "not enough huge-page memory reserved for disk\n");
+ rte_free(bdev);
+ return NULL;
+ }
+
+ return bdev;
+}
+
+static struct vhost_blk_ctrlr *
+vhost_blk_ctrlr_construct(const char *ctrlr_name)
+{
+ int ret;
+ struct vhost_blk_ctrlr *ctrlr;
+ char *path;
+ char cwd[PATH_MAX];
+
+ /* always use current directory */
+ path = getcwd(cwd, PATH_MAX);
+ if (!path) {
+ fprintf(stderr, "Cannot get current working directory\n");
+ return NULL;
+ }
+ snprintf(dev_pathname, sizeof(dev_pathname), "%s/%s", path, ctrlr_name);
+
+ if (access(dev_pathname, F_OK) != -1) {
+ if (unlink(dev_pathname) != 0)
+ rte_exit(EXIT_FAILURE, "Cannot remove %s.\n",
+ dev_pathname);
+ }
+
+ if (rte_vhost_driver_register(dev_pathname, 0) != 0) {
+ fprintf(stderr, "socket %s already exists\n", dev_pathname);
+ return NULL;
+ }
+
+ ret = rte_vhost_driver_set_features(dev_pathname, VHOST_BLK_FEATURES);
+ if (ret != 0) {
+ fprintf(stderr, "Set vhost driver features failed\n");
+ rte_vhost_driver_unregister(dev_pathname);
+ return NULL;
+ }
+
+ /* set proper features */
+ vhost_dev_install_rte_compat_hooks(dev_pathname);
+
+ ctrlr = rte_zmalloc(NULL, sizeof(*ctrlr), RTE_CACHE_LINE_SIZE);
+ if (!ctrlr) {
+ rte_vhost_driver_unregister(dev_pathname);
+ return NULL;
+ }
+
+ /* hardcoded block device information with 128MiB */
+ ctrlr->bdev = vhost_blk_bdev_construct("malloc0", "vhost_blk_malloc0",
+ 4096, 32768, 0);
+ if (!ctrlr->bdev) {
+ rte_free(ctrlr);
+ rte_vhost_driver_unregister(dev_pathname);
+ return NULL;
+ }
+
+ rte_vhost_driver_callback_register(dev_pathname,
+ &vhost_blk_device_ops);
+
+ return ctrlr;
+}
+
+static void
+signal_handler(__rte_unused int signum)
+{
+ struct vhost_blk_ctrlr *ctrlr;
+
+ if (access(dev_pathname, F_OK) == 0)
+ unlink(dev_pathname);
+
+ g_should_stop = 1;
+
+ while (g_should_stop != 2)
+ ;
+
+ ctrlr = vhost_blk_ctrlr_find(dev_pathname);
+ if (ctrlr != NULL) {
+ if (ctrlr->bdev != NULL) {
+ rte_free(ctrlr->bdev->data);
+ rte_free(ctrlr->bdev);
+ }
+ rte_free(ctrlr);
+ }
+
+ rte_vhost_driver_unregister(dev_pathname);
+ exit(0);
+}
+
+int main(int argc, char *argv[])
+{
+ int ret;
+
+ signal(SIGINT, signal_handler);
+
+ /* init EAL */
+ ret = rte_eal_init(argc, argv);
+ if (ret < 0)
+ rte_exit(EXIT_FAILURE, "Error with EAL initialization\n");
+
+ g_vhost_ctrlr = vhost_blk_ctrlr_construct("vhost.socket");
+ if (g_vhost_ctrlr == NULL) {
+ fprintf(stderr, "Construct vhost blk controller failed\n");
+ return 0;
+ }
+
+ if (sem_init(&exit_sem, 0, 0) < 0) {
+ fprintf(stderr, "Error init exit_sem\n");
+ return -1;
+ }
+
+ rte_vhost_driver_start(dev_pathname);
+
+ /* loop until the signal handler exits the application */
+ while (1)
+ sleep(1);
+
+ return 0;
+}
+
diff --git a/examples/vhost_blk/vhost_blk.h b/examples/vhost_blk/vhost_blk.h
new file mode 100644
index 000000000..a0f2e4a8e
--- /dev/null
+++ b/examples/vhost_blk/vhost_blk.h
@@ -0,0 +1,128 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2017 Intel Corporation
+ */
+
+#ifndef _VHOST_BLK_H_
+#define _VHOST_BLK_H_
+
+#include <stdio.h>
+#include <sys/uio.h>
+#include <stdint.h>
+#include <stdbool.h>
+#include <linux/virtio_blk.h>
+#include <linux/virtio_ring.h>
+
+#include <rte_vhost.h>
+
+struct vring_packed_desc {
+ /* Buffer Address. */
+ __le64 addr;
+ /* Buffer Length. */
+ __le32 len;
+ /* Buffer ID. */
+ __le16 id;
+ /* The flags depending on descriptor type. */
+ __le16 flags;
+};
+
+struct vhost_blk_queue {
+ struct rte_vhost_vring vq;
+ struct rte_vhost_ring_inflight inflight_vq;
+ uint16_t last_avail_idx;
+ uint16_t last_used_idx;
+ bool avail_wrap_counter;
+ bool used_wrap_counter;
+};
+
+#define NUM_OF_BLK_QUEUES 1
+
+#ifndef VIRTIO_F_RING_PACKED
+#define VIRTIO_F_RING_PACKED 34
+#endif
+
+#define min(a, b) (((a) < (b)) ? (a) : (b))
+
+struct vhost_block_dev {
+ /** ID for vhost library. */
+ int vid;
+ /** Queues for the block device */
+ struct vhost_blk_queue queues[NUM_OF_BLK_QUEUES];
+ /** Unique name for this block device. */
+ char name[64];
+
+ /** Unique product name for this kind of block device. */
+ char product_name[256];
+
+ /** Size in bytes of a logical block for the backend */
+ uint32_t blocklen;
+
+ /** Number of blocks */
+ uint64_t blockcnt;
+
+ /** write cache enabled, not used at the moment */
+ int write_cache;
+
+ /** use memory as disk storage space */
+ uint8_t *data;
+};
+
+struct vhost_blk_ctrlr {
+ uint8_t started;
+ uint8_t packed_ring;
+ uint8_t need_restart;
+ /** Only support 1 LUN for the example */
+ struct vhost_block_dev *bdev;
+ /** VM memory region */
+ struct rte_vhost_memory *mem;
+} __rte_cache_aligned;
+
+#define VHOST_BLK_MAX_IOVS 128
+
+enum blk_data_dir {
+ BLK_DIR_NONE = 0,
+ BLK_DIR_TO_DEV = 1,
+ BLK_DIR_FROM_DEV = 2,
+};
+
+struct vhost_blk_task {
+ uint8_t readtype;
+ uint8_t req_idx;
+ uint16_t head_idx;
+ uint16_t last_idx;
+ uint16_t inflight_idx;
+ uint16_t buffer_id;
+ uint32_t dxfer_dir;
+ uint32_t data_len;
+ struct virtio_blk_outhdr *req;
+
+ volatile uint8_t *status;
+
+ struct iovec iovs[VHOST_BLK_MAX_IOVS];
+ uint32_t iovs_cnt;
+ struct vring_packed_desc *desc_packed;
+ struct vring_desc *desc_split;
+ struct rte_vhost_vring *vq;
+ struct vhost_block_dev *bdev;
+ struct vhost_blk_ctrlr *ctrlr;
+};
+
+struct inflight_blk_task {
+ struct vhost_blk_task blk_task;
+ struct rte_vhost_inflight_desc_packed *inflight_desc;
+ struct rte_vhost_inflight_info_packed *inflight_packed;
+};
+
+struct vhost_blk_ctrlr *g_vhost_ctrlr;
+struct vhost_device_ops vhost_blk_device_ops;
+int g_should_stop;
+
+int vhost_bdev_process_blk_commands(struct vhost_block_dev *bdev,
+ struct vhost_blk_task *task);
+
+void vhost_session_install_rte_compat_hooks(uint32_t vid);
+
+void vhost_dev_install_rte_compat_hooks(const char *path);
+
+struct vhost_blk_ctrlr *vhost_blk_ctrlr_find(const char *ctrlr_name);
+
+#endif /* _VHOST_BLK_H_ */
diff --git a/examples/vhost_blk/vhost_blk_compat.c b/examples/vhost_blk/vhost_blk_compat.c
new file mode 100644
index 000000000..865f1affb
--- /dev/null
+++ b/examples/vhost_blk/vhost_blk_compat.c
@@ -0,0 +1,195 @@
+// SPDX-License-Identifier: BSD-3-Clause
+// Copyright(c) 2010-2017 Intel Corporation
+
+#ifndef _VHOST_BLK_COMPAT_H_
+#define _VHOST_BLK_COMPAT_H_
+
+#include <sys/uio.h>
+#include <stdint.h>
+#include <linux/virtio_blk.h>
+#include <linux/virtio_ring.h>
+
+#include <rte_vhost.h>
+#include "vhost_blk.h"
+#include "blk_spec.h"
+
+#define VHOST_MAX_VQUEUES 256
+#define SPDK_VHOST_MAX_VQ_SIZE 1024
+
+#define VHOST_USER_GET_CONFIG 24
+#define VHOST_USER_SET_CONFIG 25
+
+static int
+vhost_blk_get_config(struct vhost_block_dev *bdev, uint8_t *config,
+ uint32_t len)
+{
+ struct virtio_blk_config blkcfg;
+ uint32_t blk_size;
+ uint64_t blkcnt;
+
+ if (bdev == NULL) {
+ /* We can't just return -1 here as this GET_CONFIG message might
+ * be caused by a QEMU VM reboot. Returning -1 will indicate an
+ * error to QEMU, which might then decide to terminate itself.
+ * We don't want that. A simple reboot shouldn't break the
+ * system.
+ *
+ * Presenting a block device with block size 0 and block count 0
+ * doesn't cause any problems on QEMU side and the virtio-pci
+ * device is even still available inside the VM, but there will
+ * be no block device created for it - the kernel drivers will
+ * silently reject it.
+ */
+ blk_size = 0;
+ blkcnt = 0;
+ } else {
+ blk_size = bdev->blocklen;
+ blkcnt = bdev->blockcnt;
+ }
+
+ memset(&blkcfg, 0, sizeof(blkcfg));
+ blkcfg.blk_size = blk_size;
+ /* minimum I/O size in blocks */
+ blkcfg.min_io_size = 1;
+ /* expressed in 512 Bytes sectors */
+ blkcfg.capacity = (blkcnt * blk_size) / 512;
+ /* QEMU can overwrite this value when started */
+ blkcfg.num_queues = VHOST_MAX_VQUEUES;
+
+ fprintf(stdout, "block device: blk_size = %d, blkcnt = %ld\n", blk_size,
+ blkcnt);
+
+ memcpy(config, &blkcfg, min(len, sizeof(blkcfg)));
+
+ return 0;
+}
+
+static enum rte_vhost_msg_result
+extern_vhost_pre_msg_handler(int vid, void *_msg)
+{
+ char path[PATH_MAX];
+ struct vhost_blk_ctrlr *ctrlr;
+ struct vhost_user_msg *msg = _msg;
+ int ret;
+
+ ret = rte_vhost_get_ifname(vid, path, PATH_MAX);
+ if (ret) {
+ fprintf(stderr, "Cannot get socket name\n");
+ return -1;
+ }
+
+ ctrlr = vhost_blk_ctrlr_find(path);
+ if (!ctrlr) {
+ fprintf(stderr, "Controller is not ready\n");
+ return -1;
+ }
+
+ switch ((int)msg->request) {
+ case VHOST_USER_GET_VRING_BASE:
+ if (!g_should_stop && ctrlr->started)
+ vhost_blk_device_ops.destroy_device(vid);
+ break;
+ case VHOST_USER_SET_VRING_BASE:
+ case VHOST_USER_SET_VRING_ADDR:
+ case VHOST_USER_SET_VRING_NUM:
+ case VHOST_USER_SET_VRING_KICK:
+ if (!g_should_stop && ctrlr->started)
+ vhost_blk_device_ops.destroy_device(vid);
+ break;
+ case VHOST_USER_SET_VRING_CALL:
+ case VHOST_USER_SET_MEM_TABLE:
+ if (!g_should_stop && ctrlr->started) {
+ vhost_blk_device_ops.destroy_device(vid);
+ ctrlr->need_restart = 1;
+ }
+ break;
+ case VHOST_USER_GET_CONFIG: {
+ int rc = 0;
+
+ rc = vhost_blk_get_config(ctrlr->bdev,
+ msg->payload.cfg.region,
+ msg->payload.cfg.size);
+ if (rc != 0)
+ msg->size = 0;
+
+ return RTE_VHOST_MSG_RESULT_REPLY;
+ }
+ case VHOST_USER_SET_CONFIG:
+ default:
+ break;
+ }
+
+ return RTE_VHOST_MSG_RESULT_NOT_HANDLED;
+}
+
+static enum rte_vhost_msg_result
+extern_vhost_post_msg_handler(int vid, void *_msg)
+{
+ char path[PATH_MAX];
+ struct vhost_blk_ctrlr *ctrlr;
+ struct vhost_user_msg *msg = _msg;
+ int ret;
+
+ ret = rte_vhost_get_ifname(vid, path, PATH_MAX);
+ if (ret) {
+ fprintf(stderr, "Cannot get socket name\n");
+ return -1;
+ }
+
+ ctrlr = vhost_blk_ctrlr_find(path);
+ if (!ctrlr) {
+ fprintf(stderr, "Controller is not ready\n");
+ return -1;
+ }
+
+ if (ctrlr->need_restart) {
+ vhost_blk_device_ops.new_device(vid);
+ ctrlr->need_restart = 0;
+ }
+
+ switch (msg->request) {
+ case VHOST_USER_SET_FEATURES:
+ break;
+ case VHOST_USER_SET_VRING_KICK:
+ /* vhost-user spec tells us to start polling a queue after
+ * receiving its SET_VRING_KICK message. Let's do it!
+ */
+ if (g_should_stop && !ctrlr->started)
+ vhost_blk_device_ops.new_device(vid);
+ break;
+ default:
+ break;
+ }
+
+ return RTE_VHOST_MSG_RESULT_NOT_HANDLED;
+}
+
+struct rte_vhost_user_extern_ops g_extern_vhost_ops = {
+ .pre_msg_handle = extern_vhost_pre_msg_handler,
+ .post_msg_handle = extern_vhost_post_msg_handler,
+};
+
+void
+vhost_session_install_rte_compat_hooks(uint32_t vid)
+{
+ int rc;
+
+ rc = rte_vhost_extern_callback_register(vid, &g_extern_vhost_ops, NULL);
+ if (rc != 0)
+ fprintf(stderr,
+ "rte_vhost_extern_callback_register() failed for vid = %d\n",
+ vid);
+}
+
+void
+vhost_dev_install_rte_compat_hooks(const char *path)
+{
+ uint64_t protocol_features = 0;
+
+ rte_vhost_driver_get_protocol_features(path, &protocol_features);
+ protocol_features |= (1ULL << VHOST_USER_PROTOCOL_F_CONFIG);
+ protocol_features |= (1ULL << VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD);
+ rte_vhost_driver_set_protocol_features(path, protocol_features);
+}
+
+#endif
--
2.17.2
* Re: [dpdk-dev] [PATCH v11 9/9] vhost: add vhost-user-blk example which support inflight
2019-10-09 20:48 ` [dpdk-dev] [PATCH v11 9/9] vhost: add vhost-user-blk example which support inflight Jin Yu
@ 2019-10-11 10:24 ` Maxime Coquelin
2019-10-28 19:37 ` [dpdk-dev] [PATCH] " Jin Yu
1 sibling, 0 replies; 27+ messages in thread
From: Maxime Coquelin @ 2019-10-11 10:24 UTC (permalink / raw)
To: Jin Yu, dev; +Cc: changpeng.liu, tiwei.bie, zhihong.wang
On 10/9/19 10:48 PM, Jin Yu wrote:
> A vhost-user-blk example that supports the inflight feature. It uses
> the new APIs introduced in the first patch, so it can show how these
> APIs work to support the inflight feature.
>
> Signed-off-by: Jin Yu <jin.yu@intel.com>
> ---
> V1 - add the case.
> V2 - add the rte_vhost prefix.
> V3 - add packed ring support
> ---
> examples/vhost_blk/Makefile | 68 ++
> examples/vhost_blk/blk.c | 125 +++
> examples/vhost_blk/blk_spec.h | 95 +++
> examples/vhost_blk/meson.build | 21 +
> examples/vhost_blk/vhost_blk.c | 1092 +++++++++++++++++++++++++
> examples/vhost_blk/vhost_blk.h | 128 +++
> examples/vhost_blk/vhost_blk_compat.c | 195 +++++
> 7 files changed, 1724 insertions(+)
> create mode 100644 examples/vhost_blk/Makefile
> create mode 100644 examples/vhost_blk/blk.c
> create mode 100644 examples/vhost_blk/blk_spec.h
> create mode 100644 examples/vhost_blk/meson.build
> create mode 100644 examples/vhost_blk/vhost_blk.c
> create mode 100644 examples/vhost_blk/vhost_blk.h
> create mode 100644 examples/vhost_blk/vhost_blk_compat.c
>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
* [dpdk-dev] [PATCH] vhost: add vhost-user-blk example which support inflight
2019-10-09 20:48 ` [dpdk-dev] [PATCH v11 9/9] vhost: add vhost-user-blk example which support inflight Jin Yu
2019-10-11 10:24 ` Maxime Coquelin
@ 2019-10-28 19:37 ` Jin Yu
2019-11-01 10:42 ` [dpdk-dev] [PATCH v5] " Jin Yu
1 sibling, 1 reply; 27+ messages in thread
From: Jin Yu @ 2019-10-28 19:37 UTC (permalink / raw)
To: Thomas Monjalon, John McNamara, Marko Kovacevic, Maxime Coquelin,
Tiwei Bie, Zhihong Wang
Cc: dev, Jin Yu
A vhost-user-blk example that supports the inflight feature. It uses
the new APIs introduced in the first patch, so it can show how these
APIs work to support the inflight feature.
Signed-off-by: Jin Yu <jin.yu@intel.com>
---
V1 - add the case.
V2 - add the rte_vhost prefix.
V3 - add packed ring support
V4 - fix build, MAINTAINERS and add guides
---
MAINTAINERS | 2 +
doc/guides/sample_app_ug/index.rst | 1 +
doc/guides/sample_app_ug/vhost_blk.rst | 63 ++
examples/meson.build | 2 +-
examples/vhost_blk/Makefile | 68 ++
examples/vhost_blk/blk.c | 125 +++
examples/vhost_blk/blk_spec.h | 95 ++
examples/vhost_blk/meson.build | 21 +
examples/vhost_blk/vhost_blk.c | 1094 ++++++++++++++++++++++++
examples/vhost_blk/vhost_blk.h | 127 +++
examples/vhost_blk/vhost_blk_compat.c | 173 ++++
11 files changed, 1770 insertions(+), 1 deletion(-)
create mode 100644 doc/guides/sample_app_ug/vhost_blk.rst
create mode 100644 examples/vhost_blk/Makefile
create mode 100644 examples/vhost_blk/blk.c
create mode 100644 examples/vhost_blk/blk_spec.h
create mode 100644 examples/vhost_blk/meson.build
create mode 100644 examples/vhost_blk/vhost_blk.c
create mode 100644 examples/vhost_blk/vhost_blk.h
create mode 100644 examples/vhost_blk/vhost_blk_compat.c
diff --git a/MAINTAINERS b/MAINTAINERS
index 717c31801..c22a8312e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -839,6 +839,8 @@ F: lib/librte_vhost/
F: doc/guides/prog_guide/vhost_lib.rst
F: examples/vhost/
F: doc/guides/sample_app_ug/vhost.rst
+F: examples/vhost_blk/
+F: doc/guides/sample_app_ug/vhost_blk.rst
F: examples/vhost_crypto/
F: examples/vdpa/
F: doc/guides/sample_app_ug/vdpa.rst
diff --git a/doc/guides/sample_app_ug/index.rst b/doc/guides/sample_app_ug/index.rst
index a3737c118..613f483f3 100644
--- a/doc/guides/sample_app_ug/index.rst
+++ b/doc/guides/sample_app_ug/index.rst
@@ -40,6 +40,7 @@ Sample Applications User Guides
packet_ordering
vmdq_dcb_forwarding
vhost
+ vhost_blk
vhost_crypto
vdpa
ip_pipeline
diff --git a/doc/guides/sample_app_ug/vhost_blk.rst b/doc/guides/sample_app_ug/vhost_blk.rst
new file mode 100644
index 000000000..39096e2e4
--- /dev/null
+++ b/doc/guides/sample_app_ug/vhost_blk.rst
@@ -0,0 +1,63 @@
+.. SPDX-License-Identifier: BSD-3-Clause
+ Copyright(c) 2010-2017 Intel Corporation.
+
+Vhost_blk Sample Application
+=============================
+
+The vhost_blk sample application implements a simple block device that
+is used as the backend of a QEMU vhost-user-blk device. Users can
+extend the existing example to use other types of block devices
+(e.g. AIO) besides the memory-based one. As with the vhost-user-net
+device, the sample application uses a domain socket to communicate with
+QEMU, and processes the virtio ring (split or packed format) itself.
+
+The sample application reuses a lot of code from the SPDK (Storage
+Performance Development Kit, https://github.com/spdk/spdk)
+vhost-user-blk target. Users applying the DPDK vhost library to
+storage can take SPDK as a reference as well.
+
+Testing steps
+-------------
+
+This section shows the steps to start a VM that uses the block device
+as a fast data path for critical applications.
+
+Compiling the Application
+-------------------------
+
+To compile the sample application see :doc:`compiling`.
+
+The application is located in the ``examples`` sub-directory.
+
+You will also need to build DPDK both on the host and inside the guest.
+
+Start the vhost_blk example
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: console
+
+ ./vhost_blk -m 1024
+
+.. _vhost_blk_app_run_vm:
+
+Start the VM
+~~~~~~~~~~~~
+
+.. code-block:: console
+
+ qemu-system-x86_64 -machine accel=kvm \
+ -m $mem -object memory-backend-file,id=mem,size=$mem,\
+ mem-path=/dev/hugepages,share=on -numa node,memdev=mem \
+ -drive file=os.img,if=none,id=disk \
+ -device ide-hd,drive=disk,bootindex=0 \
+ -chardev socket,id=char0,reconnect=1,path=/tmp/vhost.socket \
+ -device vhost-user-blk-pci,ring_packed=1,chardev=char0,num-queues=1 \
+ ...
+
+.. note::
+ You must check whether your QEMU supports "vhost-user-blk" or not;
+ QEMU v4.0 or newer is required.
+ reconnect=1 enables live recovery, so QEMU can reconnect to the
+ vhost_blk example after we restart it.
+ ring_packed=1 means the device supports the packed ring, but it
+ requires a guest kernel version >= 5.0.
diff --git a/examples/meson.build b/examples/meson.build
index 98ae50a49..10a6bd7ef 100644
--- a/examples/meson.build
+++ b/examples/meson.build
@@ -42,7 +42,7 @@ all_examples = [
'skeleton', 'tep_termination',
'timer', 'vdpa',
'vhost', 'vhost_crypto',
- 'vm_power_manager',
+ 'vhost_blk', 'vm_power_manager',
'vm_power_manager/guest_cli',
'vmdq', 'vmdq_dcb',
]
diff --git a/examples/vhost_blk/Makefile b/examples/vhost_blk/Makefile
new file mode 100644
index 000000000..a10a90071
--- /dev/null
+++ b/examples/vhost_blk/Makefile
@@ -0,0 +1,68 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2010-2014 Intel Corporation
+
+# binary name
+APP = vhost-blk
+
+# all source are stored in SRCS-y
+SRCS-y := blk.c vhost_blk.c vhost_blk_compat.c
+
+# Build using pkg-config variables if possible
+$(shell pkg-config --exists libdpdk)
+ifeq ($(.SHELLSTATUS),0)
+
+all: shared
+.PHONY: shared static
+shared: build/$(APP)-shared
+ ln -sf $(APP)-shared build/$(APP)
+static: build/$(APP)-static
+ ln -sf $(APP)-static build/$(APP)
+
+LDFLAGS += -pthread
+
+PC_FILE := $(shell pkg-config --path libdpdk)
+CFLAGS += -O3 $(shell pkg-config --cflags libdpdk)
+LDFLAGS_SHARED = $(shell pkg-config --libs libdpdk)
+LDFLAGS_STATIC = -Wl,-Bstatic $(shell pkg-config --static --libs libdpdk)
+
+CFLAGS += -DALLOW_EXPERIMENTAL_API
+
+build/$(APP)-shared: $(SRCS-y) Makefile $(PC_FILE) | build
+ $(CC) $(CFLAGS) $(SRCS-y) -o $@ $(LDFLAGS) $(LDFLAGS_SHARED)
+
+build/$(APP)-static: $(SRCS-y) Makefile $(PC_FILE) | build
+ $(CC) $(CFLAGS) $(SRCS-y) -o $@ $(LDFLAGS) $(LDFLAGS_STATIC)
+
+build:
+ @mkdir -p $@
+
+.PHONY: clean
+clean:
+ rm -f build/$(APP) build/$(APP)-static build/$(APP)-shared
+ test -d build && rmdir -p build || true
+
+else # Build using legacy build system
+
+ifeq ($(RTE_SDK),)
+$(error "Please define RTE_SDK environment variable")
+endif
+
+# Default target, detect a build directory, by looking for a path with a .config
+RTE_TARGET ?= $(notdir $(abspath $(dir $(firstword $(wildcard $(RTE_SDK)/*/.config)))))
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+ifneq ($(CONFIG_RTE_EXEC_ENV_LINUX),y)
+$(info This application can only operate in a linux environment, \
+please change the definition of the RTE_TARGET environment variable)
+all:
+else
+
+CFLAGS += -DALLOW_EXPERIMENTAL_API
+CFLAGS += -O2 -D_FILE_OFFSET_BITS=64
+CFLAGS += $(WERROR_FLAGS)
+
+include $(RTE_SDK)/mk/rte.extapp.mk
+
+endif
+endif
diff --git a/examples/vhost_blk/blk.c b/examples/vhost_blk/blk.c
new file mode 100644
index 000000000..424ed3015
--- /dev/null
+++ b/examples/vhost_blk/blk.c
@@ -0,0 +1,125 @@
+// SPDX-License-Identifier: BSD-3-Clause
+// Copyright(c) 2010-2019 Intel Corporation
+
+/**
+ * This work is largely based on the "vhost-user-blk" implementation by
+ * SPDK (https://github.com/spdk/spdk).
+ */
+
+#include <stdio.h>
+#include <stdint.h>
+#include <unistd.h>
+#include <assert.h>
+#include <ctype.h>
+#include <string.h>
+#include <stddef.h>
+
+#include <rte_atomic.h>
+#include <rte_cycles.h>
+#include <rte_log.h>
+#include <rte_malloc.h>
+#include <rte_byteorder.h>
+#include <rte_string_fns.h>
+
+#include "vhost_blk.h"
+#include "blk_spec.h"
+
+static void
+vhost_strcpy_pad(void *dst, const char *src, size_t size, int pad)
+{
+ size_t len;
+
+ len = strlen(src);
+ if (len < size) {
+ memcpy(dst, src, len);
+ memset((char *)dst + len, pad, size - len);
+ } else {
+ memcpy(dst, src, size);
+ }
+}
+
+static int
+vhost_bdev_blk_readwrite(struct vhost_block_dev *bdev,
+ struct vhost_blk_task *task,
+ uint64_t lba_512, __rte_unused uint32_t xfer_len)
+{
+ uint32_t i;
+ uint64_t offset;
+ uint32_t nbytes = 0;
+
+ offset = lba_512 * 512;
+
+ for (i = 0; i < task->iovs_cnt; i++) {
+ if (task->dxfer_dir == BLK_DIR_TO_DEV)
+ memcpy(bdev->data + offset, task->iovs[i].iov_base,
+ task->iovs[i].iov_len);
+ else
+ memcpy(task->iovs[i].iov_base, bdev->data + offset,
+ task->iovs[i].iov_len);
+ offset += task->iovs[i].iov_len;
+ nbytes += task->iovs[i].iov_len;
+ }
+
+ return nbytes;
+}
+
+int
+vhost_bdev_process_blk_commands(struct vhost_block_dev *bdev,
+ struct vhost_blk_task *task)
+{
+ int used_len;
+
+ if (unlikely(task->data_len > (bdev->blockcnt * bdev->blocklen))) {
+ fprintf(stderr, "read or write beyond capacity\n");
+ return VIRTIO_BLK_S_UNSUPP;
+ }
+
+ switch (task->req->type) {
+ case VIRTIO_BLK_T_IN:
+ if (unlikely(task->data_len == 0 ||
+ (task->data_len & (512 - 1)) != 0)) {
+ fprintf(stderr,
+ "%s - passed IO buffer is not multiple of 512b "
+ "(req_idx = %"PRIu16").\n",
+ task->req->type ? "WRITE" : "READ",
+ task->head_idx);
+ return VIRTIO_BLK_S_UNSUPP;
+ }
+
+ task->dxfer_dir = BLK_DIR_FROM_DEV;
+ vhost_bdev_blk_readwrite(bdev, task,
+ task->req->sector, task->data_len);
+ break;
+ case VIRTIO_BLK_T_OUT:
+ if (unlikely(task->data_len == 0 ||
+ (task->data_len & (512 - 1)) != 0)) {
+ fprintf(stderr,
+ "%s - passed IO buffer is not multiple of 512b "
+ "(req_idx = %"PRIu16").\n",
+ task->req->type ? "WRITE" : "READ",
+ task->head_idx);
+ return VIRTIO_BLK_S_UNSUPP;
+ }
+
+ if (task->readtype) {
+ fprintf(stderr, "type isn't right\n");
+ return VIRTIO_BLK_S_IOERR;
+ }
+ task->dxfer_dir = BLK_DIR_TO_DEV;
+ vhost_bdev_blk_readwrite(bdev, task,
+ task->req->sector, task->data_len);
+ break;
+ case VIRTIO_BLK_T_GET_ID:
+ if (!task->iovs_cnt || task->data_len)
+ return VIRTIO_BLK_S_UNSUPP;
+ used_len = min(VIRTIO_BLK_ID_BYTES, task->data_len);
+ vhost_strcpy_pad(task->iovs[0].iov_base,
+ bdev->product_name, used_len, ' ');
+ break;
+ default:
+ fprintf(stderr, "unsupported cmd\n");
+ return VIRTIO_BLK_S_UNSUPP;
+ }
+
+ return VIRTIO_BLK_S_OK;
+}
diff --git a/examples/vhost_blk/blk_spec.h b/examples/vhost_blk/blk_spec.h
new file mode 100644
index 000000000..5875e2f86
--- /dev/null
+++ b/examples/vhost_blk/blk_spec.h
@@ -0,0 +1,95 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Intel Corporation
+ */
+
+#ifndef _BLK_SPEC_H
+#define _BLK_SPEC_H
+
+#include <stdint.h>
+
+#ifndef VHOST_USER_MEMORY_MAX_NREGIONS
+#define VHOST_USER_MEMORY_MAX_NREGIONS 8
+#endif
+
+#ifndef VHOST_USER_MAX_CONFIG_SIZE
+#define VHOST_USER_MAX_CONFIG_SIZE 256
+#endif
+
+#ifndef VHOST_USER_PROTOCOL_F_CONFIG
+#define VHOST_USER_PROTOCOL_F_CONFIG 9
+#endif
+
+#ifndef VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD
+#define VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD 12
+#endif
+
+#define VIRTIO_BLK_ID_BYTES 20 /* ID string length */
+
+#define VIRTIO_BLK_T_IN 0
+#define VIRTIO_BLK_T_OUT 1
+#define VIRTIO_BLK_T_FLUSH 4
+#define VIRTIO_BLK_T_GET_ID 8
+#define VIRTIO_BLK_T_DISCARD 11
+#define VIRTIO_BLK_T_WRITE_ZEROES 13
+
+#define VIRTIO_BLK_S_OK 0
+#define VIRTIO_BLK_S_IOERR 1
+#define VIRTIO_BLK_S_UNSUPP 2
+
+enum vhost_user_request {
+ VHOST_USER_NONE = 0,
+ VHOST_USER_GET_FEATURES = 1,
+ VHOST_USER_SET_FEATURES = 2,
+ VHOST_USER_SET_OWNER = 3,
+ VHOST_USER_RESET_OWNER = 4,
+ VHOST_USER_SET_MEM_TABLE = 5,
+ VHOST_USER_SET_LOG_BASE = 6,
+ VHOST_USER_SET_LOG_FD = 7,
+ VHOST_USER_SET_VRING_NUM = 8,
+ VHOST_USER_SET_VRING_ADDR = 9,
+ VHOST_USER_SET_VRING_BASE = 10,
+ VHOST_USER_GET_VRING_BASE = 11,
+ VHOST_USER_SET_VRING_KICK = 12,
+ VHOST_USER_SET_VRING_CALL = 13,
+ VHOST_USER_SET_VRING_ERR = 14,
+ VHOST_USER_GET_PROTOCOL_FEATURES = 15,
+ VHOST_USER_SET_PROTOCOL_FEATURES = 16,
+ VHOST_USER_GET_QUEUE_NUM = 17,
+ VHOST_USER_SET_VRING_ENABLE = 18,
+ VHOST_USER_MAX
+};
+
+/** Get/set config msg payload */
+struct vhost_user_config {
+ uint32_t offset;
+ uint32_t size;
+ uint32_t flags;
+ uint8_t region[VHOST_USER_MAX_CONFIG_SIZE];
+};
+
+/** Fixed-size vhost_memory struct */
+struct vhost_memory_padded {
+ uint32_t nregions;
+ uint32_t padding;
+ struct vhost_memory_region regions[VHOST_USER_MEMORY_MAX_NREGIONS];
+};
+
+struct vhost_user_msg {
+ enum vhost_user_request request;
+
+#define VHOST_USER_VERSION_MASK 0x3
+#define VHOST_USER_REPLY_MASK (0x1 << 2)
+ uint32_t flags;
+ uint32_t size; /**< the following payload size */
+ union {
+#define VHOST_USER_VRING_IDX_MASK 0xff
+#define VHOST_USER_VRING_NOFD_MASK (0x1 << 8)
+ uint64_t u64;
+ struct vhost_vring_state state;
+ struct vhost_vring_addr addr;
+ struct vhost_memory_padded memory;
+ struct vhost_user_config cfg;
+ } payload;
+} __attribute__((packed));
+
+#endif
diff --git a/examples/vhost_blk/meson.build b/examples/vhost_blk/meson.build
new file mode 100644
index 000000000..857367192
--- /dev/null
+++ b/examples/vhost_blk/meson.build
@@ -0,0 +1,21 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2017 Intel Corporation
+
+# meson file, for building this example as part of a main DPDK build.
+#
+# To build this example as a standalone application with an already-installed
+# DPDK instance, use 'make'
+
+if not is_linux
+ build = false
+endif
+
+if not cc.has_header('linux/virtio_blk.h')
+ build = false
+endif
+
+deps += 'vhost'
+allow_experimental_apis = true
+sources = files(
+ 'blk.c', 'vhost_blk.c', 'vhost_blk_compat.c'
+)
diff --git a/examples/vhost_blk/vhost_blk.c b/examples/vhost_blk/vhost_blk.c
new file mode 100644
index 000000000..8411577ed
--- /dev/null
+++ b/examples/vhost_blk/vhost_blk.c
@@ -0,0 +1,1094 @@
+// SPDX-License-Identifier: BSD-3-Clause
+// Copyright(c) 2010-2017 Intel Corporation
+
+#include <stdint.h>
+#include <unistd.h>
+#include <stdbool.h>
+#include <signal.h>
+#include <assert.h>
+#include <semaphore.h>
+#include <linux/virtio_blk.h>
+#include <linux/virtio_ring.h>
+
+#include <rte_atomic.h>
+#include <rte_cycles.h>
+#include <rte_log.h>
+#include <rte_malloc.h>
+#include <rte_vhost.h>
+
+#include "vhost_blk.h"
+#include "blk_spec.h"
+
+#define VIRTQ_DESC_F_NEXT 1
+#define VIRTQ_DESC_F_AVAIL (1 << 7)
+#define VIRTQ_DESC_F_USED (1 << 15)
+
+#define MAX_TASK 12
+
+#define VHOST_BLK_FEATURES ((1ULL << VIRTIO_F_RING_PACKED) | \
+ (1ULL << VIRTIO_F_VERSION_1) |\
+ (1ULL << VIRTIO_F_NOTIFY_ON_EMPTY) | \
+ (1ULL << VHOST_USER_F_PROTOCOL_FEATURES))
+
+/* Path to folder where character device will be created. Can be set by user. */
+static char dev_pathname[PATH_MAX] = "";
+static sem_t exit_sem;
+static int g_should_stop = -1;
+
+struct vhost_blk_ctrlr *
+vhost_blk_ctrlr_find(const char *ctrlr_name)
+{
+ if (ctrlr_name == NULL)
+ return NULL;
+
+ /* currently we only support 1 socket file fd */
+ return g_vhost_ctrlr;
+}
+
+static uint64_t gpa_to_vva(int vid, uint64_t gpa, uint64_t *len)
+{
+ char path[PATH_MAX];
+ struct vhost_blk_ctrlr *ctrlr;
+ int ret = 0;
+
+ ret = rte_vhost_get_ifname(vid, path, PATH_MAX);
+ if (ret) {
+ fprintf(stderr, "Cannot get socket name\n");
+ assert(ret == 0);
+ }
+
+ ctrlr = vhost_blk_ctrlr_find(path);
+ if (!ctrlr) {
+ fprintf(stderr, "Controller is not ready\n");
+ assert(ctrlr != NULL);
+ }
+
+ assert(ctrlr->mem != NULL);
+
+ return rte_vhost_va_from_guest_pa(ctrlr->mem, gpa, len);
+}
+
+static struct vring_packed_desc *
+descriptor_get_next_packed(struct rte_vhost_vring *vq,
+ uint16_t *idx)
+{
+ if (vq->desc_packed[*idx % vq->size].flags & VIRTQ_DESC_F_NEXT) {
+ *idx += 1;
+ return &vq->desc_packed[*idx % vq->size];
+ }
+
+ return NULL;
+}
+
+static bool
+descriptor_has_next_packed(struct vring_packed_desc *cur_desc)
+{
+ return !!(cur_desc->flags & VRING_DESC_F_NEXT);
+}
+
+static bool
+descriptor_is_wr_packed(struct vring_packed_desc *cur_desc)
+{
+ return !!(cur_desc->flags & VRING_DESC_F_WRITE);
+}
+
+static struct rte_vhost_inflight_desc_packed *
+inflight_desc_get_next(struct rte_vhost_inflight_info_packed *inflight_packed,
+ struct rte_vhost_inflight_desc_packed *cur_desc)
+{
+ if (!!(cur_desc->flags & VIRTQ_DESC_F_NEXT))
+ return &inflight_packed->desc[cur_desc->next];
+
+ return NULL;
+}
+
+static bool
+inflight_desc_has_next(struct rte_vhost_inflight_desc_packed *cur_desc)
+{
+ return !!(cur_desc->flags & VRING_DESC_F_NEXT);
+}
+
+static bool
+inflight_desc_is_wr(struct rte_vhost_inflight_desc_packed *cur_desc)
+{
+ return !!(cur_desc->flags & VRING_DESC_F_WRITE);
+}
+
+static void
+inflight_process_payload_chain_packed(struct inflight_blk_task *task)
+{
+ void *data;
+ uint64_t chunck_len;
+ struct vhost_blk_task *blk_task;
+ struct rte_vhost_inflight_desc_packed *desc;
+
+ blk_task = &task->blk_task;
+ blk_task->iovs_cnt = 0;
+
+ do {
+ desc = task->inflight_desc;
+ chunck_len = desc->len;
+ data = (void *)(uintptr_t)gpa_to_vva(blk_task->bdev->vid,
+ desc->addr,
+ &chunck_len);
+ if (!data || chunck_len != desc->len) {
+ fprintf(stderr, "failed to translate desc address.\n");
+ return;
+ }
+
+ blk_task->iovs[blk_task->iovs_cnt].iov_base = data;
+ blk_task->iovs[blk_task->iovs_cnt].iov_len = desc->len;
+ blk_task->data_len += desc->len;
+ blk_task->iovs_cnt++;
+ task->inflight_desc = inflight_desc_get_next(
+ task->inflight_packed, desc);
+ } while (inflight_desc_has_next(task->inflight_desc));
+
+ chunck_len = task->inflight_desc->len;
+ blk_task->status = (void *)(uintptr_t)gpa_to_vva(
+ blk_task->bdev->vid, task->inflight_desc->addr, &chunck_len);
+ if (!blk_task->status || chunck_len != task->inflight_desc->len)
+ fprintf(stderr, "failed to translate desc address.\n");
+}
+
+static void
+inflight_submit_completion_packed(struct inflight_blk_task *task,
+ uint32_t q_idx, uint16_t *used_id,
+ bool *used_wrap_counter)
+{
+ struct vhost_blk_ctrlr *ctrlr;
+ struct rte_vhost_vring *vq;
+ struct vring_packed_desc *desc;
+ int ret;
+
+ ctrlr = vhost_blk_ctrlr_find(dev_pathname);
+ vq = task->blk_task.vq;
+
+ ret = rte_vhost_set_last_inflight_io_packed(ctrlr->bdev->vid, q_idx,
+ task->blk_task.head_idx);
+ if (ret != 0)
+ fprintf(stderr, "failed to set last inflight io\n");
+
+ desc = &vq->desc_packed[*used_id];
+ desc->id = task->blk_task.buffer_id;
+ rte_smp_mb();
+ if (*used_wrap_counter)
+ desc->flags |= VIRTQ_DESC_F_AVAIL | VIRTQ_DESC_F_USED;
+ else
+ desc->flags &= ~(VIRTQ_DESC_F_AVAIL | VIRTQ_DESC_F_USED);
+ rte_smp_mb();
+
+ *used_id += task->blk_task.iovs_cnt + 2;
+ if (*used_id >= vq->size) {
+ *used_id -= vq->size;
+ *used_wrap_counter = !(*used_wrap_counter);
+ }
+
+ ret = rte_vhost_clr_inflight_desc_packed(ctrlr->bdev->vid, q_idx,
+ task->blk_task.head_idx);
+ if (ret != 0)
+ fprintf(stderr, "failed to clear inflight io\n");
+
+ /* Send an interrupt back to the guest VM so that it knows
+ * a completion is ready to be processed.
+ */
+ rte_vhost_vring_call(task->blk_task.bdev->vid, q_idx);
+}
+
+static void
+submit_completion_packed(struct vhost_blk_task *task, uint32_t q_idx,
+ uint16_t *used_id, bool *used_wrap_counter)
+{
+ struct vhost_blk_ctrlr *ctrlr;
+ struct rte_vhost_vring *vq;
+ struct vring_packed_desc *desc;
+ int ret;
+
+ ctrlr = vhost_blk_ctrlr_find(dev_pathname);
+ vq = task->vq;
+
+ ret = rte_vhost_set_last_inflight_io_packed(ctrlr->bdev->vid, q_idx,
+ task->inflight_idx);
+ if (ret != 0)
+ fprintf(stderr, "failed to set last inflight io\n");
+
+ desc = &vq->desc_packed[*used_id];
+ desc->id = task->buffer_id;
+ rte_smp_mb();
+ if (*used_wrap_counter)
+ desc->flags |= VIRTQ_DESC_F_AVAIL | VIRTQ_DESC_F_USED;
+ else
+ desc->flags &= ~(VIRTQ_DESC_F_AVAIL | VIRTQ_DESC_F_USED);
+ rte_smp_mb();
+
+ *used_id += task->iovs_cnt + 2;
+ if (*used_id >= vq->size) {
+ *used_id -= vq->size;
+ *used_wrap_counter = !(*used_wrap_counter);
+ }
+
+ ret = rte_vhost_clr_inflight_desc_packed(ctrlr->bdev->vid, q_idx,
+ task->inflight_idx);
+ if (ret != 0)
+ fprintf(stderr, "failed to clear inflight io\n");
+
+ /* Send an interrupt back to the guest VM so that it knows
+ * a completion is ready to be processed.
+ */
+ rte_vhost_vring_call(task->bdev->vid, q_idx);
+}
+
+static void
+vhost_process_payload_chain_packed(struct vhost_blk_task *task,
+ uint16_t *idx)
+{
+ void *data;
+ uint64_t chunck_len;
+
+ task->iovs_cnt = 0;
+
+ do {
+ chunck_len = task->desc_packed->len;
+ data = (void *)(uintptr_t)gpa_to_vva(task->bdev->vid,
+ task->desc_packed->addr,
+ &chunck_len);
+ if (!data || chunck_len != task->desc_packed->len) {
+ fprintf(stderr, "failed to translate desc address.\n");
+ return;
+ }
+
+ task->iovs[task->iovs_cnt].iov_base = data;
+ task->iovs[task->iovs_cnt].iov_len = task->desc_packed->len;
+ task->data_len += task->desc_packed->len;
+ task->iovs_cnt++;
+ task->desc_packed = descriptor_get_next_packed(task->vq, idx);
+ } while (descriptor_has_next_packed(task->desc_packed));
+
+ task->last_idx = *idx % task->vq->size;
+ chunck_len = task->desc_packed->len;
+ task->status = (void *)(uintptr_t)gpa_to_vva(task->bdev->vid,
+ task->desc_packed->addr,
+ &chunck_len);
+ if (!task->status || chunck_len != task->desc_packed->len)
+ fprintf(stderr, "failed to translate desc address.\n");
+}
+
+
+static int
+descriptor_is_available(struct rte_vhost_vring *vring, uint16_t idx,
+ bool avail_wrap_counter)
+{
+ uint16_t flags = vring->desc_packed[idx].flags;
+
+ return ((!!(flags & VIRTQ_DESC_F_AVAIL) == avail_wrap_counter) &&
+ (!!(flags & VIRTQ_DESC_F_USED) != avail_wrap_counter));
+}
+
+static void
+process_requestq_packed(struct vhost_blk_ctrlr *ctrlr, uint32_t q_idx)
+{
+ bool avail_wrap_counter, used_wrap_counter;
+ uint16_t avail_idx, used_idx;
+ int ret;
+ uint64_t chunck_len;
+ struct vhost_blk_queue *blk_vq;
+ struct rte_vhost_vring *vq;
+ struct vhost_blk_task *task;
+
+ blk_vq = &ctrlr->bdev->queues[q_idx];
+ vq = &blk_vq->vq;
+
+ avail_idx = blk_vq->last_avail_idx;
+ avail_wrap_counter = blk_vq->avail_wrap_counter;
+ used_idx = blk_vq->last_used_idx;
+ used_wrap_counter = blk_vq->used_wrap_counter;
+
+ task = rte_zmalloc(NULL, sizeof(*task), 0);
+ assert(task != NULL);
+ task->vq = vq;
+ task->bdev = ctrlr->bdev;
+
+ while (descriptor_is_available(vq, avail_idx, avail_wrap_counter)) {
+ task->head_idx = avail_idx;
+ task->desc_packed = &task->vq->desc_packed[task->head_idx];
+ task->iovs_cnt = 0;
+ task->data_len = 0;
+ task->req = NULL;
+ task->status = NULL;
+
+ /* does not support indirect descriptors */
+ assert((task->desc_packed->flags & VRING_DESC_F_INDIRECT) == 0);
+
+ chunck_len = task->desc_packed->len;
+ task->req = (void *)(uintptr_t)gpa_to_vva(task->bdev->vid,
+ task->desc_packed->addr, &chunck_len);
+ if (!task->req || chunck_len != task->desc_packed->len) {
+ fprintf(stderr, "failed to translate desc address.\n");
+ rte_free(task);
+ return;
+ }
+
+ task->desc_packed = descriptor_get_next_packed(task->vq,
+ &avail_idx);
+ assert(task->desc_packed != NULL);
+ if (!descriptor_has_next_packed(task->desc_packed)) {
+ task->dxfer_dir = BLK_DIR_NONE;
+ task->last_idx = avail_idx % vq->size;
+ chunck_len = task->desc_packed->len;
+ task->status = (void *)(uintptr_t)
+ gpa_to_vva(task->bdev->vid,
+ task->desc_packed->addr,
+ &chunck_len);
+ if (!task->status ||
+ chunck_len != task->desc_packed->len) {
+ fprintf(stderr,
+ "failed to translate desc address.\n");
+ rte_free(task);
+ return;
+ }
+ } else {
+ task->readtype = descriptor_is_wr_packed(
+ task->desc_packed);
+ vhost_process_payload_chain_packed(task, &avail_idx);
+ }
+ task->buffer_id = vq->desc_packed[task->last_idx].id;
+ rte_vhost_set_inflight_desc_packed(ctrlr->bdev->vid, q_idx,
+ task->head_idx,
+ task->last_idx,
+ &task->inflight_idx);
+
+ if (++avail_idx >= vq->size) {
+ avail_idx -= vq->size;
+ avail_wrap_counter = !avail_wrap_counter;
+ }
+ blk_vq->last_avail_idx = avail_idx;
+ blk_vq->avail_wrap_counter = avail_wrap_counter;
+
+ ret = vhost_bdev_process_blk_commands(ctrlr->bdev, task);
+ if (ret) {
+ /* invalid response */
+ *task->status = VIRTIO_BLK_S_IOERR;
+ } else {
+ /* success */
+ *task->status = VIRTIO_BLK_S_OK;
+ }
+
+ submit_completion_packed(task, q_idx, &used_idx,
+ &used_wrap_counter);
+ blk_vq->last_used_idx = used_idx;
+ blk_vq->used_wrap_counter = used_wrap_counter;
+ }
+
+ rte_free(task);
+}
+
+static void
+submit_inflight_vq_packed(struct vhost_blk_ctrlr *ctrlr,
+ uint16_t q_idx)
+{
+ bool used_wrap_counter;
+ int req_idx, ret;
+ uint16_t used_idx;
+ uint64_t chunck_len;
+ struct vhost_blk_queue *blk_vq;
+ struct rte_vhost_ring_inflight *inflight_vq;
+ struct rte_vhost_resubmit_info *resubmit_info;
+ struct rte_vhost_vring *vq;
+ struct inflight_blk_task *task;
+ struct vhost_blk_task *blk_task;
+ struct rte_vhost_inflight_info_packed *inflight_info;
+
+ blk_vq = &ctrlr->bdev->queues[q_idx];
+ vq = &blk_vq->vq;
+ inflight_vq = &blk_vq->inflight_vq;
+ resubmit_info = inflight_vq->resubmit_inflight;
+ inflight_info = inflight_vq->inflight_packed;
+ used_idx = blk_vq->last_used_idx;
+ used_wrap_counter = blk_vq->used_wrap_counter;
+
+ task = rte_malloc(NULL, sizeof(*task), 0);
+ if (!task) {
+ fprintf(stderr, "failed to allocate memory\n");
+ return;
+ }
+ blk_task = &task->blk_task;
+ blk_task->vq = vq;
+ blk_task->bdev = ctrlr->bdev;
+ task->inflight_packed = inflight_vq->inflight_packed;
+
+ while (resubmit_info->resubmit_num-- > 0) {
+ req_idx = resubmit_info->resubmit_num;
+ blk_task->head_idx =
+ resubmit_info->resubmit_list[req_idx].index;
+ task->inflight_desc =
+ &inflight_info->desc[blk_task->head_idx];
+ task->blk_task.iovs_cnt = 0;
+ task->blk_task.data_len = 0;
+ task->blk_task.req = NULL;
+ task->blk_task.status = NULL;
+
+ /* Update the avail idx too,
+ * as its initial value equals the used idx.
+ */
+ blk_vq->last_avail_idx += task->inflight_desc->num;
+ if (blk_vq->last_avail_idx >= vq->size) {
+ blk_vq->last_avail_idx -= vq->size;
+ blk_vq->avail_wrap_counter =
+ !blk_vq->avail_wrap_counter;
+ }
+
+ /* does not support indirect descriptors */
+ assert(task->inflight_desc != NULL);
+ assert((task->inflight_desc->flags &
+ VRING_DESC_F_INDIRECT) == 0);
+
+ chunck_len = task->inflight_desc->len;
+ blk_task->req = (void *)(uintptr_t)
+ gpa_to_vva(blk_task->bdev->vid,
+ task->inflight_desc->addr,
+ &chunck_len);
+ if (!blk_task->req ||
+ chunck_len != task->inflight_desc->len) {
+ fprintf(stderr, "failed to translate desc address.\n");
+ rte_free(task);
+ return;
+ }
+
+ task->inflight_desc = inflight_desc_get_next(
+ task->inflight_packed, task->inflight_desc);
+ assert(task->inflight_desc != NULL);
+ if (!inflight_desc_has_next(task->inflight_desc)) {
+ blk_task->dxfer_dir = BLK_DIR_NONE;
+ chunck_len = task->inflight_desc->len;
+ blk_task->status = (void *)(uintptr_t)
+ gpa_to_vva(blk_task->bdev->vid,
+ task->inflight_desc->addr,
+ &chunck_len);
+ if (!blk_task->status ||
+ chunck_len != task->inflight_desc->len) {
+ fprintf(stderr,
+ "failed to translate desc address.\n");
+ rte_free(task);
+ return;
+ }
+ } else {
+ blk_task->readtype =
+ inflight_desc_is_wr(task->inflight_desc);
+ inflight_process_payload_chain_packed(task);
+ }
+
+ blk_task->buffer_id = task->inflight_desc->id;
+
+ ret = vhost_bdev_process_blk_commands(ctrlr->bdev, blk_task);
+ if (ret)
+ /* invalid response */
+ *blk_task->status = VIRTIO_BLK_S_IOERR;
+ else
+ /* success */
+ *blk_task->status = VIRTIO_BLK_S_OK;
+
+ inflight_submit_completion_packed(task, q_idx, &used_idx,
+ &used_wrap_counter);
+
+ blk_vq->last_used_idx = used_idx;
+ blk_vq->used_wrap_counter = used_wrap_counter;
+ }
+
+ rte_free(task);
+}
+
+static struct vring_desc *
+descriptor_get_next_split(struct vring_desc *vq_desc,
+ struct vring_desc *cur_desc)
+{
+ return &vq_desc[cur_desc->next];
+}
+
+static bool
+descriptor_has_next_split(struct vring_desc *cur_desc)
+{
+ return !!(cur_desc->flags & VRING_DESC_F_NEXT);
+}
+
+static bool
+descriptor_is_wr_split(struct vring_desc *cur_desc)
+{
+ return !!(cur_desc->flags & VRING_DESC_F_WRITE);
+}
+
+static void
+vhost_process_payload_chain_split(struct vhost_blk_task *task)
+{
+ void *data;
+ uint64_t chunck_len;
+
+ task->iovs_cnt = 0;
+
+ do {
+ chunck_len = task->desc_split->len;
+ data = (void *)(uintptr_t)gpa_to_vva(task->bdev->vid,
+ task->desc_split->addr,
+ &chunck_len);
+ if (!data || chunck_len != task->desc_split->len) {
+ fprintf(stderr, "failed to translate desc address.\n");
+ return;
+ }
+
+ task->iovs[task->iovs_cnt].iov_base = data;
+ task->iovs[task->iovs_cnt].iov_len = task->desc_split->len;
+ task->data_len += task->desc_split->len;
+ task->iovs_cnt++;
+ task->desc_split =
+ descriptor_get_next_split(task->vq->desc, task->desc_split);
+ } while (descriptor_has_next_split(task->desc_split));
+
+ chunck_len = task->desc_split->len;
+ task->status = (void *)(uintptr_t)gpa_to_vva(task->bdev->vid,
+ task->desc_split->addr,
+ &chunck_len);
+ if (!task->status || chunck_len != task->desc_split->len)
+ fprintf(stderr, "failed to translate desc address.\n");
+}
+
+static void
+submit_completion_split(struct vhost_blk_task *task, uint32_t vid,
+ uint32_t q_idx)
+{
+ struct rte_vhost_vring *vq;
+ struct vring_used *used;
+
+ vq = task->vq;
+ used = vq->used;
+
+ rte_vhost_set_last_inflight_io_split(vid, q_idx, task->req_idx);
+
+ /* Fill out the next entry in the "used" ring. id = the
+ * index of the descriptor that contained the blk request.
+ * len = the total amount of data transferred for the blk
+ * request. We must report the correct len, for variable
+ * length blk CDBs, where we may return less data than
+ * allocated by the guest VM.
+ */
+ used->ring[used->idx & (vq->size - 1)].id = task->req_idx;
+ used->ring[used->idx & (vq->size - 1)].len = task->data_len;
+ rte_smp_mb();
+ used->idx++;
+ rte_smp_mb();
+
+ rte_vhost_clr_inflight_desc_split(vid, q_idx, used->idx, task->req_idx);
+
+ /* Send an interrupt back to the guest VM so that it knows
+ * a completion is ready to be processed.
+ */
+ rte_vhost_vring_call(task->bdev->vid, q_idx);
+}
+
+static void
+submit_inflight_vq_split(struct vhost_blk_ctrlr *ctrlr,
+ uint32_t q_idx)
+{
+ struct vhost_blk_queue *blk_vq;
+ struct rte_vhost_ring_inflight *inflight_vq;
+ struct rte_vhost_resubmit_info *resubmit_inflight;
+ struct rte_vhost_resubmit_desc *resubmit_list;
+ struct vhost_blk_task *task;
+ int req_idx;
+ uint64_t chunck_len;
+ int ret;
+
+ blk_vq = &ctrlr->bdev->queues[q_idx];
+ inflight_vq = &blk_vq->inflight_vq;
+ resubmit_inflight = inflight_vq->resubmit_inflight;
+ resubmit_list = resubmit_inflight->resubmit_list;
+
+ task = rte_zmalloc(NULL, sizeof(*task), 0);
+ assert(task != NULL);
+
+ task->ctrlr = ctrlr;
+ task->bdev = ctrlr->bdev;
+ task->vq = &blk_vq->vq;
+
+ while (resubmit_inflight->resubmit_num-- > 0) {
+ req_idx = resubmit_list[resubmit_inflight->resubmit_num].index;
+ task->req_idx = req_idx;
+ task->desc_split = &task->vq->desc[task->req_idx];
+ task->iovs_cnt = 0;
+ task->data_len = 0;
+ task->req = NULL;
+ task->status = NULL;
+
+ /* does not support indirect descriptors */
+ assert(task->desc_split != NULL);
+ assert((task->desc_split->flags & VRING_DESC_F_INDIRECT) == 0);
+
+ chunck_len = task->desc_split->len;
+ task->req = (void *)(uintptr_t)gpa_to_vva(task->bdev->vid,
+ task->desc_split->addr, &chunck_len);
+ if (!task->req || chunck_len != task->desc_split->len) {
+ fprintf(stderr, "failed to translate desc address.\n");
+ rte_free(task);
+ return;
+ }
+
+ task->desc_split = descriptor_get_next_split(task->vq->desc,
+ task->desc_split);
+ if (!descriptor_has_next_split(task->desc_split)) {
+ task->dxfer_dir = BLK_DIR_NONE;
+ chunck_len = task->desc_split->len;
+ task->status = (void *)(uintptr_t)
+ gpa_to_vva(task->bdev->vid,
+ task->desc_split->addr,
+ &chunck_len);
+ if (!task->status ||
+ chunck_len != task->desc_split->len) {
+ fprintf(stderr,
+ "failed to translate desc address.\n");
+ rte_free(task);
+ return;
+ }
+ } else {
+ task->readtype =
+ descriptor_is_wr_split(task->desc_split);
+ vhost_process_payload_chain_split(task);
+ }
+
+ ret = vhost_bdev_process_blk_commands(ctrlr->bdev, task);
+ if (ret) {
+ /* invalid response */
+ *task->status = VIRTIO_BLK_S_IOERR;
+ } else {
+ /* success */
+ *task->status = VIRTIO_BLK_S_OK;
+ }
+ submit_completion_split(task, ctrlr->bdev->vid, q_idx);
+ }
+
+ rte_free(task);
+}
+
+static void
+process_requestq_split(struct vhost_blk_ctrlr *ctrlr, uint32_t q_idx)
+{
+ int ret;
+ int req_idx;
+ uint16_t last_idx;
+ uint64_t chunck_len;
+ struct vhost_blk_queue *blk_vq;
+ struct rte_vhost_vring *vq;
+ struct vhost_blk_task *task;
+
+ blk_vq = &ctrlr->bdev->queues[q_idx];
+ vq = &blk_vq->vq;
+
+ task = rte_zmalloc(NULL, sizeof(*task), 0);
+ assert(task != NULL);
+ task->ctrlr = ctrlr;
+ task->bdev = ctrlr->bdev;
+ task->vq = vq;
+
+ while (vq->avail->idx != blk_vq->last_avail_idx) {
+ last_idx = blk_vq->last_avail_idx & (vq->size - 1);
+ req_idx = vq->avail->ring[last_idx];
+ task->req_idx = req_idx;
+ task->desc_split = &task->vq->desc[task->req_idx];
+ task->iovs_cnt = 0;
+ task->data_len = 0;
+ task->req = NULL;
+ task->status = NULL;
+
+ rte_vhost_set_inflight_desc_split(ctrlr->bdev->vid, q_idx,
+ task->req_idx);
+
+ /* does not support indirect descriptors */
+ assert((task->desc_split->flags & VRING_DESC_F_INDIRECT) == 0);
+
+ chunck_len = task->desc_split->len;
+ task->req = (void *)(uintptr_t)gpa_to_vva(task->bdev->vid,
+ task->desc_split->addr, &chunck_len);
+ if (!task->req || chunck_len != task->desc_split->len) {
+ fprintf(stderr, "failed to translate desc address.\n");
+ rte_free(task);
+ return;
+ }
+
+ task->desc_split = descriptor_get_next_split(task->vq->desc,
+ task->desc_split);
+ if (!descriptor_has_next_split(task->desc_split)) {
+ task->dxfer_dir = BLK_DIR_NONE;
+ chunck_len = task->desc_split->len;
+ task->status = (void *)(uintptr_t)
+ gpa_to_vva(task->bdev->vid,
+ task->desc_split->addr,
+ &chunck_len);
+ if (!task->status ||
+ chunck_len != task->desc_split->len) {
+ fprintf(stderr,
+ "failed to translate desc address.\n");
+ rte_free(task);
+ return;
+ }
+ } else {
+ task->readtype =
+ descriptor_is_wr_split(task->desc_split);
+ vhost_process_payload_chain_split(task);
+ }
+ blk_vq->last_avail_idx++;
+
+ ret = vhost_bdev_process_blk_commands(ctrlr->bdev, task);
+ if (ret) {
+ /* invalid response */
+ *task->status = VIRTIO_BLK_S_IOERR;
+ } else {
+ /* success */
+ *task->status = VIRTIO_BLK_S_OK;
+ }
+
+ submit_completion_split(task, ctrlr->bdev->vid, q_idx);
+ }
+
+ rte_free(task);
+}
+
+static void *
+ctrlr_worker(void *arg)
+{
+ struct vhost_blk_ctrlr *ctrlr = (struct vhost_blk_ctrlr *)arg;
+ struct vhost_blk_queue *blk_vq;
+ struct rte_vhost_ring_inflight *inflight_vq;
+ cpu_set_t cpuset;
+ pthread_t thread;
+ int i;
+
+ fprintf(stdout, "Ctrlr Worker Thread start\n");
+
+ if (ctrlr == NULL || ctrlr->bdev == NULL) {
+ fprintf(stderr,
+ "%s: Error, invalid argument passed to worker thread\n",
+ __func__);
+ exit(1);
+ }
+
+ thread = pthread_self();
+ CPU_ZERO(&cpuset);
+ CPU_SET(0, &cpuset);
+ pthread_setaffinity_np(thread, sizeof(cpu_set_t), &cpuset);
+
+ for (i = 0; i < NUM_OF_BLK_QUEUES; i++) {
+ blk_vq = &ctrlr->bdev->queues[i];
+ inflight_vq = &blk_vq->inflight_vq;
+ if (inflight_vq->resubmit_inflight != NULL &&
+ inflight_vq->resubmit_inflight->resubmit_num != 0) {
+ if (ctrlr->packed_ring)
+ submit_inflight_vq_packed(ctrlr, i);
+ else
+ submit_inflight_vq_split(ctrlr, i);
+ }
+ }
+
+ while (!g_should_stop && ctrlr->bdev != NULL) {
+ for (i = 0; i < NUM_OF_BLK_QUEUES; i++) {
+ if (ctrlr->packed_ring)
+ process_requestq_packed(ctrlr, i);
+ else
+ process_requestq_split(ctrlr, i);
+ }
+ }
+
+ g_should_stop = 2;
+ fprintf(stdout, "Ctrlr Worker Thread Exiting\n");
+ sem_post(&exit_sem);
+ return NULL;
+}
+
+static int
+new_device(int vid)
+{
+ struct vhost_blk_ctrlr *ctrlr;
+ struct vhost_blk_queue *blk_vq;
+ struct rte_vhost_vring *vq;
+ uint64_t features;
+ pthread_t tid;
+ int i, ret;
+
+ ctrlr = vhost_blk_ctrlr_find(dev_pathname);
+ if (!ctrlr) {
+ fprintf(stderr, "Controller is not ready\n");
+ return -1;
+ }
+
+ if (ctrlr->started)
+ return 0;
+
+ ctrlr->bdev->vid = vid;
+ ret = rte_vhost_get_negotiated_features(vid, &features);
+ if (ret) {
+ fprintf(stderr, "failed to get the negotiated features\n");
+ return -1;
+ }
+ ctrlr->packed_ring = !!(features & (1ULL << VIRTIO_F_RING_PACKED));
+
+ ret = rte_vhost_get_mem_table(vid, &ctrlr->mem);
+ if (ret)
+ fprintf(stderr, "Get Controller memory region failed\n");
+ assert(ctrlr->mem != NULL);
+
+ /* Disable Notifications and init last idx */
+ for (i = 0; i < NUM_OF_BLK_QUEUES; i++) {
+ blk_vq = &ctrlr->bdev->queues[i];
+ vq = &blk_vq->vq;
+
+ ret = rte_vhost_get_vhost_vring(ctrlr->bdev->vid, i, vq);
+ assert(ret == 0);
+
+ ret = rte_vhost_get_vring_base(ctrlr->bdev->vid, i,
+ &blk_vq->last_avail_idx,
+ &blk_vq->last_used_idx);
+ assert(ret == 0);
+
+ ret = rte_vhost_get_vhost_ring_inflight(ctrlr->bdev->vid, i,
+ &blk_vq->inflight_vq);
+ assert(ret == 0);
+
+ if (ctrlr->packed_ring) {
+ /* for the reconnection */
+ ret = rte_vhost_get_vring_base_from_inflight(
+ ctrlr->bdev->vid, i,
+ &blk_vq->last_avail_idx,
+ &blk_vq->last_used_idx);
+
+ blk_vq->avail_wrap_counter = blk_vq->last_avail_idx &
+ (1 << 15);
+ blk_vq->last_avail_idx = blk_vq->last_avail_idx &
+ 0x7fff;
+ blk_vq->used_wrap_counter = blk_vq->last_used_idx &
+ (1 << 15);
+ blk_vq->last_used_idx = blk_vq->last_used_idx &
+ 0x7fff;
+ }
+
+ rte_vhost_enable_guest_notification(vid, i, 0);
+ }
+
+ /* start polling vring */
+ g_should_stop = 0;
+ fprintf(stdout, "New Device %s, Device ID %d\n", dev_pathname, vid);
+ if (pthread_create(&tid, NULL, &ctrlr_worker, ctrlr) < 0) {
+ fprintf(stderr, "Worker thread start failed\n");
+ return -1;
+ }
+
+ /* device has been started */
+ ctrlr->started = 1;
+ pthread_detach(tid);
+ return 0;
+}
+
+static void
+destroy_device(int vid)
+{
+ char path[PATH_MAX];
+ struct vhost_blk_ctrlr *ctrlr;
+ struct vhost_blk_queue *blk_vq;
+ int i, ret;
+
+ ret = rte_vhost_get_ifname(vid, path, PATH_MAX);
+ if (ret) {
+ fprintf(stderr, "Destroy Ctrlr Failed\n");
+ return;
+ }
+
+ fprintf(stdout, "Destroy %s Device ID %d\n", path, vid);
+ ctrlr = vhost_blk_ctrlr_find(path);
+ if (!ctrlr) {
+ fprintf(stderr, "Destroy Ctrlr Failed\n");
+ return;
+ }
+
+ if (!ctrlr->started)
+ return;
+
+ g_should_stop = 1;
+ while (g_should_stop != 2)
+ ;
+
+ for (i = 0; i < NUM_OF_BLK_QUEUES; i++) {
+ blk_vq = &ctrlr->bdev->queues[i];
+ if (ctrlr->packed_ring) {
+ blk_vq->last_avail_idx |= (blk_vq->avail_wrap_counter <<
+ 15);
+ blk_vq->last_used_idx |= (blk_vq->used_wrap_counter <<
+ 15);
+ }
+ rte_vhost_set_vring_base(ctrlr->bdev->vid, i,
+ blk_vq->last_avail_idx,
+ blk_vq->last_used_idx);
+ }
+
+ free(ctrlr->mem);
+
+ ctrlr->started = 0;
+ sem_wait(&exit_sem);
+}
+
+static int
+new_connection(int vid)
+{
+ /* extend the proper features for block device */
+ vhost_session_install_rte_compat_hooks(vid);
+
+ return 0;
+}
+
+struct vhost_device_ops vhost_blk_device_ops = {
+ .new_device = new_device,
+ .destroy_device = destroy_device,
+ .new_connection = new_connection,
+};
+
+static struct vhost_block_dev *
+vhost_blk_bdev_construct(const char *bdev_name,
+ const char *bdev_serial, uint32_t blk_size, uint64_t blk_cnt,
+ bool wce_enable)
+{
+ struct vhost_block_dev *bdev;
+
+ bdev = rte_zmalloc(NULL, sizeof(*bdev), RTE_CACHE_LINE_SIZE);
+ if (!bdev)
+ return NULL;
+
+ strncpy(bdev->name, bdev_name, sizeof(bdev->name));
+ strncpy(bdev->product_name, bdev_serial, sizeof(bdev->product_name));
+ bdev->blocklen = blk_size;
+ bdev->blockcnt = blk_cnt;
+ bdev->write_cache = wce_enable;
+
+ fprintf(stdout, "blocklen=%u, blockcnt=%lu\n", bdev->blocklen,
+ (unsigned long)bdev->blockcnt);
+
+ /* use memory as disk storage space */
+ bdev->data = rte_zmalloc(NULL, blk_cnt * blk_size, 0);
+ if (!bdev->data) {
+ fprintf(stderr, "not enough reserved huge memory for disk\n");
+ rte_free(bdev);
+ return NULL;
+ }
+
+ return bdev;
+}
+
+static struct vhost_blk_ctrlr *
+vhost_blk_ctrlr_construct(const char *ctrlr_name)
+{
+ int ret;
+ struct vhost_blk_ctrlr *ctrlr;
+ char *path;
+ char cwd[PATH_MAX];
+
+ /* always use current directory */
+ path = getcwd(cwd, PATH_MAX);
+ if (!path) {
+ fprintf(stderr, "Cannot get current working directory\n");
+ return NULL;
+ }
+ snprintf(dev_pathname, sizeof(dev_pathname), "%s/%s", path, ctrlr_name);
+
+ if (access(dev_pathname, F_OK) != -1) {
+ if (unlink(dev_pathname) != 0)
+ rte_exit(EXIT_FAILURE, "Cannot remove %s.\n",
+ dev_pathname);
+ }
+
+ if (rte_vhost_driver_register(dev_pathname, 0) != 0) {
+ fprintf(stderr, "socket %s already exists\n", dev_pathname);
+ return NULL;
+ }
+
+ ret = rte_vhost_driver_set_features(dev_pathname, VHOST_BLK_FEATURES);
+ if (ret != 0) {
+ fprintf(stderr, "Set vhost driver features failed\n");
+ rte_vhost_driver_unregister(dev_pathname);
+ return NULL;
+ }
+
+ /* set proper features */
+ vhost_dev_install_rte_compat_hooks(dev_pathname);
+
+ ctrlr = rte_zmalloc(NULL, sizeof(*ctrlr), RTE_CACHE_LINE_SIZE);
+ if (!ctrlr) {
+ rte_vhost_driver_unregister(dev_pathname);
+ return NULL;
+ }
+
+ /* hardcoded block device information with 128 MiB */
+ ctrlr->bdev = vhost_blk_bdev_construct("malloc0", "vhost_blk_malloc0",
+ 4096, 32768, 0);
+ if (!ctrlr->bdev) {
+ rte_free(ctrlr);
+ rte_vhost_driver_unregister(dev_pathname);
+ return NULL;
+ }
+
+ rte_vhost_driver_callback_register(dev_pathname,
+ &vhost_blk_device_ops);
+
+ return ctrlr;
+}
+
+static void
+signal_handler(__rte_unused int signum)
+{
+ struct vhost_blk_ctrlr *ctrlr;
+
+ if (access(dev_pathname, F_OK) == 0)
+ unlink(dev_pathname);
+
+ if (g_should_stop != -1) {
+ g_should_stop = 1;
+ while (g_should_stop != 2)
+ ;
+ }
+
+ ctrlr = vhost_blk_ctrlr_find(dev_pathname);
+ if (ctrlr != NULL) {
+ if (ctrlr->bdev != NULL) {
+ rte_free(ctrlr->bdev->data);
+ rte_free(ctrlr->bdev);
+ }
+ rte_free(ctrlr);
+ }
+
+ rte_vhost_driver_unregister(dev_pathname);
+ exit(0);
+}
+
+int main(int argc, char *argv[])
+{
+ int ret;
+
+ signal(SIGINT, signal_handler);
+
+ /* init EAL */
+ ret = rte_eal_init(argc, argv);
+ if (ret < 0)
+ rte_exit(EXIT_FAILURE, "Error with EAL initialization\n");
+
+ g_vhost_ctrlr = vhost_blk_ctrlr_construct("vhost.socket");
+ if (g_vhost_ctrlr == NULL) {
+ fprintf(stderr, "Construct vhost blk controller failed\n");
+ return 0;
+ }
+
+ if (sem_init(&exit_sem, 0, 0) < 0) {
+ fprintf(stderr, "Error init exit_sem\n");
+ return -1;
+ }
+
+ rte_vhost_driver_start(dev_pathname);
+
+ /* loop forever; the application exits via the signal handler */
+ while (1)
+ sleep(1);
+
+ return 0;
+}
+
diff --git a/examples/vhost_blk/vhost_blk.h b/examples/vhost_blk/vhost_blk.h
new file mode 100644
index 000000000..a7a62fbae
--- /dev/null
+++ b/examples/vhost_blk/vhost_blk.h
@@ -0,0 +1,127 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2017 Intel Corporation
+ */
+
+#ifndef _VHOST_BLK_H_
+#define _VHOST_BLK_H_
+
+#include <stdio.h>
+#include <sys/uio.h>
+#include <stdint.h>
+#include <stdbool.h>
+#include <linux/virtio_blk.h>
+#include <linux/virtio_ring.h>
+
+#include <rte_vhost.h>
+
+struct vring_packed_desc {
+ /* Buffer Address. */
+ __le64 addr;
+ /* Buffer Length. */
+ __le32 len;
+ /* Buffer ID. */
+ __le16 id;
+ /* The flags depending on descriptor type. */
+ __le16 flags;
+};
+
+struct vhost_blk_queue {
+ struct rte_vhost_vring vq;
+ struct rte_vhost_ring_inflight inflight_vq;
+ uint16_t last_avail_idx;
+ uint16_t last_used_idx;
+ bool avail_wrap_counter;
+ bool used_wrap_counter;
+};
+
+#define NUM_OF_BLK_QUEUES 1
+
+#ifndef VIRTIO_F_RING_PACKED
+#define VIRTIO_F_RING_PACKED 34
+#endif
+
+#define min(a, b) (((a) < (b)) ? (a) : (b))
+
+struct vhost_block_dev {
+ /** ID for vhost library. */
+ int vid;
+ /** Queues for the block device */
+ struct vhost_blk_queue queues[NUM_OF_BLK_QUEUES];
+ /** Unique name for this block device. */
+ char name[64];
+
+ /** Unique product name for this kind of block device. */
+ char product_name[256];
+
+ /** Size in bytes of a logical block for the backend */
+ uint32_t blocklen;
+
+ /** Number of blocks */
+ uint64_t blockcnt;
+
+ /** write cache enabled, not used at the moment */
+ int write_cache;
+
+ /** use memory as disk storage space */
+ uint8_t *data;
+};
+
+struct vhost_blk_ctrlr {
+ uint8_t started;
+ uint8_t packed_ring;
+ uint8_t need_restart;
+ /** Only support 1 LUN for the example */
+ struct vhost_block_dev *bdev;
+ /** VM memory region */
+ struct rte_vhost_memory *mem;
+} __rte_cache_aligned;
+
+#define VHOST_BLK_MAX_IOVS 128
+
+enum blk_data_dir {
+ BLK_DIR_NONE = 0,
+ BLK_DIR_TO_DEV = 1,
+ BLK_DIR_FROM_DEV = 2,
+};
+
+struct vhost_blk_task {
+ uint8_t readtype;
+ uint8_t req_idx;
+ uint16_t head_idx;
+ uint16_t last_idx;
+ uint16_t inflight_idx;
+ uint16_t buffer_id;
+ uint32_t dxfer_dir;
+ uint32_t data_len;
+ struct virtio_blk_outhdr *req;
+
+ volatile uint8_t *status;
+
+ struct iovec iovs[VHOST_BLK_MAX_IOVS];
+ uint32_t iovs_cnt;
+ struct vring_packed_desc *desc_packed;
+ struct vring_desc *desc_split;
+ struct rte_vhost_vring *vq;
+ struct vhost_block_dev *bdev;
+ struct vhost_blk_ctrlr *ctrlr;
+};
+
+struct inflight_blk_task {
+ struct vhost_blk_task blk_task;
+ struct rte_vhost_inflight_desc_packed *inflight_desc;
+ struct rte_vhost_inflight_info_packed *inflight_packed;
+};
+
+struct vhost_blk_ctrlr *g_vhost_ctrlr;
+struct vhost_device_ops vhost_blk_device_ops;
+
+int vhost_bdev_process_blk_commands(struct vhost_block_dev *bdev,
+ struct vhost_blk_task *task);
+
+void vhost_session_install_rte_compat_hooks(uint32_t vid);
+
+void vhost_dev_install_rte_compat_hooks(const char *path);
+
+struct vhost_blk_ctrlr *vhost_blk_ctrlr_find(const char *ctrlr_name);
+
+#endif /* _VHOST_BLK_H_ */
diff --git a/examples/vhost_blk/vhost_blk_compat.c b/examples/vhost_blk/vhost_blk_compat.c
new file mode 100644
index 000000000..4accfa498
--- /dev/null
+++ b/examples/vhost_blk/vhost_blk_compat.c
@@ -0,0 +1,173 @@
+// SPDX-License-Identifier: BSD-3-Clause
+// Copyright(c) 2010-2017 Intel Corporation
+
+#ifndef _VHOST_BLK_COMPAT_H_
+#define _VHOST_BLK_COMPAT_H_
+
+#include <sys/uio.h>
+#include <stdint.h>
+#include <linux/virtio_blk.h>
+#include <linux/virtio_ring.h>
+
+#include <rte_vhost.h>
+#include "vhost_blk.h"
+#include "blk_spec.h"
+
+#define VHOST_MAX_VQUEUES 256
+#define SPDK_VHOST_MAX_VQ_SIZE 1024
+
+#define VHOST_USER_GET_CONFIG 24
+#define VHOST_USER_SET_CONFIG 25
+
+static int
+vhost_blk_get_config(struct vhost_block_dev *bdev, uint8_t *config,
+ uint32_t len)
+{
+ struct virtio_blk_config blkcfg;
+ uint32_t blk_size;
+ uint64_t blkcnt;
+
+ if (bdev == NULL) {
+ /* We can't just return -1 here as this GET_CONFIG message might
+ * be caused by a QEMU VM reboot. Returning -1 will indicate an
+ * error to QEMU, who might then decide to terminate itself.
+ * We don't want that. A simple reboot shouldn't break the
+ * system.
+ *
+ * Presenting a block device with block size 0 and block count 0
+ * doesn't cause any problems on QEMU side and the virtio-pci
+ * device is even still available inside the VM, but there will
+ * be no block device created for it - the kernel drivers will
+ * silently reject it.
+ */
+ blk_size = 0;
+ blkcnt = 0;
+ } else {
+ blk_size = bdev->blocklen;
+ blkcnt = bdev->blockcnt;
+ }
+
+ memset(&blkcfg, 0, sizeof(blkcfg));
+ blkcfg.blk_size = blk_size;
+ /* minimum I/O size in blocks */
+ blkcfg.min_io_size = 1;
+ /* expressed in 512 Bytes sectors */
+ blkcfg.capacity = (blkcnt * blk_size) / 512;
+ /* QEMU can overwrite this value when started */
+ blkcfg.num_queues = VHOST_MAX_VQUEUES;
+
+ fprintf(stdout, "block device: blk_size = %u, blkcnt = %lu\n", blk_size,
+ (unsigned long)blkcnt);
+
+ memcpy(config, &blkcfg, min(len, sizeof(blkcfg)));
+
+ return 0;
+}
+
+static enum rte_vhost_msg_result
+extern_vhost_pre_msg_handler(int vid, void *_msg)
+{
+ char path[PATH_MAX];
+ struct vhost_blk_ctrlr *ctrlr;
+ struct vhost_user_msg *msg = _msg;
+ int ret;
+
+ ret = rte_vhost_get_ifname(vid, path, PATH_MAX);
+ if (ret) {
+ fprintf(stderr, "Cannot get socket name\n");
+ return RTE_VHOST_MSG_RESULT_ERR;
+ }
+
+ ctrlr = vhost_blk_ctrlr_find(path);
+ if (!ctrlr) {
+ fprintf(stderr, "Controller is not ready\n");
+ return RTE_VHOST_MSG_RESULT_ERR;
+ }
+
+ switch ((int)msg->request) {
+ case VHOST_USER_GET_VRING_BASE:
+ case VHOST_USER_SET_VRING_BASE:
+ case VHOST_USER_SET_VRING_ADDR:
+ case VHOST_USER_SET_VRING_NUM:
+ case VHOST_USER_SET_VRING_KICK:
+ case VHOST_USER_SET_VRING_CALL:
+ case VHOST_USER_SET_MEM_TABLE:
+ break;
+ case VHOST_USER_GET_CONFIG: {
+ int rc = 0;
+
+ rc = vhost_blk_get_config(ctrlr->bdev,
+ msg->payload.cfg.region,
+ msg->payload.cfg.size);
+ if (rc != 0)
+ msg->size = 0;
+
+ return RTE_VHOST_MSG_RESULT_REPLY;
+ }
+ case VHOST_USER_SET_CONFIG:
+ default:
+ break;
+ }
+
+ return RTE_VHOST_MSG_RESULT_NOT_HANDLED;
+}
+
+static enum rte_vhost_msg_result
+extern_vhost_post_msg_handler(int vid, void *_msg)
+{
+ char path[PATH_MAX];
+ struct vhost_blk_ctrlr *ctrlr;
+ struct vhost_user_msg *msg = _msg;
+ int ret;
+
+ ret = rte_vhost_get_ifname(vid, path, PATH_MAX);
+ if (ret) {
+ fprintf(stderr, "Cannot get socket name\n");
+ return RTE_VHOST_MSG_RESULT_ERR;
+ }
+
+ ctrlr = vhost_blk_ctrlr_find(path);
+ if (!ctrlr) {
+ fprintf(stderr, "Controller is not ready\n");
+ return RTE_VHOST_MSG_RESULT_ERR;
+ }
+
+ switch (msg->request) {
+ case VHOST_USER_SET_FEATURES:
+ case VHOST_USER_SET_VRING_KICK:
+ default:
+ break;
+ }
+
+ return RTE_VHOST_MSG_RESULT_NOT_HANDLED;
+}
+
+struct rte_vhost_user_extern_ops g_extern_vhost_ops = {
+ .pre_msg_handle = extern_vhost_pre_msg_handler,
+ .post_msg_handle = extern_vhost_post_msg_handler,
+};
+
+void
+vhost_session_install_rte_compat_hooks(uint32_t vid)
+{
+ int rc;
+
+ rc = rte_vhost_extern_callback_register(vid, &g_extern_vhost_ops, NULL);
+ if (rc != 0)
+ fprintf(stderr,
+ "rte_vhost_extern_callback_register() failed for vid = %d\n",
+ vid);
+}
+
+void
+vhost_dev_install_rte_compat_hooks(const char *path)
+{
+ uint64_t protocol_features = 0;
+
+ rte_vhost_driver_get_protocol_features(path, &protocol_features);
+ protocol_features |= (1ULL << VHOST_USER_PROTOCOL_F_CONFIG);
+ protocol_features |= (1ULL << VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD);
+ rte_vhost_driver_set_protocol_features(path, protocol_features);
+}
+
+#endif
--
2.17.2
* [dpdk-dev] [PATCH v5] vhost: add vhost-user-blk example which support inflight
2019-10-28 19:37 ` [dpdk-dev] [PATCH] " Jin Yu
@ 2019-11-01 10:42 ` Jin Yu
2019-11-04 16:36 ` [dpdk-dev] [PATCH v6] " Jin Yu
0 siblings, 1 reply; 27+ messages in thread
From: Jin Yu @ 2019-11-01 10:42 UTC (permalink / raw)
To: Thomas Monjalon, John McNamara, Marko Kovacevic, Maxime Coquelin,
Tiwei Bie, Zhihong Wang
Cc: dev, Jin Yu
A vhost-user-blk example that supports the inflight feature. It uses the
new APIs introduced in the first patch, so it can show how these APIs
work to support the inflight feature.
Signed-off-by: Jin Yu <jin.yu@intel.com>
---
v1 - add the case.
v2 - add the rte_vhost prefix.
v3 - add packed ring support
v4 - fix build, MAINTAINERS and add guides
v5 - fix ci/intel-compilation errors
---
MAINTAINERS | 2 +
doc/guides/sample_app_ug/index.rst | 1 +
doc/guides/sample_app_ug/vhost_blk.rst | 63 ++
examples/meson.build | 2 +-
examples/vhost_blk/Makefile | 68 ++
examples/vhost_blk/blk.c | 125 +++
examples/vhost_blk/blk_spec.h | 95 ++
examples/vhost_blk/meson.build | 21 +
examples/vhost_blk/vhost_blk.c | 1094 ++++++++++++++++++++++++
examples/vhost_blk/vhost_blk.h | 127 +++
examples/vhost_blk/vhost_blk_compat.c | 173 ++++
11 files changed, 1770 insertions(+), 1 deletion(-)
create mode 100644 doc/guides/sample_app_ug/vhost_blk.rst
create mode 100644 examples/vhost_blk/Makefile
create mode 100644 examples/vhost_blk/blk.c
create mode 100644 examples/vhost_blk/blk_spec.h
create mode 100644 examples/vhost_blk/meson.build
create mode 100644 examples/vhost_blk/vhost_blk.c
create mode 100644 examples/vhost_blk/vhost_blk.h
create mode 100644 examples/vhost_blk/vhost_blk_compat.c
diff --git a/MAINTAINERS b/MAINTAINERS
index 717c31801..c22a8312e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -839,6 +839,8 @@ F: lib/librte_vhost/
F: doc/guides/prog_guide/vhost_lib.rst
F: examples/vhost/
F: doc/guides/sample_app_ug/vhost.rst
+F: examples/vhost_blk/
+F: doc/guides/sample_app_ug/vhost_blk.rst
F: examples/vhost_crypto/
F: examples/vdpa/
F: doc/guides/sample_app_ug/vdpa.rst
diff --git a/doc/guides/sample_app_ug/index.rst b/doc/guides/sample_app_ug/index.rst
index a3737c118..613f483f3 100644
--- a/doc/guides/sample_app_ug/index.rst
+++ b/doc/guides/sample_app_ug/index.rst
@@ -40,6 +40,7 @@ Sample Applications User Guides
packet_ordering
vmdq_dcb_forwarding
vhost
+ vhost_blk
vhost_crypto
vdpa
ip_pipeline
diff --git a/doc/guides/sample_app_ug/vhost_blk.rst b/doc/guides/sample_app_ug/vhost_blk.rst
new file mode 100644
index 000000000..39096e2e4
--- /dev/null
+++ b/doc/guides/sample_app_ug/vhost_blk.rst
@@ -0,0 +1,63 @@
+.. SPDX-License-Identifier: BSD-3-Clause
+ Copyright(c) 2010-2017 Intel Corporation.
+
+Vhost_blk Sample Application
+=============================
+
+The vhost_blk sample application implements a simple block device, which
+is used as the backend of the QEMU vhost-user-blk device. Users can extend
+the existing example to use other types of block devices (e.g. AIO) besides
+the memory-based block device. Similar to the vhost-user-net device, the
+sample application uses a domain socket to communicate with QEMU, and the
+virtio ring (split or packed format) is processed by the vhost_blk sample
+application.
+
+The sample application reuses a lot of code from the SPDK (Storage
+Performance Development Kit, https://github.com/spdk/spdk) vhost-user-blk
+target. For usage of the DPDK vhost library in the storage area, users can
+take SPDK as a reference as well.
+
+Testing steps
+-------------
+
+This section shows the steps to start a VM with the block device as a
+fast data path for critical applications.
+
+Compiling the Application
+-------------------------
+
+To compile the sample application see :doc:`compiling`.
+
+The application is located in the ``examples`` sub-directory.
+
+You will also need to build DPDK both on the host and inside the guest.
+
+Start the vhost_blk example
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: console
+
+ ./vhost-blk -m 1024
+
+.. _vhost_blk_app_run_vm:
+
+Start the VM
+~~~~~~~~~~~~
+
+.. code-block:: console
+
+ qemu-system-x86_64 -machine accel=kvm \
+ -m $mem -object memory-backend-file,id=mem,size=$mem,\
+ mem-path=/dev/hugepages,share=on -numa node,memdev=mem \
+ -drive file=os.img,if=none,id=disk \
+ -device ide-hd,drive=disk,bootindex=0 \
+ -chardev socket,id=char0,reconnect=1,path=/tmp/vhost.socket \
+ -device vhost-user-blk-pci,ring_packed=1,chardev=char0,num-queues=1 \
+ ...
+
+.. note::
+ You must check whether your QEMU supports "vhost-user-blk" or not;
+ QEMU v4.0 or newer is required.
+ ``reconnect=1`` enables live recovery, so that QEMU can reconnect to the
+ vhost_blk example after the example is restarted.
+ ``ring_packed=1`` means the device supports the packed ring, but it
+ requires a guest kernel version >= 5.0.
diff --git a/examples/meson.build b/examples/meson.build
index 98ae50a49..10a6bd7ef 100644
--- a/examples/meson.build
+++ b/examples/meson.build
@@ -42,7 +42,7 @@ all_examples = [
'skeleton', 'tep_termination',
'timer', 'vdpa',
'vhost', 'vhost_crypto',
- 'vm_power_manager',
+ 'vhost_blk', 'vm_power_manager',
'vm_power_manager/guest_cli',
'vmdq', 'vmdq_dcb',
]
diff --git a/examples/vhost_blk/Makefile b/examples/vhost_blk/Makefile
new file mode 100644
index 000000000..a10a90071
--- /dev/null
+++ b/examples/vhost_blk/Makefile
@@ -0,0 +1,68 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2010-2014 Intel Corporation
+
+# binary name
+APP = vhost-blk
+
+# all source are stored in SRCS-y
+SRCS-y := blk.c vhost_blk.c vhost_blk_compat.c
+
+# Build using pkg-config variables if possible
+$(shell pkg-config --exists libdpdk)
+ifeq ($(.SHELLSTATUS),0)
+
+all: shared
+.PHONY: shared static
+shared: build/$(APP)-shared
+ ln -sf $(APP)-shared build/$(APP)
+static: build/$(APP)-static
+ ln -sf $(APP)-static build/$(APP)
+
+LDFLAGS += -pthread
+
+PC_FILE := $(shell pkg-config --path libdpdk)
+CFLAGS += -O3 $(shell pkg-config --cflags libdpdk)
+LDFLAGS_SHARED = $(shell pkg-config --libs libdpdk)
+LDFLAGS_STATIC = -Wl,-Bstatic $(shell pkg-config --static --libs libdpdk)
+
+CFLAGS += -DALLOW_EXPERIMENTAL_API
+
+build/$(APP)-shared: $(SRCS-y) Makefile $(PC_FILE) | build
+ $(CC) $(CFLAGS) $(SRCS-y) -o $@ $(LDFLAGS) $(LDFLAGS_SHARED)
+
+build/$(APP)-static: $(SRCS-y) Makefile $(PC_FILE) | build
+ $(CC) $(CFLAGS) $(SRCS-y) -o $@ $(LDFLAGS) $(LDFLAGS_STATIC)
+
+build:
+ @mkdir -p $@
+
+.PHONY: clean
+clean:
+ rm -f build/$(APP) build/$(APP)-static build/$(APP)-shared
+ test -d build && rmdir -p build || true
+
+else # Build using legacy build system
+
+ifeq ($(RTE_SDK),)
+$(error "Please define RTE_SDK environment variable")
+endif
+
+# Default target, detect a build directory, by looking for a path with a .config
+RTE_TARGET ?= $(notdir $(abspath $(dir $(firstword $(wildcard $(RTE_SDK)/*/.config)))))
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+ifneq ($(CONFIG_RTE_EXEC_ENV_LINUX),y)
+$(info This application can only operate in a linux environment, \
+please change the definition of the RTE_TARGET environment variable)
+all:
+else
+
+CFLAGS += -DALLOW_EXPERIMENTAL_API
+CFLAGS += -O2 -D_FILE_OFFSET_BITS=64
+CFLAGS += $(WERROR_FLAGS)
+
+include $(RTE_SDK)/mk/rte.extapp.mk
+
+endif
+endif
diff --git a/examples/vhost_blk/blk.c b/examples/vhost_blk/blk.c
new file mode 100644
index 000000000..424ed3015
--- /dev/null
+++ b/examples/vhost_blk/blk.c
@@ -0,0 +1,125 @@
+// SPDX-License-Identifier: BSD-3-Clause
+// Copyright(c) 2010-2019 Intel Corporation
+
+/**
+ * This work is largely based on the "vhost-user-blk" implementation from
+ * SPDK (https://github.com/spdk/spdk).
+ */
+
+#include <stdio.h>
+#include <stdint.h>
+#include <unistd.h>
+#include <assert.h>
+#include <ctype.h>
+#include <string.h>
+#include <stddef.h>
+
+#include <rte_atomic.h>
+#include <rte_cycles.h>
+#include <rte_log.h>
+#include <rte_malloc.h>
+#include <rte_byteorder.h>
+#include <rte_string_fns.h>
+
+#include "vhost_blk.h"
+#include "blk_spec.h"
+
+static void
+vhost_strcpy_pad(void *dst, const char *src, size_t size, int pad)
+{
+ size_t len;
+
+ len = strlen(src);
+ if (len < size) {
+ memcpy(dst, src, len);
+ memset((char *)dst + len, pad, size - len);
+ } else {
+ memcpy(dst, src, size);
+ }
+}
+
+static int
+vhost_bdev_blk_readwrite(struct vhost_block_dev *bdev,
+ struct vhost_blk_task *task,
+ uint64_t lba_512, __rte_unused uint32_t xfer_len)
+{
+ uint32_t i;
+ uint64_t offset;
+ uint32_t nbytes = 0;
+
+ offset = lba_512 * 512;
+
+ for (i = 0; i < task->iovs_cnt; i++) {
+ if (task->dxfer_dir == BLK_DIR_TO_DEV)
+ memcpy(bdev->data + offset, task->iovs[i].iov_base,
+ task->iovs[i].iov_len);
+ else
+ memcpy(task->iovs[i].iov_base, bdev->data + offset,
+ task->iovs[i].iov_len);
+ offset += task->iovs[i].iov_len;
+ nbytes += task->iovs[i].iov_len;
+ }
+
+ return nbytes;
+}
+
+int
+vhost_bdev_process_blk_commands(struct vhost_block_dev *bdev,
+ struct vhost_blk_task *task)
+{
+ int used_len;
+
+ if (unlikely(task->data_len > (bdev->blockcnt * bdev->blocklen))) {
+ fprintf(stderr, "read or write beyond capacity\n");
+ return VIRTIO_BLK_S_UNSUPP;
+ }
+
+ switch (task->req->type) {
+ case VIRTIO_BLK_T_IN:
+ if (unlikely(task->data_len == 0 ||
+ (task->data_len & (512 - 1)) != 0)) {
+ fprintf(stderr,
+ "%s - passed IO buffer is not a multiple of 512B "
+ "(req_idx = %"PRIu16").\n",
+ task->req->type ? "WRITE" : "READ",
+ task->head_idx);
+ return VIRTIO_BLK_S_UNSUPP;
+ }
+
+ task->dxfer_dir = BLK_DIR_FROM_DEV;
+ vhost_bdev_blk_readwrite(bdev, task,
+ task->req->sector, task->data_len);
+ break;
+ case VIRTIO_BLK_T_OUT:
+ if (unlikely(task->data_len == 0 ||
+ (task->data_len & (512 - 1)) != 0)) {
+ fprintf(stderr,
+ "%s - passed IO buffer is not a multiple of 512B "
+ "(req_idx = %"PRIu16").\n",
+ task->req->type ? "WRITE" : "READ",
+ task->head_idx);
+ return VIRTIO_BLK_S_UNSUPP;
+ }
+
+ if (task->readtype) {
+ fprintf(stderr, "type isn't right\n");
+ return VIRTIO_BLK_S_IOERR;
+ }
+ task->dxfer_dir = BLK_DIR_TO_DEV;
+ vhost_bdev_blk_readwrite(bdev, task,
+ task->req->sector, task->data_len);
+ break;
+ case VIRTIO_BLK_T_GET_ID:
+ if (!task->iovs_cnt || !task->data_len)
+ return VIRTIO_BLK_S_UNSUPP;
+ used_len = min(VIRTIO_BLK_ID_BYTES, task->data_len);
+ vhost_strcpy_pad(task->iovs[0].iov_base,
+ bdev->product_name, used_len, ' ');
+ break;
+ default:
+ fprintf(stderr, "unsupported cmd\n");
+ return VIRTIO_BLK_S_UNSUPP;
+ }
+
+ return VIRTIO_BLK_S_OK;
+}
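The 512-byte sector-alignment rule that ``vhost_bdev_process_blk_commands()`` enforces on read/write payloads can be sketched in isolation. The helper name ``blk_payload_is_valid`` is hypothetical; the bitmask test matches the sample code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-alone version of the payload check in
 * vhost_bdev_process_blk_commands(): a virtio-blk read/write payload must
 * be non-empty and a multiple of the 512-byte sector size. The bitmask
 * form (len & 511) == 0 works because 512 is a power of two. */
static bool
blk_payload_is_valid(uint32_t data_len)
{
	return data_len != 0 && (data_len & (512 - 1)) == 0;
}
```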
diff --git a/examples/vhost_blk/blk_spec.h b/examples/vhost_blk/blk_spec.h
new file mode 100644
index 000000000..5875e2f86
--- /dev/null
+++ b/examples/vhost_blk/blk_spec.h
@@ -0,0 +1,95 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Intel Corporation
+ */
+
+#ifndef _BLK_SPEC_H
+#define _BLK_SPEC_H
+
+#include <stdint.h>
+
+#ifndef VHOST_USER_MEMORY_MAX_NREGIONS
+#define VHOST_USER_MEMORY_MAX_NREGIONS 8
+#endif
+
+#ifndef VHOST_USER_MAX_CONFIG_SIZE
+#define VHOST_USER_MAX_CONFIG_SIZE 256
+#endif
+
+#ifndef VHOST_USER_PROTOCOL_F_CONFIG
+#define VHOST_USER_PROTOCOL_F_CONFIG 9
+#endif
+
+#ifndef VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD
+#define VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD 12
+#endif
+
+#define VIRTIO_BLK_ID_BYTES 20 /* ID string length */
+
+#define VIRTIO_BLK_T_IN 0
+#define VIRTIO_BLK_T_OUT 1
+#define VIRTIO_BLK_T_FLUSH 4
+#define VIRTIO_BLK_T_GET_ID 8
+#define VIRTIO_BLK_T_DISCARD 11
+#define VIRTIO_BLK_T_WRITE_ZEROES 13
+
+#define VIRTIO_BLK_S_OK 0
+#define VIRTIO_BLK_S_IOERR 1
+#define VIRTIO_BLK_S_UNSUPP 2
+
+enum vhost_user_request {
+ VHOST_USER_NONE = 0,
+ VHOST_USER_GET_FEATURES = 1,
+ VHOST_USER_SET_FEATURES = 2,
+ VHOST_USER_SET_OWNER = 3,
+ VHOST_USER_RESET_OWNER = 4,
+ VHOST_USER_SET_MEM_TABLE = 5,
+ VHOST_USER_SET_LOG_BASE = 6,
+ VHOST_USER_SET_LOG_FD = 7,
+ VHOST_USER_SET_VRING_NUM = 8,
+ VHOST_USER_SET_VRING_ADDR = 9,
+ VHOST_USER_SET_VRING_BASE = 10,
+ VHOST_USER_GET_VRING_BASE = 11,
+ VHOST_USER_SET_VRING_KICK = 12,
+ VHOST_USER_SET_VRING_CALL = 13,
+ VHOST_USER_SET_VRING_ERR = 14,
+ VHOST_USER_GET_PROTOCOL_FEATURES = 15,
+ VHOST_USER_SET_PROTOCOL_FEATURES = 16,
+ VHOST_USER_GET_QUEUE_NUM = 17,
+ VHOST_USER_SET_VRING_ENABLE = 18,
+ VHOST_USER_MAX
+};
+
+/** Get/set config msg payload */
+struct vhost_user_config {
+ uint32_t offset;
+ uint32_t size;
+ uint32_t flags;
+ uint8_t region[VHOST_USER_MAX_CONFIG_SIZE];
+};
+
+/** Fixed-size vhost_memory struct */
+struct vhost_memory_padded {
+ uint32_t nregions;
+ uint32_t padding;
+ struct vhost_memory_region regions[VHOST_USER_MEMORY_MAX_NREGIONS];
+};
+
+struct vhost_user_msg {
+ enum vhost_user_request request;
+
+#define VHOST_USER_VERSION_MASK 0x3
+#define VHOST_USER_REPLY_MASK (0x1 << 2)
+ uint32_t flags;
+ uint32_t size; /**< the following payload size */
+ union {
+#define VHOST_USER_VRING_IDX_MASK 0xff
+#define VHOST_USER_VRING_NOFD_MASK (0x1 << 8)
+ uint64_t u64;
+ struct vhost_vring_state state;
+ struct vhost_vring_addr addr;
+ struct vhost_memory_padded memory;
+ struct vhost_user_config cfg;
+ } payload;
+} __attribute__((packed));
+
+#endif
diff --git a/examples/vhost_blk/meson.build b/examples/vhost_blk/meson.build
new file mode 100644
index 000000000..857367192
--- /dev/null
+++ b/examples/vhost_blk/meson.build
@@ -0,0 +1,21 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2017 Intel Corporation
+
+# meson file, for building this example as part of a main DPDK build.
+#
+# To build this example as a standalone application with an already-installed
+# DPDK instance, use 'make'
+
+if not is_linux
+ build = false
+endif
+
+if not cc.has_header('linux/virtio_blk.h')
+ build = false
+endif
+
+deps += 'vhost'
+allow_experimental_apis = true
+sources = files(
+ 'blk.c', 'vhost_blk.c', 'vhost_blk_compat.c'
+)
diff --git a/examples/vhost_blk/vhost_blk.c b/examples/vhost_blk/vhost_blk.c
new file mode 100644
index 000000000..24807c82f
--- /dev/null
+++ b/examples/vhost_blk/vhost_blk.c
@@ -0,0 +1,1094 @@
+// SPDX-License-Identifier: BSD-3-Clause
+// Copyright(c) 2010-2017 Intel Corporation
+
+#include <stdint.h>
+#include <unistd.h>
+#include <stdbool.h>
+#include <signal.h>
+#include <assert.h>
+#include <semaphore.h>
+#include <linux/virtio_blk.h>
+#include <linux/virtio_ring.h>
+
+#include <rte_atomic.h>
+#include <rte_cycles.h>
+#include <rte_log.h>
+#include <rte_malloc.h>
+#include <rte_vhost.h>
+
+#include "vhost_blk.h"
+#include "blk_spec.h"
+
+#define VIRTQ_DESC_F_NEXT 1
+#define VIRTQ_DESC_F_AVAIL (1 << 7)
+#define VIRTQ_DESC_F_USED (1 << 15)
+
+#define MAX_TASK 12
+
+#define VHOST_BLK_FEATURES ((1ULL << VIRTIO_F_RING_PACKED) | \
+ (1ULL << VIRTIO_F_VERSION_1) |\
+ (1ULL << VIRTIO_F_NOTIFY_ON_EMPTY) | \
+ (1ULL << VHOST_USER_F_PROTOCOL_FEATURES))
+
+/* Path of the vhost-user socket. Can be set by the user. */
+static char dev_pathname[PATH_MAX] = "";
+static sem_t exit_sem;
+static int g_should_stop = -1;
+
+struct vhost_blk_ctrlr *
+vhost_blk_ctrlr_find(const char *ctrlr_name)
+{
+ if (ctrlr_name == NULL)
+ return NULL;
+
+ /* currently we only support 1 socket file fd */
+ return g_vhost_ctrlr;
+}
+
+static uint64_t gpa_to_vva(int vid, uint64_t gpa, uint64_t *len)
+{
+ char path[PATH_MAX];
+ struct vhost_blk_ctrlr *ctrlr;
+ int ret = 0;
+
+ ret = rte_vhost_get_ifname(vid, path, PATH_MAX);
+ if (ret) {
+ fprintf(stderr, "Cannot get socket name\n");
+ assert(ret != 0);
+ }
+
+ ctrlr = vhost_blk_ctrlr_find(path);
+ if (!ctrlr) {
+ fprintf(stderr, "Controller is not ready\n");
+ assert(ctrlr != NULL);
+ }
+
+ assert(ctrlr->mem != NULL);
+
+ return rte_vhost_va_from_guest_pa(ctrlr->mem, gpa, len);
+}
+
+static struct vring_packed_desc *
+descriptor_get_next_packed(struct rte_vhost_vring *vq,
+ uint16_t *idx)
+{
+ if (vq->desc_packed[*idx % vq->size].flags & VIRTQ_DESC_F_NEXT) {
+ *idx += 1;
+ return &vq->desc_packed[*idx % vq->size];
+ }
+
+ return NULL;
+}
+
+static bool
+descriptor_has_next_packed(struct vring_packed_desc *cur_desc)
+{
+ return !!(cur_desc->flags & VRING_DESC_F_NEXT);
+}
+
+static bool
+descriptor_is_wr_packed(struct vring_packed_desc *cur_desc)
+{
+ return !!(cur_desc->flags & VRING_DESC_F_WRITE);
+}
+
+static struct rte_vhost_inflight_desc_packed *
+inflight_desc_get_next(struct rte_vhost_inflight_info_packed *inflight_packed,
+ struct rte_vhost_inflight_desc_packed *cur_desc)
+{
+ if (!!(cur_desc->flags & VIRTQ_DESC_F_NEXT))
+ return &inflight_packed->desc[cur_desc->next];
+
+ return NULL;
+}
+
+static bool
+inflight_desc_has_next(struct rte_vhost_inflight_desc_packed *cur_desc)
+{
+ return !!(cur_desc->flags & VRING_DESC_F_NEXT);
+}
+
+static bool
+inflight_desc_is_wr(struct rte_vhost_inflight_desc_packed *cur_desc)
+{
+ return !!(cur_desc->flags & VRING_DESC_F_WRITE);
+}
+
+static void
+inflight_process_payload_chain_packed(struct inflight_blk_task *task)
+{
+ void *data;
+ uint64_t chunck_len;
+ struct vhost_blk_task *blk_task;
+ struct rte_vhost_inflight_desc_packed *desc;
+
+ blk_task = &task->blk_task;
+ blk_task->iovs_cnt = 0;
+
+ do {
+ desc = task->inflight_desc;
+ chunck_len = desc->len;
+ data = (void *)(uintptr_t)gpa_to_vva(blk_task->bdev->vid,
+ desc->addr,
+ &chunck_len);
+ if (!data || chunck_len != desc->len) {
+ fprintf(stderr, "failed to translate desc address.\n");
+ return;
+ }
+
+ blk_task->iovs[blk_task->iovs_cnt].iov_base = data;
+ blk_task->iovs[blk_task->iovs_cnt].iov_len = desc->len;
+ blk_task->data_len += desc->len;
+ blk_task->iovs_cnt++;
+ task->inflight_desc = inflight_desc_get_next(
+ task->inflight_packed, desc);
+ } while (inflight_desc_has_next(task->inflight_desc));
+
+ chunck_len = task->inflight_desc->len;
+ blk_task->status = (void *)(uintptr_t)gpa_to_vva(
+ blk_task->bdev->vid, task->inflight_desc->addr, &chunck_len);
+ if (!blk_task->status || chunck_len != task->inflight_desc->len)
+ fprintf(stderr, "failed to translate desc address.\n");
+}
+
+static void
+inflight_submit_completion_packed(struct inflight_blk_task *task,
+ uint32_t q_idx, uint16_t *used_id,
+ bool *used_wrap_counter)
+{
+ struct vhost_blk_ctrlr *ctrlr;
+ struct rte_vhost_vring *vq;
+ struct vring_packed_desc *desc;
+ int ret;
+
+ ctrlr = vhost_blk_ctrlr_find(dev_pathname);
+ vq = task->blk_task.vq;
+
+ ret = rte_vhost_set_last_inflight_io_packed(ctrlr->bdev->vid, q_idx,
+ task->blk_task.head_idx);
+ if (ret != 0)
+ fprintf(stderr, "failed to set last inflight io\n");
+
+ desc = &vq->desc_packed[*used_id];
+ desc->id = task->blk_task.buffer_id;
+ rte_smp_mb();
+ if (*used_wrap_counter)
+ desc->flags |= VIRTQ_DESC_F_AVAIL | VIRTQ_DESC_F_USED;
+ else
+ desc->flags &= ~(VIRTQ_DESC_F_AVAIL | VIRTQ_DESC_F_USED);
+ rte_smp_mb();
+
+ *used_id += task->blk_task.iovs_cnt + 2;
+ if (*used_id >= vq->size) {
+ *used_id -= vq->size;
+ *used_wrap_counter = !(*used_wrap_counter);
+ }
+
+ ret = rte_vhost_clr_inflight_desc_packed(ctrlr->bdev->vid, q_idx,
+ task->blk_task.head_idx);
+ if (ret != 0)
+ fprintf(stderr, "failed to clear inflight io\n");
+
+ /* Send an interrupt back to the guest VM so that it knows
+ * a completion is ready to be processed.
+ */
+ rte_vhost_vring_call(task->blk_task.bdev->vid, q_idx);
+}
+
+static void
+submit_completion_packed(struct vhost_blk_task *task, uint32_t q_idx,
+ uint16_t *used_id, bool *used_wrap_counter)
+{
+ struct vhost_blk_ctrlr *ctrlr;
+ struct rte_vhost_vring *vq;
+ struct vring_packed_desc *desc;
+ int ret;
+
+ ctrlr = vhost_blk_ctrlr_find(dev_pathname);
+ vq = task->vq;
+
+ ret = rte_vhost_set_last_inflight_io_packed(ctrlr->bdev->vid, q_idx,
+ task->inflight_idx);
+ if (ret != 0)
+ fprintf(stderr, "failed to set last inflight io\n");
+
+ desc = &vq->desc_packed[*used_id];
+ desc->id = task->buffer_id;
+ rte_smp_mb();
+ if (*used_wrap_counter)
+ desc->flags |= VIRTQ_DESC_F_AVAIL | VIRTQ_DESC_F_USED;
+ else
+ desc->flags &= ~(VIRTQ_DESC_F_AVAIL | VIRTQ_DESC_F_USED);
+ rte_smp_mb();
+
+ *used_id += task->iovs_cnt + 2;
+ if (*used_id >= vq->size) {
+ *used_id -= vq->size;
+ *used_wrap_counter = !(*used_wrap_counter);
+ }
+
+ ret = rte_vhost_clr_inflight_desc_packed(ctrlr->bdev->vid, q_idx,
+ task->inflight_idx);
+ if (ret != 0)
+ fprintf(stderr, "failed to clear inflight io\n");
+
+ /* Send an interrupt back to the guest VM so that it knows
+ * a completion is ready to be processed.
+ */
+ rte_vhost_vring_call(task->bdev->vid, q_idx);
+}
+
+static void
+vhost_process_payload_chain_packed(struct vhost_blk_task *task,
+ uint16_t *idx)
+{
+ void *data;
+ uint64_t chunck_len;
+
+ task->iovs_cnt = 0;
+
+ do {
+ chunck_len = task->desc_packed->len;
+ data = (void *)(uintptr_t)gpa_to_vva(task->bdev->vid,
+ task->desc_packed->addr,
+ &chunck_len);
+ if (!data || chunck_len != task->desc_packed->len) {
+ fprintf(stderr, "failed to translate desc address.\n");
+ return;
+ }
+
+ task->iovs[task->iovs_cnt].iov_base = data;
+ task->iovs[task->iovs_cnt].iov_len = task->desc_packed->len;
+ task->data_len += task->desc_packed->len;
+ task->iovs_cnt++;
+ task->desc_packed = descriptor_get_next_packed(task->vq, idx);
+ } while (descriptor_has_next_packed(task->desc_packed));
+
+ task->last_idx = *idx % task->vq->size;
+ chunck_len = task->desc_packed->len;
+ task->status = (void *)(uintptr_t)gpa_to_vva(task->bdev->vid,
+ task->desc_packed->addr,
+ &chunck_len);
+ if (!task->status || chunck_len != task->desc_packed->len)
+ fprintf(stderr, "failed to translate desc address.\n");
+}
+
+
+static int
+descriptor_is_available(struct rte_vhost_vring *vring, uint16_t idx,
+ bool avail_wrap_counter)
+{
+ uint16_t flags = vring->desc_packed[idx].flags;
+
+ return ((!!(flags & VIRTQ_DESC_F_AVAIL) == avail_wrap_counter) &&
+ (!!(flags & VIRTQ_DESC_F_USED) != avail_wrap_counter));
+}
+
+static void
+process_requestq_packed(struct vhost_blk_ctrlr *ctrlr, uint32_t q_idx)
+{
+ bool avail_wrap_counter, used_wrap_counter;
+ uint16_t avail_idx, used_idx;
+ int ret;
+ uint64_t chunck_len;
+ struct vhost_blk_queue *blk_vq;
+ struct rte_vhost_vring *vq;
+ struct vhost_blk_task *task;
+
+ blk_vq = &ctrlr->bdev->queues[q_idx];
+ vq = &blk_vq->vq;
+
+ avail_idx = blk_vq->last_avail_idx;
+ avail_wrap_counter = blk_vq->avail_wrap_counter;
+ used_idx = blk_vq->last_used_idx;
+ used_wrap_counter = blk_vq->used_wrap_counter;
+
+ task = rte_zmalloc(NULL, sizeof(*task), 0);
+ assert(task != NULL);
+ task->vq = vq;
+ task->bdev = ctrlr->bdev;
+
+ while (descriptor_is_available(vq, avail_idx, avail_wrap_counter)) {
+ task->head_idx = avail_idx;
+ task->desc_packed = &task->vq->desc_packed[task->head_idx];
+ task->iovs_cnt = 0;
+ task->data_len = 0;
+ task->req = NULL;
+ task->status = NULL;
+
+ /* does not support indirect descriptors */
+ assert((task->desc_packed->flags & VRING_DESC_F_INDIRECT) == 0);
+
+ chunck_len = task->desc_packed->len;
+ task->req = (void *)(uintptr_t)gpa_to_vva(task->bdev->vid,
+ task->desc_packed->addr, &chunck_len);
+ if (!task->req || chunck_len != task->desc_packed->len) {
+ fprintf(stderr, "failed to translate desc address.\n");
+ rte_free(task);
+ return;
+ }
+
+ task->desc_packed = descriptor_get_next_packed(task->vq,
+ &avail_idx);
+ assert(task->desc_packed != NULL);
+ if (!descriptor_has_next_packed(task->desc_packed)) {
+ task->dxfer_dir = BLK_DIR_NONE;
+ task->last_idx = avail_idx % vq->size;
+ chunck_len = task->desc_packed->len;
+ task->status = (void *)(uintptr_t)
+ gpa_to_vva(task->bdev->vid,
+ task->desc_packed->addr,
+ &chunck_len);
+ if (!task->status ||
+ chunck_len != task->desc_packed->len) {
+ fprintf(stderr,
+ "failed to translate desc address.\n");
+ rte_free(task);
+ return;
+ }
+ } else {
+ task->readtype = descriptor_is_wr_packed(
+ task->desc_packed);
+ vhost_process_payload_chain_packed(task, &avail_idx);
+ }
+ task->buffer_id = vq->desc_packed[task->last_idx].id;
+ rte_vhost_set_inflight_desc_packed(ctrlr->bdev->vid, q_idx,
+ task->head_idx,
+ task->last_idx,
+ &task->inflight_idx);
+
+ if (++avail_idx >= vq->size) {
+ avail_idx -= vq->size;
+ avail_wrap_counter = !avail_wrap_counter;
+ }
+ blk_vq->last_avail_idx = avail_idx;
+ blk_vq->avail_wrap_counter = avail_wrap_counter;
+
+ ret = vhost_bdev_process_blk_commands(ctrlr->bdev, task);
+ if (ret) {
+ /* request failed */
+ *task->status = VIRTIO_BLK_S_IOERR;
+ } else {
+ /* request succeeded */
+ *task->status = VIRTIO_BLK_S_OK;
+ }
+
+ submit_completion_packed(task, q_idx, &used_idx,
+ &used_wrap_counter);
+ blk_vq->last_used_idx = used_idx;
+ blk_vq->used_wrap_counter = used_wrap_counter;
+ }
+
+ rte_free(task);
+}
+
+static void
+submit_inflight_vq_packed(struct vhost_blk_ctrlr *ctrlr,
+ uint16_t q_idx)
+{
+ bool used_wrap_counter;
+ int req_idx, ret;
+ uint16_t used_idx;
+ uint64_t chunck_len;
+ struct vhost_blk_queue *blk_vq;
+ struct rte_vhost_ring_inflight *inflight_vq;
+ struct rte_vhost_resubmit_info *resubmit_info;
+ struct rte_vhost_vring *vq;
+ struct inflight_blk_task *task;
+ struct vhost_blk_task *blk_task;
+ struct rte_vhost_inflight_info_packed *inflight_info;
+
+ blk_vq = &ctrlr->bdev->queues[q_idx];
+ vq = &blk_vq->vq;
+ inflight_vq = &blk_vq->inflight_vq;
+ resubmit_info = inflight_vq->resubmit_inflight;
+ inflight_info = inflight_vq->inflight_packed;
+ used_idx = blk_vq->last_used_idx;
+ used_wrap_counter = blk_vq->used_wrap_counter;
+
+ task = rte_malloc(NULL, sizeof(*task), 0);
+ if (!task) {
+ fprintf(stderr, "failed to allocate memory\n");
+ return;
+ }
+ blk_task = &task->blk_task;
+ blk_task->vq = vq;
+ blk_task->bdev = ctrlr->bdev;
+ task->inflight_packed = inflight_vq->inflight_packed;
+
+ while (resubmit_info->resubmit_num-- > 0) {
+ req_idx = resubmit_info->resubmit_num;
+ blk_task->head_idx =
+ resubmit_info->resubmit_list[req_idx].index;
+ task->inflight_desc =
+ &inflight_info->desc[blk_task->head_idx];
+ task->blk_task.iovs_cnt = 0;
+ task->blk_task.data_len = 0;
+ task->blk_task.req = NULL;
+ task->blk_task.status = NULL;
+
+ /* Also update the avail idx, as its initial
+ * value equals the used idx.
+ */
+ blk_vq->last_avail_idx += task->inflight_desc->num;
+ if (blk_vq->last_avail_idx >= vq->size) {
+ blk_vq->last_avail_idx -= vq->size;
+ blk_vq->avail_wrap_counter =
+ !blk_vq->avail_wrap_counter;
+ }
+
+ /* does not support indirect descriptors */
+ assert(task->inflight_desc != NULL);
+ assert((task->inflight_desc->flags &
+ VRING_DESC_F_INDIRECT) == 0);
+
+ chunck_len = task->inflight_desc->len;
+ blk_task->req = (void *)(uintptr_t)
+ gpa_to_vva(blk_task->bdev->vid,
+ task->inflight_desc->addr,
+ &chunck_len);
+ if (!blk_task->req ||
+ chunck_len != task->inflight_desc->len) {
+ fprintf(stderr, "failed to translate desc address.\n");
+ rte_free(task);
+ return;
+ }
+
+ task->inflight_desc = inflight_desc_get_next(
+ task->inflight_packed, task->inflight_desc);
+ assert(task->inflight_desc != NULL);
+ if (!inflight_desc_has_next(task->inflight_desc)) {
+ blk_task->dxfer_dir = BLK_DIR_NONE;
+ chunck_len = task->inflight_desc->len;
+ blk_task->status = (void *)(uintptr_t)
+ gpa_to_vva(blk_task->bdev->vid,
+ task->inflight_desc->addr,
+ &chunck_len);
+ if (!blk_task->status ||
+ chunck_len != task->inflight_desc->len) {
+ fprintf(stderr,
+ "failed to translate desc address.\n");
+ rte_free(task);
+ return;
+ }
+ } else {
+ blk_task->readtype =
+ inflight_desc_is_wr(task->inflight_desc);
+ inflight_process_payload_chain_packed(task);
+ }
+
+ blk_task->buffer_id = task->inflight_desc->id;
+
+ ret = vhost_bdev_process_blk_commands(ctrlr->bdev, blk_task);
+ if (ret)
+ /* request failed */
+ *blk_task->status = VIRTIO_BLK_S_IOERR;
+ else
+ /* request succeeded */
+ *blk_task->status = VIRTIO_BLK_S_OK;
+
+ inflight_submit_completion_packed(task, q_idx, &used_idx,
+ &used_wrap_counter);
+
+ blk_vq->last_used_idx = used_idx;
+ blk_vq->used_wrap_counter = used_wrap_counter;
+ }
+
+ rte_free(task);
+}
+
+static struct vring_desc *
+descriptor_get_next_split(struct vring_desc *vq_desc,
+ struct vring_desc *cur_desc)
+{
+ return &vq_desc[cur_desc->next];
+}
+
+static bool
+descriptor_has_next_split(struct vring_desc *cur_desc)
+{
+ return !!(cur_desc->flags & VRING_DESC_F_NEXT);
+}
+
+static bool
+descriptor_is_wr_split(struct vring_desc *cur_desc)
+{
+ return !!(cur_desc->flags & VRING_DESC_F_WRITE);
+}
+
+static void
+vhost_process_payload_chain_split(struct vhost_blk_task *task)
+{
+ void *data;
+ uint64_t chunck_len;
+
+ task->iovs_cnt = 0;
+
+ do {
+ chunck_len = task->desc_split->len;
+ data = (void *)(uintptr_t)gpa_to_vva(task->bdev->vid,
+ task->desc_split->addr,
+ &chunck_len);
+ if (!data || chunck_len != task->desc_split->len) {
+ fprintf(stderr, "failed to translate desc address.\n");
+ return;
+ }
+
+ task->iovs[task->iovs_cnt].iov_base = data;
+ task->iovs[task->iovs_cnt].iov_len = task->desc_split->len;
+ task->data_len += task->desc_split->len;
+ task->iovs_cnt++;
+ task->desc_split =
+ descriptor_get_next_split(task->vq->desc, task->desc_split);
+ } while (descriptor_has_next_split(task->desc_split));
+
+ chunck_len = task->desc_split->len;
+ task->status = (void *)(uintptr_t)gpa_to_vva(task->bdev->vid,
+ task->desc_split->addr,
+ &chunck_len);
+ if (!task->status || chunck_len != task->desc_split->len)
+ fprintf(stderr, "failed to translate desc address.\n");
+}
+
+static void
+submit_completion_split(struct vhost_blk_task *task, uint32_t vid,
+ uint32_t q_idx)
+{
+ struct rte_vhost_vring *vq;
+ struct vring_used *used;
+
+ vq = task->vq;
+ used = vq->used;
+
+ rte_vhost_set_last_inflight_io_split(vid, q_idx, task->req_idx);
+
+ /* Fill out the next entry in the "used" ring. id = the
+ * index of the descriptor that contained the blk request.
+ * len = the total amount of data transferred for the blk
+ * request. We must report the correct len, as we may
+ * return less data than allocated by the guest VM.
+ */
+ used->ring[used->idx & (vq->size - 1)].id = task->req_idx;
+ used->ring[used->idx & (vq->size - 1)].len = task->data_len;
+ rte_smp_mb();
+ used->idx++;
+ rte_smp_mb();
+
+ rte_vhost_clr_inflight_desc_split(vid, q_idx, used->idx, task->req_idx);
+
+ /* Send an interrupt back to the guest VM so that it knows
+ * a completion is ready to be processed.
+ */
+ rte_vhost_vring_call(task->bdev->vid, q_idx);
+}
+
+static void
+submit_inflight_vq_split(struct vhost_blk_ctrlr *ctrlr,
+ uint32_t q_idx)
+{
+ struct vhost_blk_queue *blk_vq;
+ struct rte_vhost_ring_inflight *inflight_vq;
+ struct rte_vhost_resubmit_info *resubmit_inflight;
+ struct rte_vhost_resubmit_desc *resubmit_list;
+ struct vhost_blk_task *task;
+ int req_idx;
+ uint64_t chunck_len;
+ int ret;
+
+ blk_vq = &ctrlr->bdev->queues[q_idx];
+ inflight_vq = &blk_vq->inflight_vq;
+ resubmit_inflight = inflight_vq->resubmit_inflight;
+ resubmit_list = resubmit_inflight->resubmit_list;
+
+ task = rte_zmalloc(NULL, sizeof(*task), 0);
+ assert(task != NULL);
+
+ task->ctrlr = ctrlr;
+ task->bdev = ctrlr->bdev;
+ task->vq = &blk_vq->vq;
+
+ while (resubmit_inflight->resubmit_num-- > 0) {
+ req_idx = resubmit_list[resubmit_inflight->resubmit_num].index;
+ task->req_idx = req_idx;
+ task->desc_split = &task->vq->desc[task->req_idx];
+ task->iovs_cnt = 0;
+ task->data_len = 0;
+ task->req = NULL;
+ task->status = NULL;
+
+ /* does not support indirect descriptors */
+ assert(task->desc_split != NULL);
+ assert((task->desc_split->flags & VRING_DESC_F_INDIRECT) == 0);
+
+ chunck_len = task->desc_split->len;
+ task->req = (void *)(uintptr_t)gpa_to_vva(task->bdev->vid,
+ task->desc_split->addr, &chunck_len);
+ if (!task->req || chunck_len != task->desc_split->len) {
+ fprintf(stderr, "failed to translate desc address.\n");
+ rte_free(task);
+ return;
+ }
+
+ task->desc_split = descriptor_get_next_split(task->vq->desc,
+ task->desc_split);
+ if (!descriptor_has_next_split(task->desc_split)) {
+ task->dxfer_dir = BLK_DIR_NONE;
+ chunck_len = task->desc_split->len;
+ task->status = (void *)(uintptr_t)
+ gpa_to_vva(task->bdev->vid,
+ task->desc_split->addr,
+ &chunck_len);
+ if (!task->status ||
+ chunck_len != task->desc_split->len) {
+ fprintf(stderr,
+ "failed to translate desc address.\n");
+ rte_free(task);
+ return;
+ }
+ } else {
+ task->readtype =
+ descriptor_is_wr_split(task->desc_split);
+ vhost_process_payload_chain_split(task);
+ }
+
+ ret = vhost_bdev_process_blk_commands(ctrlr->bdev, task);
+ if (ret) {
+ /* request failed */
+ *task->status = VIRTIO_BLK_S_IOERR;
+ } else {
+ /* request succeeded */
+ *task->status = VIRTIO_BLK_S_OK;
+ }
+ submit_completion_split(task, ctrlr->bdev->vid, q_idx);
+ }
+
+ rte_free(task);
+}
+
+static void
+process_requestq_split(struct vhost_blk_ctrlr *ctrlr, uint32_t q_idx)
+{
+ int ret;
+ int req_idx;
+ uint16_t last_idx;
+ uint64_t chunck_len;
+ struct vhost_blk_queue *blk_vq;
+ struct rte_vhost_vring *vq;
+ struct vhost_blk_task *task;
+
+ blk_vq = &ctrlr->bdev->queues[q_idx];
+ vq = &blk_vq->vq;
+
+ task = rte_zmalloc(NULL, sizeof(*task), 0);
+ assert(task != NULL);
+ task->ctrlr = ctrlr;
+ task->bdev = ctrlr->bdev;
+ task->vq = vq;
+
+ while (vq->avail->idx != blk_vq->last_avail_idx) {
+ last_idx = blk_vq->last_avail_idx & (vq->size - 1);
+ req_idx = vq->avail->ring[last_idx];
+ task->req_idx = req_idx;
+ task->desc_split = &task->vq->desc[task->req_idx];
+ task->iovs_cnt = 0;
+ task->data_len = 0;
+ task->req = NULL;
+ task->status = NULL;
+
+ rte_vhost_set_inflight_desc_split(ctrlr->bdev->vid, q_idx,
+ task->req_idx);
+
+ /* does not support indirect descriptors */
+ assert((task->desc_split->flags & VRING_DESC_F_INDIRECT) == 0);
+
+ chunck_len = task->desc_split->len;
+ task->req = (void *)(uintptr_t)gpa_to_vva(task->bdev->vid,
+ task->desc_split->addr, &chunck_len);
+ if (!task->req || chunck_len != task->desc_split->len) {
+ fprintf(stderr, "failed to translate desc address.\n");
+ rte_free(task);
+ return;
+ }
+
+ task->desc_split = descriptor_get_next_split(task->vq->desc,
+ task->desc_split);
+ if (!descriptor_has_next_split(task->desc_split)) {
+ task->dxfer_dir = BLK_DIR_NONE;
+ chunck_len = task->desc_split->len;
+ task->status = (void *)(uintptr_t)
+ gpa_to_vva(task->bdev->vid,
+ task->desc_split->addr,
+ &chunck_len);
+ if (!task->status ||
+ chunck_len != task->desc_split->len) {
+ fprintf(stderr,
+ "failed to translate desc address.\n");
+ rte_free(task);
+ return;
+ }
+ } else {
+ task->readtype =
+ descriptor_is_wr_split(task->desc_split);
+ vhost_process_payload_chain_split(task);
+ }
+ blk_vq->last_avail_idx++;
+
+ ret = vhost_bdev_process_blk_commands(ctrlr->bdev, task);
+ if (ret) {
+ /* request failed */
+ *task->status = VIRTIO_BLK_S_IOERR;
+ } else {
+ /* request succeeded */
+ *task->status = VIRTIO_BLK_S_OK;
+ }
+
+ submit_completion_split(task, ctrlr->bdev->vid, q_idx);
+ }
+
+ rte_free(task);
+}
+
+static void *
+ctrlr_worker(void *arg)
+{
+ struct vhost_blk_ctrlr *ctrlr = (struct vhost_blk_ctrlr *)arg;
+ struct vhost_blk_queue *blk_vq;
+ struct rte_vhost_ring_inflight *inflight_vq;
+ cpu_set_t cpuset;
+ pthread_t thread;
+ int i;
+
+ fprintf(stdout, "Ctrlr Worker Thread start\n");
+
+ if (ctrlr == NULL || ctrlr->bdev == NULL) {
+ fprintf(stderr,
+ "%s: Error, invalid argument passed to worker thread\n",
+ __func__);
+ exit(0);
+ }
+
+ thread = pthread_self();
+ CPU_ZERO(&cpuset);
+ CPU_SET(0, &cpuset);
+ pthread_setaffinity_np(thread, sizeof(cpu_set_t), &cpuset);
+
+ for (i = 0; i < NUM_OF_BLK_QUEUES; i++) {
+ blk_vq = &ctrlr->bdev->queues[i];
+ inflight_vq = &blk_vq->inflight_vq;
+ if (inflight_vq->resubmit_inflight != NULL &&
+ inflight_vq->resubmit_inflight->resubmit_num != 0) {
+ if (ctrlr->packed_ring)
+ submit_inflight_vq_packed(ctrlr, i);
+ else
+ submit_inflight_vq_split(ctrlr, i);
+ }
+ }
+
+ while (!g_should_stop && ctrlr->bdev != NULL) {
+ for (i = 0; i < NUM_OF_BLK_QUEUES; i++) {
+ if (ctrlr->packed_ring)
+ process_requestq_packed(ctrlr, i);
+ else
+ process_requestq_split(ctrlr, i);
+ }
+ }
+
+ g_should_stop = 2;
+ fprintf(stdout, "Ctrlr Worker Thread Exiting\n");
+ sem_post(&exit_sem);
+ return NULL;
+}
+
+static int
+new_device(int vid)
+{
+ struct vhost_blk_ctrlr *ctrlr;
+ struct vhost_blk_queue *blk_vq;
+ struct rte_vhost_vring *vq;
+ uint64_t features;
+ pthread_t tid;
+ int i, ret;
+
+ ctrlr = vhost_blk_ctrlr_find(dev_pathname);
+ if (!ctrlr) {
+ fprintf(stderr, "Controller is not ready\n");
+ return -1;
+ }
+
+ if (ctrlr->started)
+ return 0;
+
+ ctrlr->bdev->vid = vid;
+ ret = rte_vhost_get_negotiated_features(vid, &features);
+ if (ret) {
+ fprintf(stderr, "failed to get the negotiated features\n");
+ return -1;
+ }
+ ctrlr->packed_ring = !!(features & (1ULL << VIRTIO_F_RING_PACKED));
+
+ ret = rte_vhost_get_mem_table(vid, &ctrlr->mem);
+ if (ret)
+ fprintf(stderr, "Get Controller memory region failed\n");
+ assert(ctrlr->mem != NULL);
+
+ /* Disable Notifications and init last idx */
+ for (i = 0; i < NUM_OF_BLK_QUEUES; i++) {
+ blk_vq = &ctrlr->bdev->queues[i];
+ vq = &blk_vq->vq;
+
+ ret = rte_vhost_get_vhost_vring(ctrlr->bdev->vid, i, vq);
+ assert(ret == 0);
+
+ ret = rte_vhost_get_vring_base(ctrlr->bdev->vid, i,
+ &blk_vq->last_avail_idx,
+ &blk_vq->last_used_idx);
+ assert(ret == 0);
+
+ ret = rte_vhost_get_vhost_ring_inflight(ctrlr->bdev->vid, i,
+ &blk_vq->inflight_vq);
+ assert(ret == 0);
+
+ if (ctrlr->packed_ring) {
+ /* for the reconnection */
+ ret = rte_vhost_get_vring_base_from_inflight(
+ ctrlr->bdev->vid, i,
+ &blk_vq->last_avail_idx,
+ &blk_vq->last_used_idx);
+
+ blk_vq->avail_wrap_counter = blk_vq->last_avail_idx &
+ (1 << 15);
+ blk_vq->last_avail_idx = blk_vq->last_avail_idx &
+ 0x7fff;
+ blk_vq->used_wrap_counter = blk_vq->last_used_idx &
+ (1 << 15);
+ blk_vq->last_used_idx = blk_vq->last_used_idx &
+ 0x7fff;
+ }
+
+ rte_vhost_enable_guest_notification(vid, i, 0);
+ }
+
+ /* start polling vring */
+ g_should_stop = 0;
+ fprintf(stdout, "New Device %s, Device ID %d\n", dev_pathname, vid);
+ if (pthread_create(&tid, NULL, &ctrlr_worker, ctrlr) < 0) {
+ fprintf(stderr, "Worker thread start failed\n");
+ return -1;
+ }
+
+ /* device has been started */
+ ctrlr->started = 1;
+ pthread_detach(tid);
+ return 0;
+}
+
+static void
+destroy_device(int vid)
+{
+ char path[PATH_MAX];
+ struct vhost_blk_ctrlr *ctrlr;
+ struct vhost_blk_queue *blk_vq;
+ int i, ret;
+
+ ret = rte_vhost_get_ifname(vid, path, PATH_MAX);
+ if (ret) {
+ fprintf(stderr, "Destroy Ctrlr Failed\n");
+ return;
+ }
+
+ fprintf(stdout, "Destroy %s Device ID %d\n", path, vid);
+ ctrlr = vhost_blk_ctrlr_find(path);
+ if (!ctrlr) {
+ fprintf(stderr, "Destroy Ctrlr Failed\n");
+ return;
+ }
+
+ if (!ctrlr->started)
+ return;
+
+ g_should_stop = 1;
+ /* busy-wait until the worker thread acknowledges the stop request */
+ while (g_should_stop != 2)
+ ;
+
+ for (i = 0; i < NUM_OF_BLK_QUEUES; i++) {
+ blk_vq = &ctrlr->bdev->queues[i];
+ if (ctrlr->packed_ring) {
+ blk_vq->last_avail_idx |= (blk_vq->avail_wrap_counter <<
+ 15);
+ blk_vq->last_used_idx |= (blk_vq->used_wrap_counter <<
+ 15);
+ }
+ rte_vhost_set_vring_base(ctrlr->bdev->vid, i,
+ blk_vq->last_avail_idx,
+ blk_vq->last_used_idx);
+ }
+
+ free(ctrlr->mem);
+
+ ctrlr->started = 0;
+ sem_wait(&exit_sem);
+}
+
+static int
+new_connection(int vid)
+{
+ /* extend the proper features for block device */
+ vhost_session_install_rte_compat_hooks(vid);
+
+ return 0;
+}
+
+struct vhost_device_ops vhost_blk_device_ops = {
+ .new_device = new_device,
+ .destroy_device = destroy_device,
+ .new_connection = new_connection,
+};
+
+static struct vhost_block_dev *
+vhost_blk_bdev_construct(const char *bdev_name,
+ const char *bdev_serial, uint32_t blk_size, uint64_t blk_cnt,
+ bool wce_enable)
+{
+ struct vhost_block_dev *bdev;
+
+ bdev = rte_zmalloc(NULL, sizeof(*bdev), RTE_CACHE_LINE_SIZE);
+ if (!bdev)
+ return NULL;
+
+ /* bdev came from rte_zmalloc(), so copying size - 1 keeps NUL termination */
+ strncpy(bdev->name, bdev_name, sizeof(bdev->name) - 1);
+ strncpy(bdev->product_name, bdev_serial,
+ sizeof(bdev->product_name) - 1);
+ bdev->blocklen = blk_size;
+ bdev->blockcnt = blk_cnt;
+ bdev->write_cache = wce_enable;
+
+ fprintf(stdout, "blocklen=%d, blockcnt=%"PRIx64"\n", bdev->blocklen,
+ bdev->blockcnt);
+
+ /* use memory as disk storage space */
+ bdev->data = rte_zmalloc(NULL, blk_cnt * blk_size, 0);
+ if (!bdev->data) {
+ fprintf(stderr, "not enough reserved huge memory for disk\n");
+ rte_free(bdev);
+ return NULL;
+ }
+
+ return bdev;
+}
+
+static struct vhost_blk_ctrlr *
+vhost_blk_ctrlr_construct(const char *ctrlr_name)
+{
+ int ret;
+ struct vhost_blk_ctrlr *ctrlr;
+ char *path;
+ char cwd[PATH_MAX];
+
+ /* always use current directory */
+ path = getcwd(cwd, PATH_MAX);
+ if (!path) {
+ fprintf(stderr, "Cannot get current working directory\n");
+ return NULL;
+ }
+ snprintf(dev_pathname, sizeof(dev_pathname), "%s/%s", path, ctrlr_name);
+
+ if (access(dev_pathname, F_OK) != -1) {
+ if (unlink(dev_pathname) != 0)
+ rte_exit(EXIT_FAILURE, "Cannot remove %s.\n",
+ dev_pathname);
+ }
+
+ if (rte_vhost_driver_register(dev_pathname, 0) != 0) {
+ fprintf(stderr, "socket %s already exists\n", dev_pathname);
+ return NULL;
+ }
+
+ ret = rte_vhost_driver_set_features(dev_pathname, VHOST_BLK_FEATURES);
+ if (ret != 0) {
+ fprintf(stderr, "Set vhost driver features failed\n");
+ rte_vhost_driver_unregister(dev_pathname);
+ return NULL;
+ }
+
+ /* set proper features */
+ vhost_dev_install_rte_compat_hooks(dev_pathname);
+
+ ctrlr = rte_zmalloc(NULL, sizeof(*ctrlr), RTE_CACHE_LINE_SIZE);
+ if (!ctrlr) {
+ rte_vhost_driver_unregister(dev_pathname);
+ return NULL;
+ }
+
+ /* hardcoded block device information: 32768 blocks of 4096 bytes (128MiB) */
+ ctrlr->bdev = vhost_blk_bdev_construct("malloc0", "vhost_blk_malloc0",
+ 4096, 32768, 0);
+ if (!ctrlr->bdev) {
+ rte_free(ctrlr);
+ rte_vhost_driver_unregister(dev_pathname);
+ return NULL;
+ }
+
+ rte_vhost_driver_callback_register(dev_pathname,
+ &vhost_blk_device_ops);
+
+ return ctrlr;
+}
+
+static void
+signal_handler(__rte_unused int signum)
+{
+ struct vhost_blk_ctrlr *ctrlr;
+
+ if (access(dev_pathname, F_OK) == 0)
+ unlink(dev_pathname);
+
+ if (g_should_stop != -1) {
+ g_should_stop = 1;
+ while (g_should_stop != 2)
+ ;
+ }
+
+ ctrlr = vhost_blk_ctrlr_find(dev_pathname);
+ if (ctrlr != NULL) {
+ if (ctrlr->bdev != NULL) {
+ rte_free(ctrlr->bdev->data);
+ rte_free(ctrlr->bdev);
+ }
+ rte_free(ctrlr);
+ }
+
+ rte_vhost_driver_unregister(dev_pathname);
+ exit(0);
+}
+
+int main(int argc, char *argv[])
+{
+ int ret;
+
+ signal(SIGINT, signal_handler);
+
+ /* init EAL */
+ ret = rte_eal_init(argc, argv);
+ if (ret < 0)
+ rte_exit(EXIT_FAILURE, "Error with EAL initialization\n");
+
+ g_vhost_ctrlr = vhost_blk_ctrlr_construct("vhost.socket");
+ if (g_vhost_ctrlr == NULL) {
+ fprintf(stderr, "Construct vhost blk controller failed\n");
+ return 0;
+ }
+
+ if (sem_init(&exit_sem, 0, 0) < 0) {
+ fprintf(stderr, "Error init exit_sem\n");
+ return -1;
+ }
+
+ rte_vhost_driver_start(dev_pathname);
+
+ /* loop until the application is terminated */
+ while (1)
+ sleep(1);
+
+ return 0;
+}
+
diff --git a/examples/vhost_blk/vhost_blk.h b/examples/vhost_blk/vhost_blk.h
new file mode 100644
index 000000000..933e2b7c5
--- /dev/null
+++ b/examples/vhost_blk/vhost_blk.h
@@ -0,0 +1,127 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2017 Intel Corporation
+ */
+
+#ifndef _VHOST_BLK_H_
+#define _VHOST_BLK_H_
+
+#include <stdio.h>
+#include <sys/uio.h>
+#include <stdint.h>
+#include <stdbool.h>
+#include <linux/virtio_blk.h>
+#include <linux/virtio_ring.h>
+
+#include <rte_vhost.h>
+
+#ifndef VIRTIO_F_RING_PACKED
+#define VIRTIO_F_RING_PACKED 34
+
+struct vring_packed_desc {
+ /* Buffer Address. */
+ __le64 addr;
+ /* Buffer Length. */
+ __le32 len;
+ /* Buffer ID. */
+ __le16 id;
+ /* The flags depending on descriptor type. */
+ __le16 flags;
+};
+#endif
+
+struct vhost_blk_queue {
+ struct rte_vhost_vring vq;
+ struct rte_vhost_ring_inflight inflight_vq;
+ uint16_t last_avail_idx;
+ uint16_t last_used_idx;
+ bool avail_wrap_counter;
+ bool used_wrap_counter;
+};
+
+#define NUM_OF_BLK_QUEUES 1
+
+#define min(a, b) (((a) < (b)) ? (a) : (b))
+
+struct vhost_block_dev {
+ /** ID for vhost library. */
+ int vid;
+ /** Queues for the block device */
+ struct vhost_blk_queue queues[NUM_OF_BLK_QUEUES];
+ /** Unique name for this block device. */
+ char name[64];
+
+ /** Unique product name for this kind of block device. */
+ char product_name[256];
+
+ /** Size in bytes of a logical block for the backend */
+ uint32_t blocklen;
+
+ /** Number of blocks */
+ uint64_t blockcnt;
+
+ /** write cache enabled, not used at the moment */
+ int write_cache;
+
+ /** use memory as disk storage space */
+ uint8_t *data;
+};
+
+struct vhost_blk_ctrlr {
+ uint8_t started;
+ uint8_t packed_ring;
+ uint8_t need_restart;
+ /** Only support 1 LUN for the example */
+ struct vhost_block_dev *bdev;
+ /** VM memory region */
+ struct rte_vhost_memory *mem;
+} __rte_cache_aligned;
+
+#define VHOST_BLK_MAX_IOVS 128
+
+enum blk_data_dir {
+ BLK_DIR_NONE = 0,
+ BLK_DIR_TO_DEV = 1,
+ BLK_DIR_FROM_DEV = 2,
+};
+
+struct vhost_blk_task {
+ uint8_t readtype;
+ uint8_t req_idx;
+ uint16_t head_idx;
+ uint16_t last_idx;
+ uint16_t inflight_idx;
+ uint16_t buffer_id;
+ uint32_t dxfer_dir;
+ uint32_t data_len;
+ struct virtio_blk_outhdr *req;
+
+ volatile uint8_t *status;
+
+ struct iovec iovs[VHOST_BLK_MAX_IOVS];
+ uint32_t iovs_cnt;
+ struct vring_packed_desc *desc_packed;
+ struct vring_desc *desc_split;
+ struct rte_vhost_vring *vq;
+ struct vhost_block_dev *bdev;
+ struct vhost_blk_ctrlr *ctrlr;
+};
+
+struct inflight_blk_task {
+ struct vhost_blk_task blk_task;
+ struct rte_vhost_inflight_desc_packed *inflight_desc;
+ struct rte_vhost_inflight_info_packed *inflight_packed;
+};
+
+struct vhost_blk_ctrlr *g_vhost_ctrlr;
+struct vhost_device_ops vhost_blk_device_ops;
+
+int vhost_bdev_process_blk_commands(struct vhost_block_dev *bdev,
+ struct vhost_blk_task *task);
+
+void vhost_session_install_rte_compat_hooks(uint32_t vid);
+
+void vhost_dev_install_rte_compat_hooks(const char *path);
+
+struct vhost_blk_ctrlr *vhost_blk_ctrlr_find(const char *ctrlr_name);
+
+#endif /* _VHOST_BLK_H_ */
diff --git a/examples/vhost_blk/vhost_blk_compat.c b/examples/vhost_blk/vhost_blk_compat.c
new file mode 100644
index 000000000..4accfa498
--- /dev/null
+++ b/examples/vhost_blk/vhost_blk_compat.c
@@ -0,0 +1,173 @@
+// SPDX-License-Identifier: BSD-3-Clause
+// Copyright(c) 2010-2017 Intel Corporation
+
+#ifndef _VHOST_BLK_COMPAT_H_
+#define _VHOST_BLK_COMPAT_H_
+
+#include <sys/uio.h>
+#include <stdint.h>
+#include <linux/virtio_blk.h>
+#include <linux/virtio_ring.h>
+
+#include <rte_vhost.h>
+#include "vhost_blk.h"
+#include "blk_spec.h"
+
+#define VHOST_MAX_VQUEUES 256
+#define SPDK_VHOST_MAX_VQ_SIZE 1024
+
+#define VHOST_USER_GET_CONFIG 24
+#define VHOST_USER_SET_CONFIG 25
+
+static int
+vhost_blk_get_config(struct vhost_block_dev *bdev, uint8_t *config,
+ uint32_t len)
+{
+ struct virtio_blk_config blkcfg;
+ uint32_t blk_size;
+ uint64_t blkcnt;
+
+ if (bdev == NULL) {
+ /* We can't just return -1 here as this GET_CONFIG message might
+ * be caused by a QEMU VM reboot. Returning -1 will indicate an
+ * error to QEMU, who might then decide to terminate itself.
+ * We don't want that. A simple reboot shouldn't break the
+ * system.
+ *
+ * Presenting a block device with block size 0 and block count 0
+ * doesn't cause any problems on QEMU side and the virtio-pci
+ * device is even still available inside the VM, but there will
+ * be no block device created for it - the kernel drivers will
+ * silently reject it.
+ */
+ blk_size = 0;
+ blkcnt = 0;
+ } else {
+ blk_size = bdev->blocklen;
+ blkcnt = bdev->blockcnt;
+ }
+
+ memset(&blkcfg, 0, sizeof(blkcfg));
+ blkcfg.blk_size = blk_size;
+ /* minimum I/O size in blocks */
+ blkcfg.min_io_size = 1;
+ /* expressed in 512-byte sectors */
+ blkcfg.capacity = (blkcnt * blk_size) / 512;
+ /* QEMU can overwrite this value when started */
+ blkcfg.num_queues = VHOST_MAX_VQUEUES;
+
+ fprintf(stdout, "block device:blk_size = %d, blkcnt = %llu\n", blk_size,
+ (unsigned long long)blkcnt);
+
+ memcpy(config, &blkcfg, min(len, sizeof(blkcfg)));
+
+ return 0;
+}
+
+static enum rte_vhost_msg_result
+extern_vhost_pre_msg_handler(int vid, void *_msg)
+{
+ char path[PATH_MAX];
+ struct vhost_blk_ctrlr *ctrlr;
+ struct vhost_user_msg *msg = _msg;
+ int ret;
+
+ ret = rte_vhost_get_ifname(vid, path, PATH_MAX);
+ if (ret) {
+ fprintf(stderr, "Cannot get socket name\n");
+ return RTE_VHOST_MSG_RESULT_ERR;
+ }
+
+ ctrlr = vhost_blk_ctrlr_find(path);
+ if (!ctrlr) {
+ fprintf(stderr, "Controller is not ready\n");
+ return RTE_VHOST_MSG_RESULT_ERR;
+ }
+
+ switch ((int)msg->request) {
+ case VHOST_USER_GET_VRING_BASE:
+ case VHOST_USER_SET_VRING_BASE:
+ case VHOST_USER_SET_VRING_ADDR:
+ case VHOST_USER_SET_VRING_NUM:
+ case VHOST_USER_SET_VRING_KICK:
+ case VHOST_USER_SET_VRING_CALL:
+ case VHOST_USER_SET_MEM_TABLE:
+ break;
+ case VHOST_USER_GET_CONFIG: {
+ int rc = 0;
+
+ rc = vhost_blk_get_config(ctrlr->bdev,
+ msg->payload.cfg.region,
+ msg->payload.cfg.size);
+ if (rc != 0)
+ msg->size = 0;
+
+ return RTE_VHOST_MSG_RESULT_REPLY;
+ }
+ case VHOST_USER_SET_CONFIG:
+ default:
+ break;
+ }
+
+ return RTE_VHOST_MSG_RESULT_NOT_HANDLED;
+}
+
+static enum rte_vhost_msg_result
+extern_vhost_post_msg_handler(int vid, void *_msg)
+{
+ char path[PATH_MAX];
+ struct vhost_blk_ctrlr *ctrlr;
+ struct vhost_user_msg *msg = _msg;
+ int ret;
+
+ ret = rte_vhost_get_ifname(vid, path, PATH_MAX);
+ if (ret) {
+ fprintf(stderr, "Cannot get socket name\n");
+ return RTE_VHOST_MSG_RESULT_ERR;
+ }
+
+ ctrlr = vhost_blk_ctrlr_find(path);
+ if (!ctrlr) {
+ fprintf(stderr, "Controller is not ready\n");
+ return RTE_VHOST_MSG_RESULT_ERR;
+ }
+
+ switch (msg->request) {
+ case VHOST_USER_SET_FEATURES:
+ case VHOST_USER_SET_VRING_KICK:
+ default:
+ break;
+ }
+
+ return RTE_VHOST_MSG_RESULT_NOT_HANDLED;
+}
+
+struct rte_vhost_user_extern_ops g_extern_vhost_ops = {
+ .pre_msg_handle = extern_vhost_pre_msg_handler,
+ .post_msg_handle = extern_vhost_post_msg_handler,
+};
+
+void
+vhost_session_install_rte_compat_hooks(uint32_t vid)
+{
+ int rc;
+
+ rc = rte_vhost_extern_callback_register(vid, &g_extern_vhost_ops, NULL);
+ if (rc != 0)
+ fprintf(stderr,
+ "rte_vhost_extern_callback_register() failed for vid = %d\n",
+ vid);
+}
+
+void
+vhost_dev_install_rte_compat_hooks(const char *path)
+{
+ uint64_t protocol_features = 0;
+
+ rte_vhost_driver_get_protocol_features(path, &protocol_features);
+ protocol_features |= (1ULL << VHOST_USER_PROTOCOL_F_CONFIG);
+ protocol_features |= (1ULL << VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD);
+ rte_vhost_driver_set_protocol_features(path, protocol_features);
+}
+
+#endif
--
2.17.2
^ permalink raw reply [flat|nested] 27+ messages in thread
* [dpdk-dev] [PATCH v6] vhost: add vhost-user-blk example which support inflight
2019-11-01 10:42 ` [dpdk-dev] [PATCH v5] " Jin Yu
@ 2019-11-04 16:36 ` Jin Yu
2019-11-06 20:26 ` Maxime Coquelin
2019-11-06 21:01 ` Maxime Coquelin
0 siblings, 2 replies; 27+ messages in thread
From: Jin Yu @ 2019-11-04 16:36 UTC (permalink / raw)
To: Thomas Monjalon, John McNamara, Marko Kovacevic, Maxime Coquelin,
Tiwei Bie, Zhihong Wang
Cc: dev, Jin Yu
A vhost-user-blk example that supports the inflight feature. It uses
the new APIs introduced in the first patch, so it can show how these
APIs work to support the inflight feature.
Signed-off-by: Jin Yu <jin.yu@intel.com>
---
v1 - add the case.
v2 - add the rte_vhost prefix.
v3 - add packed ring support
v4 - fix build, MAINTAINERS and add guides
v5 - fix ci/intel-compilation errors
v6 - fix ci/intel-compilation errors
---
MAINTAINERS | 2 +
doc/guides/sample_app_ug/index.rst | 1 +
doc/guides/sample_app_ug/vhost_blk.rst | 63 ++
examples/meson.build | 2 +-
examples/vhost_blk/Makefile | 68 ++
examples/vhost_blk/blk.c | 125 +++
examples/vhost_blk/blk_spec.h | 95 ++
examples/vhost_blk/meson.build | 21 +
examples/vhost_blk/vhost_blk.c | 1094 ++++++++++++++++++++++++
examples/vhost_blk/vhost_blk.h | 127 +++
examples/vhost_blk/vhost_blk_compat.c | 173 ++++
11 files changed, 1770 insertions(+), 1 deletion(-)
create mode 100644 doc/guides/sample_app_ug/vhost_blk.rst
create mode 100644 examples/vhost_blk/Makefile
create mode 100644 examples/vhost_blk/blk.c
create mode 100644 examples/vhost_blk/blk_spec.h
create mode 100644 examples/vhost_blk/meson.build
create mode 100644 examples/vhost_blk/vhost_blk.c
create mode 100644 examples/vhost_blk/vhost_blk.h
create mode 100644 examples/vhost_blk/vhost_blk_compat.c
diff --git a/MAINTAINERS b/MAINTAINERS
index 717c31801..c22a8312e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -839,6 +839,8 @@ F: lib/librte_vhost/
F: doc/guides/prog_guide/vhost_lib.rst
F: examples/vhost/
F: doc/guides/sample_app_ug/vhost.rst
+F: examples/vhost_blk/
+F: doc/guides/sample_app_ug/vhost_blk.rst
F: examples/vhost_crypto/
F: examples/vdpa/
F: doc/guides/sample_app_ug/vdpa.rst
diff --git a/doc/guides/sample_app_ug/index.rst b/doc/guides/sample_app_ug/index.rst
index a3737c118..613f483f3 100644
--- a/doc/guides/sample_app_ug/index.rst
+++ b/doc/guides/sample_app_ug/index.rst
@@ -40,6 +40,7 @@ Sample Applications User Guides
packet_ordering
vmdq_dcb_forwarding
vhost
+ vhost_blk
vhost_crypto
vdpa
ip_pipeline
diff --git a/doc/guides/sample_app_ug/vhost_blk.rst b/doc/guides/sample_app_ug/vhost_blk.rst
new file mode 100644
index 000000000..39096e2e4
--- /dev/null
+++ b/doc/guides/sample_app_ug/vhost_blk.rst
@@ -0,0 +1,63 @@
+.. SPDX-License-Identifier: BSD-3-Clause
+ Copyright(c) 2010-2017 Intel Corporation.
+
+Vhost_blk Sample Application
+=============================
+
+The vhost_blk sample application implements a simple block device,
+which is used as the backend of the QEMU vhost-user-blk device. Users
+can extend the existing example to use other types of block devices
+(e.g. AIO) besides the memory-based block device. As with the
+vhost-user-net device, the sample application uses a UNIX domain socket
+to communicate with QEMU, and the virtio ring (split or packed format)
+is processed by the vhost_blk sample application.
+
+The sample application reuses a lot of code from the SPDK (Storage
+Performance Development Kit, https://github.com/spdk/spdk)
+vhost-user-blk target. For usage of the DPDK vhost library in the
+storage area, users can take SPDK as a reference as well.
+
+Testing steps
+-------------
+
+This section shows the steps to start a VM with the block device as
+a fast data path for a critical application.
+
+Compiling the Application
+-------------------------
+
+To compile the sample application see :doc:`compiling`.
+
+The application is located in the ``examples`` sub-directory.
+
+You will also need to build DPDK both on the host and inside the guest.
+
+Start the vhost_blk example
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: console
+
+ ./vhost_blk -m 1024
+
+.. _vhost_blk_app_run_vm:
+
+Start the VM
+~~~~~~~~~~~~
+
+.. code-block:: console
+
+ qemu-system-x86_64 -machine accel=kvm \
+ -m $mem -object memory-backend-file,id=mem,size=$mem,\
+ mem-path=/dev/hugepages,share=on -numa node,memdev=mem \
+ -drive file=os.img,if=none,id=disk \
+ -device ide-hd,drive=disk,bootindex=0 \
+ -chardev socket,id=char0,reconnect=1,path=/tmp/vhost.socket \
+ -device vhost-user-blk-pci,ring_packed=1,chardev=char0,num-queues=1 \
+ ...
+
+.. note::
+ You must check whether your QEMU supports "vhost-user-blk" or not;
+ QEMU v4.0 or newer is required.
+ ``reconnect=1`` enables live recovery, so that QEMU can reconnect to
+ the vhost_blk example after it has been restarted.
+ ``ring_packed=1`` means the device supports the packed ring, but it
+ requires a guest kernel version >= 5.0.
diff --git a/examples/meson.build b/examples/meson.build
index 98ae50a49..10a6bd7ef 100644
--- a/examples/meson.build
+++ b/examples/meson.build
@@ -42,7 +42,7 @@ all_examples = [
'skeleton', 'tep_termination',
'timer', 'vdpa',
'vhost', 'vhost_crypto',
- 'vm_power_manager',
+ 'vhost_blk', 'vm_power_manager',
'vm_power_manager/guest_cli',
'vmdq', 'vmdq_dcb',
]
diff --git a/examples/vhost_blk/Makefile b/examples/vhost_blk/Makefile
new file mode 100644
index 000000000..a10a90071
--- /dev/null
+++ b/examples/vhost_blk/Makefile
@@ -0,0 +1,68 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2010-2014 Intel Corporation
+
+# binary name
+APP = vhost-blk
+
+# all source are stored in SRCS-y
+SRCS-y := blk.c vhost_blk.c vhost_blk_compat.c
+
+# Build using pkg-config variables if possible
+$(shell pkg-config --exists libdpdk)
+ifeq ($(.SHELLSTATUS),0)
+
+all: shared
+.PHONY: shared static
+shared: build/$(APP)-shared
+ ln -sf $(APP)-shared build/$(APP)
+static: build/$(APP)-static
+ ln -sf $(APP)-static build/$(APP)
+
+LDFLAGS += -pthread
+
+PC_FILE := $(shell pkg-config --path libdpdk)
+CFLAGS += -O3 $(shell pkg-config --cflags libdpdk)
+LDFLAGS_SHARED = $(shell pkg-config --libs libdpdk)
+LDFLAGS_STATIC = -Wl,-Bstatic $(shell pkg-config --static --libs libdpdk)
+
+CFLAGS += -DALLOW_EXPERIMENTAL_API
+
+build/$(APP)-shared: $(SRCS-y) Makefile $(PC_FILE) | build
+ $(CC) $(CFLAGS) $(SRCS-y) -o $@ $(LDFLAGS) $(LDFLAGS_SHARED)
+
+build/$(APP)-static: $(SRCS-y) Makefile $(PC_FILE) | build
+ $(CC) $(CFLAGS) $(SRCS-y) -o $@ $(LDFLAGS) $(LDFLAGS_STATIC)
+
+build:
+ @mkdir -p $@
+
+.PHONY: clean
+clean:
+ rm -f build/$(APP) build/$(APP)-static build/$(APP)-shared
+ test -d build && rmdir -p build || true
+
+else # Build using legacy build system
+
+ifeq ($(RTE_SDK),)
+$(error "Please define RTE_SDK environment variable")
+endif
+
+# Default target, detect a build directory, by looking for a path with a .config
+RTE_TARGET ?= $(notdir $(abspath $(dir $(firstword $(wildcard $(RTE_SDK)/*/.config)))))
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+ifneq ($(CONFIG_RTE_EXEC_ENV_LINUX),y)
+$(info This application can only operate in a linux environment, \
+please change the definition of the RTE_TARGET environment variable)
+all:
+else
+
+CFLAGS += -DALLOW_EXPERIMENTAL_API
+CFLAGS += -O2 -D_FILE_OFFSET_BITS=64
+CFLAGS += $(WERROR_FLAGS)
+
+include $(RTE_SDK)/mk/rte.extapp.mk
+
+endif
+endif
diff --git a/examples/vhost_blk/blk.c b/examples/vhost_blk/blk.c
new file mode 100644
index 000000000..424ed3015
--- /dev/null
+++ b/examples/vhost_blk/blk.c
@@ -0,0 +1,125 @@
+// SPDX-License-Identifier: BSD-3-Clause
+// Copyright(c) 2010-2019 Intel Corporation
+
+/**
+ * This work is largely based on the "vhost-user-blk" implementation by
+ * SPDK(https://github.com/spdk/spdk).
+ */
+
+#include <stdio.h>
+#include <stdint.h>
+#include <unistd.h>
+#include <assert.h>
+#include <ctype.h>
+#include <string.h>
+#include <stddef.h>
+
+#include <rte_atomic.h>
+#include <rte_cycles.h>
+#include <rte_log.h>
+#include <rte_malloc.h>
+#include <rte_byteorder.h>
+#include <rte_string_fns.h>
+
+#include "vhost_blk.h"
+#include "blk_spec.h"
+
+static void
+vhost_strcpy_pad(void *dst, const char *src, size_t size, int pad)
+{
+ size_t len;
+
+ len = strlen(src);
+ if (len < size) {
+ memcpy(dst, src, len);
+ memset((char *)dst + len, pad, size - len);
+ } else {
+ memcpy(dst, src, size);
+ }
+}
+
+static int
+vhost_bdev_blk_readwrite(struct vhost_block_dev *bdev,
+ struct vhost_blk_task *task,
+ uint64_t lba_512, __rte_unused uint32_t xfer_len)
+{
+ uint32_t i;
+ uint64_t offset;
+ uint32_t nbytes = 0;
+
+ offset = lba_512 * 512;
+
+ for (i = 0; i < task->iovs_cnt; i++) {
+ if (task->dxfer_dir == BLK_DIR_TO_DEV)
+ memcpy(bdev->data + offset, task->iovs[i].iov_base,
+ task->iovs[i].iov_len);
+ else
+ memcpy(task->iovs[i].iov_base, bdev->data + offset,
+ task->iovs[i].iov_len);
+ offset += task->iovs[i].iov_len;
+ nbytes += task->iovs[i].iov_len;
+ }
+
+ return nbytes;
+}
+
+int
+vhost_bdev_process_blk_commands(struct vhost_block_dev *bdev,
+ struct vhost_blk_task *task)
+{
+ int used_len;
+
+ if (unlikely(task->data_len > (bdev->blockcnt * bdev->blocklen))) {
+ fprintf(stderr, "read or write beyond capacity\n");
+ return VIRTIO_BLK_S_UNSUPP;
+ }
+
+ switch (task->req->type) {
+ case VIRTIO_BLK_T_IN:
+ if (unlikely(task->data_len == 0 ||
+ (task->data_len & (512 - 1)) != 0)) {
+ fprintf(stderr,
+ "%s - passed IO buffer is not a multiple of 512 bytes "
+ "(req_idx = %"PRIu16").\n",
+ task->req->type ? "WRITE" : "READ",
+ task->head_idx);
+ return VIRTIO_BLK_S_UNSUPP;
+ }
+
+ task->dxfer_dir = BLK_DIR_FROM_DEV;
+ vhost_bdev_blk_readwrite(bdev, task,
+ task->req->sector, task->data_len);
+ break;
+ case VIRTIO_BLK_T_OUT:
+ if (unlikely(task->data_len == 0 ||
+ (task->data_len & (512 - 1)) != 0)) {
+ fprintf(stderr,
+ "%s - passed IO buffer is not a multiple of 512 bytes "
+ "(req_idx = %"PRIu16").\n",
+ task->req->type ? "WRITE" : "READ",
+ task->head_idx);
+ return VIRTIO_BLK_S_UNSUPP;
+ }
+
+ if (task->readtype) {
+ fprintf(stderr, "request type mismatch\n");
+ return VIRTIO_BLK_S_IOERR;
+ }
+ task->dxfer_dir = BLK_DIR_TO_DEV;
+ vhost_bdev_blk_readwrite(bdev, task,
+ task->req->sector, task->data_len);
+ break;
+ case VIRTIO_BLK_T_GET_ID:
+ if (!task->iovs_cnt || task->data_len)
+ return VIRTIO_BLK_S_UNSUPP;
+ used_len = min(VIRTIO_BLK_ID_BYTES, task->data_len);
+ vhost_strcpy_pad(task->iovs[0].iov_base,
+ bdev->product_name, used_len, ' ');
+ break;
+ default:
+ fprintf(stderr, "unsupported cmd\n");
+ return VIRTIO_BLK_S_UNSUPP;
+ }
+
+ return VIRTIO_BLK_S_OK;
+}
diff --git a/examples/vhost_blk/blk_spec.h b/examples/vhost_blk/blk_spec.h
new file mode 100644
index 000000000..5875e2f86
--- /dev/null
+++ b/examples/vhost_blk/blk_spec.h
@@ -0,0 +1,95 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Intel Corporation
+ */
+
+#ifndef _BLK_SPEC_H
+#define _BLK_SPEC_H
+
+#include <stdint.h>
+
+#ifndef VHOST_USER_MEMORY_MAX_NREGIONS
+#define VHOST_USER_MEMORY_MAX_NREGIONS 8
+#endif
+
+#ifndef VHOST_USER_MAX_CONFIG_SIZE
+#define VHOST_USER_MAX_CONFIG_SIZE 256
+#endif
+
+#ifndef VHOST_USER_PROTOCOL_F_CONFIG
+#define VHOST_USER_PROTOCOL_F_CONFIG 9
+#endif
+
+#ifndef VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD
+#define VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD 12
+#endif
+
+#define VIRTIO_BLK_ID_BYTES 20 /* ID string length */
+
+#define VIRTIO_BLK_T_IN 0
+#define VIRTIO_BLK_T_OUT 1
+#define VIRTIO_BLK_T_FLUSH 4
+#define VIRTIO_BLK_T_GET_ID 8
+#define VIRTIO_BLK_T_DISCARD 11
+#define VIRTIO_BLK_T_WRITE_ZEROES 13
+
+#define VIRTIO_BLK_S_OK 0
+#define VIRTIO_BLK_S_IOERR 1
+#define VIRTIO_BLK_S_UNSUPP 2
+
+enum vhost_user_request {
+ VHOST_USER_NONE = 0,
+ VHOST_USER_GET_FEATURES = 1,
+ VHOST_USER_SET_FEATURES = 2,
+ VHOST_USER_SET_OWNER = 3,
+ VHOST_USER_RESET_OWNER = 4,
+ VHOST_USER_SET_MEM_TABLE = 5,
+ VHOST_USER_SET_LOG_BASE = 6,
+ VHOST_USER_SET_LOG_FD = 7,
+ VHOST_USER_SET_VRING_NUM = 8,
+ VHOST_USER_SET_VRING_ADDR = 9,
+ VHOST_USER_SET_VRING_BASE = 10,
+ VHOST_USER_GET_VRING_BASE = 11,
+ VHOST_USER_SET_VRING_KICK = 12,
+ VHOST_USER_SET_VRING_CALL = 13,
+ VHOST_USER_SET_VRING_ERR = 14,
+ VHOST_USER_GET_PROTOCOL_FEATURES = 15,
+ VHOST_USER_SET_PROTOCOL_FEATURES = 16,
+ VHOST_USER_GET_QUEUE_NUM = 17,
+ VHOST_USER_SET_VRING_ENABLE = 18,
+ VHOST_USER_MAX
+};
+
+/** Get/set config msg payload */
+struct vhost_user_config {
+ uint32_t offset;
+ uint32_t size;
+ uint32_t flags;
+ uint8_t region[VHOST_USER_MAX_CONFIG_SIZE];
+};
+
+/** Fixed-size vhost_memory struct */
+struct vhost_memory_padded {
+ uint32_t nregions;
+ uint32_t padding;
+ struct vhost_memory_region regions[VHOST_USER_MEMORY_MAX_NREGIONS];
+};
+
+struct vhost_user_msg {
+ enum vhost_user_request request;
+
+#define VHOST_USER_VERSION_MASK 0x3
+#define VHOST_USER_REPLY_MASK (0x1 << 2)
+ uint32_t flags;
+ uint32_t size; /**< the following payload size */
+ union {
+#define VHOST_USER_VRING_IDX_MASK 0xff
+#define VHOST_USER_VRING_NOFD_MASK (0x1 << 8)
+ uint64_t u64;
+ struct vhost_vring_state state;
+ struct vhost_vring_addr addr;
+ struct vhost_memory_padded memory;
+ struct vhost_user_config cfg;
+ } payload;
+} __attribute__((packed));
+
+#endif
diff --git a/examples/vhost_blk/meson.build b/examples/vhost_blk/meson.build
new file mode 100644
index 000000000..857367192
--- /dev/null
+++ b/examples/vhost_blk/meson.build
@@ -0,0 +1,21 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2017 Intel Corporation
+
+# meson file, for building this example as part of a main DPDK build.
+#
+# To build this example as a standalone application with an already-installed
+# DPDK instance, use 'make'
+
+if not is_linux
+ build = false
+endif
+
+if not cc.has_header('linux/virtio_blk.h')
+ build = false
+endif
+
+deps += 'vhost'
+allow_experimental_apis = true
+sources = files(
+ 'blk.c', 'vhost_blk.c', 'vhost_blk_compat.c'
+)
diff --git a/examples/vhost_blk/vhost_blk.c b/examples/vhost_blk/vhost_blk.c
new file mode 100644
index 000000000..24807c82f
--- /dev/null
+++ b/examples/vhost_blk/vhost_blk.c
@@ -0,0 +1,1094 @@
+// SPDX-License-Identifier: BSD-3-Clause
+// Copyright(c) 2010-2017 Intel Corporation
+
+#include <stdint.h>
+#include <unistd.h>
+#include <stdbool.h>
+#include <signal.h>
+#include <assert.h>
+#include <semaphore.h>
+#include <linux/virtio_blk.h>
+#include <linux/virtio_ring.h>
+
+#include <rte_atomic.h>
+#include <rte_cycles.h>
+#include <rte_log.h>
+#include <rte_malloc.h>
+#include <rte_vhost.h>
+
+#include "vhost_blk.h"
+#include "blk_spec.h"
+
+#define VIRTQ_DESC_F_NEXT 1
+#define VIRTQ_DESC_F_AVAIL (1 << 7)
+#define VIRTQ_DESC_F_USED (1 << 15)
+
+#define MAX_TASK 12
+
+#define VHOST_BLK_FEATURES ((1ULL << VIRTIO_F_RING_PACKED) | \
+ (1ULL << VIRTIO_F_VERSION_1) |\
+ (1ULL << VIRTIO_F_NOTIFY_ON_EMPTY) | \
+ (1ULL << VHOST_USER_F_PROTOCOL_FEATURES))
+
+/* Path of the vhost-user Unix domain socket file. Can be set by user. */
+static char dev_pathname[PATH_MAX] = "";
+static sem_t exit_sem;
+static int g_should_stop = -1;
+
+struct vhost_blk_ctrlr *
+vhost_blk_ctrlr_find(const char *ctrlr_name)
+{
+ if (ctrlr_name == NULL)
+ return NULL;
+
+ /* currently we only support 1 socket file fd */
+ return g_vhost_ctrlr;
+}
+
+static uint64_t gpa_to_vva(int vid, uint64_t gpa, uint64_t *len)
+{
+ char path[PATH_MAX];
+ struct vhost_blk_ctrlr *ctrlr;
+ int ret = 0;
+
+ ret = rte_vhost_get_ifname(vid, path, PATH_MAX);
+ if (ret) {
+ fprintf(stderr, "Cannot get socket name\n");
+ assert(ret != 0);
+ }
+
+ ctrlr = vhost_blk_ctrlr_find(path);
+ if (!ctrlr) {
+ fprintf(stderr, "Controller is not ready\n");
+ assert(ctrlr != NULL);
+ }
+
+ assert(ctrlr->mem != NULL);
+
+ return rte_vhost_va_from_guest_pa(ctrlr->mem, gpa, len);
+}
+
+static struct vring_packed_desc *
+descriptor_get_next_packed(struct rte_vhost_vring *vq,
+ uint16_t *idx)
+{
+ if (vq->desc_packed[*idx % vq->size].flags & VIRTQ_DESC_F_NEXT) {
+ *idx += 1;
+ return &vq->desc_packed[*idx % vq->size];
+ }
+
+ return NULL;
+}
+
+static bool
+descriptor_has_next_packed(struct vring_packed_desc *cur_desc)
+{
+ return !!(cur_desc->flags & VRING_DESC_F_NEXT);
+}
+
+static bool
+descriptor_is_wr_packed(struct vring_packed_desc *cur_desc)
+{
+ return !!(cur_desc->flags & VRING_DESC_F_WRITE);
+}
+
+static struct rte_vhost_inflight_desc_packed *
+inflight_desc_get_next(struct rte_vhost_inflight_info_packed *inflight_packed,
+ struct rte_vhost_inflight_desc_packed *cur_desc)
+{
+ if (!!(cur_desc->flags & VIRTQ_DESC_F_NEXT))
+ return &inflight_packed->desc[cur_desc->next];
+
+ return NULL;
+}
+
+static bool
+inflight_desc_has_next(struct rte_vhost_inflight_desc_packed *cur_desc)
+{
+ return !!(cur_desc->flags & VRING_DESC_F_NEXT);
+}
+
+static bool
+inflight_desc_is_wr(struct rte_vhost_inflight_desc_packed *cur_desc)
+{
+ return !!(cur_desc->flags & VRING_DESC_F_WRITE);
+}
+
+static void
+inflight_process_payload_chain_packed(struct inflight_blk_task *task)
+{
+ void *data;
+ uint64_t chunck_len;
+ struct vhost_blk_task *blk_task;
+ struct rte_vhost_inflight_desc_packed *desc;
+
+ blk_task = &task->blk_task;
+ blk_task->iovs_cnt = 0;
+
+ do {
+ desc = task->inflight_desc;
+ chunck_len = desc->len;
+ data = (void *)(uintptr_t)gpa_to_vva(blk_task->bdev->vid,
+ desc->addr,
+ &chunck_len);
+ if (!data || chunck_len != desc->len) {
+ fprintf(stderr, "failed to translate desc address.\n");
+ return;
+ }
+
+ blk_task->iovs[blk_task->iovs_cnt].iov_base = data;
+ blk_task->iovs[blk_task->iovs_cnt].iov_len = desc->len;
+ blk_task->data_len += desc->len;
+ blk_task->iovs_cnt++;
+ task->inflight_desc = inflight_desc_get_next(
+ task->inflight_packed, desc);
+ } while (inflight_desc_has_next(task->inflight_desc));
+
+ chunck_len = task->inflight_desc->len;
+ blk_task->status = (void *)(uintptr_t)gpa_to_vva(
+ blk_task->bdev->vid, task->inflight_desc->addr, &chunck_len);
+ if (!blk_task->status || chunck_len != task->inflight_desc->len)
+ fprintf(stderr, "failed to translate desc address.\n");
+}
+
+static void
+inflight_submit_completion_packed(struct inflight_blk_task *task,
+ uint32_t q_idx, uint16_t *used_id,
+ bool *used_wrap_counter)
+{
+ struct vhost_blk_ctrlr *ctrlr;
+ struct rte_vhost_vring *vq;
+ struct vring_packed_desc *desc;
+ int ret;
+
+ ctrlr = vhost_blk_ctrlr_find(dev_pathname);
+ vq = task->blk_task.vq;
+
+ ret = rte_vhost_set_last_inflight_io_packed(ctrlr->bdev->vid, q_idx,
+ task->blk_task.head_idx);
+ if (ret != 0)
+ fprintf(stderr, "failed to set last inflight io\n");
+
+ desc = &vq->desc_packed[*used_id];
+ desc->id = task->blk_task.buffer_id;
+ rte_smp_mb();
+ if (*used_wrap_counter)
+ desc->flags |= VIRTQ_DESC_F_AVAIL | VIRTQ_DESC_F_USED;
+ else
+ desc->flags &= ~(VIRTQ_DESC_F_AVAIL | VIRTQ_DESC_F_USED);
+ rte_smp_mb();
+
+ *used_id += task->blk_task.iovs_cnt + 2;
+ if (*used_id >= vq->size) {
+ *used_id -= vq->size;
+ *used_wrap_counter = !(*used_wrap_counter);
+ }
+
+ ret = rte_vhost_clr_inflight_desc_packed(ctrlr->bdev->vid, q_idx,
+ task->blk_task.head_idx);
+ if (ret != 0)
+ fprintf(stderr, "failed to clear inflight io\n");
+
+ /* Send an interrupt back to the guest VM so that it knows
+ * a completion is ready to be processed.
+ */
+ rte_vhost_vring_call(task->blk_task.bdev->vid, q_idx);
+}
+
+static void
+submit_completion_packed(struct vhost_blk_task *task, uint32_t q_idx,
+ uint16_t *used_id, bool *used_wrap_counter)
+{
+ struct vhost_blk_ctrlr *ctrlr;
+ struct rte_vhost_vring *vq;
+ struct vring_packed_desc *desc;
+ int ret;
+
+ ctrlr = vhost_blk_ctrlr_find(dev_pathname);
+ vq = task->vq;
+
+ ret = rte_vhost_set_last_inflight_io_packed(ctrlr->bdev->vid, q_idx,
+ task->inflight_idx);
+ if (ret != 0)
+ fprintf(stderr, "failed to set last inflight io\n");
+
+ desc = &vq->desc_packed[*used_id];
+ desc->id = task->buffer_id;
+ rte_smp_mb();
+ if (*used_wrap_counter)
+ desc->flags |= VIRTQ_DESC_F_AVAIL | VIRTQ_DESC_F_USED;
+ else
+ desc->flags &= ~(VIRTQ_DESC_F_AVAIL | VIRTQ_DESC_F_USED);
+ rte_smp_mb();
+
+ *used_id += task->iovs_cnt + 2;
+ if (*used_id >= vq->size) {
+ *used_id -= vq->size;
+ *used_wrap_counter = !(*used_wrap_counter);
+ }
+
+ ret = rte_vhost_clr_inflight_desc_packed(ctrlr->bdev->vid, q_idx,
+ task->inflight_idx);
+ if (ret != 0)
+ fprintf(stderr, "failed to clear inflight io\n");
+
+ /* Send an interrupt back to the guest VM so that it knows
+ * a completion is ready to be processed.
+ */
+ rte_vhost_vring_call(task->bdev->vid, q_idx);
+}
+
+static void
+vhost_process_payload_chain_packed(struct vhost_blk_task *task,
+ uint16_t *idx)
+{
+ void *data;
+ uint64_t chunck_len;
+
+ task->iovs_cnt = 0;
+
+ do {
+ chunck_len = task->desc_packed->len;
+ data = (void *)(uintptr_t)gpa_to_vva(task->bdev->vid,
+ task->desc_packed->addr,
+ &chunck_len);
+ if (!data || chunck_len != task->desc_packed->len) {
+ fprintf(stderr, "failed to translate desc address.\n");
+ return;
+ }
+
+ task->iovs[task->iovs_cnt].iov_base = data;
+ task->iovs[task->iovs_cnt].iov_len = task->desc_packed->len;
+ task->data_len += task->desc_packed->len;
+ task->iovs_cnt++;
+ task->desc_packed = descriptor_get_next_packed(task->vq, idx);
+ } while (descriptor_has_next_packed(task->desc_packed));
+
+ task->last_idx = *idx % task->vq->size;
+ chunck_len = task->desc_packed->len;
+ task->status = (void *)(uintptr_t)gpa_to_vva(task->bdev->vid,
+ task->desc_packed->addr,
+ &chunck_len);
+ if (!task->status || chunck_len != task->desc_packed->len)
+ fprintf(stderr, "failed to translate desc address.\n");
+}
+
+
+static int
+descriptor_is_available(struct rte_vhost_vring *vring, uint16_t idx,
+ bool avail_wrap_counter)
+{
+ uint16_t flags = vring->desc_packed[idx].flags;
+
+ return ((!!(flags & VIRTQ_DESC_F_AVAIL) == avail_wrap_counter) &&
+ (!!(flags & VIRTQ_DESC_F_USED) != avail_wrap_counter));
+}
+
+static void
+process_requestq_packed(struct vhost_blk_ctrlr *ctrlr, uint32_t q_idx)
+{
+ bool avail_wrap_counter, used_wrap_counter;
+ uint16_t avail_idx, used_idx;
+ int ret;
+ uint64_t chunck_len;
+ struct vhost_blk_queue *blk_vq;
+ struct rte_vhost_vring *vq;
+ struct vhost_blk_task *task;
+
+ blk_vq = &ctrlr->bdev->queues[q_idx];
+ vq = &blk_vq->vq;
+
+ avail_idx = blk_vq->last_avail_idx;
+ avail_wrap_counter = blk_vq->avail_wrap_counter;
+ used_idx = blk_vq->last_used_idx;
+ used_wrap_counter = blk_vq->used_wrap_counter;
+
+ task = rte_zmalloc(NULL, sizeof(*task), 0);
+ assert(task != NULL);
+ task->vq = vq;
+ task->bdev = ctrlr->bdev;
+
+ while (descriptor_is_available(vq, avail_idx, avail_wrap_counter)) {
+ task->head_idx = avail_idx;
+ task->desc_packed = &task->vq->desc_packed[task->head_idx];
+ task->iovs_cnt = 0;
+ task->data_len = 0;
+ task->req = NULL;
+ task->status = NULL;
+
+ /* does not support indirect descriptors */
+ assert((task->desc_packed->flags & VRING_DESC_F_INDIRECT) == 0);
+
+ chunck_len = task->desc_packed->len;
+ task->req = (void *)(uintptr_t)gpa_to_vva(task->bdev->vid,
+ task->desc_packed->addr, &chunck_len);
+ if (!task->req || chunck_len != task->desc_packed->len) {
+ fprintf(stderr, "failed to translate desc address.\n");
+ rte_free(task);
+ return;
+ }
+
+ task->desc_packed = descriptor_get_next_packed(task->vq,
+ &avail_idx);
+ assert(task->desc_packed != NULL);
+ if (!descriptor_has_next_packed(task->desc_packed)) {
+ task->dxfer_dir = BLK_DIR_NONE;
+ task->last_idx = avail_idx % vq->size;
+ chunck_len = task->desc_packed->len;
+ task->status = (void *)(uintptr_t)
+ gpa_to_vva(task->bdev->vid,
+ task->desc_packed->addr,
+ &chunck_len);
+ if (!task->status ||
+ chunck_len != task->desc_packed->len) {
+ fprintf(stderr,
+ "failed to translate desc address.\n");
+ rte_free(task);
+ return;
+ }
+ } else {
+ task->readtype = descriptor_is_wr_packed(
+ task->desc_packed);
+ vhost_process_payload_chain_packed(task, &avail_idx);
+ }
+ task->buffer_id = vq->desc_packed[task->last_idx].id;
+ rte_vhost_set_inflight_desc_packed(ctrlr->bdev->vid, q_idx,
+ task->head_idx,
+ task->last_idx,
+ &task->inflight_idx);
+
+ if (++avail_idx >= vq->size) {
+ avail_idx -= vq->size;
+ avail_wrap_counter = !avail_wrap_counter;
+ }
+ blk_vq->last_avail_idx = avail_idx;
+ blk_vq->avail_wrap_counter = avail_wrap_counter;
+
+ ret = vhost_bdev_process_blk_commands(ctrlr->bdev, task);
+ if (ret) {
+ /* invalid response */
+ *task->status = VIRTIO_BLK_S_IOERR;
+ } else {
+ /* success */
+ *task->status = VIRTIO_BLK_S_OK;
+ }
+
+ submit_completion_packed(task, q_idx, &used_idx,
+ &used_wrap_counter);
+ blk_vq->last_used_idx = used_idx;
+ blk_vq->used_wrap_counter = used_wrap_counter;
+ }
+
+ rte_free(task);
+}
+
+static void
+submit_inflight_vq_packed(struct vhost_blk_ctrlr *ctrlr,
+ uint16_t q_idx)
+{
+ bool used_wrap_counter;
+ int req_idx, ret;
+ uint16_t used_idx;
+ uint64_t chunck_len;
+ struct vhost_blk_queue *blk_vq;
+ struct rte_vhost_ring_inflight *inflight_vq;
+ struct rte_vhost_resubmit_info *resubmit_info;
+ struct rte_vhost_vring *vq;
+ struct inflight_blk_task *task;
+ struct vhost_blk_task *blk_task;
+ struct rte_vhost_inflight_info_packed *inflight_info;
+
+ blk_vq = &ctrlr->bdev->queues[q_idx];
+ vq = &blk_vq->vq;
+ inflight_vq = &blk_vq->inflight_vq;
+ resubmit_info = inflight_vq->resubmit_inflight;
+ inflight_info = inflight_vq->inflight_packed;
+ used_idx = blk_vq->last_used_idx;
+ used_wrap_counter = blk_vq->used_wrap_counter;
+
+ task = rte_malloc(NULL, sizeof(*task), 0);
+ if (!task) {
+ fprintf(stderr, "failed to allocate memory\n");
+ return;
+ }
+ blk_task = &task->blk_task;
+ blk_task->vq = vq;
+ blk_task->bdev = ctrlr->bdev;
+ task->inflight_packed = inflight_vq->inflight_packed;
+
+ while (resubmit_info->resubmit_num-- > 0) {
+ req_idx = resubmit_info->resubmit_num;
+ blk_task->head_idx =
+ resubmit_info->resubmit_list[req_idx].index;
+ task->inflight_desc =
+ &inflight_info->desc[blk_task->head_idx];
+ task->blk_task.iovs_cnt = 0;
+ task->blk_task.data_len = 0;
+ task->blk_task.req = NULL;
+ task->blk_task.status = NULL;
+
+ /* Update the avail idx too, since its
+ * initial value equals the used idx.
+ */
+ blk_vq->last_avail_idx += task->inflight_desc->num;
+ if (blk_vq->last_avail_idx >= vq->size) {
+ blk_vq->last_avail_idx -= vq->size;
+ blk_vq->avail_wrap_counter =
+ !blk_vq->avail_wrap_counter;
+ }
+
+ /* does not support indirect descriptors */
+ assert(task->inflight_desc != NULL);
+ assert((task->inflight_desc->flags &
+ VRING_DESC_F_INDIRECT) == 0);
+
+ chunck_len = task->inflight_desc->len;
+ blk_task->req = (void *)(uintptr_t)
+ gpa_to_vva(blk_task->bdev->vid,
+ task->inflight_desc->addr,
+ &chunck_len);
+ if (!blk_task->req ||
+ chunck_len != task->inflight_desc->len) {
+ fprintf(stderr, "failed to translate desc address.\n");
+ rte_free(task);
+ return;
+ }
+
+ task->inflight_desc = inflight_desc_get_next(
+ task->inflight_packed, task->inflight_desc);
+ assert(task->inflight_desc != NULL);
+ if (!inflight_desc_has_next(task->inflight_desc)) {
+ blk_task->dxfer_dir = BLK_DIR_NONE;
+ chunck_len = task->inflight_desc->len;
+ blk_task->status = (void *)(uintptr_t)
+ gpa_to_vva(blk_task->bdev->vid,
+ task->inflight_desc->addr,
+ &chunck_len);
+ if (!blk_task->status ||
+ chunck_len != task->inflight_desc->len) {
+ fprintf(stderr,
+ "failed to translate desc address.\n");
+ rte_free(task);
+ return;
+ }
+ } else {
+ blk_task->readtype =
+ inflight_desc_is_wr(task->inflight_desc);
+ inflight_process_payload_chain_packed(task);
+ }
+
+ blk_task->buffer_id = task->inflight_desc->id;
+
+ ret = vhost_bdev_process_blk_commands(ctrlr->bdev, blk_task);
+ if (ret)
+ /* invalid response */
+ *blk_task->status = VIRTIO_BLK_S_IOERR;
+ else
+ /* success */
+ *blk_task->status = VIRTIO_BLK_S_OK;
+
+ inflight_submit_completion_packed(task, q_idx, &used_idx,
+ &used_wrap_counter);
+
+ blk_vq->last_used_idx = used_idx;
+ blk_vq->used_wrap_counter = used_wrap_counter;
+ }
+
+ rte_free(task);
+}
+
+static struct vring_desc *
+descriptor_get_next_split(struct vring_desc *vq_desc,
+ struct vring_desc *cur_desc)
+{
+ return &vq_desc[cur_desc->next];
+}
+
+static bool
+descriptor_has_next_split(struct vring_desc *cur_desc)
+{
+ return !!(cur_desc->flags & VRING_DESC_F_NEXT);
+}
+
+static bool
+descriptor_is_wr_split(struct vring_desc *cur_desc)
+{
+ return !!(cur_desc->flags & VRING_DESC_F_WRITE);
+}
+
+static void
+vhost_process_payload_chain_split(struct vhost_blk_task *task)
+{
+ void *data;
+ uint64_t chunck_len;
+
+ task->iovs_cnt = 0;
+
+ do {
+ chunck_len = task->desc_split->len;
+ data = (void *)(uintptr_t)gpa_to_vva(task->bdev->vid,
+ task->desc_split->addr,
+ &chunck_len);
+ if (!data || chunck_len != task->desc_split->len) {
+ fprintf(stderr, "failed to translate desc address.\n");
+ return;
+ }
+
+ task->iovs[task->iovs_cnt].iov_base = data;
+ task->iovs[task->iovs_cnt].iov_len = task->desc_split->len;
+ task->data_len += task->desc_split->len;
+ task->iovs_cnt++;
+ task->desc_split =
+ descriptor_get_next_split(task->vq->desc, task->desc_split);
+ } while (descriptor_has_next_split(task->desc_split));
+
+ chunck_len = task->desc_split->len;
+ task->status = (void *)(uintptr_t)gpa_to_vva(task->bdev->vid,
+ task->desc_split->addr,
+ &chunck_len);
+ if (!task->status || chunck_len != task->desc_split->len)
+ fprintf(stderr, "failed to translate desc address.\n");
+}
+
+static void
+submit_completion_split(struct vhost_blk_task *task, uint32_t vid,
+ uint32_t q_idx)
+{
+ struct rte_vhost_vring *vq;
+ struct vring_used *used;
+
+ vq = task->vq;
+ used = vq->used;
+
+ rte_vhost_set_last_inflight_io_split(vid, q_idx, task->req_idx);
+
+ /* Fill out the next entry in the "used" ring. id = the
+ * index of the descriptor that contained the blk request.
+ * len = the total amount of data transferred for the blk
+ * request. We must report the correct len because, for
+ * variable-length requests, we may return less data than
+ * was allocated by the guest VM.
+ */
+ used->ring[used->idx & (vq->size - 1)].id = task->req_idx;
+ used->ring[used->idx & (vq->size - 1)].len = task->data_len;
+ rte_smp_mb();
+ used->idx++;
+ rte_smp_mb();
+
+ rte_vhost_clr_inflight_desc_split(vid, q_idx, used->idx, task->req_idx);
+
+ /* Send an interrupt back to the guest VM so that it knows
+ * a completion is ready to be processed.
+ */
+ rte_vhost_vring_call(task->bdev->vid, q_idx);
+}
+
+static void
+submit_inflight_vq_split(struct vhost_blk_ctrlr *ctrlr,
+ uint32_t q_idx)
+{
+ struct vhost_blk_queue *blk_vq;
+ struct rte_vhost_ring_inflight *inflight_vq;
+ struct rte_vhost_resubmit_info *resubmit_inflight;
+ struct rte_vhost_resubmit_desc *resubmit_list;
+ struct vhost_blk_task *task;
+ int req_idx;
+ uint64_t chunck_len;
+ int ret;
+
+ blk_vq = &ctrlr->bdev->queues[q_idx];
+ inflight_vq = &blk_vq->inflight_vq;
+ resubmit_inflight = inflight_vq->resubmit_inflight;
+ resubmit_list = resubmit_inflight->resubmit_list;
+
+ task = rte_zmalloc(NULL, sizeof(*task), 0);
+ assert(task != NULL);
+
+ task->ctrlr = ctrlr;
+ task->bdev = ctrlr->bdev;
+ task->vq = &blk_vq->vq;
+
+ while (resubmit_inflight->resubmit_num-- > 0) {
+ req_idx = resubmit_list[resubmit_inflight->resubmit_num].index;
+ task->req_idx = req_idx;
+ task->desc_split = &task->vq->desc[task->req_idx];
+ task->iovs_cnt = 0;
+ task->data_len = 0;
+ task->req = NULL;
+ task->status = NULL;
+
+ /* does not support indirect descriptors */
+ assert(task->desc_split != NULL);
+ assert((task->desc_split->flags & VRING_DESC_F_INDIRECT) == 0);
+
+ chunck_len = task->desc_split->len;
+ task->req = (void *)(uintptr_t)gpa_to_vva(task->bdev->vid,
+ task->desc_split->addr, &chunck_len);
+ if (!task->req || chunck_len != task->desc_split->len) {
+ fprintf(stderr, "failed to translate desc address.\n");
+ rte_free(task);
+ return;
+ }
+
+ task->desc_split = descriptor_get_next_split(task->vq->desc,
+ task->desc_split);
+ if (!descriptor_has_next_split(task->desc_split)) {
+ task->dxfer_dir = BLK_DIR_NONE;
+ chunck_len = task->desc_split->len;
+ task->status = (void *)(uintptr_t)
+ gpa_to_vva(task->bdev->vid,
+ task->desc_split->addr,
+ &chunck_len);
+ if (!task->status ||
+ chunck_len != task->desc_split->len) {
+ fprintf(stderr,
+ "failed to translate desc address.\n");
+ rte_free(task);
+ return;
+ }
+ } else {
+ task->readtype =
+ descriptor_is_wr_split(task->desc_split);
+ vhost_process_payload_chain_split(task);
+ }
+
+ ret = vhost_bdev_process_blk_commands(ctrlr->bdev, task);
+ if (ret) {
+ /* invalid response */
+ *task->status = VIRTIO_BLK_S_IOERR;
+ } else {
+ /* success */
+ *task->status = VIRTIO_BLK_S_OK;
+ }
+ submit_completion_split(task, ctrlr->bdev->vid, q_idx);
+ }
+
+ rte_free(task);
+}
+
+static void
+process_requestq_split(struct vhost_blk_ctrlr *ctrlr, uint32_t q_idx)
+{
+ int ret;
+ int req_idx;
+ uint16_t last_idx;
+ uint64_t chunck_len;
+ struct vhost_blk_queue *blk_vq;
+ struct rte_vhost_vring *vq;
+ struct vhost_blk_task *task;
+
+ blk_vq = &ctrlr->bdev->queues[q_idx];
+ vq = &blk_vq->vq;
+
+ task = rte_zmalloc(NULL, sizeof(*task), 0);
+ assert(task != NULL);
+ task->ctrlr = ctrlr;
+ task->bdev = ctrlr->bdev;
+ task->vq = vq;
+
+ while (vq->avail->idx != blk_vq->last_avail_idx) {
+ last_idx = blk_vq->last_avail_idx & (vq->size - 1);
+ req_idx = vq->avail->ring[last_idx];
+ task->req_idx = req_idx;
+ task->desc_split = &task->vq->desc[task->req_idx];
+ task->iovs_cnt = 0;
+ task->data_len = 0;
+ task->req = NULL;
+ task->status = NULL;
+
+ rte_vhost_set_inflight_desc_split(ctrlr->bdev->vid, q_idx,
+ task->req_idx);
+
+ /* does not support indirect descriptors */
+ assert((task->desc_split->flags & VRING_DESC_F_INDIRECT) == 0);
+
+ chunck_len = task->desc_split->len;
+ task->req = (void *)(uintptr_t)gpa_to_vva(task->bdev->vid,
+ task->desc_split->addr, &chunck_len);
+ if (!task->req || chunck_len != task->desc_split->len) {
+ fprintf(stderr, "failed to translate desc address.\n");
+ rte_free(task);
+ return;
+ }
+
+ task->desc_split = descriptor_get_next_split(task->vq->desc,
+ task->desc_split);
+ if (!descriptor_has_next_split(task->desc_split)) {
+ task->dxfer_dir = BLK_DIR_NONE;
+ chunck_len = task->desc_split->len;
+ task->status = (void *)(uintptr_t)
+ gpa_to_vva(task->bdev->vid,
+ task->desc_split->addr,
+ &chunck_len);
+ if (!task->status ||
+ chunck_len != task->desc_split->len) {
+ fprintf(stderr,
+ "failed to translate desc address.\n");
+ rte_free(task);
+ return;
+ }
+ } else {
+ task->readtype =
+ descriptor_is_wr_split(task->desc_split);
+ vhost_process_payload_chain_split(task);
+ }
+ blk_vq->last_avail_idx++;
+
+ ret = vhost_bdev_process_blk_commands(ctrlr->bdev, task);
+ if (ret) {
+ /* invalid response */
+ *task->status = VIRTIO_BLK_S_IOERR;
+ } else {
+ /* success */
+ *task->status = VIRTIO_BLK_S_OK;
+ }
+
+ submit_completion_split(task, ctrlr->bdev->vid, q_idx);
+ }
+
+ rte_free(task);
+}
+
+static void *
+ctrlr_worker(void *arg)
+{
+ struct vhost_blk_ctrlr *ctrlr = (struct vhost_blk_ctrlr *)arg;
+ struct vhost_blk_queue *blk_vq;
+ struct rte_vhost_ring_inflight *inflight_vq;
+ cpu_set_t cpuset;
+ pthread_t thread;
+ int i;
+
+ fprintf(stdout, "Ctrlr Worker Thread start\n");
+
+ if (ctrlr == NULL || ctrlr->bdev == NULL) {
+ fprintf(stderr,
+ "%s: Error, invalid argument passed to worker thread\n",
+ __func__);
+ exit(0);
+ }
+
+ thread = pthread_self();
+ CPU_ZERO(&cpuset);
+ CPU_SET(0, &cpuset);
+ pthread_setaffinity_np(thread, sizeof(cpu_set_t), &cpuset);
+
+ for (i = 0; i < NUM_OF_BLK_QUEUES; i++) {
+ blk_vq = &ctrlr->bdev->queues[i];
+ inflight_vq = &blk_vq->inflight_vq;
+ if (inflight_vq->resubmit_inflight != NULL &&
+ inflight_vq->resubmit_inflight->resubmit_num != 0) {
+ if (ctrlr->packed_ring)
+ submit_inflight_vq_packed(ctrlr, i);
+ else
+ submit_inflight_vq_split(ctrlr, i);
+ }
+ }
+
+ while (!g_should_stop && ctrlr->bdev != NULL) {
+ for (i = 0; i < NUM_OF_BLK_QUEUES; i++) {
+ if (ctrlr->packed_ring)
+ process_requestq_packed(ctrlr, i);
+ else
+ process_requestq_split(ctrlr, i);
+ }
+ }
+
+ g_should_stop = 2;
+ fprintf(stdout, "Ctrlr Worker Thread Exiting\n");
+ sem_post(&exit_sem);
+ return NULL;
+}
+
+static int
+new_device(int vid)
+{
+ struct vhost_blk_ctrlr *ctrlr;
+ struct vhost_blk_queue *blk_vq;
+ struct rte_vhost_vring *vq;
+ uint64_t features;
+ pthread_t tid;
+ int i, ret;
+
+ ctrlr = vhost_blk_ctrlr_find(dev_pathname);
+ if (!ctrlr) {
+ fprintf(stderr, "Controller is not ready\n");
+ return -1;
+ }
+
+ if (ctrlr->started)
+ return 0;
+
+ ctrlr->bdev->vid = vid;
+ ret = rte_vhost_get_negotiated_features(vid, &features);
+ if (ret) {
+ fprintf(stderr, "failed to get the negotiated features\n");
+ return -1;
+ }
+ ctrlr->packed_ring = !!(features & (1ULL << VIRTIO_F_RING_PACKED));
+
+ ret = rte_vhost_get_mem_table(vid, &ctrlr->mem);
+ if (ret)
+ fprintf(stderr, "Get Controller memory region failed\n");
+ assert(ctrlr->mem != NULL);
+
+ /* Disable Notifications and init last idx */
+ for (i = 0; i < NUM_OF_BLK_QUEUES; i++) {
+ blk_vq = &ctrlr->bdev->queues[i];
+ vq = &blk_vq->vq;
+
+ ret = rte_vhost_get_vhost_vring(ctrlr->bdev->vid, i, vq);
+ assert(ret == 0);
+
+ ret = rte_vhost_get_vring_base(ctrlr->bdev->vid, i,
+ &blk_vq->last_avail_idx,
+ &blk_vq->last_used_idx);
+ assert(ret == 0);
+
+ ret = rte_vhost_get_vhost_ring_inflight(ctrlr->bdev->vid, i,
+ &blk_vq->inflight_vq);
+ assert(ret == 0);
+
+ if (ctrlr->packed_ring) {
+ /* for the reconnection */
+ ret = rte_vhost_get_vring_base_from_inflight(
+ ctrlr->bdev->vid, i,
+ &blk_vq->last_avail_idx,
+ &blk_vq->last_used_idx);
+
+ blk_vq->avail_wrap_counter = blk_vq->last_avail_idx &
+ (1 << 15);
+ blk_vq->last_avail_idx = blk_vq->last_avail_idx &
+ 0x7fff;
+ blk_vq->used_wrap_counter = blk_vq->last_used_idx &
+ (1 << 15);
+ blk_vq->last_used_idx = blk_vq->last_used_idx &
+ 0x7fff;
+ }
+
+ rte_vhost_enable_guest_notification(vid, i, 0);
+ }
+
+ /* start polling vring */
+ g_should_stop = 0;
+ fprintf(stdout, "New Device %s, Device ID %d\n", dev_pathname, vid);
+ if (pthread_create(&tid, NULL, &ctrlr_worker, ctrlr) < 0) {
+ fprintf(stderr, "Failed to start worker thread\n");
+ return -1;
+ }
+
+ /* device has been started */
+ ctrlr->started = 1;
+ pthread_detach(tid);
+ return 0;
+}
+
+static void
+destroy_device(int vid)
+{
+ char path[PATH_MAX];
+ struct vhost_blk_ctrlr *ctrlr;
+ struct vhost_blk_queue *blk_vq;
+ int i, ret;
+
+ ret = rte_vhost_get_ifname(vid, path, PATH_MAX);
+ if (ret) {
+ fprintf(stderr, "Destroy Ctrlr Failed\n");
+ return;
+ }
+
+ fprintf(stdout, "Destroy %s Device ID %d\n", path, vid);
+ ctrlr = vhost_blk_ctrlr_find(path);
+ if (!ctrlr) {
+ fprintf(stderr, "Destroy Ctrlr Failed\n");
+ return;
+ }
+
+ if (!ctrlr->started)
+ return;
+
+ g_should_stop = 1;
+ while (g_should_stop != 2)
+ ;
+
+ for (i = 0; i < NUM_OF_BLK_QUEUES; i++) {
+ blk_vq = &ctrlr->bdev->queues[i];
+ if (ctrlr->packed_ring) {
+ blk_vq->last_avail_idx |= (blk_vq->avail_wrap_counter <<
+ 15);
+ blk_vq->last_used_idx |= (blk_vq->used_wrap_counter <<
+ 15);
+ }
+ rte_vhost_set_vring_base(ctrlr->bdev->vid, i,
+ blk_vq->last_avail_idx,
+ blk_vq->last_used_idx);
+ }
+
+ free(ctrlr->mem);
+
+ ctrlr->started = 0;
+ sem_wait(&exit_sem);
+}
+
+static int
+new_connection(int vid)
+{
+ /* extend the proper features for block device */
+ vhost_session_install_rte_compat_hooks(vid);
+
+ return 0;
+}
+
+struct vhost_device_ops vhost_blk_device_ops = {
+ .new_device = new_device,
+ .destroy_device = destroy_device,
+ .new_connection = new_connection,
+};
+
+static struct vhost_block_dev *
+vhost_blk_bdev_construct(const char *bdev_name,
+ const char *bdev_serial, uint32_t blk_size, uint64_t blk_cnt,
+ bool wce_enable)
+{
+ struct vhost_block_dev *bdev;
+
+ bdev = rte_zmalloc(NULL, sizeof(*bdev), RTE_CACHE_LINE_SIZE);
+ if (!bdev)
+ return NULL;
+
+ strncpy(bdev->name, bdev_name, sizeof(bdev->name) - 1);
+ strncpy(bdev->product_name, bdev_serial, sizeof(bdev->product_name) - 1);
+ bdev->blocklen = blk_size;
+ bdev->blockcnt = blk_cnt;
+ bdev->write_cache = wce_enable;
+
+ fprintf(stdout, "blocklen=%"PRIu32", blockcnt=%"PRIu64"\n",
+ bdev->blocklen, bdev->blockcnt);
+
+ /* use memory as disk storage space */
+ bdev->data = rte_zmalloc(NULL, blk_cnt * blk_size, 0);
+ if (!bdev->data) {
+ fprintf(stderr, "not enough reserved huge memory for disk\n");
+ rte_free(bdev);
+ return NULL;
+ }
+
+ return bdev;
+}
+
+static struct vhost_blk_ctrlr *
+vhost_blk_ctrlr_construct(const char *ctrlr_name)
+{
+ int ret;
+ struct vhost_blk_ctrlr *ctrlr;
+ char *path;
+ char cwd[PATH_MAX];
+
+ /* always use current directory */
+ path = getcwd(cwd, PATH_MAX);
+ if (!path) {
+ fprintf(stderr, "Cannot get current working directory\n");
+ return NULL;
+ }
+ snprintf(dev_pathname, sizeof(dev_pathname), "%s/%s", path, ctrlr_name);
+
+ if (access(dev_pathname, F_OK) != -1) {
+ if (unlink(dev_pathname) != 0)
+ rte_exit(EXIT_FAILURE, "Cannot remove %s.\n",
+ dev_pathname);
+ }
+
+ if (rte_vhost_driver_register(dev_pathname, 0) != 0) {
+ fprintf(stderr, "socket %s already exists\n", dev_pathname);
+ return NULL;
+ }
+
+ ret = rte_vhost_driver_set_features(dev_pathname, VHOST_BLK_FEATURES);
+ if (ret != 0) {
+ fprintf(stderr, "Set vhost driver features failed\n");
+ rte_vhost_driver_unregister(dev_pathname);
+ return NULL;
+ }
+
+ /* set proper features */
+ vhost_dev_install_rte_compat_hooks(dev_pathname);
+
+ ctrlr = rte_zmalloc(NULL, sizeof(*ctrlr), RTE_CACHE_LINE_SIZE);
+ if (!ctrlr) {
+ rte_vhost_driver_unregister(dev_pathname);
+ return NULL;
+ }
+
+ /* hardcoded block device information with 128MiB */
+ ctrlr->bdev = vhost_blk_bdev_construct("malloc0", "vhost_blk_malloc0",
+ 4096, 32768, 0);
+ if (!ctrlr->bdev) {
+ rte_free(ctrlr);
+ rte_vhost_driver_unregister(dev_pathname);
+ return NULL;
+ }
+
+ rte_vhost_driver_callback_register(dev_pathname,
+ &vhost_blk_device_ops);
+
+ return ctrlr;
+}
+
+static void
+signal_handler(__rte_unused int signum)
+{
+ struct vhost_blk_ctrlr *ctrlr;
+
+ if (access(dev_pathname, F_OK) == 0)
+ unlink(dev_pathname);
+
+ if (g_should_stop != -1) {
+ g_should_stop = 1;
+ while (g_should_stop != 2)
+ ;
+ }
+
+ ctrlr = vhost_blk_ctrlr_find(dev_pathname);
+ if (ctrlr != NULL) {
+ if (ctrlr->bdev != NULL) {
+ rte_free(ctrlr->bdev->data);
+ rte_free(ctrlr->bdev);
+ }
+ rte_free(ctrlr);
+ }
+
+ rte_vhost_driver_unregister(dev_pathname);
+ exit(0);
+}
+
+int main(int argc, char *argv[])
+{
+ int ret;
+
+ signal(SIGINT, signal_handler);
+
+ /* init EAL */
+ ret = rte_eal_init(argc, argv);
+ if (ret < 0)
+ rte_exit(EXIT_FAILURE, "Error with EAL initialization\n");
+
+ g_vhost_ctrlr = vhost_blk_ctrlr_construct("vhost.socket");
+ if (g_vhost_ctrlr == NULL) {
+ fprintf(stderr, "Construct vhost blk controller failed\n");
+ return 0;
+ }
+
+ if (sem_init(&exit_sem, 0, 0) < 0) {
+ fprintf(stderr, "Error init exit_sem\n");
+ return -1;
+ }
+
+ rte_vhost_driver_start(dev_pathname);
+
+ /* loop until the signal handler exits the application */
+ while (1)
+ sleep(1);
+
+ return 0;
+}
+
diff --git a/examples/vhost_blk/vhost_blk.h b/examples/vhost_blk/vhost_blk.h
new file mode 100644
index 000000000..933e2b7c5
--- /dev/null
+++ b/examples/vhost_blk/vhost_blk.h
@@ -0,0 +1,127 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2017 Intel Corporation
+ */
+
+#ifndef _VHOST_BLK_H_
+#define _VHOST_BLK_H_
+
+#include <stdio.h>
+#include <sys/uio.h>
+#include <stdint.h>
+#include <stdbool.h>
+#include <linux/virtio_blk.h>
+#include <linux/virtio_ring.h>
+
+#include <rte_vhost.h>
+
+#ifndef VIRTIO_F_RING_PACKED
+#define VIRTIO_F_RING_PACKED 34
+
+struct vring_packed_desc {
+ /* Buffer Address. */
+ __le64 addr;
+ /* Buffer Length. */
+ __le32 len;
+ /* Buffer ID. */
+ __le16 id;
+ /* The flags depending on descriptor type. */
+ __le16 flags;
+};
+#endif
+
+struct vhost_blk_queue {
+ struct rte_vhost_vring vq;
+ struct rte_vhost_ring_inflight inflight_vq;
+ uint16_t last_avail_idx;
+ uint16_t last_used_idx;
+ bool avail_wrap_counter;
+ bool used_wrap_counter;
+};
+
+#define NUM_OF_BLK_QUEUES 1
+
+#define min(a, b) (((a) < (b)) ? (a) : (b))
+
+struct vhost_block_dev {
+ /** ID for vhost library. */
+ int vid;
+ /** Queues for the block device */
+ struct vhost_blk_queue queues[NUM_OF_BLK_QUEUES];
+ /** Unique name for this block device. */
+ char name[64];
+
+ /** Unique product name for this kind of block device. */
+ char product_name[256];
+
+ /** Size in bytes of a logical block for the backend */
+ uint32_t blocklen;
+
+ /** Number of blocks */
+ uint64_t blockcnt;
+
+ /** write cache enabled, not used at the moment */
+ int write_cache;
+
+ /** use memory as disk storage space */
+ uint8_t *data;
+};
+
+struct vhost_blk_ctrlr {
+ uint8_t started;
+ uint8_t packed_ring;
+ uint8_t need_restart;
+ /** Only 1 LUN is supported in this example */
+ struct vhost_block_dev *bdev;
+ /** VM memory region */
+ struct rte_vhost_memory *mem;
+} __rte_cache_aligned;
+
+#define VHOST_BLK_MAX_IOVS 128
+
+enum blk_data_dir {
+ BLK_DIR_NONE = 0,
+ BLK_DIR_TO_DEV = 1,
+ BLK_DIR_FROM_DEV = 2,
+};
+
+struct vhost_blk_task {
+ uint8_t readtype;
+ uint8_t req_idx;
+ uint16_t head_idx;
+ uint16_t last_idx;
+ uint16_t inflight_idx;
+ uint16_t buffer_id;
+ uint32_t dxfer_dir;
+ uint32_t data_len;
+ struct virtio_blk_outhdr *req;
+
+ volatile uint8_t *status;
+
+ struct iovec iovs[VHOST_BLK_MAX_IOVS];
+ uint32_t iovs_cnt;
+ struct vring_packed_desc *desc_packed;
+ struct vring_desc *desc_split;
+ struct rte_vhost_vring *vq;
+ struct vhost_block_dev *bdev;
+ struct vhost_blk_ctrlr *ctrlr;
+};
+
+struct inflight_blk_task {
+ struct vhost_blk_task blk_task;
+ struct rte_vhost_inflight_desc_packed *inflight_desc;
+ struct rte_vhost_inflight_info_packed *inflight_packed;
+};
+
+struct vhost_blk_ctrlr *g_vhost_ctrlr;
+struct vhost_device_ops vhost_blk_device_ops;
+
+int vhost_bdev_process_blk_commands(struct vhost_block_dev *bdev,
+ struct vhost_blk_task *task);
+
+void vhost_session_install_rte_compat_hooks(uint32_t vid);
+
+void vhost_dev_install_rte_compat_hooks(const char *path);
+
+struct vhost_blk_ctrlr *vhost_blk_ctrlr_find(const char *ctrlr_name);
+
+#endif /* _VHOST_BLK_H_ */
diff --git a/examples/vhost_blk/vhost_blk_compat.c b/examples/vhost_blk/vhost_blk_compat.c
new file mode 100644
index 000000000..16a6f0402
--- /dev/null
+++ b/examples/vhost_blk/vhost_blk_compat.c
@@ -0,0 +1,173 @@
+// SPDX-License-Identifier: BSD-3-Clause
+// Copyright(c) 2010-2017 Intel Corporation
+
+#ifndef _VHOST_BLK_COMPAT_H_
+#define _VHOST_BLK_COMPAT_H_
+
+#include <sys/uio.h>
+#include <stdint.h>
+#include <linux/virtio_blk.h>
+#include <linux/virtio_ring.h>
+
+#include <rte_vhost.h>
+#include "vhost_blk.h"
+#include "blk_spec.h"
+
+#define VHOST_MAX_VQUEUES 256
+#define SPDK_VHOST_MAX_VQ_SIZE 1024
+
+#define VHOST_USER_GET_CONFIG 24
+#define VHOST_USER_SET_CONFIG 25
+
+static int
+vhost_blk_get_config(struct vhost_block_dev *bdev, uint8_t *config,
+ uint32_t len)
+{
+ struct virtio_blk_config blkcfg;
+ uint32_t blk_size;
+ uint64_t blkcnt;
+
+ if (bdev == NULL) {
+ /* We can't just return -1 here as this GET_CONFIG message might
+ * be caused by a QEMU VM reboot. Returning -1 will indicate an
+ * error to QEMU, which might then decide to terminate itself.
+ * We don't want that. A simple reboot shouldn't break the
+ * system.
+ *
+ * Presenting a block device with block size 0 and block count 0
+ * doesn't cause any problems on QEMU side and the virtio-pci
+ * device is even still available inside the VM, but there will
+ * be no block device created for it - the kernel drivers will
+ * silently reject it.
+ */
+ blk_size = 0;
+ blkcnt = 0;
+ } else {
+ blk_size = bdev->blocklen;
+ blkcnt = bdev->blockcnt;
+ }
+
+ memset(&blkcfg, 0, sizeof(blkcfg));
+ blkcfg.blk_size = blk_size;
+ /* minimum I/O size in blocks */
+ blkcfg.min_io_size = 1;
+ /* expressed in 512 Bytes sectors */
+ blkcfg.capacity = (blkcnt * blk_size) / 512;
+ /* QEMU can overwrite this value when started */
+ blkcfg.num_queues = VHOST_MAX_VQUEUES;
+
+ fprintf(stdout, "block device: blk_size = %"PRIu32", blkcnt = %"PRIu64"\n",
+ blk_size, blkcnt);
+
+ memcpy(config, &blkcfg, min(len, sizeof(blkcfg)));
+
+ return 0;
+}
+
+static enum rte_vhost_msg_result
+extern_vhost_pre_msg_handler(int vid, void *_msg)
+{
+ char path[PATH_MAX];
+ struct vhost_blk_ctrlr *ctrlr;
+ struct vhost_user_msg *msg = _msg;
+ int ret;
+
+ ret = rte_vhost_get_ifname(vid, path, PATH_MAX);
+ if (ret) {
+ fprintf(stderr, "Cannot get socket name\n");
+ return RTE_VHOST_MSG_RESULT_ERR;
+ }
+
+ ctrlr = vhost_blk_ctrlr_find(path);
+ if (!ctrlr) {
+ fprintf(stderr, "Controller is not ready\n");
+ return -1;
+ }
+
+ switch ((int)msg->request) {
+ case VHOST_USER_GET_VRING_BASE:
+ case VHOST_USER_SET_VRING_BASE:
+ case VHOST_USER_SET_VRING_ADDR:
+ case VHOST_USER_SET_VRING_NUM:
+ case VHOST_USER_SET_VRING_KICK:
+ case VHOST_USER_SET_VRING_CALL:
+ case VHOST_USER_SET_MEM_TABLE:
+ break;
+ case VHOST_USER_GET_CONFIG: {
+ int rc = 0;
+
+ rc = vhost_blk_get_config(ctrlr->bdev,
+ msg->payload.cfg.region,
+ msg->payload.cfg.size);
+ if (rc != 0)
+ msg->size = 0;
+
+ return RTE_VHOST_MSG_RESULT_REPLY;
+ }
+ case VHOST_USER_SET_CONFIG:
+ default:
+ break;
+ }
+
+ return RTE_VHOST_MSG_RESULT_NOT_HANDLED;
+}
+
+static enum rte_vhost_msg_result
+extern_vhost_post_msg_handler(int vid, void *_msg)
+{
+ char path[PATH_MAX];
+ struct vhost_blk_ctrlr *ctrlr;
+ struct vhost_user_msg *msg = _msg;
+ int ret;
+
+ ret = rte_vhost_get_ifname(vid, path, PATH_MAX);
+ if (ret) {
+ fprintf(stderr, "Cannot get socket name\n");
+ return RTE_VHOST_MSG_RESULT_ERR;
+ }
+
+ ctrlr = vhost_blk_ctrlr_find(path);
+ if (!ctrlr) {
+ fprintf(stderr, "Controller is not ready\n");
+ return -1;
+ }
+
+ switch (msg->request) {
+ case VHOST_USER_SET_FEATURES:
+ case VHOST_USER_SET_VRING_KICK:
+ default:
+ break;
+ }
+
+ return RTE_VHOST_MSG_RESULT_NOT_HANDLED;
+}
+
+struct rte_vhost_user_extern_ops g_extern_vhost_ops = {
+ .pre_msg_handle = extern_vhost_pre_msg_handler,
+ .post_msg_handle = extern_vhost_post_msg_handler,
+};
+
+void
+vhost_session_install_rte_compat_hooks(uint32_t vid)
+{
+ int rc;
+
+ rc = rte_vhost_extern_callback_register(vid, &g_extern_vhost_ops, NULL);
+ if (rc != 0)
+ fprintf(stderr,
+ "rte_vhost_extern_callback_register() failed for vid = %d\n",
+ vid);
+}
+
+void
+vhost_dev_install_rte_compat_hooks(const char *path)
+{
+ uint64_t protocol_features = 0;
+
+ rte_vhost_driver_get_protocol_features(path, &protocol_features);
+ protocol_features |= (1ULL << VHOST_USER_PROTOCOL_F_CONFIG);
+ protocol_features |= (1ULL << VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD);
+ rte_vhost_driver_set_protocol_features(path, protocol_features);
+}
+
+#endif
--
2.17.2
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [dpdk-dev] [PATCH v6] vhost: add vhost-user-blk example which support inflight
2019-11-04 16:36 ` [dpdk-dev] [PATCH v6] " Jin Yu
@ 2019-11-06 20:26 ` Maxime Coquelin
2019-11-06 21:01 ` Maxime Coquelin
1 sibling, 0 replies; 27+ messages in thread
From: Maxime Coquelin @ 2019-11-06 20:26 UTC (permalink / raw)
To: Jin Yu, Thomas Monjalon, John McNamara, Marko Kovacevic,
Tiwei Bie, Zhihong Wang
Cc: dev
On 11/4/19 5:36 PM, Jin Yu wrote:
> A vhost-user-blk example that support inflight feature. It uses the
> new APIs that introduced in the first patch, so it can show how these
> APIs work to support inflight feature.
>
> Signed-off-by: Jin Yu <jin.yu@intel.com>
> ---
> v1 - add the case.
> v2 - add the rte_vhost prefix.
> v3 - add packed ring support
> v4 - fix build, MAINTAINERS and add guides
> v5 - fix ci/intel-compilation errors
> v6 - fix ci/intel-compilation errors
> ---
> MAINTAINERS | 2 +
> doc/guides/sample_app_ug/index.rst | 1 +
> doc/guides/sample_app_ug/vhost_blk.rst | 63 ++
> examples/meson.build | 2 +-
> examples/vhost_blk/Makefile | 68 ++
> examples/vhost_blk/blk.c | 125 +++
> examples/vhost_blk/blk_spec.h | 95 ++
> examples/vhost_blk/meson.build | 21 +
> examples/vhost_blk/vhost_blk.c | 1094 ++++++++++++++++++++++++
> examples/vhost_blk/vhost_blk.h | 127 +++
> examples/vhost_blk/vhost_blk_compat.c | 173 ++++
> 11 files changed, 1770 insertions(+), 1 deletion(-)
> create mode 100644 doc/guides/sample_app_ug/vhost_blk.rst
> create mode 100644 examples/vhost_blk/Makefile
> create mode 100644 examples/vhost_blk/blk.c
> create mode 100644 examples/vhost_blk/blk_spec.h
> create mode 100644 examples/vhost_blk/meson.build
> create mode 100644 examples/vhost_blk/vhost_blk.c
> create mode 100644 examples/vhost_blk/vhost_blk.h
> create mode 100644 examples/vhost_blk/vhost_blk_compat.c
As mentioned in the original series, this example is to replace
the vhost-scsi example that was just removed.
So I'm in favor of accepting it:
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Thanks,
Maxime
* Re: [dpdk-dev] [PATCH v6] vhost: add vhost-user-blk example which support inflight
2019-11-04 16:36 ` [dpdk-dev] [PATCH v6] " Jin Yu
2019-11-06 20:26 ` Maxime Coquelin
@ 2019-11-06 21:01 ` Maxime Coquelin
1 sibling, 0 replies; 27+ messages in thread
From: Maxime Coquelin @ 2019-11-06 21:01 UTC (permalink / raw)
To: Jin Yu, Thomas Monjalon, John McNamara, Marko Kovacevic,
Tiwei Bie, Zhihong Wang
Cc: dev
On 11/4/19 5:36 PM, Jin Yu wrote:
> A vhost-user-blk example that supports the inflight feature. It uses
> the new APIs introduced in the first patch, showing how these APIs
> work to support the inflight feature.
>
> Signed-off-by: Jin Yu <jin.yu@intel.com>
> ---
> v1 - add the case.
> v2 - add the rte_vhost prefix.
> v3 - add packed ring support
> v4 - fix build, MAINTAINERS and add guides
> v5 - fix ci/intel-compilation errors
> v6 - fix ci/intel-compilation errors
> ---
> MAINTAINERS | 2 +
> doc/guides/sample_app_ug/index.rst | 1 +
> doc/guides/sample_app_ug/vhost_blk.rst | 63 ++
> examples/meson.build | 2 +-
> examples/vhost_blk/Makefile | 68 ++
> examples/vhost_blk/blk.c | 125 +++
> examples/vhost_blk/blk_spec.h | 95 ++
> examples/vhost_blk/meson.build | 21 +
> examples/vhost_blk/vhost_blk.c | 1094 ++++++++++++++++++++++++
> examples/vhost_blk/vhost_blk.h | 127 +++
> examples/vhost_blk/vhost_blk_compat.c | 173 ++++
> 11 files changed, 1770 insertions(+), 1 deletion(-)
> create mode 100644 doc/guides/sample_app_ug/vhost_blk.rst
> create mode 100644 examples/vhost_blk/Makefile
> create mode 100644 examples/vhost_blk/blk.c
> create mode 100644 examples/vhost_blk/blk_spec.h
> create mode 100644 examples/vhost_blk/meson.build
> create mode 100644 examples/vhost_blk/vhost_blk.c
> create mode 100644 examples/vhost_blk/vhost_blk.h
> create mode 100644 examples/vhost_blk/vhost_blk_compat.c
Applied to dpdk-next-virtio/master.
Thanks,
Maxime
* Re: [dpdk-dev] [PATCH v11 0/9] vhost: support inflight share memory protocol feature
2019-10-09 20:48 ` [dpdk-dev] [PATCH v11 0/9] vhost: support inflight share memory protocol feature Jin Yu
` (8 preceding siblings ...)
2019-10-09 20:48 ` [dpdk-dev] [PATCH v11 9/9] vhost: add vhost-user-blk example which support inflight Jin Yu
@ 2019-10-16 11:12 ` Maxime Coquelin
2019-10-25 10:08 ` Thomas Monjalon
9 siblings, 1 reply; 27+ messages in thread
From: Maxime Coquelin @ 2019-10-16 11:12 UTC (permalink / raw)
To: Jin Yu, dev; +Cc: changpeng.liu, tiwei.bie, zhihong.wang
On 10/9/19 10:48 PM, Jin Yu wrote:
> v2:
> - specify the APIs are split-ring only
>
> v3:
> - fix APIs issues and judge split or packed
>
> v4:
> - add rte_vhost_ prefix and fix issues
>
> v5:
> - add the packed ring support and add the vhost_blk example
>
> v6:
> - revise get_vring_base func depend on Tiwei's suggestion
>
> v7:
> - divide patch into small patches
>
> v8:
> - updated base on Maxime's comments
>
> v9:
> - updated base on Tiwei's comments
>
> v10:
> - fix code style and update some misleading log
>
> v11:
> - add the version log to cover letter
>
> This series introduces two new messages, VHOST_USER_GET_INFLIGHT_FD and VHOST_USER_SET_INFLIGHT_FD, to support transferring a shared buffer between QEMU and the backend.
> It now supports both split and packed rings. The example code shows how these APIs work. The test has passed.
>
> How to test the example:
> 1, QEMU needs two patches:
> https://patchwork.kernel.org/patch/10766813/
> https://patchwork.kernel.org/patch/10861411/ (merged in QEMU). It also needs a manual modification:
> we should make sure the set features message has already been sent before we send get inflight, but QEMU does not seem to do this. To work around it, add the code below in vhost_dev_get_inflight, before the get_inflight_fd call:
>
>     int r;
>
>     r = vhost_dev_set_features(dev, dev->log_enabled);
>     if (r < 0) {
>         return -errno;
>     }
>
> 2, Guest OS version >= 5.0.
> 3, run the example.
> 4, run QEMU with vhost-user-blk-pci, e.g.:
> -chardev socket,id=spdk_vhost_blk0,reconnect=1,path=xxxx\
> -device vhost-user-blk-pci,ring_packed=on,chardev=spdk_vhost_blk0,num-queues=1\
> 5, run fio in the guest.
> 6, kill the example and run it again.
> 7, fio in the guest should continue running without errors.
>
>
> Jin Yu (9):
> vhost: add the inflight description
> vhost: add packed ring
> vhost: add the inflight structure
> vhost: add two new messages to support a shared buffer
> vhost: checkout the resubmit inflight information
> vhost: add the APIs to operate inflight ring
> vhost: add APIs for user getting inflight ring
> vhost: add vring functions packed ring support
> vhost: add vhost-user-blk example which support inflight
>
> examples/vhost_blk/Makefile | 68 ++
> examples/vhost_blk/blk.c | 125 +++
> examples/vhost_blk/blk_spec.h | 95 +++
> examples/vhost_blk/meson.build | 21 +
> examples/vhost_blk/vhost_blk.c | 1092 ++++++++++++++++++++++++
> examples/vhost_blk/vhost_blk.h | 128 +++
> examples/vhost_blk/vhost_blk_compat.c | 195 +++++
> lib/librte_vhost/rte_vhost.h | 237 ++++-
> lib/librte_vhost/rte_vhost_version.map | 8 +
> lib/librte_vhost/vhost.c | 407 ++++++++-
> lib/librte_vhost/vhost.h | 16 +
> lib/librte_vhost/vhost_user.c | 456 +++++++++-
> lib/librte_vhost/vhost_user.h | 12 +-
> 13 files changed, 2847 insertions(+), 13 deletions(-)
> create mode 100644 examples/vhost_blk/Makefile
> create mode 100644 examples/vhost_blk/blk.c
> create mode 100644 examples/vhost_blk/blk_spec.h
> create mode 100644 examples/vhost_blk/meson.build
> create mode 100644 examples/vhost_blk/vhost_blk.c
> create mode 100644 examples/vhost_blk/vhost_blk.h
> create mode 100644 examples/vhost_blk/vhost_blk_compat.c
>
Applied to dpdk-next-virtio/master.
Thanks,
Maxime
* Re: [dpdk-dev] [PATCH v11 0/9] vhost: support inflight share memory protocol feature
2019-10-16 11:12 ` [dpdk-dev] [PATCH v11 0/9] vhost: support inflight share memory protocol feature Maxime Coquelin
@ 2019-10-25 10:08 ` Thomas Monjalon
2019-10-25 10:12 ` Maxime Coquelin
0 siblings, 1 reply; 27+ messages in thread
From: Thomas Monjalon @ 2019-10-25 10:08 UTC (permalink / raw)
To: Maxime Coquelin, Jin Yu; +Cc: dev, changpeng.liu, tiwei.bie, zhihong.wang
16/10/2019 13:12, Maxime Coquelin:
> On 10/9/19 10:48 PM, Jin Yu wrote:
> > Jin Yu (9):
> > vhost: add the inflight description
> > vhost: add packed ring
> > vhost: add the inflight structure
> > vhost: add two new messages to support a shared buffer
> > vhost: checkout the resubmit inflight information
> > vhost: add the APIs to operate inflight ring
> > vhost: add APIs for user getting inflight ring
> > vhost: add vring functions packed ring support
> > vhost: add vhost-user-blk example which support inflight
>
> Applied to dpdk-next-virtio/master.
Sorry I have to drop the last patch, adding a new example, for 2 reasons:
- I really would like to see the techboard approving one more example.
- The compilation was probably not tested enough because the example
is not added in meson and make "all examples".
* Re: [dpdk-dev] [PATCH v11 0/9] vhost: support inflight share memory protocol feature
2019-10-25 10:08 ` Thomas Monjalon
@ 2019-10-25 10:12 ` Maxime Coquelin
2019-10-25 23:01 ` Thomas Monjalon
0 siblings, 1 reply; 27+ messages in thread
From: Maxime Coquelin @ 2019-10-25 10:12 UTC (permalink / raw)
To: Thomas Monjalon, Jin Yu; +Cc: dev, changpeng.liu, tiwei.bie, zhihong.wang
On 10/25/19 12:08 PM, Thomas Monjalon wrote:
> 16/10/2019 13:12, Maxime Coquelin:
>> On 10/9/19 10:48 PM, Jin Yu wrote:
>>> Jin Yu (9):
>>> vhost: add the inflight description
>>> vhost: add packed ring
>>> vhost: add the inflight structure
>>> vhost: add two new messages to support a shared buffer
>>> vhost: checkout the resubmit inflight information
>>> vhost: add the APIs to operate inflight ring
>>> vhost: add APIs for user getting inflight ring
>>> vhost: add vring functions packed ring support
>>> vhost: add vhost-user-blk example which support inflight
>>
>> Applied to dpdk-next-virtio/master.
>
> Sorry I have to drop the last patch, adding a new example, for 2 reasons:
OK,
> - I really would like to see the techboard approving one more example.
Just FYI, it is a replacement for vhost-scsi, which is removed in this
release.
> - The compilation was probably not tested enough because the example
> is not added in meson and make "all examples".
I'll let Jin Yu submit the last patch again with its build fixed.
Then the techboard can decide whether to accept it.
Thanks,
Maxime
* Re: [dpdk-dev] [PATCH v11 0/9] vhost: support inflight share memory protocol feature
2019-10-25 10:12 ` Maxime Coquelin
@ 2019-10-25 23:01 ` Thomas Monjalon
2019-10-28 1:37 ` Yu, Jin
0 siblings, 1 reply; 27+ messages in thread
From: Thomas Monjalon @ 2019-10-25 23:01 UTC (permalink / raw)
To: Jin Yu; +Cc: dev, Maxime Coquelin, changpeng.liu, tiwei.bie, zhihong.wang
25/10/2019 12:12, Maxime Coquelin:
> On 10/25/19 12:08 PM, Thomas Monjalon wrote:
> > 16/10/2019 13:12, Maxime Coquelin:
> >> On 10/9/19 10:48 PM, Jin Yu wrote:
> >>> Jin Yu (9):
> >>> vhost: add the inflight description
> >>> vhost: add packed ring
> >>> vhost: add the inflight structure
> >>> vhost: add two new messages to support a shared buffer
> >>> vhost: checkout the resubmit inflight information
> >>> vhost: add the APIs to operate inflight ring
> >>> vhost: add APIs for user getting inflight ring
> >>> vhost: add vring functions packed ring support
> >>> vhost: add vhost-user-blk example which support inflight
> >>
> >> Applied to dpdk-next-virtio/master.
> >
> > Sorry I have to drop the last patch, adding a new example, for 2 reasons:
>
> OK,
>
> > - I really would like to see the techboard approving one more example.
>
> Just FYI, it is a replacement for vhost-scsi, which is removed in this
> release.
>
> > - The compilation was probably not tested enough because the example
> > is not added in meson and make "all examples".
>
> I'll let Jin Yu submit the last patch again with its build fixed.
> Then the techboard can decide whether to accept it.
While fixing this patch, please add a section in MAINTAINERS.
* Re: [dpdk-dev] [PATCH v11 0/9] vhost: support inflight share memory protocol feature
2019-10-25 23:01 ` Thomas Monjalon
@ 2019-10-28 1:37 ` Yu, Jin
0 siblings, 0 replies; 27+ messages in thread
From: Yu, Jin @ 2019-10-28 1:37 UTC (permalink / raw)
To: Thomas Monjalon
Cc: dev, Maxime Coquelin, Liu, Changpeng, Bie, Tiwei, Wang, Zhihong
> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> Sent: Saturday, October 26, 2019 7:01 AM
> To: Yu, Jin <jin.yu@intel.com>
> Cc: dev@dpdk.org; Maxime Coquelin <maxime.coquelin@redhat.com>; Liu,
> Changpeng <changpeng.liu@intel.com>; Bie, Tiwei <tiwei.bie@intel.com>;
> Wang, Zhihong <zhihong.wang@intel.com>
> Subject: Re: [dpdk-dev] [PATCH v11 0/9] vhost: support inflight share
> memory protocol feature
>
> 25/10/2019 12:12, Maxime Coquelin:
> > On 10/25/19 12:08 PM, Thomas Monjalon wrote:
> > > 16/10/2019 13:12, Maxime Coquelin:
> > >> On 10/9/19 10:48 PM, Jin Yu wrote:
> > >>> Jin Yu (9):
> > >>> vhost: add the inflight description
> > >>> vhost: add packed ring
> > >>> vhost: add the inflight structure
> > >>> vhost: add two new messages to support a shared buffer
> > >>> vhost: checkout the resubmit inflight information
> > >>> vhost: add the APIs to operate inflight ring
> > >>> vhost: add APIs for user getting inflight ring
> > >>> vhost: add vring functions packed ring support
> > >>> vhost: add vhost-user-blk example which support inflight
> > >>
> > >> Applied to dpdk-next-virtio/master.
> > >
> > > Sorry I have to drop the last patch, adding a new example, for 2 reasons:
> >
> > OK,
> >
> > > - I really would like to see the techboard approving one more example.
> >
> > Just FYI, it is a replacement for vhost-scsi, which is removed in this
> > release.
> >
> > > - The compilation was probably not tested enough because the example
> > > is not added in meson and make "all examples".
> >
> > I'll let Jin Yu submit the last patch again with its build fixed.
> > Then the techboard can decide whether to accept it.
Got it.
>
> While fixing this patch, please add a section in MAINTAINERS.
>
Got it.
Thanks.