DPDK patches and discussions
* [dpdk-dev] [PATCH 00/11] net/vhostpci: A new vhostpci PMD supporting VM2VM scenario
@ 2017-11-30  9:46 Zhiyong Yang
  2017-11-30  9:46 ` [dpdk-dev] [PATCH 01/11] drivers/net: add vhostpci PMD base files Zhiyong Yang
                   ` (12 more replies)
  0 siblings, 13 replies; 26+ messages in thread
From: Zhiyong Yang @ 2017-11-30  9:46 UTC (permalink / raw)
  To: dev, yliu, maxime.coquelin; +Cc: wei.w.wang, jianfeng.tan

The vhostpci PMD is a new type of driver that runs in the guest OS and drives the
vhostpci modern PCI device, which is a new virtio device.

The vhostpci design is described in the following:

An initial device design was presented at KVM Forum'16:
http://www.linux-kvm.org/images/5/55/02x07A-Wei_Wang-Design_of-Vhost-pci.pdf
The latest device design and implementation will be posted to the QEMU community soon.

The vhostpci PMD works in a pair with the virtio-net PMD to achieve point-to-point
communication between VMs. DPDK already has the virtio/vhost-user PMD pair to exchange
RX/TX packets between guest and host. For VM2VM use cases, however, the virtio PMD first
has to transmit packets from VM1 to the host OS through a vhost-user port, and then send
them on to the second VM through another virtio PMD port. The virtio/vhostpci PMD pair
instead uses shared memory to receive/transmit packets directly between the two VMs.
Currently, the entire memory of the virtio-net side VM is shared with the vhost-pci side
VM and mapped via device BAR2; the first 4KB area of BAR2 is reserved to store the
metadata.
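
To make the layout concrete, below is a minimal sketch of how the vhost-pci
side could translate a remote guest-physical address through the BAR2
mapping. It reuses the vpnet_metadata/vpnet_remote_mem definitions and
METADATA_SIZE added in patch 02; the helper itself, and the assumption that
the remote regions are mapped back-to-back right after the metadata page,
are for illustration only and not the exact PMD code.

#include <stdint.h>
#include "vhostpci_net.h" /* vpnet_metadata, vpnet_remote_mem, METADATA_SIZE */

/*
 * Sketch: translate a guest-physical address of the remote (virtio-net side)
 * VM into a local pointer. 'bar2_base' is the address at which BAR2 is
 * mapped in the vhost-pci side guest; the first 4KB hold the metadata.
 */
static void *
remote_gpa_to_local(void *bar2_base, uint64_t gpa)
{
	struct vpnet_metadata *meta = bar2_base;
	uintptr_t region = (uintptr_t)bar2_base + METADATA_SIZE;
	uint32_t i;

	for (i = 0; i < meta->nregions; i++) {
		const struct vpnet_remote_mem *r = &meta->mem[i];

		if (gpa >= r->gpa && gpa < r->gpa + r->size)
			return (void *)(region + (gpa - r->gpa));
		region += r->size;
	}

	return NULL; /* gpa not covered by any remote region */
}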

The vhostpci/virtio PMD workflow is as follows:

1. VM1 starts up with a vhostpci device. Bind the device to DPDK in guest1, launch
DPDK testpmd, and wait for the remote memory info (the memory regions and vring info
shared by VM2).

2. VM2 starts up with a virtio-net device. Bind the virtio-net device to DPDK in VM2
and run testpmd using the virtio PMD.

3. The vhostpci device negotiates virtio messages with the virtio-net device over the
socket, in the same way vhost-user and virtio-net do.

4. The vhostpci device gets VM2's memory region and vring info and writes the metadata
into the shared memory area (the first 4KB of BAR2).

5. When the metadata is ready to be read by the vhostpci PMD, the PMD receives a config
interrupt with LINK_UP set in the status field of the device config (see the sketch
below).

6. The vhostpci PMD and the virtio PMD can then transmit/receive packets.
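
For step 5, here is a minimal sketch of how the driver side could check the
link-up notification. It only uses definitions added in this series
(vhpci_isr(), vhpci_read_dev_config(), struct vpnet_pci_config,
VIRTIO_PCI_ISR_CONFIG and VHOSTPCI_NET_S_LINK_UP); the helper itself is an
illustration, not the exact PMD code.

#include <stddef.h>
#include "vhostpci_ethdev.h" /* pulls in vhostpci_pci.h from this series */

/* Sketch: returns 1 once the peer metadata is ready (LINK_UP reported). */
static int
vhostpci_link_is_up(struct vhostpci_hw *hw)
{
	uint16_t status = 0;

	/* Reading the ISR also acknowledges a pending config interrupt. */
	if (vhpci_isr(hw) & VIRTIO_PCI_ISR_CONFIG)
		vhpci_read_dev_config(hw,
			offsetof(struct vpnet_pci_config, status),
			&status, sizeof(status));

	return (status & VHOSTPCI_NET_S_LINK_UP) != 0;
}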

How to test?

1. Launch VM1 with the vhostpci device.
qemu/x86_64-softmmu/qemu-system-x86_64 -cpu host -M pc -enable-kvm \
-smp 16,threads=1,sockets=1 -m 8G -mem-prealloc -realtime mlock=on \
-object memory-backend-file,id=mem,size=8G,mem-path=/dev/hugepages,share=on \
-numa node,memdev=mem -drive if=virtio,file=/root/vhost-pci/guest1.img,format=raw \
-kernel /opt/guest_kernel \
-append 'root=/dev/vda1 ro default_hugepagesz=1G hugepagesz=1G hugepages=2 console=ttyS0,115200,8n1 3' \
-netdev tap,id=net1,br=br0,script=/etc/qemu-ifup \
-chardev socket,id=slave1,server,wait=off,path=/opt/vhost-pci-slave1 \
-device vhost-pci-net-pci,chardev=slave1 \
-nographic

2. Bind the vhostpci device to DPDK using igb_uio and start testpmd (a binding sketch
follows below):
./x86_64-native-linuxapp-gcc/app/testpmd -c 0x3 -n 4 -- -i
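
For reference, the binding itself can be done with the standard devbind
script shipped in the DPDK tree (a sketch; the PCI address 0000:00:04.0 is
only an example and depends on the VM's PCI layout):

modprobe uio
insmod ./x86_64-native-linuxapp-gcc/kmod/igb_uio.ko
./usertools/dpdk-devbind.py --status
./usertools/dpdk-devbind.py --bind=igb_uio 0000:00:04.0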

3. Launch VM2 with the virtio-net device.

qemu/x86_64-softmmu/qemu-system-x86_64 -cpu host -M pc -enable-kvm \
-smp 4,threads=1,sockets=1 -m 8G -mem-prealloc -realtime mlock=on \
-object memory-backend-file,id=mem,size=8G,mem-path=/dev/hugepages,share=on \
-numa node,memdev=mem -drive if=virtio,file=/root/vhost-pci/guest2.img,format=raw \
-net none -no-hpet -kernel /opt/guest_kernel \
-append 'root=/dev/vda1 ro default_hugepagesz=1G hugepagesz=1G hugepages=2 console=ttyS0,115200,8n1 3' \
-chardev socket,id=sock2,path=/opt/vhost-pci-slave1 \
-netdev type=vhost-user,id=net2,chardev=sock2,vhostforce \
-device virtio-net-pci,mac=52:54:00:00:00:02,netdev=net2 \
-nographic

4. Bind the virtio-net device to DPDK using igb_uio (same binding steps as in VM1),
then run testpmd:

./x86_64-native-linuxapp-gcc/app/testpmd -c 0x3 -n 4 --socket-mem 512,0 \
-- -i --rxq=1 --txq=1 --nb-cores=1

5. On the vhostpci PMD side, run "start".

6. On the virtio PMD side, run "start tx_first".

The loopback test should then work.

Notes:
1. Only igb_uio is supported for now.
2. The vhostpci device is a modern PCI device. The vhostpci PMD only supports mergeable
mode, so the virtio device side must also use mergeable mode.
3. The vhostpci PMD supports one queue pair for now.

Zhiyong Yang (11):
  drivers/net: add vhostpci PMD base files
  net/vhostpci: public header files
  net/vhostpci: add debugging log macros
  net/vhostpci: add basic framework
  net/vhostpci: add queue setup
  net/vhostpci: add support for link status change
  net/vhostpci: get remote memory region and vring info
  net/vhostpci: add RX function
  net/vhostpci: add TX function
  net/vhostpci: support RX/TX packets statistics
  net/vhostpci: update release note

 MAINTAINERS                                       |    6 +
 config/common_base                                |    9 +
 config/common_linuxapp                            |    1 +
 doc/guides/rel_notes/release_18_02.rst            |    6 +
 drivers/net/Makefile                              |    1 +
 drivers/net/vhostpci/Makefile                     |   54 +
 drivers/net/vhostpci/rte_pmd_vhostpci_version.map |    3 +
 drivers/net/vhostpci/vhostpci_ethdev.c            | 1521 +++++++++++++++++++++
 drivers/net/vhostpci/vhostpci_ethdev.h            |  176 +++
 drivers/net/vhostpci/vhostpci_logs.h              |   69 +
 drivers/net/vhostpci/vhostpci_net.h               |   74 +
 drivers/net/vhostpci/vhostpci_pci.c               |  334 +++++
 drivers/net/vhostpci/vhostpci_pci.h               |  240 ++++
 mk/rte.app.mk                                     |    1 +
 14 files changed, 2495 insertions(+)
 create mode 100644 drivers/net/vhostpci/Makefile
 create mode 100644 drivers/net/vhostpci/rte_pmd_vhostpci_version.map
 create mode 100644 drivers/net/vhostpci/vhostpci_ethdev.c
 create mode 100644 drivers/net/vhostpci/vhostpci_ethdev.h
 create mode 100644 drivers/net/vhostpci/vhostpci_logs.h
 create mode 100644 drivers/net/vhostpci/vhostpci_net.h
 create mode 100644 drivers/net/vhostpci/vhostpci_pci.c
 create mode 100644 drivers/net/vhostpci/vhostpci_pci.h

-- 
2.13.3


* [dpdk-dev] [PATCH 01/11] drivers/net: add vhostpci PMD base files
  2017-11-30  9:46 [dpdk-dev] [PATCH 00/11] net/vhostpci: A new vhostpci PMD supporting VM2VM scenario Zhiyong Yang
@ 2017-11-30  9:46 ` Zhiyong Yang
  2017-11-30  9:46 ` [dpdk-dev] [PATCH 02/11] net/vhostpci: public header files Zhiyong Yang
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: Zhiyong Yang @ 2017-11-30  9:46 UTC (permalink / raw)
  To: dev, yliu, maxime.coquelin; +Cc: wei.w.wang, jianfeng.tan, Zhiyong Yang

Add the Makefile and version map file for the vhostpci PMD, and update
config/common_base and drivers/net/Makefile.

Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com>
---
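Build note (not part of the patch): the PMD stays disabled by default in
config/common_base; a sketch of enabling and building it with the legacy
make system (the x86_64-native-linuxapp-gcc target is an assumption, and a
later patch of this series also enables the PMD in common_linuxapp):

sed -i 's/CONFIG_RTE_LIBRTE_VHOSTPCI_PMD=n/CONFIG_RTE_LIBRTE_VHOSTPCI_PMD=y/' \
	config/common_base
make config T=x86_64-native-linuxapp-gcc
make
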
 config/common_base                                |  5 +++
 drivers/net/Makefile                              |  1 +
 drivers/net/vhostpci/Makefile                     | 52 +++++++++++++++++++++++
 drivers/net/vhostpci/rte_pmd_vhostpci_version.map |  3 ++
 4 files changed, 61 insertions(+)
 create mode 100644 drivers/net/vhostpci/Makefile
 create mode 100644 drivers/net/vhostpci/rte_pmd_vhostpci_version.map

diff --git a/config/common_base b/config/common_base
index e74febef4..ea8313591 100644
--- a/config/common_base
+++ b/config/common_base
@@ -281,6 +281,11 @@ CONFIG_RTE_LIBRTE_NFP_DEBUG=n
 CONFIG_RTE_LIBRTE_MRVL_PMD=n
 
 #
+# Compile burst-oriented vhostpci PMD driver
+#
+CONFIG_RTE_LIBRTE_VHOSTPCI_PMD=n
+
+#
 # Compile burst-oriented Broadcom BNXT PMD driver
 #
 CONFIG_RTE_LIBRTE_BNXT_PMD=y
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index ef09b4e16..96da4b59f 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -66,6 +66,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_SFC_EFX_PMD) += sfc
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_SZEDATA2) += szedata2
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_TAP) += tap
 DIRS-$(CONFIG_RTE_LIBRTE_THUNDERX_NICVF_PMD) += thunderx
+DIRS-$(CONFIG_RTE_LIBRTE_VHOSTPCI_PMD) += vhostpci
 DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio
 DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += vmxnet3
 
diff --git a/drivers/net/vhostpci/Makefile b/drivers/net/vhostpci/Makefile
new file mode 100644
index 000000000..3467e7cbe
--- /dev/null
+++ b/drivers/net/vhostpci/Makefile
@@ -0,0 +1,52 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2017 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of Intel Corporation nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_pmd_vhostpci.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS)
+LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring
+LDLIBS += -lrte_ethdev -lrte_net -lrte_bus_pci
+
+EXPORT_MAP := rte_pmd_vhostpci_version.map
+
+LIBABIVER := 1
+
+#
+# all source are stored in SRCS-y
+#
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/vhostpci/rte_pmd_vhostpci_version.map b/drivers/net/vhostpci/rte_pmd_vhostpci_version.map
new file mode 100644
index 000000000..58b94270d
--- /dev/null
+++ b/drivers/net/vhostpci/rte_pmd_vhostpci_version.map
@@ -0,0 +1,3 @@
+DPDK_18.02 {
+	local: *;
+};
-- 
2.13.3


* [dpdk-dev] [PATCH 02/11] net/vhostpci: public header files
  2017-11-30  9:46 [dpdk-dev] [PATCH 00/11] net/vhostpci: A new vhostpci PMD supporting VM2VM scenario Zhiyong Yang
  2017-11-30  9:46 ` [dpdk-dev] [PATCH 01/11] drivers/net: add vhostpci PMD base files Zhiyong Yang
@ 2017-11-30  9:46 ` Zhiyong Yang
  2017-11-30  9:46 ` [dpdk-dev] [PATCH 03/11] net/vhostpci: add debugging log macros Zhiyong Yang
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: Zhiyong Yang @ 2017-11-30  9:46 UTC (permalink / raw)
  To: dev, yliu, maxime.coquelin; +Cc: wei.w.wang, jianfeng.tan, Zhiyong Yang

Add the public/exported header files for the vhostpci PMD. The structures and
constants that define how the device operates are visible to both the PMD and
the DPDK application.

Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com>
---
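Usage note (a sketch, not part of the patch): the feature helper and the
constants added here are meant to be used by the PMD roughly like this,
mirroring how the later framework patch picks the RX header size:

#include "vhostpci_ethdev.h" /* also pulls in vhostpci_pci.h/vhostpci_net.h */

/* Sketch: choose the vhost header size from the negotiated feature bits. */
static uint16_t
vpnet_hdr_size(struct vhostpci_hw *hw)
{
	if (vtpci_with_feature(hw, VIRTIO_NET_F_MRG_RXBUF) ||
	    vtpci_with_feature(hw, VIRTIO_F_VERSION_1))
		return sizeof(struct virtio_net_hdr_mrg_rxbuf);

	return sizeof(struct virtio_net_hdr);
}
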
 drivers/net/vhostpci/vhostpci_ethdev.h | 176 ++++++++++++++++++++++++
 drivers/net/vhostpci/vhostpci_net.h    |  74 ++++++++++
 drivers/net/vhostpci/vhostpci_pci.h    | 240 +++++++++++++++++++++++++++++++++
 3 files changed, 490 insertions(+)
 create mode 100644 drivers/net/vhostpci/vhostpci_ethdev.h
 create mode 100644 drivers/net/vhostpci/vhostpci_net.h
 create mode 100644 drivers/net/vhostpci/vhostpci_pci.h

diff --git a/drivers/net/vhostpci/vhostpci_ethdev.h b/drivers/net/vhostpci/vhostpci_ethdev.h
new file mode 100644
index 000000000..3ff67dbc6
--- /dev/null
+++ b/drivers/net/vhostpci/vhostpci_ethdev.h
@@ -0,0 +1,176 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _VHOSTPCI_ETHDEV_H_
+#define _VHOSTPCI_ETHDEV_H_
+
+#include <linux/virtio_ring.h>
+
+#include "vhostpci_pci.h"
+#include "vhostpci_net.h"
+
+#define VHOSTPCI_MAX_RX_QUEUES 128U
+#define VHOSTPCI_MAX_TX_QUEUES 128U
+#define VHOSTPCI_MAX_MAC_ADDRS 1
+#define VHOSTPCI_MIN_RX_BUFSIZE 64
+#define VHOSTPCI_MAX_RX_PKTLEN  9728U
+#define VHOSTPCI_NUM_DESCRIPTORS 256U
+#define VHOSTPCI_MAX_QUEUE_PAIRS 0x1
+
+/* Features supported by vhostpci PMD by default. */
+#define VHOSTPCI_PMD_DEFAULT_GUEST_FEATURES	\
+	(1ULL << VIRTIO_NET_F_MRG_RXBUF	  |	\
+	 1ULL << VIRTIO_F_VERSION_1)
+
+/**
+ * This is the first element of the scatter-gather list.  If you don't
+ * specify GSO or CSUM features, you can simply ignore the header.
+ */
+struct virtio_net_hdr {
+#define VIRTIO_NET_HDR_F_NEEDS_CSUM 1    /**< Use csum_start,csum_offset*/
+#define VIRTIO_NET_HDR_F_DATA_VALID 2    /**< Checksum is valid */
+	uint8_t flags;
+#define VIRTIO_NET_HDR_GSO_NONE     0    /**< Not a GSO frame */
+#define VIRTIO_NET_HDR_GSO_TCPV4    1    /**< GSO frame, IPv4 TCP (TSO) */
+#define VIRTIO_NET_HDR_GSO_UDP      3    /**< GSO frame, IPv4 UDP (UFO) */
+#define VIRTIO_NET_HDR_GSO_TCPV6    4    /**< GSO frame, IPv6 TCP */
+#define VIRTIO_NET_HDR_GSO_ECN      0x80 /**< TCP has ECN set */
+	uint8_t gso_type;
+	uint16_t hdr_len;     /**< Ethernet + IP + tcp/udp hdrs */
+	uint16_t gso_size;    /**< Bytes to append to hdr_len per frame */
+	uint16_t csum_start;  /**< Position to start checksumming from */
+	uint16_t csum_offset; /**< Offset after that to place checksum */
+};
+
+/**
+ * This is the version of the header to use when the MRG_RXBUF
+ * feature has been negotiated.
+ */
+struct virtio_net_hdr_mrg_rxbuf {
+	struct virtio_net_hdr hdr;
+	uint16_t num_buffers; /**< Number of merged rx buffers */
+};
+
+enum {VTNET_RXQ = 0, VTNET_TXQ, VTNET_QNUM};
+
+struct vhostpci_stats {
+	uint64_t pkts;
+	uint64_t bytes;
+	uint64_t missed_pkts;
+};
+
+struct vhostpci_queue {
+	rte_atomic32_t allow_queuing;
+	rte_atomic32_t while_queuing;
+	struct rte_mempool *mb_pool;
+	uint16_t port_id;
+	uint16_t virtqueue_id;
+	struct vhostpci_stats stats;
+	void *vpnet;
+};
+
+/**
+ * Information relating to memory regions including offsets to
+ * addresses in QEMUs memory file.
+ */
+struct vhostpci_mem_region {
+	uint64_t guest_phys_addr;
+	uint64_t guest_user_addr;
+	uint64_t host_user_addr;
+	uint64_t size;
+	void	 *mmap_addr;
+	uint64_t mmap_size;
+	uint64_t offset;
+	uint64_t start;
+	uint64_t end;
+};
+
+/**
+ * Memory structure includes region and mapping information.
+ */
+
+struct vhostpci_memory {
+	uint32_t nregions;
+	struct vhostpci_mem_region regions[MAX_REMOTE_REGION];
+};
+
+/**
+ * Structure contains the info for each batched memory copy.
+ */
+struct batch_copy_elem {
+	void *dst;
+	void *src;
+	uint32_t len;
+};
+
+/**
+ * Structure contains buffer address, length and descriptor index
+ * from vring to do scatter RX.
+ */
+struct buf_vector {
+	uint64_t buf_addr;
+	uint32_t buf_len;
+	uint32_t desc_idx;
+};
+
+/**
+ * Structure contains variables relevant to RX/TX virtqueues.
+ */
+struct vhostpci_virtqueue {
+	struct vring_desc	*desc;
+	struct vring_avail	*avail;
+	struct vring_used	*used;
+	uint32_t		size;
+	uint16_t		last_avail_idx;
+	uint16_t		last_used_idx;
+	int			enabled;
+	struct vring_used_elem  *shadow_used_ring;
+	uint16_t                shadow_used_idx;
+	struct batch_copy_elem	*batch_copy_elems;
+	uint16_t		batch_copy_nb_elems;
+} __rte_cache_aligned;
+
+struct vhostpci_net {
+	uint64_t	features;
+	uint64_t	protocol_features;
+	uint64_t	mem_base;
+	uint32_t	flags;
+	uint32_t	nr_vring;
+	struct vhostpci_memory mem;
+	struct vhostpci_virtqueue *virtqueue[VHOSTPCI_MAX_QUEUE_PAIRS * 2];
+	uint16_t	vhost_hlen;
+};
+
+extern struct vhostpci_hw_internal vhostpci_hw_internal[RTE_MAX_ETHPORTS];
+
+#endif /* _VHOSTPCI_ETHDEV_H_ */
diff --git a/drivers/net/vhostpci/vhostpci_net.h b/drivers/net/vhostpci/vhostpci_net.h
new file mode 100644
index 000000000..0e6eef695
--- /dev/null
+++ b/drivers/net/vhostpci/vhostpci_net.h
@@ -0,0 +1,74 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _VHOSTPCI_NET_H_
+#define _VHOSTPCI_NET_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define VIRTIO_ID_VHOST_PCI_NET 21 /* vhost-pci-net */
+
+#define REMOTE_MEM_BAR_ID 2
+#define METADATA_SIZE 4096
+#define REMOTE_MEM_BAR_SIZE 0x1000000000
+
+#define MAX_REMOTE_REGION 8
+
+struct vpnet_remote_mem {
+	uint64_t gpa;
+	uint64_t size;
+};
+
+struct vpnet_remote_vq {
+	uint16_t last_avail_idx;
+	int32_t  vring_enabled;
+	uint32_t vring_num;
+	uint64_t desc_gpa;
+	uint64_t avail_gpa;
+	uint64_t used_gpa;
+};
+
+struct vpnet_metadata {
+	uint32_t nregions;
+	uint32_t nvqs;
+	struct vpnet_remote_mem mem[MAX_REMOTE_REGION];
+	struct vpnet_remote_vq vq[0];
+};
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _VHOSTPCI_NET_H_ */
diff --git a/drivers/net/vhostpci/vhostpci_pci.h b/drivers/net/vhostpci/vhostpci_pci.h
new file mode 100644
index 000000000..18ec72287
--- /dev/null
+++ b/drivers/net/vhostpci/vhostpci_pci.h
@@ -0,0 +1,240 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _VHOSTPCI_PCI_H_
+#define _VHOSTPCI_PCI_H_
+
+#include <stdint.h>
+
+#include <rte_pci.h>
+#include <rte_bus_pci.h>
+#include <rte_ethdev.h>
+
+struct virtqueue;
+
+/* VHOSTPCI vendor/device ID. */
+#define VHOST_PCI_VENDORID 0x1AF4
+#define VHOST_PCI_NET_MODERN_DEVICEID 0x1055
+
+/**
+ * VirtIO Header, located in BAR 0.
+ */
+#define VIRTIO_PCI_HOST_FEATURES  0  /* host's supported features (32bit, RO)*/
+#define VIRTIO_PCI_GUEST_FEATURES 4  /* guest's supported features (32, RW) */
+#define VIRTIO_PCI_QUEUE_PFN      8  /* physical address of VQ (32, RW) */
+#define VIRTIO_PCI_QUEUE_NUM      12 /* number of ring entries (16, RO) */
+#define VIRTIO_PCI_QUEUE_SEL      14 /* current VQ selection (16, RW) */
+#define VIRTIO_PCI_QUEUE_NOTIFY   16 /* notify host regarding VQ (16, RW) */
+#define VIRTIO_PCI_STATUS         18 /* device status register (8, RW) */
+#define VIRTIO_PCI_ISR		  19 /* interrupt status register, reading
+				      *	also clears the register (8, RO)
+				      */
+
+/* The bit of the ISR which indicates a device has an interrupt. */
+#define VIRTIO_PCI_ISR_INTR   0x1
+/* The bit of the ISR which indicates a device configuration change. */
+#define VIRTIO_PCI_ISR_CONFIG 0x2
+
+/* Status byte for guest to report progress. */
+#define VIRTIO_CONFIG_STATUS_RESET     0x00
+#define VIRTIO_CONFIG_STATUS_ACK       0x01
+#define VIRTIO_CONFIG_STATUS_DRIVER    0x02
+#define VIRTIO_CONFIG_STATUS_DRIVER_OK 0x04
+#define VIRTIO_CONFIG_STATUS_FEATURES_OK 0x08
+#define VIRTIO_CONFIG_STATUS_FAILED    0x80
+
+/* The feature bitmap for virtio net */
+#define VIRTIO_NET_F_MAC	5	/* Host has given MAC address. */
+#define VIRTIO_NET_F_MRG_RXBUF	15	/* Host can merge receive buffers. */
+
+/* Can the device handle any descriptor layout? */
+#define VIRTIO_F_ANY_LAYOUT		27
+
+#define VIRTIO_F_VERSION_1		32
+
+#define VHOSTPCI_NET_S_LINK_UP	1	/* Link is up */
+
+/**
+ * Maximum number of virtqueues per device.
+ */
+#define VHOSTPCI_MAX_VIRTQUEUE_PAIRS 8
+#define VHOSTPCI_MAX_VIRTQUEUES (VHOSTPCI_MAX_VIRTQUEUE_PAIRS * 2)
+
+/* Common configuration */
+#define VIRTIO_PCI_CAP_COMMON_CFG	1
+/* Notifications */
+#define VIRTIO_PCI_CAP_NOTIFY_CFG	2
+/* ISR Status */
+#define VIRTIO_PCI_CAP_ISR_CFG		3
+/* Device specific configuration */
+#define VIRTIO_PCI_CAP_DEVICE_CFG	4
+/* PCI configuration access */
+#define VIRTIO_PCI_CAP_PCI_CFG		5
+
+/* This is the PCI capability header: */
+struct vpnet_pci_cap {
+	uint8_t cap_vndr;	/* Generic PCI field: PCI_CAP_ID_VNDR */
+	uint8_t cap_next;	/* Generic PCI field: next ptr. */
+	uint8_t cap_len;	/* Generic PCI field: capability length */
+	uint8_t cfg_type;	/* Identifies the structure. */
+	uint8_t bar;		/* Where to find it. */
+	uint8_t padding[3];	/* Pad to full dword. */
+	uint32_t offset;	/* Offset within bar. */
+	uint32_t length;	/* Length of the structure, in bytes. */
+};
+
+struct vpnet_notify_cap {
+	struct vpnet_pci_cap cap;
+	uint32_t notify_off_multiplier;	/* Multiplier for queue_notify_off. */
+};
+
+/* Fields in VIRTIO_PCI_CAP_COMMON_CFG: */
+struct vpnet_pci_common_cfg {
+	/* About the whole device. */
+	uint32_t device_feature_select;	/* read-write */
+	uint32_t device_feature;	/* read-only */
+	uint32_t guest_feature_select;	/* read-write */
+	uint32_t guest_feature;		/* read-write */
+	uint16_t msix_config;		/* read-write */
+	uint16_t num_queues;		/* read-only */
+	uint8_t device_status;		/* read-write */
+	uint8_t config_generation;	/* read-only */
+
+	/* About a specific virtqueue. */
+	uint16_t queue_select;		/* read-write */
+	uint16_t queue_size;		/* read-write, power of 2. */
+	uint16_t queue_msix_vector;	/* read-write */
+	uint16_t queue_enable;		/* read-write */
+	uint16_t queue_notify_off;	/* read-only */
+	uint32_t queue_desc_lo;		/* read-write */
+	uint32_t queue_desc_hi;		/* read-write */
+	uint32_t queue_avail_lo;	/* read-write */
+	uint32_t queue_avail_hi;	/* read-write */
+	uint32_t queue_used_lo;		/* read-write */
+	uint32_t queue_used_hi;		/* read-write */
+};
+
+struct vpnet_pci_config {
+	/* configure mac address */
+	uint8_t mac[ETHER_ADDR_LEN];
+	/* link up/down status */
+	uint16_t status;
+} __attribute__((packed));
+
+struct vhostpci_hw {
+	uint64_t    req_guest_features;
+	uint64_t    guest_features;
+	uint32_t    max_queue_pairs;
+	uint16_t    started;
+	uint16_t    max_mtu;
+	uint16_t    vtnet_hdr_size;
+	uint8_t     modern;
+	uint16_t    port_id;
+	uint8_t     mac_addr[ETHER_ADDR_LEN];
+	uint32_t    notify_off_multiplier;
+	uint8_t     *isr;
+	uint16_t    *notify_base;
+	struct vpnet_pci_common_cfg *common_cfg;
+	struct vpnet_pci_config *dev_cfg;
+	struct virtqueue **vqs;
+	struct vhostpci_net *vpnet;
+};
+
+struct vpnet_pci_ops {
+	void (*read_dev_cfg)(struct vhostpci_hw *hw, size_t offset,
+			     void *dst, int len);
+	void (*write_dev_cfg)(struct vhostpci_hw *hw, size_t offset,
+			      const void *src, int len);
+	void (*reset)(struct vhostpci_hw *hw);
+
+	uint8_t (*get_status)(struct vhostpci_hw *hw);
+	void    (*set_status)(struct vhostpci_hw *hw, uint8_t status);
+
+	uint64_t (*get_features)(struct vhostpci_hw *hw);
+	void     (*set_features)(struct vhostpci_hw *hw, uint64_t features);
+
+	uint8_t (*get_isr)(struct vhostpci_hw *hw);
+
+	uint16_t (*set_config_irq)(struct vhostpci_hw *hw, uint16_t vec);
+
+	uint16_t (*set_queue_irq)(struct vhostpci_hw *hw, struct virtqueue *vq,
+			uint16_t vec);
+
+	uint16_t (*get_queue_num)(struct vhostpci_hw *hw, uint16_t queue_id);
+
+	void (*notify_queue)(struct vhostpci_hw *hw, struct virtqueue *vq);
+};
+
+/**
+ * While vhostpci_hw is stored in shared memory, this structure stores
+ * some infos that may vary in the multiple process model locally.
+ * For example, the vtpci_ops pointer.
+ */
+struct vhostpci_hw_internal {
+	const struct vpnet_pci_ops *vtpci_ops;
+};
+
+#define VTPCI_OPS(hw)	(vhostpci_hw_internal[(hw)->port_id].vtpci_ops)
+
+extern struct vhostpci_hw_internal vhostpci_hw_internal[RTE_MAX_ETHPORTS];
+
+/* The alignment to use between consumer and producer parts of vring. */
+#define VHOST_PCI_VRING_ALIGN 4096
+
+static inline int
+vtpci_with_feature(struct vhostpci_hw *hw, uint64_t bit)
+{
+	return (hw->guest_features & (1ULL << bit)) != 0;
+}
+
+/**
+ * Function declaration from vhostpci_pci.c
+ */
+
+uint8_t vhpci_isr(struct vhostpci_hw *hw);
+
+uint8_t vhpci_get_status(struct vhostpci_hw *hw);
+
+void vhpci_init_complete(struct vhostpci_hw *hw);
+
+void vhpci_set_status(struct vhostpci_hw *hw, uint8_t status);
+
+uint64_t vhpci_negotiate_features(struct vhostpci_hw *hw,
+		uint64_t host_features);
+
+int vhostpci_pci_init(struct rte_pci_device *dev, struct vhostpci_hw *hw);
+
+void vhpci_read_dev_config(struct vhostpci_hw *hw, size_t offset,
+		void *dst, int length);
+
+#endif /* _VHOSTPCI_PCI_H_ */
-- 
2.13.3


* [dpdk-dev] [PATCH 03/11] net/vhostpci: add debugging log macros
  2017-11-30  9:46 [dpdk-dev] [PATCH 00/11] net/vhostpci: A new vhostpci PMD supporting VM2VM scenario Zhiyong Yang
  2017-11-30  9:46 ` [dpdk-dev] [PATCH 01/11] drivers/net: add vhostpci PMD base files Zhiyong Yang
  2017-11-30  9:46 ` [dpdk-dev] [PATCH 02/11] net/vhostpci: public header files Zhiyong Yang
@ 2017-11-30  9:46 ` Zhiyong Yang
  2017-11-30  9:46 ` [dpdk-dev] [PATCH 04/11] net/vhostpci: add basic framework Zhiyong Yang
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: Zhiyong Yang @ 2017-11-30  9:46 UTC (permalink / raw)
  To: dev, yliu, maxime.coquelin; +Cc: wei.w.wang, jianfeng.tan, Zhiyong Yang

Add a header file with debug logging macros for the vhostpci PMD.

Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com>
---
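Usage note (a sketch, not part of the patch): each macro compiles away to an
empty statement unless the matching CONFIG_RTE_LIBRTE_VHOSTPCI_DEBUG_* option
is set to 'y' in config/common_base, e.g.:

#include <stdint.h>
#include "vhostpci_logs.h"

/* Sketch: with CONFIG_RTE_LIBRTE_VHOSTPCI_DEBUG_DRIVER=y this goes through
 * RTE_LOG with the function name prefixed; otherwise it is a no-op.
 */
static void
log_link_change(uint16_t port_id, int up)
{
	PMD_DRV_LOG(INFO, "port %u link is %s", port_id, up ? "up" : "down");
}
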
 config/common_base                   |  4 +++
 drivers/net/vhostpci/vhostpci_logs.h | 69 ++++++++++++++++++++++++++++++++++++
 2 files changed, 73 insertions(+)
 create mode 100644 drivers/net/vhostpci/vhostpci_logs.h

diff --git a/config/common_base b/config/common_base
index ea8313591..f382fc6ce 100644
--- a/config/common_base
+++ b/config/common_base
@@ -284,6 +284,10 @@ CONFIG_RTE_LIBRTE_MRVL_PMD=n
 # Compile burst-oriented vhostpci PMD driver
 #
 CONFIG_RTE_LIBRTE_VHOSTPCI_PMD=n
+CONFIG_RTE_LIBRTE_VHOSTPCI_DEBUG_INIT=n
+CONFIG_RTE_LIBRTE_VHOSTPCI_DEBUG_RX=n
+CONFIG_RTE_LIBRTE_VHOSTPCI_DEBUG_TX=n
+CONFIG_RTE_LIBRTE_VHOSTPCI_DEBUG_DRIVER=n
 
 #
 # Compile burst-oriented Broadcom BNXT PMD driver
diff --git a/drivers/net/vhostpci/vhostpci_logs.h b/drivers/net/vhostpci/vhostpci_logs.h
new file mode 100644
index 000000000..16c49d29d
--- /dev/null
+++ b/drivers/net/vhostpci/vhostpci_logs.h
@@ -0,0 +1,69 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _VHOSTPCI_LOGS_H_
+#define _VHOSTPCI_LOGS_H_
+
+#include <rte_log.h>
+
+#ifdef RTE_LIBRTE_VHOSTPCI_DEBUG_INIT
+#define PMD_INIT_LOG(level, fmt, args...) \
+	RTE_LOG(level, PMD, "%s(): " fmt "\n", __func__, ## args)
+#define PMD_INIT_FUNC_TRACE() PMD_INIT_LOG(DEBUG, " >>")
+#else
+#define PMD_INIT_LOG(level, fmt, args...) do { } while (0)
+#define PMD_INIT_FUNC_TRACE() do { } while (0)
+#endif
+
+#ifdef RTE_LIBRTE_VHOSTPCI_DEBUG_RX
+#define PMD_RX_LOG(level, fmt, args...) \
+	RTE_LOG(level, PMD, "%s() rx: " fmt "\n", __func__, ## args)
+#else
+#define PMD_RX_LOG(level, fmt, args...) do { } while (0)
+#endif
+
+#ifdef RTE_LIBRTE_VHOSTPCI_DEBUG_TX
+#define PMD_TX_LOG(level, fmt, args...) \
+	RTE_LOG(level, PMD, "%s() tx: " fmt "\n", __func__, ## args)
+#else
+#define PMD_TX_LOG(level, fmt, args...) do { } while (0)
+#endif
+
+#ifdef RTE_LIBRTE_VHOSTPCI_DEBUG_DRIVER
+#define PMD_DRV_LOG(level, fmt, args...) \
+	RTE_LOG(level, PMD, "%s(): " fmt "\n", __func__, ## args)
+#else
+#define PMD_DRV_LOG(level, fmt, args...) do { } while (0)
+#endif
+
+#endif /* _VHOSTPCI_LOGS_H_ */
-- 
2.13.3


* [dpdk-dev] [PATCH 04/11] net/vhostpci: add basic framework
  2017-11-30  9:46 [dpdk-dev] [PATCH 00/11] net/vhostpci: A new vhostpci PMD supporting VM2VM scenario Zhiyong Yang
                   ` (2 preceding siblings ...)
  2017-11-30  9:46 ` [dpdk-dev] [PATCH 03/11] net/vhostpci: add debugging log macros Zhiyong Yang
@ 2017-11-30  9:46 ` Zhiyong Yang
  2017-11-30  9:46 ` [dpdk-dev] [PATCH 05/11] net/vhostpci: add queue setup Zhiyong Yang
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: Zhiyong Yang @ 2017-11-30  9:46 UTC (permalink / raw)
  To: dev, yliu, maxime.coquelin; +Cc: wei.w.wang, jianfeng.tan, Zhiyong Yang

This commit introduces the vhostpci framework in DPDK, including:

1. Registering the vhostpci PMD.
2. Implementing the PCI device probe and remove functions.
3. Allocating and initializing the vhostpci_net data structures.
4. The start, stop, close and info_get functions.

Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com>
---
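Usage note (a sketch with generic ethdev calls, not part of the patch): once
the device has been probed, an application drives the ops registered here
roughly as follows. Error handling is trimmed, and since RX/TX queue setup is
only added in a later patch of this series, starting the port right away is
just for illustration:

#include <string.h>
#include <rte_ethdev.h>

static int
bring_up_vhostpci_port(uint16_t port_id)
{
	struct rte_eth_dev_info info;
	struct rte_eth_conf conf;

	memset(&conf, 0, sizeof(conf));
	rte_eth_dev_info_get(port_id, &info);	/* -> vhostpci_dev_info_get */

	if (rte_eth_dev_configure(port_id, 1, 1, &conf) < 0) /* -> dev_configure */
		return -1;

	return rte_eth_dev_start(port_id);	/* -> vhostpci_dev_start */
}
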
 config/common_linuxapp                 |   1 +
 drivers/net/vhostpci/Makefile          |   2 +
 drivers/net/vhostpci/vhostpci_ethdev.c | 539 +++++++++++++++++++++++++++++++++
 drivers/net/vhostpci/vhostpci_pci.c    | 334 ++++++++++++++++++++
 mk/rte.app.mk                          |   1 +
 5 files changed, 877 insertions(+)
 create mode 100644 drivers/net/vhostpci/vhostpci_ethdev.c
 create mode 100644 drivers/net/vhostpci/vhostpci_pci.c

diff --git a/config/common_linuxapp b/config/common_linuxapp
index 74c7d64ec..d5e2132a3 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -49,4 +49,5 @@ CONFIG_RTE_LIBRTE_PMD_TAP=y
 CONFIG_RTE_LIBRTE_AVP_PMD=y
 CONFIG_RTE_LIBRTE_NFP_PMD=y
 CONFIG_RTE_LIBRTE_POWER=y
+CONFIG_RTE_LIBRTE_VHOSTPCI_PMD=y
 CONFIG_RTE_VIRTIO_USER=y
diff --git a/drivers/net/vhostpci/Makefile b/drivers/net/vhostpci/Makefile
index 3467e7cbe..3089e54d8 100644
--- a/drivers/net/vhostpci/Makefile
+++ b/drivers/net/vhostpci/Makefile
@@ -48,5 +48,7 @@ LIBABIVER := 1
 #
 # all source are stored in SRCS-y
 #
+SRCS-$(CONFIG_RTE_LIBRTE_VHOSTPCI_PMD) += vhostpci_pci.c
+SRCS-$(CONFIG_RTE_LIBRTE_VHOSTPCI_PMD) += vhostpci_ethdev.c
 
 include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/vhostpci/vhostpci_ethdev.c b/drivers/net/vhostpci/vhostpci_ethdev.c
new file mode 100644
index 000000000..873ff7482
--- /dev/null
+++ b/drivers/net/vhostpci/vhostpci_ethdev.c
@@ -0,0 +1,539 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_eal.h>
+#include <rte_dev.h>
+#include <stdbool.h>
+#include <rte_pci.h>
+#include <rte_ether.h>
+#include <rte_common.h>
+#include <rte_memory.h>
+#include <rte_ethdev.h>
+#include <rte_memcpy.h>
+#include <rte_malloc.h>
+#include <rte_atomic.h>
+#include <rte_memzone.h>
+#include <rte_bus_pci.h>
+#include <rte_ethdev_pci.h>
+
+#include "vhostpci_logs.h"
+#include "vhostpci_ethdev.h"
+
+static void
+vhostpci_dev_info_get(struct rte_eth_dev *dev,
+		struct rte_eth_dev_info *dev_info);
+
+static void
+vhostpci_get_hwaddr(struct vhostpci_hw *hw);
+
+static int
+vhostpci_dev_configure(struct rte_eth_dev *dev);
+
+static int
+eth_vhostpci_dev_init(struct rte_eth_dev *eth_dev);
+
+static int
+vhostpci_dev_atomic_write_link_status(struct rte_eth_dev *dev,
+		struct rte_eth_link *link);
+
+static int
+vhostpci_init_device(struct rte_eth_dev *eth_dev, uint64_t req_features);
+
+static int
+vhostpci_dev_start(struct rte_eth_dev *dev);
+
+static void
+update_queuing_status(struct rte_eth_dev *dev);
+
+static void
+vhostpci_dev_close(struct rte_eth_dev *dev);
+
+static void
+vhostpci_dev_stop(struct rte_eth_dev *dev);
+
+static const struct eth_dev_ops vhostpci_eth_dev_ops = {
+	.dev_start               = vhostpci_dev_start,
+	.dev_stop                = vhostpci_dev_stop,
+	.dev_close               = vhostpci_dev_close,
+	.dev_infos_get		 = vhostpci_dev_info_get,
+	.dev_configure		 = vhostpci_dev_configure,
+};
+
+static inline bool
+is_valid_virt_queue_idx(uint32_t idx, int is_tx, uint32_t nr_vring)
+{
+	return (is_tx ^ (idx & 1)) == 0 && idx < nr_vring;
+}
+
+static int
+vhostpci_negotiate_features(struct vhostpci_hw *hw, uint64_t req_features);
+
+static inline int
+vhostpci_pci_with_feature(struct vhostpci_hw *hw, uint64_t bit)
+{
+	return (hw->guest_features & (1ULL << bit)) != 0;
+}
+
+static int
+vhostpci_dev_start(struct rte_eth_dev *dev)
+{
+	struct vhostpci_hw *hw = dev->data->dev_private;
+
+	hw->started = 1;
+	update_queuing_status(dev);
+
+	return 0;
+}
+
+static void
+vhostpci_get_hwaddr(struct vhostpci_hw *hw)
+{
+	if (vtpci_with_feature(hw, VIRTIO_NET_F_MAC)) {
+		vhpci_read_dev_config(hw,
+			offsetof(struct vpnet_pci_config, mac),
+			&hw->mac_addr, ETHER_ADDR_LEN);
+	} else {
+		eth_random_addr(&hw->mac_addr[0]);
+	}
+}
+
+static void
+update_queuing_status(struct rte_eth_dev *dev)
+{
+	struct vhostpci_hw *hw = dev->data->dev_private;
+	struct vhostpci_queue *vq;
+	int i;
+	int allow_queuing = 1;
+
+	if (hw->started == 0)
+		allow_queuing = 0;
+
+	/* Wait until rx/tx_pkt_burst stops accessing vhost device */
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		vq = dev->data->rx_queues[i];
+		if (vq == NULL)
+			continue;
+		rte_atomic32_set(&vq->allow_queuing, allow_queuing);
+		while (rte_atomic32_read(&vq->while_queuing))
+			rte_pause();
+	}
+
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+		vq = dev->data->tx_queues[i];
+		if (vq == NULL)
+			continue;
+		rte_atomic32_set(&vq->allow_queuing, allow_queuing);
+		while (rte_atomic32_read(&vq->while_queuing))
+			rte_pause();
+	}
+}
+
+static int
+vhostpci_negotiate_features(struct vhostpci_hw *hw, uint64_t req_features)
+{
+	uint64_t host_features;
+
+	/* Prepare guest_features: feature that driver wants to support */
+	PMD_INIT_LOG(DEBUG, "guest_features before negotiate = %llx",
+		(unsigned long long)req_features);
+
+	/* Read device(host) feature bits */
+	host_features = VTPCI_OPS(hw)->get_features(hw);
+	PMD_INIT_LOG(DEBUG, "host_features before negotiate = %llx",
+		(unsigned long long)host_features);
+
+	/**
+	 * Negotiate features: Subset of device feature bits are written back
+	 * guest feature bits.
+	 */
+	hw->guest_features = req_features;
+	hw->guest_features = vhpci_negotiate_features(hw, host_features);
+	PMD_INIT_LOG(DEBUG, "features after negotiate = %llx",
+		(unsigned long long)hw->guest_features);
+
+	if (hw->modern) {
+		if (!vtpci_with_feature(hw, VIRTIO_F_VERSION_1)) {
+			PMD_INIT_LOG(ERR,
+				"VIRTIO_F_VERSION_1 features is not enabled.");
+			return -1;
+		}
+		vhpci_set_status(hw, VIRTIO_CONFIG_STATUS_FEATURES_OK);
+		if (!(vhpci_get_status(hw) &
+				VIRTIO_CONFIG_STATUS_FEATURES_OK)) {
+			PMD_INIT_LOG(ERR,
+				"failed to set FEATURES_OK status!");
+			return -1;
+		}
+	}
+
+	hw->req_guest_features = req_features;
+
+	return 0;
+}
+
+static void
+vhostpci_dev_info_get(struct rte_eth_dev *dev,
+		struct rte_eth_dev_info *dev_info)
+{
+
+	struct vhostpci_hw *hw = dev->data->dev_private;
+
+	dev_info->speed_capa = ETH_LINK_SPEED_10G; /* fake value */
+
+	dev_info->pci_dev = dev->device ? RTE_ETH_DEV_TO_PCI(dev) : NULL;
+
+	dev_info->max_rx_queues =
+		RTE_MIN(hw->max_queue_pairs, VHOSTPCI_MAX_RX_QUEUES);
+	dev_info->max_tx_queues =
+		RTE_MIN(hw->max_queue_pairs, VHOSTPCI_MAX_TX_QUEUES);
+
+	dev_info->min_rx_bufsize = VHOSTPCI_MIN_RX_BUFSIZE;
+	dev_info->max_rx_pktlen = VHOSTPCI_MAX_RX_PKTLEN;
+	dev_info->max_mac_addrs = VHOSTPCI_MAX_MAC_ADDRS;
+
+	dev_info->rx_offload_capa = 0;
+	dev_info->tx_offload_capa = 0;
+
+}
+
+static int
+vhostpci_dev_configure(struct rte_eth_dev *dev __rte_unused)
+{
+	return 0;
+}
+
+static int
+vhostpci_dev_atomic_write_link_status(struct rte_eth_dev *dev,
+		struct rte_eth_link *link)
+{
+	struct rte_eth_link *dst = &(dev->data->dev_link);
+	struct rte_eth_link *src = link;
+
+	if (rte_atomic64_cmpset((uint64_t *)dst, *(uint64_t *)dst,
+			*(uint64_t *)src) == 0)
+		return -1;
+
+	return 0;
+}
+
+/* reset device and renegotiate features if needed */
+static int
+vhostpci_init_device(struct rte_eth_dev *eth_dev, uint64_t req_features)
+{
+	struct vhostpci_hw *hw = eth_dev->data->dev_private;
+	struct rte_pci_device *pci_dev;
+
+	/* To indicate we've noticed this device. */
+	vhpci_set_status(hw, VIRTIO_CONFIG_STATUS_ACK);
+
+	/* To indicate we've known how to drive the device. */
+	vhpci_set_status(hw, VIRTIO_CONFIG_STATUS_DRIVER);
+	if (vhostpci_negotiate_features(hw, req_features) < 0)
+		return -1;
+
+	pci_dev = RTE_ETH_DEV_TO_PCI(eth_dev);
+	rte_eth_copy_pci_info(eth_dev, pci_dev);
+
+	/* Setting up rx_header size for the device, only support MRG header.*/
+	if (vhostpci_pci_with_feature(hw, VIRTIO_NET_F_MRG_RXBUF) ||
+	    vhostpci_pci_with_feature(hw, VIRTIO_F_VERSION_1))
+		hw->vtnet_hdr_size = sizeof(struct virtio_net_hdr_mrg_rxbuf);
+	else
+		return -1;
+
+	hw->vpnet->vhost_hlen = hw->vtnet_hdr_size;
+
+	/* Copy the permanent MAC address to: virtio_hw */
+	vhostpci_get_hwaddr(hw);
+	ether_addr_copy((struct ether_addr *)hw->mac_addr,
+			&eth_dev->data->mac_addrs[0]);
+	PMD_INIT_LOG(DEBUG,
+		     "PORT MAC: %02X:%02X:%02X:%02X:%02X:%02X",
+		     hw->mac_addr[0], hw->mac_addr[1], hw->mac_addr[2],
+		     hw->mac_addr[3], hw->mac_addr[4], hw->mac_addr[5]);
+
+	/* support 1 queue pairs by default */
+	hw->max_queue_pairs = VHOSTPCI_MAX_QUEUE_PAIRS;
+
+	vhpci_init_complete(hw);
+
+	if (pci_dev)
+		PMD_INIT_LOG(DEBUG, "port %d vendorID=0x%x deviceID=0x%x",
+			eth_dev->data->port_id, pci_dev->id.vendor_id,
+			pci_dev->id.device_id);
+
+	return 0;
+}
+
+static void
+vhostpci_free_queues(struct rte_eth_dev *dev)
+{
+	struct vhostpci_hw *hw = dev->data->dev_private;
+	uint32_t i;
+
+	for (i = 0; i < hw->max_queue_pairs; i++) {
+		if (dev->data->rx_queues[i] != NULL) {
+			rte_free(dev->data->rx_queues[i]);
+			dev->data->rx_queues[i] = NULL;
+		}
+
+		if (dev->data->tx_queues[i] != NULL) {
+			rte_free(dev->data->tx_queues[i]);
+			dev->data->tx_queues[i] = NULL;
+		}
+	}
+}
+
+static void
+vhostpci_dev_stop(struct rte_eth_dev *dev)
+{
+	struct rte_eth_link link;
+	struct vhostpci_hw *hw = dev->data->dev_private;
+	struct rte_intr_conf *intr_conf = &dev->data->dev_conf.intr_conf;
+
+	hw->started = 0;
+
+	if (intr_conf->lsc)
+		rte_intr_disable(dev->intr_handle);
+
+	memset(&link, 0, sizeof(link));
+	vhostpci_dev_atomic_write_link_status(dev, &link);
+}
+
+static int
+vhostpci_net_free(struct rte_eth_dev *dev)
+{
+	struct vhostpci_hw *hw = dev->data->dev_private;
+	struct vhostpci_net *vpnet = hw->vpnet;
+	struct vhostpci_virtqueue *vq;
+	int i;
+
+	if (vpnet == NULL)
+		return -1;
+
+	for (i = 0; i < VHOSTPCI_MAX_QUEUE_PAIRS * 2; i++) {
+		vq = vpnet->virtqueue[i];
+
+		if (vpnet->virtqueue[i] == NULL)
+			continue;
+
+		if (vq->shadow_used_ring != NULL) {
+			rte_free(vq->shadow_used_ring);
+			vq->shadow_used_ring = NULL;
+		}
+
+		if (vq->batch_copy_elems != NULL) {
+			rte_free(vq->batch_copy_elems);
+			vq->batch_copy_elems = NULL;
+		}
+
+		rte_free(vpnet->virtqueue[i]);
+		vpnet->virtqueue[i] = NULL;
+	}
+
+	rte_free(hw->vpnet);
+	hw->vpnet = NULL;
+
+	return 0;
+};
+
+static void
+vhostpci_dev_close(struct rte_eth_dev *dev)
+{
+	struct rte_intr_conf *intr_conf = &dev->data->dev_conf.intr_conf;
+
+	PMD_INIT_LOG(DEBUG, "vhostpci_dev_close");
+
+	if (intr_conf->lsc)
+		rte_intr_disable(dev->intr_handle);
+
+	vhostpci_net_free(dev);
+
+	vhostpci_free_queues(dev);
+}
+
+static int
+vhostpci_net_allocate(struct rte_eth_dev *dev)
+{
+	struct vhostpci_hw *hw = dev->data->dev_private;
+	struct vring_used_elem *shadow_used_ring;
+	struct vhostpci_net *vpnet;
+	struct vhostpci_virtqueue *vq;
+	struct batch_copy_elem *batch_copy_elems;
+	int i;
+
+	vpnet = rte_zmalloc_socket(NULL,
+		sizeof(struct vhostpci_net), RTE_CACHE_LINE_SIZE,
+		dev->device->numa_node);
+	if (vpnet == NULL) {
+		rte_eth_dev_release_port(dev);
+		return -1;
+	}
+
+	hw->vpnet = vpnet;
+	for (i = 0; i < VHOSTPCI_MAX_QUEUE_PAIRS * 2; i++) {
+
+		vq = rte_zmalloc_socket(NULL, sizeof(*vq), RTE_CACHE_LINE_SIZE,
+			dev->device->numa_node);
+		if (vq == NULL) {
+			if (vpnet != NULL) {
+				vhostpci_net_free(dev);
+				rte_eth_dev_release_port(dev);
+				return -1;
+			}
+		}
+
+		vpnet->virtqueue[i] = vq;
+		vq->size = VHOSTPCI_NUM_DESCRIPTORS;
+
+		shadow_used_ring = rte_zmalloc_socket(NULL,
+				sizeof(struct vring_used_elem) * vq->size,
+				RTE_CACHE_LINE_SIZE, dev->device->numa_node);
+		if (shadow_used_ring == NULL) {
+			vhostpci_net_free(dev);
+			rte_eth_dev_release_port(dev);
+			return -1;
+		}
+		vq->shadow_used_ring = shadow_used_ring;
+
+		batch_copy_elems = rte_zmalloc_socket(NULL,
+			sizeof(struct batch_copy_elem) * vq->size,
+			RTE_CACHE_LINE_SIZE, dev->device->numa_node);
+		if (!batch_copy_elems) {
+			vhostpci_net_free(dev);
+			rte_eth_dev_release_port(dev);
+			return -1;
+		}
+		vq->batch_copy_elems = batch_copy_elems;
+	}
+
+	return 0;
+}
+
+static int
+eth_vhostpci_dev_init(struct rte_eth_dev *eth_dev)
+{
+	struct vhostpci_hw *hw = eth_dev->data->dev_private;
+	int ret;
+
+	eth_dev->dev_ops = &vhostpci_eth_dev_ops;
+
+	/* Allocate memory for storing MAC addresses */
+	eth_dev->data->mac_addrs = rte_zmalloc("vhostpci",
+			VHOSTPCI_MAX_MAC_ADDRS * ETHER_ADDR_LEN, 0);
+
+	if (eth_dev->data->mac_addrs == NULL) {
+		PMD_INIT_LOG(ERR,
+			"Failed to allocate %d bytes needed to store MAC "
+			"addresses",
+			1 * ETHER_ADDR_LEN);
+
+		return -ENOMEM;
+	}
+
+	hw->port_id = eth_dev->data->port_id;
+	ret = vhostpci_pci_init(RTE_ETH_DEV_TO_PCI(eth_dev), hw);
+	if (ret)
+		return ret;
+
+	ret = vhostpci_net_allocate(eth_dev);
+	if (ret)
+		return ret;
+
+	ret = vhostpci_init_device(eth_dev,
+			VHOSTPCI_PMD_DEFAULT_GUEST_FEATURES);
+	if (ret < 0)
+		return ret;
+
+	return 0;
+}
+
+static int
+eth_vhostpci_dev_uninit(struct rte_eth_dev *eth_dev)
+{
+	if (rte_eal_process_type() == RTE_PROC_SECONDARY)
+		return -EPERM;
+
+	vhostpci_dev_stop(eth_dev);
+	vhostpci_dev_close(eth_dev);
+
+	eth_dev->dev_ops = NULL;
+	eth_dev->tx_pkt_burst = NULL;
+	eth_dev->rx_pkt_burst = NULL;
+
+	rte_free(eth_dev->data->mac_addrs);
+	eth_dev->data->mac_addrs = NULL;
+
+	if (eth_dev->device != NULL)
+		rte_pci_unmap_device(RTE_ETH_DEV_TO_PCI(eth_dev));
+
+	PMD_INIT_LOG(DEBUG, "dev_uninit completed");
+
+	return 0;
+}
+
+static int
+eth_vhostpci_probe(struct rte_pci_driver *pci_drv __rte_unused,
+		   struct rte_pci_device *pci_dev)
+{
+	return rte_eth_dev_pci_generic_probe(pci_dev,
+			sizeof(struct vhostpci_hw), eth_vhostpci_dev_init);
+}
+
+static int
+eth_vhostpci_remove(struct rte_pci_device *pci_dev)
+{
+	return rte_eth_dev_pci_generic_remove(pci_dev,
+			eth_vhostpci_dev_uninit);
+}
+
+/**
+ * The set of PCI devices this driver supports
+ */
+const struct rte_pci_id pci_id_vhostpci_map[] = {
+	{ RTE_PCI_DEVICE(VHOST_PCI_VENDORID, VHOST_PCI_NET_MODERN_DEVICEID) },
+	{ .vendor_id = 0, /* sentinel */ },
+};
+
+static struct rte_pci_driver rte_vhostpci_pmd = {
+	.driver = {
+		.name = "net_vhostpci",
+	},
+	.id_table = pci_id_vhostpci_map,
+	.drv_flags = 0,
+	.probe = eth_vhostpci_probe,
+	.remove = eth_vhostpci_remove,
+};
+
+RTE_PMD_REGISTER_PCI(net_vhostpci, rte_vhostpci_pmd);
+RTE_PMD_REGISTER_PCI_TABLE(net_vhostpci, pci_id_vhostpci_map);
diff --git a/drivers/net/vhostpci/vhostpci_pci.c b/drivers/net/vhostpci/vhostpci_pci.c
new file mode 100644
index 000000000..9ec7ee6e6
--- /dev/null
+++ b/drivers/net/vhostpci/vhostpci_pci.c
@@ -0,0 +1,334 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_io.h>
+
+#include "vhostpci_pci.h"
+#include "vhostpci_logs.h"
+
+static void
+modern_read_dev_config(struct vhostpci_hw *hw, size_t offset,
+		       void *dst, int length);
+
+static void
+modern_write_dev_config(struct vhostpci_hw *hw, size_t offset,
+			const void *src, int length);
+
+static uint8_t
+modern_get_status(struct vhostpci_hw *hw);
+
+static void
+modern_set_status(struct vhostpci_hw *hw, uint8_t status);
+
+static uint64_t
+modern_get_features(struct vhostpci_hw *hw);
+
+static void
+modern_set_features(struct vhostpci_hw *hw, uint64_t features);
+
+static uint8_t
+modern_get_isr(struct vhostpci_hw *hw);
+
+const struct vpnet_pci_ops vpnet_modern_ops = {
+	.read_dev_cfg	= modern_read_dev_config,
+	.write_dev_cfg	= modern_write_dev_config,
+	.get_status	= modern_get_status,
+	.set_status	= modern_set_status,
+	.get_features	= modern_get_features,
+	.set_features	= modern_set_features,
+	.get_isr	= modern_get_isr,
+};
+
+struct vhostpci_hw_internal vhostpci_hw_internal[RTE_MAX_ETHPORTS];
+
+static void
+modern_read_dev_config(struct vhostpci_hw *hw, size_t offset,
+		       void *dst, int length)
+{
+	int i;
+	uint8_t *p;
+	uint8_t old_gen, new_gen;
+
+	do {
+		old_gen = rte_read8(&hw->common_cfg->config_generation);
+
+		p = dst;
+		for (i = 0;  i < length; i++)
+			*p++ = rte_read8((uint8_t *)hw->dev_cfg + offset + i);
+
+		new_gen = rte_read8(&hw->common_cfg->config_generation);
+	} while (old_gen != new_gen);
+
+}
+
+static void
+modern_write_dev_config(struct vhostpci_hw *hw, size_t offset,
+			const void *src, int length)
+{
+	int i;
+	const uint8_t *p = src;
+
+	for (i = 0;  i < length; i++)
+		rte_write8((*p++), (((uint8_t *)hw->dev_cfg) + offset + i));
+}
+
+static uint8_t
+modern_get_status(struct vhostpci_hw *hw)
+{
+	return rte_read8(&hw->common_cfg->device_status);
+}
+
+static void
+modern_set_status(struct vhostpci_hw *hw, uint8_t status)
+{
+	rte_write8(status, &hw->common_cfg->device_status);
+}
+
+static uint64_t
+modern_get_features(struct vhostpci_hw *hw)
+{
+	uint32_t features_lo, features_hi;
+
+	rte_write32(0, &hw->common_cfg->device_feature_select);
+	features_lo = rte_read32(&hw->common_cfg->device_feature);
+
+	rte_write32(1, &hw->common_cfg->device_feature_select);
+	features_hi = rte_read32(&hw->common_cfg->device_feature);
+
+	return ((uint64_t)features_hi << 32) | features_lo;
+}
+
+static void
+modern_set_features(struct vhostpci_hw *hw, uint64_t features)
+{
+	rte_write32(0, &hw->common_cfg->guest_feature_select);
+	rte_write32(features & ((1ULL << 32) - 1),
+		    &hw->common_cfg->guest_feature);
+
+	rte_write32(1, &hw->common_cfg->guest_feature_select);
+	rte_write32(features >> 32,
+		    &hw->common_cfg->guest_feature);
+}
+
+static uint8_t
+modern_get_isr(struct vhostpci_hw *hw)
+{
+	return rte_read8(hw->isr);
+}
+
+uint64_t
+vhpci_negotiate_features(struct vhostpci_hw *hw, uint64_t host_features)
+{
+	uint64_t features;
+
+	/**
+	 * Limit negotiated features to what the driver, virtqueue, and
+	 * host all support.
+	 */
+	features = host_features & hw->guest_features;
+	VTPCI_OPS(hw)->set_features(hw, features);
+
+	return features;
+}
+
+uint8_t
+vhpci_isr(struct vhostpci_hw *hw)
+{
+	return VTPCI_OPS(hw)->get_isr(hw);
+};
+
+void
+vhpci_set_status(struct vhostpci_hw *hw, uint8_t status)
+{
+	if (status != VIRTIO_CONFIG_STATUS_RESET)
+		status |= VTPCI_OPS(hw)->get_status(hw);
+
+	VTPCI_OPS(hw)->set_status(hw, status);
+}
+
+uint8_t
+vhpci_get_status(struct vhostpci_hw *hw)
+{
+	return VTPCI_OPS(hw)->get_status(hw);
+}
+
+void
+vhpci_init_complete(struct vhostpci_hw *hw)
+{
+	vhpci_set_status(hw, VIRTIO_CONFIG_STATUS_DRIVER_OK);
+}
+
+void
+vhpci_read_dev_config(struct vhostpci_hw *hw, size_t offset,
+		      void *dst, int length)
+{
+	VTPCI_OPS(hw)->read_dev_cfg(hw, offset, dst, length);
+}
+
+static void *
+get_cfg_addr(struct rte_pci_device *dev, struct vpnet_pci_cap *cap)
+{
+	uint8_t bar     = cap->bar;
+	uint32_t length = cap->length;
+	uint32_t offset = cap->offset;
+	uint8_t *base;
+
+	if (bar >= PCI_MAX_RESOURCE) {
+		PMD_INIT_LOG(ERR, "invalid bar: %u", bar);
+		return NULL;
+	}
+
+	if (offset + length < offset) {
+		PMD_INIT_LOG(ERR, "offset(%u) + length(%u) overflows",
+			offset, length);
+		return NULL;
+	}
+
+	if (offset + length > dev->mem_resource[bar].len) {
+
+		PMD_INIT_LOG(ERR,
+			"invalid cap: overflows bar space: %u > %" PRIu64,
+			offset + length, dev->mem_resource[bar].len);
+
+		return NULL;
+	}
+
+	base = dev->mem_resource[bar].addr;
+	if (base == NULL) {
+		PMD_INIT_LOG(ERR, "bar %u base addr is NULL", bar);
+		return NULL;
+	}
+
+	return base + offset;
+}
+
+/**
+ * Following macros are derived from linux/pci_regs.h, however,
+ * we can't simply include that header here, as there is no such
+ * file for non-Linux platform.
+ */
+#define PCI_CAPABILITY_LIST	0x34
+#define PCI_CAP_ID_VNDR		0x09
+#define PCI_CAP_ID_MSIX		0x11
+
+static int
+vhostpci_pci_read_caps(struct rte_pci_device *dev, struct vhostpci_hw *hw)
+{
+	uint8_t pos;
+	struct vpnet_pci_cap cap;
+	int ret;
+
+	if (rte_pci_map_device(dev)) {
+		PMD_INIT_LOG(DEBUG, "failed to map pci device!");
+		return -1;
+	}
+
+	ret = rte_pci_read_config(dev, &pos, 1, PCI_CAPABILITY_LIST);
+	if (ret < 0) {
+		PMD_INIT_LOG(DEBUG, "failed to read pci capability list");
+		return -1;
+	}
+
+	while (pos) {
+		ret = rte_pci_read_config(dev, &cap, sizeof(cap), pos);
+		if (ret < 0) {
+			PMD_INIT_LOG(ERR,
+				"failed to read pci cap at pos: %x", pos);
+			break;
+		}
+
+		if (cap.cap_vndr != PCI_CAP_ID_VNDR) {
+
+			PMD_INIT_LOG(DEBUG,
+				"[%2x] skipping non VNDR cap id: %02x",
+				pos, cap.cap_vndr);
+
+			goto next;
+		}
+
+		PMD_INIT_LOG(DEBUG,
+			"[%2x] cfg type: %u, bar: %u, offset: %04x, len: %u",
+			pos, cap.cfg_type, cap.bar, cap.offset, cap.length);
+
+		switch (cap.cfg_type) {
+		case VIRTIO_PCI_CAP_COMMON_CFG:
+			hw->common_cfg = get_cfg_addr(dev, &cap);
+			break;
+		case VIRTIO_PCI_CAP_NOTIFY_CFG:
+			rte_pci_read_config(dev, &hw->notify_off_multiplier,
+					4, pos + sizeof(cap));
+			hw->notify_base = get_cfg_addr(dev, &cap);
+			break;
+		case VIRTIO_PCI_CAP_DEVICE_CFG:
+			hw->dev_cfg = get_cfg_addr(dev, &cap);
+			break;
+		case VIRTIO_PCI_CAP_ISR_CFG:
+			hw->isr = get_cfg_addr(dev, &cap);
+			break;
+		}
+
+next:
+		pos = cap.cap_next;
+	}
+
+	if (hw->common_cfg == NULL || hw->notify_base == NULL ||
+	    hw->dev_cfg == NULL    || hw->isr == NULL) {
+		PMD_INIT_LOG(INFO, "no modern virtio pci device found.");
+		return -1;
+	}
+
+	PMD_INIT_LOG(INFO, "found modern virtio pci device.");
+
+	PMD_INIT_LOG(DEBUG, "common cfg mapped at: %p", hw->common_cfg);
+	PMD_INIT_LOG(DEBUG, "device cfg mapped at: %p", hw->dev_cfg);
+	PMD_INIT_LOG(DEBUG, "isr cfg mapped at: %p", hw->isr);
+	PMD_INIT_LOG(DEBUG, "notify base: %p, notify off multiplier: %u",
+		hw->notify_base, hw->notify_off_multiplier);
+
+	return 0;
+}
+
+int
+vhostpci_pci_init(struct rte_pci_device *dev, struct vhostpci_hw *hw)
+{
+
+	if (vhostpci_pci_read_caps(dev, hw) == 0) {
+		PMD_INIT_LOG(INFO, "modern vhostpci device is detected.");
+		vhostpci_hw_internal[hw->port_id].vtpci_ops = &vpnet_modern_ops;
+		hw->modern = 1;
+		return 0;
+	} else {
+		hw->modern = 0;
+		return -1;
+	}
+}
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 6a6a7452e..1c8b8a202 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -160,6 +160,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD)     += -lrte_pmd_virtio
 ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_VHOST)      += -lrte_pmd_vhost
 endif # $(CONFIG_RTE_LIBRTE_VHOST)
+_LDLIBS-$(CONFIG_RTE_LIBRTE_VHOSTPCI_PMD)   += -lrte_pmd_vhostpci
 _LDLIBS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD)    += -lrte_pmd_vmxnet3_uio
 
 ifeq ($(CONFIG_RTE_LIBRTE_CRYPTODEV),y)
-- 
2.13.3

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [dpdk-dev] [PATCH 05/11] net/vhostpci: add queue setup
  2017-11-30  9:46 [dpdk-dev] [PATCH 00/11] net/vhostpci: A new vhostpci PMD supporting VM2VM scenario Zhiyong Yang
                   ` (3 preceding siblings ...)
  2017-11-30  9:46 ` [dpdk-dev] [PATCH 04/11] net/vhostpci: add basic framework Zhiyong Yang
@ 2017-11-30  9:46 ` Zhiyong Yang
  2017-11-30  9:46 ` [dpdk-dev] [PATCH 06/11] net/vhostpci: add support for link status change Zhiyong Yang
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: Zhiyong Yang @ 2017-11-30  9:46 UTC (permalink / raw)
  To: dev, yliu, maxime.coquelin; +Cc: wei.w.wang, jianfeng.tan, Zhiyong Yang

Add the functions to set up the RX/TX queues.
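
One detail worth noting, sketched below: each local ethdev queue is bound to
the opposite virtqueue of the peer guest, since what the vhostpci port
receives is what the remote virtio-net port transmitted. A minimal sketch of
the index mapping, assuming the usual virtio layout (VTNET_QNUM == 2,
VTNET_RXQ == 0, VTNET_TXQ == 1, as defined in the earlier base patches):

/* Illustrative only: map a local ethdev queue id to the peer guest's
 * virtqueue index that backs it.
 */
static inline uint16_t
rxq_to_remote_vq(uint16_t rx_queue_id)
{
	/* the local RX queue drains the remote TX virtqueue */
	return rx_queue_id * VTNET_QNUM + VTNET_TXQ;
}

static inline uint16_t
txq_to_remote_vq(uint16_t tx_queue_id)
{
	/* the local TX queue fills the remote RX virtqueue */
	return tx_queue_id * VTNET_QNUM + VTNET_RXQ;
}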

Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com>
---
 drivers/net/vhostpci/vhostpci_ethdev.c | 63 ++++++++++++++++++++++++++++++++++
 1 file changed, 63 insertions(+)

diff --git a/drivers/net/vhostpci/vhostpci_ethdev.c b/drivers/net/vhostpci/vhostpci_ethdev.c
index 873ff7482..068c19b2b 100644
--- a/drivers/net/vhostpci/vhostpci_ethdev.c
+++ b/drivers/net/vhostpci/vhostpci_ethdev.c
@@ -67,6 +67,19 @@ vhostpci_dev_atomic_write_link_status(struct rte_eth_dev *dev,
 		struct rte_eth_link *link);
 
 static int
+vhostpci_dev_rx_queue_setup(struct rte_eth_dev *dev, uint16_t rx_queue_id,
+		   uint16_t nb_rx_desc __rte_unused,
+		   unsigned int socket_id,
+		   const struct rte_eth_rxconf *rx_conf __rte_unused,
+		   struct rte_mempool *mb_pool);
+
+static int
+vhostpci_dev_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
+		   uint16_t nb_tx_desc __rte_unused,
+		   unsigned int socket_id,
+		   const struct rte_eth_txconf *tx_conf __rte_unused);
+
+static int
 vhostpci_init_device(struct rte_eth_dev *eth_dev, uint64_t req_features);
 
 static int
@@ -87,6 +100,8 @@ static const struct eth_dev_ops vhostpci_eth_dev_ops = {
 	.dev_close               = vhostpci_dev_close,
 	.dev_infos_get		 = vhostpci_dev_info_get,
 	.dev_configure		 = vhostpci_dev_configure,
+	.rx_queue_setup		 = vhostpci_dev_rx_queue_setup,
+	.tx_queue_setup		 = vhostpci_dev_tx_queue_setup,
 };
 
 static inline bool
@@ -233,6 +248,54 @@ vhostpci_dev_configure(struct rte_eth_dev *dev __rte_unused)
 }
 
 static int
+vhostpci_dev_rx_queue_setup(struct rte_eth_dev *dev, uint16_t rx_queue_id,
+		uint16_t nb_rx_desc __rte_unused,
+		unsigned int socket_id,
+		const struct rte_eth_rxconf *rx_conf __rte_unused,
+		struct rte_mempool *mb_pool)
+{
+	struct vhostpci_queue *vq;
+	struct vhostpci_hw *hw = dev->data->dev_private;
+
+	vq = rte_zmalloc_socket(NULL, sizeof(struct vhostpci_queue),
+			RTE_CACHE_LINE_SIZE, socket_id);
+	if (vq == NULL) {
+		RTE_LOG(ERR, PMD, "Failed to allocate memory for rx queue\n");
+		return -ENOMEM;
+	}
+
+	vq->mb_pool = mb_pool;
+	vq->virtqueue_id = rx_queue_id * VTNET_QNUM + VTNET_TXQ;
+	vq->vpnet = hw->vpnet;
+	dev->data->rx_queues[rx_queue_id] = vq;
+
+	return 0;
+}
+
+static int
+vhostpci_dev_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
+		uint16_t nb_tx_desc __rte_unused,
+		unsigned int socket_id,
+		const struct rte_eth_txconf *tx_conf __rte_unused)
+{
+	struct vhostpci_queue *vq;
+	struct vhostpci_hw *hw = dev->data->dev_private;
+
+	vq = rte_zmalloc_socket(NULL, sizeof(struct vhostpci_queue),
+			RTE_CACHE_LINE_SIZE, socket_id);
+	if (vq == NULL) {
+		RTE_LOG(ERR, PMD, "Failed to allocate memory for tx queue\n");
+		return -ENOMEM;
+	}
+
+	vq->virtqueue_id = tx_queue_id * VTNET_QNUM  + VTNET_RXQ;
+	vq->vpnet = hw->vpnet;
+	dev->data->tx_queues[tx_queue_id] = vq;
+
+	return 0;
+}
+
+static int
 vhostpci_dev_atomic_write_link_status(struct rte_eth_dev *dev,
 		struct rte_eth_link *link)
 {
-- 
2.13.3

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [dpdk-dev] [PATCH 06/11] net/vhostpci: add support for link status change
  2017-11-30  9:46 [dpdk-dev] [PATCH 00/11] net/vhostpci: A new vhostpci PMD supporting VM2VM scenario Zhiyong Yang
                   ` (4 preceding siblings ...)
  2017-11-30  9:46 ` [dpdk-dev] [PATCH 05/11] net/vhostpci: add queue setup Zhiyong Yang
@ 2017-11-30  9:46 ` Zhiyong Yang
  2017-11-30  9:46 ` [dpdk-dev] [PATCH 07/11] net/vhostpci: get remote memory region and vring info Zhiyong Yang
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: Zhiyong Yang @ 2017-11-30  9:46 UTC (permalink / raw)
  To: dev, yliu, maxime.coquelin; +Cc: wei.w.wang, jianfeng.tan, Zhiyong Yang

Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com>
---
 drivers/net/vhostpci/vhostpci_ethdev.c | 102 +++++++++++++++++++++++++++++++++
 1 file changed, 102 insertions(+)

diff --git a/drivers/net/vhostpci/vhostpci_ethdev.c b/drivers/net/vhostpci/vhostpci_ethdev.c
index 068c19b2b..76353930a 100644
--- a/drivers/net/vhostpci/vhostpci_ethdev.c
+++ b/drivers/net/vhostpci/vhostpci_ethdev.c
@@ -54,6 +54,9 @@ vhostpci_dev_info_get(struct rte_eth_dev *dev,
 		struct rte_eth_dev_info *dev_info);
 
 static void
+vhostpci_interrupt_handler(void *param);
+
+static void
 vhostpci_get_hwaddr(struct vhostpci_hw *hw);
 
 static int
@@ -63,6 +66,10 @@ static int
 eth_vhostpci_dev_init(struct rte_eth_dev *eth_dev);
 
 static int
+vhostpci_dev_atomic_read_link_status(struct rte_eth_dev *dev,
+		struct rte_eth_link *link);
+
+static int
 vhostpci_dev_atomic_write_link_status(struct rte_eth_dev *dev,
 		struct rte_eth_link *link);
 
@@ -85,6 +92,10 @@ vhostpci_init_device(struct rte_eth_dev *eth_dev, uint64_t req_features);
 static int
 vhostpci_dev_start(struct rte_eth_dev *dev);
 
+static int
+vhostpci_dev_link_update(struct rte_eth_dev *dev,
+			 __rte_unused int wait_to_complete);
+
 static void
 update_queuing_status(struct rte_eth_dev *dev);
 
@@ -100,6 +111,7 @@ static const struct eth_dev_ops vhostpci_eth_dev_ops = {
 	.dev_close               = vhostpci_dev_close,
 	.dev_infos_get		 = vhostpci_dev_info_get,
 	.dev_configure		 = vhostpci_dev_configure,
+	.link_update             = vhostpci_dev_link_update,
 	.rx_queue_setup		 = vhostpci_dev_rx_queue_setup,
 	.tx_queue_setup		 = vhostpci_dev_tx_queue_setup,
 };
@@ -296,6 +308,20 @@ vhostpci_dev_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
 }
 
 static int
+vhostpci_dev_atomic_read_link_status(struct rte_eth_dev *dev,
+		struct rte_eth_link *link)
+{
+	struct rte_eth_link *dst = link;
+	struct rte_eth_link *src = &(dev->data->dev_link);
+
+	if (rte_atomic64_cmpset((uint64_t *)dst, *(uint64_t *)dst,
+			*(uint64_t *)src) == 0)
+		return -1;
+
+	return 0;
+}
+
+static int
 vhostpci_dev_atomic_write_link_status(struct rte_eth_dev *dev,
 		struct rte_eth_link *link)
 {
@@ -309,6 +335,70 @@ vhostpci_dev_atomic_write_link_status(struct rte_eth_dev *dev,
 	return 0;
 }
 
+static int
+vhostpci_dev_link_update(struct rte_eth_dev *dev,
+		int wait_to_complete __rte_unused)
+{
+	struct rte_eth_link link, old;
+	uint16_t status;
+	struct vhostpci_hw *hw = dev->data->dev_private;
+
+	memset(&link, 0, sizeof(link));
+	vhostpci_dev_atomic_read_link_status(dev, &link);
+	old = link;
+	link.link_duplex = ETH_LINK_FULL_DUPLEX;
+	link.link_speed  = ETH_SPEED_NUM_10G;
+
+	if (hw->started == 0) {
+		link.link_status = ETH_LINK_DOWN;
+	} else {
+		PMD_INIT_LOG(DEBUG, "Get link status from hw");
+		vhpci_read_dev_config(hw,
+				offsetof(struct vpnet_pci_config, status),
+				&status, sizeof(status));
+		if ((status & VHOSTPCI_NET_S_LINK_UP) == 0) {
+			link.link_status = ETH_LINK_DOWN;
+			PMD_INIT_LOG(DEBUG, "Port %d is down",
+				     dev->data->port_id);
+
+		} else {
+			link.link_status = ETH_LINK_UP;
+			PMD_INIT_LOG(DEBUG, "Port %d is up",
+				     dev->data->port_id);
+		}
+	}
+
+	vhostpci_dev_atomic_write_link_status(dev, &link);
+
+	return (old.link_status == link.link_status) ? -1 : 0;
+}
+
+/**
+ * Process vhostpci Config changed interrupt and call the callback
+ * if link state changed.
+ */
+static void
+vhostpci_interrupt_handler(void *param)
+{
+	struct rte_eth_dev *dev = param;
+	struct vhostpci_hw *hw = dev->data->dev_private;
+	uint8_t isr;
+
+	/* Read interrupt status which clears interrupt */
+	isr = vhpci_isr(hw);
+	PMD_DRV_LOG(INFO, "interrupt status = %x", isr);
+
+	if (rte_intr_enable(dev->intr_handle) < 0)
+		PMD_DRV_LOG(ERR, "interrupt enable failed");
+
+	if (isr & VIRTIO_PCI_ISR_CONFIG) {
+		if (vhostpci_dev_link_update(dev, 0) == 0)
+			_rte_eth_dev_callback_process(dev,
+				RTE_ETH_EVENT_INTR_LSC, NULL, NULL);
+	}
+
+}
+
 /* reset device and renegotiate features if needed */
 static int
 vhostpci_init_device(struct rte_eth_dev *eth_dev, uint64_t req_features)
@@ -538,6 +628,13 @@ eth_vhostpci_dev_init(struct rte_eth_dev *eth_dev)
 	if (ret < 0)
 		return ret;
 
+	eth_dev->data->dev_flags |= RTE_ETH_DEV_INTR_LSC; /* by default */
+
+	/* Register and Setup interrupt callback  */
+	if (eth_dev->data->dev_flags & RTE_ETH_DEV_INTR_LSC)
+		rte_intr_callback_register(eth_dev->intr_handle,
+			vhostpci_interrupt_handler, eth_dev);
+
 	return 0;
 }
 
@@ -557,6 +654,11 @@ eth_vhostpci_dev_uninit(struct rte_eth_dev *eth_dev)
 	rte_free(eth_dev->data->mac_addrs);
 	eth_dev->data->mac_addrs = NULL;
 
+	/* reset interrupt callback  */
+	if (eth_dev->data->dev_flags & RTE_ETH_DEV_INTR_LSC)
+		rte_intr_callback_unregister(eth_dev->intr_handle,
+			vhostpci_interrupt_handler, eth_dev);
+
 	if (eth_dev->device != NULL)
 		rte_pci_unmap_device(RTE_ETH_DEV_TO_PCI(eth_dev));
 
-- 
2.13.3

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [dpdk-dev] [PATCH 07/11] net/vhostpci: get remote memory region and vring info
  2017-11-30  9:46 [dpdk-dev] [PATCH 00/11] net/vhostpci: A new vhostpci PMD supporting VM2VM scenario Zhiyong Yang
                   ` (5 preceding siblings ...)
  2017-11-30  9:46 ` [dpdk-dev] [PATCH 06/11] net/vhostpci: add support for link status change Zhiyong Yang
@ 2017-11-30  9:46 ` Zhiyong Yang
  2017-11-30  9:46 ` [dpdk-dev] [PATCH 08/11] net/vhostpci: add RX function Zhiyong Yang
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: Zhiyong Yang @ 2017-11-30  9:46 UTC (permalink / raw)
  To: dev, yliu, maxime.coquelin; +Cc: wei.w.wang, jianfeng.tan, Zhiyong Yang

A link-up status change indicates that the remote memory regions and
vring info are ready, so the vhostpci PMD can fetch them.
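
The translation the PMD performs is a region lookup plus an offset into the
BAR2 mapping: vva = mem_base + region.offset + (remote_gpa - region.start).
As a worked example with made-up numbers: if BAR2 (past the 4KB metadata
area) is mapped at mem_base = 0x7f0000001000 and a remote region starts at
guest-physical address 0x40000000 with offset 0, a remote gpa of 0x40200000
resolves to the local virtual address 0x7f0000201000.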

Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com>
---
 drivers/net/vhostpci/vhostpci_ethdev.c | 81 ++++++++++++++++++++++++++++++++++
 1 file changed, 81 insertions(+)

diff --git a/drivers/net/vhostpci/vhostpci_ethdev.c b/drivers/net/vhostpci/vhostpci_ethdev.c
index 76353930a..0582f73b7 100644
--- a/drivers/net/vhostpci/vhostpci_ethdev.c
+++ b/drivers/net/vhostpci/vhostpci_ethdev.c
@@ -73,6 +73,9 @@ static int
 vhostpci_dev_atomic_write_link_status(struct rte_eth_dev *dev,
 		struct rte_eth_link *link);
 
+static inline uint64_t
+remote_gpa_to_vva(struct vhostpci_net *vpnet, uint64_t remote_gpa);
+
 static int
 vhostpci_dev_rx_queue_setup(struct rte_eth_dev *dev, uint16_t rx_queue_id,
 		   uint16_t nb_rx_desc __rte_unused,
@@ -105,6 +108,9 @@ vhostpci_dev_close(struct rte_eth_dev *dev);
 static void
 vhostpci_dev_stop(struct rte_eth_dev *dev);
 
+static int
+vhostpci_get_remote_mem(struct rte_eth_dev *dev);
+
 static const struct eth_dev_ops vhostpci_eth_dev_ops = {
 	.dev_start               = vhostpci_dev_start,
 	.dev_stop                = vhostpci_dev_stop,
@@ -335,6 +341,71 @@ vhostpci_dev_atomic_write_link_status(struct rte_eth_dev *dev,
 	return 0;
 }
 
+static inline uint64_t
+remote_gpa_to_vva(struct vhostpci_net *vpnet, uint64_t remote_gpa)
+{
+	uint64_t mem_base = vpnet->mem_base;
+	uint32_t i, nregions = vpnet->mem.nregions;
+	struct vhostpci_mem_region *regions = vpnet->mem.regions;
+
+	for (i = 0; i < nregions; i++) {
+		if (remote_gpa >= regions[i].start &&
+			remote_gpa < regions[i].end)
+
+			return (remote_gpa - regions[i].start
+				+ regions[i].offset + mem_base);
+	}
+
+	return 0;
+}
+
+static int
+vhostpci_get_remote_mem(struct rte_eth_dev *dev)
+{
+	struct vpnet_metadata *metadata;
+	struct rte_mem_resource *mem_resource;
+	struct rte_pci_device *pci_device = RTE_ETH_DEV_TO_PCI(dev);
+	struct vhostpci_mem_region *regions;
+	struct vpnet_remote_vq *vq;
+	struct vhostpci_net *vpnet;
+	struct vhostpci_virtqueue *virtqueue[VHOSTPCI_MAX_QUEUE_PAIRS * 2];
+	struct vhostpci_hw *hw = dev->data->dev_private;
+	uint64_t offset = 0;
+	uint32_t i;
+
+	vpnet = hw->vpnet;
+	mem_resource = pci_device->mem_resource;
+	metadata = mem_resource[REMOTE_MEM_BAR_ID].addr;
+
+	vpnet->mem_base = (uint64_t)metadata + METADATA_SIZE;
+	vpnet->mem.nregions = metadata->nregions;
+	vpnet->nr_vring = metadata->nvqs;
+	regions = vpnet->mem.regions;
+
+	for (i = 0; i < metadata->nregions; i++) {
+		regions[i].guest_phys_addr = metadata->mem[i].gpa;
+		regions[i].size = metadata->mem[i].size;
+		regions[i].start = metadata->mem[i].gpa;
+		regions[i].end = metadata->mem[i].gpa + metadata->mem[i].size;
+		regions[i].offset = offset;
+		offset += regions[i].size;
+	}
+
+	vq = metadata->vq;
+	for (i = 0; i < vpnet->nr_vring; i++) {
+		virtqueue[i] = vpnet->virtqueue[i];
+		virtqueue[i]->desc  = (struct vring_desc *)remote_gpa_to_vva(vpnet, vq[i].desc_gpa);
+		virtqueue[i]->avail = (struct vring_avail *)remote_gpa_to_vva(vpnet, vq[i].avail_gpa);
+		virtqueue[i]->used  = (struct vring_used *)remote_gpa_to_vva(vpnet, vq[i].used_gpa);
+		virtqueue[i]->last_avail_idx = vq[i].last_avail_idx;
+		virtqueue[i]->enabled = vq[i].vring_enabled;
+		virtqueue[i]->last_used_idx = 0;
+		virtqueue[i]->size  = VHOSTPCI_NUM_DESCRIPTORS;
+	}
+
+	return 0;
+}
+
 static int
 vhostpci_dev_link_update(struct rte_eth_dev *dev,
 		int wait_to_complete __rte_unused)
@@ -362,9 +433,19 @@ vhostpci_dev_link_update(struct rte_eth_dev *dev,
 				     dev->data->port_id);
 
 		} else {
+			int ret;
+
 			link.link_status = ETH_LINK_UP;
 			PMD_INIT_LOG(DEBUG, "Port %d is up",
 				     dev->data->port_id);
+
+			/* get the remote guest memory region and vring info */
+			vhostpci_get_remote_mem(dev);
+
+			ret = vhostpci_init_device(dev,
+				VHOSTPCI_PMD_DEFAULT_GUEST_FEATURES);
+			if (ret < 0)
+				return ret;
 		}
 	}
 
-- 
2.13.3

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [dpdk-dev] [PATCH 08/11] net/vhostpci: add RX function
  2017-11-30  9:46 [dpdk-dev] [PATCH 00/11] net/vhostpci: A new vhostpci PMD supporting VM2VM scenario Zhiyong Yang
                   ` (6 preceding siblings ...)
  2017-11-30  9:46 ` [dpdk-dev] [PATCH 07/11] net/vhostpci: get remote memory region and vring info Zhiyong Yang
@ 2017-11-30  9:46 ` Zhiyong Yang
  2017-11-30  9:46 ` [dpdk-dev] [PATCH 09/11] net/vhostpci: add TX function Zhiyong Yang
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: Zhiyong Yang @ 2017-11-30  9:46 UTC (permalink / raw)
  To: dev, yliu, maxime.coquelin; +Cc: wei.w.wang, jianfeng.tan, Zhiyong Yang

Add the functions to support receiving packets.
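
Since the new burst function is installed as eth_dev->rx_pkt_burst, it is
exercised through the regular ethdev API; a minimal polling sketch, assuming
port 0/queue 0 and simply dropping the packets (testpmd's forwarding loop
does the equivalent in the test steps from the cover letter):

#include <rte_ethdev.h>
#include <rte_mbuf.h>

/* Illustrative only: drain the vhostpci port via the standard ethdev
 * RX API and free the received mbufs.
 */
static void
poll_vhostpci_port(uint16_t port_id)
{
	struct rte_mbuf *pkts[32];
	uint16_t i, nb;

	nb = rte_eth_rx_burst(port_id, 0, pkts, 32);
	for (i = 0; i < nb; i++)
		rte_pktmbuf_free(pkts[i]);
}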

Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com>
---
 drivers/net/vhostpci/vhostpci_ethdev.c | 311 +++++++++++++++++++++++++++++++++
 1 file changed, 311 insertions(+)

diff --git a/drivers/net/vhostpci/vhostpci_ethdev.c b/drivers/net/vhostpci/vhostpci_ethdev.c
index 0582f73b7..06e3f5c50 100644
--- a/drivers/net/vhostpci/vhostpci_ethdev.c
+++ b/drivers/net/vhostpci/vhostpci_ethdev.c
@@ -49,6 +49,10 @@
 #include "vhostpci_logs.h"
 #include "vhostpci_ethdev.h"
 
+#define MAX_BATCH_LEN 256
+#define VHOSTPCI_MAX_PKT_BURST 32
+#define VHOSTPCI_BUF_VECTOR_MAX 256
+
 static void
 vhostpci_dev_info_get(struct rte_eth_dev *dev,
 		struct rte_eth_dev_info *dev_info);
@@ -92,6 +96,10 @@ vhostpci_dev_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
 static int
 vhostpci_init_device(struct rte_eth_dev *eth_dev, uint64_t req_features);
 
+static uint16_t
+vhostpci_dequeue_burst(struct vhostpci_net *dev, uint16_t queue_id,
+	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count);
+
 static int
 vhostpci_dev_start(struct rte_eth_dev *dev);
 
@@ -313,6 +321,308 @@ vhostpci_dev_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
 	return 0;
 }
 
+static __rte_always_inline void
+update_used_ring(struct vhostpci_virtqueue *vq,
+		 uint32_t used_idx, uint32_t desc_idx)
+{
+	vq->used->ring[used_idx].id  = desc_idx;
+	vq->used->ring[used_idx].len = 0;
+}
+
+static __rte_always_inline int
+copy_desc_to_mbuf(struct vhostpci_net *dev, struct vhostpci_virtqueue *vq,
+		  struct vring_desc *descs, uint16_t max_desc,
+		  struct rte_mbuf *m, uint16_t desc_idx,
+		  struct rte_mempool *mbuf_pool)
+{
+	struct vring_desc *desc;
+	uint64_t desc_addr;
+	uint32_t desc_avail, desc_offset;
+	uint32_t mbuf_avail, mbuf_offset;
+	uint32_t cpy_len;
+	struct rte_mbuf *cur = m, *prev = m;
+	/* A counter to avoid desc dead loop chain */
+	uint32_t nr_desc = 1;
+	struct batch_copy_elem *batch_copy = vq->batch_copy_elems;
+	uint16_t copy_nb = vq->batch_copy_nb_elems;
+	int error = 0;
+
+	desc = &descs[desc_idx];
+	if (unlikely(desc->len < dev->vhost_hlen)) {
+		error = -1;
+		goto out;
+	}
+
+	desc_addr = remote_gpa_to_vva(dev, desc->addr);
+
+	if (unlikely(!desc_addr)) {
+		error = -1;
+		goto out;
+	}
+
+	/**
+	 * A virtio driver normally uses at least 2 desc buffers
+	 * for Tx: the first for storing the header, and others
+	 * for storing the data.
+	 */
+	if (likely((desc->len == dev->vhost_hlen) &&
+		   (desc->flags & VRING_DESC_F_NEXT) != 0)) {
+		desc = &descs[desc->next];
+		if (unlikely(desc->flags & VRING_DESC_F_INDIRECT)) {
+			error = -1;
+			goto out;
+		}
+
+		desc_addr = remote_gpa_to_vva(dev, desc->addr);
+		if (unlikely(!desc_addr)) {
+			error = -1;
+			goto out;
+		}
+
+		desc_offset = 0;
+		desc_avail  = desc->len;
+		nr_desc    += 1;
+	} else {
+		desc_avail  = desc->len - dev->vhost_hlen;
+		desc_offset = dev->vhost_hlen;
+	}
+
+	rte_prefetch0((void *)(uintptr_t)(desc_addr + desc_offset));
+
+	mbuf_offset = 0;
+	mbuf_avail  = m->buf_len - RTE_PKTMBUF_HEADROOM;
+	while (1) {
+		cpy_len = RTE_MIN(desc_avail, mbuf_avail);
+		if (likely(cpy_len > MAX_BATCH_LEN ||
+			   copy_nb >= vq->size ||
+			   (cur == m))) {
+			rte_memcpy(rte_pktmbuf_mtod_offset(cur, void *,
+				mbuf_offset), (void *)((uintptr_t)(desc_addr +
+				desc_offset)), cpy_len);
+		} else {
+			batch_copy[copy_nb].dst =
+				rte_pktmbuf_mtod_offset(cur, void *,
+								mbuf_offset);
+			batch_copy[copy_nb].src =
+				(void *)((uintptr_t)(desc_addr +
+						     desc_offset));
+			batch_copy[copy_nb].len = cpy_len;
+			copy_nb++;
+		}
+
+		mbuf_avail  -= cpy_len;
+		mbuf_offset += cpy_len;
+		desc_avail  -= cpy_len;
+		desc_offset += cpy_len;
+
+		/* This desc reaches to its end, get the next one */
+		if (desc_avail == 0) {
+			if ((desc->flags & VRING_DESC_F_NEXT) == 0)
+				break;
+
+			if (unlikely(desc->next >= max_desc ||
+				     ++nr_desc > max_desc)) {
+				error = -1;
+				goto out;
+			}
+			desc = &descs[desc->next];
+			if (unlikely(desc->flags & VRING_DESC_F_INDIRECT)) {
+				error = -1;
+				goto out;
+			}
+
+			desc_addr = remote_gpa_to_vva(dev, desc->addr);
+			if (unlikely(!desc_addr)) {
+				error = -1;
+				goto out;
+			}
+
+			rte_prefetch0((void *)(uintptr_t)desc_addr);
+
+			desc_offset = 0;
+			desc_avail  = desc->len;
+
+		}
+
+		/**
+		 * This mbuf reaches to its end, get a new one
+		 * to hold more data.
+		 */
+		if (mbuf_avail == 0) {
+			cur = rte_pktmbuf_alloc(mbuf_pool);
+			if (unlikely(cur == NULL)) {
+				error = -1;
+				goto out;
+			}
+
+			prev->next = cur;
+			prev->data_len = mbuf_offset;
+			m->nb_segs += 1;
+			m->pkt_len += mbuf_offset;
+			prev = cur;
+
+			mbuf_offset = 0;
+			mbuf_avail  = cur->buf_len - RTE_PKTMBUF_HEADROOM;
+		}
+	}
+
+	prev->data_len = mbuf_offset;
+	m->pkt_len    += mbuf_offset;
+
+out:
+	vq->batch_copy_nb_elems = copy_nb;
+
+	return error;
+}
+
+static inline void
+do_data_copy_dequeue(struct vhostpci_virtqueue *vq)
+{
+	struct batch_copy_elem *elem = vq->batch_copy_elems;
+	uint16_t count = vq->batch_copy_nb_elems;
+	int i;
+
+	for (i = 0; i < count; i++)
+		rte_memcpy(elem[i].dst, elem[i].src, elem[i].len);
+}
+
+static __rte_always_inline void
+update_used_idx(struct vhostpci_virtqueue *vq, uint32_t count)
+{
+	if (unlikely(count == 0))
+		return;
+
+	rte_smp_wmb();
+	rte_smp_rmb();
+
+	vq->used->idx += count;
+}
+
+static uint16_t
+vhostpci_dequeue_burst(struct vhostpci_net *dev, uint16_t queue_id,
+		struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts,
+		uint16_t count)
+{
+	struct vhostpci_virtqueue *vq;
+	uint32_t desc_indexes[VHOSTPCI_MAX_PKT_BURST];
+	uint32_t used_idx;
+	uint32_t i = 0;
+	uint16_t free_entries;
+	uint16_t avail_idx;
+
+	if (!dev)
+		return 0;
+
+	if (unlikely(!is_valid_virt_queue_idx(queue_id, 1, dev->nr_vring)))
+		return 0;
+
+	vq = dev->virtqueue[queue_id];
+	if (unlikely(vq->enabled == 0))
+		return 0;
+
+	vq->batch_copy_nb_elems = 0;
+
+	free_entries = *((volatile uint16_t *)&vq->avail->idx) -
+			vq->last_avail_idx;
+	if (free_entries == 0)
+		return 0;
+
+	/* Prefetch available and used ring */
+	avail_idx = vq->last_avail_idx & (vq->size - 1);
+	used_idx  = vq->last_used_idx  & (vq->size - 1);
+	rte_prefetch0(&vq->avail->ring[avail_idx]);
+	rte_prefetch0(&vq->used->ring[used_idx]);
+
+	count = RTE_MIN(count, VHOSTPCI_MAX_PKT_BURST);
+	count = RTE_MIN(count, free_entries);
+
+	/* Retrieve all of the head indexes first to avoid caching issues. */
+	for (i = 0; i < count; i++) {
+		avail_idx = (vq->last_avail_idx + i) & (vq->size - 1);
+		used_idx  = (vq->last_used_idx  + i) & (vq->size - 1);
+		desc_indexes[i] = vq->avail->ring[avail_idx];
+		update_used_ring(vq, used_idx, desc_indexes[i]);
+	}
+
+	/* Prefetch descriptor index. */
+	rte_prefetch0(&vq->desc[desc_indexes[0]]);
+	for (i = 0; i < count; i++) {
+		struct vring_desc *desc;
+		uint16_t sz, idx;
+		int err;
+
+		if (likely(i + 1 < count))
+			rte_prefetch0(&vq->desc[desc_indexes[i + 1]]);
+
+		desc = vq->desc;
+		sz = vq->size;
+		idx = desc_indexes[i];
+
+		pkts[i] = rte_pktmbuf_alloc(mbuf_pool);
+		if (unlikely(pkts[i] == NULL))
+			break;
+
+		err = copy_desc_to_mbuf(dev, vq, desc, sz, pkts[i], idx,
+					mbuf_pool);
+		if (unlikely(err)) {
+			rte_pktmbuf_free(pkts[i]);
+			break;
+		}
+
+	}
+	vq->last_avail_idx += i;
+
+	do_data_copy_dequeue(vq);
+	vq->last_used_idx += i;
+	update_used_idx(vq, i);
+
+	return i;
+}
+
+static uint16_t
+eth_vhostpci_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
+	struct vhostpci_queue *r = q;
+	uint16_t i, nb_rx = 0;
+	uint16_t nb_receive = nb_bufs;
+
+	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+		return 0;
+
+	rte_atomic32_set(&r->while_queuing, 1);
+
+	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+		goto out;
+
+	/* Dequeue packets from TX queue in the other guest */
+	while (nb_receive) {
+		uint16_t nb_pkts;
+		uint16_t num = (uint16_t)RTE_MIN(nb_receive,
+						 VHOSTPCI_MAX_PKT_BURST);
+
+		nb_pkts = vhostpci_dequeue_burst(r->vpnet, r->virtqueue_id,
+						 r->mb_pool, &bufs[nb_rx],
+						 num);
+
+		nb_rx += nb_pkts;
+		nb_receive -= nb_pkts;
+		if (nb_pkts < num)
+			break;
+	}
+
+	r->stats.pkts += nb_rx;
+
+	for (i = 0; likely(i < nb_rx); i++) {
+		bufs[i]->port = r->port_id;
+		r->stats.bytes += bufs[i]->pkt_len;
+	}
+
+out:
+	rte_atomic32_set(&r->while_queuing, 0);
+
+	return nb_rx;
+}
+
 static int
 vhostpci_dev_atomic_read_link_status(struct rte_eth_dev *dev,
 		struct rte_eth_link *link)
@@ -716,6 +1026,7 @@ eth_vhostpci_dev_init(struct rte_eth_dev *eth_dev)
 		rte_intr_callback_register(eth_dev->intr_handle,
 			vhostpci_interrupt_handler, eth_dev);
 
+	eth_dev->rx_pkt_burst = &eth_vhostpci_rx;
 	return 0;
 }
 
-- 
2.13.3

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [dpdk-dev] [PATCH 09/11] net/vhostpci: add TX function
  2017-11-30  9:46 [dpdk-dev] [PATCH 00/11] net/vhostpci: A new vhostpci PMD supporting VM2VM scenario Zhiyong Yang
                   ` (7 preceding siblings ...)
  2017-11-30  9:46 ` [dpdk-dev] [PATCH 08/11] net/vhostpci: add RX function Zhiyong Yang
@ 2017-11-30  9:46 ` Zhiyong Yang
  2017-11-30  9:46 ` [dpdk-dev] [PATCH 10/11] net/vhostpci: support RX/TX packets statistics Zhiyong Yang
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: Zhiyong Yang @ 2017-11-30  9:46 UTC (permalink / raw)
  To: dev, yliu, maxime.coquelin; +Cc: wei.w.wang, jianfeng.tan, Zhiyong Yang

Add the functions to support transmitting packets.

Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com>
---
 drivers/net/vhostpci/vhostpci_ethdev.c | 352 +++++++++++++++++++++++++++++++++
 1 file changed, 352 insertions(+)

diff --git a/drivers/net/vhostpci/vhostpci_ethdev.c b/drivers/net/vhostpci/vhostpci_ethdev.c
index 06e3f5c50..f233d85a8 100644
--- a/drivers/net/vhostpci/vhostpci_ethdev.c
+++ b/drivers/net/vhostpci/vhostpci_ethdev.c
@@ -53,6 +53,12 @@
 #define VHOSTPCI_MAX_PKT_BURST 32
 #define VHOSTPCI_BUF_VECTOR_MAX 256
 
+/* avoid write operation when necessary, to lessen cache issues */
+#define ASSIGN_UNLESS_EQUAL(var, val) do {	\
+	if ((var) != (val))			\
+		(var) = (val);			\
+} while (0)
+
 static void
 vhostpci_dev_info_get(struct rte_eth_dev *dev,
 		struct rte_eth_dev_info *dev_info);
@@ -100,6 +106,10 @@ static uint16_t
 vhostpci_dequeue_burst(struct vhostpci_net *dev, uint16_t queue_id,
 	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count);
 
+static uint16_t
+vhostpci_enqueue_burst(struct vhostpci_net *vpnet, uint16_t queue_id,
+	struct rte_mbuf **pkts, uint16_t count);
+
 static int
 vhostpci_dev_start(struct rte_eth_dev *dev);
 
@@ -321,6 +331,346 @@ vhostpci_dev_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
 	return 0;
 }
 
+
+static inline void
+do_data_copy_enqueue(struct vhostpci_virtqueue *vq)
+{
+	struct batch_copy_elem *elem = vq->batch_copy_elems;
+	uint16_t count = vq->batch_copy_nb_elems;
+	int i;
+
+	for (i = 0; i < count; i++)
+		rte_memcpy(elem[i].dst, elem[i].src, elem[i].len);
+}
+
+static __rte_always_inline int
+copy_mbuf_to_desc_mergeable(struct vhostpci_net *dev,
+		struct vhostpci_virtqueue *vq, struct rte_mbuf *m,
+		struct buf_vector *buf_vec, uint16_t num_buffers)
+{
+	uint32_t vec_idx = 0;
+	uint64_t desc_addr;
+	uint32_t mbuf_offset, mbuf_avail;
+	uint32_t desc_offset, desc_avail;
+	uint32_t cpy_len;
+	uint64_t hdr_addr;
+	struct batch_copy_elem *batch_copy = vq->batch_copy_elems;
+	uint16_t copy_nb = vq->batch_copy_nb_elems;
+	int error = 0;
+
+	if (unlikely(m == NULL)) {
+		error = -1;
+		goto out;
+	}
+
+	desc_addr = remote_gpa_to_vva(dev, buf_vec[vec_idx].buf_addr);
+
+	if (buf_vec[vec_idx].buf_len < dev->vhost_hlen || !desc_addr) {
+		error = -1;
+		goto out;
+	}
+
+	hdr_addr = desc_addr;
+	rte_prefetch0((void *)(uintptr_t)hdr_addr);
+
+	desc_avail  = buf_vec[vec_idx].buf_len - dev->vhost_hlen;
+	desc_offset = dev->vhost_hlen;
+
+	mbuf_avail  = rte_pktmbuf_data_len(m);
+	mbuf_offset = 0;
+	while (mbuf_avail != 0 || m->next != NULL) {
+		/* done with current desc buf, get the next one */
+		if (desc_avail == 0) {
+			vec_idx++;
+			desc_addr = remote_gpa_to_vva(dev,
+					buf_vec[vec_idx].buf_addr);
+
+			if (unlikely(!desc_addr)) {
+				error = -1;
+				goto out;
+			}
+
+			/* Prefetch buffer address. */
+			rte_prefetch0((void *)(uintptr_t)desc_addr);
+			desc_offset = 0;
+			desc_avail  = buf_vec[vec_idx].buf_len;
+		}
+
+		/* done with current mbuf, get the next one */
+		if (mbuf_avail == 0) {
+			m = m->next;
+			mbuf_offset = 0;
+			mbuf_avail  = rte_pktmbuf_data_len(m);
+		}
+
+		if (hdr_addr) {
+			struct virtio_net_hdr_mrg_rxbuf *hdr;
+
+			hdr = (struct virtio_net_hdr_mrg_rxbuf *)(uintptr_t)
+				hdr_addr;
+			ASSIGN_UNLESS_EQUAL(hdr->num_buffers, num_buffers);
+			hdr_addr = 0;
+		}
+
+		cpy_len = RTE_MIN(desc_avail, mbuf_avail);
+
+		if (likely(cpy_len > MAX_BATCH_LEN || copy_nb >= vq->size)) {
+			rte_memcpy((void *)((uintptr_t)(desc_addr +
+							desc_offset)),
+				rte_pktmbuf_mtod_offset(m, void *, mbuf_offset),
+				cpy_len);
+
+		} else {
+			batch_copy[copy_nb].dst =
+				(void *)((uintptr_t)(desc_addr + desc_offset));
+			batch_copy[copy_nb].src =
+				rte_pktmbuf_mtod_offset(m, void *, mbuf_offset);
+			batch_copy[copy_nb].len = cpy_len;
+			copy_nb++;
+		}
+
+		mbuf_avail  -= cpy_len;
+		mbuf_offset += cpy_len;
+		desc_avail  -= cpy_len;
+		desc_offset += cpy_len;
+	}
+
+out:
+	vq->batch_copy_nb_elems = copy_nb;
+
+	return error;
+}
+
+static __rte_always_inline int
+fill_vec_buf(struct vhostpci_virtqueue *vq,
+		uint32_t avail_idx, uint32_t *vec_idx,
+		struct buf_vector *buf_vec, uint16_t *desc_chain_head,
+		uint16_t *desc_chain_len)
+{
+	uint16_t idx = vq->avail->ring[avail_idx & (vq->size - 1)];
+	uint32_t vec_id = *vec_idx;
+	uint32_t len = 0;
+	struct vring_desc *descs = vq->desc;
+
+	*desc_chain_head = idx;
+
+	while (1) {
+		if (unlikely(vec_id >= VHOSTPCI_BUF_VECTOR_MAX ||
+			idx >= vq->size))
+			return -1;
+
+		len += descs[idx].len;
+		buf_vec[vec_id].buf_addr = descs[idx].addr;
+		buf_vec[vec_id].buf_len  = descs[idx].len;
+		buf_vec[vec_id].desc_idx = idx;
+		vec_id++;
+
+		if ((descs[idx].flags & VRING_DESC_F_NEXT) == 0)
+			break;
+
+		idx = descs[idx].next;
+	}
+
+	*desc_chain_len = len;
+	*vec_idx = vec_id;
+
+	return 0;
+}
+
+static __rte_always_inline void
+update_shadow_used_ring(struct vhostpci_virtqueue *vq, uint16_t desc_idx,
+			uint16_t len)
+{
+	uint16_t i = vq->shadow_used_idx++;
+
+	vq->shadow_used_ring[i].id  = desc_idx;
+	vq->shadow_used_ring[i].len = len;
+}
+
+static inline int
+reserve_avail_buf_mergeable(struct vhostpci_virtqueue *vq,
+			    uint32_t size, struct buf_vector *buf_vec,
+			    uint16_t *num_buffers, uint16_t avail_head)
+{
+	uint16_t cur_idx;
+	uint32_t vec_idx = 0;
+	uint16_t tries = 0;
+
+	uint16_t head_idx = 0;
+	uint16_t len = 0;
+
+	*num_buffers = 0;
+	cur_idx  = vq->last_avail_idx;
+
+	while (size > 0) {
+		if (unlikely(cur_idx == avail_head))
+			return -1;
+
+		if (unlikely(fill_vec_buf(vq, cur_idx, &vec_idx, buf_vec,
+				&head_idx, &len) < 0))
+			return -1;
+		len = RTE_MIN(len, size);
+		update_shadow_used_ring(vq, head_idx, len);
+		size -= len;
+
+		cur_idx++;
+		tries++;
+		*num_buffers += 1;
+
+		/**
+		 * if we tried all available ring items, and still
+		 * can't get enough buf, it means something abnormal
+		 * happened.
+		 */
+		if (unlikely(tries >= vq->size))
+			return -1;
+	}
+
+	return 0;
+}
+
+static __rte_always_inline void
+do_flush_shadow_used_ring(struct vhostpci_virtqueue *vq, uint16_t to,
+		uint16_t from, uint16_t size)
+{
+	rte_memcpy(&vq->used->ring[to], &vq->shadow_used_ring[from],
+			size * sizeof(struct vring_used_elem));
+}
+
+static __rte_always_inline void
+flush_shadow_used_ring(struct vhostpci_virtqueue *vq)
+{
+	uint16_t used_idx = vq->last_used_idx & (vq->size - 1);
+
+	if (used_idx + vq->shadow_used_idx <= vq->size) {
+		do_flush_shadow_used_ring(vq, used_idx, 0,
+					  vq->shadow_used_idx);
+	} else {
+		uint16_t size;
+
+		/* update used ring interval [used_idx, vq->size] */
+		size = vq->size - used_idx;
+		do_flush_shadow_used_ring(vq, used_idx, 0, size);
+
+		/* update the left half used ring interval [0, left_size] */
+		do_flush_shadow_used_ring(vq, 0, size,
+					  vq->shadow_used_idx - size);
+	}
+	vq->last_used_idx += vq->shadow_used_idx;
+
+	rte_smp_wmb();
+
+	*(volatile uint16_t *)&vq->used->idx += vq->shadow_used_idx;
+}
+
+static __rte_always_inline uint32_t
+vhostpci_dev_merge_rx(struct vhostpci_net *dev, uint16_t queue_id,
+		struct rte_mbuf **pkts, uint32_t count)
+{
+	struct vhostpci_virtqueue *vq;
+	uint32_t pkt_idx = 0;
+	uint16_t num_buffers;
+	struct buf_vector buf_vec[VHOSTPCI_BUF_VECTOR_MAX];
+	uint16_t avail_head;
+
+	if (unlikely(!is_valid_virt_queue_idx(queue_id, 0, dev->nr_vring)))
+		return 0;
+
+	vq = dev->virtqueue[queue_id];
+	if (unlikely(vq->enabled == 0))
+		return 0;
+
+	count = RTE_MIN((uint32_t)VHOSTPCI_MAX_PKT_BURST, count);
+	if (count == 0)
+		return 0;
+
+	vq->batch_copy_nb_elems = 0;
+
+	rte_prefetch0(&vq->avail->ring[vq->last_avail_idx & (vq->size - 1)]);
+
+	vq->shadow_used_idx = 0;
+	avail_head = *((volatile uint16_t *)&vq->avail->idx);
+	for (pkt_idx = 0; pkt_idx < count; pkt_idx++) {
+		uint32_t pkt_len = pkts[pkt_idx]->pkt_len + dev->vhost_hlen;
+
+		if (unlikely(reserve_avail_buf_mergeable(vq, pkt_len, buf_vec,
+			&num_buffers, avail_head) < 0)) {
+			vq->shadow_used_idx -= num_buffers;
+			break;
+		}
+
+		if (copy_mbuf_to_desc_mergeable(dev, vq, pkts[pkt_idx],
+						buf_vec, num_buffers) < 0) {
+			vq->shadow_used_idx -= num_buffers;
+			break;
+		}
+
+		vq->last_avail_idx += num_buffers;
+	}
+
+	do_data_copy_enqueue(vq);
+
+	if (likely(vq->shadow_used_idx)) {
+		flush_shadow_used_ring(vq);
+
+		/* flush used->idx update before we read avail->flags. */
+		rte_mb();
+	}
+
+	return pkt_idx;
+}
+
+static uint16_t
+vhostpci_enqueue_burst(struct vhostpci_net *vpnet, uint16_t queue_id,
+		       struct rte_mbuf **pkts, uint16_t count)
+{
+	return vhostpci_dev_merge_rx(vpnet, queue_id, pkts, count);
+}
+
+static uint16_t
+eth_vhostpci_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
+{
+	struct vhostpci_queue *r = q;
+	uint16_t i, nb_tx = 0;
+	uint16_t nb_send = nb_bufs;
+
+	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+		return 0;
+
+	rte_atomic32_set(&r->while_queuing, 1);
+
+	if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+		goto out;
+
+	/* Enqueue packets to RX queue in the other guest */
+	while (nb_send) {
+		uint16_t nb_pkts;
+		uint16_t num = (uint16_t)RTE_MIN(nb_send,
+						 VHOSTPCI_MAX_PKT_BURST);
+
+		nb_pkts = vhostpci_enqueue_burst(r->vpnet, r->virtqueue_id,
+						&bufs[nb_tx], num);
+
+		nb_tx += nb_pkts;
+		nb_send -= nb_pkts;
+		if (nb_pkts < num)
+			break;
+	}
+
+	r->stats.pkts += nb_tx;
+	r->stats.missed_pkts += nb_bufs - nb_tx;
+
+	for (i = 0; likely(i < nb_tx); i++)
+		r->stats.bytes += bufs[i]->pkt_len;
+
+	for (i = 0; likely(i < nb_tx); i++)
+		rte_pktmbuf_free(bufs[i]);
+out:
+	rte_atomic32_set(&r->while_queuing, 0);
+
+	return nb_tx;
+}
+
 static __rte_always_inline void
 update_used_ring(struct vhostpci_virtqueue *vq,
 		 uint32_t used_idx, uint32_t desc_idx)
@@ -1027,6 +1377,8 @@ eth_vhostpci_dev_init(struct rte_eth_dev *eth_dev)
 			vhostpci_interrupt_handler, eth_dev);
 
 	eth_dev->rx_pkt_burst = &eth_vhostpci_rx;
+	eth_dev->tx_pkt_burst = &eth_vhostpci_tx;
+
 	return 0;
 }
 
-- 
2.13.3

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [dpdk-dev] [PATCH 10/11] net/vhostpci: support RX/TX packets statistics
  2017-11-30  9:46 [dpdk-dev] [PATCH 00/11] net/vhostpci: A new vhostpci PMD supporting VM2VM scenario Zhiyong Yang
                   ` (8 preceding siblings ...)
  2017-11-30  9:46 ` [dpdk-dev] [PATCH 09/11] net/vhostpci: add TX function Zhiyong Yang
@ 2017-11-30  9:46 ` Zhiyong Yang
  2017-11-30  9:46 ` [dpdk-dev] [PATCH 11/11] net/vhostpci: update release note Zhiyong Yang
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: Zhiyong Yang @ 2017-11-30  9:46 UTC (permalink / raw)
  To: dev, yliu, maxime.coquelin; +Cc: wei.w.wang, jianfeng.tan, Zhiyong Yang

Add the functions to support TX/RX packet statistics.
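
These callbacks back the generic ethdev statistics API, so applications read
the counters the usual way; a minimal sketch, assuming port 0:

#include <inttypes.h>
#include <stdio.h>
#include <rte_ethdev.h>

/* Illustrative only: print and clear the counters exposed by the new
 * stats_get/stats_reset callbacks.
 */
static void
dump_vhostpci_stats(uint16_t port_id)
{
	struct rte_eth_stats stats;

	if (rte_eth_stats_get(port_id, &stats) == 0)
		printf("rx %" PRIu64 " pkts/%" PRIu64 " bytes, "
		       "tx %" PRIu64 " pkts/%" PRIu64 " bytes, "
		       "tx dropped %" PRIu64 "\n",
		       stats.ipackets, stats.ibytes,
		       stats.opackets, stats.obytes, stats.oerrors);

	rte_eth_stats_reset(port_id);
}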

Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com>
---
 drivers/net/vhostpci/vhostpci_ethdev.c | 73 ++++++++++++++++++++++++++++++++++
 1 file changed, 73 insertions(+)

diff --git a/drivers/net/vhostpci/vhostpci_ethdev.c b/drivers/net/vhostpci/vhostpci_ethdev.c
index f233d85a8..3dd09e2ea 100644
--- a/drivers/net/vhostpci/vhostpci_ethdev.c
+++ b/drivers/net/vhostpci/vhostpci_ethdev.c
@@ -129,11 +129,19 @@ vhostpci_dev_stop(struct rte_eth_dev *dev);
 static int
 vhostpci_get_remote_mem(struct rte_eth_dev *dev);
 
+static void
+vhostpci_dev_stats_reset(struct rte_eth_dev *dev);
+
+static int
+vhostpci_dev_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats);
+
 static const struct eth_dev_ops vhostpci_eth_dev_ops = {
 	.dev_start               = vhostpci_dev_start,
 	.dev_stop                = vhostpci_dev_stop,
 	.dev_close               = vhostpci_dev_close,
 	.dev_infos_get		 = vhostpci_dev_info_get,
+	.stats_get		 = vhostpci_dev_stats_get,
+	.stats_reset		 = vhostpci_dev_stats_reset,
 	.dev_configure		 = vhostpci_dev_configure,
 	.link_update             = vhostpci_dev_link_update,
 	.rx_queue_setup		 = vhostpci_dev_rx_queue_setup,
@@ -284,6 +292,71 @@ vhostpci_dev_configure(struct rte_eth_dev *dev __rte_unused)
 }
 
 static int
+vhostpci_dev_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
+{
+	int i;
+	unsigned long rx_total = 0, tx_total = 0, tx_missed_total = 0;
+	unsigned long rx_total_bytes = 0, tx_total_bytes = 0;
+	struct vhostpci_queue *vq;
+
+	for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
+			i < dev->data->nb_rx_queues; i++) {
+		if (dev->data->rx_queues[i] == NULL)
+			continue;
+		vq = dev->data->rx_queues[i];
+		stats->q_ipackets[i] = vq->stats.pkts;
+		rx_total += stats->q_ipackets[i];
+
+		stats->q_ibytes[i] = vq->stats.bytes;
+		rx_total_bytes += stats->q_ibytes[i];
+	}
+
+	for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
+			i < dev->data->nb_tx_queues; i++) {
+		if (dev->data->tx_queues[i] == NULL)
+			continue;
+		vq = dev->data->tx_queues[i];
+		stats->q_opackets[i] = vq->stats.pkts;
+		tx_missed_total += vq->stats.missed_pkts;
+		tx_total += stats->q_opackets[i];
+
+		stats->q_obytes[i] = vq->stats.bytes;
+		tx_total_bytes += stats->q_obytes[i];
+	}
+
+	stats->ipackets = rx_total;
+	stats->opackets = tx_total;
+	stats->oerrors = tx_missed_total;
+	stats->ibytes = rx_total_bytes;
+	stats->obytes = tx_total_bytes;
+
+	return 0;
+}
+
+static void
+vhostpci_dev_stats_reset(struct rte_eth_dev *dev)
+{
+	struct vhostpci_queue *vq;
+	int i;
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		if (dev->data->rx_queues[i] == NULL)
+			continue;
+		vq = dev->data->rx_queues[i];
+		vq->stats.pkts = 0;
+		vq->stats.bytes = 0;
+	}
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+		if (dev->data->tx_queues[i] == NULL)
+			continue;
+		vq = dev->data->tx_queues[i];
+		vq->stats.pkts = 0;
+		vq->stats.bytes = 0;
+		vq->stats.missed_pkts = 0;
+	}
+}
+
+static int
 vhostpci_dev_rx_queue_setup(struct rte_eth_dev *dev, uint16_t rx_queue_id,
 		uint16_t nb_rx_desc __rte_unused,
 		unsigned int socket_id,
-- 
2.13.3

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [dpdk-dev] [PATCH 11/11] net/vhostpci: update release note
  2017-11-30  9:46 [dpdk-dev] [PATCH 00/11] net/vhostpci: A new vhostpci PMD supporting VM2VM scenario Zhiyong Yang
                   ` (9 preceding siblings ...)
  2017-11-30  9:46 ` [dpdk-dev] [PATCH 10/11] net/vhostpci: support RX/TX packets statistics Zhiyong Yang
@ 2017-11-30  9:46 ` Zhiyong Yang
  2017-12-05  6:59 ` [dpdk-dev] [PATCH 00/11] net/vhostpci: A new vhostpci PMD supporting VM2VM scenario Yang, Zhiyong
  2017-12-19 11:14 ` Maxime Coquelin
  12 siblings, 0 replies; 26+ messages in thread
From: Zhiyong Yang @ 2017-11-30  9:46 UTC (permalink / raw)
  To: dev, yliu, maxime.coquelin; +Cc: wei.w.wang, jianfeng.tan, Zhiyong Yang

Update the release notes doc.
Update the MAINTAINERS file.

Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com>
---
 MAINTAINERS                            | 6 ++++++
 doc/guides/rel_notes/release_18_02.rst | 6 ++++++
 2 files changed, 12 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index f0baeb423..7db0cfa35 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -525,6 +525,12 @@ T: git://dpdk.org/next/dpdk-next-virtio
 F: drivers/net/vhost/
 F: doc/guides/nics/features/vhost.ini
 
+Vhostpci PMD
+M: Zhiyong Yang <zhiyong.yang@intel.com>
+F: drivers/net/vhostpci/
+F: doc/guides/nics/vhostpci.rst
+F: doc/guides/nics/features/vhostpci.ini
+
 Virtio PMD
 M: Yuanhan Liu <yliu@fridaylinux.org>
 M: Maxime Coquelin <maxime.coquelin@redhat.com>
diff --git a/doc/guides/rel_notes/release_18_02.rst b/doc/guides/rel_notes/release_18_02.rst
index 24b67bb8b..b9f251db4 100644
--- a/doc/guides/rel_notes/release_18_02.rst
+++ b/doc/guides/rel_notes/release_18_02.rst
@@ -41,6 +41,11 @@ New Features
      Also, make sure to start the actual text at the margin.
      =========================================================
 
+* **Added vhostpci PMD.**
+
+  Added a new ``vhostpci PMD`` to drive ``vhostpci device``, which is a new
+  virtio pci device. Vhostpci PMD works in pair with virtio-net PMD to achieve
+  point-to-point communication between VMs.
 
 API Changes
 -----------
@@ -154,6 +159,7 @@ The libraries prepended with a plus sign were incremented in this version.
      librte_pmd_ring.so.2
      librte_pmd_softnic.so.1
      librte_pmd_vhost.so.2
+     librte_pmd_vhostpci.so.1
      librte_port.so.3
      librte_power.so.1
      librte_reorder.so.1
-- 
2.13.3

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [dpdk-dev] [PATCH 00/11] net/vhostpci: A new vhostpci PMD supporting VM2VM scenario
  2017-11-30  9:46 [dpdk-dev] [PATCH 00/11] net/vhostpci: A new vhostpci PMD supporting VM2VM scenario Zhiyong Yang
                   ` (10 preceding siblings ...)
  2017-11-30  9:46 ` [dpdk-dev] [PATCH 11/11] net/vhostpci: update release note Zhiyong Yang
@ 2017-12-05  6:59 ` Yang, Zhiyong
  2017-12-05 14:08   ` Yuanhan Liu
  2017-12-07  6:07   ` Yang, Zhiyong
  2017-12-19 11:14 ` Maxime Coquelin
  12 siblings, 2 replies; 26+ messages in thread
From: Yang, Zhiyong @ 2017-12-05  6:59 UTC (permalink / raw)
  To: Yang, Zhiyong, dev, yliu, maxime.coquelin; +Cc: Wang, Wei W, Tan, Jianfeng

Hi all,

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Zhiyong Yang
> Sent: Thursday, November 30, 2017 5:47 PM
> To: dev@dpdk.org; yliu@fridaylinux.org; maxime.coquelin@redhat.com
> Cc: Wang, Wei W <wei.w.wang@intel.com>; Tan, Jianfeng
> <jianfeng.tan@intel.com>
> Subject: [dpdk-dev] [PATCH 00/11] net/vhostpci: A new vhostpci PMD
> supporting VM2VM scenario
> 
> Vhostpci PMD is a new type driver working in guest OS which has ability to drive
> the vhostpci modern pci device, which is a new virtio device.
> 
> The following linking is about vhostpci design:
> 
> An initial device design is presented at KVM Forum'16:
> http://www.linux-kvm.org/images/5/55/02x07A-Wei_Wang-Design_of-Vhost-
> pci.pdf
> The latest device design and implementation will be posted to the QEMU
> community soon.
> 

The latest version vhostpci device code has been posted to QEMU community.
[PATCH v3 0/7] Vhost-pci for inter-VM communication
http://lists.nongnu.org/archive/html/qemu-devel/2017-12/msg00494.html

[2016] Design of Vhost-pci by Wei Wang - YouTube linking as reference.
https://www.youtube.com/watch?v=xITj0qsaSJQ

thanks
Zhiyong

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [dpdk-dev] [PATCH 00/11] net/vhostpci: A new vhostpci PMD supporting VM2VM scenario
  2017-12-05  6:59 ` [dpdk-dev] [PATCH 00/11] net/vhostpci: A new vhostpci PMD supporting VM2VM scenario Yang, Zhiyong
@ 2017-12-05 14:08   ` Yuanhan Liu
  2017-12-06  3:00     ` Wei Wang
  2017-12-07  6:07   ` Yang, Zhiyong
  1 sibling, 1 reply; 26+ messages in thread
From: Yuanhan Liu @ 2017-12-05 14:08 UTC (permalink / raw)
  To: Yang, Zhiyong; +Cc: dev, maxime.coquelin, Wang, Wei W, Tan, Jianfeng

On Tue, Dec 05, 2017 at 06:59:26AM +0000, Yang, Zhiyong wrote:
> Hi all,
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Zhiyong Yang
> > Sent: Thursday, November 30, 2017 5:47 PM
> > To: dev@dpdk.org; yliu@fridaylinux.org; maxime.coquelin@redhat.com
> > Cc: Wang, Wei W <wei.w.wang@intel.com>; Tan, Jianfeng
> > <jianfeng.tan@intel.com>
> > Subject: [dpdk-dev] [PATCH 00/11] net/vhostpci: A new vhostpci PMD
> > supporting VM2VM scenario
> > 
> > Vhostpci PMD is a new type driver working in guest OS which has ability to drive
> > the vhostpci modern pci device, which is a new virtio device.
> > 
> > The following linking is about vhostpci design:
> > 
> > An initial device design is presented at KVM Forum'16:
> > http://www.linux-kvm.org/images/5/55/02x07A-Wei_Wang-Design_of-Vhost-
> > pci.pdf
> > The latest device design and implementation will be posted to the QEMU
> > community soon.
> > 
> 
> The latest version vhostpci device code has been posted to QEMU community.
> [PATCH v3 0/7] Vhost-pci for inter-VM communication
> http://lists.nongnu.org/archive/html/qemu-devel/2017-12/msg00494.html

I didn't follow on the qemu ML. Do you know when it's likely it will
get accepted?

	--yliu
> 
> [2016] Design of Vhost-pci by Wei Wang - YouTube linking as reference.
> https://www.youtube.com/watch?v=xITj0qsaSJQ
> 
> thanks
> Zhiyong

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [dpdk-dev] [PATCH 00/11] net/vhostpci: A new vhostpci PMD supporting VM2VM scenario
  2017-12-05 14:08   ` Yuanhan Liu
@ 2017-12-06  3:00     ` Wei Wang
  0 siblings, 0 replies; 26+ messages in thread
From: Wei Wang @ 2017-12-06  3:00 UTC (permalink / raw)
  To: Yuanhan Liu, Yang, Zhiyong; +Cc: dev, maxime.coquelin, Tan, Jianfeng

On 12/05/2017 10:08 PM, Yuanhan Liu wrote:
> On Tue, Dec 05, 2017 at 06:59:26AM +0000, Yang, Zhiyong wrote:
>> Hi all,
>>
>>> -----Original Message-----
>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Zhiyong Yang
>>> Sent: Thursday, November 30, 2017 5:47 PM
>>> To: dev@dpdk.org; yliu@fridaylinux.org; maxime.coquelin@redhat.com
>>> Cc: Wang, Wei W <wei.w.wang@intel.com>; Tan, Jianfeng
>>> <jianfeng.tan@intel.com>
>>> Subject: [dpdk-dev] [PATCH 00/11] net/vhostpci: A new vhostpci PMD
>>> supporting VM2VM scenario
>>>
>>> Vhostpci PMD is a new type driver working in guest OS which has ability to drive
>>> the vhostpci modern pci device, which is a new virtio device.
>>>
>>> The following linking is about vhostpci design:
>>>
>>> An initial device design is presented at KVM Forum'16:
>>> http://www.linux-kvm.org/images/5/55/02x07A-Wei_Wang-Design_of-Vhost-
>>> pci.pdf
>>> The latest device design and implementation will be posted to the QEMU
>>> community soon.
>>>
>> The latest version vhostpci device code has been posted to QEMU community.
>> [PATCH v3 0/7] Vhost-pci for inter-VM communication
>> http://lists.nongnu.org/archive/html/qemu-devel/2017-12/msg00494.html
> I didn't follow on the qemu ML. Do you know when it's likely it will
> get accepted?
>
> 	--yliu

Hi Yuanhan,

There appear to be more people interested in this solution. My
expectation is to get the fundamental part of the device finalized in 3 months.

For the PMD patches, I think it would be great if people could start the
review from the non-device-related part (e.g. the data transmission logic).

Best,
Wei

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [dpdk-dev] [PATCH 00/11] net/vhostpci: A new vhostpci PMD supporting VM2VM scenario
  2017-12-05  6:59 ` [dpdk-dev] [PATCH 00/11] net/vhostpci: A new vhostpci PMD supporting VM2VM scenario Yang, Zhiyong
  2017-12-05 14:08   ` Yuanhan Liu
@ 2017-12-07  6:07   ` Yang, Zhiyong
  1 sibling, 0 replies; 26+ messages in thread
From: Yang, Zhiyong @ 2017-12-07  6:07 UTC (permalink / raw)
  To: dev, yliu, maxime.coquelin; +Cc: Wang, Wei W, Tan, Jianfeng, Wang, Zhihong

Hi all,

> -----Original Message-----
> From: Yang, Zhiyong
> Sent: Tuesday, December 5, 2017 2:59 PM
> To: Yang, Zhiyong <zhiyong.yang@intel.com>; dev@dpdk.org;
> yliu@fridaylinux.org; maxime.coquelin@redhat.com
> Cc: Wang, Wei W <wei.w.wang@intel.com>; Tan, Jianfeng
> <jianfeng.tan@intel.com>
> Subject: RE: [dpdk-dev] [PATCH 00/11] net/vhostpci: A new vhostpci PMD
> supporting VM2VM scenario
> 
> Hi all,
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Zhiyong Yang
> > Sent: Thursday, November 30, 2017 5:47 PM
> > To: dev@dpdk.org; yliu@fridaylinux.org; maxime.coquelin@redhat.com
> > Cc: Wang, Wei W <wei.w.wang@intel.com>; Tan, Jianfeng
> > <jianfeng.tan@intel.com>
> > Subject: [dpdk-dev] [PATCH 00/11] net/vhostpci: A new vhostpci PMD
> > supporting VM2VM scenario
> >
> > Vhostpci PMD is a new type driver working in guest OS which has
> > ability to drive the vhostpci modern pci device, which is a new virtio device.
> >
> > The following linking is about vhostpci design:
> >
> > An initial device design is presented at KVM Forum'16:
> > http://www.linux-kvm.org/images/5/55/02x07A-Wei_Wang-Design_of-Vhost-
> > pci.pdf
> > The latest device design and implementation will be posted to the QEMU
> > community soon.
> >
> 
> The latest version vhostpci device code has been posted to QEMU community.
> [PATCH v3 0/7] Vhost-pci for inter-VM communication
> http://lists.nongnu.org/archive/html/qemu-devel/2017-12/msg00494.html
> 
> [2016] Design of Vhost-pci by Wei Wang - YouTube linking as reference.
> https://www.youtube.com/watch?v=xITj0qsaSJQ
> 

I missed the following code in V1:

Disable notifications.
virtqueue[i]->used->flags = VRING_USED_F_NO_NOTIFY;

I will add it in the next version. It increases throughput sharply in the vhostpci/virtio PMD loopback test.
With notifications disabled, the loopback performance is similar to that of the vhost-user/virtio PMD loopback test.
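
A minimal sketch of where that assignment would slot in, assuming the
remote-vring setup loop of vhostpci_get_remote_mem() from patch 07 (names
taken from that patch):

	/* Illustrative only: once the remote used rings are resolved, mark
	 * them VRING_USED_F_NO_NOTIFY so the virtio-net side skips the
	 * notification on every enqueue.
	 */
	for (i = 0; i < vpnet->nr_vring; i++)
		virtqueue[i]->used->flags = VRING_USED_F_NO_NOTIFY;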

thanks
Zhiyong

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [dpdk-dev] [PATCH 00/11] net/vhostpci: A new vhostpci PMD supporting VM2VM scenario
  2017-11-30  9:46 [dpdk-dev] [PATCH 00/11] net/vhostpci: A new vhostpci PMD supporting VM2VM scenario Zhiyong Yang
                   ` (11 preceding siblings ...)
  2017-12-05  6:59 ` [dpdk-dev] [PATCH 00/11] net/vhostpci: A new vhostpci PMD supporting VM2VM scenario Yang, Zhiyong
@ 2017-12-19 11:14 ` Maxime Coquelin
  2017-12-20  1:51   ` Yang, Zhiyong
  2018-01-11 11:13   ` Yang, Zhiyong
  12 siblings, 2 replies; 26+ messages in thread
From: Maxime Coquelin @ 2017-12-19 11:14 UTC (permalink / raw)
  To: Zhiyong Yang, dev, yliu; +Cc: wei.w.wang, jianfeng.tan

Hi Zhiyong,

On 11/30/2017 10:46 AM, Zhiyong Yang wrote:
> Vhostpci PMD is a new type driver working in guest OS which has ability to
> drive the vhostpci modern pci device, which is a new virtio device.
> 
> The following linking is about vhostpci design:
> 
> An initial device design is presented at KVM Forum'16:
> http://www.linux-kvm.org/images/5/55/02x07A-Wei_Wang-Design_of-Vhost-pci.pdf
> The latest device design and implementation will be posted to the QEMU community soon.
> 
> Vhostpci PMD works in pair with virtio-net PMD to achieve point-to-point communication
> between VMs. DPDK already has virtio/vhost user PMD pair to implement RX/TX packets
> between guest/host scenario. However, for VM2VM use cases, Virtio PMD needs to
> transmit pkts from VM1 to host OS firstly by vhost user port, then transmit pkts to
> the 2nd VM by virtio PMD port again. Virtio/Vhostpci PMD pair can implement shared
> memory to receive/trasmit packets directly between two VMs. Currently, the entire memory
> of the virtio-net side VM is shared to the vhost-pci side VM, and mapped via device BAR2,
> and the first 4KB area of BAR2 is reserved to store the metadata.
> 
> The vhostpci/virtio PMD working processing is the following:
> 
> 1.VM1 startup with vhostpci device, bind the device to DPDK in the guest1,
> launch the DPDK testpmd, then waiting for the remote memory info (the VM2
> shares memory, memory regions and vring info).
> 
> 2.VM2 startup with virtio-net device, bind the virito-net to DPDK in the VM2,
> run testpmd using virtio PMD.
> 
> 3.vhostpci device negotiate virtio message with virtio-net device via socket
> as vhost user/virtio-net do that.
> 
> 4.Vhostpci device gets VM2's memory region and vring info and write the metadata
> to VM2's shared memory.
> 
> 5.When the metadata is ready to be read by the Vhostpci PMD, the PMD
> will receive a config interrupt with LINK_UP set in the status config.
> 
> 6.Vhostpci PMD and Virtio PMD can transmit/receive the packets.
> 
> How to test?
> 
> 1. launch VM1 with vhostpci device.
> qemu/x86_64-softmmu/qemu-system-x86_64 -cpu host -M pc -enable-kvm \
> -smp 16,threads=1,sockets=1 -m 8G -mem-prealloc -realtime mlock=on \
> -object memory-backend-file,id=mem,size=8G,mem-path=/dev/hugepages, \
> share=on -numa node,memdev=mem -drive if=virtio,file=/root/vhost-pci/guest1.img,format=raw \
> -kernel /opt/guest_kernel -append 'root=/dev/vda1 ro default_hugepagesz=1G hugepagesz=1G \
> hugepages=2 console=ttyS0,115200,8n1 3' -netdev tap,id=net1,br=br0,script=/etc/qemu-ifup \
> -chardev socket,id=slave1,server,wait=off, path=/opt/vhost-pci-slave1 -device vhost-pci-net-pci, \
> chardev=slave1 \
> -nographic
> 
> 2. bind vhostpci device to dpdk using igb_uio.
> startup dpdk
> ./x86_64-native-linuxapp-gcc/app/testpmd -c 0x3 -n 4 -- -i
> 
> 3. launch VM2 with virtio-net device.
> 
> qemu/x86_64-softmmu/qemu-system-x86_64 -cpu host -M pc -enable-kvm \
> -smp 4,threads=1,sockets=1 -m 8G -mem-prealloc -realtime mlock=on \
> -object memory-backend-file,id=mem,size=8G,mem-path=/dev/hugepages,share=on \
> -numa node,memdev=mem -drive if=virtio,file=/root/vhost-pci/guest2.img,format=raw \
> -net none -no-hpet -kernel /opt/guest_kernel \
> -append 'root=/dev/vda1 ro default_hugepagesz=1G hugepagesz=1G hugepages=2 console=ttyS0,115200,8n1 3' \
> -chardev socket,id=sock2,path=/opt/vhost-pci-slave1 \
> -netdev type=vhost-user,id=net2,chardev=sock2,vhostforce \
> -device virtio-net-pci,mac=52:54:00:00:00:02,netdev=net2 \
> -nographic
> 
> 4.bind virtio-net to dpdk using igb_uio
> run dpdk
> 
> ./x86_64-native-linuxapp-gcc/app/testpmd -c 0x3 -n 4 --socket-mem 512,0 \
> -- -i --rxq=1 --txq=1 --nb-cores=1
> 
> 5. vhostpci PMD run "start"
> 
> 6. virtio PMD side run "start tx_first"
> 
> loopback testing can work.
> 
> note:
> 1. only support igb_uio for now.
> 2. vhostpci device is a modern pci device. vhostpci PMD only supports mergable
> mode. Virtio device side must be mergable mode.
> 3. vhostpci PMD supports one queue pair for now.
> 
> Zhiyong Yang (11):
>    drivers/net: add vhostpci PMD base files
>    net/vhostpci: public header files
>    net/vhostpci: add debugging log macros
>    net/vhostpci: add basic framework
>    net/vhostpci: add queue setup
>    net/vhostpci: add support for link status change
>    net/vhostpci: get remote memory region and vring info
>    net/vhostpci: add RX function
>    net/vhostpci: add TX function
>    net/vhostpci: support RX/TX packets statistics
>    net/vhostpci: update release note
> 
>   MAINTAINERS                                       |    6 +
>   config/common_base                                |    9 +
>   config/common_linuxapp                            |    1 +
>   doc/guides/rel_notes/release_18_02.rst            |    6 +
>   drivers/net/Makefile                              |    1 +
>   drivers/net/vhostpci/Makefile                     |   54 +
>   drivers/net/vhostpci/rte_pmd_vhostpci_version.map |    3 +
>   drivers/net/vhostpci/vhostpci_ethdev.c            | 1521 +++++++++++++++++++++
>   drivers/net/vhostpci/vhostpci_ethdev.h            |  176 +++
>   drivers/net/vhostpci/vhostpci_logs.h              |   69 +
>   drivers/net/vhostpci/vhostpci_net.h               |   74 +
>   drivers/net/vhostpci/vhostpci_pci.c               |  334 +++++
>   drivers/net/vhostpci/vhostpci_pci.h               |  240 ++++
>   mk/rte.app.mk                                     |    1 +
>   14 files changed, 2495 insertions(+)
>   create mode 100644 drivers/net/vhostpci/Makefile
>   create mode 100644 drivers/net/vhostpci/rte_pmd_vhostpci_version.map
>   create mode 100644 drivers/net/vhostpci/vhostpci_ethdev.c
>   create mode 100644 drivers/net/vhostpci/vhostpci_ethdev.h
>   create mode 100644 drivers/net/vhostpci/vhostpci_logs.h
>   create mode 100644 drivers/net/vhostpci/vhostpci_net.h
>   create mode 100644 drivers/net/vhostpci/vhostpci_pci.c
>   create mode 100644 drivers/net/vhostpci/vhostpci_pci.h
> 

Thanks for the RFC.
It seems there is a lot of code duplication between this series and
libvhost-user.

Will the non-RFC version make reuse of libvhost-user? I'm thinking of all
the code copied from virtio-net.c, for example.

If not, I think this is problematic as it will double the maintenance
cost.

Cheers,
Maxime

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [dpdk-dev] [PATCH 00/11] net/vhostpci: A new vhostpci PMD supporting VM2VM scenario
  2017-12-19 11:14 ` Maxime Coquelin
@ 2017-12-20  1:51   ` Yang, Zhiyong
  2017-12-21  5:52     ` Tan, Jianfeng
  2018-01-11 11:13   ` Yang, Zhiyong
  1 sibling, 1 reply; 26+ messages in thread
From: Yang, Zhiyong @ 2017-12-20  1:51 UTC (permalink / raw)
  To: Maxime Coquelin, dev, yliu; +Cc: Wang, Wei W, Tan, Jianfeng

Hi Maxime,

> -----Original Message-----
> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
> Sent: Tuesday, December 19, 2017 7:15 PM
> To: Yang, Zhiyong <zhiyong.yang@intel.com>; dev@dpdk.org;
> yliu@fridaylinux.org
> Cc: Wang, Wei W <wei.w.wang@intel.com>; Tan, Jianfeng
> <jianfeng.tan@intel.com>
> Subject: Re: [PATCH 00/11] net/vhostpci: A new vhostpci PMD supporting
> VM2VM scenario
> 

<snip>

> Thanks for the RFC.
> It seems there is a lot of code duplication between this series and libvhost-user.
> 
> Does the non-RFC would make reuse of libvhost-user? I'm thinking of all the
> code copied from virtio-net.c for example.
> 
> If not, I think this is problematic as it will double the maintenance cost.

Thank you for paying attention to the patchset. The TX/RX logic basically comes from the vhost-user code,
but some function interfaces and data structures are different, so it cannot be reused directly.
The code duplication should be removed; I was aware of this point too while writing the vhostpci PMD.
But we should modify libvhost-user first and make it more generic,
since it looks like more and more PMDs are joining the virtio family or are on the way.

Thanks
Zhiyong

> 
> Cheers,
> Maxime

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [dpdk-dev] [PATCH 00/11] net/vhostpci: A new vhostpci PMD supporting VM2VM scenario
  2017-12-20  1:51   ` Yang, Zhiyong
@ 2017-12-21  5:52     ` Tan, Jianfeng
  2017-12-21  6:21       ` Yang, Zhiyong
  0 siblings, 1 reply; 26+ messages in thread
From: Tan, Jianfeng @ 2017-12-21  5:52 UTC (permalink / raw)
  To: Yang, Zhiyong, Maxime Coquelin, dev, yliu; +Cc: Wang, Wei W

Hi,

> -----Original Message-----
> From: Yang, Zhiyong
> Sent: Wednesday, December 20, 2017 9:52 AM
> To: Maxime Coquelin; dev@dpdk.org; yliu@fridaylinux.org
> Cc: Wang, Wei W; Tan, Jianfeng
> Subject: RE: [PATCH 00/11] net/vhostpci: A new vhostpci PMD supporting
> VM2VM scenario
> 
> Hi Maxime,
> 
> > -----Original Message-----
> > From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
> > Sent: Tuesday, December 19, 2017 7:15 PM
> > To: Yang, Zhiyong <zhiyong.yang@intel.com>; dev@dpdk.org;
> > yliu@fridaylinux.org
> > Cc: Wang, Wei W <wei.w.wang@intel.com>; Tan, Jianfeng
> > <jianfeng.tan@intel.com>
> > Subject: Re: [PATCH 00/11] net/vhostpci: A new vhostpci PMD supporting
> > VM2VM scenario
> >
> 
> <snip>
> 
> > Thanks for the RFC.
> > It seems there is a lot of code duplication between this series and libvhost-
> user.
> >
> > Does the non-RFC would make reuse of libvhost-user? I'm thinking of all
> the
> > code copied from virtio-net.c for example.
> >
> > If not, I think this is problematic as it will double the maintenance cost.
> 
> Thank you for paying attention to the patchset . TX/RX logic basically comes
> from vhost user code.
> but some function interfaces and data structures are different,  So can not
> reuse them directly,
> code duplicate should be removed, I'm aware of this point too, when I was
> writing the vhostpci PMD.
>  But We should modify the  libvhost-user firstly,  let libvhostuser become
> more generic.
> It looks like that more and more PMDs are becoming the member of virtio
> family or are on the road.

Trying to draw a conclusion here, there are two kinds of code duplication in net/vhost-pci:
- For the PCI operations in the guest driver, it duplicates some code from the virtio-net PMD.
- For the enqueue/dequeue operations, it duplicates some code from the vhost-user lib.

Right?

Thanks,
Jianfeng

> 
> Thanks
> Zhiyong
> 
> >
> > Cheers,
> > Maxime

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [dpdk-dev] [PATCH 00/11] net/vhostpci: A new vhostpci PMD supporting VM2VM scenario
  2017-12-21  5:52     ` Tan, Jianfeng
@ 2017-12-21  6:21       ` Yang, Zhiyong
  2017-12-21  6:26         ` Yang, Zhiyong
  0 siblings, 1 reply; 26+ messages in thread
From: Yang, Zhiyong @ 2017-12-21  6:21 UTC (permalink / raw)
  To: Tan, Jianfeng, Maxime Coquelin, dev, yliu; +Cc: Wang, Wei W

Hi Jianfeng,

> -----Original Message-----
> From: Tan, Jianfeng
> Sent: Thursday, December 21, 2017 1:52 PM
> To: Yang, Zhiyong <zhiyong.yang@intel.com>; Maxime Coquelin
> <maxime.coquelin@redhat.com>; dev@dpdk.org; yliu@fridaylinux.org
> Cc: Wang, Wei W <wei.w.wang@intel.com>
> Subject: RE: [PATCH 00/11] net/vhostpci: A new vhostpci PMD supporting
> VM2VM scenario
> 
> Hi,
> 
> > -----Original Message-----
> > From: Yang, Zhiyong
> > Sent: Wednesday, December 20, 2017 9:52 AM
> > To: Maxime Coquelin; dev@dpdk.org; yliu@fridaylinux.org
> > Cc: Wang, Wei W; Tan, Jianfeng
> > Subject: RE: [PATCH 00/11] net/vhostpci: A new vhostpci PMD supporting
> > VM2VM scenario
> >
> > Hi Maxime,
> >
> > > -----Original Message-----
> > > From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
> > > Sent: Tuesday, December 19, 2017 7:15 PM
> > > To: Yang, Zhiyong <zhiyong.yang@intel.com>; dev@dpdk.org;
> > > yliu@fridaylinux.org
> > > Cc: Wang, Wei W <wei.w.wang@intel.com>; Tan, Jianfeng
> > > <jianfeng.tan@intel.com>
> > > Subject: Re: [PATCH 00/11] net/vhostpci: A new vhostpci PMD
> > > supporting VM2VM scenario
> > >
> >
> > <snip>
> >
> > > Thanks for the RFC.
> > > It seems there is a lot of code duplication between this series and
> > > libvhost-
> > user.
> > >
> > > Does the non-RFC would make reuse of libvhost-user? I'm thinking of
> > > all
> > the
> > > code copied from virtio-net.c for example.
> > >
> > > If not, I think this is problematic as it will double the maintenance cost.
> >
> > Thank you for paying attention to the patchset . TX/RX logic basically
> > comes from vhost user code.
> > but some function interfaces and data structures are different,  So
> > can not reuse them directly, code duplicate should be removed, I'm
> > aware of this point too, when I was writing the vhostpci PMD.
> >  But We should modify the  libvhost-user firstly,  let libvhostuser
> > become more generic.
> > It looks like that more and more PMDs are becoming the member of
> > virtio family or are on the road.
> 
> Trying to draw a conclusion here, there are two kinds of code duplication in
> net/vhost-pci.
> - For pci operations in guest driver, it has some duplicated code with virtio-net
> pmd.
> - For the enqueue/dequeuer operations, it has some duplicated code with vhost-
> user lib.
> 
> Right?

Right. If the existing code in DPDK can become more generic, we can avoid the duplication. 

Thanks
Zhiyong

> Thanks,
> Jianfeng
> 
> > >
> > > Cheers,
> > > Maxime

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [dpdk-dev] [PATCH 00/11] net/vhostpci: A new vhostpci PMD supporting VM2VM scenario
  2017-12-21  6:21       ` Yang, Zhiyong
@ 2017-12-21  6:26         ` Yang, Zhiyong
  2017-12-21  8:26           ` Maxime Coquelin
  0 siblings, 1 reply; 26+ messages in thread
From: Yang, Zhiyong @ 2017-12-21  6:26 UTC (permalink / raw)
  To: Yang, Zhiyong, Tan, Jianfeng, Maxime Coquelin, dev, yliu; +Cc: Wang, Wei W



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Yang, Zhiyong
> Sent: Thursday, December 21, 2017 2:21 PM
> To: Tan, Jianfeng <jianfeng.tan@intel.com>; Maxime Coquelin
> <maxime.coquelin@redhat.com>; dev@dpdk.org; yliu@fridaylinux.org
> Cc: Wang, Wei W <wei.w.wang@intel.com>
> Subject: Re: [dpdk-dev] [PATCH 00/11] net/vhostpci: A new vhostpci PMD
> supporting VM2VM scenario
> 
> Hi Jianfeng,
> 
> > -----Original Message-----
> > From: Tan, Jianfeng
> > Sent: Thursday, December 21, 2017 1:52 PM
> > To: Yang, Zhiyong <zhiyong.yang@intel.com>; Maxime Coquelin
> > <maxime.coquelin@redhat.com>; dev@dpdk.org; yliu@fridaylinux.org
> > Cc: Wang, Wei W <wei.w.wang@intel.com>
> > Subject: RE: [PATCH 00/11] net/vhostpci: A new vhostpci PMD supporting
> > VM2VM scenario
> >
> > Hi,
> >
> > > -----Original Message-----
> > > From: Yang, Zhiyong
> > > Sent: Wednesday, December 20, 2017 9:52 AM
> > > To: Maxime Coquelin; dev@dpdk.org; yliu@fridaylinux.org
> > > Cc: Wang, Wei W; Tan, Jianfeng
> > > Subject: RE: [PATCH 00/11] net/vhostpci: A new vhostpci PMD
> > > supporting VM2VM scenario
> > >
> > > Hi Maxime,
> > >
> > > > -----Original Message-----
> > > > From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
> > > > Sent: Tuesday, December 19, 2017 7:15 PM
> > > > To: Yang, Zhiyong <zhiyong.yang@intel.com>; dev@dpdk.org;
> > > > yliu@fridaylinux.org
> > > > Cc: Wang, Wei W <wei.w.wang@intel.com>; Tan, Jianfeng
> > > > <jianfeng.tan@intel.com>
> > > > Subject: Re: [PATCH 00/11] net/vhostpci: A new vhostpci PMD
> > > > supporting VM2VM scenario
> > > >
> > >
> > > <snip>
> > >
> > > > Thanks for the RFC.
> > > > It seems there is a lot of code duplication between this series
> > > > and
> > > > libvhost-
> > > user.
> > > >
> > > > Does the non-RFC would make reuse of libvhost-user? I'm thinking
> > > > of all
> > > the
> > > > code copied from virtio-net.c for example.
> > > >
> > > > If not, I think this is problematic as it will double the maintenance cost.
> > >
> > > Thank you for paying attention to the patchset . TX/RX logic
> > > basically comes from vhost user code.
> > > but some function interfaces and data structures are different,  So
> > > can not reuse them directly, code duplicate should be removed, I'm
> > > aware of this point too, when I was writing the vhostpci PMD.
> > >  But We should modify the  libvhost-user firstly,  let libvhostuser
> > > become more generic.
> > > It looks like that more and more PMDs are becoming the member of
> > > virtio family or are on the road.
> >
> > Trying to draw a conclusion here, there are two kinds of code
> > duplication in net/vhost-pci.
> > - For pci operations in guest driver, it has some duplicated code with
> > virtio-net pmd.
> > - For the enqueue/dequeuer operations, it has some duplicated code
> > with vhost- user lib.
> >
> > Right?
> 
> Right. If the existing code in DPDK can become more generic, we can avoid the
> duplication.
> 

BTW, I wonder why the vhost-user enqueue/dequeue routines are put in the lib layer, not in driver/net/vhost like the virtio PMD?

> Thanks
> Zhiyong
> 
> > Thanks,
> > Jianfeng
> >
> > > >
> > > > Cheers,
> > > > Maxime

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [dpdk-dev] [PATCH 00/11] net/vhostpci: A new vhostpci PMD supporting VM2VM scenario
  2017-12-21  6:26         ` Yang, Zhiyong
@ 2017-12-21  8:26           ` Maxime Coquelin
  2017-12-21  8:40             ` Yang, Zhiyong
  0 siblings, 1 reply; 26+ messages in thread
From: Maxime Coquelin @ 2017-12-21  8:26 UTC (permalink / raw)
  To: Yang, Zhiyong, Tan, Jianfeng, dev, yliu; +Cc: Wang, Wei W



On 12/21/2017 07:26 AM, Yang, Zhiyong wrote:
> 
> 
>> -----Original Message-----
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Yang, Zhiyong
>> Sent: Thursday, December 21, 2017 2:21 PM
>> To: Tan, Jianfeng <jianfeng.tan@intel.com>; Maxime Coquelin
>> <maxime.coquelin@redhat.com>; dev@dpdk.org; yliu@fridaylinux.org
>> Cc: Wang, Wei W <wei.w.wang@intel.com>
>> Subject: Re: [dpdk-dev] [PATCH 00/11] net/vhostpci: A new vhostpci PMD
>> supporting VM2VM scenario
>>
>> Hi Jianfeng,
>>
>>> -----Original Message-----
>>> From: Tan, Jianfeng
>>> Sent: Thursday, December 21, 2017 1:52 PM
>>> To: Yang, Zhiyong <zhiyong.yang@intel.com>; Maxime Coquelin
>>> <maxime.coquelin@redhat.com>; dev@dpdk.org; yliu@fridaylinux.org
>>> Cc: Wang, Wei W <wei.w.wang@intel.com>
>>> Subject: RE: [PATCH 00/11] net/vhostpci: A new vhostpci PMD supporting
>>> VM2VM scenario
>>>
>>> Hi,
>>>
>>>> -----Original Message-----
>>>> From: Yang, Zhiyong
>>>> Sent: Wednesday, December 20, 2017 9:52 AM
>>>> To: Maxime Coquelin; dev@dpdk.org; yliu@fridaylinux.org
>>>> Cc: Wang, Wei W; Tan, Jianfeng
>>>> Subject: RE: [PATCH 00/11] net/vhostpci: A new vhostpci PMD
>>>> supporting VM2VM scenario
>>>>
>>>> Hi Maxime,
>>>>
>>>>> -----Original Message-----
>>>>> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
>>>>> Sent: Tuesday, December 19, 2017 7:15 PM
>>>>> To: Yang, Zhiyong <zhiyong.yang@intel.com>; dev@dpdk.org;
>>>>> yliu@fridaylinux.org
>>>>> Cc: Wang, Wei W <wei.w.wang@intel.com>; Tan, Jianfeng
>>>>> <jianfeng.tan@intel.com>
>>>>> Subject: Re: [PATCH 00/11] net/vhostpci: A new vhostpci PMD
>>>>> supporting VM2VM scenario
>>>>>
>>>>
>>>> <snip>
>>>>
>>>>> Thanks for the RFC.
>>>>> It seems there is a lot of code duplication between this series
>>>>> and
>>>>> libvhost-
>>>> user.
>>>>>
>>>>> Does the non-RFC would make reuse of libvhost-user? I'm thinking
>>>>> of all
>>>> the
>>>>> code copied from virtio-net.c for example.
>>>>>
>>>>> If not, I think this is problematic as it will double the maintenance cost.
>>>>
>>>> Thank you for paying attention to the patchset . TX/RX logic
>>>> basically comes from vhost user code.
>>>> but some function interfaces and data structures are different,  So
>>>> can not reuse them directly, code duplicate should be removed, I'm
>>>> aware of this point too, when I was writing the vhostpci PMD.
>>>>   But We should modify the  libvhost-user firstly,  let libvhostuser
>>>> become more generic.
>>>> It looks like that more and more PMDs are becoming the member of
>>>> virtio family or are on the road.
>>>
>>> Trying to draw a conclusion here, there are two kinds of code
>>> duplication in net/vhost-pci.
>>> - For pci operations in guest driver, it has some duplicated code with
>>> virtio-net pmd.
>>> - For the enqueue/dequeuer operations, it has some duplicated code
>>> with vhost- user lib.
>>>
>>> Right?
>>
>> Right. If the existing code in DPDK can become more generic, we can avoid the
>> duplication.
>>
> 
> BTW,  wonder why vhost user enqueue/dequeue are put in lib layer,not in driver/net/vhost  like virtio PMD?

Because the vhost PMD is not its only user: the library exports an API that can be
used directly by the application (e.g. OVS-DPDK).
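
For illustration, here is a minimal sketch (not taken from this series) of an
application driving the vhost library API directly, which is the reason the
enqueue/dequeue path lives in librte_vhost rather than under drivers/net/vhost.
The socket path, burst size and callback bodies are placeholders, the mbuf pool
is assumed to be created elsewhere, and the calls assume the public rte_vhost
API of that time:

#include <rte_mbuf.h>
#include <rte_vhost.h>

#define MAX_PKT_BURST 32

/* assumed to be created elsewhere with rte_pktmbuf_pool_create() */
static struct rte_mempool *mbuf_pool;

static int
new_device(int vid)
{
	/* device is ready: remember 'vid' and start polling its rings */
	return 0;
}

static void
destroy_device(int vid)
{
	/* stop using 'vid' */
}

static const struct vhost_device_ops ops = {
	.new_device = new_device,
	.destroy_device = destroy_device,
};

/* Loop back what the guest transmits: dequeue from the guest TX ring
 * (queue 1) and enqueue into the guest RX ring (queue 0). */
static void
loopback_once(int vid)
{
	struct rte_mbuf *pkts[MAX_PKT_BURST];
	uint16_t nb_rx, i;

	nb_rx = rte_vhost_dequeue_burst(vid, 1, mbuf_pool, pkts, MAX_PKT_BURST);
	/* enqueue copies the data into the guest ring, so the caller
	 * keeps ownership of the mbufs and must free them */
	rte_vhost_enqueue_burst(vid, 0, pkts, nb_rx);
	for (i = 0; i < nb_rx; i++)
		rte_pktmbuf_free(pkts[i]);
}

int
setup_vhost_port(const char *path)
{
	if (rte_vhost_driver_register(path, 0) < 0)
		return -1;
	if (rte_vhost_driver_callback_register(path, &ops) < 0)
		return -1;
	return rte_vhost_driver_start(path);
}

The vhost PMD is essentially a thin ethdev wrapper around these same calls,
which is why moving them into the driver would cut off direct users such as
OVS-DPDK.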

Regards,
Maxime

>> Thanks
>> Zhiyong
>>
>>> Thanks,
>>> Jianfeng
>>>
>>>>>
>>>>> Cheers,
>>>>> Maxime

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [dpdk-dev] [PATCH 00/11] net/vhostpci: A new vhostpci PMD supporting VM2VM scenario
  2017-12-21  8:26           ` Maxime Coquelin
@ 2017-12-21  8:40             ` Yang, Zhiyong
  0 siblings, 0 replies; 26+ messages in thread
From: Yang, Zhiyong @ 2017-12-21  8:40 UTC (permalink / raw)
  To: Maxime Coquelin, Tan, Jianfeng, dev, yliu; +Cc: Wang, Wei W



> -----Original Message-----
> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
> Sent: Thursday, December 21, 2017 4:26 PM
> To: Yang, Zhiyong <zhiyong.yang@intel.com>; Tan, Jianfeng
> <jianfeng.tan@intel.com>; dev@dpdk.org; yliu@fridaylinux.org
> Cc: Wang, Wei W <wei.w.wang@intel.com>
> Subject: Re: [PATCH 00/11] net/vhostpci: A new vhostpci PMD supporting
> VM2VM scenario
> 
> 
> 
> On 12/21/2017 07:26 AM, Yang, Zhiyong wrote:
> >
> >
> >> -----Original Message-----
> >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Yang, Zhiyong
> >> Sent: Thursday, December 21, 2017 2:21 PM
> >> To: Tan, Jianfeng <jianfeng.tan@intel.com>; Maxime Coquelin
> >> <maxime.coquelin@redhat.com>; dev@dpdk.org; yliu@fridaylinux.org
> >> Cc: Wang, Wei W <wei.w.wang@intel.com>
> >> Subject: Re: [dpdk-dev] [PATCH 00/11] net/vhostpci: A new vhostpci
> >> PMD supporting VM2VM scenario
> >>
> >> Hi Jianfeng,
> >>
> >>> -----Original Message-----
> >>> From: Tan, Jianfeng
> >>> Sent: Thursday, December 21, 2017 1:52 PM
> >>> To: Yang, Zhiyong <zhiyong.yang@intel.com>; Maxime Coquelin
> >>> <maxime.coquelin@redhat.com>; dev@dpdk.org; yliu@fridaylinux.org
> >>> Cc: Wang, Wei W <wei.w.wang@intel.com>
> >>> Subject: RE: [PATCH 00/11] net/vhostpci: A new vhostpci PMD
> >>> supporting VM2VM scenario
> >>>
> >>> Hi,
> >>>
> >>>> -----Original Message-----
> >>>> From: Yang, Zhiyong
> >>>> Sent: Wednesday, December 20, 2017 9:52 AM
> >>>> To: Maxime Coquelin; dev@dpdk.org; yliu@fridaylinux.org
> >>>> Cc: Wang, Wei W; Tan, Jianfeng
> >>>> Subject: RE: [PATCH 00/11] net/vhostpci: A new vhostpci PMD
> >>>> supporting VM2VM scenario
> >>>>
> >>>> Hi Maxime,
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
> >>>>> Sent: Tuesday, December 19, 2017 7:15 PM
> >>>>> To: Yang, Zhiyong <zhiyong.yang@intel.com>; dev@dpdk.org;
> >>>>> yliu@fridaylinux.org
> >>>>> Cc: Wang, Wei W <wei.w.wang@intel.com>; Tan, Jianfeng
> >>>>> <jianfeng.tan@intel.com>
> >>>>> Subject: Re: [PATCH 00/11] net/vhostpci: A new vhostpci PMD
> >>>>> supporting VM2VM scenario
> >>>>>
> >>>>
> >>>> <snip>
> >>>>
> >>>>> Thanks for the RFC.
> >>>>> It seems there is a lot of code duplication between this series
> >>>>> and
> >>>>> libvhost-
> >>>> user.
> >>>>>
> >>>>> Does the non-RFC would make reuse of libvhost-user? I'm thinking
> >>>>> of all
> >>>> the
> >>>>> code copied from virtio-net.c for example.
> >>>>>
> >>>>> If not, I think this is problematic as it will double the maintenance cost.
> >>>>
> >>>> Thank you for paying attention to the patchset . TX/RX logic
> >>>> basically comes from vhost user code.
> >>>> but some function interfaces and data structures are different,  So
> >>>> can not reuse them directly, code duplicate should be removed, I'm
> >>>> aware of this point too, when I was writing the vhostpci PMD.
> >>>>   But We should modify the  libvhost-user firstly,  let
> >>>> libvhostuser become more generic.
> >>>> It looks like that more and more PMDs are becoming the member of
> >>>> virtio family or are on the road.
> >>>
> >>> Trying to draw a conclusion here, there are two kinds of code
> >>> duplication in net/vhost-pci.
> >>> - For pci operations in guest driver, it has some duplicated code
> >>> with virtio-net pmd.
> >>> - For the enqueue/dequeuer operations, it has some duplicated code
> >>> with vhost- user lib.
> >>>
> >>> Right?
> >>
> >> Right. If the existing code in DPDK can become more generic, we can
> >> avoid the duplication.
> >>
> >
> > BTW,  wonder why vhost user enqueue/dequeue are put in lib layer,not in
> driver/net/vhost  like virtio PMD?
> 
> Because Vhost PMD is not its only user, it exports an API that can be used
> directly by the application (e.g. ovs-dpdk).
> 

Thanks for your detailed clarification, Maxime.

Zhiyong

> Regards,
> Maxime
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [dpdk-dev] [PATCH 00/11] net/vhostpci: A new vhostpci PMD supporting VM2VM scenario
  2017-12-19 11:14 ` Maxime Coquelin
  2017-12-20  1:51   ` Yang, Zhiyong
@ 2018-01-11 11:13   ` Yang, Zhiyong
  2018-01-18  9:04     ` Maxime Coquelin
  1 sibling, 1 reply; 26+ messages in thread
From: Yang, Zhiyong @ 2018-01-11 11:13 UTC (permalink / raw)
  To: Maxime Coquelin, dev, yliu; +Cc: Wang, Wei W, Tan, Jianfeng

Hi Maxime, all, 

> -----Original Message-----
> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
> Sent: Tuesday, December 19, 2017 7:15 PM
> To: Yang, Zhiyong <zhiyong.yang@intel.com>; dev@dpdk.org;
> yliu@fridaylinux.org
> Cc: Wang, Wei W <wei.w.wang@intel.com>; Tan, Jianfeng
> <jianfeng.tan@intel.com>
> Subject: Re: [PATCH 00/11] net/vhostpci: A new vhostpci PMD supporting
> VM2VM scenario
> 
> Hi Zhiyong,
> 
> On 11/30/2017 10:46 AM, Zhiyong Yang wrote:
> > Vhostpci PMD is a new type driver working in guest OS which has
> > ability to drive the vhostpci modern pci device, which is a new virtio device.
> >
> > The following linking is about vhostpci design:
> >
> > An initial device design is presented at KVM Forum'16:
> > http://www.linux-kvm.org/images/5/55/02x07A-Wei_Wang-Design_of-
> Vhost-p
> > ci.pdf The latest device design and implementation will be posted to
> > the QEMU community soon.
> >
> > Vhostpci PMD works in pair with virtio-net PMD to achieve
> > point-to-point communication between VMs. DPDK already has
> > virtio/vhost user PMD pair to implement RX/TX packets between
> > guest/host scenario. However, for VM2VM use cases, Virtio PMD needs to
> > transmit pkts from VM1 to host OS firstly by vhost user port, then
> > transmit pkts to the 2nd VM by virtio PMD port again. Virtio/Vhostpci
> > PMD pair can implement shared memory to receive/trasmit packets
> > directly between two VMs. Currently, the entire memory of the virtio-net
> side VM is shared to the vhost-pci side VM, and mapped via device BAR2,
> and the first 4KB area of BAR2 is reserved to store the metadata.
> >
> > The vhostpci/virtio PMD working processing is the following:
> >
> > 1.VM1 startup with vhostpci device, bind the device to DPDK in the
> > guest1, launch the DPDK testpmd, then waiting for the remote memory
> > info (the VM2 shares memory, memory regions and vring info).
> >
> > 2.VM2 startup with virtio-net device, bind the virito-net to DPDK in
> > the VM2, run testpmd using virtio PMD.
> >
> > 3.vhostpci device negotiate virtio message with virtio-net device via
> > socket as vhost user/virtio-net do that.
> >
> > 4.Vhostpci device gets VM2's memory region and vring info and write
> > the metadata to VM2's shared memory.
> >
> > 5.When the metadata is ready to be read by the Vhostpci PMD, the PMD
> > will receive a config interrupt with LINK_UP set in the status config.
> >
> > 6.Vhostpci PMD and Virtio PMD can transmit/receive the packets.
> >
> > How to test?
> >
> > 1. launch VM1 with vhostpci device.
> > qemu/x86_64-softmmu/qemu-system-x86_64 -cpu host -M pc -enable-
> kvm \
> > -smp 16,threads=1,sockets=1 -m 8G -mem-prealloc -realtime mlock=on \
> > -object memory-backend-file,id=mem,size=8G,mem-
> path=/dev/hugepages, \
> > share=on -numa node,memdev=mem -drive
> > if=virtio,file=/root/vhost-pci/guest1.img,format=raw \ -kernel
> > /opt/guest_kernel -append 'root=/dev/vda1 ro default_hugepagesz=1G
> > hugepagesz=1G \
> > hugepages=2 console=ttyS0,115200,8n1 3' -netdev
> > tap,id=net1,br=br0,script=/etc/qemu-ifup \ -chardev
> > socket,id=slave1,server,wait=off, path=/opt/vhost-pci-slave1 -device
> > vhost-pci-net-pci, \
> > chardev=slave1 \
> > -nographic
> >
> > 2. bind vhostpci device to dpdk using igb_uio.
> > startup dpdk
> > ./x86_64-native-linuxapp-gcc/app/testpmd -c 0x3 -n 4 -- -i
> >
> > 3. launch VM2 with virtio-net device.
> >
> > qemu/x86_64-softmmu/qemu-system-x86_64 -cpu host -M pc -enable-
> kvm \
> > -smp 4,threads=1,sockets=1 -m 8G -mem-prealloc -realtime mlock=on \
> > -object
> > memory-backend-file,id=mem,size=8G,mem-
> path=/dev/hugepages,share=on \
> > -numa node,memdev=mem -drive
> > if=virtio,file=/root/vhost-pci/guest2.img,format=raw \ -net none
> > -no-hpet -kernel /opt/guest_kernel \ -append 'root=/dev/vda1 ro
> > default_hugepagesz=1G hugepagesz=1G hugepages=2
> > console=ttyS0,115200,8n1 3' \ -chardev
> > socket,id=sock2,path=/opt/vhost-pci-slave1 \ -netdev
> > type=vhost-user,id=net2,chardev=sock2,vhostforce \ -device
> > virtio-net-pci,mac=52:54:00:00:00:02,netdev=net2 \ -nographic
> >
> > 4.bind virtio-net to dpdk using igb_uio run dpdk
> >
> > ./x86_64-native-linuxapp-gcc/app/testpmd -c 0x3 -n 4 --socket-mem
> > 512,0 \
> > -- -i --rxq=1 --txq=1 --nb-cores=1
> >
> > 5. vhostpci PMD run "start"
> >
> > 6. virtio PMD side run "start tx_first"
> >
> > loopback testing can work.
> >
> > note:
> > 1. only support igb_uio for now.
> > 2. vhostpci device is a modern pci device. vhostpci PMD only supports
> > mergable mode. Virtio device side must be mergable mode.
> > 3. vhostpci PMD supports one queue pair for now.
> >
> > Zhiyong Yang (11):
> >    drivers/net: add vhostpci PMD base files
> >    net/vhostpci: public header files
> >    net/vhostpci: add debugging log macros
> >    net/vhostpci: add basic framework
> >    net/vhostpci: add queue setup
> >    net/vhostpci: add support for link status change
> >    net/vhostpci: get remote memory region and vring info
> >    net/vhostpci: add RX function
> >    net/vhostpci: add TX function
> >    net/vhostpci: support RX/TX packets statistics
> >    net/vhostpci: update release note
> >
> >   MAINTAINERS                                       |    6 +
> >   config/common_base                                |    9 +
> >   config/common_linuxapp                            |    1 +
> >   doc/guides/rel_notes/release_18_02.rst            |    6 +
> >   drivers/net/Makefile                              |    1 +
> >   drivers/net/vhostpci/Makefile                     |   54 +
> >   drivers/net/vhostpci/rte_pmd_vhostpci_version.map |    3 +
> >   drivers/net/vhostpci/vhostpci_ethdev.c            | 1521
> +++++++++++++++++++++
> >   drivers/net/vhostpci/vhostpci_ethdev.h            |  176 +++
> >   drivers/net/vhostpci/vhostpci_logs.h              |   69 +
> >   drivers/net/vhostpci/vhostpci_net.h               |   74 +
> >   drivers/net/vhostpci/vhostpci_pci.c               |  334 +++++
> >   drivers/net/vhostpci/vhostpci_pci.h               |  240 ++++
> >   mk/rte.app.mk                                     |    1 +
> >   14 files changed, 2495 insertions(+)
> >   create mode 100644 drivers/net/vhostpci/Makefile
> >   create mode 100644
> drivers/net/vhostpci/rte_pmd_vhostpci_version.map
> >   create mode 100644 drivers/net/vhostpci/vhostpci_ethdev.c
> >   create mode 100644 drivers/net/vhostpci/vhostpci_ethdev.h
> >   create mode 100644 drivers/net/vhostpci/vhostpci_logs.h
> >   create mode 100644 drivers/net/vhostpci/vhostpci_net.h
> >   create mode 100644 drivers/net/vhostpci/vhostpci_pci.c
> >   create mode 100644 drivers/net/vhostpci/vhostpci_pci.h
> >
> 
> Thanks for the RFC.
> It seems there is a lot of code duplication between this series and libvhost-
> user.
> 
> Does the non-RFC would make reuse of libvhost-user? I'm thinking of all the
> code copied from virtio-net.c for example.
> 
> If not, I think this is problematic as it will double the maintenance cost.
>

I'm trying to reuse the librte_vhost RX/TX logic and it seems feasible.
However, I would have to expose many internal librte_vhost data structures, such as virtio_net and vhost_virtqueue, to the PMD layer.
Since the vhostpci PMD drives a virtio PCI device (the vhostpci device) in the guest, memory allocation and release should be done in driver/net/vhostpci, as the virtio PMD does.
Vhostpci and vhost can share struct virtio_net to manage the different drivers, since they are very similar.
Features such as zero copy and RARP packet injection do not need to be supported for vhostpci, so we can always disable them.
What do you think about these thoughts?

thanks
Zhiyong
 
> Cheers,
> Maxime



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [dpdk-dev] [PATCH 00/11] net/vhostpci: A new vhostpci PMD supporting VM2VM scenario
  2018-01-11 11:13   ` Yang, Zhiyong
@ 2018-01-18  9:04     ` Maxime Coquelin
  2018-01-19  1:56       ` Yang, Zhiyong
  0 siblings, 1 reply; 26+ messages in thread
From: Maxime Coquelin @ 2018-01-18  9:04 UTC (permalink / raw)
  To: Yang, Zhiyong, dev, yliu; +Cc: Wang, Wei W, Tan, Jianfeng

Hi Zhiyong,

Sorry for the late reply, please find my comments inline:

On 01/11/2018 12:13 PM, Yang, Zhiyong wrote:
> Hi Maxime, all,
> 

...

>>> Zhiyong Yang (11):
>>>     drivers/net: add vhostpci PMD base files
>>>     net/vhostpci: public header files
>>>     net/vhostpci: add debugging log macros
>>>     net/vhostpci: add basic framework
>>>     net/vhostpci: add queue setup
>>>     net/vhostpci: add support for link status change
>>>     net/vhostpci: get remote memory region and vring info
>>>     net/vhostpci: add RX function
>>>     net/vhostpci: add TX function
>>>     net/vhostpci: support RX/TX packets statistics
>>>     net/vhostpci: update release note
>>>
>>>    MAINTAINERS                                       |    6 +
>>>    config/common_base                                |    9 +
>>>    config/common_linuxapp                            |    1 +
>>>    doc/guides/rel_notes/release_18_02.rst            |    6 +
>>>    drivers/net/Makefile                              |    1 +
>>>    drivers/net/vhostpci/Makefile                     |   54 +
>>>    drivers/net/vhostpci/rte_pmd_vhostpci_version.map |    3 +
>>>    drivers/net/vhostpci/vhostpci_ethdev.c            | 1521
>> +++++++++++++++++++++
>>>    drivers/net/vhostpci/vhostpci_ethdev.h            |  176 +++
>>>    drivers/net/vhostpci/vhostpci_logs.h              |   69 +
>>>    drivers/net/vhostpci/vhostpci_net.h               |   74 +
>>>    drivers/net/vhostpci/vhostpci_pci.c               |  334 +++++
>>>    drivers/net/vhostpci/vhostpci_pci.h               |  240 ++++
>>>    mk/rte.app.mk                                     |    1 +
>>>    14 files changed, 2495 insertions(+)
>>>    create mode 100644 drivers/net/vhostpci/Makefile
>>>    create mode 100644
>> drivers/net/vhostpci/rte_pmd_vhostpci_version.map
>>>    create mode 100644 drivers/net/vhostpci/vhostpci_ethdev.c
>>>    create mode 100644 drivers/net/vhostpci/vhostpci_ethdev.h
>>>    create mode 100644 drivers/net/vhostpci/vhostpci_logs.h
>>>    create mode 100644 drivers/net/vhostpci/vhostpci_net.h
>>>    create mode 100644 drivers/net/vhostpci/vhostpci_pci.c
>>>    create mode 100644 drivers/net/vhostpci/vhostpci_pci.h
>>>
>>
>> Thanks for the RFC.
>> It seems there is a lot of code duplication between this series and libvhost-
>> user.
>>
>> Does the non-RFC would make reuse of libvhost-user? I'm thinking of all the
>> code copied from virtio-net.c for example.
>>
>> If not, I think this is problematic as it will double the maintenance cost.
>>
> 
> I'm trying to reuse  librte_vhost RX/TX logic  and it seems feasible,
> However, I have to expose many internal data structures in librte_vhost such as virtio_net, vhost_virtqueue , etc to PMD layer.

I don't really like that; it looks like a layering violation.

> Since vhostpci PMD is using one virtio pci device (vhostpci device) in guest,    Memory allocation and release should be done in driver/net/vhostpci as virtio PMD does that.

If you are talking about mbuf alloc/release, then the vhost PMD also does that,
so I'm not sure I get the point.

> Vhostpci and vhost can share struct  virtio_net to manage the different drivers, since they are very similar.
> The features for example zero copy feature, make rarp packets don't need to be supported for vhostpci, we can always disable them.
> How do you think about the thoughts?

Why not put vhost-pci wrappers in virtio-net?
Maybe the TX/RX functions should be reworked to extract the common bits
between vhost-user and vhost-pci, taking care not to degrade
vhost-user performance.

I don't know whether this is feasible; what do you think?
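
To make that a bit more concrete, here is a rough sketch with entirely
hypothetical names and types (none of this exists in DPDK today): the
descriptor copy path is shared, and only the guest-address translation hook
differs between vhost-user, which resolves guest-physical addresses through
the table of mmap()ed memory regions, and vhost-pci, which would resolve them
through the remote memory exposed via BAR2. The 4KB metadata offset is an
assumption taken from the cover letter.

#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define METADATA_SIZE 4096	/* assumption: remote memory follows BAR2's metadata page */

struct guest_mem_region {
	uint64_t gpa_start;	/* guest-physical start address */
	uint64_t size;
	uintptr_t host_va;	/* vhost-user: mmap()ed host virtual address */
};

struct txrx_backend {
	/* resolve a guest-physical address of 'len' bytes, or NULL if out of range */
	void *(*gpa_to_va)(struct txrx_backend *be, uint64_t gpa, uint64_t len);
	struct guest_mem_region *regions;	/* vhost-user flavour */
	uint32_t nr_regions;
	uintptr_t bar2_va;			/* vhost-pci flavour: BAR2 mapping */
};

/* vhost-user flavour: look the address up in the mmap()ed region table */
static void *
user_gpa_to_va(struct txrx_backend *be, uint64_t gpa, uint64_t len)
{
	uint32_t i;

	for (i = 0; i < be->nr_regions; i++) {
		struct guest_mem_region *r = &be->regions[i];

		if (gpa >= r->gpa_start && gpa + len <= r->gpa_start + r->size)
			return (void *)(r->host_va + (gpa - r->gpa_start));
	}
	return NULL;
}

/* vhost-pci flavour: the peer's memory is visible through BAR2 */
static void *
pci_gpa_to_va(struct txrx_backend *be, uint64_t gpa, uint64_t len)
{
	(void)len;
	return (void *)(be->bar2_va + METADATA_SIZE + gpa);
}

/* a piece of the shared datapath: copy one descriptor's buffer out of the
 * guest, whichever backend is in use */
static int
copy_desc_to_local(struct txrx_backend *be, uint64_t desc_gpa,
		   uint32_t desc_len, void *dst)
{
	void *src = be->gpa_to_va(be, desc_gpa, desc_len);

	if (src == NULL)
		return -1;
	memcpy(dst, src, desc_len);
	return 0;
}

Whether the translation hook can stay this small without hurting the
vhost-user fast path is exactly the open question above.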

Thanks,
Maxime
> thanks
> Zhiyong
>   
>> Cheers,
>> Maxime
> 
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [dpdk-dev] [PATCH 00/11] net/vhostpci: A new vhostpci PMD supporting VM2VM scenario
  2018-01-18  9:04     ` Maxime Coquelin
@ 2018-01-19  1:56       ` Yang, Zhiyong
  0 siblings, 0 replies; 26+ messages in thread
From: Yang, Zhiyong @ 2018-01-19  1:56 UTC (permalink / raw)
  To: Maxime Coquelin, dev, yliu; +Cc: Wang, Wei W, Tan, Jianfeng

Hi Maxime,

> -----Original Message-----
> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
> Sent: Thursday, January 18, 2018 5:04 PM
> To: Yang, Zhiyong <zhiyong.yang@intel.com>; dev@dpdk.org;
> yliu@fridaylinux.org
> Cc: Wang, Wei W <wei.w.wang@intel.com>; Tan, Jianfeng
> <jianfeng.tan@intel.com>
> Subject: Re: [PATCH 00/11] net/vhostpci: A new vhostpci PMD supporting
> VM2VM scenario
> 
> Hi Zhiyong,
> 
> Sorry for the late reply, please find my comments inline:
> 
> On 01/11/2018 12:13 PM, Yang, Zhiyong wrote:
> > Hi Maxime, all,
> >
> 
> ...
> 
> >>> Zhiyong Yang (11):
> >>>     drivers/net: add vhostpci PMD base files
> >>>     net/vhostpci: public header files
> >>>     net/vhostpci: add debugging log macros
> >>>     net/vhostpci: add basic framework
> >>>     net/vhostpci: add queue setup
> >>>     net/vhostpci: add support for link status change
> >>>     net/vhostpci: get remote memory region and vring info
> >>>     net/vhostpci: add RX function
> >>>     net/vhostpci: add TX function
> >>>     net/vhostpci: support RX/TX packets statistics
> >>>     net/vhostpci: update release note
> >>>
> >>>    MAINTAINERS                                       |    6 +
> >>>    config/common_base                                |    9 +
> >>>    config/common_linuxapp                            |    1 +
> >>>    doc/guides/rel_notes/release_18_02.rst            |    6 +
> >>>    drivers/net/Makefile                              |    1 +
> >>>    drivers/net/vhostpci/Makefile                     |   54 +
> >>>    drivers/net/vhostpci/rte_pmd_vhostpci_version.map |    3 +
> >>>    drivers/net/vhostpci/vhostpci_ethdev.c            | 1521
> >> +++++++++++++++++++++
> >>>    drivers/net/vhostpci/vhostpci_ethdev.h            |  176 +++
> >>>    drivers/net/vhostpci/vhostpci_logs.h              |   69 +
> >>>    drivers/net/vhostpci/vhostpci_net.h               |   74 +
> >>>    drivers/net/vhostpci/vhostpci_pci.c               |  334 +++++
> >>>    drivers/net/vhostpci/vhostpci_pci.h               |  240 ++++
> >>>    mk/rte.app.mk                                     |    1 +
> >>>    14 files changed, 2495 insertions(+)
> >>>    create mode 100644 drivers/net/vhostpci/Makefile
> >>>    create mode 100644
> >> drivers/net/vhostpci/rte_pmd_vhostpci_version.map
> >>>    create mode 100644 drivers/net/vhostpci/vhostpci_ethdev.c
> >>>    create mode 100644 drivers/net/vhostpci/vhostpci_ethdev.h
> >>>    create mode 100644 drivers/net/vhostpci/vhostpci_logs.h
> >>>    create mode 100644 drivers/net/vhostpci/vhostpci_net.h
> >>>    create mode 100644 drivers/net/vhostpci/vhostpci_pci.c
> >>>    create mode 100644 drivers/net/vhostpci/vhostpci_pci.h
> >>>
> >>
> >> Thanks for the RFC.
> >> It seems there is a lot of code duplication between this series and
> >> libvhost- user.
> >>
> >> Does the non-RFC would make reuse of libvhost-user? I'm thinking of
> >> all the code copied from virtio-net.c for example.
> >>
> >> If not, I think this is problematic as it will double the maintenance cost.
> >>
> >
> > I'm trying to reuse  librte_vhost RX/TX logic  and it seems feasible,
> > However, I have to expose many internal data structures in librte_vhost
> such as virtio_net, vhost_virtqueue , etc to PMD layer.
> 
> I don't really like it, it looks like a layer violation.
> 
> > Since vhostpci PMD is using one virtio pci device (vhostpci device) in guest,
> Memory allocation and release should be done in driver/net/vhostpci as
> virtio PMD does that.
> 
> If you talk about mbuf alloc/release, then Vhost PMD also does it.
> So I'm not sure to get the point.
> 
> > Vhostpci and vhost can share struct  virtio_net to manage the different
> drivers, since they are very similar.
> > The features for example zero copy feature, make rarp packets don't need
> to be supported for vhostpci, we can always disable them.
> > How do you think about the thoughts?
> 
> Why not put vhost-pci wrappers in virtio-net?
> Maybe TX/RX functions should be reworked to extract the common bits
> between vhost-user and vhost-pci, taking care of not degrading performance
> of vhost-user.
> 
> I don't know if this is feasible, what do you think?
> 

Makes sense, thanks for your useful suggestions. :)

Thanks
Zhiyong

> Thanks,
> Maxime
> > thanks
> > Zhiyong
> >
> >> Cheers,
> >> Maxime
> >
> >

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2018-01-19  1:56 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-30  9:46 [dpdk-dev] [PATCH 00/11] net/vhostpci: A new vhostpci PMD supporting VM2VM scenario Zhiyong Yang
2017-11-30  9:46 ` [dpdk-dev] [PATCH 01/11] drivers/net: add vhostpci PMD base files Zhiyong Yang
2017-11-30  9:46 ` [dpdk-dev] [PATCH 02/11] net/vhostpci: public header files Zhiyong Yang
2017-11-30  9:46 ` [dpdk-dev] [PATCH 03/11] net/vhostpci: add debugging log macros Zhiyong Yang
2017-11-30  9:46 ` [dpdk-dev] [PATCH 04/11] net/vhostpci: add basic framework Zhiyong Yang
2017-11-30  9:46 ` [dpdk-dev] [PATCH 05/11] net/vhostpci: add queue setup Zhiyong Yang
2017-11-30  9:46 ` [dpdk-dev] [PATCH 06/11] net/vhostpci: add support for link status change Zhiyong Yang
2017-11-30  9:46 ` [dpdk-dev] [PATCH 07/11] net/vhostpci: get remote memory region and vring info Zhiyong Yang
2017-11-30  9:46 ` [dpdk-dev] [PATCH 08/11] net/vhostpci: add RX function Zhiyong Yang
2017-11-30  9:46 ` [dpdk-dev] [PATCH 09/11] net/vhostpci: add TX function Zhiyong Yang
2017-11-30  9:46 ` [dpdk-dev] [PATCH 10/11] net/vhostpci: support RX/TX packets statistics Zhiyong Yang
2017-11-30  9:46 ` [dpdk-dev] [PATCH 11/11] net/vhostpci: update release note Zhiyong Yang
2017-12-05  6:59 ` [dpdk-dev] [PATCH 00/11] net/vhostpci: A new vhostpci PMD supporting VM2VM scenario Yang, Zhiyong
2017-12-05 14:08   ` Yuanhan Liu
2017-12-06  3:00     ` Wei Wang
2017-12-07  6:07   ` Yang, Zhiyong
2017-12-19 11:14 ` Maxime Coquelin
2017-12-20  1:51   ` Yang, Zhiyong
2017-12-21  5:52     ` Tan, Jianfeng
2017-12-21  6:21       ` Yang, Zhiyong
2017-12-21  6:26         ` Yang, Zhiyong
2017-12-21  8:26           ` Maxime Coquelin
2017-12-21  8:40             ` Yang, Zhiyong
2018-01-11 11:13   ` Yang, Zhiyong
2018-01-18  9:04     ` Maxime Coquelin
2018-01-19  1:56       ` Yang, Zhiyong
