DPDK patches and discussions
 help / color / mirror / Atom feed
* [dpdk-dev] [PATCH 0/3] add Hyper-V bus and network driver
@ 2018-04-05 19:13 Stephen Hemminger
  2018-04-05 19:13 ` [dpdk-dev] [PATCH 1/3] bus/vmbus: add hyper-v virtual bus support Stephen Hemminger
                   ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Stephen Hemminger @ 2018-04-05 19:13 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This is an experimental driver originally developed by Vyatta/Brocade/ATT
to support DPDK on Hyper-V.  It is a native DPDK driver (unlike the TAP
solution) for VMBus. The driver relies on the version UIO for Hyper-V
(uio_hv_generic) which is in upstream kernel next tree (char-misc-next).

It is not yet a full replacement for the failsafe/tap/vdev_netvsc solution
since it does not support SR-IOV. The driver and bus interface are marked
experimental until it is ready to replace them.

Stephen Hemminger (3):
  bus/vmbus: add hyper-v virtual bus support
  usertools: add hv_uio_setup script
  net/netvsc: add hyper-v netvsc network device

 MAINTAINERS                                   |   10 +
 config/common_base                            |   13 +
 config/common_linuxapp                        |    4 +
 doc/guides/nics/index.rst                     |    1 +
 doc/guides/nics/netvsc.rst                    |   53 ++
 drivers/bus/Makefile                          |    1 +
 drivers/bus/vmbus/Makefile                    |   36 +
 drivers/bus/vmbus/linux/Makefile              |    3 +
 drivers/bus/vmbus/linux/vmbus_bus.c           |  354 +++++++
 drivers/bus/vmbus/linux/vmbus_uio.c           |  340 +++++++
 drivers/bus/vmbus/private.h                   |  125 +++
 drivers/bus/vmbus/rte_bus_vmbus.h             |  381 ++++++++
 drivers/bus/vmbus/rte_bus_vmbus_version.map   |   23 +
 drivers/bus/vmbus/rte_vmbus_reg.h             |  344 +++++++
 drivers/bus/vmbus/vmbus_bufring.c             |  242 +++++
 drivers/bus/vmbus/vmbus_channel.c             |  351 +++++++
 drivers/bus/vmbus/vmbus_common.c              |  287 ++++++
 drivers/bus/vmbus/vmbus_common_uio.c          |  232 +++++
 drivers/net/Makefile                          |    1 +
 drivers/net/netvsc/Makefile                   |   23 +
 drivers/net/netvsc/hn_ethdev.c                |  751 +++++++++++++++
 drivers/net/netvsc/hn_logs.h                  |   35 +
 drivers/net/netvsc/hn_nvs.c                   |  533 +++++++++++
 drivers/net/netvsc/hn_nvs.h                   |  243 +++++
 drivers/net/netvsc/hn_rndis.c                 | 1101 ++++++++++++++++++++++
 drivers/net/netvsc/hn_rndis.h                 |   26 +
 drivers/net/netvsc/hn_rxtx.c                  | 1224 +++++++++++++++++++++++++
 drivers/net/netvsc/hn_var.h                   |  140 +++
 drivers/net/netvsc/ndis.h                     |  378 ++++++++
 drivers/net/netvsc/rndis.h                    |  414 +++++++++
 drivers/net/netvsc/rte_pmd_netvsc_version.map |    5 +
 mk/rte.app.mk                                 |    2 +
 usertools/hv_uio_setup.sh                     |   40 +
 33 files changed, 7716 insertions(+)
 create mode 100644 doc/guides/nics/netvsc.rst
 create mode 100644 drivers/bus/vmbus/Makefile
 create mode 100644 drivers/bus/vmbus/linux/Makefile
 create mode 100644 drivers/bus/vmbus/linux/vmbus_bus.c
 create mode 100644 drivers/bus/vmbus/linux/vmbus_uio.c
 create mode 100644 drivers/bus/vmbus/private.h
 create mode 100644 drivers/bus/vmbus/rte_bus_vmbus.h
 create mode 100644 drivers/bus/vmbus/rte_bus_vmbus_version.map
 create mode 100644 drivers/bus/vmbus/rte_vmbus_reg.h
 create mode 100644 drivers/bus/vmbus/vmbus_bufring.c
 create mode 100644 drivers/bus/vmbus/vmbus_channel.c
 create mode 100644 drivers/bus/vmbus/vmbus_common.c
 create mode 100644 drivers/bus/vmbus/vmbus_common_uio.c
 create mode 100644 drivers/net/netvsc/Makefile
 create mode 100644 drivers/net/netvsc/hn_ethdev.c
 create mode 100644 drivers/net/netvsc/hn_logs.h
 create mode 100644 drivers/net/netvsc/hn_nvs.c
 create mode 100644 drivers/net/netvsc/hn_nvs.h
 create mode 100644 drivers/net/netvsc/hn_rndis.c
 create mode 100644 drivers/net/netvsc/hn_rndis.h
 create mode 100644 drivers/net/netvsc/hn_rxtx.c
 create mode 100644 drivers/net/netvsc/hn_var.h
 create mode 100644 drivers/net/netvsc/ndis.h
 create mode 100644 drivers/net/netvsc/rndis.h
 create mode 100644 drivers/net/netvsc/rte_pmd_netvsc_version.map
 create mode 100755 usertools/hv_uio_setup.sh

-- 
2.16.3

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [dpdk-dev] [PATCH 1/3] bus/vmbus: add hyper-v virtual bus support
  2018-04-05 19:13 [dpdk-dev] [PATCH 0/3] add Hyper-V bus and network driver Stephen Hemminger
@ 2018-04-05 19:13 ` Stephen Hemminger
  2018-04-05 19:13 ` [dpdk-dev] [PATCH 2/3] usertools: add hv_uio_setup script Stephen Hemminger
  2018-04-05 19:13 ` [dpdk-dev] [PATCH 3/3] net/netvsc: add hyper-v netvsc network device Stephen Hemminger
  2 siblings, 0 replies; 20+ messages in thread
From: Stephen Hemminger @ 2018-04-05 19:13 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Stephen Hemminger

From: Stephen Hemminger <stephen@networkplumber.org>

This patch adds support for an additional bus type Virtual Machine BUS
(VMBUS)is used on Microsoft Hyper-V in Windows 10, Windows Server 2016
and Azure.  Most of this code was extracted from FreeBSD and some of
this is from earlier code donated by Brocade.

Only Linux is supported at present, but the code is split
to allow future FreeBSD and Windows support.

This version supports multiple channels per device and requires
requires this bus support as well revised versio of uio_hv_generic
which is has been submitted to upstream Linux.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
---
 MAINTAINERS                                 |   3 +
 config/common_base                          |   5 +
 config/common_linuxapp                      |   4 +
 drivers/bus/Makefile                        |   1 +
 drivers/bus/vmbus/Makefile                  |  36 +++
 drivers/bus/vmbus/linux/Makefile            |   3 +
 drivers/bus/vmbus/linux/vmbus_bus.c         | 354 ++++++++++++++++++++++++++
 drivers/bus/vmbus/linux/vmbus_uio.c         | 340 +++++++++++++++++++++++++
 drivers/bus/vmbus/private.h                 | 125 +++++++++
 drivers/bus/vmbus/rte_bus_vmbus.h           | 381 ++++++++++++++++++++++++++++
 drivers/bus/vmbus/rte_bus_vmbus_version.map |  23 ++
 drivers/bus/vmbus/rte_vmbus_reg.h           | 344 +++++++++++++++++++++++++
 drivers/bus/vmbus/vmbus_bufring.c           | 242 ++++++++++++++++++
 drivers/bus/vmbus/vmbus_channel.c           | 351 +++++++++++++++++++++++++
 drivers/bus/vmbus/vmbus_common.c            | 287 +++++++++++++++++++++
 drivers/bus/vmbus/vmbus_common_uio.c        | 232 +++++++++++++++++
 mk/rte.app.mk                               |   1 +
 17 files changed, 2732 insertions(+)
 create mode 100644 drivers/bus/vmbus/Makefile
 create mode 100644 drivers/bus/vmbus/linux/Makefile
 create mode 100644 drivers/bus/vmbus/linux/vmbus_bus.c
 create mode 100644 drivers/bus/vmbus/linux/vmbus_uio.c
 create mode 100644 drivers/bus/vmbus/private.h
 create mode 100644 drivers/bus/vmbus/rte_bus_vmbus.h
 create mode 100644 drivers/bus/vmbus/rte_bus_vmbus_version.map
 create mode 100644 drivers/bus/vmbus/rte_vmbus_reg.h
 create mode 100644 drivers/bus/vmbus/vmbus_bufring.c
 create mode 100644 drivers/bus/vmbus/vmbus_channel.c
 create mode 100644 drivers/bus/vmbus/vmbus_common.c
 create mode 100644 drivers/bus/vmbus/vmbus_common_uio.c

diff --git a/MAINTAINERS b/MAINTAINERS
index d4c0cc1bc78e..4b72bf4b09dd 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -342,6 +342,9 @@ VDEV bus driver
 M: Jianfeng Tan <jianfeng.tan@intel.com>
 F: drivers/bus/vdev/
 
+VMBUS bus driver
+M: Stephen Hemminger <sthemmin@microsoft.com>
+F: drivers/bus/vmbus/
 
 Networking Drivers
 ------------------
diff --git a/config/common_base b/config/common_base
index 7abf7c6fcfea..fa3b80fe69c4 100644
--- a/config/common_base
+++ b/config/common_base
@@ -386,6 +386,11 @@ CONFIG_RTE_LIBRTE_PMD_FAILSAFE=y
 CONFIG_RTE_LIBRTE_MVPP2_PMD=n
 
 #
+# Compile support for VMBus library
+#
+CONFIG_RTE_LIBRTE_VMBUS=n
+
+
 # Compile virtual device driver for NetVSC on Hyper-V/Azure
 #
 CONFIG_RTE_LIBRTE_VDEV_NETVSC_PMD=n
diff --git a/config/common_linuxapp b/config/common_linuxapp
index d0437e5d6aeb..30f24d0362c5 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -37,3 +37,7 @@ CONFIG_RTE_LIBRTE_DPAA2_MEMPOOL=y
 CONFIG_RTE_LIBRTE_DPAA2_PMD=y
 CONFIG_RTE_LIBRTE_PMD_DPAA2_EVENTDEV=y
 CONFIG_RTE_LIBRTE_PMD_DPAA2_SEC=y
+
+# Hyper-V Virtual Machine bus and drivers
+CONFIG_RTE_LIBRTE_VMBUS=y
+
diff --git a/drivers/bus/Makefile b/drivers/bus/Makefile
index c251b65ad368..6fe35139fa0b 100644
--- a/drivers/bus/Makefile
+++ b/drivers/bus/Makefile
@@ -9,5 +9,6 @@ DIRS-$(CONFIG_RTE_LIBRTE_FSLMC_BUS) += fslmc
 endif
 DIRS-$(CONFIG_RTE_LIBRTE_PCI_BUS) += pci
 DIRS-$(CONFIG_RTE_LIBRTE_VDEV_BUS) += vdev
+DIRS-$(CONFIG_RTE_LIBRTE_VMBUS) += vmbus
 
 include $(RTE_SDK)/mk/rte.subdir.mk
diff --git a/drivers/bus/vmbus/Makefile b/drivers/bus/vmbus/Makefile
new file mode 100644
index 000000000000..c4ca1129c7ea
--- /dev/null
+++ b/drivers/bus/vmbus/Makefile
@@ -0,0 +1,36 @@
+# SPDX-License-Identifier: BSD-3-Clause
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+LIB = librte_bus_vmbus.a
+LIBABIVER := 1
+EXPORT_MAP := rte_bus_vmbus_version.map
+
+CFLAGS += -I$(SRCDIR)
+CFLAGS += -O3 $(WERROR_FLAGS)
+CFLAGS += -DALLOW_EXPERIMENTAL_API
+
+ifneq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),)
+SYSTEM := linux
+endif
+ifneq ($(CONFIG_RTE_EXEC_ENV_BSDAPP),)
+$(error "VMBUS not implemented for BSD yet")
+endif
+
+CFLAGS += -I$(RTE_SDK)/drivers/bus/vmbus/$(SYSTEM)
+CFLAGS += -I$(RTE_SDK)/lib/librte_eal/common
+CFLAGS += -I$(RTE_SDK)/lib/librte_eal/$(SYSTEM)app/eal
+
+LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring
+LDLIBS += -lrte_ethdev -lrte_vmbus -luuid
+
+include $(RTE_SDK)/drivers/bus/vmbus/$(SYSTEM)/Makefile
+SRCS-$(CONFIG_RTE_LIBRTE_VMBUS) := $(addprefix $(SYSTEM)/,$(SRCS))
+SRCS-$(CONFIG_RTE_LIBRTE_VMBUS) += vmbus_common.c
+SRCS-$(CONFIG_RTE_LIBRTE_VMBUS) += vmbus_channel.c vmbus_bufring.c
+SRCS-$(CONFIG_RTE_LIBRTE_VMBUS) += vmbus_common_uio.c
+
+SYMLINK-$(CONFIG_RTE_LIBRTE_VMBUS)-include += rte_bus_vmbus.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_VMBUS)-include += rte_vmbus_reg.h
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/bus/vmbus/linux/Makefile b/drivers/bus/vmbus/linux/Makefile
new file mode 100644
index 000000000000..ef0d30b2d3aa
--- /dev/null
+++ b/drivers/bus/vmbus/linux/Makefile
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: BSD-3-Clause
+
+SRCS += vmbus_bus.c vmbus_uio.c
diff --git a/drivers/bus/vmbus/linux/vmbus_bus.c b/drivers/bus/vmbus/linux/vmbus_bus.c
new file mode 100644
index 000000000000..073263df5251
--- /dev/null
+++ b/drivers/bus/vmbus/linux/vmbus_bus.c
@@ -0,0 +1,354 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2018, Microsoft Corporation.
+ * All Rights Reserved.
+ */
+
+#include <string.h>
+#include <unistd.h>
+#include <dirent.h>
+#include <fcntl.h>
+#include <sys/mman.h>
+#include <sys/stat.h>
+#include <uuid/uuid.h>
+
+#include <rte_eal.h>
+#include <rte_tailq.h>
+#include <rte_log.h>
+#include <rte_devargs.h>
+#include <rte_memory.h>
+#include <rte_malloc.h>
+#include <rte_bus_vmbus.h>
+
+#include "eal_filesystem.h"
+#include "private.h"
+
+/** Pathname of VMBUS devices directory. */
+#define SYSFS_VMBUS_DEVICES "/sys/bus/vmbus/devices"
+
+extern struct rte_vmbus_bus rte_vmbus_bus;
+
+/* Read sysfs file to get UUID */
+static int
+parse_sysfs_uuid(const char *filename, uuid_t uu)
+{
+	char buf[BUFSIZ];
+	char *cp, *in = buf;
+	FILE *f;
+
+	f = fopen(filename, "r");
+	if (f == NULL) {
+		VMBUS_LOG(ERR, "%s(): cannot open sysfs value %s",
+				__func__, filename);
+		return -1;
+	}
+
+	if (fgets(buf, sizeof(buf), f) == NULL) {
+		VMBUS_LOG(ERR, "%s(): cannot read sysfs value %s",
+				__func__, filename);
+		fclose(f);
+		return -1;
+	}
+	fclose(f);
+
+	cp = strchr(buf, '\n');
+	if (cp)
+		*cp = '\0';
+
+	/* strip { } notation */
+	if (buf[0] == '{') {
+		in = buf + 1;
+		cp = strchr(in, '}');
+		if (cp)
+			*cp = '\0';
+	}
+
+	if (uuid_parse(in, uu) < 0) {
+		VMBUS_LOG(ERR, "%s %s not a valid UUID",
+			filename, buf);
+		return -1;
+	}
+
+	return 0;
+}
+
+static int
+get_sysfs_string(const char *filename, char *buf, size_t buflen)
+{
+	char *cp;
+	FILE *f;
+
+	f = fopen(filename, "r");
+	if (f == NULL) {
+		VMBUS_LOG(ERR, "%s(): cannot open sysfs value %s",
+				__func__, filename);
+		return -1;
+	}
+
+	if (fgets(buf, buflen, f) == NULL) {
+		VMBUS_LOG(ERR, "%s(): cannot read sysfs value %s",
+				__func__, filename);
+		fclose(f);
+		return -1;
+	}
+	fclose(f);
+
+	/* remove trailing newline */
+	cp = memchr(buf, '\n', buflen);
+	if (cp)
+		*cp = '\0';
+
+	return 0;
+}
+
+static int
+vmbus_get_uio_dev(const struct rte_vmbus_device *dev,
+		  char *dstbuf, size_t buflen)
+{
+	char dirname[PATH_MAX];
+	unsigned int uio_num;
+	struct dirent *e;
+	DIR *dir;
+
+	/* Assume recent kernel where uio is in uio/uioX */
+	snprintf(dirname, sizeof(dirname),
+		 SYSFS_VMBUS_DEVICES "/%s/uio", dev->device.name);
+
+	dir = opendir(dirname);
+	if (dir == NULL)
+		return -1; /* Not a UIO device */
+
+	/* take the first file starting with "uio" */
+	while ((e = readdir(dir)) != NULL) {
+		const int prefix_len = 3;
+		char *endptr;
+
+		if (strncmp(e->d_name, "uio", prefix_len) != 0)
+			continue;
+
+		/* try uio%d */
+		errno = 0;
+		uio_num = strtoull(e->d_name + prefix_len, &endptr, 10);
+		if (errno == 0 && endptr != (e->d_name + prefix_len)) {
+			snprintf(dstbuf, buflen, "%s/uio%u", dirname, uio_num);
+			break;
+		}
+	}
+	closedir(dir);
+
+	if (e == NULL)
+		return -1;
+
+	return uio_num;
+}
+
+/* Check map names with kernel names */
+static const char *map_names[VMBUS_MAX_RESOURCE] = {
+	[HV_TXRX_RING_MAP] = "txrx_rings",
+	[HV_INT_PAGE_MAP]  = "int_page",
+	[HV_MON_PAGE_MAP]  = "monitor_page",
+	[HV_RECV_BUF_MAP]  = "recv:",
+	[HV_SEND_BUF_MAP]  = "send:",
+};
+
+
+/* map the resources of a vmbus device in virtual memory */
+int
+rte_vmbus_map_device(struct rte_vmbus_device *dev)
+{
+	char uioname[PATH_MAX], filename[PATH_MAX];
+	char dirname[PATH_MAX], mapname[64];
+	int i;
+
+	dev->uio_num = vmbus_get_uio_dev(dev, uioname, sizeof(uioname));
+	if (dev->uio_num < 0) {
+		VMBUS_LOG(DEBUG, "Not managed by UIO driver, skipped");
+		return 1;
+	}
+
+	/* Extract resource value */
+	for (i = 0; i < VMBUS_MAX_RESOURCE; i++) {
+		struct rte_mem_resource *res = &dev->resource[i];
+		unsigned long gpad = 0;
+		char *cp;
+
+		snprintf(dirname, sizeof(dirname),
+			 "%s/maps/map%d", uioname, i);
+
+		snprintf(filename, sizeof(filename),
+			 "%s/name", dirname);
+
+		if (get_sysfs_string(filename, mapname, sizeof(mapname)) < 0) {
+			VMBUS_LOG(ERR, "could not read %s", filename);
+			return -1;
+		}
+
+		if (strncmp(map_names[i], mapname, strlen(map_names[i])) != 0) {
+			VMBUS_LOG(ERR,
+				"unexpected resource %s (expected %s)",
+				mapname, map_names[i]);
+			return -1;
+		}
+
+		snprintf(filename, sizeof(filename),
+			 "%s/size", dirname);
+		if (eal_parse_sysfs_value(filename, &res->len) < 0) {
+			VMBUS_LOG(ERR,
+				"could not read %s", filename);
+			return -1;
+		}
+
+		/* both send and receive buffers have gpad in name */
+		cp = memchr(mapname, ':', sizeof(mapname));
+		if (cp)
+			gpad = strtoul(cp+1, NULL, 0);
+
+		/* put the GPAD value in physical address */
+		res->phys_addr = gpad;
+	}
+
+	return vmbus_uio_map_resource(dev);
+}
+
+void
+rte_vmbus_unmap_device(struct rte_vmbus_device *dev)
+{
+	vmbus_uio_unmap_resource(dev);
+}
+
+/* Scan one vmbus sysfs entry, and fill the devices list from it. */
+static int
+vmbus_scan_one(const char *name)
+{
+	struct rte_vmbus_device *dev, *dev2;
+	char filename[PATH_MAX];
+	char dirname[PATH_MAX];
+	unsigned long tmp;
+
+	dev = calloc(1, sizeof(*dev));
+	if (dev == NULL)
+		return -1;
+
+	dev->device.name = strdup(name);
+	if (!dev->device.name)
+		goto error;
+
+	/* sysfs base directory
+	 *   /sys/bus/vmbus/devices/7a08391f-f5a0-4ac0-9802-d13fd964f8df
+	 * or on older kernel
+	 *   /sys/bus/vmbus/devices/vmbus_1
+	 */
+	snprintf(dirname, sizeof(dirname), "%s/%s",
+		 SYSFS_VMBUS_DEVICES, name);
+
+	/* get device id */
+	snprintf(filename, sizeof(filename), "%s/device_id", dirname);
+	if (parse_sysfs_uuid(filename, dev->device_id) < 0)
+		goto error;
+
+	/* get device class  */
+	snprintf(filename, sizeof(filename), "%s/class_id", dirname);
+	if (parse_sysfs_uuid(filename, dev->class_id) < 0)
+		goto error;
+
+	/* get relid */
+	snprintf(filename, sizeof(filename), "%s/id", dirname);
+	if (eal_parse_sysfs_value(filename, &tmp) < 0)
+		goto error;
+	dev->relid = tmp;
+
+	/* get monitor id */
+	snprintf(filename, sizeof(filename), "%s/monitor_id", dirname);
+	if (eal_parse_sysfs_value(filename, &tmp) < 0)
+		goto error;
+	dev->monitor_id = tmp;
+
+	/* get numa node (if present) */
+	snprintf(filename, sizeof(filename), "%s/numa_node",
+		 dirname);
+
+	if (access(filename, R_OK) == 0) {
+		if (eal_parse_sysfs_value(filename, &tmp) < 0)
+			goto error;
+		dev->device.numa_node = tmp;
+	} else {
+		/* if no NUMA support, set default to 0 */
+		dev->device.numa_node = SOCKET_ID_ANY;
+	}
+
+	/* device is valid, add in list (sorted) */
+	VMBUS_LOG(DEBUG, "Adding vmbus device %s", name);
+
+	TAILQ_FOREACH(dev2, &rte_vmbus_bus.device_list, next) {
+		int ret;
+
+		ret = uuid_compare(dev->device_id, dev2->device_id);
+		if (ret > 0)
+			continue;
+
+		if (ret < 0) {
+			vmbus_insert_device(dev2, dev);
+		} else { /* already registered */
+			VMBUS_LOG(NOTICE,
+				"%s already registered", name);
+			free(dev);
+		}
+		return 0;
+	}
+
+	vmbus_add_device(dev);
+	return 0;
+error:
+	VMBUS_LOG(DEBUG, "failed");
+
+	free(dev);
+	return -1;
+}
+
+/*
+ * Scan the content of the vmbus, and the devices in the devices list
+ */
+int
+rte_vmbus_scan(void)
+{
+	struct dirent *e;
+	DIR *dir;
+
+	dir = opendir(SYSFS_VMBUS_DEVICES);
+	if (dir == NULL) {
+		if (errno == ENOENT)
+			return 0;
+
+		VMBUS_LOG(ERR, "%s(): opendir failed: %s",
+			__func__, strerror(errno));
+		return -1;
+	}
+
+	while ((e = readdir(dir)) != NULL) {
+		if (e->d_name[0] == '.')
+			continue;
+
+		if (vmbus_scan_one(e->d_name) < 0)
+			goto error;
+	}
+	closedir(dir);
+	return 0;
+
+error:
+	closedir(dir);
+	return -1;
+}
+
+void rte_vmbus_irq_mask(struct rte_vmbus_device *device)
+{
+	vmbus_uio_irq_control(device, 1);
+}
+
+void rte_vmbus_irq_unmask(struct rte_vmbus_device *device)
+{
+	vmbus_uio_irq_control(device, 0);
+}
+
+int rte_vmbus_irq_read(struct rte_vmbus_device *device)
+{
+	return vmbus_uio_irq_read(device);
+}
diff --git a/drivers/bus/vmbus/linux/vmbus_uio.c b/drivers/bus/vmbus/linux/vmbus_uio.c
new file mode 100644
index 000000000000..88735c0f7dde
--- /dev/null
+++ b/drivers/bus/vmbus/linux/vmbus_uio.c
@@ -0,0 +1,340 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2018, Microsoft Corporation.
+ * All Rights Reserved.
+ */
+
+#include <string.h>
+#include <unistd.h>
+#include <fcntl.h>
+#include <dirent.h>
+#include <inttypes.h>
+#include <sys/stat.h>
+#include <sys/mman.h>
+
+#include <rte_log.h>
+#include <rte_bus.h>
+#include <rte_memory.h>
+#include <rte_eal_memconfig.h>
+#include <rte_common.h>
+#include <rte_malloc.h>
+#include <rte_bus_vmbus.h>
+
+#include "private.h"
+
+/** Pathname of VMBUS devices directory. */
+#define SYSFS_VMBUS_DEVICES "/sys/bus/vmbus/devices"
+
+static void *vmbus_map_addr;
+
+/* Control interrupts */
+void vmbus_uio_irq_control(struct rte_vmbus_device *dev, int32_t onoff)
+{
+	if (write(dev->intr_handle.fd, &onoff, sizeof(onoff)) < 0) {
+		VMBUS_LOG(ERR, "%s(): cannot write to %d:%s",
+			__func__, dev->intr_handle.fd, strerror(errno));
+	}
+}
+
+int vmbus_uio_irq_read(struct rte_vmbus_device *dev)
+{
+	int32_t count;
+
+	if (read(dev->intr_handle.fd, &count, sizeof(count)) < 0) {
+		VMBUS_LOG(ERR, "%s(): cannot read to %d:%s",
+			__func__, dev->intr_handle.fd, strerror(errno));
+		count = -errno;
+	}
+
+	return count;
+}
+
+void
+vmbus_uio_free_resource(struct rte_vmbus_device *dev,
+		struct mapped_vmbus_resource *uio_res)
+{
+	rte_free(uio_res);
+
+	if (dev->intr_handle.uio_cfg_fd >= 0) {
+		close(dev->intr_handle.uio_cfg_fd);
+		dev->intr_handle.uio_cfg_fd = -1;
+	}
+
+	if (dev->intr_handle.fd >= 0) {
+		close(dev->intr_handle.fd);
+		dev->intr_handle.fd = -1;
+		dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
+	}
+}
+
+int
+vmbus_uio_alloc_resource(struct rte_vmbus_device *dev,
+			 struct mapped_vmbus_resource **uio_res)
+{
+	char devname[PATH_MAX]; /* contains the /dev/uioX */
+
+	/* save fd if in primary process */
+	snprintf(devname, sizeof(devname), "/dev/uio%u", dev->uio_num);
+	dev->intr_handle.fd = open(devname, O_RDWR);
+	if (dev->intr_handle.fd < 0) {
+		VMBUS_LOG(ERR, "Cannot open %s: %s",
+			devname, strerror(errno));
+		goto error;
+	}
+	dev->intr_handle.type = RTE_INTR_HANDLE_UIO_INTX;
+
+	/* allocate the mapping details for secondary processes*/
+	*uio_res = rte_zmalloc("UIO_RES", sizeof(**uio_res), 0);
+	if (*uio_res == NULL) {
+		VMBUS_LOG(ERR,
+			"%s(): cannot store uio mmap details", __func__);
+		goto error;
+	}
+
+	strncpy((*uio_res)->path, devname, PATH_MAX);
+	uuid_copy((*uio_res)->id, dev->device_id);
+
+	return 0;
+
+error:
+	vmbus_uio_free_resource(dev, *uio_res);
+	return -1;
+}
+
+static void *
+vmbus_find_max_end_va(void)
+{
+	const struct rte_memseg *seg = rte_eal_get_physmem_layout();
+	const struct rte_memseg *last = seg;
+	unsigned int i = 0;
+
+	for (i = 0; i < RTE_MAX_MEMSEG; i++, seg++) {
+		if (seg->addr == NULL)
+			break;
+
+		if (seg->addr > last->addr)
+			last = seg;
+
+	}
+	return RTE_PTR_ADD(last->addr, last->len);
+}
+
+int
+vmbus_uio_map_resource_by_index(struct rte_vmbus_device *dev, int idx,
+				struct mapped_vmbus_resource *uio_res,
+				int flags)
+{
+	uint64_t size = dev->resource[idx].len;
+	struct vmbus_map *maps = uio_res->maps;
+	void *mapaddr;
+	off_t offset;
+	int fd;
+
+	/* devname for mmap  */
+	fd = open(uio_res->path, O_RDWR);
+	if (fd < 0) {
+		VMBUS_LOG(ERR, "Cannot open %s: %s",
+			  uio_res->path, strerror(errno));
+		return -1;
+	}
+
+	/* try mapping somewhere close to the end of hugepages */
+	if (vmbus_map_addr == NULL)
+		vmbus_map_addr = vmbus_find_max_end_va();
+
+	/* offset is special in uio it indicates which resource */
+	offset = idx * PAGE_SIZE;
+
+	mapaddr = vmbus_map_resource(vmbus_map_addr, fd, offset, size, flags);
+	close(fd);
+
+	if (mapaddr == MAP_FAILED)
+		return -1;
+
+	dev->resource[idx].addr = mapaddr;
+	vmbus_map_addr = RTE_PTR_ADD(mapaddr, size);
+
+	/* Record result of sucessful mapping for use by secondary */
+	maps[idx].addr = mapaddr;
+	maps[idx].size = size;
+
+	return 0;
+}
+
+int vmbus_uio_map_rings(struct vmbus_channel *chan)
+{
+	const struct rte_vmbus_device *dev = chan->device;
+	uint32_t ring_size, *ring_buf;
+
+	/* Primary channel */
+	if (chan->subchannel_id == 0) {
+		struct mapped_vmbus_resource *uio_res;
+
+		uio_res = vmbus_uio_find_resource(chan->device);
+		if (!uio_res) {
+			VMBUS_LOG(ERR, "can not find resources!");
+			return -ENOMEM;
+		}
+
+		if (uio_res->nb_maps < VMBUS_MAX_RESOURCE) {
+			VMBUS_LOG(ERR, "VMBUS: only %u resources found!",
+				  uio_res->nb_maps);
+			return -EINVAL;
+		}
+
+		ring_size = uio_res->maps[HV_TXRX_RING_MAP].size / 2;
+		ring_buf  = uio_res->maps[HV_TXRX_RING_MAP].addr;
+	} else {
+		char ring_path[PATH_MAX];
+		struct stat sb;
+		int fd;
+
+		snprintf(ring_path, sizeof(ring_path),
+			 "%s/%s/channels/%u/ring_buffer",
+			 SYSFS_VMBUS_DEVICES, dev->device.name,
+			 chan->relid);
+
+		fd = open(ring_path, O_RDWR);
+
+		if (fd < 0) {
+			VMBUS_LOG(ERR, "Cannot open %s: %s",
+				  ring_path, strerror(errno));
+			return -errno;
+		}
+
+		if (fstat(fd, &sb) < 0) {
+			VMBUS_LOG(ERR, "Cannot state %s: %s",
+				  ring_path, strerror(errno));
+			close(fd);
+			return -errno;
+		}
+
+		ring_size = sb.st_size / 2;
+		ring_buf = vmbus_map_resource(vmbus_map_addr, fd,
+					      0, sb.st_size, 0);
+		close(fd);
+
+		if (ring_buf == MAP_FAILED)
+			return -EIO;
+
+		vmbus_map_addr = RTE_PTR_ADD(ring_buf, sb.st_size);
+	}
+
+	vmbus_br_setup(&chan->txbr, ring_buf, ring_size);
+	vmbus_br_setup(&chan->rxbr, (char *)ring_buf + ring_size, ring_size);
+	return 0;
+}
+
+static int vmbus_uio_sysfs_read(const char *dir, const char *name,
+				unsigned long *val, unsigned long max_range)
+{
+	char path[PATH_MAX];
+	FILE *f;
+	int ret;
+
+	snprintf(path, sizeof(path), "%s/%s", dir, name);
+	f = fopen(path, "r");
+	if (!f) {
+		VMBUS_LOG(ERR, "%s(): can't open %s:%s",
+			  __func__, path, strerror(errno));
+		return -errno;
+	}
+
+	if (fscanf(f, "%lu", val) != 1)
+		ret = -EIO;
+	else if (*val > max_range)
+		ret = -ERANGE;
+	else
+		ret = 0;
+	fclose(f);
+
+	return ret;
+}
+
+static bool vmbus_isnew_subchannel(struct vmbus_channel *primary,
+				   unsigned long id)
+{
+	const struct vmbus_channel *c;
+
+	STAILQ_FOREACH(c, &primary->subchannel_list, next) {
+		if (c->relid == id)
+			return false;
+	}
+	return true;
+}
+
+
+int vmbus_uio_get_subchan(struct vmbus_channel *primary,
+			  struct vmbus_channel **subchan)
+{
+	const struct rte_vmbus_device *dev = primary->device;
+	char chan_path[PATH_MAX], subchan_path[PATH_MAX];
+	struct dirent *ent;
+	DIR *chan_dir;
+
+	snprintf(chan_path, sizeof(chan_path),
+		 "%s/%s/channels",
+		 SYSFS_VMBUS_DEVICES, dev->device.name);
+
+	chan_dir = opendir(chan_path);
+	if (!chan_dir) {
+		VMBUS_LOG(ERR, "%s(): cannot open %s: %s",
+			  __func__, chan_path, strerror(errno));
+		return -errno;
+	}
+
+	while ((ent = readdir(chan_dir))) {
+		unsigned long relid, subid, monid;
+		char *endp;
+		int err;
+
+		if (ent->d_name[0] == '.')
+			continue;
+
+		errno = 0;
+		relid = strtoul(ent->d_name, &endp, 0);
+		if (*endp || errno != 0 || relid > UINT16_MAX) {
+			VMBUS_LOG(NOTICE, "%s(): not a valid channel relid: %s",
+				  __func__, ent->d_name);
+			continue;
+		}
+
+		snprintf(subchan_path, sizeof(subchan_path), "%s/%lu",
+			 chan_path, relid);
+		err = vmbus_uio_sysfs_read(subchan_path, "subchannel_id",
+					   &subid, UINT16_MAX);
+		if (err) {
+			VMBUS_LOG(NOTICE, "%s(): invalid subchannel id %lu",
+				  __func__, subid);
+			return err;
+		}
+
+		if (subid == 0)
+			continue;	/* skip primary channel */
+
+		err = vmbus_uio_sysfs_read(subchan_path, "monitor_id",
+					   &monid, UINT8_MAX);
+		if (err) {
+			VMBUS_LOG(NOTICE, "%s(): invalid monitor id %lu",
+				  __func__, monid);
+			return err;
+		}
+
+		if (!vmbus_isnew_subchannel(primary, relid))
+			continue;
+
+		err = rte_vmbus_chan_open(dev, subchan);
+		if (err)
+			return err;
+
+		(*subchan)->relid = relid;
+		(*subchan)->subchannel_id = subid;
+		(*subchan)->monitor_id = monid;
+		VMBUS_LOG(DEBUG, "%s(): new sub channel %lu",
+			  __func__, relid);
+		break;
+
+	}
+	closedir(chan_dir);
+
+	return (ent == NULL) ? -ENOENT : 0;
+}
diff --git a/drivers/bus/vmbus/private.h b/drivers/bus/vmbus/private.h
new file mode 100644
index 000000000000..1dba5f800939
--- /dev/null
+++ b/drivers/bus/vmbus/private.h
@@ -0,0 +1,125 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2018, Microsoft Corporation.
+ * All Rights Reserved.
+ */
+
+#ifndef _VMBUS_PRIVATE_H_
+#define _VMBUS_PRIVATE_H_
+
+#include <stdbool.h>
+#include <sys/uio.h>
+#include <rte_log.h>
+#include <rte_vmbus_reg.h>
+
+#ifndef PAGE_SIZE
+#define PAGE_SIZE	4096
+#endif
+
+extern int vmbus_logtype_bus;
+#define VMBUS_LOG(level, fmt, args...) \
+	rte_log(RTE_LOG_ ## level, vmbus_logtype_bus, "%s(): " fmt "\n", \
+		__func__, ##args)
+
+struct vmbus_br {
+	struct vmbus_bufring *vbr;
+	uint32_t	dsize;
+	uint32_t	windex; /* next available location */
+};
+
+#define UIO_NAME_MAX 64
+
+struct vmbus_map {
+	void *addr;	/* user mmap of resource */
+	uint64_t size;	/* length */
+};
+
+/*
+ * For multi-process we need to reproduce all vmbus mappings in secondary
+ * processes, so save them in a tailq.
+ */
+struct mapped_vmbus_resource {
+	TAILQ_ENTRY(mapped_vmbus_resource) next;
+
+	uuid_t id;
+	int nb_maps;
+	struct vmbus_map maps[VMBUS_MAX_RESOURCE];
+	char path[PATH_MAX];
+};
+
+TAILQ_HEAD(mapped_vmbus_res_list, mapped_vmbus_resource);
+
+#define HV_MON_TRIG_LEN	32
+#define HV_MON_TRIG_MAX	4
+
+struct vmbus_channel {
+	STAILQ_HEAD(, vmbus_channel) subchannel_list;
+	STAILQ_ENTRY(vmbus_channel) next;
+	const struct rte_vmbus_device *device;
+
+	struct vmbus_br rxbr;
+	struct vmbus_br txbr;
+
+	uint16_t relid;
+	uint16_t subchannel_id;
+	uint8_t monitor_id;
+};
+
+
+void vmbus_add_device(struct rte_vmbus_device *vmbus_dev);
+void vmbus_insert_device(struct rte_vmbus_device *exist_vmbus_dev,
+			 struct rte_vmbus_device *new_vmbus_dev);
+void vmbus_remove_device(struct rte_vmbus_device *vmbus_device);
+
+void vmbus_uio_irq_control(struct rte_vmbus_device *dev, int32_t onoff);
+int vmbus_uio_irq_read(struct rte_vmbus_device *dev);
+
+int vmbus_uio_map_resource(struct rte_vmbus_device *dev);
+void vmbus_uio_unmap_resource(struct rte_vmbus_device *dev);
+
+int vmbus_uio_alloc_resource(struct rte_vmbus_device *dev,
+		struct mapped_vmbus_resource **uio_res);
+void vmbus_uio_free_resource(struct rte_vmbus_device *dev,
+		struct mapped_vmbus_resource *uio_res);
+
+struct mapped_vmbus_resource *
+vmbus_uio_find_resource(const struct rte_vmbus_device *dev);
+int vmbus_uio_map_resource_by_index(struct rte_vmbus_device *dev, int res_idx,
+				    struct mapped_vmbus_resource *uio_res,
+				    int flags);
+
+void *vmbus_map_resource(void *requested_addr, int fd, off_t offset,
+		size_t size, int additional_flags);
+void vmbus_unmap_resource(void *requested_addr, size_t size);
+
+int vmbus_uio_get_subchan(struct vmbus_channel *primary,
+			  struct vmbus_channel **subchan);
+int vmbus_uio_map_rings(struct vmbus_channel *chan);
+
+void vmbus_br_setup(struct vmbus_br *br, void *buf, unsigned int blen);
+
+/* Amount of space available for write */
+static inline uint32_t
+vmbus_br_availwrite(const struct vmbus_br *br, uint32_t windex)
+{
+	uint32_t rindex = br->vbr->rindex;
+
+	if (windex >= rindex)
+		return br->dsize - (windex - rindex);
+	else
+		return rindex - windex;
+}
+
+static inline uint32_t
+vmbus_br_availread(const struct vmbus_br *br)
+{
+	return br->dsize - vmbus_br_availwrite(br, br->vbr->windex);
+}
+
+int vmbus_txbr_write(struct vmbus_br *tbr, const struct iovec iov[], int iovlen,
+		     bool *need_sig);
+
+int vmbus_rxbr_peek(struct vmbus_br *rbr, void *data, size_t dlen);
+
+int vmbus_rxbr_read(struct vmbus_br *rbr, void *data, size_t dlen, size_t hlen);
+
+#endif /* _VMBUS_PRIVATE_H_ */
diff --git a/drivers/bus/vmbus/rte_bus_vmbus.h b/drivers/bus/vmbus/rte_bus_vmbus.h
new file mode 100644
index 000000000000..c743f253c182
--- /dev/null
+++ b/drivers/bus/vmbus/rte_bus_vmbus.h
@@ -0,0 +1,381 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2018, Microsoft Corporation.
+ * All Rights Reserved.
+ */
+
+#ifndef _VMBUS_H_
+#define _VMBUS_H_
+
+/**
+ * @file
+ *
+ * VMBUS Interface
+ */
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <limits.h>
+#include <stdbool.h>
+#include <errno.h>
+#include <uuid/uuid.h>
+#include <sys/queue.h>
+#include <stdint.h>
+#include <inttypes.h>
+
+#include <rte_compat.h>
+#include <rte_debug.h>
+#include <rte_interrupts.h>
+#include <rte_dev.h>
+#include <rte_vmbus_reg.h>
+
+/* Forward declarations */
+struct rte_vmbus_device;
+struct rte_vmbus_driver;
+struct rte_vmbus_bus;
+struct vmbus_channel;
+struct vmbus_mon_page;
+
+TAILQ_HEAD(rte_vmbus_device_list, rte_vmbus_device);
+TAILQ_HEAD(rte_vmbus_driver_list, rte_vmbus_driver);
+
+/* VMBus iterators */
+#define FOREACH_DEVICE_ON_VMBUS(p)	\
+	TAILQ_FOREACH(p, &(rte_vmbus_bus.device_list), next)
+
+#define FOREACH_DRIVER_ON_VMBUS(p)	\
+	TAILQ_FOREACH(p, &(rte_vmbus_bus.driver_list), next)
+
+#define UUID_BUF_SZ	(36 + 1)
+
+/** Maximum number of VMBUS resources. */
+enum hv_uio_map {
+	HV_TXRX_RING_MAP = 0,
+	HV_INT_PAGE_MAP,
+	HV_MON_PAGE_MAP,
+	HV_RECV_BUF_MAP,
+	HV_SEND_BUF_MAP
+};
+#define VMBUS_MAX_RESOURCE 5
+
+/**
+ * A structure describing a VMBUS device.
+ */
+struct rte_vmbus_device {
+	TAILQ_ENTRY(rte_vmbus_device) next;    /**< Next probed VMBUS device */
+	const struct rte_vmbus_driver *driver; /**< Associated driver */
+	struct rte_device device;              /**< Inherit core device */
+	uuid_t device_id;		       /**< VMBUS device id */
+	uuid_t class_id;		       /**< VMBUS device type */
+	uint32_t relid;			       /**< id for primary */
+	uint8_t monitor_id;		       /**< monitor page */
+	int uio_num;			       /**< UIO device number */
+	uint32_t *int_page;		       /**< VMBUS interrupt page */
+	struct vmbus_mon_page *monitor_page;   /**< VMBUS monitor page */
+
+	struct rte_intr_handle intr_handle;    /**< Interrupt handle */
+	struct rte_mem_resource resource[VMBUS_MAX_RESOURCE];
+};
+
+/**
+ * Initialization function for the driver called during VMBUS probing.
+ */
+typedef int (vmbus_probe_t)(struct rte_vmbus_driver *,
+			    struct rte_vmbus_device *);
+
+/**
+ * Initialization function for the driver called during hot plugging.
+ */
+typedef int (vmbus_remove_t)(struct rte_vmbus_device *);
+
+/**
+ * A structure describing a VMBUS driver.
+ */
+struct rte_vmbus_driver {
+	TAILQ_ENTRY(rte_vmbus_driver) next; /**< Next in list. */
+	struct rte_driver driver;
+	struct rte_vmbus_bus *bus;          /**< VM bus reference. */
+	vmbus_probe_t *probe;               /**< Device Probe function. */
+	vmbus_remove_t *remove;             /**< Device Remove function. */
+
+	const uuid_t *id_table;		    /**< ID table. */
+};
+
+
+/**
+ * Structure describing the VM bus
+ */
+struct rte_vmbus_bus {
+	struct rte_bus bus;               /**< Inherit the generic class */
+	struct rte_vmbus_device_list device_list;  /**< List of devices */
+	struct rte_vmbus_driver_list driver_list;  /**< List of drivers */
+};
+
+/**
+ * Scan the content of the VMBUS bus, and the devices in the devices
+ * list
+ *
+ * @return
+ *  0 on success, negative on error
+ */
+int __rte_experimental rte_vmbus_scan(void);
+
+/**
+ * Probe the VMBUS bus
+ *
+ * @return
+ *   - 0 on success.
+ *   - !0 on error.
+ */
+int __rte_experimental rte_vmbus_probe(void);
+
+/**
+ * Map the VMBUS device resources in user space virtual memory address
+ *
+ * @param dev
+ *   A pointer to a rte_vmbus_device structure describing the device
+ *   to use
+ *
+ * @return
+ *   0 on success, negative on error and positive if no driver
+ *   is found for the device.
+ */
+int __rte_experimental rte_vmbus_map_device(struct rte_vmbus_device *dev);
+
+/**
+ * Unmap this device
+ *
+ * @param dev
+ *   A pointer to a rte_vmbus_device structure describing the device
+ *   to use
+ */
+void __rte_experimental rte_vmbus_unmap_device(struct rte_vmbus_device *dev);
+
+/**
+ * Get connection to primary VMBUS channel
+ *
+ * @param device
+ *   A pointer to a rte_vmbus_device structure describing the device
+ *   to use
+ * @parsm chan
+ *   A pointer to a VMBUS channel pointer that will be filled.
+ * @retrun
+ *   - 0 Success; channel opened.
+ *   - -ENOMEM: Not enough memory available.
+ *   - -EINVAL: Regions could not be mapped.
+ */
+int __rte_experimental rte_vmbus_chan_open(const struct rte_vmbus_device *device,
+			struct vmbus_channel **chan);
+
+/**
+ * Free connection to VMBUS channel
+ *
+ * @param chan
+ *    VMBUS channel
+ */
+void __rte_experimental rte_vmbus_chan_close(struct vmbus_channel *chan);
+
+/**
+ * Get a connection to new secondary vmbus channel
+ *
+ * @param primary
+ *   A pointer to primary VMBUS channel
+ * @parsm chan
+ *   A pointer to a secondary VMBUS channel pointer that will be filled.
+ * @retrun
+ *   - 0 Success; channel opened.
+ *   - -ENOMEM: Not enough memory available.
+ *   - -EINVAL: Regions could not be mapped.
+ */
+int __rte_experimental rte_vmbus_subchan_open(struct vmbus_channel *primary,
+			   struct vmbus_channel **new_chan);
+
+/**
+ * Disable IRQ for device
+ *
+ * @param device
+ *    VMBUS device
+ */
+void __rte_experimental rte_vmbus_irq_mask(struct rte_vmbus_device *device);
+
+/**
+ * Enable IRQ for device
+ *
+ * @param device
+ *    VMBUS device
+ */
+void __rte_experimental rte_vmbus_irq_unmask(struct rte_vmbus_device *device);
+
+/**
+ * Read (and wait) for IRQ
+ *
+ * @param device
+ *    VMBUS device
+ */
+int __rte_experimental rte_vmbus_irq_read(struct rte_vmbus_device *device);
+
+/**
+ * Test if channel is empty
+ *
+ * @param channel
+ *	Pointer to vmbus_channel structure.
+ * @return
+ *	Return true if no data present in incoming ring.
+ */
+bool __rte_experimental rte_vmbus_chan_rx_empty(const struct vmbus_channel *channel);
+
+/**
+ * Send the specified buffer on the given channel
+ *
+ * @param channel
+ *	Pointer to vmbus_channel structure.
+ * @param type
+ *	Type of packet that is being send e.g. negotiate, time
+ *	packet etc.
+ * @param data
+ *	Pointer to the buffer to send
+ * @param dlen
+ *	Number of bytes of data to send
+ * @param xact
+ *	Identifier of the request
+ * @param flags
+ *	Message type inband, rxbuf, gpa
+ * @param need_sig
+ *	Is host signal tx is required (optional)
+ *
+ * Sends data in buffer directly to hyper-v via the vmbus
+ */
+int __rte_experimental rte_vmbus_chan_send(struct vmbus_channel *channel, uint16_t type,
+			void *data, uint32_t dlen,
+			uint64_t xact, uint32_t flags, bool *need_sig);
+
+/**
+ * Explicitly signal host that data is available
+ *
+ * @param
+ *	Pointer to vmbus_channel structure.
+ *
+ * Used when batching multiple sends and only signaling host
+ * after the last send.
+ */
+void __rte_experimental rte_vmbus_chan_signal_tx(const struct vmbus_channel *channel);
+
+/* Structure for scatter/gather I/O */
+struct iova_list {
+	rte_iova_t	addr;
+	uint32_t	len;
+};
+#define MAX_PAGE_BUFFER_COUNT		32
+
+/**
+ * Send a scattered buffer on the given channel
+ *
+ * @param channel
+ *	Pointer to vmbus_channel structure.
+ * @param type
+ *	Type of packet that is being send e.g. negotiate, time
+ *	packet etc.
+ * @param gpa
+ *	Array of buffers to send
+ * @param gpacnt
+ *	Number of elements in iov
+ * @param data
+ *	Pointer to the buffer additional data to send
+ * @param dlen
+ *	 Maximum size of what the the buffer will hold
+ * @param xact
+ *	Identifier of the request
+ * @param flags
+ *	Message type inband, rxbuf, gpa
+ * @param need_sig
+ *	Is host signal tx is required (optional)
+ *
+ * Sends data in buffer directly to hyper-v via the vmbus
+ */
+int __rte_experimental rte_vmbus_chan_send_sglist(struct vmbus_channel *channel,
+			       struct vmbus_gpa gpa[], uint32_t gpacnt,
+			       void *data, uint32_t dlen,
+			       uint64_t xact, bool *need_sig);
+/**
+ * Receive response to request on the given channel
+ * skips the channel header.
+ *
+ * @param channel
+ *	Pointer to vmbus_channel structure.
+ * @param data
+ *	Pointer to the buffer you want to receive the data into.
+ * @param len
+ *	Pointer to size of receive buffer (in/out)
+ * @param
+ *	Pointer to received transaction_id
+ * @return
+ *   On success, returns 0
+ *   On failure, returns negative errno.
+ */
+int __rte_experimental rte_vmbus_chan_recv(struct vmbus_channel *chan,
+					   void *data, uint32_t *len,
+					   uint64_t *request_id);
+
+/**
+ * Receive response to request on the given channel
+ * includes the channel header.
+ *
+ * @param channel
+ *	Pointer to vmbus_channel structure.
+ * @param data
+ *	Pointer to the buffer you want to receive the data into.
+ * @param len
+ *	Pointer to size of receive buffer (in/out)
+ * @return
+ *   On success, returns 0
+ *   On failure, returns negative errno.
+ */
+int __rte_experimental rte_vmbus_chan_recv_raw(struct vmbus_channel *chan,
+					       void *data, uint32_t *len);
+
+
+/**
+ * Determine sub channel index of the given channel
+ *
+ * @param channel
+ *	Pointer to vmbus_channel structure.
+ * @return
+ *   Sub channel index (0 for primary)
+ */
+uint16_t __rte_experimental rte_vmbus_sub_channel_index(const struct vmbus_channel *chan);
+
+/**
+ * Register a VMBUS driver.
+ *
+ * @param driver
+ *   A pointer to a rte_vmbus_driver structure describing the driver
+ *   to be registered.
+ */
+void __rte_experimental rte_vmbus_register(struct rte_vmbus_driver *driver);
+
+/**
+ * Unregister a VMBUS driver.
+ *
+ * @param driver
+ *   A pointer to a rte_vmbus_driver structure describing the driver
+ *   to be unregistered.
+ */
+void __rte_experimental rte_vmbus_unregister(struct rte_vmbus_driver *driver);
+
+/** Helper for VMBUS device registration from driver instance */
+#define RTE_PMD_REGISTER_VMBUS(nm, vmbus_drv)		\
+	RTE_INIT(vmbusinitfn_ ##nm);			\
+	static void vmbusinitfn_ ##nm(void)		\
+	{						\
+		(vmbus_drv).driver.name = RTE_STR(nm);	\
+		rte_vmbus_register(&vmbus_drv);		\
+	}						\
+	RTE_PMD_EXPORT_NAME(nm, __COUNTER__)
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _VMBUS_H_ */
diff --git a/drivers/bus/vmbus/rte_bus_vmbus_version.map b/drivers/bus/vmbus/rte_bus_vmbus_version.map
new file mode 100644
index 000000000000..604deac04173
--- /dev/null
+++ b/drivers/bus/vmbus/rte_bus_vmbus_version.map
@@ -0,0 +1,23 @@
+/* SPDX-License-Identifier: BSD-3-Clause */
+DPDK_18.02 {
+	global:
+
+	rte_vmbus_chan_close;
+	rte_vmbus_chan_open;
+	rte_vmbus_chan_recv;
+	rte_vmbus_chan_recv_raw;
+	rte_vmbus_chan_send;
+	rte_vmbus_chan_send_sglist;
+	rte_vmbus_irq_mask;
+	rte_vmbus_irq_read;
+	rte_vmbus_irq_unmask;
+	rte_vmbus_map_device;
+	rte_vmbus_probe;
+	rte_vmbus_probe_one;
+	rte_vmbus_register;
+	rte_vmbus_scan;
+	rte_vmbus_unmap_device;
+	rte_vmbus_unregister;
+
+	local: *;
+};
diff --git a/drivers/bus/vmbus/rte_vmbus_reg.h b/drivers/bus/vmbus/rte_vmbus_reg.h
new file mode 100644
index 000000000000..1fd7216f1565
--- /dev/null
+++ b/drivers/bus/vmbus/rte_vmbus_reg.h
@@ -0,0 +1,344 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2018, Microsoft Corporation.
+ * All Rights Reserved.
+ */
+
+#ifndef _VMBUS_REG_H_
+#define _VMBUS_REG_H_
+
+/*
+ * Hyper-V SynIC message format.
+ */
+#define VMBUS_MSG_DSIZE_MAX		240
+#define VMBUS_MSG_SIZE			256
+
+struct vmbus_message {
+	uint32_t	type;	/* HYPERV_MSGTYPE_ */
+	uint8_t		dsize;	/* data size */
+	uint8_t		flags;	/* VMBUS_MSGFLAG_ */
+	uint16_t	rsvd;
+	uint64_t	id;
+	uint8_t		data[VMBUS_MSG_DSIZE_MAX];
+} __rte_packed;
+
+#define VMBUS_MSGFLAG_PENDING		0x01
+
+/*
+ * Hyper-V Monitor Notification Facility
+ */
+
+struct vmbus_mon_trig {
+	uint32_t	pending;
+	uint32_t	armed;
+} __rte_packed;
+
+#define VMBUS_MONTRIGS_MAX	4
+#define VMBUS_MONTRIG_LEN	32
+
+/*
+ * Hyper-V Monitor Notification Facility
+ */
+struct hyperv_mon_param {
+	uint32_t	connid;
+	uint16_t	evtflag_ofs;
+	uint16_t	rsvd;
+} __rte_packed;
+
+struct vmbus_mon_page {
+	uint32_t	state;
+	uint32_t	rsvd1;
+
+	struct vmbus_mon_trig trigs[VMBUS_MONTRIGS_MAX];
+	uint8_t		rsvd2[536];
+
+	uint16_t	lat[VMBUS_MONTRIGS_MAX][VMBUS_MONTRIG_LEN];
+	uint8_t		rsvd3[256];
+
+	struct hyperv_mon_param
+			param[VMBUS_MONTRIGS_MAX][VMBUS_MONTRIG_LEN];
+	uint8_t		rsvd4[1984];
+} __rte_packed;
+
+/*
+ * Buffer ring
+ */
+
+struct vmbus_bufring {
+	volatile uint32_t windex;
+	volatile uint32_t rindex;
+
+	/*
+	 * Interrupt mask {0,1}
+	 *
+	 * For TX bufring, host set this to 1, when it is processing
+	 * the TX bufring, so that we can safely skip the TX event
+	 * notification to host.
+	 *
+	 * For RX bufring, once this is set to 1 by us, host will not
+	 * further dispatch interrupts to us, even if there are data
+	 * pending on the RX bufring.  This effectively disables the
+	 * interrupt of the channel to which this RX bufring is attached.
+	 */
+	volatile uint32_t imask;
+
+	/*
+	 * Win8 uses some of the reserved bits to implement
+	 * interrupt driven flow management. On the send side
+	 * we can request that the receiver interrupt the sender
+	 * when the ring transitions from being full to being able
+	 * to handle a message of size "pending_send_sz".
+	 *
+	 * Add necessary state for this enhancement.
+	 */
+	volatile uint32_t pending_send;
+	uint32_t reserved1[12];
+
+	union {
+		struct {
+			uint32_t feat_pending_send_sz:1;
+		};
+		uint32_t value;
+	} feature_bits;
+
+	/* Pad it to PAGE_SIZE so that data starts on page boundary */
+	uint8_t	reserved2[4028];
+
+	/*
+	 * Ring data starts here + RingDataStartOffset
+	 * !!! DO NOT place any fields below this !!!
+	 */
+	uint8_t data[0];
+} __rte_packed;
+
+/*
+ * Channel packets
+ */
+
+/* Channel packet flags */
+#define VMBUS_CHANPKT_TYPE_INBAND      0x0006
+#define VMBUS_CHANPKT_TYPE_RXBUF       0x0007
+#define VMBUS_CHANPKT_TYPE_GPA         0x0009
+#define VMBUS_CHANPKT_TYPE_COMP        0x000b
+
+#define VMBUS_CHANPKT_FLAG_NONE        0
+#define VMBUS_CHANPKT_FLAG_RC          0x0001  /* report completion */
+
+#define VMBUS_CHANPKT_SIZE_SHIFT	3
+#define VMBUS_CHANPKT_SIZE_ALIGN	(1 << VMBUS_CHANPKT_SIZE_SHIFT)
+#define VMBUS_CHANPKT_HLEN_MIN		\
+	(sizeof(struct vmbus_chanpkt_hdr) >> VMBUS_CHANPKT_SIZE_SHIFT)
+
+static inline uint32_t
+vmbus_chanpkt_getlen(uint16_t pktlen)
+{
+	return (uint32_t)pktlen << VMBUS_CHANPKT_SIZE_SHIFT;
+}
+
+/*
+ * GPA stuffs.
+ */
+struct vmbus_gpa_range {
+	uint32_t       len;
+	uint32_t       ofs;
+	uint64_t       page[0];
+} __rte_packed;
+
+/* This is actually vmbus_gpa_range.gpa_page[1] */
+struct vmbus_gpa {
+	uint32_t	len;
+	uint32_t	ofs;
+	uint64_t	page;
+} __rte_packed;
+
+struct vmbus_chanpkt_hdr {
+	uint16_t	type;	/* VMBUS_CHANPKT_TYPE_ */
+	uint16_t	hlen;	/* header len, in 8 bytes */
+	uint16_t	tlen;	/* total len, in 8 bytes */
+	uint16_t	flags;	/* VMBUS_CHANPKT_FLAG_ */
+	uint64_t	xactid;
+} __rte_packed;
+
+static inline uint32_t
+vmbus_chanpkt_datalen(const struct vmbus_chanpkt_hdr *pkt)
+{
+	return vmbus_chanpkt_getlen(pkt->tlen)
+		- vmbus_chanpkt_getlen(pkt->hlen);
+}
+
+struct vmbus_chanpkt {
+	struct vmbus_chanpkt_hdr hdr;
+} __rte_packed;
+
+struct vmbus_rxbuf_desc {
+	uint32_t	len;
+	uint32_t	ofs;
+} __rte_packed;
+
+struct vmbus_chanpkt_rxbuf {
+	struct vmbus_chanpkt_hdr hdr;
+	uint16_t	rxbuf_id;
+	uint16_t	rsvd;
+	uint32_t	rxbuf_cnt;
+	struct vmbus_rxbuf_desc rxbuf[];
+} __rte_packed;
+
+struct vmbus_chanpkt_sglist {
+	struct vmbus_chanpkt_hdr hdr;
+	uint32_t	rsvd;
+	uint32_t	gpa_cnt;
+	struct vmbus_gpa gpa[];
+} __rte_packed;
+
+/*
+ * Channel messages
+ * - Embedded in vmbus_message.msg_data, e.g. response and notification.
+ * - Embedded in hypercall_postmsg_in.hc_data, e.g. request.
+ */
+
+#define VMBUS_CHANMSG_TYPE_CHOFFER		1	/* NOTE */
+#define VMBUS_CHANMSG_TYPE_CHRESCIND		2	/* NOTE */
+#define VMBUS_CHANMSG_TYPE_CHREQUEST		3	/* REQ */
+#define VMBUS_CHANMSG_TYPE_CHOFFER_DONE		4	/* NOTE */
+#define VMBUS_CHANMSG_TYPE_CHOPEN		5	/* REQ */
+#define VMBUS_CHANMSG_TYPE_CHOPEN_RESP		6	/* RESP */
+#define VMBUS_CHANMSG_TYPE_CHCLOSE		7	/* REQ */
+#define VMBUS_CHANMSG_TYPE_GPADL_CONN		8	/* REQ */
+#define VMBUS_CHANMSG_TYPE_GPADL_SUBCONN	9	/* REQ */
+#define VMBUS_CHANMSG_TYPE_GPADL_CONNRESP	10	/* RESP */
+#define VMBUS_CHANMSG_TYPE_GPADL_DISCONN	11	/* REQ */
+#define VMBUS_CHANMSG_TYPE_GPADL_DISCONNRESP	12	/* RESP */
+#define VMBUS_CHANMSG_TYPE_CHFREE		13	/* REQ */
+#define VMBUS_CHANMSG_TYPE_CONNECT		14	/* REQ */
+#define VMBUS_CHANMSG_TYPE_CONNECT_RESP		15	/* RESP */
+#define VMBUS_CHANMSG_TYPE_DISCONNECT		16	/* REQ */
+#define VMBUS_CHANMSG_TYPE_MAX			22
+
+struct vmbus_chanmsg_hdr {
+	uint32_t	type;	/* VMBUS_CHANMSG_TYPE_ */
+	uint32_t	rsvd;
+} __rte_packed;
+
+/* VMBUS_CHANMSG_TYPE_CONNECT */
+struct vmbus_chanmsg_connect {
+	struct vmbus_chanmsg_hdr hdr;
+	uint32_t	ver;
+	uint32_t	rsvd;
+	uint64_t	evtflags;
+	uint64_t	mnf1;
+	uint64_t	mnf2;
+} __rte_packed;
+
+/* VMBUS_CHANMSG_TYPE_CONNECT_RESP */
+struct vmbus_chanmsg_connect_resp {
+	struct vmbus_chanmsg_hdr hdr;
+	uint8_t		done;
+} __rte_packed;
+
+/* VMBUS_CHANMSG_TYPE_CHREQUEST */
+struct vmbus_chanmsg_chrequest {
+	struct vmbus_chanmsg_hdr hdr;
+} __rte_packed;
+
+/* VMBUS_CHANMSG_TYPE_DISCONNECT */
+struct vmbus_chanmsg_disconnect {
+	struct vmbus_chanmsg_hdr hdr;
+} __rte_packed;
+
+/* VMBUS_CHANMSG_TYPE_CHOPEN */
+struct vmbus_chanmsg_chopen {
+	struct vmbus_chanmsg_hdr hdr;
+	uint32_t	chanid;
+	uint32_t	openid;
+	uint32_t	gpadl;
+	uint32_t	vcpuid;
+	uint32_t	txbr_pgcnt;
+#define VMBUS_CHANMSG_CHOPEN_UDATA_SIZE	120
+	uint8_t		udata[VMBUS_CHANMSG_CHOPEN_UDATA_SIZE];
+} __rte_packed;
+
+/* VMBUS_CHANMSG_TYPE_CHOPEN_RESP */
+struct vmbus_chanmsg_chopen_resp {
+	struct vmbus_chanmsg_hdr hdr;
+	uint32_t	chanid;
+	uint32_t	openid;
+	uint32_t	status;
+} __rte_packed;
+
+/* VMBUS_CHANMSG_TYPE_GPADL_CONN */
+struct vmbus_chanmsg_gpadl_conn {
+	struct vmbus_chanmsg_hdr hdr;
+	uint32_t	chanid;
+	uint32_t	gpadl;
+	uint16_t	range_len;
+	uint16_t	range_cnt;
+	struct vmbus_gpa_range range;
+} __rte_packed;
+
+#define VMBUS_CHANMSG_GPADL_CONN_PGMAX		26
+
+/* VMBUS_CHANMSG_TYPE_GPADL_SUBCONN */
+struct vmbus_chanmsg_gpadl_subconn {
+	struct vmbus_chanmsg_hdr hdr;
+	uint32_t	msgno;
+	uint32_t	gpadl;
+	uint64_t	gpa_page[];
+} __rte_packed;
+
+#define VMBUS_CHANMSG_GPADL_SUBCONN_PGMAX	28
+
+/* VMBUS_CHANMSG_TYPE_GPADL_CONNRESP */
+struct vmbus_chanmsg_gpadl_connresp {
+	struct vmbus_chanmsg_hdr hdr;
+	uint32_t	chanid;
+	uint32_t	gpadl;
+	uint32_t	status;
+} __rte_packed;
+
+/* VMBUS_CHANMSG_TYPE_CHCLOSE */
+struct vmbus_chanmsg_chclose {
+	struct vmbus_chanmsg_hdr hdr;
+	uint32_t	chanid;
+} __rte_packed;
+
+/* VMBUS_CHANMSG_TYPE_GPADL_DISCONN */
+struct vmbus_chanmsg_gpadl_disconn {
+	struct vmbus_chanmsg_hdr hdr;
+	uint32_t	chanid;
+	uint32_t	gpadl;
+} __rte_packed;
+
+/* VMBUS_CHANMSG_TYPE_CHFREE */
+struct vmbus_chanmsg_chfree {
+	struct vmbus_chanmsg_hdr hdr;
+	uint32_t	chanid;
+} __rte_packed;
+
+/* VMBUS_CHANMSG_TYPE_CHRESCIND */
+struct vmbus_chanmsg_chrescind {
+	struct vmbus_chanmsg_hdr hdr;
+	uint32_t	chanid;
+} __rte_packed;
+
+/* VMBUS_CHANMSG_TYPE_CHOFFER */
+struct vmbus_chanmsg_choffer {
+	struct vmbus_chanmsg_hdr hdr;
+	uuid_t		chtype;
+	uuid_t		chinst;
+	uint64_t	chlat;	/* unit: 100ns */
+	uint32_t	chrev;
+	uint32_t	svrctx_sz;
+	uint16_t	chflags;
+	uint16_t	mmio_sz;	/* unit: MB */
+	uint8_t		udata[120];
+	uint16_t	subidx;
+	uint16_t	rsvd;
+	uint32_t	chanid;
+	uint8_t		montrig;
+	uint8_t		flags1;	/* VMBUS_CHOFFER_FLAG1_ */
+	uint16_t	flags2;
+	uint32_t	connid;
+} __rte_packed;
+
+#define VMBUS_CHOFFER_FLAG1_HASMNF	0x01
+
+#endif	/* !_VMBUS_REG_H_ */
diff --git a/drivers/bus/vmbus/vmbus_bufring.c b/drivers/bus/vmbus/vmbus_bufring.c
new file mode 100644
index 000000000000..d4b1f734ff72
--- /dev/null
+++ b/drivers/bus/vmbus/vmbus_bufring.c
@@ -0,0 +1,242 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2009-2012,2016 Microsoft Corp.
+ * Copyright (c) 2012 NetApp Inc.
+ * Copyright (c) 2012 Citrix Inc.
+ * All rights reserved.
+ */
+
+#include <unistd.h>
+#include <stdint.h>
+#include <stdbool.h>
+#include <string.h>
+#include <sys/uio.h>
+
+#include <rte_eal.h>
+#include <rte_tailq.h>
+#include <rte_log.h>
+#include <rte_malloc.h>
+#include <rte_bus.h>
+#include <rte_atomic.h>
+#include <rte_memory.h>
+#include <rte_hexdump.h>
+#include <rte_pause.h>
+#include <rte_bus_vmbus.h>
+
+#include "private.h"
+
+/* Increase bufring index by inc with wraparound */
+static inline uint32_t vmbus_br_idxinc(uint32_t idx, uint32_t inc, uint32_t sz)
+{
+	idx += inc;
+	if (idx >= sz)
+		idx -= sz;
+
+	return idx;
+}
+
+void vmbus_br_setup(struct vmbus_br *br, void *buf, unsigned int blen)
+{
+	br->vbr = buf;
+	br->windex = br->vbr->windex;
+	br->dsize = blen - sizeof(struct vmbus_bufring);
+}
+
+/*
+ * When we write to the ring buffer, check if the host needs to be
+ * signaled.
+ *
+ * The contract:
+ * - The host guarantees that while it is draining the TX bufring,
+ *   it will set the br_imask to indicate it does not need to be
+ *   interrupted when new data are added.
+ * - The host guarantees that it will completely drain the TX bufring
+ *   before exiting the read loop.  Further, once the TX bufring is
+ *   empty, it will clear the br_imask and re-check to see if new
+ *   data have arrived.
+ */
+static inline bool
+vmbus_txbr_need_signal(const struct vmbus_br *tbr, uint32_t old_windex)
+{
+	rte_smp_mb();
+	if (tbr->vbr->imask)
+		return false;
+
+	rte_smp_rmb();
+
+	/*
+	 * This is the only case we need to signal when the
+	 * ring transitions from being empty to non-empty.
+	 */
+	return old_windex == tbr->vbr->rindex;
+}
+
+static inline uint32_t
+vmbus_txbr_copyto(const struct vmbus_br *tbr, uint32_t windex,
+		  const void *src0, uint32_t cplen)
+{
+	uint8_t *br_data = tbr->vbr->data;
+	uint32_t br_dsize = tbr->dsize;
+	const uint8_t *src = src0;
+
+	/* XXX use double mapping like Linux kernel? */
+	if (cplen > br_dsize - windex) {
+		uint32_t fraglen = br_dsize - windex;
+
+		/* Wrap-around detected */
+		memcpy(br_data + windex, src, fraglen);
+		memcpy(br_data, src + fraglen, cplen - fraglen);
+	} else {
+		memcpy(br_data + windex, src, cplen);
+	}
+
+	return vmbus_br_idxinc(windex, cplen, br_dsize);
+}
+
+/*
+ * Write scattered channel packet to TX bufring.
+ *
+ * The offset of this channel packet is written as a 64bits value
+ * immediately after this channel packet.
+ *
+ * The write goes through three stages:
+ *  1. Reserve space in ring buffer for the new data.
+ *     Writer atomically moves priv_write_index.
+ *  2. Copy the new data into the ring.
+ *  3. Update the tail of the ring (visible to host) that indicates
+ *     next read location. Writer updates write_index
+ */
+int
+vmbus_txbr_write(struct vmbus_br *tbr, const struct iovec iov[], int iovlen,
+		 bool *need_sig)
+{
+	struct vmbus_bufring *vbr = tbr->vbr;
+	uint32_t ring_size = tbr->dsize;
+	uint32_t old_windex, next_windex, windex, total;
+	uint64_t save_windex;
+	int i;
+
+	total = 0;
+	for (i = 0; i < iovlen; i++)
+		total += iov[i].iov_len;
+	total += sizeof(save_windex);
+
+	/* Reserve space in ring */
+	do {
+		uint32_t avail;
+
+		/* Get current free location */
+		old_windex = tbr->windex;
+
+		/* Prevent compiler reordering this with calculation */
+		rte_compiler_barrier();
+
+		avail = vmbus_br_availwrite(tbr, old_windex);
+
+		/* If not enough space in ring, then tell caller. */
+		if (avail <= total)
+			return -EAGAIN;
+
+		next_windex = vmbus_br_idxinc(old_windex, total, ring_size);
+
+		/* Atomic update of next write_index for other threads */
+	} while (!rte_atomic32_cmpset(&tbr->windex, old_windex, next_windex));
+
+	/* Space from old..new is now reserved */
+	windex = old_windex;
+	for (i = 0; i < iovlen; i++) {
+		windex = vmbus_txbr_copyto(tbr, windex,
+					   iov[i].iov_base, iov[i].iov_len);
+	}
+
+	/* Set the offset of the current channel packet. */
+	save_windex = ((uint64_t)old_windex) << 32;
+	windex = vmbus_txbr_copyto(tbr, windex, &save_windex,
+				   sizeof(save_windex));
+
+	/* The region reserved should match region used */
+	RTE_ASSERT(windex == next_windex);
+
+	/* Ensure that data is available before updating host index */
+	rte_smp_wmb();
+
+	/* Checkin for our reservation. wait for our turn to update host */
+	while (!rte_atomic32_cmpset(&vbr->windex, old_windex, next_windex))
+		rte_pause();
+
+	/* If host had read all data before this, then need to signal */
+	*need_sig |= vmbus_txbr_need_signal(tbr, old_windex);
+	return 0;
+}
+
+static inline uint32_t
+vmbus_rxbr_copyfrom(const struct vmbus_br *rbr, uint32_t rindex,
+		    void *dst0, size_t cplen)
+{
+	const uint8_t *br_data = rbr->vbr->data;
+	uint32_t br_dsize = rbr->dsize;
+	uint8_t *dst = dst0;
+
+	if (cplen > br_dsize - rindex) {
+		uint32_t fraglen = br_dsize - rindex;
+
+		/* Wrap-around detected. */
+		memcpy(dst, br_data + rindex, fraglen);
+		memcpy(dst + fraglen, br_data, cplen - fraglen);
+	} else {
+		memcpy(dst, br_data + rindex, cplen);
+	}
+
+	return vmbus_br_idxinc(rindex, cplen, br_dsize);
+}
+
+/* Copy data from receive ring but don't change index */
+int
+vmbus_rxbr_peek(struct vmbus_br *rbr, void *data, size_t dlen)
+{
+	uint32_t avail;
+
+	/*
+	 * The requested data and the 64bits channel packet
+	 * offset should be there at least.
+	 */
+	avail = vmbus_br_availread(rbr);
+	if (avail < dlen + sizeof(uint64_t))
+		return -EAGAIN;
+
+	vmbus_rxbr_copyfrom(rbr, rbr->vbr->rindex, data, dlen);
+	return 0;
+}
+
+/*
+ * Copy data from receive ring and change index
+ * NOTE:
+ * We assume (dlen + skip) == sizeof(channel packet).
+ */
+int
+vmbus_rxbr_read(struct vmbus_br *rbr, void *data, size_t dlen, size_t skip)
+{
+	struct vmbus_bufring *vbr = rbr->vbr;
+	uint32_t br_dsize = rbr->dsize;
+	uint32_t rindex;
+
+	if (vmbus_br_availread(rbr) < dlen + skip + sizeof(uint64_t))
+		return -EAGAIN;
+
+	/*
+	 * Copy channel packet from RX bufring.
+	 */
+	rindex = vmbus_br_idxinc(rbr->vbr->rindex, skip, br_dsize);
+	rindex = vmbus_rxbr_copyfrom(rbr, rindex, data, dlen);
+
+	/*
+	 * Discard this channel packet's 64bits offset, which is useless to us.
+	 */
+	rindex = vmbus_br_idxinc(rindex, sizeof(uint64_t), br_dsize);
+
+	/* Update the read index _after_ the channel packet is fetched.	 */
+	rte_compiler_barrier();
+
+	vbr->rindex = rindex;
+
+	return 0;
+}
diff --git a/drivers/bus/vmbus/vmbus_channel.c b/drivers/bus/vmbus/vmbus_channel.c
new file mode 100644
index 000000000000..71b9cf0e158e
--- /dev/null
+++ b/drivers/bus/vmbus/vmbus_channel.c
@@ -0,0 +1,351 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2018, Microsoft Corporation.
+ * All Rights Reserved.
+ */
+
+#include <unistd.h>
+#include <stdint.h>
+#include <string.h>
+#include <sys/uio.h>
+
+#include <rte_eal.h>
+#include <rte_tailq.h>
+#include <rte_log.h>
+#include <rte_malloc.h>
+#include <rte_bus.h>
+#include <rte_atomic.h>
+#include <rte_memory.h>
+#include <rte_bus_vmbus.h>
+
+#include "private.h"
+
+static inline void
+vmbus_sync_set_bit(volatile uint32_t *addr, uint32_t mask)
+{
+	/* Use GCC builtin which atomic does atomic OR operation */
+	__sync_or_and_fetch(addr, mask);
+}
+
+static inline void
+vmbus_send_interrupt(const struct rte_vmbus_device *dev, uint32_t relid)
+{
+	uint32_t *int_addr;
+	uint32_t int_mask;
+
+	int_addr = dev->int_page + relid / 32;
+	int_mask = 1u << (relid % 32);
+
+	vmbus_sync_set_bit(int_addr, int_mask);
+}
+
+static inline void
+vmbus_set_monitor(const struct rte_vmbus_device *dev, uint32_t monitor_id)
+{
+	uint32_t *monitor_addr, monitor_mask;
+	unsigned int trigger_index;
+
+	trigger_index = monitor_id / HV_MON_TRIG_LEN;
+	monitor_mask = 1u << (monitor_id % HV_MON_TRIG_LEN);
+
+	monitor_addr = &dev->monitor_page->trigs[trigger_index].pending;
+	vmbus_sync_set_bit(monitor_addr, monitor_mask);
+}
+
+static void
+vmbus_set_event(const struct rte_vmbus_device *dev,
+		const struct vmbus_channel *chan)
+{
+	vmbus_send_interrupt(dev, chan->relid);
+	vmbus_set_monitor(dev, chan->monitor_id);
+}
+
+/*
+ * Notify host that there are data pending on our TX bufring.
+ *
+ * Since this in userspace, rely on the monitor page.
+ * Can't do a hypercall from userspace.
+ */
+void
+rte_vmbus_chan_signal_tx(const struct vmbus_channel *chan)
+{
+	const struct rte_vmbus_device *dev = chan->device;
+	const struct vmbus_br *tbr = &chan->txbr;
+
+	/* Make sure all updates are done before signaling host */
+	rte_smp_wmb();
+
+	/* If host is ignoring interrupts? */
+	if (tbr->vbr->imask)
+		return;
+
+	vmbus_set_event(dev, chan);
+}
+
+
+/* Do a simple send directly using transmit ring. */
+int rte_vmbus_chan_send(struct vmbus_channel *chan, uint16_t type,
+			void *data, uint32_t dlen,
+			uint64_t xactid, uint32_t flags, bool *need_sig)
+{
+	struct vmbus_chanpkt pkt;
+	unsigned int pktlen, pad_pktlen;
+	const uint32_t hlen = sizeof(pkt);
+	bool send_evt = false;
+	uint64_t pad = 0;
+	struct iovec iov[3];
+	int error;
+
+	pktlen = hlen + dlen;
+	pad_pktlen = RTE_ALIGN(pktlen, sizeof(uint64_t));
+
+	pkt.hdr.type = type;
+	pkt.hdr.flags = flags;
+	pkt.hdr.hlen = hlen >> VMBUS_CHANPKT_SIZE_SHIFT;
+	pkt.hdr.tlen = pad_pktlen >> VMBUS_CHANPKT_SIZE_SHIFT;
+	pkt.hdr.xactid = xactid;
+
+	iov[0].iov_base = &pkt;
+	iov[0].iov_len = hlen;
+	iov[1].iov_base = data;
+	iov[1].iov_len = dlen;
+	iov[2].iov_base = &pad;
+	iov[2].iov_len = pad_pktlen - pktlen;
+
+	error = vmbus_txbr_write(&chan->txbr, iov, 3, &send_evt);
+
+	/*
+	 * caller sets need_sig to non-NULL if it will handle
+	 * signaling if required later.
+	 * if need_sig is NULL, signal now if needed.
+	 */
+	if (need_sig)
+		*need_sig |= send_evt;
+	else if (error == 0 && send_evt)
+		rte_vmbus_chan_signal_tx(chan);
+	return error;
+}
+
+/* Do a scatter/gather send where the descriptor points to data. */
+int rte_vmbus_chan_send_sglist(struct vmbus_channel *chan,
+			       struct vmbus_gpa sg[], uint32_t sglen,
+			       void *data, uint32_t dlen,
+			       uint64_t xactid, bool *need_sig)
+{
+	struct vmbus_chanpkt_sglist pkt;
+	unsigned int pktlen, pad_pktlen, hlen;
+	bool send_evt = false;
+	struct iovec iov[4];
+	uint64_t pad = 0;
+	int error;
+
+	hlen = offsetof(struct vmbus_chanpkt_sglist, gpa[sglen]);
+	pktlen = hlen + dlen;
+	pad_pktlen = RTE_ALIGN(pktlen, sizeof(uint64_t));
+
+	pkt.hdr.type = VMBUS_CHANPKT_TYPE_GPA;
+	pkt.hdr.flags = VMBUS_CHANPKT_FLAG_RC;
+	pkt.hdr.hlen = hlen >> VMBUS_CHANPKT_SIZE_SHIFT;
+	pkt.hdr.tlen = pad_pktlen >> VMBUS_CHANPKT_SIZE_SHIFT;
+	pkt.hdr.xactid = xactid;
+	pkt.rsvd = 0;
+	pkt.gpa_cnt = sglen;
+
+	iov[0].iov_base = &pkt;
+	iov[0].iov_len = sizeof(pkt);
+	iov[1].iov_base = sg;
+	iov[1].iov_len = sizeof(struct vmbus_gpa) * sglen;
+	iov[2].iov_base = data;
+	iov[2].iov_len = dlen;
+	iov[3].iov_base = &pad;
+	iov[3].iov_len = pad_pktlen - pktlen;
+
+	error = vmbus_txbr_write(&chan->txbr, iov, 4, &send_evt);
+
+	/* if caller is batching, just propagate the status */
+	if (need_sig)
+		*need_sig |= send_evt;
+	else if (error == 0 && send_evt)
+		rte_vmbus_chan_signal_tx(chan);
+	return error;
+}
+
+bool rte_vmbus_chan_rx_empty(const struct vmbus_channel *channel)
+{
+	const struct vmbus_br *br = &channel->rxbr;
+
+	return br->vbr->rindex == br->vbr->windex;
+}
+
+static int vmbus_read_and_signal(struct vmbus_channel *chan,
+				 void *data, size_t dlen, size_t skip)
+{
+	struct vmbus_br *rbr = &chan->rxbr;
+	uint32_t write_sz, pending_sz, bytes_read;
+	int error;
+
+	/* Record where host was when we started read (for debug) */
+	rbr->windex = rbr->vbr->windex;
+
+	/* Read data and skip packet header */
+	error = vmbus_rxbr_read(rbr, data, dlen, skip);
+	if (error)
+		return error;
+
+	/* No need for signaling on older versions */
+	if (!rbr->vbr->feature_bits.feat_pending_send_sz)
+		return 0;
+
+	/* Make sure reading of pending happens after new read index */
+	rte_mb();
+
+	pending_sz = rbr->vbr->pending_send;
+	if (!pending_sz)
+		return 0;
+
+	rte_smp_rmb();
+	write_sz = vmbus_br_availwrite(rbr, rbr->vbr->windex);
+	bytes_read = dlen + skip + sizeof(uint64_t);
+
+	/* If there was space before then host was not blocked */
+	if (write_sz - bytes_read > pending_sz)
+		return 0;
+
+	/* If pending write will not fit */
+	if (write_sz <= pending_sz)
+		return 0;
+
+	vmbus_set_event(chan->device, chan);
+	return 0;
+}
+
+/* TODO: replace this with inplace ring buffer (no copy) */
+int rte_vmbus_chan_recv(struct vmbus_channel *chan, void *data, uint32_t *len,
+			uint64_t *request_id)
+{
+	struct vmbus_chanpkt_hdr pkt;
+	uint32_t dlen, hlen, bufferlen = *len;
+	int error;
+
+	*len = 0;
+
+	error = vmbus_rxbr_peek(&chan->rxbr, &pkt, sizeof(pkt));
+	if (error)
+		return error;
+
+	if (unlikely(pkt.hlen < VMBUS_CHANPKT_HLEN_MIN)) {
+		VMBUS_LOG(ERR, "VMBUS recv, invalid hlen %u", pkt.hlen);
+		/* XXX this channel is dead actually. */
+		return -EIO;
+	}
+
+	if (unlikely(pkt.hlen > pkt.tlen)) {
+		VMBUS_LOG(ERR, "VMBUS recv,invalid hlen %u and tlen %u",
+			  pkt.hlen, pkt.tlen);
+		return -EIO;
+	}
+
+	/* Length are in quad words */
+	hlen = pkt.hlen << VMBUS_CHANPKT_SIZE_SHIFT;
+	dlen = (pkt.tlen << VMBUS_CHANPKT_SIZE_SHIFT) - hlen;
+	*len = dlen;
+
+	/* If caller buffer is not large enough */
+	if (unlikely(dlen > bufferlen))
+		return -ENOBUFS;
+
+	if (request_id)
+		*request_id = pkt.xactid;
+
+	/* Read data and skip the header */
+	return vmbus_read_and_signal(chan, data, dlen, hlen);
+}
+
+int rte_vmbus_chan_recv_raw(struct vmbus_channel *chan,
+			    void *data, uint32_t *len)
+{
+	struct vmbus_chanpkt_hdr pkt;
+	uint32_t dlen, bufferlen = *len;
+	int error;
+
+	error = vmbus_rxbr_peek(&chan->rxbr, &pkt, sizeof(pkt));
+	if (error)
+		return error;
+
+	if (unlikely(pkt.hlen < VMBUS_CHANPKT_HLEN_MIN)) {
+		VMBUS_LOG(ERR, "VMBUS recv, invalid hlen %u", pkt.hlen);
+		/* XXX this channel is dead actually. */
+		return -EIO;
+	}
+
+	if (unlikely(pkt.hlen > pkt.tlen)) {
+		VMBUS_LOG(ERR, "VMBUS recv,invalid hlen %u and tlen %u",
+			pkt.hlen, pkt.tlen);
+		return -EIO;
+	}
+
+	/* Length are in quad words */
+	dlen = pkt.tlen << VMBUS_CHANPKT_SIZE_SHIFT;
+	*len = dlen;
+
+	/* If caller buffer is not large enough */
+	if (unlikely(dlen > bufferlen))
+		return -ENOBUFS;
+
+	/* Put packet header in data buffer */
+	return vmbus_read_and_signal(chan, data, dlen, 0);
+}
+
+/* Setup the primary channel */
+int rte_vmbus_chan_open(const struct rte_vmbus_device *device,
+			struct vmbus_channel **new_chan)
+{
+	struct vmbus_channel *chan;
+	int err;
+
+	chan = rte_zmalloc_socket("VMBUS", sizeof(*chan), RTE_CACHE_LINE_SIZE,
+				  device->device.numa_node);
+	if (!chan) {
+		VMBUS_LOG(ERR, "failed to allocate channel");
+		return -ENOMEM;
+	}
+
+	STAILQ_INIT(&chan->subchannel_list);
+	chan->device = device;
+	chan->relid = device->relid;
+	chan->monitor_id = device->monitor_id;
+
+	err = vmbus_uio_map_rings(chan);
+	if (err) {
+		rte_free(chan);
+		return err;
+	}
+
+	*new_chan = chan;
+	return 0;
+}
+
+/* Setup secondary channel */
+int rte_vmbus_subchan_open(struct vmbus_channel *primary,
+			   struct vmbus_channel **new_chan)
+{
+	struct vmbus_channel *chan;
+	int err;
+
+	err = vmbus_uio_get_subchan(primary, &chan);
+	if (err)
+		return err;
+
+	STAILQ_INSERT_TAIL(&primary->subchannel_list, chan, next);
+	*new_chan = chan;
+	return 0;
+}
+
+uint16_t rte_vmbus_sub_channel_index(const struct vmbus_channel *chan)
+{
+	return chan->subchannel_id;
+}
+
+void rte_vmbus_chan_close(struct vmbus_channel *chan)
+{
+	rte_free(chan);
+}
diff --git a/drivers/bus/vmbus/vmbus_common.c b/drivers/bus/vmbus/vmbus_common.c
new file mode 100644
index 000000000000..0182875b2128
--- /dev/null
+++ b/drivers/bus/vmbus/vmbus_common.c
@@ -0,0 +1,287 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2018, Microsoft Corporation.
+ * All Rights Reserved.
+ */
+
+#include <string.h>
+#include <unistd.h>
+#include <dirent.h>
+#include <fcntl.h>
+#include <sys/queue.h>
+#include <sys/mman.h>
+
+#include <rte_log.h>
+#include <rte_bus.h>
+#include <rte_eal.h>
+#include <rte_tailq.h>
+#include <rte_devargs.h>
+#include <rte_malloc.h>
+#include <rte_errno.h>
+#include <rte_memory.h>
+#include <rte_bus_vmbus.h>
+
+#include "private.h"
+
+int vmbus_logtype_bus;
+extern struct rte_vmbus_bus rte_vmbus_bus;
+
+/* map a particular resource from a file */
+void *
+vmbus_map_resource(void *requested_addr, int fd, off_t offset, size_t size,
+		   int flags)
+{
+	void *mapaddr;
+
+	/* Map the memory resource of device */
+	mapaddr = mmap(requested_addr, size, PROT_READ | PROT_WRITE,
+		       MAP_SHARED | flags, fd, offset);
+	if (mapaddr == MAP_FAILED) {
+		VMBUS_LOG(ERR,
+			"%s(): cannot mmap(%d, %p, 0x%lx, 0x%lx): %s)",
+			__func__, fd, requested_addr,
+			(unsigned long)size, (unsigned long)offset,
+			strerror(errno));
+	}
+	return mapaddr;
+}
+
+/* unmap a particular resource */
+void
+vmbus_unmap_resource(void *requested_addr, size_t size)
+{
+	if (requested_addr == NULL)
+		return;
+
+	/* Unmap the VMBUS memory resource of device */
+	if (munmap(requested_addr, size)) {
+		VMBUS_LOG(ERR, "%s(): cannot munmap(%p, 0x%lx): %s",
+			__func__, requested_addr, (unsigned long)size,
+			strerror(errno));
+	} else
+		VMBUS_LOG(DEBUG, "  VMBUS memory unmapped at %p",
+			  requested_addr);
+}
+
+/**
+ * Match the VMBUS driver and device using UUID table
+ *
+ * @param drv
+ *	VMBUS driver from which ID table would be extracted
+ * @param pci_dev
+ *	VMBUS device to match against the driver
+ * @return
+ *	true for successful match
+ *	false for unsuccessful match
+ */
+static bool
+vmbus_match(const struct rte_vmbus_driver *dr,
+	    const struct rte_vmbus_device *dev)
+{
+	const uuid_t *id_table;
+
+	for (id_table = dr->id_table; !uuid_is_null(*id_table); ++id_table) {
+		if (uuid_compare(*id_table, dev->class_id) == 0)
+			return true;
+	}
+
+	return false;
+}
+
+/*
+ * If device ID match, call the devinit() function of the driver.
+ */
+static int
+vmbus_probe_one_driver(struct rte_vmbus_driver *dr,
+		       struct rte_vmbus_device *dev)
+{
+	char guid[UUID_BUF_SZ];
+	int ret;
+
+	if (!vmbus_match(dr, dev))
+		return 1;	 /* not supported */
+
+	uuid_unparse(dev->device_id, guid);
+	VMBUS_LOG(INFO, "VMBUS device %s on NUMA socket %i",
+		  guid, dev->device.numa_node);
+
+	/* TODO add blacklisted */
+
+	/* map resources for device */
+	ret = rte_vmbus_map_device(dev);
+	if (ret != 0)
+		return ret;
+
+	/* reference driver structure */
+	dev->driver = dr;
+	dev->device.driver = &dr->driver;
+
+	if (dev->device.numa_node < 0) {
+		VMBUS_LOG(WARNING, "  Invalid NUMA socket, default to 0");
+		dev->device.numa_node = 0;
+	}
+
+	/* call the driver probe() function */
+	VMBUS_LOG(INFO, "  probe driver: %s", dr->driver.name);
+	ret = dr->probe(dr, dev);
+	if (ret) {
+		dev->driver = NULL;
+		rte_vmbus_unmap_device(dev);
+	}
+
+	return ret;
+}
+
+/*
+ * IF device class GUID mathces, call the probe function of
+ * registere drivers for the vmbus device.
+ * Return -1 if initialization failed,
+ * and 1 if no driver found for this device.
+ */
+static int
+vmbus_probe_all_drivers(struct rte_vmbus_device *dev)
+{
+	struct rte_vmbus_driver *dr;
+	int rc;
+
+	/* Check if a driver is already loaded */
+	if (dev->driver != NULL) {
+		VMBUS_LOG(DEBUG, "VMBUS driver already loaded");
+		return 0;
+	}
+
+	FOREACH_DRIVER_ON_VMBUS(dr) {
+		rc = vmbus_probe_one_driver(dr, dev);
+		if (rc < 0) /* negative is an error */
+			return -1;
+
+		if (rc > 0) /* positive driver doesn't support it */
+			continue;
+
+		return 0;
+	}
+	return 1;
+}
+
+/*
+ * Scan the vmbus, and call the devinit() function for
+ * all registered drivers that have a matching entry in its id_table
+ * for discovered devices.
+ */
+int
+rte_vmbus_probe(void)
+{
+	struct rte_vmbus_device *dev;
+	size_t probed = 0, failed = 0;
+	char ubuf[UUID_BUF_SZ];
+
+	FOREACH_DEVICE_ON_VMBUS(dev) {
+		probed++;
+
+		uuid_unparse(dev->device_id, ubuf);
+
+		/* TODO: add whitelist/blacklist */
+
+		if (vmbus_probe_all_drivers(dev) < 0) {
+			VMBUS_LOG(NOTICE,
+				"Requested device %s cannot be used", ubuf);
+			rte_errno = errno;
+			failed++;
+		}
+	}
+
+	return (probed && probed == failed) ? -1 : 0;
+}
+
+static int
+vmbus_parse(const char *name, void *addr)
+{
+	uuid_t guid;
+	int ret;
+
+	ret = uuid_parse(name, guid);
+	if (ret == 0 && addr)
+		memcpy(addr, &guid, sizeof(guid));
+
+	return ret;
+}
+
+/* register vmbus driver */
+void
+rte_vmbus_register(struct rte_vmbus_driver *driver)
+{
+	VMBUS_LOG(DEBUG,
+		"Registered driver %s", driver->driver.name);
+
+	TAILQ_INSERT_TAIL(&rte_vmbus_bus.driver_list, driver, next);
+	driver->bus = &rte_vmbus_bus;
+}
+
+/* unregister vmbus driver */
+void
+rte_vmbus_unregister(struct rte_vmbus_driver *driver)
+{
+	TAILQ_REMOVE(&rte_vmbus_bus.driver_list, driver, next);
+	driver->bus = NULL;
+}
+
+/* Add a device to VMBUS bus */
+void
+vmbus_add_device(struct rte_vmbus_device *vmbus_dev)
+{
+	TAILQ_INSERT_TAIL(&rte_vmbus_bus.device_list, vmbus_dev, next);
+}
+
+/* Insert a device into a predefined position in VMBUS bus */
+void
+vmbus_insert_device(struct rte_vmbus_device *exist_vmbus_dev,
+		      struct rte_vmbus_device *new_vmbus_dev)
+{
+	TAILQ_INSERT_BEFORE(exist_vmbus_dev, new_vmbus_dev, next);
+}
+
+/* Remove a device from VMBUS bus */
+void
+vmbus_remove_device(struct rte_vmbus_device *vmbus_dev)
+{
+	TAILQ_REMOVE(&rte_vmbus_bus.device_list, vmbus_dev, next);
+}
+
+/* VMBUS doesn't support hotplug */
+static struct rte_device *
+vmbus_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
+		  const void *data)
+{
+	struct rte_vmbus_device *dev;
+
+	FOREACH_DEVICE_ON_VMBUS(dev) {
+		if (start && &dev->device == start) {
+			start = NULL;
+			continue;
+		}
+		if (cmp(&dev->device, data) == 0)
+			return &dev->device;
+	}
+
+	return NULL;
+}
+
+
+struct rte_vmbus_bus rte_vmbus_bus = {
+	.bus = {
+		.scan = rte_vmbus_scan,
+		.probe = rte_vmbus_probe,
+		.find_device = vmbus_find_device,
+		.parse = vmbus_parse,
+	},
+	.device_list = TAILQ_HEAD_INITIALIZER(rte_vmbus_bus.device_list),
+	.driver_list = TAILQ_HEAD_INITIALIZER(rte_vmbus_bus.driver_list),
+};
+
+RTE_REGISTER_BUS(vmbus, rte_vmbus_bus.bus);
+
+RTE_INIT(vmbus_init_log)
+{
+	vmbus_logtype_bus = rte_log_register("bus.vmbus");
+	if (vmbus_logtype_bus >= 0)
+		rte_log_set_level(vmbus_logtype_bus, RTE_LOG_NOTICE);
+}
diff --git a/drivers/bus/vmbus/vmbus_common_uio.c b/drivers/bus/vmbus/vmbus_common_uio.c
new file mode 100644
index 000000000000..8e66a3c4fbef
--- /dev/null
+++ b/drivers/bus/vmbus/vmbus_common_uio.c
@@ -0,0 +1,232 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2018, Microsoft Corporation.
+ * All Rights Reserved.
+ */
+
+#include <fcntl.h>
+#include <string.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/mman.h>
+
+#include <rte_eal.h>
+#include <rte_tailq.h>
+#include <rte_log.h>
+#include <rte_malloc.h>
+#include <rte_bus.h>
+#include <rte_bus_vmbus.h>
+
+#include "private.h"
+
+static struct rte_tailq_elem vmbus_tailq = {
+	.name = "VMBUS_RESOURCE_LIST",
+};
+EAL_REGISTER_TAILQ(vmbus_tailq)
+
+static int
+vmbus_uio_map_secondary(struct rte_vmbus_device *dev)
+{
+	int fd, i, j;
+	struct mapped_vmbus_resource *uio_res;
+	struct mapped_vmbus_res_list *uio_res_list
+		= RTE_TAILQ_CAST(vmbus_tailq.head, mapped_vmbus_res_list);
+
+	TAILQ_FOREACH(uio_res, uio_res_list, next) {
+
+		/* skip this element if it doesn't match our UUID */
+		if (uuid_compare(uio_res->id, dev->device_id) != 0)
+			continue;
+
+		/* open /dev/uioX */
+		fd = open(uio_res->path, O_RDWR);
+		if (fd < 0) {
+			VMBUS_LOG(ERR, "Cannot open %s: %s",
+				  uio_res->path, strerror(errno));
+			return -1;
+		}
+
+		for (i = 0; i != uio_res->nb_maps; i++) {
+			void *mapaddr;
+
+			mapaddr = vmbus_map_resource(uio_res->maps[i].addr,
+						     fd, 0,
+						     uio_res->maps[i].size, 0);
+
+			if (mapaddr == uio_res->maps[i].addr)
+				continue;
+
+			VMBUS_LOG(ERR,
+				  "Cannot mmap device resource file %s to address: %p",
+				  uio_res->path, uio_res->maps[i].addr);
+
+			if (mapaddr != MAP_FAILED) {
+				/* unmap addrs correctly mapped */
+				for (j = 0; j < i; j++)
+					vmbus_unmap_resource(uio_res->maps[j].addr,
+							     (size_t)uio_res->maps[j].size);
+				/* unmap addr wrongly mapped */
+				vmbus_unmap_resource(mapaddr,
+						     (size_t)uio_res->maps[i].size);
+			}
+
+			close(fd);
+			return -1;
+		}
+
+		/* fd is not needed in slave process, close it */
+		close(fd);
+		return 0;
+	}
+
+	VMBUS_LOG(ERR,  "Cannot find resource for device");
+	return 1;
+}
+
+static int
+vmbus_uio_map_primary(struct rte_vmbus_device *dev)
+{
+	int i, ret;
+	struct mapped_vmbus_resource *uio_res = NULL;
+	struct mapped_vmbus_res_list *uio_res_list =
+		RTE_TAILQ_CAST(vmbus_tailq.head, mapped_vmbus_res_list);
+
+	/* allocate uio resource */
+	ret = vmbus_uio_alloc_resource(dev, &uio_res);
+	if (ret)
+		return ret;
+
+	/* Map the resources */
+	for (i = 0; i < VMBUS_MAX_RESOURCE; i++) {
+		/* skip empty BAR */
+		if (dev->resource[i].len == 0)
+			continue;
+
+		ret = vmbus_uio_map_resource_by_index(dev, i, uio_res, 0);
+		if (ret)
+			goto error;
+	}
+
+	uio_res->nb_maps = i;
+
+	TAILQ_INSERT_TAIL(uio_res_list, uio_res, next);
+
+	return 0;
+error:
+	while (--i >= 0) {
+		vmbus_unmap_resource(uio_res->maps[i].addr,
+				(size_t)uio_res->maps[i].size);
+	}
+	vmbus_uio_free_resource(dev, uio_res);
+	return -1;
+}
+
+
+struct mapped_vmbus_resource *
+vmbus_uio_find_resource(const struct rte_vmbus_device *dev)
+{
+	struct mapped_vmbus_resource *uio_res;
+	struct mapped_vmbus_res_list *uio_res_list =
+			RTE_TAILQ_CAST(vmbus_tailq.head, mapped_vmbus_res_list);
+
+	if (dev == NULL)
+		return NULL;
+
+	TAILQ_FOREACH(uio_res, uio_res_list, next) {
+		/* skip this element if it doesn't match our VMBUS address */
+		if (uuid_compare(uio_res->id, dev->device_id) == 0)
+			return uio_res;
+	}
+	return NULL;
+}
+
+/* map the VMBUS resource of a VMBUS device in virtual memory */
+int
+vmbus_uio_map_resource(struct rte_vmbus_device *dev)
+{
+	struct mapped_vmbus_resource *uio_res;
+	int ret;
+
+	/* TODO: handle rescind */
+	dev->intr_handle.fd = -1;
+	dev->intr_handle.uio_cfg_fd = -1;
+	dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
+
+	/* secondary processes - use already recorded details */
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+		ret = vmbus_uio_map_secondary(dev);
+	else
+		ret = vmbus_uio_map_primary(dev);
+
+	if (ret != 0)
+		return ret;
+
+	uio_res = vmbus_uio_find_resource(dev);
+	if (!uio_res) {
+		VMBUS_LOG(ERR, "can not find resources!");
+		return -EIO;
+	}
+
+	if (uio_res->nb_maps <= HV_MON_PAGE_MAP) {
+		VMBUS_LOG(ERR, "VMBUS: only %u resources found!",
+			uio_res->nb_maps);
+		return -EINVAL;
+	}
+
+	dev->int_page = (uint32_t *)((char *)uio_res->maps[HV_INT_PAGE_MAP].addr
+				     + (PAGE_SIZE >> 1));
+	dev->monitor_page = uio_res->maps[HV_MON_PAGE_MAP].addr;
+	return 0;
+}
+
+static void
+vmbus_uio_unmap(struct mapped_vmbus_resource *uio_res)
+{
+	int i;
+
+	if (uio_res == NULL)
+		return;
+
+	for (i = 0; i != uio_res->nb_maps; i++) {
+		vmbus_unmap_resource(uio_res->maps[i].addr,
+				     (size_t)uio_res->maps[i].size);
+	}
+}
+
+/* unmap the VMBUS resource of a VMBUS device in virtual memory */
+void
+vmbus_uio_unmap_resource(struct rte_vmbus_device *dev)
+{
+	struct mapped_vmbus_resource *uio_res;
+	struct mapped_vmbus_res_list *uio_res_list =
+			RTE_TAILQ_CAST(vmbus_tailq.head, mapped_vmbus_res_list);
+
+	if (dev == NULL)
+		return;
+
+	/* find an entry for the device */
+	uio_res = vmbus_uio_find_resource(dev);
+	if (uio_res == NULL)
+		return;
+
+	/* secondary processes - just free maps */
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+		return vmbus_uio_unmap(uio_res);
+
+	TAILQ_REMOVE(uio_res_list, uio_res, next);
+
+	/* unmap all resources */
+	vmbus_uio_unmap(uio_res);
+
+	/* free uio resource */
+	rte_free(uio_res);
+
+	/* close fd if in primary process */
+	close(dev->intr_handle.fd);
+	if (dev->intr_handle.uio_cfg_fd >= 0) {
+		close(dev->intr_handle.uio_cfg_fd);
+		dev->intr_handle.uio_cfg_fd = -1;
+	}
+
+	dev->intr_handle.fd = -1;
+	dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
+}
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index a9b4b0502ff4..b2be8f23cd96 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -183,6 +183,7 @@ ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_VHOST)      += -lrte_pmd_vhost
 endif # $(CONFIG_RTE_LIBRTE_VHOST)
 _LDLIBS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD)    += -lrte_pmd_vmxnet3_uio
+_LDLIBS-$(CONFIG_RTE_LIBRTE_VMBUS)	    += -lrte_bus_vmbus -luuid
 
 ifeq ($(CONFIG_RTE_LIBRTE_BBDEV),y)
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_BBDEV_NULL)     += -lrte_pmd_bbdev_null
-- 
2.16.3

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [dpdk-dev] [PATCH 2/3] usertools: add hv_uio_setup script
  2018-04-05 19:13 [dpdk-dev] [PATCH 0/3] add Hyper-V bus and network driver Stephen Hemminger
  2018-04-05 19:13 ` [dpdk-dev] [PATCH 1/3] bus/vmbus: add hyper-v virtual bus support Stephen Hemminger
@ 2018-04-05 19:13 ` Stephen Hemminger
  2018-04-05 20:43   ` Thomas Monjalon
  2018-04-05 19:13 ` [dpdk-dev] [PATCH 3/3] net/netvsc: add hyper-v netvsc network device Stephen Hemminger
  2 siblings, 1 reply; 20+ messages in thread
From: Stephen Hemminger @ 2018-04-05 19:13 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

Small script to rebind netvsc kernel device to Hyper-V
networking PMD. It could be integrated in dpdk-bind, but dpdk-bind
is focused on PCI, and that would get messy.

Eventually, this functionality will be built into netvsc driver
(see vdev_netvsc as an example).

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
---
 usertools/hv_uio_setup.sh | 40 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)
 create mode 100755 usertools/hv_uio_setup.sh

diff --git a/usertools/hv_uio_setup.sh b/usertools/hv_uio_setup.sh
new file mode 100755
index 000000000000..9885a0e80828
--- /dev/null
+++ b/usertools/hv_uio_setup.sh
@@ -0,0 +1,40 @@
+#! /bin/bash
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2018 Microsoft Corporation
+
+module=uio_hv_generic
+# Hyper-V network device GUID
+net_guid="f8615163-df3e-46c5-913f-f2d2f965ed0e"
+
+if [ $# -ne 1 ]; then
+	echo "Usage: $0 ethN"
+	exit 1
+fi
+
+syspath=/sys/class/net/$1/device
+devpath=$(readlink $syspath)
+if [ $? -ne 0 ]; then
+	echo "$1 no device present"
+	exit 1
+fi
+dev_guid=$(basename $devpath)
+
+driver=$(readlink $syspath/driver)
+if [ $? -ne 0 ]; then
+	echo "$1 driver not found"
+	exit 1
+fi
+existing=$(basename $driver)
+
+if [ "$existing" != "hv_netvsc" ]; then
+	echo "$1 controlled by $existing"
+	exit 1
+fi
+
+if [ ! -d /sys/module/$module ]; then
+    modprobe $module
+    echo $net_guid >/sys/bus/vmbus/drivers/uio_hv_generic/new_id
+fi
+
+echo $dev_guid > /sys/bus/vmbus/drivers/$existing/unbind
+echo $dev_guid > /sys/bus/vmbus/drivers/$module/bind
-- 
2.16.3

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [dpdk-dev] [PATCH 3/3] net/netvsc: add hyper-v netvsc network device
  2018-04-05 19:13 [dpdk-dev] [PATCH 0/3] add Hyper-V bus and network driver Stephen Hemminger
  2018-04-05 19:13 ` [dpdk-dev] [PATCH 1/3] bus/vmbus: add hyper-v virtual bus support Stephen Hemminger
  2018-04-05 19:13 ` [dpdk-dev] [PATCH 2/3] usertools: add hv_uio_setup script Stephen Hemminger
@ 2018-04-05 19:13 ` Stephen Hemminger
  2018-04-05 20:52   ` Thomas Monjalon
  2 siblings, 1 reply; 20+ messages in thread
From: Stephen Hemminger @ 2018-04-05 19:13 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Stephen Hemminger

From: Stephen Hemminger <stephen@networkplumber.org>

Add VMBUS network device PMD. This code is based off
of the FreeBSD driver. The file and variable names are
kept the same to help with understanding (with most of
the BSD style warts removed).

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
---
 MAINTAINERS                                   |    7 +
 config/common_base                            |    8 +
 config/common_linuxapp                        |    2 +-
 doc/guides/nics/index.rst                     |    1 +
 doc/guides/nics/netvsc.rst                    |   53 ++
 drivers/bus/vmbus/Makefile                    |    2 +-
 drivers/net/Makefile                          |    1 +
 drivers/net/netvsc/Makefile                   |   23 +
 drivers/net/netvsc/hn_ethdev.c                |  751 +++++++++++++++
 drivers/net/netvsc/hn_logs.h                  |   35 +
 drivers/net/netvsc/hn_nvs.c                   |  533 +++++++++++
 drivers/net/netvsc/hn_nvs.h                   |  243 +++++
 drivers/net/netvsc/hn_rndis.c                 | 1101 ++++++++++++++++++++++
 drivers/net/netvsc/hn_rndis.h                 |   26 +
 drivers/net/netvsc/hn_rxtx.c                  | 1224 +++++++++++++++++++++++++
 drivers/net/netvsc/hn_var.h                   |  140 +++
 drivers/net/netvsc/ndis.h                     |  378 ++++++++
 drivers/net/netvsc/rndis.h                    |  414 +++++++++
 drivers/net/netvsc/rte_pmd_netvsc_version.map |    5 +
 mk/rte.app.mk                                 |    1 +
 20 files changed, 4946 insertions(+), 2 deletions(-)
 create mode 100644 doc/guides/nics/netvsc.rst
 create mode 100644 drivers/net/netvsc/Makefile
 create mode 100644 drivers/net/netvsc/hn_ethdev.c
 create mode 100644 drivers/net/netvsc/hn_logs.h
 create mode 100644 drivers/net/netvsc/hn_nvs.c
 create mode 100644 drivers/net/netvsc/hn_nvs.h
 create mode 100644 drivers/net/netvsc/hn_rndis.c
 create mode 100644 drivers/net/netvsc/hn_rndis.h
 create mode 100644 drivers/net/netvsc/hn_rxtx.c
 create mode 100644 drivers/net/netvsc/hn_var.h
 create mode 100644 drivers/net/netvsc/ndis.h
 create mode 100644 drivers/net/netvsc/rndis.h
 create mode 100644 drivers/net/netvsc/rte_pmd_netvsc_version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index 4b72bf4b09dd..681cea53032b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -433,6 +433,13 @@ F: drivers/net/enic/
 F: doc/guides/nics/enic.rst
 F: doc/guides/nics/features/enic.ini
 
+Hyper-V netvsc
+M: Stephen Hemminger <sthemmin@microsoft.com>
+M: K. Y. Srinivasan" <kys@microsoft.com>
+M: Haiyang Zhang <haiyangz@microsoft.com>
+F: drivers/net/hyperv/
+F: doc/guides/nics/hyperv.rst
+
 Intel e1000
 M: Wenzhuo Lu <wenzhuo.lu@intel.com>
 T: git://dpdk.org/next/dpdk-next-net-intel
diff --git a/config/common_base b/config/common_base
index fa3b80fe69c4..aa4592ff81cf 100644
--- a/config/common_base
+++ b/config/common_base
@@ -390,7 +390,15 @@ CONFIG_RTE_LIBRTE_MVPP2_PMD=n
 #
 CONFIG_RTE_LIBRTE_VMBUS=n
 
+#
+# Compile native PMD for Hyper-V/Azure
+#
+CONFIG_RTE_LIBRTE_NETVSC_PMD=n
+CONFIG_RTE_LIBRTE_NETVSC_DEBUG_RX=n
+CONFIG_RTE_LIBRTE_NETVSC_DEBUG_TX=n
+CONFIG_RTE_LIBRTE_NETVSC_DEBUG_DUMP=n
 
+#
 # Compile virtual device driver for NetVSC on Hyper-V/Azure
 #
 CONFIG_RTE_LIBRTE_VDEV_NETVSC_PMD=n
diff --git a/config/common_linuxapp b/config/common_linuxapp
index 30f24d0362c5..83577c75a161 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -40,4 +40,4 @@ CONFIG_RTE_LIBRTE_PMD_DPAA2_SEC=y
 
 # Hyper-V Virtual Machine bus and drivers
 CONFIG_RTE_LIBRTE_VMBUS=y
-
+CONFIG_RTE_LIBRTE_NETVSC_PMD=y
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index 51c453d9ce57..a97d6784f4ae 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -22,6 +22,7 @@ Network Interface Controller Drivers
     ena
     enic
     fm10k
+    hyperv
     i40e
     igb
     ixgbe
diff --git a/doc/guides/nics/netvsc.rst b/doc/guides/nics/netvsc.rst
new file mode 100644
index 000000000000..eae744bf8271
--- /dev/null
+++ b/doc/guides/nics/netvsc.rst
@@ -0,0 +1,53 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) Microsoft Corporation.
+
+Poll Mode Driver for Hyper-V Network Virtual NIC
+================================================
+
+Hyper-V is a hypervisor integrated into Window Server 2008, Windows 10
+and later versions.  It supports a para-virtualized network interface
+called netvsc that is visible on the virtual machine bus (VMBUS).  In
+the Data Plane Development Kit (DPDK), we provide a Netwwork Virtual
+Service Client (NetVSC) Poll Mode Driver (PMD). The NetVSC PMD
+supports Windows Server 2016 and Microsoft Azure cloud.
+
+NetVSC Implementation in DPDK
+-----------------------------
+
+The Netvsc PMD is a standalone driver. VMBus network devices that are
+being used by DPDK must be unbound from the Linux kernel driver
+(hv_netvsc) and bound to the Userspace IO driver for Hyper-V
+(uio_hv_generic).
+
+
+Features and Limitations of Hyper-V PMD
+---------------------------------------
+
+In this release, the hyper PMD driver provides the basic functionality of packet reception and transmission.
+
+*   It supports merge-able buffers per packet when receiving packets and scattered buffer per packet
+    when transmitting packets. The packet size supported is from 64 to 65536.
+
+*   It supports multicast packets and promiscuous mode. In order to this to work, the guest network
+    configuration on Hyper-V must be configured to allow this as well.
+
+*   Hyper-V driver does not support MAC or VLAN filtering because the host does not support it.
+    The device has only a single MAC address.
+
+*   VLAN tags are always stripped and presented in mbuf tci field.
+
+*   The Hyper-V driver does not use or support Link State or Rx interrupt.
+
+*   The number of queues is limited by the host (currently 64).
+
+*   SR-IOV accleration is not supported yet.
+
+
+Prerequisites
+-------------
+
+The following prerequisites apply:
+
+*   Linux kernel uio_hv_generic driver that supports subchannels. This should be present in 4.17 or later.
+
+*   If using the Netvsc PMD, the VDEV_NETVSC driver should *not* be used.
diff --git a/drivers/bus/vmbus/Makefile b/drivers/bus/vmbus/Makefile
index c4ca1129c7ea..54fd309d0f69 100644
--- a/drivers/bus/vmbus/Makefile
+++ b/drivers/bus/vmbus/Makefile
@@ -7,7 +7,7 @@ LIBABIVER := 1
 EXPORT_MAP := rte_bus_vmbus_version.map
 
 CFLAGS += -I$(SRCDIR)
-CFLAGS += -O3 $(WERROR_FLAGS)
+CFLAGS += -O3 $(WERROR_FLAGS)  -g
 CFLAGS += -DALLOW_EXPERIMENTAL_API
 
 ifneq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),)
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 37ca19aa7c41..f5666e2115ad 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -46,6 +46,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_THUNDERX_NICVF_PMD) += thunderx
 DIRS-$(CONFIG_RTE_LIBRTE_VDEV_NETVSC_PMD) += vdev_netvsc
 DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio
 DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += vmxnet3
+DIRS-$(CONFIG_RTE_LIBRTE_NETVSC_PMD) += netvsc
 
 ifeq ($(CONFIG_RTE_LIBRTE_KNI),y)
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_KNI) += kni
diff --git a/drivers/net/netvsc/Makefile b/drivers/net/netvsc/Makefile
new file mode 100644
index 000000000000..3c713af3c8fc
--- /dev/null
+++ b/drivers/net/netvsc/Makefile
@@ -0,0 +1,23 @@
+# SPDX-License-Identifier: BSD-3-Clause
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+LIB = librte_pmd_netvsc.a
+
+CFLAGS += -O3 $(WERROR_FLAGS)
+CFLAGS += -DALLOW_EXPERIMENTAL_API
+
+EXPORT_MAP := rte_pmd_netvsc_version.map
+
+LIBABIVER := 1
+
+SRCS-$(CONFIG_RTE_LIBRTE_NETVSC_PMD) += hn_ethdev.c
+SRCS-$(CONFIG_RTE_LIBRTE_NETVSC_PMD) += hn_rxtx.c
+SRCS-$(CONFIG_RTE_LIBRTE_NETVSC_PMD) += hn_rndis.c
+SRCS-$(CONFIG_RTE_LIBRTE_NETVSC_PMD) += hn_nvs.c
+
+LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring
+LDLIBS += -lrte_ethdev -lrte_net -lrte_kvargs
+LDLIBS += -lrte_bus_vmbus
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/netvsc/hn_ethdev.c b/drivers/net/netvsc/hn_ethdev.c
new file mode 100644
index 000000000000..1cd1a64494eb
--- /dev/null
+++ b/drivers/net/netvsc/hn_ethdev.c
@@ -0,0 +1,751 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2016-2018 Microsoft Corporation
+ * Copyright(c) 2013-2016 Brocade Communications Systems, Inc.
+ * All rights reserved.
+ */
+
+#include <stdint.h>
+#include <string.h>
+#include <stdio.h>
+#include <errno.h>
+#include <unistd.h>
+
+#include <rte_ethdev.h>
+#include <rte_memcpy.h>
+#include <rte_string_fns.h>
+#include <rte_memzone.h>
+#include <rte_malloc.h>
+#include <rte_atomic.h>
+#include <rte_branch_prediction.h>
+#include <rte_ether.h>
+#include <rte_ethdev_driver.h>
+#include <rte_cycles.h>
+#include <rte_errno.h>
+#include <rte_memory.h>
+#include <rte_atomic.h>
+#include <rte_eal.h>
+#include <rte_dev.h>
+#include <rte_bus_vmbus.h>
+
+#include "hn_logs.h"
+#include "hn_var.h"
+#include "hn_rndis.h"
+#include "hn_nvs.h"
+#include "ndis.h"
+
+#define HN_TX_OFFLOAD_CAPS (DEV_TX_OFFLOAD_IPV4_CKSUM | \
+			    DEV_TX_OFFLOAD_TCP_CKSUM  | \
+			    DEV_TX_OFFLOAD_UDP_CKSUM  | \
+			    DEV_TX_OFFLOAD_TCP_TSO    | \
+			    DEV_TX_OFFLOAD_MULTI_SEGS | \
+			    DEV_TX_OFFLOAD_VLAN_INSERT)
+
+#define HN_RX_OFFLOAD_CAPS (DEV_RX_OFFLOAD_CHECKSUM | \
+			    DEV_RX_OFFLOAD_VLAN_STRIP | \
+			    DEV_RX_OFFLOAD_CRC_STRIP)
+
+int hn_logtype_init;
+int hn_logtype_driver;
+
+struct hn_xstats_name_off {
+	char name[RTE_ETH_XSTATS_NAME_SIZE];
+	unsigned int offset;
+};
+
+static const struct hn_xstats_name_off hn_stat_strings[] = {
+	{"good_packets",           offsetof(struct hn_stats, packets)},
+	{"good_bytes",             offsetof(struct hn_stats, bytes)},
+	{"errors",                 offsetof(struct hn_stats, errors)},
+	{"multicast_packets",      offsetof(struct hn_stats, multicast)},
+	{"broadcast_packets",      offsetof(struct hn_stats, broadcast)},
+	{"undersize_packets",      offsetof(struct hn_stats, size_bins[0])},
+	{"size_64_packets",        offsetof(struct hn_stats, size_bins[1])},
+	{"size_65_127_packets",    offsetof(struct hn_stats, size_bins[2])},
+	{"size_128_255_packets",   offsetof(struct hn_stats, size_bins[3])},
+	{"size_256_511_packets",   offsetof(struct hn_stats, size_bins[4])},
+	{"size_512_1023_packets",  offsetof(struct hn_stats, size_bins[5])},
+	{"size_1024_1518_packets", offsetof(struct hn_stats, size_bins[6])},
+	{"size_1519_max_packets",  offsetof(struct hn_stats, size_bins[7])},
+};
+
+static struct rte_eth_dev *
+eth_dev_vmbus_allocate(struct rte_vmbus_device *dev, size_t private_data_size)
+{
+	struct rte_eth_dev *eth_dev;
+	const char *name;
+
+	if (!dev)
+		return NULL;
+
+	name = dev->device.name;
+
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+		eth_dev = rte_eth_dev_allocate(name);
+		if (!eth_dev)
+			return NULL;
+
+		if (private_data_size) {
+			eth_dev->data->dev_private = rte_zmalloc_socket(name,
+				private_data_size, RTE_CACHE_LINE_SIZE,
+				dev->device.numa_node);
+			if (!eth_dev->data->dev_private) {
+				rte_eth_dev_release_port(eth_dev);
+				return NULL;
+			}
+		}
+	} else {
+		eth_dev = rte_eth_dev_attach_secondary(name);
+		if (!eth_dev)
+			return NULL;
+	}
+
+	eth_dev->device = &dev->device;
+	eth_dev->intr_handle = &dev->intr_handle;
+
+	return eth_dev;
+}
+
+static void
+eth_dev_vmbus_release(struct rte_eth_dev *eth_dev)
+{
+	/* free ether device */
+	rte_eth_dev_release_port(eth_dev);
+
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
+		rte_free(eth_dev->data->dev_private);
+
+	eth_dev->data->dev_private = NULL;
+
+	/*
+	 * Secondary process will check the name to attach.
+	 * Clear this field to avoid attaching a released ports.
+	 */
+	eth_dev->data->name[0] = '\0';
+
+	eth_dev->device = NULL;
+	eth_dev->intr_handle = NULL;
+}
+
+/* XXX Why is this not generic in RTE? */
+static int
+hn_dev_atomic_write_link_status(struct rte_eth_dev *dev,
+		struct rte_eth_link *link)
+{
+	struct rte_eth_link *dst = &dev->data->dev_link;
+	struct rte_eth_link *src = link;
+
+	if (rte_atomic64_cmpset((uint64_t *)dst, *(uint64_t *)dst,
+					*(uint64_t *)src) == 0)
+		return -1;
+
+	return 0;
+}
+
+/* Update link status.
+ * Note: the DPDK definition of "wait_to_complete"
+ *   means block this call until link is up.
+ *   which is not worth supporting.
+ */
+static int
+hn_dev_link_update(struct rte_eth_dev *dev,
+		   __rte_unused int wait_to_complete)
+{
+	struct hn_data *hv = dev->data->dev_private;
+	struct rte_eth_link link, old;
+	int error;
+
+	old = dev->data->dev_link;
+
+	error = hn_rndis_get_linkstatus(hv);
+	if (error)
+		return error;
+
+	hn_rndis_get_linkspeed(hv);
+
+	link = (struct rte_eth_link) {
+		.link_duplex = ETH_LINK_FULL_DUPLEX,
+		.link_autoneg = ETH_LINK_SPEED_FIXED,
+		.link_speed = hv->link_speed / 10000,
+	};
+
+	if (hv->link_status == NDIS_MEDIA_STATE_CONNECTED)
+		link.link_status = ETH_LINK_UP;
+	else
+		link.link_status = ETH_LINK_DOWN;
+
+	if (old.link_status == link.link_status)
+		return 0;
+
+	hn_dev_atomic_write_link_status(dev, &link);
+
+	PMD_INIT_LOG(DEBUG, "Port %d is %s", dev->data->port_id,
+		     (link.link_status == ETH_LINK_UP) ? "up" : "down");
+	return -1;
+}
+
+static void hn_dev_info_get(struct rte_eth_dev *dev,
+			    struct rte_eth_dev_info *dev_info)
+{
+	struct hn_data *hv = dev->data->dev_private;
+
+	dev_info->speed_capa = ETH_LINK_SPEED_10G;
+	dev_info->min_rx_bufsize = HN_MIN_RX_BUF_SIZE;
+	dev_info->max_rx_pktlen  = HN_MAX_XFER_LEN;
+	dev_info->max_mac_addrs  = 1;
+	dev_info->default_txconf = (struct rte_eth_txconf) {
+		.txq_flags = ETH_TXQ_FLAGS_NOXSUMS,
+	};
+
+	dev_info->max_rx_queues = hv->max_queues;
+	dev_info->max_tx_queues = hv->max_queues;
+
+	hn_rndis_get_offload(hv, dev_info);
+}
+
+static void
+hn_dev_promiscuous_enable(struct rte_eth_dev *dev)
+{
+	struct hn_data *hv = dev->data->dev_private;
+
+	hn_rndis_set_rxfilter(hv, NDIS_PACKET_TYPE_PROMISCUOUS);
+}
+
+static void
+hn_dev_promiscuous_disable(struct rte_eth_dev *dev)
+{
+	struct hn_data *hv = dev->data->dev_private;
+	uint32_t filter;
+
+	filter = NDIS_PACKET_TYPE_DIRECTED | NDIS_PACKET_TYPE_BROADCAST;
+	if (dev->data->all_multicast)
+		filter |= NDIS_PACKET_TYPE_ALL_MULTICAST;
+	hn_rndis_set_rxfilter(hv, filter);
+}
+
+static void
+hn_dev_allmulticast_enable(struct rte_eth_dev *dev)
+{
+	struct hn_data *hv = dev->data->dev_private;
+
+	hn_rndis_set_rxfilter(hv, NDIS_PACKET_TYPE_DIRECTED |
+			      NDIS_PACKET_TYPE_ALL_MULTICAST |
+			NDIS_PACKET_TYPE_BROADCAST);
+}
+
+static void
+hn_dev_allmulticast_disable(struct rte_eth_dev *dev)
+{
+	struct hn_data *hv = dev->data->dev_private;
+
+	hn_rndis_set_rxfilter(hv, NDIS_PACKET_TYPE_DIRECTED |
+			     NDIS_PACKET_TYPE_BROADCAST);
+}
+
+/* Setup shared rx/tx queue data */
+static int hn_subchan_configure(struct hn_data *hv,
+				uint32_t subchan)
+{
+	struct vmbus_channel *primary = hn_primary_chan(hv);
+	int err;
+
+	PMD_DRV_LOG(DEBUG,
+		    "open %u subchannels", subchan);
+
+	/* Setup notifier for vmbus sub channels */
+	err = hn_nvs_alloc_subchans(hv, &subchan);
+	if (err)
+		return  err;
+
+	while (subchan > 0) {
+		struct vmbus_channel *new_sc;
+		uint16_t chn_index;
+
+		err = rte_vmbus_subchan_open(primary, &new_sc);
+		if (err == -ENOENT) {
+			rte_delay_ms(100);
+			continue;
+		}
+
+		if (err) {
+			PMD_DRV_LOG(ERR,
+				    "open subchannel failed: %d", err);
+			return err;
+		}
+
+		chn_index = rte_vmbus_sub_channel_index(new_sc);
+		if (chn_index == 0 || chn_index > hv->max_queues) {
+			PMD_DRV_LOG(ERR,
+				    "Invalid subchannel offermsg channel %u",
+				    chn_index);
+			return -EIO;
+		}
+
+		PMD_DRV_LOG(DEBUG, "new sub channel %u", chn_index);
+		hv->channels[chn_index] = new_sc;
+		--subchan;
+	}
+
+	return err;
+}
+
+static int hn_dev_configure(struct rte_eth_dev *dev)
+{
+	const struct rte_eth_conf *dev_conf = &dev->data->dev_conf;
+	const struct rte_eth_rxmode *rxmode = &dev_conf->rxmode;
+	const struct rte_eth_txmode *txmode = &dev_conf->txmode;
+	const struct rte_eth_rss_conf *rss_conf
+		= &dev_conf->rx_adv_conf.rss_conf;
+	struct hn_data *hv = dev->data->dev_private;
+	uint64_t unsupported;
+	int err, subchan;
+
+	PMD_INIT_FUNC_TRACE();
+
+	if (dev->data->nb_tx_queues < dev->data->nb_rx_queues) {
+		PMD_DRV_LOG(INFO,
+			    "increased number of tx queues (%u) to match rx queues(%u)",
+			    dev->data->nb_tx_queues,
+			    dev->data->nb_rx_queues);
+		dev->data->nb_tx_queues = dev->data->nb_rx_queues;
+	}
+	hv->num_queues = dev->data->nb_rx_queues;
+
+	unsupported = txmode->offloads & ~HN_TX_OFFLOAD_CAPS;
+	if (unsupported) {
+		PMD_DRV_LOG(NOTICE,
+			    "unsupported TX offload: %#" PRIx64,
+			    unsupported);
+		return -EINVAL;
+	}
+
+	unsupported = rxmode->offloads & ~HN_RX_OFFLOAD_CAPS;
+	if (unsupported) {
+		PMD_DRV_LOG(NOTICE,
+			    "unsupported RX offload: %#" PRIx64,
+			    rxmode->offloads);
+		return -EINVAL;
+	}
+
+	err = hn_rndis_conf_offload(hv, txmode->offloads,
+				    rxmode->offloads);
+	if (err)
+		return err;
+
+	subchan = hv->num_queues - 1;
+	if (subchan > 0) {
+		err = hn_subchan_configure(hv, subchan);
+		if (err)
+			return err;
+
+		err = hn_rndis_conf_rss(hv, rss_conf);
+		if (err)
+			return err;
+	}
+	return 0;
+}
+
+static int hn_dev_stats_get(struct rte_eth_dev *dev,
+			    struct rte_eth_stats *stats)
+{
+	unsigned int i;
+
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+		const struct hn_tx_queue *txq = dev->data->tx_queues[i];
+		if (txq == NULL)
+			continue;
+
+		stats->opackets += txq->stats.packets;
+		stats->obytes += txq->stats.bytes;
+		stats->oerrors += txq->stats.errors;
+
+		if (i < RTE_ETHDEV_QUEUE_STAT_CNTRS) {
+			stats->q_opackets[i] = txq->stats.packets;
+			stats->q_obytes[i] = txq->stats.bytes;
+		}
+	}
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		const struct hn_rx_queue *rxq = dev->data->rx_queues[i];
+		if (rxq == NULL)
+			continue;
+
+		stats->ipackets += rxq->stats.packets;
+		stats->ibytes += rxq->stats.bytes;
+		stats->ierrors += rxq->stats.errors;
+		stats->imissed += rxq->ring_full;
+
+		if (i < RTE_ETHDEV_QUEUE_STAT_CNTRS) {
+			stats->q_ipackets[i] = rxq->stats.packets;
+			stats->q_ibytes[i] = rxq->stats.bytes;
+		}
+	}
+
+	stats->rx_nombuf = dev->data->rx_mbuf_alloc_failed;
+	return 0;
+}
+
+static void
+hn_dev_stats_reset(struct rte_eth_dev *dev)
+{
+	unsigned int i;
+
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+		struct hn_tx_queue *txq = dev->data->tx_queues[i];
+		if (txq == NULL)
+			continue;
+		memset(&txq->stats, 0, sizeof(struct hn_stats));
+	}
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		struct hn_rx_queue *rxq = dev->data->rx_queues[i];
+		if (rxq == NULL)
+			continue;
+
+		memset(&rxq->stats, 0, sizeof(struct hn_stats));
+		rxq->ring_full = 0;
+	}
+}
+
+static int
+hn_dev_xstats_get_names(struct rte_eth_dev *dev,
+			struct rte_eth_xstat_name *xstats_names,
+			__rte_unused unsigned int limit)
+{
+	unsigned int i, t, count = 0;
+
+	if (xstats_names == NULL)
+		return dev->data->nb_tx_queues * RTE_DIM(hn_stat_strings)
+			+ dev->data->nb_rx_queues * RTE_DIM(hn_stat_strings);
+
+	/* Note: limit checked in rte_eth_xstats_names() */
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+		const struct hn_tx_queue *txq = dev->data->tx_queues[i];
+
+		if (txq == NULL)
+			continue;
+
+		for (t = 0; t < RTE_DIM(hn_stat_strings); t++)
+			snprintf(xstats_names[count++].name,
+				 RTE_ETH_XSTATS_NAME_SIZE,
+				 "tx_q%u_%s", i, hn_stat_strings[t].name);
+	}
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++)  {
+		const struct hn_rx_queue *rxq = dev->data->rx_queues[i];
+
+		if (rxq == NULL)
+			continue;
+
+		for (t = 0; t < RTE_DIM(hn_stat_strings); t++)
+			snprintf(xstats_names[count++].name,
+				 RTE_ETH_XSTATS_NAME_SIZE,
+				 "rx_q%u_%s", i,
+				 hn_stat_strings[t].name);
+	}
+
+	return count;
+}
+
+static int
+hn_dev_xstats_get(struct rte_eth_dev *dev,
+		  struct rte_eth_xstat *xstats,
+		  unsigned int n)
+{
+	unsigned int i, t, count = 0;
+	const unsigned int nstats
+		= dev->data->nb_tx_queues * RTE_DIM(hn_stat_strings)
+		+ dev->data->nb_rx_queues * RTE_DIM(hn_stat_strings);
+	const char *stats;
+
+	if (n < nstats)
+		return nstats;
+
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+		const struct hn_tx_queue *txq = dev->data->tx_queues[i];
+
+		if (txq == NULL)
+			continue;
+
+		stats = (const char *)&txq->stats;
+		for (t = 0; t < RTE_DIM(hn_stat_strings); t++)
+			xstats[count++].value = *(const uint64_t *)
+				(stats + hn_stat_strings[t].offset);
+	}
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		const struct hn_rx_queue *rxq = dev->data->rx_queues[i];
+
+		if (rxq == NULL)
+			continue;
+
+		stats = (const char *)&rxq->stats;
+		for (t = 0; t < RTE_DIM(hn_stat_strings); t++)
+			xstats[count++].value = *(const uint64_t *)
+				(stats + hn_stat_strings[t].offset);
+	}
+
+	return count;
+}
+
+
+/* enables testpmd to collect per queue stats. */
+static int
+hn_queue_stats_mapping_set(__rte_unused struct rte_eth_dev *eth_dev,
+			       __rte_unused uint16_t queue_id,
+			       __rte_unused uint8_t stat_idx,
+			       __rte_unused uint8_t is_rx)
+{
+	return 0;
+}
+
+static int
+hn_dev_start(struct rte_eth_dev *dev)
+{
+	struct hn_data *hv = dev->data->dev_private;
+
+	PMD_INIT_FUNC_TRACE();
+
+	/* check if lsc interrupt feature is enabled */
+	if (dev->data->dev_conf.intr_conf.lsc) {
+		PMD_DRV_LOG(ERR, "link status not supported yet");
+		return -ENOTSUP;
+	}
+
+	return hn_rndis_set_rxfilter(hv,
+				     NDIS_PACKET_TYPE_BROADCAST |
+				     NDIS_PACKET_TYPE_ALL_MULTICAST |
+				     NDIS_PACKET_TYPE_DIRECTED);
+}
+
+static void
+hn_dev_stop(struct rte_eth_dev *dev)
+{
+	struct hn_data *hv = dev->data->dev_private;
+
+	PMD_INIT_FUNC_TRACE();
+
+	hn_rndis_set_rxfilter(hv, 0);
+}
+
+static void
+hn_dev_close(struct rte_eth_dev *dev __rte_unused)
+{
+	PMD_INIT_LOG(DEBUG, "close");
+}
+
+static const struct eth_dev_ops hn_eth_dev_ops = {
+	.dev_configure		= hn_dev_configure,
+	.dev_start		= hn_dev_start,
+	.dev_stop		= hn_dev_stop,
+	.dev_close		= hn_dev_close,
+	.dev_infos_get		= hn_dev_info_get,
+	.promiscuous_enable     = hn_dev_promiscuous_enable,
+	.promiscuous_disable    = hn_dev_promiscuous_disable,
+	.allmulticast_enable    = hn_dev_allmulticast_enable,
+	.allmulticast_disable   = hn_dev_allmulticast_disable,
+	.tx_queue_setup		= hn_dev_tx_queue_setup,
+	.tx_queue_release	= hn_dev_tx_queue_release,
+	.rx_queue_setup		= hn_dev_rx_queue_setup,
+	.rx_queue_release	= hn_dev_rx_queue_release,
+	.link_update		= hn_dev_link_update,
+	.stats_get		= hn_dev_stats_get,
+	.xstats_get		= hn_dev_xstats_get,
+	.xstats_get_names	= hn_dev_xstats_get_names,
+	.stats_reset            = hn_dev_stats_reset,
+	.xstats_reset		= hn_dev_stats_reset,
+	.queue_stats_mapping_set = hn_queue_stats_mapping_set,
+};
+
+/*
+ * Setup connection between PMD and kernel.
+ */
+static int
+hn_attach(struct hn_data *hv, unsigned int mtu)
+{
+	int error;
+
+	/* Attach NVS */
+	error = hn_nvs_attach(hv, mtu);
+	if (error)
+		goto failed_nvs;
+
+	/* Attach RNDIS */
+	error = hn_rndis_attach(hv);
+	if (error)
+		goto failed_rndis;
+
+	/*
+	 * NOTE:
+	 * Under certain conditions on certain versions of Hyper-V,
+	 * the RNDIS rxfilter is _not_ zero on the hypervisor side
+	 * after the successful RNDIS initialization.
+	 */
+	hn_rndis_set_rxfilter(hv, NDIS_PACKET_TYPE_NONE);
+	return 0;
+failed_rndis:
+	hn_nvs_detach(hv);
+failed_nvs:
+	return error;
+}
+
+static void
+hn_detach(struct hn_data *hv)
+{
+	hn_nvs_detach(hv);
+	hn_rndis_detach(hv);
+}
+
+static int
+eth_hn_dev_init(struct rte_eth_dev *eth_dev)
+{
+	struct hn_data *hv = eth_dev->data->dev_private;
+	struct rte_device *device = eth_dev->device;
+	struct rte_vmbus_device *vmbus;
+	unsigned int rxr_cnt;
+	int err;
+
+	PMD_INIT_FUNC_TRACE();
+
+	vmbus = container_of(device, struct rte_vmbus_device, device);
+	eth_dev->dev_ops = &hn_eth_dev_ops;
+	eth_dev->tx_pkt_burst = &hn_xmit_pkts;
+	eth_dev->rx_pkt_burst = &hn_recv_pkts;
+
+	/*
+	 * for secondary processes, we don't initialize any further as primary
+	 * has already done this work.
+	 */
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+		return 0;
+
+	/* Since Hyper-V only supports one MAC address, just use local data */
+	eth_dev->data->mac_addrs = &hv->mac_addr;
+
+	hv->vmbus = vmbus;
+	hv->rxbuf_res = &vmbus->resource[HV_RECV_BUF_MAP];
+	hv->chim_res  = &vmbus->resource[HV_SEND_BUF_MAP];
+	hv->port_id = eth_dev->data->port_id;
+
+	/* Initialize primary channel input for control operations */
+	err = rte_vmbus_chan_open(vmbus, &hv->channels[0]);
+	if (err)
+		return err;
+
+	hv->primary = hn_rx_queue_alloc(hv, 0,
+					eth_dev->device->numa_node);
+
+	if (!hv->primary)
+		return -ENOMEM;
+
+	err = hn_attach(hv, ETHER_MTU);
+	if  (err)
+		goto failed;
+
+	err = hn_tx_pool_init(eth_dev);
+	if (err)
+		goto failed;
+
+	err = hn_rndis_get_eaddr(hv, hv->mac_addr.addr_bytes);
+	if (err)
+		goto failed;
+
+	if (hn_rndis_query_rsscaps(hv, &rxr_cnt) == 0)
+		hv->max_queues = rxr_cnt;
+	else
+		hv->max_queues = 1;
+
+	return 0;
+
+failed:
+	hn_detach(hv);
+	return err;
+}
+
+static int
+eth_hn_dev_uninit(struct rte_eth_dev *eth_dev)
+{
+	struct hn_data *hv = eth_dev->data->dev_private;
+
+	PMD_INIT_FUNC_TRACE();
+
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+		return 0;
+
+	hn_dev_stop(eth_dev);
+	hn_dev_close(eth_dev);
+
+	eth_dev->dev_ops = NULL;
+	eth_dev->tx_pkt_burst = NULL;
+	eth_dev->rx_pkt_burst = NULL;
+
+	hn_detach(hv);
+	rte_vmbus_chan_close(hv->primary->chan);
+	rte_free(hv->primary);
+
+	eth_dev->data->mac_addrs = NULL;
+
+	return 0;
+}
+
+static int eth_hn_probe(struct rte_vmbus_driver *drv __rte_unused,
+			    struct rte_vmbus_device *dev)
+{
+	struct rte_eth_dev *eth_dev;
+	int ret;
+
+	eth_dev = eth_dev_vmbus_allocate(dev, sizeof(struct hn_data));
+	if (!eth_dev)
+		return -ENOMEM;
+
+	ret = eth_hn_dev_init(eth_dev);
+	if (ret)
+		eth_dev_vmbus_release(eth_dev);
+
+	return ret;
+}
+
+static int eth_hn_remove(struct rte_vmbus_device *dev)
+{
+	struct rte_eth_dev *eth_dev;
+	int ret;
+
+	eth_dev = rte_eth_dev_allocated(dev->device.name);
+	if (!eth_dev)
+		return -ENODEV;
+
+	ret = eth_hn_dev_uninit(eth_dev);
+	if (ret)
+		return ret;
+
+	eth_dev_vmbus_release(eth_dev);
+	return 0;
+}
+
+/* Network device GUID */
+static const uuid_t hn_net_ids[] = {
+	/*  f8615163-df3e-46c5-913f-f2d2f965ed0e */
+	{ 0xf8, 0x61, 0x51, 0x63, 0xdf, 0x3e, 0x46, 0xc5,
+	  0x91, 0x3f, 0xf2, 0xd2, 0xf9, 0x65, 0xed, 0xe },
+	{ 0 }
+};
+
+static struct rte_vmbus_driver rte_netvsc_pmd = {
+	.id_table = hn_net_ids,
+	.probe = eth_hn_probe,
+	.remove = eth_hn_remove,
+};
+
+RTE_PMD_REGISTER_VMBUS(net_netvsc, rte_netvsc_pmd);
+RTE_PMD_REGISTER_KMOD_DEP(net_netvsc, "* uio_hv_generic");
+
+RTE_INIT(hn_init_log);
+static void
+hn_init_log(void)
+{
+	hn_logtype_init = rte_log_register("pmd.net.netvsc.init");
+	if (hn_logtype_init >= 0)
+		rte_log_set_level(hn_logtype_init, RTE_LOG_NOTICE);
+	hn_logtype_driver = rte_log_register("pmd.net.netvsc.driver");
+	if (hn_logtype_driver >= 0)
+		rte_log_set_level(hn_logtype_driver, RTE_LOG_NOTICE);
+}
diff --git a/drivers/net/netvsc/hn_logs.h b/drivers/net/netvsc/hn_logs.h
new file mode 100644
index 000000000000..f821655b83a6
--- /dev/null
+++ b/drivers/net/netvsc/hn_logs.h
@@ -0,0 +1,35 @@
+/* SPDX-License-Identifier: BSD-3-Clause */
+
+#ifndef _HN_LOGS_H_
+#define _HN_LOGS_H_
+
+#include <rte_log.h>
+
+extern int hn_logtype_init;
+
+#define PMD_INIT_LOG(level, fmt, args...) \
+	rte_log(RTE_LOG_ ## level, hn_logtype_init, "%s(): " fmt "\n",\
+		__func__, ## args)
+#define PMD_INIT_FUNC_TRACE() PMD_INIT_LOG(DEBUG, " >>")
+
+#ifdef RTE_LIBRTE_NETVSC_DEBUG_RX
+#define PMD_RX_LOG(level, fmt, args...) \
+	RTE_LOG(level, PMD, "%s() rx: " fmt "\n", __func__, ## args)
+#else
+#define PMD_RX_LOG(level, fmt, args...) do { } while (0)
+#endif
+
+#ifdef RTE_LIBRTE_NETVSC_DEBUG_TX
+#define PMD_TX_LOG(level, fmt, args...) \
+	RTE_LOG(level, PMD, "%s() tx: " fmt "\n", __func__, ## args)
+#else
+#define PMD_TX_LOG(level, fmt, args...) do { } while (0)
+#endif
+
+
+extern int hn_logtype_driver;
+#define PMD_DRV_LOG(level, fmt, args...) \
+	rte_log(RTE_LOG_ ## level, hn_logtype_driver, "%s(): " fmt "\n", \
+		__func__, ## args)
+
+#endif /* _HN_LOGS_H_ */
diff --git a/drivers/net/netvsc/hn_nvs.c b/drivers/net/netvsc/hn_nvs.c
new file mode 100644
index 000000000000..588b2aadda01
--- /dev/null
+++ b/drivers/net/netvsc/hn_nvs.c
@@ -0,0 +1,533 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2018 Microsoft Corp.
+ * Copyright (c) 2010-2012 Citrix Inc.
+ * Copyright (c) 2012 NetApp Inc.
+ * All rights reserved.
+ */
+
+/*
+ * Network Virtualization Service.
+ */
+
+
+#include <stdint.h>
+#include <string.h>
+#include <stdio.h>
+#include <errno.h>
+#include <unistd.h>
+
+#include <rte_ethdev.h>
+#include <rte_string_fns.h>
+#include <rte_memzone.h>
+#include <rte_malloc.h>
+#include <rte_atomic.h>
+#include <rte_branch_prediction.h>
+#include <rte_ether.h>
+#include <rte_common.h>
+#include <rte_errno.h>
+#include <rte_cycles.h>
+#include <rte_memory.h>
+#include <rte_eal.h>
+#include <rte_dev.h>
+#include <rte_bus_vmbus.h>
+
+#include "hn_logs.h"
+#include "hn_var.h"
+#include "hn_nvs.h"
+
+static const uint32_t hn_nvs_version[] = {
+	NVS_VERSION_5,
+	NVS_VERSION_4,
+	NVS_VERSION_2,
+	NVS_VERSION_1
+};
+
+static int hn_nvs_req_send(struct hn_data *hv,
+			   void *req, uint32_t reqlen)
+{
+	return rte_vmbus_chan_send(hn_primary_chan(hv),
+				   VMBUS_CHANPKT_TYPE_INBAND,
+				   req, reqlen, 0,
+				   VMBUS_CHANPKT_FLAG_NONE, NULL);
+}
+
+static int
+hn_nvs_execute(struct hn_data *hv,
+	       void *req, uint32_t reqlen,
+	       void *resp, uint32_t resplen,
+	       uint32_t type)
+{
+	struct vmbus_channel *chan = hn_primary_chan(hv);
+	char buffer[NVS_RESPSIZE_MAX];
+	const struct hn_nvs_hdr *hdr;
+	uint32_t len;
+	int ret;
+
+	/* Send request to ring buffer */
+	ret = rte_vmbus_chan_send(chan, VMBUS_CHANPKT_TYPE_INBAND,
+				  req, reqlen, 0,
+				  VMBUS_CHANPKT_FLAG_RC, NULL);
+
+	if (ret) {
+		PMD_DRV_LOG(ERR, "send request failed: %d", ret);
+		return ret;
+	}
+
+ retry:
+	len = sizeof(buffer);
+	ret = rte_vmbus_chan_recv(chan, buffer, &len, NULL);
+	if (ret == -EAGAIN) {
+		rte_delay_us(HN_CHAN_INTERVAL_US);
+		goto retry;
+	}
+
+	if (ret < 0) {
+		PMD_DRV_LOG(ERR, "recv response failed: %d", ret);
+		return ret;
+	}
+
+	hdr = (struct hn_nvs_hdr *)buffer;
+	if (hdr->type != type) {
+		PMD_DRV_LOG(ERR, "unexpected NVS resp %#x, expect %#x",
+			    hdr->type, type);
+		return -EINVAL;
+	}
+
+	if (len < resplen) {
+		PMD_DRV_LOG(ERR,
+			    "invalid NVS resp len %u (expect %u)",
+			    len, resplen);
+		return -EINVAL;
+	}
+
+	memcpy(resp, buffer, resplen);
+
+	/* All pass! */
+	return 0;
+}
+
+static int
+hn_nvs_doinit(struct hn_data *hv, uint32_t nvs_ver)
+{
+	struct hn_nvs_init init;
+	struct hn_nvs_init_resp resp;
+	uint32_t status;
+	int error;
+
+	memset(&init, 0, sizeof(init));
+	init.type = NVS_TYPE_INIT;
+	init.ver_min = nvs_ver;
+	init.ver_max = nvs_ver;
+
+	error = hn_nvs_execute(hv, &init, sizeof(init),
+			       &resp, sizeof(resp),
+			       NVS_TYPE_INIT_RESP);
+	if (error)
+		return error;
+
+	status = resp.status;
+	if (status != NVS_STATUS_OK) {
+		/* Not fatal, try other versions */
+		PMD_INIT_LOG(DEBUG, "nvs init failed for ver 0x%x",
+			     nvs_ver);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int
+hn_nvs_conn_rxbuf(struct hn_data *hv)
+{
+	struct hn_nvs_rxbuf_conn conn;
+	struct hn_nvs_rxbuf_connresp resp;
+	uint32_t status;
+	int error;
+
+	/* Kernel has already setup RXBUF on primary channel. */
+
+	/*
+	 * Connect RXBUF to NVS.
+	 */
+	conn.type = NVS_TYPE_RXBUF_CONN;
+	conn.gpadl = hv->rxbuf_res->phys_addr;
+	conn.sig = NVS_RXBUF_SIG;
+	PMD_DRV_LOG(DEBUG, "connect rxbuff va=%p gpad=%#" PRIx64,
+		    hv->rxbuf_res->addr,
+		    hv->rxbuf_res->phys_addr);
+
+	error = hn_nvs_execute(hv, &conn, sizeof(conn),
+			       &resp, sizeof(resp),
+			       NVS_TYPE_RXBUF_CONNRESP);
+	if (error) {
+		PMD_DRV_LOG(ERR,
+			    "exec nvs rxbuf conn failed: %d",
+			    error);
+		goto cleanup;
+	}
+
+	status = resp.status;
+	if (status != NVS_STATUS_OK) {
+		PMD_DRV_LOG(ERR,
+			    "nvs rxbuf conn failed: %x", status);
+		error = -EIO;
+	} else if (resp.nsect != 1) {
+		PMD_DRV_LOG(ERR,
+			    "nvs rxbuf response num sections %u != 1",
+			    resp.nsect);
+		error = -EIO;
+	} else {
+		PMD_DRV_LOG(INFO,
+			    "receive buffer size %u count %u",
+			    resp.nvs_sect[0].slotsz,
+			    resp.nvs_sect[0].slotcnt);
+		hv->rxbuf_section_cnt = resp.nvs_sect[0].slotcnt;
+	}
+
+cleanup:
+	return error;
+}
+
+static void
+hn_nvs_disconn_rxbuf(struct hn_data *hv)
+{
+	struct hn_nvs_rxbuf_disconn disconn;
+	int error;
+
+	/*
+	 * Disconnect RXBUF from NVS.
+	 */
+	memset(&disconn, 0, sizeof(disconn));
+	disconn.type = NVS_TYPE_RXBUF_DISCONN;
+	disconn.sig = NVS_RXBUF_SIG;
+
+	/* NOTE: No response. */
+	error = hn_nvs_req_send(hv, &disconn, sizeof(disconn));
+	if (error) {
+		PMD_DRV_LOG(ERR,
+			    "send nvs rxbuf disconn failed: %d",
+			    error);
+	}
+
+	/*
+	 * Linger long enough for NVS to disconnect RXBUF.
+	 */
+	rte_delay_ms(200);
+}
+
+static void
+hn_nvs_disconn_chim(struct hn_data *hv)
+{
+	int error;
+
+	if (hv->chim_cnt != 0) {
+		struct hn_nvs_chim_disconn disconn;
+
+		/* Disconnect chimney sending buffer from NVS. */
+		memset(&disconn, 0, sizeof(disconn));
+		disconn.type = NVS_TYPE_CHIM_DISCONN;
+		disconn.sig = NVS_CHIM_SIG;
+
+		/* NOTE: No response. */
+		error = hn_nvs_req_send(hv, &disconn, sizeof(disconn));
+
+		if (error) {
+			PMD_DRV_LOG(ERR,
+				    "send nvs chim disconn failed: %d", error);
+		}
+
+		hv->chim_cnt = 0;
+		/*
+		 * Linger long enough for NVS to disconnect chimney
+		 * sending buffer.
+		 */
+		rte_delay_ms(200);
+	}
+}
+
+static int
+hn_nvs_conn_chim(struct hn_data *hv)
+{
+	struct hn_nvs_chim_conn chim;
+	struct hn_nvs_chim_connresp resp;
+	uint32_t sectsz;
+	unsigned long len = hv->chim_res->len;
+	int error;
+
+	/* Connect chimney sending buffer to NVS */
+	memset(&chim, 0, sizeof(chim));
+	chim.type = NVS_TYPE_CHIM_CONN;
+	chim.gpadl = hv->chim_res->phys_addr;
+	chim.sig = NVS_CHIM_SIG;
+	PMD_DRV_LOG(DEBUG, "connect send buf va=%p gpad=%#" PRIx64,
+		    hv->chim_res->addr,
+		    hv->chim_res->phys_addr);
+
+	error = hn_nvs_execute(hv, &chim, sizeof(chim),
+			       &resp, sizeof(resp),
+			       NVS_TYPE_CHIM_CONNRESP);
+	if (error) {
+		PMD_DRV_LOG(ERR, "exec nvs chim conn failed");
+		goto cleanup;
+	}
+
+	if (resp.status != NVS_STATUS_OK) {
+		PMD_DRV_LOG(ERR, "nvs chim conn failed: %x",
+			    resp.status);
+		error = -EIO;
+		goto cleanup;
+	}
+
+	sectsz = resp.sectsz;
+	if (sectsz == 0 || sectsz & (sizeof(uint32_t) - 1)) {
+		/* Can't use chimney sending buffer; done! */
+		PMD_DRV_LOG(NOTICE,
+			    "invalid chimney sending buffer section size: %u",
+			    sectsz);
+		return 0;
+	}
+
+	hv->chim_szmax = sectsz;
+	hv->chim_cnt = len / sectsz;
+
+	PMD_DRV_LOG(INFO, "send buffer %lu section size:%u, count:%u",
+		    len, hv->chim_szmax, hv->chim_cnt);
+
+	if (len % hv->chim_szmax != 0) {
+		PMD_DRV_LOG(NOTICE,
+			    "chimney sending sections are not properly aligned");
+	}
+
+	/* Done! */
+	return 0;
+
+cleanup:
+	hn_nvs_disconn_chim(hv);
+	return error;
+}
+
+/*
+ * Configure MTU and enable VLAN.
+ */
+static int
+hn_nvs_conf_ndis(struct hn_data *hv, unsigned int mtu)
+{
+	struct hn_nvs_ndis_conf conf;
+	int error;
+
+	memset(&conf, 0, sizeof(conf));
+	conf.type = NVS_TYPE_NDIS_CONF;
+	conf.mtu = mtu;
+	conf.caps = NVS_NDIS_CONF_VLAN;
+
+	/* TODO enable SRIOV */
+	//if (hv->nvs_ver >= NVS_VERSION_5)
+	//	conf.caps |= NVS_NDIS_CONF_SRIOV;
+
+	/* NOTE: No response. */
+	error = hn_nvs_req_send(hv, &conf, sizeof(conf));
+	if (error) {
+		PMD_DRV_LOG(ERR,
+			    "send nvs ndis conf failed: %d", error);
+		return error;
+	}
+
+	return 0;
+}
+
+static int
+hn_nvs_init_ndis(struct hn_data *hv)
+{
+	struct hn_nvs_ndis_init ndis;
+	int error;
+
+	memset(&ndis, 0, sizeof(ndis));
+	ndis.type = NVS_TYPE_NDIS_INIT;
+	ndis.ndis_major = NDIS_VERSION_MAJOR(hv->ndis_ver);
+	ndis.ndis_minor = NDIS_VERSION_MINOR(hv->ndis_ver);
+
+	/* NOTE: No response. */
+	error = hn_nvs_req_send(hv, &ndis, sizeof(ndis));
+	if (error)
+		PMD_DRV_LOG(ERR,
+			    "send nvs ndis init failed: %d", error);
+
+	return error;
+}
+
+static int
+hn_nvs_init(struct hn_data *hv)
+{
+	unsigned int i;
+	int error;
+
+	/*
+	 * Find the supported NVS version and set NDIS version accordingly.
+	 */
+	for (i = 0; i < RTE_DIM(hn_nvs_version); ++i) {
+		error = hn_nvs_doinit(hv, hn_nvs_version[i]);
+		if (error) {
+			PMD_INIT_LOG(DEBUG, "version %#x error %d",
+				     hn_nvs_version[i], error);
+			continue;
+		}
+
+		hv->nvs_ver = hn_nvs_version[i];
+
+		/* Set NDIS version according to NVS version. */
+		hv->ndis_ver = NDIS_VERSION_6_30;
+		if (hv->nvs_ver <= NVS_VERSION_4)
+			hv->ndis_ver = NDIS_VERSION_6_1;
+
+		PMD_INIT_LOG(DEBUG,
+			     "NVS version %#x, NDIS version %u.%u",
+			     hv->nvs_ver, NDIS_VERSION_MAJOR(hv->ndis_ver),
+			     NDIS_VERSION_MINOR(hv->ndis_ver));
+		return 0;
+	}
+
+	PMD_DRV_LOG(ERR,
+		    "no NVS compatiable version available");
+	return -ENXIO;
+}
+
+int
+hn_nvs_attach(struct hn_data *hv, unsigned int mtu)
+{
+	int error;
+
+	/*
+	 * Initialize NVS.
+	 */
+	error = hn_nvs_init(hv);
+	if (error)
+		return error;
+
+	/** Configure NDIS before initializing it. */
+	if (hv->nvs_ver >= NVS_VERSION_2) {
+		error = hn_nvs_conf_ndis(hv, mtu);
+		if (error)
+			return error;
+	}
+
+	/*
+	 * Initialize NDIS.
+	 */
+	error = hn_nvs_init_ndis(hv);
+	if (error)
+		return error;
+
+	/*
+	 * Connect RXBUF.
+	 */
+	error = hn_nvs_conn_rxbuf(hv);
+	if (error)
+		return error;
+
+	/*
+	 * Connect chimney sending buffer.
+	 */
+	error = hn_nvs_conn_chim(hv);
+	if (error) {
+		hn_nvs_disconn_rxbuf(hv);
+		return error;
+	}
+
+	return 0;
+}
+
+void
+hn_nvs_detach(struct hn_data *hv __rte_unused)
+{
+	PMD_INIT_FUNC_TRACE();
+
+	/* NOTE: there are no requests to stop the NVS. */
+	hn_nvs_disconn_rxbuf(hv);
+	hn_nvs_disconn_chim(hv);
+}
+
+/*
+ * Ack the consumed RXBUF associated w/ this channel packet,
+ * so that this RXBUF can be recycled by the hypervisor.
+ */
+void
+hn_nvs_ack_rxbuf(struct hn_rx_queue *rxq, uint64_t tid)
+{
+	unsigned int retries = 0;
+	struct hn_nvs_rndis_ack ack = {
+		.type = NVS_TYPE_RNDIS_ACK,
+		.status = NVS_STATUS_OK,
+	};
+	int error;
+
+ again:
+	error = rte_vmbus_chan_send(rxq->chan, VMBUS_CHANPKT_TYPE_COMP,
+				    &ack, sizeof(ack), tid,
+				    VMBUS_CHANPKT_FLAG_NONE, NULL);
+
+	if (error == 0)
+		return;
+
+	if (error == -EAGAIN) {
+		/*
+		 * NOTE:
+		 * This should _not_ happen in real world, since the
+		 * consumption of the TX bufring from the TX path is
+		 * controlled.
+		 */
+		PMD_DRV_LOG(DEBUG, "RXBUF ack retry");
+		if (++retries < 10) {
+			rte_delay_ms(1);
+			goto again;
+		}
+	}
+	/* RXBUF leaks! */
+	PMD_DRV_LOG(ERR, "RXBUF ack failed");
+}
+
+int
+hn_nvs_alloc_subchans(struct hn_data *hv, uint32_t *nsubch)
+{
+	struct hn_nvs_subch_req req;
+	struct hn_nvs_subch_resp resp;
+	int error;
+
+	memset(&req, 0, sizeof(req));
+	req.type = NVS_TYPE_SUBCH_REQ;
+	req.op = NVS_SUBCH_OP_ALLOC;
+	req.nsubch = *nsubch;
+
+	error = hn_nvs_execute(hv, &req, sizeof(req),
+			       &resp, sizeof(resp),
+			       NVS_TYPE_SUBCH_RESP);
+	if (error)
+		return error;
+
+	if (resp.status != NVS_STATUS_OK) {
+		PMD_INIT_LOG(ERR,
+			     "nvs subch alloc failed: %#x",
+			     resp.status);
+		return -EIO;
+	}
+
+	if (resp.nsubch > *nsubch) {
+		PMD_INIT_LOG(NOTICE,
+			     "%u subchans are allocated, requested %u",
+			     resp.nsubch, *nsubch);
+	}
+	*nsubch = resp.nsubch;
+
+	return 0;
+}
+
+void
+hn_nvs_set_datapath(struct hn_data *hv, uint32_t path)
+{
+	struct hn_nvs_datapath dp;
+
+	memset(&dp, 0, sizeof(dp));
+	dp.type = NVS_TYPE_SET_DATAPATH;
+	dp.active_path = path;
+
+	hn_nvs_req_send(hv, &dp, sizeof(dp));
+}
diff --git a/drivers/net/netvsc/hn_nvs.h b/drivers/net/netvsc/hn_nvs.h
new file mode 100644
index 000000000000..8d59b39adbe8
--- /dev/null
+++ b/drivers/net/netvsc/hn_nvs.h
@@ -0,0 +1,243 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2018 Microsoft Corp.
+ * All rights reserved.
+ */
+
+/*
+ * The indirection table message is the largest message
+ * received from host, and that is 112 bytes.
+ */
+#define NVS_RESPSIZE_MAX	256
+
+/*
+ * NDIS protocol version numbers
+ */
+#define NDIS_VERSION_6_1		0x00060001
+#define NDIS_VERSION_6_20		0x00060014
+#define NDIS_VERSION_6_30		0x0006001e
+#define NDIS_VERSION_MAJOR(ver)	(((ver) & 0xffff0000) >> 16)
+#define NDIS_VERSION_MINOR(ver)	((ver) & 0xffff)
+
+/*
+ * NVS versions.
+ */
+#define NVS_VERSION_1		0x00002
+#define NVS_VERSION_2		0x30002
+#define NVS_VERSION_4		0x40000
+#define NVS_VERSION_5		0x50000
+
+#define NVS_RXBUF_SIG		0xcafe
+#define NVS_CHIM_SIG			0xface
+
+#define NVS_CHIM_IDX_INVALID		0xffffffff
+
+#define NVS_RNDIS_MTYPE_DATA		0
+#define NVS_RNDIS_MTYPE_CTRL		1
+
+/*
+ * NVS message transacion status codes.
+ */
+#define NVS_STATUS_OK		1
+#define NVS_STATUS_FAILED		2
+
+/*
+ * NVS request/response message types.
+ */
+#define NVS_TYPE_INIT		1
+#define NVS_TYPE_INIT_RESP	2
+
+#define NVS_TYPE_NDIS_INIT	100
+#define NVS_TYPE_RXBUF_CONN	101
+#define NVS_TYPE_RXBUF_CONNRESP	102
+#define NVS_TYPE_RXBUF_DISCONN	103
+#define NVS_TYPE_CHIM_CONN	104
+#define NVS_TYPE_CHIM_CONNRESP	105
+#define NVS_TYPE_CHIM_DISCONN	106
+#define NVS_TYPE_RNDIS		107
+#define NVS_TYPE_RNDIS_ACK	108
+
+#define NVS_TYPE_NDIS_CONF	125
+#define NVS_TYPE_VFASSOC_NOTE	128	/* notification */
+#define NVS_TYPE_SET_DATAPATH	129
+#define NVS_TYPE_SUBCH_REQ	133
+#define NVS_TYPE_SUBCH_RESP	133	/* same as SUBCH_REQ */
+#define NVS_TYPE_TXTBL_NOTE	134	/* notification */
+
+
+/* NVS message common header */
+struct hn_nvs_hdr {
+	uint32_t	type;
+} __rte_packed;
+
+struct hn_nvs_init {
+	uint32_t	type;	/* NVS_TYPE_INIT */
+	uint32_t	ver_min;
+	uint32_t	ver_max;
+	uint8_t		rsvd[20];
+} __rte_packed;
+
+struct hn_nvs_init_resp {
+	uint32_t	type;	/* NVS_TYPE_INIT_RESP */
+	uint32_t	ver;	/* deprecated */
+	uint32_t	rsvd;
+	uint32_t	status;	/* NVS_STATUS_ */
+} __rte_packed;
+
+/* No response */
+struct hn_nvs_ndis_conf {
+	uint32_t	type;	/* NVS_TYPE_NDIS_CONF */
+	uint32_t	mtu;
+	uint32_t	rsvd;
+	uint64_t	caps;	/* NVS_NDIS_CONF_ */
+	uint8_t		rsvd1[12];
+} __rte_packed;
+
+#define NVS_NDIS_CONF_SRIOV		0x0004
+#define NVS_NDIS_CONF_VLAN		0x0008
+
+/* No response */
+struct hn_nvs_ndis_init {
+	uint32_t	type;	/* NVS_TYPE_NDIS_INIT */
+	uint32_t	ndis_major;	/* NDIS_VERSION_MAJOR_ */
+	uint32_t	ndis_minor;	/* NDIS_VERSION_MINOR_ */
+	uint8_t		rsvd[20];
+} __rte_packed;
+
+#define NVS_DATAPATH_SYNTHETIC	0
+#define NVS_DATAPATH_VF		1
+
+/* No response */
+struct hn_nvs_datapath {
+	uint32_t	type;	/* NVS_TYPE_SET_DATAPATH */
+	uint32_t	active_path;/* NVS_DATAPATH_* */
+	uint32_t	rsvd[6];
+} __rte_packed;
+
+struct hn_nvs_rxbuf_conn {
+	uint32_t	type;	/* NVS_TYPE_RXBUF_CONN */
+	uint32_t	gpadl;	/* RXBUF vmbus GPADL */
+	uint16_t	sig;	/* NVS_RXBUF_SIG */
+	uint8_t		rsvd[22];
+} __rte_packed;
+
+struct hn_nvs_rxbuf_sect {
+	uint32_t	start;
+	uint32_t	slotsz;
+	uint32_t	slotcnt;
+	uint32_t	end;
+} __rte_packed;
+
+struct hn_nvs_rxbuf_connresp {
+	uint32_t	type;	/* NVS_TYPE_RXBUF_CONNRESP */
+	uint32_t	status;	/* NVS_STATUS_ */
+	uint32_t	nsect;	/* # of elem in nvs_sect */
+	struct hn_nvs_rxbuf_sect nvs_sect[1];
+} __rte_packed;
+
+/* No response */
+struct hn_nvs_rxbuf_disconn {
+	uint32_t	type;	/* NVS_TYPE_RXBUF_DISCONN */
+	uint16_t	sig;	/* NVS_RXBUF_SIG */
+	uint8_t		rsvd[26];
+} __rte_packed;
+
+struct hn_nvs_chim_conn {
+	uint32_t	type;	/* NVS_TYPE_CHIM_CONN */
+	uint32_t	gpadl;	/* chimney buf vmbus GPADL */
+	uint16_t	sig;	/* NDIS_NVS_CHIM_SIG */
+	uint8_t		rsvd[22];
+} __rte_packed;
+
+struct hn_nvs_chim_connresp {
+	uint32_t	type;	/* NVS_TYPE_CHIM_CONNRESP */
+	uint32_t	status;	/* NVS_STATUS_ */
+	uint32_t	sectsz;	/* section size */
+} __rte_packed;
+
+/* No response */
+struct hn_nvs_chim_disconn {
+	uint32_t	type;	/* NVS_TYPE_CHIM_DISCONN */
+	uint16_t	sig;	/* NVS_CHIM_SIG */
+	uint8_t		rsvd[26];
+} __rte_packed;
+
+#define NVS_SUBCH_OP_ALLOC		1
+
+struct hn_nvs_subch_req {
+	uint32_t	type;	/* NVS_TYPE_SUBCH_REQ */
+	uint32_t	op;	/* NVS_SUBCH_OP_ */
+	uint32_t	nsubch;
+	uint8_t		rsvd[20];
+} __rte_packed;
+
+struct hn_nvs_subch_resp {
+	uint32_t	type;	/* NVS_TYPE_SUBCH_RESP */
+	uint32_t	status;	/* NVS_STATUS_ */
+	uint32_t	nsubch;
+	uint8_t		rsvd[20];
+} __rte_packed;
+
+struct hn_nvs_rndis {
+	uint32_t	type;	/* NVS_TYPE_RNDIS */
+	uint32_t	rndis_mtype;/* NVS_RNDIS_MTYPE_ */
+	/*
+	 * Chimney sending buffer index and size.
+	 *
+	 * NOTE:
+	 * If nvs_chim_idx is set to NVS_CHIM_IDX_INVALID
+	 * and nvs_chim_sz is set to 0, then chimney sending
+	 * buffer is _not_ used by this RNDIS message.
+	 */
+	uint32_t	chim_idx;
+	uint32_t	chim_sz;
+	uint8_t		rsvd[16];
+} __rte_packed;
+
+struct hn_nvs_rndis_ack {
+	uint32_t	type;	/* NVS_TYPE_RNDIS_ACK */
+	uint32_t	status;	/* NVS_STATUS_ */
+	uint8_t		rsvd[24];
+} __rte_packed;
+
+
+int	hn_nvs_attach(struct hn_data *hv, unsigned int mtu);
+void	hn_nvs_detach(struct hn_data *hv);
+void	hn_nvs_ack_rxbuf(struct hn_rx_queue *rxq, uint64_t tid);
+int	hn_nvs_alloc_subchans(struct hn_data *hv, uint32_t *nsubch);
+void	hn_nvs_set_datapath(struct hn_data *hv, uint32_t path);
+
+static inline int
+hn_nvs_send(struct vmbus_channel *chan, uint16_t flags,
+	    void *nvs_msg, int nvs_msglen, uintptr_t sndc,
+	    bool *need_sig)
+{
+	return rte_vmbus_chan_send(chan, VMBUS_CHANPKT_TYPE_INBAND,
+				   nvs_msg, nvs_msglen, (uint64_t)sndc,
+				   flags, need_sig);
+}
+
+static inline int
+hn_nvs_send_sglist(struct vmbus_channel *chan,
+		   struct vmbus_gpa sg[], int sglen,
+		   void *nvs_msg, int nvs_msglen,
+		   uintptr_t sndc, bool *need_sig)
+{
+	return rte_vmbus_chan_send_sglist(chan, sg, sglen, nvs_msg, nvs_msglen,
+					  (uint64_t)sndc, need_sig);
+}
+
+static inline int
+hn_nvs_send_rndis_sglist(struct vmbus_channel *chan, uint32_t rndis_mtype,
+			 uintptr_t sndc, struct vmbus_gpa sg[], int sgcnt,
+			 bool *need_sig)
+{
+	struct hn_nvs_rndis rndis = {
+		.type = NVS_TYPE_RNDIS,
+		.rndis_mtype = rndis_mtype,
+		.chim_idx = NVS_CHIM_IDX_INVALID,
+		.chim_sz = 0,
+	};
+
+	return hn_nvs_send_sglist(chan, sg, sgcnt,
+				  &rndis, sizeof(rndis), sndc, need_sig);
+}
diff --git a/drivers/net/netvsc/hn_rndis.c b/drivers/net/netvsc/hn_rndis.c
new file mode 100644
index 000000000000..31e3213fc1ae
--- /dev/null
+++ b/drivers/net/netvsc/hn_rndis.c
@@ -0,0 +1,1101 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2009-2018 Microsoft Corp.
+ * Copyright (c) 2010-2012 Citrix Inc.
+ * Copyright (c) 2012 NetApp Inc.
+ * All rights reserved.
+ */
+
+#include <stdint.h>
+#include <string.h>
+#include <stdio.h>
+#include <errno.h>
+#include <unistd.h>
+
+#include <rte_ethdev.h>
+#include <rte_string_fns.h>
+#include <rte_memzone.h>
+#include <rte_malloc.h>
+#include <rte_atomic.h>
+#include <rte_branch_prediction.h>
+#include <rte_ether.h>
+#include <rte_common.h>
+#include <rte_errno.h>
+#include <rte_cycles.h>
+#include <rte_memory.h>
+#include <rte_eal.h>
+#include <rte_dev.h>
+#include <rte_atomic.h>
+#include <rte_memory.h>
+#include <rte_bus_vmbus.h>
+
+#include "hn_logs.h"
+#include "hn_var.h"
+#include "hn_nvs.h"
+#include "hn_rndis.h"
+#include "ndis.h"
+
+#define HN_RNDIS_XFER_SIZE		0x4000
+
+#define HN_NDIS_TXCSUM_CAP_IP4		\
+	(NDIS_TXCSUM_CAP_IP4 | NDIS_TXCSUM_CAP_IP4OPT)
+#define HN_NDIS_TXCSUM_CAP_TCP4		\
+	(NDIS_TXCSUM_CAP_TCP4 | NDIS_TXCSUM_CAP_TCP4OPT)
+#define HN_NDIS_TXCSUM_CAP_TCP6		\
+	(NDIS_TXCSUM_CAP_TCP6 | NDIS_TXCSUM_CAP_TCP6OPT | \
+	 NDIS_TXCSUM_CAP_IP6EXT)
+#define HN_NDIS_TXCSUM_CAP_UDP6		\
+	(NDIS_TXCSUM_CAP_UDP6 | NDIS_TXCSUM_CAP_IP6EXT)
+#define HN_NDIS_LSOV2_CAP_IP6		\
+	(NDIS_LSOV2_CAP_IP6EXT | NDIS_LSOV2_CAP_TCP6OPT)
+
+/* Get unique request id */
+static inline uint32_t
+hn_rndis_rid(struct hn_data *hv)
+{
+	uint32_t rid;
+
+	do {
+		rid = rte_atomic32_add_return(&hv->rndis_req_id, 1);
+	} while (rid == 0);
+
+	return rid;
+}
+
+static void *hn_rndis_alloc(struct hn_data *hv, size_t size)
+{
+	return rte_zmalloc_socket("RNDIS", size, PAGE_SIZE,
+				 hv->vmbus->device.numa_node);
+}
+
+#ifdef RTE_LIBRTE_NETVSC_DEBUG_DUMP
+static void hn_rndis_dump(const void *buf)
+{
+	const union {
+		struct rndis_msghdr hdr;
+		struct rndis_packet_msg pkt;
+		struct rndis_init_req init_request;
+		struct rndis_init_comp init_complete;
+		struct rndis_halt_req halt;
+		struct rndis_query_req query_request;
+		struct rndis_query_comp query_complete;
+		struct rndis_set_req set_request;
+		struct rndis_set_comp set_complete;
+		struct rndis_reset_req reset_request;
+		struct rndis_reset_comp reset_complete;
+		struct rndis_keepalive_req keepalive_request;
+		struct rndis_keepalive_comp keepalive_complete;
+		struct rndis_status_msg indicate_status;
+	} *rndis_msg = buf;
+
+	switch (rndis_msg->hdr.type) {
+	case RNDIS_PACKET_MSG: {
+		const struct rndis_pktinfo *ppi;
+		unsigned int ppi_len;
+
+		RTE_LOG(DEBUG, PMD,
+			    "RNDIS_MSG_PACKET (len %u, data %u:%u, # oob %u %u:%u, pkt %u:%u)\n",
+			    rndis_msg->pkt.len,
+			    rndis_msg->pkt.dataoffset,
+			    rndis_msg->pkt.datalen,
+			    rndis_msg->pkt.oobdataelements,
+			    rndis_msg->pkt.oobdataoffset,
+			    rndis_msg->pkt.oobdatalen,
+			    rndis_msg->pkt.pktinfooffset,
+			    rndis_msg->pkt.pktinfolen);
+
+		ppi = (const struct rndis_pktinfo *)
+			((const char *)buf
+			 + RNDIS_PACKET_MSG_OFFSET_ABS(rndis_msg->pkt.pktinfooffset));
+
+		ppi_len = rndis_msg->pkt.pktinfolen;
+		while (ppi_len > 0) {
+			const void *ppi_data;
+
+			ppi_data = ppi->data;
+
+			RTE_LOG(DEBUG, PMD,
+				"    PPI (size %u, type %u, offs %u data %#x)\n",
+				ppi->size, ppi->type, ppi->offset,
+				*(const uint32_t *)ppi_data);
+			if (ppi->size == 0)
+				break;
+			ppi_len -= ppi->size;
+			ppi = (const struct rndis_pktinfo *)
+				((const char *)ppi + ppi->size);
+		}
+		break;
+	}
+	case RNDIS_INITIALIZE_MSG:
+		RTE_LOG(DEBUG, PMD,
+			    "RNDIS_MSG_INIT (len %u id %#x, ver %u.%u max xfer %u)\n",
+			    rndis_msg->init_request.len,
+			    rndis_msg->init_request.rid,
+			    rndis_msg->init_request.ver_major,
+			    rndis_msg->init_request.ver_minor,
+			    rndis_msg->init_request.max_xfersz);
+		break;
+
+	case RNDIS_INITIALIZE_CMPLT:
+		RTE_LOG(DEBUG, PMD,
+			    "RNDIS_MSG_INIT_C (len %u, id %#x, status 0x%x, vers %u.%u, "
+			    "flags %d, max xfer %u, max pkts %u, aligned %u)\n",
+			    rndis_msg->init_complete.len,
+			    rndis_msg->init_complete.rid,
+			    rndis_msg->init_complete.status,
+			    rndis_msg->init_complete.ver_major,
+			    rndis_msg->init_complete.ver_minor,
+			    rndis_msg->init_complete.devflags,
+			    rndis_msg->init_complete.pktmaxsz,
+			    rndis_msg->init_complete.pktmaxcnt,
+			    rndis_msg->init_complete.align);
+		break;
+
+	case RNDIS_HALT_MSG:
+		RTE_LOG(DEBUG, PMD,
+			    "RNDIS_HALT (len %u id %#x)\n",
+			    rndis_msg->halt.len, rndis_msg->halt.rid);
+		break;
+
+	case RNDIS_QUERY_MSG:
+		RTE_LOG(DEBUG, PMD,
+			    "RNDIS_QUERY (len %u, id %#x, oid %#x, info %u:%u)\n",
+			    rndis_msg->query_request.len,
+			    rndis_msg->query_request.rid,
+			    rndis_msg->query_request.oid,
+			    rndis_msg->query_request.infobuflen,
+			    rndis_msg->query_request.infobufoffset);
+		break;
+
+
+	case RNDIS_QUERY_CMPLT:
+		RTE_LOG(DEBUG, PMD,
+			    "RNDIS_MSG_QUERY_C (len %u, id %#x, status 0x%x, buf %u:%u)\n",
+			    rndis_msg->query_complete.len,
+			    rndis_msg->query_complete.rid,
+			    rndis_msg->query_complete.status,
+			    rndis_msg->query_complete.infobuflen,
+			    rndis_msg->query_complete.infobufoffset);
+		break;
+
+	case RNDIS_SET_MSG:
+		RTE_LOG(DEBUG, PMD,
+			    "RNDIS_SET (len %u, id %#x, oid %#x, info %u:%u)\n",
+			    rndis_msg->set_request.len,
+			    rndis_msg->set_request.rid,
+			    rndis_msg->set_request.oid,
+			    rndis_msg->set_request.infobuflen,
+			    rndis_msg->set_request.infobufoffset);
+		break;
+
+	case RNDIS_SET_CMPLT:
+		RTE_LOG(DEBUG, PMD,
+			    "RNDIS_MSG_SET_C (len %u, id 0x%x, status 0x%x)\n",
+			    rndis_msg->set_complete.len,
+			    rndis_msg->set_complete.rid,
+			    rndis_msg->set_complete.status);
+		break;
+
+	case RNDIS_INDICATE_STATUS_MSG:
+		RTE_LOG(DEBUG, PMD,
+			    "RNDIS_MSG_INDICATE (len %u, status %#x, buf len %u, buf offset %u)\n",
+			    rndis_msg->indicate_status.len,
+			    rndis_msg->indicate_status.status,
+			    rndis_msg->indicate_status.stbuflen,
+			    rndis_msg->indicate_status.stbufoffset);
+		break;
+
+	case RNDIS_RESET_MSG:
+		RTE_LOG(DEBUG, PMD,
+			    "RNDIS_RESET (len %u, id %#x)\n",
+			    rndis_msg->reset_request.len,
+			    rndis_msg->reset_request.rid);
+		break;
+
+	case RNDIS_RESET_CMPLT:
+		RTE_LOG(DEBUG, PMD,
+			    "RNDIS_RESET_C (len %u, status %#x address %#x)\n",
+			    rndis_msg->reset_complete.len,
+			    rndis_msg->reset_complete.status,
+			    rndis_msg->reset_complete.adrreset);
+		break;
+
+	case RNDIS_KEEPALIVE_MSG:
+		RTE_LOG(DEBUG, PMD,
+			    "RNDIS_KEEPALIVE (len %u, id %#x)\n",
+			    rndis_msg->keepalive_request.len,
+			    rndis_msg->keepalive_request.rid);
+		break;
+
+	case RNDIS_KEEPALIVE_CMPLT:
+		RTE_LOG(DEBUG, PMD,
+			    "RNDIS_KEEPALIVE_C (len %u, id %#x address %#x)\n",
+			    rndis_msg->keepalive_complete.len,
+			    rndis_msg->keepalive_complete.rid,
+			    rndis_msg->keepalive_complete.status);
+		break;
+
+	default:
+		RTE_LOG(DEBUG, PMD,
+			    "RNDIS type %#x len %u\n",
+			    rndis_msg->hdr.type,
+			    rndis_msg->hdr.len);
+		break;
+	}
+}
+#else
+#define hn_rndis_dump(buf)
+#endif
+
+static int hn_nvs_send_rndis_ctrl(struct vmbus_channel *chan,
+				  const void *req, uint32_t reqlen)
+
+{
+	struct vmbus_gpa sg;
+	rte_iova_t addr;
+
+	addr = rte_malloc_virt2iova(req);
+	if (unlikely(addr == RTE_BAD_IOVA)) {
+		PMD_DRV_LOG(ERR, "RNDIS send request can not get phys addr");
+		return -EINVAL;
+	}
+
+	if (unlikely(reqlen > PAGE_SIZE)) {
+		PMD_DRV_LOG(ERR, "RNDIS request %u greater than page size",
+			    reqlen);
+		return -EINVAL;
+	}
+
+	sg.page = addr / PAGE_SIZE;
+	sg.ofs  = addr & PAGE_MASK;
+	sg.len  = reqlen;
+
+	if (sg.ofs + reqlen >  PAGE_SIZE) {
+		PMD_DRV_LOG(ERR, "RNDIS request crosses page bounary");
+		return -EINVAL;
+	}
+
+	hn_rndis_dump(req);
+
+	return hn_nvs_send_rndis_sglist(chan, NVS_RNDIS_MTYPE_CTRL,
+					0, &sg, 1, NULL);
+}
+
+void hn_rndis_link_status(struct hn_data *hv __rte_unused, const void *msg)
+{
+	const struct rndis_status_msg *indicate = msg;
+
+	hn_rndis_dump(msg);
+
+	PMD_DRV_LOG(DEBUG, "link status %#x", indicate->status);
+
+	switch (indicate->status) {
+	case RNDIS_STATUS_LINK_SPEED_CHANGE:
+	case RNDIS_STATUS_NETWORK_CHANGE:
+	case RNDIS_STATUS_TASK_OFFLOAD_CURRENT_CONFIG:
+		/* ignore not in DPDK API */
+		break;
+
+	case RNDIS_STATUS_MEDIA_CONNECT:
+	case RNDIS_STATUS_MEDIA_DISCONNECT:
+		/* TODO handle as LSC interrupt  */
+		break;
+	default:
+		PMD_DRV_LOG(NOTICE, "unknown RNDIS indication: %#x",
+			    indicate->status);
+	}
+}
+
+/* Callback from hn_process_events when response is visible */
+void hn_rndis_receive_response(struct hn_data *hv,
+			       const void *data, uint32_t len)
+{
+	const struct rndis_init_comp *hdr = data;
+
+	hn_rndis_dump(data);
+
+	if (len < sizeof(3 * sizeof(uint32_t))) {
+		PMD_DRV_LOG(ERR,
+			    "missing RNDIS header %u", len);
+		return;
+	}
+
+	if (len < hdr->len) {
+		PMD_DRV_LOG(ERR,
+			    "truncated RNDIS response %u", len);
+		return;
+	}
+
+	if  (len > sizeof(hv->rndis_resp)) {
+		PMD_DRV_LOG(NOTICE,
+			    "RNDIS response exceeds buffer");
+		len = sizeof(hv->rndis_resp);
+	}
+
+	if (hdr->rid == 0) {
+		PMD_DRV_LOG(NOTICE,
+			    "RNDIS response id zero!");
+	}
+
+	memcpy(hv->rndis_resp, data, len);
+
+	/* make sure response copied before update */
+	rte_smp_wmb();
+
+	if (rte_atomic32_cmpset(&hv->rndis_pending, hdr->rid, 0) == 0) {
+		PMD_DRV_LOG(ERR,
+			    "received id %#x pending id %#x",
+			    hdr->rid, (uint32_t)hv->rndis_pending);
+	}
+}
+
+/* Do request/response transaction */
+static int hn_rndis_exec1(struct hn_data *hv,
+			  const void *req, uint32_t reqlen,
+			  void *comp, uint32_t comp_len)
+{
+	const struct rndis_halt_req *hdr = req;
+	uint32_t rid = hdr->rid;
+	struct vmbus_channel *chan = hn_primary_chan(hv);
+	int error;
+
+	if (comp_len > sizeof(hv->rndis_resp)) {
+		PMD_DRV_LOG(ERR,
+			    "Expected completion size %u exceeds buffer %lu",
+			    comp_len, sizeof(hv->rndis_resp));
+		return -EIO;
+	}
+
+	if (comp != NULL &&
+	    rte_atomic32_cmpset(&hv->rndis_pending, 0, rid) == 0) {
+		PMD_DRV_LOG(ERR,
+			    "Request already pending");
+		return -EBUSY;
+	}
+
+	error = hn_nvs_send_rndis_ctrl(chan, req, reqlen);
+	if (error) {
+		PMD_DRV_LOG(ERR, "RNDIS ctrl send failed: %d", error);
+		return error;
+	}
+
+	if (comp) {
+		/* Poll primary channel until response received */
+		while (hv->rndis_pending == rid)
+			hn_process_events(hv, 0);
+
+		memcpy(comp, hv->rndis_resp, comp_len);
+	}
+
+	return 0;
+}
+
+/* Do transaction and validate response */
+static int hn_rndis_execute(struct hn_data *hv, uint32_t rid,
+			    const void *req, uint32_t reqlen,
+			    void *comp, uint32_t comp_len, uint32_t comp_type)
+{
+	const struct rndis_comp_hdr *hdr = comp;
+	int ret;
+
+	memset(comp, 0, comp_len);
+
+	ret = hn_rndis_exec1(hv, req, reqlen, comp, comp_len);
+	if (ret < 0)
+		return ret;
+	/*
+	 * Check this RNDIS complete message.
+	 */
+	if (unlikely(hdr->type != comp_type)) {
+		PMD_DRV_LOG(ERR,
+			    "unexpected RNDIS response complete %#x expect %#x",
+			    hdr->type, comp_type);
+
+		return -ENXIO;
+	}
+	if (unlikely(hdr->rid != rid)) {
+		PMD_DRV_LOG(ERR,
+			    "RNDIS comp rid mismatch %#x, expect %#x",
+			    hdr->rid, rid);
+		return -EINVAL;
+	}
+
+	/* All pass! */
+	return 0;
+}
+
+static int
+hn_rndis_query(struct hn_data *hv, uint32_t oid,
+	       const void *idata, uint32_t idlen,
+	       void *odata, uint32_t odlen)
+{
+	struct rndis_query_req *req;
+	struct rndis_query_comp *comp;
+	uint32_t reqlen, comp_len;
+	int error = -EIO;
+	unsigned int ofs;
+	uint32_t rid;
+
+	reqlen = sizeof(*req) + idlen;
+	req = hn_rndis_alloc(hv, reqlen);
+	if (req == NULL)
+		return -ENOMEM;
+
+	comp_len = sizeof(*comp) + odlen;
+	comp = rte_zmalloc("QUERY", comp_len, 0);
+	if (!comp) {
+		error = -ENOMEM;
+		goto done;
+	}
+	comp->status = RNDIS_STATUS_PENDING;
+
+	rid = hn_rndis_rid(hv);
+
+	req->type = RNDIS_QUERY_MSG;
+	req->len = reqlen;
+	req->rid = rid;
+	req->oid = oid;
+	req->infobufoffset = RNDIS_QUERY_REQ_INFOBUFOFFSET;
+	req->infobuflen = idlen;
+
+	/* Input data immediately follows RNDIS query. */
+	memcpy(req + 1, idata, idlen);
+
+	error = hn_rndis_execute(hv, rid, req, reqlen,
+				 comp, comp_len, RNDIS_QUERY_CMPLT);
+
+	if (error)
+		goto done;
+
+	if (comp->status != RNDIS_STATUS_SUCCESS) {
+		PMD_DRV_LOG(ERR, "RNDIS query 0x%08x failed: status 0x%08x",
+			    oid, comp->status);
+		error = -EINVAL;
+		goto done;
+	}
+
+	if (comp->infobuflen == 0 || comp->infobufoffset == 0) {
+		/* No output data! */
+		PMD_DRV_LOG(ERR, "RNDIS query 0x%08x, no data", oid);
+		error = 0;
+		goto done;
+	}
+
+	/*
+	 * Check output data length and offset.
+	 */
+	/* ofs is the offset from the beginning of comp. */
+	ofs = RNDIS_QUERY_COMP_INFOBUFOFFSET_ABS(comp->infobufoffset);
+	if (ofs < sizeof(*comp) || ofs + comp->infobuflen > comp_len) {
+		PMD_DRV_LOG(ERR, "RNDIS query invalid comp ib off/len, %u/%u",
+			    comp->infobufoffset, comp->infobuflen);
+		error = -EINVAL;
+		goto done;
+	}
+
+	/* Save output data. */
+	if (comp->infobuflen < odlen)
+		odlen = comp->infobuflen;
+
+	/* ofs is the offset from the beginning of comp. */
+	memcpy(odata, (const char *)comp + ofs, odlen);
+
+	error = 0;
+done:
+	rte_free(comp);
+	rte_free(req);
+	return error;
+}
+
+static int
+hn_rndis_halt(struct hn_data *hv)
+{
+	struct rndis_halt_req *halt;
+
+	halt = hn_rndis_alloc(hv, sizeof(*halt));
+	if (halt == NULL)
+		return -ENOMEM;
+
+	halt->type = RNDIS_HALT_MSG;
+	halt->len = sizeof(*halt);
+	halt->rid = hn_rndis_rid(hv);
+
+	/* No RNDIS completion; rely on NVS message send completion */
+	hn_rndis_exec1(hv, halt, sizeof(*halt), NULL, 0);
+
+	rte_free(halt);
+
+	PMD_INIT_LOG(DEBUG, "RNDIS halt done");
+	return 0;
+}
+
+static int
+hn_rndis_query_hwcaps(struct hn_data *hv, struct ndis_offload *caps)
+{
+	struct ndis_offload in;
+	uint32_t caps_len, size;
+	int error;
+
+	memset(caps, 0, sizeof(*caps));
+	memset(&in, 0, sizeof(in));
+	in.ndis_hdr.ndis_type = NDIS_OBJTYPE_OFFLOAD;
+
+	if (hv->ndis_ver >= NDIS_VERSION_6_30) {
+		in.ndis_hdr.ndis_rev = NDIS_OFFLOAD_REV_3;
+		size = NDIS_OFFLOAD_SIZE;
+	} else if (hv->ndis_ver >= NDIS_VERSION_6_1) {
+		in.ndis_hdr.ndis_rev = NDIS_OFFLOAD_REV_2;
+		size = NDIS_OFFLOAD_SIZE_6_1;
+	} else {
+		in.ndis_hdr.ndis_rev = NDIS_OFFLOAD_REV_1;
+		size = NDIS_OFFLOAD_SIZE_6_0;
+	}
+	in.ndis_hdr.ndis_size = size;
+
+	caps_len = NDIS_OFFLOAD_SIZE;
+	error = hn_rndis_query(hv, OID_TCP_OFFLOAD_HARDWARE_CAPABILITIES,
+			       &in, size, caps, caps_len);
+	if (error)
+		return error;
+
+	/* Preliminary verification. */
+	if (caps->ndis_hdr.ndis_type != NDIS_OBJTYPE_OFFLOAD) {
+		PMD_DRV_LOG(NOTICE, "invalid NDIS objtype 0x%02x",
+			    caps->ndis_hdr.ndis_type);
+		return -EINVAL;
+	}
+	if (caps->ndis_hdr.ndis_rev < NDIS_OFFLOAD_REV_1) {
+		PMD_DRV_LOG(NOTICE, "invalid NDIS objrev 0x%02x",
+			    caps->ndis_hdr.ndis_rev);
+		return -EINVAL;
+	}
+	if (caps->ndis_hdr.ndis_size > caps_len) {
+		PMD_DRV_LOG(NOTICE, "invalid NDIS objsize %u, data size %u",
+			    caps->ndis_hdr.ndis_size, caps_len);
+		return -EINVAL;
+	} else if (caps->ndis_hdr.ndis_size < NDIS_OFFLOAD_SIZE_6_0) {
+		PMD_DRV_LOG(NOTICE, "invalid NDIS objsize %u",
+			    caps->ndis_hdr.ndis_size);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+int
+hn_rndis_query_rsscaps(struct hn_data *hv,
+		       unsigned int *rxr_cnt0)
+{
+	struct ndis_rss_caps in, caps;
+	unsigned int indsz, rxr_cnt;
+	uint32_t caps_len;
+	int error;
+
+	*rxr_cnt0 = 0;
+
+	if (hv->ndis_ver < NDIS_VERSION_6_20)
+		return -EOPNOTSUPP;
+
+	memset(&in, 0, sizeof(in));
+	in.ndis_hdr.ndis_type = NDIS_OBJTYPE_RSS_CAPS;
+	in.ndis_hdr.ndis_rev = NDIS_RSS_CAPS_REV_2;
+	in.ndis_hdr.ndis_size = NDIS_RSS_CAPS_SIZE;
+
+	caps_len = NDIS_RSS_CAPS_SIZE;
+	error = hn_rndis_query(hv, OID_GEN_RECEIVE_SCALE_CAPABILITIES,
+			       &in, NDIS_RSS_CAPS_SIZE,
+			       &caps, caps_len);
+	if (error)
+		return error;
+
+	PMD_INIT_LOG(DEBUG, "RX rings %u indirect %u caps %#x",
+		     caps.ndis_nrxr, caps.ndis_nind, caps.ndis_caps);
+	/*
+	 * Preliminary verification.
+	 */
+	if (caps.ndis_hdr.ndis_type != NDIS_OBJTYPE_RSS_CAPS) {
+		PMD_DRV_LOG(ERR, "invalid NDIS objtype 0x%02x",
+			    caps.ndis_hdr.ndis_type);
+		return -EINVAL;
+	}
+	if (caps.ndis_hdr.ndis_rev < NDIS_RSS_CAPS_REV_1) {
+		PMD_DRV_LOG(ERR, "invalid NDIS objrev 0x%02x",
+			    caps.ndis_hdr.ndis_rev);
+		return -EINVAL;
+	}
+	if (caps.ndis_hdr.ndis_size > caps_len) {
+		PMD_DRV_LOG(ERR,
+			    "invalid NDIS objsize %u, data size %u",
+			    caps.ndis_hdr.ndis_size, caps_len);
+		return -EINVAL;
+	} else if (caps.ndis_hdr.ndis_size < NDIS_RSS_CAPS_SIZE_6_0) {
+		PMD_DRV_LOG(ERR, "invalid NDIS objsize %u",
+			    caps.ndis_hdr.ndis_size);
+		return -EINVAL;
+	}
+
+	/*
+	 * Save information for later RSS configuration.
+	 */
+	if (caps.ndis_nrxr == 0) {
+		PMD_DRV_LOG(ERR, "0 RX rings!?");
+		return -EINVAL;
+	}
+	rxr_cnt = caps.ndis_nrxr;
+
+	if (caps.ndis_hdr.ndis_size == NDIS_RSS_CAPS_SIZE &&
+	    caps.ndis_hdr.ndis_rev >= NDIS_RSS_CAPS_REV_2) {
+		if (caps.ndis_nind > NDIS_HASH_INDCNT) {
+			PMD_DRV_LOG(ERR,
+				    "too many RSS indirect table entries %u",
+				    caps.ndis_nind);
+			return -EOPNOTSUPP;
+		}
+		if (!rte_is_power_of_2(caps.ndis_nind)) {
+			PMD_DRV_LOG(ERR,
+				    "RSS indirect table size is not power-of-2 %u",
+				    caps.ndis_nind);
+		}
+
+		indsz = caps.ndis_nind;
+	} else {
+		indsz = NDIS_HASH_INDCNT;
+	}
+
+	if (indsz < rxr_cnt) {
+		PMD_DRV_LOG(NOTICE,
+			    "# of RX rings (%d) > RSS indirect table size %d",
+			    rxr_cnt, indsz);
+		rxr_cnt = indsz;
+	}
+
+	hv->rss_offloads = 0;
+	if (caps.ndis_caps & NDIS_RSS_CAP_IPV4)
+		hv->rss_offloads |= ETH_RSS_IPV4
+			| ETH_RSS_NONFRAG_IPV4_TCP
+			| ETH_RSS_NONFRAG_IPV4_UDP;
+	if (caps.ndis_caps & NDIS_RSS_CAP_IPV6)
+		hv->rss_offloads |= ETH_RSS_IPV6
+			| ETH_RSS_NONFRAG_IPV6_TCP;
+	if (caps.ndis_caps & NDIS_RSS_CAP_IPV6_EX)
+		hv->rss_offloads |= ETH_RSS_IPV6_EX
+			| ETH_RSS_IPV6_TCP_EX;
+
+
+	/* Commit! */
+	hv->rss_ind_size = indsz;
+	*rxr_cnt0 = rxr_cnt;
+
+	return 0;
+}
+
+static int
+hn_rndis_set(struct hn_data *hv, uint32_t oid, const void *data, uint32_t dlen)
+{
+	struct rndis_set_req *req;
+	struct rndis_set_comp comp;
+	uint32_t reqlen, comp_len;
+	uint32_t rid;
+	int error;
+
+	reqlen = sizeof(*req) + dlen;
+	req = rte_zmalloc("RNDIS_SET", reqlen, 0);
+	if (!req)
+		return -ENOMEM;
+
+	rid = hn_rndis_rid(hv);
+	req->type = RNDIS_SET_MSG;
+	req->len = reqlen;
+	req->rid = rid;
+	req->oid = oid;
+	req->infobuflen = dlen;
+	req->infobufoffset = RNDIS_SET_REQ_INFOBUFOFFSET;
+
+	/* Data immediately follows RNDIS set. */
+	memcpy(req + 1, data, dlen);
+
+	comp_len = sizeof(comp);
+	error = hn_rndis_execute(hv, rid, req, reqlen,
+				 &comp, comp_len,
+				 RNDIS_SET_CMPLT);
+	if (error) {
+		PMD_DRV_LOG(ERR, "exec RNDIS set %#" PRIx32 " failed",
+			    oid);
+		error = EIO;
+		goto done;
+	}
+
+	if (comp.status != RNDIS_STATUS_SUCCESS) {
+		PMD_DRV_LOG(ERR,
+			    "RNDIS set %#" PRIx32 " failed: status %#" PRIx32,
+			    oid, comp.status);
+		error = EIO;
+		goto done;
+	}
+
+done:
+	rte_free(req);
+	return error;
+}
+
+int hn_rndis_conf_offload(struct hn_data *hv,
+			  uint64_t tx_offloads, uint64_t rx_offloads)
+{
+	struct ndis_offload_params params;
+	struct ndis_offload hwcaps;
+	int error;
+
+	error = hn_rndis_query_hwcaps(hv, &hwcaps);
+	if (error) {
+		PMD_DRV_LOG(ERR, "hwcaps query failed: %d", error);
+		return error;
+	}
+
+	/* NOTE: 0 means "no change" */
+	memset(&params, 0, sizeof(params));
+
+	params.ndis_hdr.ndis_type = NDIS_OBJTYPE_DEFAULT;
+	if (hv->ndis_ver < NDIS_VERSION_6_30) {
+		params.ndis_hdr.ndis_rev = NDIS_OFFLOAD_PARAMS_REV_2;
+		params.ndis_hdr.ndis_size = NDIS_OFFLOAD_PARAMS_SIZE_6_1;
+	} else {
+		params.ndis_hdr.ndis_rev = NDIS_OFFLOAD_PARAMS_REV_3;
+		params.ndis_hdr.ndis_size = NDIS_OFFLOAD_PARAMS_SIZE;
+	}
+
+	if (tx_offloads & DEV_TX_OFFLOAD_TCP_CKSUM) {
+		if (hwcaps.ndis_csum.ndis_ip4_txcsum & NDIS_TXCSUM_CAP_TCP4)
+			params.ndis_tcp4csum = NDIS_OFFLOAD_PARAM_TX;
+		else
+			goto unsupported;
+
+		if (hwcaps.ndis_csum.ndis_ip6_txcsum & NDIS_TXCSUM_CAP_TCP6)
+			params.ndis_tcp6csum = NDIS_OFFLOAD_PARAM_TX;
+		else
+			goto unsupported;
+	}
+
+	if (rx_offloads & DEV_RX_OFFLOAD_TCP_CKSUM) {
+		if ((hwcaps.ndis_csum.ndis_ip4_rxcsum & NDIS_RXCSUM_CAP_TCP4)
+		    == NDIS_RXCSUM_CAP_TCP4)
+			params.ndis_tcp4csum |= NDIS_OFFLOAD_PARAM_RX;
+		else
+			goto unsupported;
+
+		if ((hwcaps.ndis_csum.ndis_ip6_rxcsum & NDIS_RXCSUM_CAP_TCP6)
+		    == NDIS_RXCSUM_CAP_TCP6)
+			params.ndis_tcp6csum |= NDIS_OFFLOAD_PARAM_RX;
+		else
+			goto unsupported;
+	}
+
+	if (tx_offloads & DEV_TX_OFFLOAD_UDP_CKSUM) {
+		if (hwcaps.ndis_csum.ndis_ip4_txcsum & NDIS_TXCSUM_CAP_UDP4)
+			params.ndis_udp4csum = NDIS_OFFLOAD_PARAM_TX;
+		else
+			goto unsupported;
+
+		if ((hwcaps.ndis_csum.ndis_ip6_txcsum & NDIS_TXCSUM_CAP_UDP6)
+		    == NDIS_TXCSUM_CAP_UDP6)
+			params.ndis_udp6csum = NDIS_OFFLOAD_PARAM_TX;
+		else
+			goto unsupported;
+	}
+
+	if (rx_offloads & DEV_TX_OFFLOAD_UDP_CKSUM) {
+		if (hwcaps.ndis_csum.ndis_ip4_rxcsum & NDIS_RXCSUM_CAP_UDP4)
+			params.ndis_udp4csum |= NDIS_OFFLOAD_PARAM_RX;
+		else
+			goto unsupported;
+
+		if (hwcaps.ndis_csum.ndis_ip6_rxcsum & NDIS_RXCSUM_CAP_UDP6)
+			params.ndis_udp6csum |= NDIS_OFFLOAD_PARAM_RX;
+		else
+			goto unsupported;
+	}
+
+	if (tx_offloads & DEV_TX_OFFLOAD_IPV4_CKSUM) {
+		if ((hwcaps.ndis_csum.ndis_ip4_txcsum & NDIS_TXCSUM_CAP_IP4)
+		    == NDIS_TXCSUM_CAP_IP4)
+			params.ndis_ip4csum = NDIS_OFFLOAD_PARAM_TX;
+		else
+			goto unsupported;
+	}
+	if (rx_offloads & DEV_RX_OFFLOAD_IPV4_CKSUM) {
+		if (hwcaps.ndis_csum.ndis_ip4_rxcsum & NDIS_RXCSUM_CAP_IP4)
+			params.ndis_ip4csum |= NDIS_OFFLOAD_PARAM_RX;
+		else
+			goto unsupported;
+	}
+
+	if (tx_offloads & DEV_TX_OFFLOAD_TCP_TSO) {
+		if (hwcaps.ndis_lsov2.ndis_ip4_encap & NDIS_OFFLOAD_ENCAP_8023)
+			params.ndis_lsov2_ip4 = NDIS_OFFLOAD_LSOV2_ON;
+		else
+			goto unsupported;
+
+		if ((hwcaps.ndis_lsov2.ndis_ip6_opts & HN_NDIS_LSOV2_CAP_IP6)
+		    == HN_NDIS_LSOV2_CAP_IP6)
+			params.ndis_lsov2_ip6 = NDIS_OFFLOAD_LSOV2_ON;
+		else
+			goto unsupported;
+	}
+
+	error = hn_rndis_set(hv, OID_TCP_OFFLOAD_PARAMETERS, &params,
+			     params.ndis_hdr.ndis_size);
+	if (error) {
+		PMD_DRV_LOG(ERR, "offload config failed");
+		return error;
+	}
+
+	return 0;
+ unsupported:
+	PMD_DRV_LOG(NOTICE,
+		    "offload tx:%" PRIx64 " rx:%" PRIx64 " not supported by this version",
+		    tx_offloads, rx_offloads);
+	return -EINVAL;
+}
+
+int hn_rndis_get_offload(struct hn_data *hv,
+			 struct rte_eth_dev_info *dev_info)
+{
+	struct ndis_offload hwcaps;
+	int error;
+
+	memset(&hwcaps, 0, sizeof(hwcaps));
+
+	error = hn_rndis_query_hwcaps(hv, &hwcaps);
+	if (error) {
+		PMD_DRV_LOG(ERR, "hwcaps query failed: %d", error);
+		return error;
+	}
+
+	dev_info->tx_offload_capa = DEV_TX_OFFLOAD_VLAN_INSERT |
+		DEV_TX_OFFLOAD_MULTI_SEGS;
+	if ((hwcaps.ndis_csum.ndis_ip4_txcsum & HN_NDIS_TXCSUM_CAP_IP4)
+	    == HN_NDIS_TXCSUM_CAP_IP4)
+		dev_info->tx_offload_capa |= DEV_TX_OFFLOAD_IPV4_CKSUM;
+
+	if ((hwcaps.ndis_csum.ndis_ip4_txcsum & HN_NDIS_TXCSUM_CAP_TCP4)
+	    == HN_NDIS_TXCSUM_CAP_TCP4 &&
+	    (hwcaps.ndis_csum.ndis_ip6_txcsum & HN_NDIS_TXCSUM_CAP_TCP6)
+	    == HN_NDIS_TXCSUM_CAP_TCP6)
+		dev_info->tx_offload_capa |= DEV_TX_OFFLOAD_TCP_CKSUM;
+
+	if ((hwcaps.ndis_csum.ndis_ip4_txcsum & NDIS_TXCSUM_CAP_UDP4) &&
+	    (hwcaps.ndis_csum.ndis_ip6_txcsum & NDIS_TXCSUM_CAP_UDP6))
+		dev_info->tx_offload_capa |= DEV_TX_OFFLOAD_UDP_CKSUM;
+
+	if ((hwcaps.ndis_lsov2.ndis_ip4_encap & NDIS_OFFLOAD_ENCAP_8023) &&
+	    (hwcaps.ndis_lsov2.ndis_ip6_opts & HN_NDIS_LSOV2_CAP_IP6)
+	    == HN_NDIS_LSOV2_CAP_IP6)
+		dev_info->tx_offload_capa |= DEV_TX_OFFLOAD_TCP_TSO;
+
+	dev_info->rx_offload_capa = DEV_RX_OFFLOAD_VLAN_STRIP |
+		DEV_RX_OFFLOAD_CRC_STRIP |
+		DEV_RX_OFFLOAD_JUMBO_FRAME;
+
+	if (hwcaps.ndis_csum.ndis_ip4_rxcsum & NDIS_RXCSUM_CAP_IP4)
+		dev_info->rx_offload_capa |= DEV_RX_OFFLOAD_IPV4_CKSUM;
+
+	if ((hwcaps.ndis_csum.ndis_ip4_rxcsum & NDIS_RXCSUM_CAP_TCP4) &&
+	    (hwcaps.ndis_csum.ndis_ip6_rxcsum & NDIS_RXCSUM_CAP_TCP6))
+		dev_info->rx_offload_capa |= DEV_RX_OFFLOAD_TCP_CKSUM;
+
+	if ((hwcaps.ndis_csum.ndis_ip4_rxcsum & NDIS_RXCSUM_CAP_UDP4) &&
+	    (hwcaps.ndis_csum.ndis_ip6_rxcsum & NDIS_RXCSUM_CAP_UDP6))
+		dev_info->rx_offload_capa |= DEV_RX_OFFLOAD_UDP_CKSUM;
+
+	PMD_INIT_LOG(DEBUG,
+		     "offload capa Tx %#" PRIx64 " Rx %#" PRIx64,
+		     dev_info->tx_offload_capa,
+		     dev_info->rx_offload_capa);
+
+	return 0;
+}
+
+int
+hn_rndis_set_rxfilter(struct hn_data *hv, uint32_t filter)
+{
+	int error;
+
+	error = hn_rndis_set(hv, OID_GEN_CURRENT_PACKET_FILTER,
+			     &filter, sizeof(filter));
+	if (error) {
+		PMD_DRV_LOG(ERR, "set RX filter %#" PRIx32 " failed: %d",
+			    filter, error);
+	} else {
+		PMD_DRV_LOG(DEBUG, "set RX filter %#" PRIx32 " done", filter);
+	}
+
+	return error;
+}
+
+static const uint8_t rss_intel_key[NDIS_HASH_KEYSIZE_TOEPLITZ] = {
+	0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
+	0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
+	0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
+	0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
+	0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa
+};
+
+int hn_rndis_conf_rss(struct hn_data *hv,
+		      const struct rte_eth_rss_conf *rss_conf)
+{
+	struct ndis_rssprm_toeplitz rssp;
+	struct ndis_rss_params *prm = &rssp.rss_params;
+	const uint8_t *rss_key = rss_conf->rss_key ? : rss_intel_key;
+	uint32_t rss_hash;
+	uint16_t rss_size;
+	unsigned int i;
+	int error;
+
+	PMD_INIT_FUNC_TRACE();
+
+	memset(&rssp, 0, sizeof(rssp));
+	rss_size = NDIS_RSSPRM_TOEPLITZ_SIZE(hv->rss_ind_size);
+
+	prm->ndis_hdr.ndis_type = NDIS_OBJTYPE_RSS_PARAMS;
+	prm->ndis_hdr.ndis_rev = NDIS_RSS_PARAMS_REV_2;
+	prm->ndis_hdr.ndis_size = rss_size;
+	prm->ndis_flags = 0;
+
+	rss_hash = NDIS_HASH_FUNCTION_TOEPLITZ;
+	if (rss_conf->rss_hf & ETH_RSS_IPV4)
+		rss_hash |= NDIS_HASH_IPV4;
+	if (rss_conf->rss_hf & ETH_RSS_NONFRAG_IPV4_TCP)
+		rss_hash |= NDIS_HASH_TCP_IPV4;
+	if (rss_conf->rss_hf & ETH_RSS_IPV6)
+		rss_hash |=  NDIS_HASH_IPV6;
+	if (rss_conf->rss_hf & ETH_RSS_NONFRAG_IPV6_TCP)
+		rss_hash |= NDIS_HASH_TCP_IPV6;
+
+	prm->ndis_hash = rss_hash;
+	prm->ndis_indsize = sizeof(rssp.rss_ind[0]) * hv->rss_ind_size;
+	prm->ndis_indoffset = offsetof(struct ndis_rssprm_toeplitz, rss_ind[0]);
+	prm->ndis_keysize = sizeof(rssp.rss_key);
+	prm->ndis_keyoffset = offsetof(struct ndis_rssprm_toeplitz, rss_key[0]);
+
+	memcpy(&rssp.rss_key, rss_key, NDIS_HASH_KEYSIZE_TOEPLITZ);
+
+	for (i = 0; i < NDIS_HASH_INDCNT; i++)
+		rssp.rss_ind[i] = i % hv->num_queues;
+
+	error = hn_rndis_set(hv, OID_GEN_RECEIVE_SCALE_PARAMETERS,
+			     &rssp, rss_size);
+	if (error)
+		PMD_DRV_LOG(ERR,
+			    "RSS config failed: %d", error);
+	else
+		PMD_DRV_LOG(DEBUG, "RSS config done");
+
+	return error;
+}
+
+static int hn_rndis_init(struct hn_data *hv)
+{
+	struct rndis_init_req *req;
+	struct rndis_init_comp comp;
+	uint32_t comp_len, rid;
+	int error;
+
+	req = hn_rndis_alloc(hv, sizeof(*req));
+	if (!req) {
+		PMD_DRV_LOG(ERR, "no memory for RNDIS init");
+		return -ENXIO;
+	}
+
+	rid = hn_rndis_rid(hv);
+	req->type = RNDIS_INITIALIZE_MSG;
+	req->len = sizeof(*req);
+	req->rid = rid;
+	req->ver_major = RNDIS_VERSION_MAJOR;
+	req->ver_minor = RNDIS_VERSION_MINOR;
+	req->max_xfersz = HN_RNDIS_XFER_SIZE;
+
+	comp_len = RNDIS_INIT_COMP_SIZE_MIN;
+	error = hn_rndis_execute(hv, rid, req, sizeof(*req),
+				 &comp, comp_len,
+				 RNDIS_INITIALIZE_CMPLT);
+	if (error)
+		goto done;
+
+	if (comp.status != RNDIS_STATUS_SUCCESS) {
+		PMD_DRV_LOG(ERR, "RNDIS init failed: status 0x%08x",
+			    comp.status);
+		error = -EIO;
+		goto done;
+	}
+
+	hv->rndis_agg_size = comp.pktmaxsz;
+	hv->rndis_agg_pkts = comp.pktmaxcnt;
+	hv->rndis_agg_align = 1U << comp.align;
+
+	if (hv->rndis_agg_align < sizeof(uint32_t)) {
+		/*
+		 * The RNDIS packet message encap assumes that the RNDIS
+		 * packet message is at least 4 bytes aligned.  Fix up the
+		 * alignment here, if the remote side sets the alignment
+		 * too low.
+		 */
+		PMD_DRV_LOG(NOTICE,
+			    "fixup RNDIS aggpkt align: %u -> %zu",
+			    hv->rndis_agg_align, sizeof(uint32_t));
+		hv->rndis_agg_align = sizeof(uint32_t);
+	}
+
+	PMD_INIT_LOG(INFO,
+		     "RNDIS ver %u.%u, aggpkt size %u, aggpkt cnt %u, aggpkt align %u",
+		     comp.ver_major, comp.ver_minor,
+		     hv->rndis_agg_size, hv->rndis_agg_pkts,
+		     hv->rndis_agg_align);
+	error = 0;
+done:
+	rte_free(req);
+	return error;
+}
+
+int
+hn_rndis_get_eaddr(struct hn_data *hv, uint8_t *eaddr)
+{
+	uint32_t eaddr_len;
+	int error;
+
+	eaddr_len = ETHER_ADDR_LEN;
+	error = hn_rndis_query(hv, OID_802_3_PERMANENT_ADDRESS, NULL, 0,
+			       eaddr, eaddr_len);
+	if (error)
+		return error;
+
+	PMD_DRV_LOG(INFO, "MAC address %02x:%02x:%02x:%02x:%02x:%02x",
+		    eaddr[0], eaddr[1], eaddr[2],
+		    eaddr[3], eaddr[4], eaddr[5]);
+	return 0;
+}
+
+int
+hn_rndis_get_linkstatus(struct hn_data *hv)
+{
+	return hn_rndis_query(hv, OID_GEN_MEDIA_CONNECT_STATUS, NULL, 0,
+			      &hv->link_status, sizeof(uint32_t));
+}
+
+int
+hn_rndis_get_linkspeed(struct hn_data *hv)
+{
+	return hn_rndis_query(hv, OID_GEN_LINK_SPEED, NULL, 0,
+			      &hv->link_speed, sizeof(uint32_t));
+}
+
+int
+hn_rndis_attach(struct hn_data *hv)
+{
+	/* Initialize RNDIS. */
+	return hn_rndis_init(hv);
+}
+
+void
+hn_rndis_detach(struct hn_data *hv)
+{
+	/* Halt the RNDIS. */
+	hn_rndis_halt(hv);
+}
diff --git a/drivers/net/netvsc/hn_rndis.h b/drivers/net/netvsc/hn_rndis.h
new file mode 100644
index 000000000000..46049b2e0776
--- /dev/null
+++ b/drivers/net/netvsc/hn_rndis.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: BSD-3-Clause */
+
+#include "rndis.h"
+
+struct hn_data;
+
+void hn_rndis_receive_response(struct hn_data *hv,
+			      const void *data, uint32_t len);
+void	hn_rndis_link_status(struct hn_data *hv, const void *data);
+int	hn_rndis_attach(struct hn_data *hv);
+void	hn_rndis_detach(struct hn_data *hv);
+int	hn_rndis_get_eaddr(struct hn_data *hv, uint8_t *eaddr);
+int	hn_rndis_get_linkstatus(struct hn_data *hv);
+int	hn_rndis_get_linkspeed(struct hn_data *hv);
+int	hn_rndis_set_rxfilter(struct hn_data *hv, uint32_t filter);
+void	hn_rndis_rx_ctrl(struct hn_data *hv, const void *data,
+			 int dlen);
+int	hn_rndis_get_offload(struct hn_data *hv,
+			     struct rte_eth_dev_info *dev_info);
+int	hn_rndis_conf_offload(struct hn_data *hv,
+			      uint64_t tx_offloads,
+			      uint64_t rx_offloads);
+int	hn_rndis_query_rsscaps(struct hn_data *hv,
+			       unsigned int *rxr_cnt0);
+int hn_rndis_conf_rss(struct hn_data *hv,
+		      const struct rte_eth_rss_conf *rss_conf);
diff --git a/drivers/net/netvsc/hn_rxtx.c b/drivers/net/netvsc/hn_rxtx.c
new file mode 100644
index 000000000000..c0132faaf5f5
--- /dev/null
+++ b/drivers/net/netvsc/hn_rxtx.c
@@ -0,0 +1,1224 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2016-2018 Microsoft Corporation
+ * Copyright(c) 2013-2016 Brocade Communications Systems, Inc.
+ * All rights reserved.
+ */
+
+#include <stdint.h>
+#include <string.h>
+#include <stdio.h>
+#include <errno.h>
+#include <unistd.h>
+#include <strings.h>
+
+#include <rte_ethdev.h>
+#include <rte_memcpy.h>
+#include <rte_string_fns.h>
+#include <rte_memzone.h>
+#include <rte_malloc.h>
+#include <rte_atomic.h>
+#include <rte_branch_prediction.h>
+#include <rte_ether.h>
+#include <rte_common.h>
+#include <rte_errno.h>
+#include <rte_memory.h>
+#include <rte_eal.h>
+#include <rte_dev.h>
+#include <rte_bus_vmbus.h>
+#include <rte_spinlock.h>
+
+#include "hn_logs.h"
+#include "hn_var.h"
+#include "hn_rndis.h"
+#include "hn_nvs.h"
+#include "ndis.h"
+
+#define HN_NVS_SEND_MSG_SIZE \
+	(sizeof(struct vmbus_chanpkt_hdr) + sizeof(struct hn_nvs_rndis))
+
+#define HN_TXD_CACHE_SIZE	32 /* per cpu tx_descriptor pool cache */
+
+struct hn_rxinfo {
+	uint32_t	vlan_info;
+	uint32_t	csum_info;
+	uint32_t	hash_info;
+	uint32_t	hash_value;
+};
+#define HN_RXINFO_VLAN			0x0001
+#define HN_RXINFO_CSUM			0x0002
+#define HN_RXINFO_HASHINF		0x0004
+#define HN_RXINFO_HASHVAL		0x0008
+#define HN_RXINFO_ALL			\
+	(HN_RXINFO_VLAN |		\
+	 HN_RXINFO_CSUM |		\
+	 HN_RXINFO_HASHINF |		\
+	 HN_RXINFO_HASHVAL)
+
+#define HN_NDIS_VLAN_INFO_INVALID	0xffffffff
+#define HN_NDIS_RXCSUM_INFO_INVALID	0
+#define HN_NDIS_HASH_INFO_INVALID	0
+
+/*
+ * Per-transmit book keeping.
+ * A slot in transmit ring (chim_index) is reserved for each transmit.
+ *
+ * There are two types of transmit:
+ *   - buffered transmit where chimney buffer is used and RNDIS header
+ *     is in the buffer. mbuf == NULL for this case.
+ *
+ *   - direct transmit where RNDIS header is in the in  rndis_pkt
+ *     mbuf is freed after transmit.
+ *
+ * Descriptors come from per-port pool which is used
+ * to limit number of outstanding requests per device.
+ */
+struct hn_txdesc {
+	struct rte_mbuf *m;
+
+	uint16_t	queue_id;
+	uint16_t	chim_index;
+	uint32_t	chim_size;
+	uint32_t	data_size;
+	uint32_t	packets;
+
+	struct rndis_packet_msg *rndis_pkt;
+};
+
+#define HN_RNDIS_PKT_LEN				\
+	(sizeof(struct rndis_packet_msg) +		\
+	 RNDIS_PKTINFO_SIZE(NDIS_HASH_VALUE_SIZE) +	\
+	 RNDIS_PKTINFO_SIZE(NDIS_VLAN_INFO_SIZE) +	\
+	 RNDIS_PKTINFO_SIZE(NDIS_LSO2_INFO_SIZE) +	\
+	 RNDIS_PKTINFO_SIZE(NDIS_TXCSUM_INFO_SIZE))
+
+/* Threshold where chimney (copy) is used for small packets */
+#define HN_CHIM_THRESHOLD	(HN_RNDIS_PKT_LEN + 256)
+
+/* Minimum space required for a packet */
+#define HN_PKTSIZE_MIN(align) \
+	RTE_ALIGN(ETHER_MIN_LEN + HN_RNDIS_PKT_LEN, align)
+
+#define DEFAULT_TX_FREE_THRESH 32U
+
+static void
+hn_update_packet_stats(struct hn_stats *stats, const struct rte_mbuf *m)
+{
+	uint32_t s = m->pkt_len;
+	const struct ether_addr *ea;
+
+	if (s == 64) {
+		stats->size_bins[1]++;
+	} else if (s > 64 && s < 1024) {
+		uint32_t bin;
+
+		/* count zeros, and offset into correct bin */
+		bin = (sizeof(s) * 8) - __builtin_clz(s) - 5;
+		stats->size_bins[bin]++;
+	} else {
+		if (s < 64)
+			stats->size_bins[0]++;
+		else if (s < 1519)
+			stats->size_bins[6]++;
+		else if (s >= 1519)
+			stats->size_bins[7]++;
+	}
+
+	ea = rte_pktmbuf_mtod(m, const struct ether_addr *);
+	if (is_multicast_ether_addr(ea)) {
+		if (is_broadcast_ether_addr(ea))
+			stats->broadcast++;
+		else
+			stats->multicast++;
+	}
+}
+
+static inline unsigned int hn_rndis_pktlen(const struct rndis_packet_msg *pkt)
+{
+	return pkt->pktinfooffset + pkt->pktinfolen;
+}
+
+static inline uint32_t
+hn_rndis_pktmsg_offset(uint32_t ofs)
+{
+	return ofs - offsetof(struct rndis_packet_msg, dataoffset);
+}
+
+static void hn_txd_init(struct rte_mempool *mp __rte_unused,
+			void *opaque, void *obj, unsigned int idx)
+{
+	struct hn_txdesc *txd = obj;
+	struct rte_eth_dev *dev = opaque;
+	struct rndis_packet_msg *pkt;
+
+	memset(txd, 0, sizeof(*txd));
+	txd->chim_index = idx;
+
+	pkt = rte_malloc_socket("RNDIS_TX", HN_RNDIS_PKT_LEN,
+				RTE_CACHE_LINE_SIZE, dev->device->numa_node);
+	if (pkt == NULL)
+		rte_exit(EXIT_FAILURE, "can not allocate RNDIS header");
+
+	txd->rndis_pkt = pkt;
+}
+
+/*
+ * Unlike Linux and FreeBSD, this driver uses a mempool
+ * to limit outstanding transmits and reserve buffers
+ */
+int
+hn_tx_pool_init(struct rte_eth_dev *dev)
+{
+	struct hn_data *hv = dev->data->dev_private;
+	char name[RTE_MEMPOOL_NAMESIZE];
+	struct rte_mempool *mp;
+
+	snprintf(name, sizeof(name),
+		 "hn_txd_%u", dev->data->port_id);
+
+	PMD_INIT_LOG(DEBUG, "create a TX send pool %s n=%u size=%zu socket=%d",
+		     name, hv->chim_cnt, sizeof(struct hn_txdesc),
+		     dev->device->numa_node);
+
+	mp = rte_mempool_create(name, hv->chim_cnt, sizeof(struct hn_txdesc),
+				HN_TXD_CACHE_SIZE, 0,
+				NULL, NULL,
+				hn_txd_init, dev,
+				dev->device->numa_node, 0);
+	if (mp == NULL) {
+		PMD_DRV_LOG(ERR,
+			    "mempool %s create failed: %d", name, rte_errno);
+		return -rte_errno;
+	}
+
+	hv->tx_pool = mp;
+	return 0;
+}
+
+static void hn_reset_txagg(struct hn_tx_queue *txq)
+{
+	txq->agg_szleft = txq->agg_szmax;
+	txq->agg_pktleft = txq->agg_pktmax;
+	txq->agg_txd = NULL;
+	txq->agg_prevpkt = NULL;
+}
+
+int
+hn_dev_tx_queue_setup(struct rte_eth_dev *dev,
+		      uint16_t queue_idx, uint16_t nb_desc __rte_unused,
+		      unsigned int socket_id,
+		      const struct rte_eth_txconf *tx_conf)
+
+{
+	struct hn_data *hv = dev->data->dev_private;
+	struct hn_tx_queue *txq;
+	uint32_t tx_free_thresh;
+
+	PMD_INIT_FUNC_TRACE();
+
+	txq = rte_zmalloc_socket("HN_TXQ", sizeof(*txq), RTE_CACHE_LINE_SIZE,
+				 socket_id);
+	if (!txq)
+		return -ENOMEM;
+
+	txq->hv = hv;
+	txq->chan = hv->channels[queue_idx];
+	txq->port_id = dev->data->port_id;
+	txq->queue_id = queue_idx;
+
+	tx_free_thresh = tx_conf->tx_free_thresh;
+	if (tx_free_thresh == 0)
+		tx_free_thresh = RTE_MIN(hv->chim_cnt / 4,
+					 DEFAULT_TX_FREE_THRESH);
+
+	if (tx_free_thresh >= hv->chim_cnt - 3) {
+		RTE_LOG(ERR, PMD, "tx_free_thresh must be less than the "
+			"number of TX entries minus 3 (%u)."
+			" (tx_free_thresh=%u port=%u queue=%u)\n",
+			hv->chim_cnt - 3,
+			tx_free_thresh, dev->data->port_id, queue_idx);
+		return -EINVAL;
+	}
+
+	txq->free_thresh = tx_free_thresh;
+
+	txq->agg_szmax  = RTE_MIN(hv->chim_szmax, hv->rndis_agg_size);
+	txq->agg_pktmax = hv->rndis_agg_pkts;
+	txq->agg_align  = hv->rndis_agg_align;
+
+	hn_reset_txagg(txq);
+
+	PMD_DRV_LOG(INFO,
+		    "tx queue aggregation packets=%u bytes=%u align=%u",
+		    txq->agg_pktmax, txq->agg_szmax, txq->agg_align);
+
+	dev->data->tx_queues[queue_idx] = txq;
+
+	return 0;
+}
+
+void
+hn_dev_tx_queue_release(void *arg)
+{
+	struct hn_tx_queue *txq = arg;
+	struct hn_txdesc *txd;
+
+	PMD_INIT_FUNC_TRACE();
+
+	if (!txq)
+		return;
+
+	/* If any pending data is still present just drop it */
+	txd = txq->agg_txd;
+	if (txd)
+		rte_mempool_put(txq->hv->tx_pool, txd);
+
+	rte_free(txq);
+}
+
+static void
+hn_nvs_send_completed(struct rte_eth_dev *dev,
+		      uint16_t queue_id,
+		      unsigned long xactid)
+{
+	struct hn_txdesc *txd = (struct hn_txdesc *)xactid;
+	struct hn_tx_queue *txq;
+
+	/* Control packets are sent with xacid == 0 */
+	if (!txd)
+		return;
+
+	txq = dev->data->tx_queues[queue_id];
+
+	PMD_TX_LOG(DEBUG, "port %u:%u complete tx %u mbuf %p size %u",
+		   txq->port_id, txq->queue_id,
+		   txd->chim_index, txd->m, txd->data_size);
+
+	txq->stats.bytes += txd->data_size;
+	txq->stats.packets += txd->packets;
+	rte_pktmbuf_free(txd->m);
+
+	rte_mempool_put(txq->hv->tx_pool, txd);
+}
+
+/* Handle transmit completion events */
+static void
+hn_nvs_handle_comp(struct rte_eth_dev *dev, uint16_t queue_id,
+		   const struct vmbus_chanpkt_hdr *pkt,
+		   const void *data)
+{
+	const struct hn_nvs_hdr *hdr = data;
+
+	switch (hdr->type) {
+	case NVS_TYPE_RNDIS_ACK:
+		hn_nvs_send_completed(dev, queue_id, pkt->xactid);
+		break;
+
+	default:
+		PMD_TX_LOG(NOTICE,
+			   "unexpected send completion type %u",
+			   hdr->type);
+	}
+}
+
+/* Parse per-packet info (meta data) */
+static int
+hn_rndis_rxinfo(const void *info_data, unsigned int info_dlen,
+		struct hn_rxinfo *info)
+{
+	const struct rndis_pktinfo *pi = info_data;
+	uint32_t mask = 0;
+
+	while (info_dlen != 0) {
+		const void *data;
+		uint32_t dlen;
+
+		if (unlikely(info_dlen < sizeof(*pi)))
+			return -EINVAL;
+
+		if (unlikely(info_dlen < pi->size))
+			return -EINVAL;
+		info_dlen -= pi->size;
+
+		if (unlikely(pi->size & RNDIS_PKTINFO_SIZE_ALIGNMASK))
+			return -EINVAL;
+		if (unlikely(pi->size < pi->offset))
+			return -EINVAL;
+
+		dlen = pi->size - pi->offset;
+		data = pi->data;
+
+		switch (pi->type) {
+		case NDIS_PKTINFO_TYPE_VLAN:
+			if (unlikely(dlen < NDIS_VLAN_INFO_SIZE))
+				return -EINVAL;
+			info->vlan_info = *((const uint32_t *)data);
+			mask |= HN_RXINFO_VLAN;
+			break;
+
+		case NDIS_PKTINFO_TYPE_CSUM:
+			if (unlikely(dlen < NDIS_RXCSUM_INFO_SIZE))
+				return -EINVAL;
+			info->csum_info = *((const uint32_t *)data);
+			mask |= HN_RXINFO_CSUM;
+			break;
+
+		case NDIS_PKTINFO_TYPE_HASHVAL:
+			if (unlikely(dlen < NDIS_HASH_VALUE_SIZE))
+				return -EINVAL;
+			info->hash_value = *((const uint32_t *)data);
+			mask |= HN_RXINFO_HASHVAL;
+			break;
+
+		case NDIS_PKTINFO_TYPE_HASHINF:
+			if (unlikely(dlen < NDIS_HASH_INFO_SIZE))
+				return -EINVAL;
+			info->hash_info = *((const uint32_t *)data);
+			mask |= HN_RXINFO_HASHINF;
+			break;
+
+		default:
+			goto next;
+		}
+
+		if (mask == HN_RXINFO_ALL)
+			break; /* All found; done */
+next:
+		pi = (const struct rndis_pktinfo *)
+		    ((const uint8_t *)pi + pi->size);
+	}
+
+	/*
+	 * Final fixup.
+	 * - If there is no hash value, invalidate the hash info.
+	 */
+	if (!(mask & HN_RXINFO_HASHVAL))
+		info->hash_info = HN_NDIS_HASH_INFO_INVALID;
+	return 0;
+}
+
+/* XXX this could be optimized */
+static struct rte_mbuf *hn_build_mbuf(struct rte_mempool *mp,
+				      const uint8_t *data, unsigned int dlen)
+{
+	struct rte_mbuf *m0 = NULL;
+	struct rte_mbuf **top = &m0;
+	uint32_t chunk;
+
+	while (dlen > 0) {
+		struct rte_mbuf *m;
+
+		m = rte_pktmbuf_alloc(mp);
+		if (unlikely(m == NULL)) {
+			rte_pktmbuf_free(m0);
+			return NULL;
+		}
+
+		*top = m;
+		top = &m->next;
+
+		chunk = RTE_MIN(dlen, rte_pktmbuf_tailroom(m));
+		rte_memcpy(rte_pktmbuf_append(m, chunk),
+			   data, chunk);
+
+		data += chunk;
+		dlen -= chunk;
+	}
+
+	return m0;
+}
+
+static void hn_rxpkt(struct hn_rx_queue *rxq, const void *data,
+		     unsigned int dlen,
+		     const struct hn_rxinfo *info)
+{
+	struct rte_mbuf *m;
+
+	if (unlikely(dlen < ETHER_HDR_LEN)) {
+		PMD_RX_LOG(NOTICE, "runt packet len %u", dlen);
+		++rxq->stats.errors;
+		return;
+	}
+
+	m = hn_build_mbuf(rxq->mb_pool, data, dlen);
+	if (unlikely(m == NULL)) {
+		struct rte_eth_dev *dev
+			= &rte_eth_devices[rxq->port_id];
+		dev->data->rx_mbuf_alloc_failed++;
+		return;
+	}
+
+	m->port = rxq->port_id;
+	m->ol_flags = 0;
+
+	if (info->vlan_info != HN_NDIS_VLAN_INFO_INVALID) {
+		m->vlan_tci = info->vlan_info;
+		m->ol_flags |= PKT_RX_VLAN_STRIPPED | PKT_RX_VLAN;
+	}
+
+	if (info->csum_info != HN_NDIS_RXCSUM_INFO_INVALID) {
+		if (info->csum_info & NDIS_RXCSUM_INFO_IPCS_OK)
+			m->ol_flags |= PKT_RX_IP_CKSUM_GOOD;
+
+		if (info->csum_info & (NDIS_RXCSUM_INFO_UDPCS_OK
+				       | NDIS_RXCSUM_INFO_TCPCS_OK))
+			m->ol_flags |= PKT_RX_L4_CKSUM_GOOD;
+	}
+
+	if (info->hash_info != HN_NDIS_HASH_INFO_INVALID) {
+		m->ol_flags |= PKT_RX_RSS_HASH;
+		m->hash.rss = info->hash_value;
+	}
+
+	PMD_RX_LOG(DEBUG, "port %u:%u RX size %u flags %#" PRIx64,
+		   rxq->port_id, rxq->queue_id,
+		   m->pkt_len, m->ol_flags);
+
+	++rxq->stats.packets;
+	rxq->stats.bytes += m->pkt_len;
+	hn_update_packet_stats(&rxq->stats, m);
+
+	if (unlikely(rte_ring_sp_enqueue(rxq->rx_ring, m) != 0)) {
+		++rxq->ring_full;
+		rte_pktmbuf_free(m);
+	}
+}
+
+static void hn_rndis_rx_data(struct hn_rx_queue *rxq,
+			     const void *data, uint32_t dlen)
+{
+	unsigned int data_off, data_len, pktinfo_off, pktinfo_len;
+	const struct rndis_packet_msg *pkt;
+	struct hn_rxinfo info = {
+		.vlan_info = HN_NDIS_VLAN_INFO_INVALID,
+		.csum_info = HN_NDIS_RXCSUM_INFO_INVALID,
+		.hash_info = HN_NDIS_HASH_INFO_INVALID,
+	};
+	int err;
+
+	if (unlikely(dlen < sizeof(*pkt))) {
+		PMD_RX_LOG(ERR, "invalid RNDIS packet message");
+		return;
+	}
+
+
+	pkt = data;
+
+	if (unlikely(dlen < pkt->len)) {
+		PMD_RX_LOG(ERR, "truncated RNDIS packet message, (%u < %u)",
+			    dlen, pkt->len);
+		return;
+	}
+
+	if (unlikely(pkt->len < pkt->datalen
+		     + pkt->oobdatalen + pkt->pktinfolen)) {
+		PMD_RX_LOG(ERR,
+			   "invalid RNDIS packet len %u, data %u, oob %u, pktinfo %u",
+			   pkt->len, pkt->datalen, pkt->oobdatalen,
+			   pkt->pktinfolen);
+		return;
+	}
+
+	if (unlikely(pkt->datalen == 0)) {
+		PMD_RX_LOG(ERR, "invalid RNDIS packet message, no data");
+		return;
+	}
+
+	/*
+	 * Check offsets.
+	 */
+#define IS_OFFSET_INVALID(ofs)			\
+	((ofs) < RNDIS_PACKET_MSG_OFFSET_MIN ||	\
+	 ((ofs) & RNDIS_PACKET_MSG_OFFSET_ALIGNMASK))
+
+	/* XXX Hyper-V does not meet data offset alignment requirement */
+	if (unlikely(pkt->dataoffset < RNDIS_PACKET_MSG_OFFSET_MIN)) {
+		PMD_DRV_LOG(ERR, "invalid RNDIS packet data offset %u",
+			    pkt->dataoffset);
+		return;
+	}
+
+	if (likely(pkt->pktinfooffset > 0) &&
+	    unlikely(IS_OFFSET_INVALID(pkt->pktinfooffset))) {
+		PMD_DRV_LOG(ERR, "invalid RNDIS packet pktinfo offset %u",
+			    pkt->pktinfooffset);
+		return;
+	}
+#undef IS_OFFSET_INVALID
+
+	data_off = RNDIS_PACKET_MSG_OFFSET_ABS(pkt->dataoffset);
+	data_len = pkt->datalen;
+	pktinfo_off = RNDIS_PACKET_MSG_OFFSET_ABS(pkt->pktinfooffset);
+	pktinfo_len = pkt->pktinfolen;
+
+	if (likely(pktinfo_len > 0)) {
+		err = hn_rndis_rxinfo((const uint8_t *)pkt + pktinfo_off,
+				      pktinfo_len, &info);
+		if (err) {
+			PMD_DRV_LOG(ERR, "invalid RNDIS packet info");
+			return;
+		}
+	}
+
+	if (unlikely(data_off + data_len > pkt->len)) {
+		PMD_DRV_LOG(ERR,
+			    "invalid RNDIS data len %u, data abs %d len %d",
+			    pkt->len, data_off, data_len);
+		return;
+	}
+
+	hn_rxpkt(rxq, (const uint8_t *)pkt + data_off, data_len, &info);
+}
+
+static void
+hn_rndis_receive(const struct rte_eth_dev *dev,
+		 struct hn_rx_queue *rxq, const void *buf, uint32_t len)
+{
+	const struct rndis_msghdr *hdr = buf;
+
+	switch (hdr->type) {
+	case RNDIS_PACKET_MSG:
+		if (dev->data->dev_started)
+			hn_rndis_rx_data(rxq, buf, len);
+		break;
+
+	case RNDIS_INDICATE_STATUS_MSG:
+		hn_rndis_link_status(rxq->hv, buf);
+		break;
+
+	case RNDIS_INITIALIZE_CMPLT:
+	case RNDIS_QUERY_CMPLT:
+	case RNDIS_SET_CMPLT:
+		hn_rndis_receive_response(rxq->hv, buf, len);
+		break;
+
+	default:
+		PMD_DRV_LOG(NOTICE,
+			    "unexpected RNDIS message (type %#x len %u)",
+			    hdr->type, len);
+		break;
+	}
+}
+
+static void
+hn_nvs_handle_rxbuf(struct rte_eth_dev *dev,
+		    struct hn_data *hv,
+		    struct hn_rx_queue *rxq,
+		    const struct vmbus_chanpkt_hdr *hdr,
+		    const void *buf)
+{
+	const struct vmbus_chanpkt_rxbuf *pkt;
+	const struct hn_nvs_hdr *nvs_hdr = buf;
+	uint32_t rxbuf_sz = hv->rxbuf_res->len;
+	char *rxbuf = hv->rxbuf_res->addr;
+	unsigned int i, hlen, count;
+
+	/* At minimum we need type header */
+	if (unlikely(vmbus_chanpkt_datalen(hdr) < sizeof(*nvs_hdr))) {
+		PMD_RX_LOG(ERR, "invalid receive nvs RNDIS");
+		return;
+	}
+
+	/* Make sure that this is a RNDIS message. */
+	if (unlikely(nvs_hdr->type != NVS_TYPE_RNDIS)) {
+		PMD_RX_LOG(ERR, "nvs type %u, not RNDIS",
+			    nvs_hdr->type);
+		return;
+	}
+
+	hlen = vmbus_chanpkt_getlen(hdr->hlen);
+	if (unlikely(hlen < sizeof(*pkt))) {
+		PMD_RX_LOG(ERR, "invalid rxbuf chanpkt");
+		return;
+	}
+
+	pkt = container_of(hdr, const struct vmbus_chanpkt_rxbuf, hdr);
+	if (unlikely(pkt->rxbuf_id != NVS_RXBUF_SIG)) {
+		PMD_RX_LOG(ERR, "invalid rxbuf_id 0x%08x",
+			    pkt->rxbuf_id);
+		return;
+	}
+
+	count = pkt->rxbuf_cnt;
+	if (unlikely(hlen < offsetof(struct vmbus_chanpkt_rxbuf,
+				     rxbuf[count]))) {
+		PMD_RX_LOG(ERR, "invalid rxbuf_cnt %u", count);
+		return;
+	}
+
+	/* Each range represents 1 RNDIS pkt that contains 1 Ethernet frame */
+	for (i = 0; i < count; ++i) {
+		unsigned int ofs, len;
+
+		ofs = pkt->rxbuf[i].ofs;
+		len = pkt->rxbuf[i].len;
+
+		if (unlikely(ofs + len > rxbuf_sz)) {
+			PMD_RX_LOG(ERR,
+				    "%uth RNDIS msg overflow ofs %u, len %u",
+				    i, ofs, len);
+			continue;
+		}
+
+		if (unlikely(len == 0)) {
+			PMD_RX_LOG(ERR, "%uth RNDIS msg len %u", i, len);
+			continue;
+		}
+
+		hn_rndis_receive(dev, rxq, rxbuf + ofs, len);
+	}
+
+	/*
+	 * Ack the consumed RXBUF associated w/ this channel packet,
+	 * so that this RXBUF can be recycled by the hypervisor.
+	 */
+	hn_nvs_ack_rxbuf(rxq, pkt->hdr.xactid);
+}
+
+struct hn_rx_queue *hn_rx_queue_alloc(struct hn_data *hv,
+				      uint16_t queue_id,
+				      unsigned int socket_id)
+{
+	struct hn_rx_queue *rxq;
+
+	rxq = rte_zmalloc_socket("HN_RXQ", sizeof(*rxq),
+				 RTE_CACHE_LINE_SIZE, socket_id);
+	if (rxq) {
+		rxq->hv = hv;
+		rxq->chan = hv->channels[queue_id];
+		rte_spinlock_init(&rxq->ring_lock);
+		rxq->port_id = hv->port_id;
+		rxq->queue_id = queue_id;
+	}
+	return rxq;
+}
+
+int
+hn_dev_rx_queue_setup(struct rte_eth_dev *dev,
+		      uint16_t queue_idx, uint16_t nb_desc,
+		      unsigned int socket_id,
+		      const struct rte_eth_rxconf *rx_conf __rte_unused,
+		      struct rte_mempool *mp)
+{
+	struct hn_data *hv = dev->data->dev_private;
+	uint32_t qmax = hv->rxbuf_section_cnt;
+	char ring_name[RTE_RING_NAMESIZE];
+	struct hn_rx_queue *rxq;
+	unsigned int count;
+	size_t size;
+	int err;
+
+	PMD_INIT_FUNC_TRACE();
+
+	if (nb_desc == 0 || nb_desc > qmax)
+		nb_desc = qmax;
+
+	if (queue_idx == 0) {
+		rxq = hv->primary;
+	} else {
+		rxq = hn_rx_queue_alloc(hv, queue_idx, socket_id);
+		if (!rxq)
+			return -ENOMEM;
+	}
+
+	count = rte_align32pow2(nb_desc);
+	size = sizeof(struct rte_ring) + count * sizeof(void *);
+	rxq->rx_ring = rte_malloc_socket("RX_RING", size,
+					 RTE_CACHE_LINE_SIZE,
+					 socket_id);
+	if (!rxq->rx_ring) {
+		rte_free(rxq);
+		return -ENOMEM;
+	}
+	rxq->mb_pool = mp;
+
+	/*
+	 * Staging ring from receive event logic to rx_pkts.
+	 * rx_pkts assumes caller is handling multi-thread issue.
+	 * event logic has locking.
+	 */
+	snprintf(ring_name, sizeof(ring_name),
+		 "hn_rx_%u_%u", dev->data->port_id, queue_idx);
+	err = rte_ring_init(rxq->rx_ring, ring_name,
+			    count, 0);
+	if (err) {
+		rte_free(rxq->rx_ring);
+		rte_free(rxq);
+		return err;
+	}
+
+	dev->data->rx_queues[queue_idx] = rxq;
+	return 0;
+}
+
+void
+hn_dev_rx_queue_release(void *arg)
+{
+	struct hn_rx_queue *rxq = arg;
+
+	PMD_INIT_FUNC_TRACE();
+
+	if (!rxq)
+		return;
+
+	rte_free(rxq->rx_ring);
+	rxq->rx_ring = NULL;
+	rxq->mb_pool = NULL;
+
+	if (rxq != rxq->hv->primary)
+		rte_free(rxq);
+}
+
+static void
+hn_nvs_handle_notify(const struct vmbus_chanpkt_hdr *pkthdr,
+		     const void *data)
+{
+	const struct hn_nvs_hdr *hdr = data;
+
+	if (unlikely(vmbus_chanpkt_datalen(pkthdr) < sizeof(*hdr))) {
+		PMD_DRV_LOG(ERR, "invalid nvs notify");
+		return;
+	}
+
+	PMD_DRV_LOG(INFO,
+		    "got notify, nvs type %u", hdr->type);
+}
+
+/*
+ * Process pending events on the channel.
+ * Called from both Rx queue poll and Tx cleanup
+ */
+void hn_process_events(struct hn_data *hv, uint16_t queue_id)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[hv->port_id];
+	struct hn_rx_queue *rxq;
+	int ret = 0;
+
+	rxq = queue_id == 0 ? hv->primary : dev->data->rx_queues[queue_id];
+
+	/* If no pending data then nothing to do */
+	if (rte_vmbus_chan_rx_empty(rxq->chan))
+		return;
+
+	/*
+	 * Since channel is shared between Rx and TX queue need to have a lock
+	 * since DPDK does not force same CPU to be used for Rx/Tx.
+	 */
+	if (unlikely(!rte_spinlock_trylock(&rxq->ring_lock)))
+		return;
+
+	for (;;) {
+		char event_buf[NVS_RESPSIZE_MAX];
+		uint32_t len = sizeof(event_buf);
+		const struct vmbus_chanpkt_hdr *pkt;
+		const void *data;
+
+		ret = rte_vmbus_chan_recv_raw(rxq->chan, event_buf, &len);
+		if (ret == -ENOBUFS) {
+			rte_exit(EXIT_FAILURE,
+				 "event buffer size %u not large enough for %u",
+				 NVS_RESPSIZE_MAX, len);
+		}
+		if (ret != 0)
+			break;
+
+		pkt = (const struct vmbus_chanpkt_hdr *)event_buf;
+		data = event_buf + vmbus_chanpkt_getlen(pkt->hlen);
+
+		switch (pkt->type) {
+		case VMBUS_CHANPKT_TYPE_COMP:
+			hn_nvs_handle_comp(dev, queue_id, pkt, data);
+			break;
+
+		case VMBUS_CHANPKT_TYPE_RXBUF:
+			hn_nvs_handle_rxbuf(dev, hv, rxq, pkt, data);
+			break;
+
+		case VMBUS_CHANPKT_TYPE_INBAND:
+			hn_nvs_handle_notify(pkt, data);
+			break;
+
+		default:
+			PMD_DRV_LOG(ERR,
+				    "unknown chan pkt %u", pkt->type);
+			break;
+		}
+	}
+	rte_spinlock_unlock(&rxq->ring_lock);
+
+	if (unlikely(ret != -EAGAIN)) {
+		PMD_DRV_LOG(ERR,
+			    "channel receive failed: %d",
+			    ret);
+	}
+}
+
+static void hn_append_to_chim(struct hn_tx_queue *txq,
+			      struct rndis_packet_msg *pkt,
+			      const struct rte_mbuf *m)
+{
+	struct hn_txdesc *txd = txq->agg_txd;
+	uint8_t *buf = (uint8_t *)pkt;
+	unsigned int data_offs;
+
+	data_offs = RNDIS_PACKET_MSG_OFFSET_ABS(pkt->dataoffset);
+	txd->chim_size += pkt->len;
+	txd->data_size += m->pkt_len;
+	++txd->packets;
+	hn_update_packet_stats(&txq->stats, m);
+
+	for (; m; m = m->next) {
+		uint16_t len = rte_pktmbuf_data_len(m);
+
+		rte_memcpy(buf + data_offs,
+			   rte_pktmbuf_mtod(m, const char *), len);
+		data_offs += len;
+	}
+}
+
+/*
+ * Send pending aggregated data in chimney buffer (if any).
+ * Returns error if send was unsuccessful because channel ring buffer
+ * was full.
+ */
+static int hn_flush_txagg(struct hn_tx_queue *txq, bool *need_sig)
+
+{
+	struct hn_txdesc *txd = txq->agg_txd;
+	struct hn_nvs_rndis rndis;
+	int ret;
+
+	if (!txd)
+		return 0;
+
+	PMD_TX_LOG(DEBUG,
+		   "port %u:%u send chim index %u size %u packets %u size %u",
+		   txq->port_id, txq->queue_id,
+		   txd->chim_index, txd->chim_size,
+		   txd->packets, txd->data_size);
+
+	rndis = (struct hn_nvs_rndis) {
+		.type = NVS_TYPE_RNDIS,
+		.rndis_mtype = NVS_RNDIS_MTYPE_DATA,
+		.chim_idx = txd->chim_index,
+		.chim_sz = txd->chim_size,
+	};
+
+	ret = hn_nvs_send(txq->chan, VMBUS_CHANPKT_FLAG_RC,
+			  &rndis, sizeof(rndis), (uintptr_t)txd, need_sig);
+
+	if (likely(ret == 0))
+		hn_reset_txagg(txq);
+	else
+		PMD_TX_LOG(NOTICE, "port %u:%u send failed: %d",
+			   txq->port_id, txq->queue_id, ret);
+
+	return ret;
+}
+
+static struct hn_txdesc *hn_new_txd(struct hn_data *hv,
+				    const struct hn_tx_queue *txq)
+{
+	struct hn_txdesc *txd;
+
+	if (rte_mempool_get(hv->tx_pool, (void **)&txd)) {
+		PMD_TX_LOG(DEBUG, "tx pool exhausted!");
+		return NULL;
+	}
+
+	txd->m = NULL;
+	txd->queue_id = txq->queue_id;
+	txd->packets = 0;
+	txd->data_size = 0;
+	txd->chim_size = 0;
+
+	return txd;
+}
+
+static void *
+hn_try_txagg(struct hn_data *hv, struct hn_tx_queue *txq, uint32_t pktsize)
+{
+	struct hn_txdesc *agg_txd = txq->agg_txd;
+	struct rndis_packet_msg *pkt;
+	void *chim;
+
+	if (agg_txd) {
+		unsigned int padding, olen;
+
+		/*
+		 * Update the previous RNDIS packet's total length,
+		 * it can be increased due to the mandatory alignment
+		 * padding for this RNDIS packet.  And update the
+		 * aggregating txdesc's chimney sending buffer size
+		 * accordingly.
+		 *
+		 * Zero-out the padding, as required by the RNDIS spec.
+		 */
+		pkt = txq->agg_prevpkt;
+		olen = pkt->len;
+		padding = RTE_ALIGN(olen, txq->agg_align) - olen;
+		if (padding > 0) {
+			agg_txd->chim_size += padding;
+			pkt->len += padding;
+			memset((uint8_t *)pkt + olen, 0, padding);
+		}
+
+		chim = (uint8_t *)pkt + pkt->len;
+
+		txq->agg_pktleft--;
+		txq->agg_szleft -= pktsize;
+		if (txq->agg_szleft < HN_PKTSIZE_MIN(txq->agg_align)) {
+			/*
+			 * Probably can't aggregate more packets,
+			 * flush this aggregating txdesc proactively.
+			 */
+			txq->agg_pktleft = 0;
+		}
+	} else {
+		agg_txd = hn_new_txd(hv, txq);
+		if (!agg_txd)
+			return NULL;
+
+		chim = (uint8_t *)hv->chim_res->addr
+			+ agg_txd->chim_index * hv->chim_szmax;
+
+		txq->agg_txd = agg_txd;
+		txq->agg_pktleft = txq->agg_pktmax - 1;
+		txq->agg_szleft = txq->agg_szmax - pktsize;
+	}
+	txq->agg_prevpkt = chim;
+
+	return chim;
+}
+
+static inline void *
+hn_rndis_pktinfo_append(struct rndis_packet_msg *pkt,
+			uint32_t pi_dlen, uint32_t pi_type)
+{
+	const uint32_t pi_size = RNDIS_PKTINFO_SIZE(pi_dlen);
+	struct rndis_pktinfo *pi;
+
+	/*
+	 * Per-packet-info does not move; it only grows.
+	 *
+	 * NOTE:
+	 * pktinfooffset in this phase counts from the beginning
+	 * of rndis_packet_msg.
+	 */
+	pi = (struct rndis_pktinfo *)((uint8_t *)pkt + hn_rndis_pktlen(pkt));
+
+	pkt->pktinfolen += pi_size;
+
+	pi->size = pi_size;
+	pi->type = pi_type;
+	pi->offset = RNDIS_PKTINFO_OFFSET;
+
+	return pi->data;
+}
+
+/* Put RNDIS header and packet info on packet */
+static void hn_encap(struct rndis_packet_msg *pkt,
+		     uint16_t queue_id,
+		     const struct rte_mbuf *m)
+{
+	unsigned int hlen = m->l2_len + m->l3_len;
+	uint32_t *pi_data;
+	uint32_t pkt_hlen;
+
+	pkt->type = RNDIS_PACKET_MSG;
+	pkt->len = m->pkt_len;
+	pkt->dataoffset = 0;
+	pkt->datalen = m->pkt_len;
+	pkt->oobdataoffset = 0;
+	pkt->oobdatalen = 0;
+	pkt->oobdataelements = 0;
+	pkt->pktinfooffset = sizeof(*pkt);
+	pkt->pktinfolen = 0;
+	pkt->vchandle = 0;
+	pkt->reserved = 0;
+
+	/*
+	 * Set the hash value for this packet, to the queue_id to cause
+	 * TX done event for this packet on the right channel.
+	 */
+	pi_data = hn_rndis_pktinfo_append(pkt, NDIS_HASH_VALUE_SIZE,
+					  NDIS_PKTINFO_TYPE_HASHVAL);
+	*pi_data = queue_id;
+
+	if (m->ol_flags & PKT_TX_VLAN_PKT) {
+		pi_data = hn_rndis_pktinfo_append(pkt,
+				NDIS_VLAN_INFO_SIZE, NDIS_PKTINFO_TYPE_VLAN);
+		*pi_data = m->vlan_tci;
+	}
+
+	if (m->ol_flags & PKT_TX_TCP_SEG) {
+		pi_data = hn_rndis_pktinfo_append(pkt,
+				NDIS_LSO2_INFO_SIZE, NDIS_PKTINFO_TYPE_LSO);
+
+		if (m->ol_flags & PKT_TX_IPV6) {
+			*pi_data = NDIS_LSO2_INFO_MAKEIPV6(hlen,
+							   m->tso_segsz);
+		} else {
+			*pi_data = NDIS_LSO2_INFO_MAKEIPV4(hlen,
+							   m->tso_segsz);
+		}
+	} else if (m->ol_flags &
+		   (PKT_TX_TCP_CKSUM | PKT_TX_UDP_CKSUM | PKT_TX_IP_CKSUM)) {
+		pi_data = hn_rndis_pktinfo_append(pkt,
+				NDIS_TXCSUM_INFO_SIZE, NDIS_PKTINFO_TYPE_CSUM);
+		*pi_data = 0;
+
+		if (m->ol_flags & PKT_TX_IPV6)
+			*pi_data |= NDIS_TXCSUM_INFO_IPV6;
+		if (m->ol_flags & PKT_TX_IPV4) {
+			*pi_data |= NDIS_TXCSUM_INFO_IPV4;
+
+			if (m->ol_flags & PKT_TX_IP_CKSUM)
+				*pi_data |= NDIS_TXCSUM_INFO_IPCS;
+		}
+
+		if (m->ol_flags & PKT_TX_TCP_CKSUM)
+			*pi_data |= NDIS_TXCSUM_INFO_MKTCPCS(hlen);
+		else if (m->ol_flags & PKT_TX_UDP_CKSUM)
+			*pi_data |= NDIS_TXCSUM_INFO_MKUDPCS(hlen);
+	}
+
+	pkt_hlen = pkt->pktinfooffset + pkt->pktinfolen;
+	/* Fixup RNDIS packet message total length */
+	pkt->len += pkt_hlen;
+
+	/* Convert RNDIS packet message offsets */
+	pkt->dataoffset = hn_rndis_pktmsg_offset(pkt_hlen);
+	pkt->pktinfooffset = hn_rndis_pktmsg_offset(pkt->pktinfooffset);
+}
+
+/* Build scatter gather list from chained mbuf */
+static inline int hn_xmit_sg(struct hn_tx_queue *txq,
+			     struct hn_txdesc *txd,
+			     struct rte_mbuf *m,
+			     bool *need_sig)
+{
+	unsigned int segs = m->nb_segs + 1;
+	struct vmbus_gpa sg[segs];
+	rte_iova_t addr;
+	unsigned int i;
+
+	PMD_TX_LOG(DEBUG, "port %u:%u sg mbuf %p segs %u size %u",
+		   txq->port_id, txq->queue_id, m, segs,
+		   txd->data_size);
+
+	/* pass IOVA of rndis header in first segment */
+	addr = rte_malloc_virt2iova(txd->rndis_pkt);
+	sg[0].page = addr / PAGE_SIZE;
+	sg[0].ofs = addr & PAGE_MASK;
+	sg[0].len = hn_rndis_pktlen(txd->rndis_pkt);
+
+	for (i = 1; i < segs; i++, m = m->next) {
+		addr = rte_mbuf_data_iova(m);
+		sg[i].page = addr / PAGE_SIZE;
+		sg[i].ofs = addr & PAGE_MASK;
+		sg[i].len = rte_pktmbuf_data_len(m);
+	}
+
+	hn_update_packet_stats(&txq->stats, m);
+
+	return hn_nvs_send_rndis_sglist(txq->chan, NVS_RNDIS_MTYPE_DATA,
+					(uintptr_t)txd, sg, segs, need_sig);
+}
+
+uint16_t
+hn_xmit_pkts(void *ptxq, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	struct hn_tx_queue *txq = ptxq;
+	struct hn_data *hv = txq->hv;
+	bool need_sig = false;
+	uint16_t nb_tx;
+	int ret;
+
+	if (unlikely(hv->closed))
+		return 0;
+
+	if (rte_mempool_avail_count(hv->tx_pool) <= txq->free_thresh)
+		hn_process_events(hv, txq->queue_id);
+
+	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
+		struct rte_mbuf *m = tx_pkts[nb_tx];
+		uint32_t pkt_size = m->pkt_len + HN_RNDIS_PKT_LEN;
+		struct rndis_packet_msg *pkt;
+
+		/* For small packets aggregate them in chimney buffer */
+		if (m->pkt_len + HN_RNDIS_PKT_LEN < HN_CHIM_THRESHOLD) {
+			/* If this packet will not fit, then flush  */
+			if (txq->agg_pktleft == 0 ||
+			    RTE_ALIGN(pkt_size, txq->agg_align) < txq->agg_szleft)
+				if (hn_flush_txagg(txq, &need_sig))
+					goto fail;
+
+			pkt = hn_try_txagg(hv, txq, pkt_size);
+			if (unlikely(pkt == NULL))
+				goto fail;
+
+			hn_encap(pkt, txq->queue_id, m);
+			hn_append_to_chim(txq, pkt, m);
+
+			rte_pktmbuf_free(m);
+
+			/* if buffer is full, flush */
+			if (txq->agg_pktleft == 0 &&
+			    hn_flush_txagg(txq, &need_sig))
+				goto fail;
+		} else {
+			struct hn_txdesc *txd;
+
+			/* flush pending buffer first */
+			if (hn_flush_txagg(txq, &need_sig))
+				goto fail;
+
+			/* Send larger packets directly */
+			txd = hn_new_txd(hv, txq);
+			if (unlikely(txd == NULL))
+				goto fail;
+
+			pkt = txd->rndis_pkt;
+			txd->m = m;
+			txd->data_size = m->pkt_len;
+			txd->packets = 1;
+
+			hn_encap(pkt, txq->queue_id, m);
+
+			ret = hn_xmit_sg(txq, txd, m, &need_sig);
+			if (unlikely(ret != 0)) {
+				PMD_TX_LOG(NOTICE, "sg send failed: %d", ret);
+				rte_mempool_put(hv->tx_pool, txd);
+				goto fail;
+			}
+		}
+	}
+
+	/* If partial buffer left, then try and send it.
+	 * if that fails, then reuse it on next send.
+	 */
+	hn_flush_txagg(txq, &need_sig);
+
+fail:
+	if (need_sig)
+		rte_vmbus_chan_signal_tx(txq->chan);
+
+	return nb_tx;
+}
+
+uint16_t
+hn_recv_pkts(void *prxq, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
+{
+	struct hn_rx_queue *rxq = prxq;
+	struct hn_data *hv = rxq->hv;
+
+	if (unlikely(hv->closed))
+		return 0;
+
+	/* Get all outstanding receive completions */
+	hn_process_events(hv, rxq->queue_id);
+
+	/* Get mbufs off staging ring */
+	return rte_ring_sc_dequeue_burst(rxq->rx_ring, (void **)rx_pkts,
+					 nb_pkts, NULL);
+}
diff --git a/drivers/net/netvsc/hn_var.h b/drivers/net/netvsc/hn_var.h
new file mode 100644
index 000000000000..49538c2c6236
--- /dev/null
+++ b/drivers/net/netvsc/hn_var.h
@@ -0,0 +1,140 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2009-2018 Microsoft Corp.
+ * Copyright (c) 2016 Brocade Communications Systems, Inc.
+ * Copyright (c) 2012 NetApp Inc.
+ * Copyright (c) 2012 Citrix Inc.
+ * All rights reserved.
+ */
+
+/*
+ * Tunable ethdev params
+ */
+#define HN_MIN_RX_BUF_SIZE	1024
+#define HN_MAX_XFER_LEN		2048
+#define	HN_MAX_MAC_ADDRS	1
+#define HN_MAX_CHANNELS		64
+
+/* Claimed to be 12232B */
+#define HN_MTU_MAX		(9 * 1024)
+
+/* Retry interval */
+#define HN_CHAN_INTERVAL_US	100
+
+/* Buffers need to be aligned */
+#ifndef PAGE_SIZE
+#define PAGE_SIZE 4096
+#endif
+
+#ifndef PAGE_MASK
+#define PAGE_MASK (PAGE_SIZE - 1)
+#endif
+
+struct hn_data;
+struct hn_txdesc;
+
+struct hn_stats {
+	uint64_t	packets;
+	uint64_t	bytes;
+	uint64_t	errors;
+	uint64_t	multicast;
+	uint64_t	broadcast;
+	/* Size bins in array as RFC 2819, undersized [0], 64 [1], etc */
+	uint64_t	size_bins[8];
+};
+
+struct hn_tx_queue {
+	struct hn_data  *hv;
+	struct vmbus_channel *chan;
+	uint16_t	port_id;
+	uint16_t	queue_id;
+	uint32_t	free_thresh;
+
+	/* Applied packet transmission aggregation limits. */
+	uint32_t	agg_szmax;
+	uint32_t	agg_pktmax;
+	uint32_t	agg_align;
+
+	/* Packet transmission aggregation states */
+	struct hn_txdesc *agg_txd;
+	uint32_t	agg_pktleft;
+	uint32_t	agg_szleft;
+	struct rndis_packet_msg *agg_prevpkt;
+
+	struct hn_stats stats;
+};
+
+struct hn_rx_queue {
+	struct hn_data  *hv;
+	struct vmbus_channel *chan;
+	struct rte_mempool *mb_pool;
+	struct rte_ring *rx_ring;
+
+	rte_spinlock_t ring_lock;
+	uint16_t port_id;
+	uint16_t queue_id;
+	struct hn_stats stats;
+	uint64_t ring_full;
+};
+
+struct hn_data {
+	struct rte_vmbus_device *vmbus;
+	struct hn_rx_queue *primary;
+	uint16_t	port_id;
+	bool		closed;
+	uint32_t	link_status;
+	uint32_t	link_speed;
+
+	struct rte_mem_resource *rxbuf_res;	/* UIO resource for Rx */
+	uint32_t	rxbuf_section_cnt;	/* # of Rx sections */
+	uint16_t	max_queues;		/* Max available queues */
+	uint16_t	num_queues;
+	uint64_t	rss_offloads;
+
+	struct rte_mem_resource *chim_res;	/* UIO resource for Tx */
+	struct rte_mempool *tx_pool;		/* Tx descriptors */
+	uint32_t	chim_szmax;		/* Max size per buffer */
+	uint32_t	chim_cnt;		/* Max packets per buffer */
+
+	uint32_t	nvs_ver;
+	uint32_t	ndis_ver;
+	uint32_t	rndis_agg_size;
+	uint32_t	rndis_agg_pkts;
+	uint32_t	rndis_agg_align;
+	unsigned int	rss_ind_size;
+
+	volatile uint32_t  rndis_pending;
+	rte_atomic32_t	rndis_req_id;
+	uint8_t		rndis_resp[256];
+
+	struct ether_addr mac_addr;
+	struct vmbus_channel *channels[HN_MAX_CHANNELS];
+};
+
+static inline struct vmbus_channel *
+hn_primary_chan(const struct hn_data *hv)
+{
+	return hv->channels[0];
+}
+
+void hn_process_events(struct hn_data *hv, uint16_t queue_id);
+
+uint16_t hn_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
+		      uint16_t nb_pkts);
+uint16_t hn_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
+		      uint16_t nb_pkts);
+
+int	hn_tx_pool_init(struct rte_eth_dev *dev);
+int	hn_dev_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_idx,
+			      uint16_t nb_desc, unsigned int socket_id,
+			      const struct rte_eth_txconf *tx_conf);
+void	hn_dev_tx_queue_release(void *arg);
+
+struct hn_rx_queue *hn_rx_queue_alloc(struct hn_data *hv,
+				      uint16_t queue_id,
+				      unsigned int socket_id);
+int	hn_dev_rx_queue_setup(struct rte_eth_dev *dev,
+			      uint16_t queue_idx, uint16_t nb_desc,
+			      unsigned int socket_id,
+			      const struct rte_eth_rxconf *rx_conf,
+			      struct rte_mempool *mp);
+void	hn_dev_rx_queue_release(void *arg);
diff --git a/drivers/net/netvsc/ndis.h b/drivers/net/netvsc/ndis.h
new file mode 100644
index 000000000000..e07df8171664
--- /dev/null
+++ b/drivers/net/netvsc/ndis.h
@@ -0,0 +1,378 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2018 Microsoft Corp.
+ * All rights reserved.
+ */
+
+#ifndef _NET_NDIS_H_
+#define _NET_NDIS_H_
+
+#define	NDIS_MEDIA_STATE_CONNECTED	0
+#define	NDIS_MEDIA_STATE_DISCONNECTED	1
+
+#define	NDIS_NETCHANGE_TYPE_POSSIBLE	1
+#define	NDIS_NETCHANGE_TYPE_DEFINITE	2
+#define	NDIS_NETCHANGE_TYPE_FROMMEDIA	3
+
+#define	NDIS_OFFLOAD_SET_NOCHG		0
+#define	NDIS_OFFLOAD_SET_ON		1
+#define	NDIS_OFFLOAD_SET_OFF		2
+
+/* a.k.a GRE MAC */
+#define	NDIS_ENCAP_TYPE_NVGRE		0x00000001
+
+#define	NDIS_HASH_FUNCTION_MASK		0x000000FF	/* see hash function */
+#define	NDIS_HASH_TYPE_MASK		0x00FFFF00	/* see hash type */
+
+/* hash function */
+#define	NDIS_HASH_FUNCTION_TOEPLITZ	0x00000001
+
+/* hash type */
+#define	NDIS_HASH_IPV4			0x00000100
+#define	NDIS_HASH_TCP_IPV4		0x00000200
+#define	NDIS_HASH_IPV6			0x00000400
+#define	NDIS_HASH_IPV6_EX		0x00000800
+#define	NDIS_HASH_TCP_IPV6		0x00001000
+#define	NDIS_HASH_TCP_IPV6_EX		0x00002000
+
+#define	NDIS_HASH_KEYSIZE_TOEPLITZ	40
+#define	NDIS_HASH_INDCNT		128
+
+#define	NDIS_OBJTYPE_DEFAULT		0x80
+#define	NDIS_OBJTYPE_RSS_CAPS		0x88
+#define	NDIS_OBJTYPE_RSS_PARAMS		0x89
+#define	NDIS_OBJTYPE_OFFLOAD		0xa7
+
+struct ndis_object_hdr {
+	uint8_t			ndis_type;	/* NDIS_OBJTYPE_ */
+	uint8_t			ndis_rev;	/* type specific */
+	uint16_t		ndis_size;	/* incl. this hdr */
+};
+
+/*
+ * OID_TCP_OFFLOAD_PARAMETERS
+ * ndis_type: NDIS_OBJTYPE_DEFAULT
+ */
+struct ndis_offload_params {
+	struct ndis_object_hdr	ndis_hdr;
+	uint8_t			ndis_ip4csum;	/* NDIS_OFFLOAD_PARAM_ */
+	uint8_t			ndis_tcp4csum;	/* NDIS_OFFLOAD_PARAM_ */
+	uint8_t			ndis_udp4csum;	/* NDIS_OFFLOAD_PARAM_ */
+	uint8_t			ndis_tcp6csum;	/* NDIS_OFFLOAD_PARAM_ */
+	uint8_t			ndis_udp6csum;	/* NDIS_OFFLOAD_PARAM_ */
+	uint8_t			ndis_lsov1;	/* NDIS_OFFLOAD_PARAM_ */
+	uint8_t			ndis_ipsecv1;	/* NDIS_OFFLOAD_IPSECV1_ */
+	uint8_t			ndis_lsov2_ip4;	/* NDIS_OFFLOAD_LSOV2_ */
+	uint8_t			ndis_lsov2_ip6;	/* NDIS_OFFLOAD_LSOV2_ */
+	uint8_t			ndis_tcp4conn;	/* 0 */
+	uint8_t			ndis_tcp6conn;	/* 0 */
+	uint32_t		ndis_flags;	/* 0 */
+	/* NDIS >= 6.1 */
+	uint8_t			ndis_ipsecv2;	/* NDIS_OFFLOAD_IPSECV2_ */
+	uint8_t			ndis_ipsecv2_ip4;/* NDIS_OFFLOAD_IPSECV2_ */
+	/* NDIS >= 6.30 */
+	uint8_t			ndis_rsc_ip4;	/* NDIS_OFFLOAD_RSC_ */
+	uint8_t			ndis_rsc_ip6;	/* NDIS_OFFLOAD_RSC_ */
+	uint8_t			ndis_encap;	/* NDIS_OFFLOAD_SET_ */
+	uint8_t			ndis_encap_types;/* NDIS_ENCAP_TYPE_ */
+};
+
+#define	NDIS_OFFLOAD_PARAMS_SIZE	sizeof(struct ndis_offload_params)
+#define	NDIS_OFFLOAD_PARAMS_SIZE_6_1	\
+	offsetof(struct ndis_offload_params, ndis_rsc_ip4)
+
+#define	NDIS_OFFLOAD_PARAMS_REV_2	2	/* NDIS 6.1 */
+#define	NDIS_OFFLOAD_PARAMS_REV_3	3	/* NDIS 6.30 */
+
+#define	NDIS_OFFLOAD_PARAM_NOCHG	0	/* common */
+#define	NDIS_OFFLOAD_PARAM_OFF		1
+#define	NDIS_OFFLOAD_PARAM_TX		2
+#define	NDIS_OFFLOAD_PARAM_RX		3
+#define	NDIS_OFFLOAD_PARAM_TXRX		4
+
+/* NDIS_OFFLOAD_PARAM_NOCHG */
+#define	NDIS_OFFLOAD_LSOV1_OFF		1
+#define	NDIS_OFFLOAD_LSOV1_ON		2
+
+/* NDIS_OFFLOAD_PARAM_NOCHG */
+#define	NDIS_OFFLOAD_IPSECV1_OFF	1
+#define	NDIS_OFFLOAD_IPSECV1_AH		2
+#define	NDIS_OFFLOAD_IPSECV1_ESP	3
+#define	NDIS_OFFLOAD_IPSECV1_AH_ESP	4
+
+/* NDIS_OFFLOAD_PARAM_NOCHG */
+#define	NDIS_OFFLOAD_LSOV2_OFF		1
+#define	NDIS_OFFLOAD_LSOV2_ON		2
+
+/* NDIS_OFFLOAD_PARAM_NOCHG */
+#define	NDIS_OFFLOAD_IPSECV2_OFF	1
+#define	NDIS_OFFLOAD_IPSECV2_AH		2
+#define	NDIS_OFFLOAD_IPSECV2_ESP	3
+#define	NDIS_OFFLOAD_IPSECV2_AH_ESP	4
+
+/* NDIS_OFFLOAD_PARAM_NOCHG */
+#define	NDIS_OFFLOAD_RSC_OFF		1
+#define	NDIS_OFFLOAD_RSC_ON		2
+
+/*
+ * OID_GEN_RECEIVE_SCALE_CAPABILITIES
+ * ndis_type: NDIS_OBJTYPE_RSS_CAPS
+ */
+struct ndis_rss_caps {
+	struct ndis_object_hdr		ndis_hdr;
+	uint32_t			ndis_caps;	/* NDIS_RSS_CAP_ */
+	uint32_t			ndis_nmsi;	/* # of MSIs */
+	uint32_t			ndis_nrxr;	/* # of RX rings */
+	/* NDIS >= 6.30 */
+	uint16_t			ndis_nind;	/* # of indtbl ent. */
+	uint16_t			ndis_pad;
+};
+
+#define	NDIS_RSS_CAPS_SIZE		\
+	offsetof(struct ndis_rss_caps, ndis_pad)
+#define	NDIS_RSS_CAPS_SIZE_6_0		\
+	offsetof(struct ndis_rss_caps, ndis_nind)
+
+#define	NDIS_RSS_CAPS_REV_1		1	/* NDIS 6.{0,1,20} */
+#define	NDIS_RSS_CAPS_REV_2		2	/* NDIS 6.30 */
+
+#define	NDIS_RSS_CAP_MSI		0x01000000
+#define	NDIS_RSS_CAP_CLASSIFY_ISR	0x02000000
+#define	NDIS_RSS_CAP_CLASSIFY_DPC	0x04000000
+#define	NDIS_RSS_CAP_MSIX		0x08000000
+#define	NDIS_RSS_CAP_IPV4		0x00000100
+#define	NDIS_RSS_CAP_IPV6		0x00000200
+#define	NDIS_RSS_CAP_IPV6_EX		0x00000400
+#define	NDIS_RSS_CAP_HASH_TOEPLITZ	NDIS_HASH_FUNCTION_TOEPLITZ
+#define	NDIS_RSS_CAP_HASHFUNC_MASK	NDIS_HASH_FUNCTION_MASK
+
+/*
+ * OID_GEN_RECEIVE_SCALE_PARAMETERS
+ * ndis_type: NDIS_OBJTYPE_RSS_PARAMS
+ */
+struct ndis_rss_params {
+	struct ndis_object_hdr		ndis_hdr;
+	uint16_t			ndis_flags;	/* NDIS_RSS_FLAG_ */
+	uint16_t			ndis_bcpu;	/* base cpu 0 */
+	uint32_t			ndis_hash;	/* NDIS_HASH_ */
+	uint16_t			ndis_indsize;	/* indirect table */
+	uint32_t			ndis_indoffset;
+	uint16_t			ndis_keysize;	/* hash key */
+	uint32_t			ndis_keyoffset;
+	/* NDIS >= 6.20 */
+	uint32_t			ndis_cpumaskoffset;
+	uint32_t			ndis_cpumaskcnt;
+	uint32_t			ndis_cpumaskentsz;
+};
+
+#define	NDIS_RSS_PARAMS_SIZE		sizeof(struct ndis_rss_params)
+#define	NDIS_RSS_PARAMS_SIZE_6_0	\
+	offsetof(struct ndis_rss_params, ndis_cpumaskoffset)
+
+#define	NDIS_RSS_PARAMS_REV_1		1	/* NDIS 6.0 */
+#define	NDIS_RSS_PARAMS_REV_2		2	/* NDIS 6.20 */
+
+#define	NDIS_RSS_FLAG_NONE		0x0000
+#define	NDIS_RSS_FLAG_BCPU_UNCHG	0x0001
+#define	NDIS_RSS_FLAG_HASH_UNCHG	0x0002
+#define	NDIS_RSS_FLAG_IND_UNCHG		0x0004
+#define	NDIS_RSS_FLAG_KEY_UNCHG		0x0008
+#define	NDIS_RSS_FLAG_DISABLE		0x0010
+
+/* non-standard convenient struct */
+struct ndis_rssprm_toeplitz {
+	struct ndis_rss_params		rss_params;
+	/* Toeplitz hash key */
+	uint8_t				rss_key[NDIS_HASH_KEYSIZE_TOEPLITZ];
+	/* Indirect table */
+	uint32_t			rss_ind[NDIS_HASH_INDCNT];
+};
+
+#define	NDIS_RSSPRM_TOEPLITZ_SIZE(nind)	\
+	offsetof(struct ndis_rssprm_toeplitz, rss_ind[nind])
+
+/*
+ * OID_TCP_OFFLOAD_HARDWARE_CAPABILITIES
+ * ndis_type: NDIS_OBJTYPE_OFFLOAD
+ */
+
+#define	NDIS_OFFLOAD_ENCAP_NONE		0x0000
+#define	NDIS_OFFLOAD_ENCAP_NULL		0x0001
+#define	NDIS_OFFLOAD_ENCAP_8023		0x0002
+#define	NDIS_OFFLOAD_ENCAP_8023PQ	0x0004
+#define	NDIS_OFFLOAD_ENCAP_8023PQ_OOB	0x0008
+#define	NDIS_OFFLOAD_ENCAP_RFC1483	0x0010
+
+struct ndis_csum_offload {
+	uint32_t			ndis_ip4_txenc;	/*NDIS_OFFLOAD_ENCAP_*/
+	uint32_t			ndis_ip4_txcsum;
+#define	NDIS_TXCSUM_CAP_IP4OPT		0x001
+#define	NDIS_TXCSUM_CAP_TCP4OPT		0x004
+#define	NDIS_TXCSUM_CAP_TCP4		0x010
+#define	NDIS_TXCSUM_CAP_UDP4		0x040
+#define	NDIS_TXCSUM_CAP_IP4		0x100
+	uint32_t			ndis_ip4_rxenc;	/*NDIS_OFFLOAD_ENCAP_*/
+	uint32_t			ndis_ip4_rxcsum;
+#define	NDIS_RXCSUM_CAP_IP4OPT		0x001
+#define	NDIS_RXCSUM_CAP_TCP4OPT		0x004
+#define	NDIS_RXCSUM_CAP_TCP4		0x010
+#define	NDIS_RXCSUM_CAP_UDP4		0x040
+#define	NDIS_RXCSUM_CAP_IP4		0x100
+	uint32_t			ndis_ip6_txenc;	/*NDIS_OFFLOAD_ENCAP_*/
+	uint32_t			ndis_ip6_txcsum;
+#define	NDIS_TXCSUM_CAP_IP6EXT		0x001
+#define	NDIS_TXCSUM_CAP_TCP6OPT		0x004
+#define	NDIS_TXCSUM_CAP_TCP6		0x010
+#define	NDIS_TXCSUM_CAP_UDP6		0x040
+	uint32_t			ndis_ip6_rxenc;	/*NDIS_OFFLOAD_ENCAP_*/
+	uint32_t			ndis_ip6_rxcsum;
+#define	NDIS_RXCSUM_CAP_IP6EXT		0x001
+#define	NDIS_RXCSUM_CAP_TCP6OPT		0x004
+#define	NDIS_RXCSUM_CAP_TCP6		0x010
+#define	NDIS_RXCSUM_CAP_UDP6		0x040
+};
+
+struct ndis_lsov1_offload {
+	uint32_t			ndis_encap;	/*NDIS_OFFLOAD_ENCAP_*/
+	uint32_t			ndis_maxsize;
+	uint32_t			ndis_minsegs;
+	uint32_t			ndis_opts;
+};
+
+struct ndis_ipsecv1_offload {
+	uint32_t			ndis_encap;	/*NDIS_OFFLOAD_ENCAP_*/
+	uint32_t			ndis_ah_esp;
+	uint32_t			ndis_xport_tun;
+	uint32_t			ndis_ip4_opts;
+	uint32_t			ndis_flags;
+	uint32_t			ndis_ip4_ah;
+	uint32_t			ndis_ip4_esp;
+};
+
+struct ndis_lsov2_offload {
+	uint32_t			ndis_ip4_encap;	/*NDIS_OFFLOAD_ENCAP_*/
+	uint32_t			ndis_ip4_maxsz;
+	uint32_t			ndis_ip4_minsg;
+	uint32_t			ndis_ip6_encap;	/*NDIS_OFFLOAD_ENCAP_*/
+	uint32_t			ndis_ip6_maxsz;
+	uint32_t			ndis_ip6_minsg;
+	uint32_t			ndis_ip6_opts;
+#define	NDIS_LSOV2_CAP_IP6EXT		0x001
+#define	NDIS_LSOV2_CAP_TCP6OPT		0x004
+};
+
+struct ndis_ipsecv2_offload {
+	uint32_t			ndis_encap;	/*NDIS_OFFLOAD_ENCAP_*/
+	uint16_t			ndis_ip6;
+	uint16_t			ndis_ip4opt;
+	uint16_t			ndis_ip6ext;
+	uint16_t			ndis_ah;
+	uint16_t			ndis_esp;
+	uint16_t			ndis_ah_esp;
+	uint16_t			ndis_xport;
+	uint16_t			ndis_tun;
+	uint16_t			ndis_xport_tun;
+	uint16_t			ndis_lso;
+	uint16_t			ndis_extseq;
+	uint32_t			ndis_udp_esp;
+	uint32_t			ndis_auth;
+	uint32_t			ndis_crypto;
+	uint32_t			ndis_sa_caps;
+};
+
+struct ndis_rsc_offload {
+	uint16_t			ndis_ip4;
+	uint16_t			ndis_ip6;
+};
+
+struct ndis_encap_offload {
+	uint32_t			ndis_flags;
+	uint32_t			ndis_maxhdr;
+};
+
+struct ndis_offload {
+	struct ndis_object_hdr		ndis_hdr;
+	struct ndis_csum_offload	ndis_csum;
+	struct ndis_lsov1_offload	ndis_lsov1;
+	struct ndis_ipsecv1_offload	ndis_ipsecv1;
+	struct ndis_lsov2_offload	ndis_lsov2;
+	uint32_t			ndis_flags;
+	/* NDIS >= 6.1 */
+	struct ndis_ipsecv2_offload	ndis_ipsecv2;
+	/* NDIS >= 6.30 */
+	struct ndis_rsc_offload		ndis_rsc;
+	struct ndis_encap_offload	ndis_encap_gre;
+};
+
+#define	NDIS_OFFLOAD_SIZE		sizeof(struct ndis_offload)
+#define	NDIS_OFFLOAD_SIZE_6_0		offsetof(struct ndis_offload, ndis_ipsecv2)
+#define	NDIS_OFFLOAD_SIZE_6_1		offsetof(struct ndis_offload, ndis_rsc)
+
+#define	NDIS_OFFLOAD_REV_1		1	/* NDIS 6.0 */
+#define	NDIS_OFFLOAD_REV_2		2	/* NDIS 6.1 */
+#define	NDIS_OFFLOAD_REV_3		3	/* NDIS 6.30 */
+
+/*
+ * Per-packet-info
+ */
+
+/* VLAN */
+#define	NDIS_VLAN_INFO_SIZE		sizeof(uint32_t)
+#define	NDIS_VLAN_INFO_PRI_MASK		0x0007
+#define	NDIS_VLAN_INFO_CFI_MASK		0x0008
+#define	NDIS_VLAN_INFO_ID_MASK		0xfff0
+#define	NDIS_VLAN_INFO_MAKE(id, pri, cfi)	\
+	(((pri) & NDIS_VLAN_INFO_PRI_MASK) |	\
+	 (((cfi) & 0x1) << 3) | (((id) & 0xfff) << 4))
+#define	NDIS_VLAN_INFO_ID(inf)		(((inf) & NDIS_VLAN_INFO_ID_MASK) >> 4)
+#define	NDIS_VLAN_INFO_CFI(inf)		(((inf) & NDIS_VLAN_INFO_CFI_MASK) >> 3)
+#define	NDIS_VLAN_INFO_PRI(inf)		((inf) & NDIS_VLAN_INFO_PRI_MASK)
+
+/* Reception checksum */
+#define	NDIS_RXCSUM_INFO_SIZE		sizeof(uint32_t)
+#define	NDIS_RXCSUM_INFO_TCPCS_FAILED	0x0001
+#define	NDIS_RXCSUM_INFO_UDPCS_FAILED	0x0002
+#define	NDIS_RXCSUM_INFO_IPCS_FAILED	0x0004
+#define	NDIS_RXCSUM_INFO_TCPCS_OK	0x0008
+#define	NDIS_RXCSUM_INFO_UDPCS_OK	0x0010
+#define	NDIS_RXCSUM_INFO_IPCS_OK	0x0020
+#define	NDIS_RXCSUM_INFO_LOOPBACK	0x0040
+#define	NDIS_RXCSUM_INFO_TCPCS_INVAL	0x0080
+#define	NDIS_RXCSUM_INFO_IPCS_INVAL	0x0100
+
+/* LSOv2 */
+#define	NDIS_LSO2_INFO_SIZE		sizeof(uint32_t)
+#define	NDIS_LSO2_INFO_MSS_MASK		0x000fffff
+#define	NDIS_LSO2_INFO_THOFF_MASK	0x3ff00000
+#define	NDIS_LSO2_INFO_ISLSO2		0x40000000
+#define	NDIS_LSO2_INFO_ISIPV6		0x80000000
+
+#define	NDIS_LSO2_INFO_MAKE(thoff, mss)				\
+	((((uint32_t)(mss)) & NDIS_LSO2_INFO_MSS_MASK) |	\
+	 ((((uint32_t)(thoff)) & 0x3ff) << 20) |		\
+	 NDIS_LSO2_INFO_ISLSO2)
+
+#define	NDIS_LSO2_INFO_MAKEIPV4(thoff, mss)			\
+	NDIS_LSO2_INFO_MAKE((thoff), (mss))
+
+#define	NDIS_LSO2_INFO_MAKEIPV6(thoff, mss)			\
+	(NDIS_LSO2_INFO_MAKE((thoff), (mss)) | NDIS_LSO2_INFO_ISIPV6)
+
+/* Transmission checksum */
+#define	NDIS_TXCSUM_INFO_SIZE		sizeof(uint32_t)
+#define	NDIS_TXCSUM_INFO_IPV4		0x00000001
+#define	NDIS_TXCSUM_INFO_IPV6		0x00000002
+#define	NDIS_TXCSUM_INFO_TCPCS		0x00000004
+#define	NDIS_TXCSUM_INFO_UDPCS		0x00000008
+#define	NDIS_TXCSUM_INFO_IPCS		0x00000010
+#define	NDIS_TXCSUM_INFO_THOFF		0x03ff0000
+
+#define	NDIS_TXCSUM_INFO_MKL4CS(thoff, flag)			\
+	((((uint32_t)(thoff)) << 16) | (flag))
+
+#define	NDIS_TXCSUM_INFO_MKTCPCS(thoff)				\
+	NDIS_TXCSUM_INFO_MKL4CS((thoff), NDIS_TXCSUM_INFO_TCPCS)
+
+#define	NDIS_TXCSUM_INFO_MKUDPCS(thoff)				\
+	NDIS_TXCSUM_INFO_MKL4CS((thoff), NDIS_TXCSUM_INFO_UDPCS)
+
+#endif	/* !_NET_NDIS_H_ */
diff --git a/drivers/net/netvsc/rndis.h b/drivers/net/netvsc/rndis.h
new file mode 100644
index 000000000000..eac9a99fd8ef
--- /dev/null
+++ b/drivers/net/netvsc/rndis.h
@@ -0,0 +1,414 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2018 Microsoft Corp.
+ * Copyright (c) 2010 Jonathan Armani <armani@openbsd.org>
+ * Copyright (c) 2010 Fabien Romano <fabien@openbsd.org>
+ * Copyright (c) 2010 Michael Knudsen <mk@openbsd.org>
+ * All rights reserved.
+ */
+
+#ifndef	_NET_RNDIS_H_
+#define	_NET_RNDIS_H_
+
+/* Canonical major/minor version as of 22th Aug. 2016. */
+#define	RNDIS_VERSION_MAJOR		0x00000001
+#define	RNDIS_VERSION_MINOR		0x00000000
+
+#define	RNDIS_STATUS_SUCCESS		0x00000000
+#define	RNDIS_STATUS_PENDING		0x00000103
+
+#define RNDIS_STATUS_ONLINE		0x40010003
+#define RNDIS_STATUS_RESET_START	0x40010004
+#define RNDIS_STATUS_RESET_END		0x40010005
+#define RNDIS_STATUS_RING_STATUS	0x40010006
+#define RNDIS_STATUS_CLOSED		0x40010007
+#define RNDIS_STATUS_WAN_LINE_UP	0x40010008
+#define RNDIS_STATUS_WAN_LINE_DOWN	0x40010009
+#define RNDIS_STATUS_WAN_FRAGMENT	0x4001000A
+#define	RNDIS_STATUS_MEDIA_CONNECT	0x4001000B
+#define	RNDIS_STATUS_MEDIA_DISCONNECT	0x4001000C
+#define RNDIS_STATUS_HARDWARE_LINE_UP	0x4001000D
+#define RNDIS_STATUS_HARDWARE_LINE_DOWN	0x4001000E
+#define RNDIS_STATUS_INTERFACE_UP	0x4001000F
+#define RNDIS_STATUS_INTERFACE_DOWN	0x40010010
+#define RNDIS_STATUS_MEDIA_BUSY		0x40010011
+#define	RNDIS_STATUS_MEDIA_SPECIFIC_INDICATION	0x40010012
+#define RNDIS_STATUS_WW_INDICATION	RDIA_SPECIFIC_INDICATION
+#define RNDIS_STATUS_LINK_SPEED_CHANGE	0x40010013
+#define RNDIS_STATUS_NETWORK_CHANGE	0x40010018
+#define	RNDIS_STATUS_TASK_OFFLOAD_CURRENT_CONFIG 0x40020006
+
+#define	RNDIS_STATUS_FAILURE		0xC0000001
+#define RNDIS_STATUS_RESOURCES		0xC000009A
+#define	RNDIS_STATUS_NOT_SUPPORTED	0xC00000BB
+#define RNDIS_STATUS_CLOSING		0xC0010002
+#define RNDIS_STATUS_BAD_VERSION	0xC0010004
+#define RNDIS_STATUS_BAD_CHARACTERISTICS 0xC0010005
+#define RNDIS_STATUS_ADAPTER_NOT_FOUND	0xC0010006
+#define RNDIS_STATUS_OPEN_FAILED	0xC0010007
+#define RNDIS_STATUS_DEVICE_FAILED	0xC0010008
+#define RNDIS_STATUS_MULTICAST_FULL	0xC0010009
+#define RNDIS_STATUS_MULTICAST_EXISTS	0xC001000A
+#define RNDIS_STATUS_MULTICAST_NOT_FOUND 0xC001000B
+#define RNDIS_STATUS_REQUEST_ABORTED	0xC001000C
+#define RNDIS_STATUS_RESET_IN_PROGRESS	0xC001000D
+#define RNDIS_STATUS_CLOSING_INDICATING	0xC001000E
+#define RNDIS_STATUS_INVALID_PACKET	0xC001000F
+#define RNDIS_STATUS_OPEN_LIST_FULL	0xC0010010
+#define RNDIS_STATUS_ADAPTER_NOT_READY	0xC0010011
+#define RNDIS_STATUS_ADAPTER_NOT_OPEN	0xC0010012
+#define RNDIS_STATUS_NOT_INDICATING	0xC0010013
+#define RNDIS_STATUS_INVALID_LENGTH	0xC0010014
+#define	RNDIS_STATUS_INVALID_DATA	0xC0010015
+#define RNDIS_STATUS_BUFFER_TOO_SHORT	0xC0010016
+#define RNDIS_STATUS_INVALID_OID	0xC0010017
+#define RNDIS_STATUS_ADAPTER_REMOVED	0xC0010018
+#define RNDIS_STATUS_UNSUPPORTED_MEDIA	0xC0010019
+#define RNDIS_STATUS_GROUP_ADDRESS_IN_US 0xC001001A
+#define RNDIS_STATUS_FILE_NOT_FOUND	0xC001001B
+#define RNDIS_STATUS_ERROR_READING_FILE	0xC001001C
+#define RNDIS_STATUS_ALREADY_MAPPED	0xC001001D
+#define RNDIS_STATUS_RESOURCE_CONFLICT	0xC001001E
+#define RNDIS_STATUS_NO_CABLE		0xC001001F
+
+#define	OID_GEN_SUPPORTED_LIST		0x00010101
+#define	OID_GEN_HARDWARE_STATUS		0x00010102
+#define	OID_GEN_MEDIA_SUPPORTED		0x00010103
+#define	OID_GEN_MEDIA_IN_USE		0x00010104
+#define	OID_GEN_MAXIMUM_LOOKAHEAD	0x00010105
+#define	OID_GEN_MAXIMUM_FRAME_SIZE	0x00010106
+#define	OID_GEN_LINK_SPEED		0x00010107
+#define	OID_GEN_TRANSMIT_BUFFER_SPACE	0x00010108
+#define	OID_GEN_RECEIVE_BUFFER_SPACE	0x00010109
+#define	OID_GEN_TRANSMIT_BLOCK_SIZE	0x0001010A
+#define	OID_GEN_RECEIVE_BLOCK_SIZE	0x0001010B
+#define	OID_GEN_VENDOR_ID		0x0001010C
+#define	OID_GEN_VENDOR_DESCRIPTION	0x0001010D
+#define	OID_GEN_CURRENT_PACKET_FILTER	0x0001010E
+#define	OID_GEN_CURRENT_LOOKAHEAD	0x0001010F
+#define	OID_GEN_DRIVER_VERSION		0x00010110
+#define	OID_GEN_MAXIMUM_TOTAL_SIZE	0x00010111
+#define	OID_GEN_PROTOCOL_OPTIONS	0x00010112
+#define	OID_GEN_MAC_OPTIONS		0x00010113
+#define	OID_GEN_MEDIA_CONNECT_STATUS	0x00010114
+#define	OID_GEN_MAXIMUM_SEND_PACKETS	0x00010115
+#define	OID_GEN_VENDOR_DRIVER_VERSION	0x00010116
+#define	OID_GEN_SUPPORTED_GUIDS		0x00010117
+#define	OID_GEN_NETWORK_LAYER_ADDRESSES	0x00010118
+#define	OID_GEN_TRANSPORT_HEADER_OFFSET	0x00010119
+#define	OID_GEN_RECEIVE_SCALE_CAPABILITIES	0x00010203
+#define	OID_GEN_RECEIVE_SCALE_PARAMETERS	0x00010204
+#define	OID_GEN_MACHINE_NAME		0x0001021A
+#define	OID_GEN_RNDIS_CONFIG_PARAMETER	0x0001021B
+#define	OID_GEN_VLAN_ID			0x0001021C
+
+#define	OID_802_3_PERMANENT_ADDRESS	0x01010101
+#define	OID_802_3_CURRENT_ADDRESS	0x01010102
+#define	OID_802_3_MULTICAST_LIST	0x01010103
+#define	OID_802_3_MAXIMUM_LIST_SIZE	0x01010104
+#define	OID_802_3_MAC_OPTIONS		0x01010105
+#define	OID_802_3_RCV_ERROR_ALIGNMENT	0x01020101
+#define	OID_802_3_XMIT_ONE_COLLISION	0x01020102
+#define	OID_802_3_XMIT_MORE_COLLISIONS	0x01020103
+#define	OID_802_3_XMIT_DEFERRED		0x01020201
+#define	OID_802_3_XMIT_MAX_COLLISIONS	0x01020202
+#define	OID_802_3_RCV_OVERRUN		0x01020203
+#define	OID_802_3_XMIT_UNDERRUN		0x01020204
+#define	OID_802_3_XMIT_HEARTBEAT_FAILURE	0x01020205
+#define	OID_802_3_XMIT_TIMES_CRS_LOST	0x01020206
+#define	OID_802_3_XMIT_LATE_COLLISIONS	0x01020207
+
+#define	OID_TCP_OFFLOAD_PARAMETERS	0xFC01020C
+#define	OID_TCP_OFFLOAD_HARDWARE_CAPABILITIES	0xFC01020D
+
+#define	RNDIS_MEDIUM_802_3		0x00000000
+
+/* Device flags */
+#define	RNDIS_DF_CONNECTIONLESS		0x00000001
+#define	RNDIS_DF_CONNECTION_ORIENTED	0x00000002
+
+/*
+ * Common RNDIS message header.
+ */
+struct rndis_msghdr {
+	uint32_t type;
+	uint32_t len;
+};
+
+/*
+ * RNDIS data message
+ */
+#define	RNDIS_PACKET_MSG		0x00000001
+
+struct rndis_packet_msg {
+	uint32_t type;
+	uint32_t len;
+	uint32_t dataoffset;
+	uint32_t datalen;
+	uint32_t oobdataoffset;
+	uint32_t oobdatalen;
+	uint32_t oobdataelements;
+	uint32_t pktinfooffset;
+	uint32_t pktinfolen;
+	uint32_t vchandle;
+	uint32_t reserved;
+};
+
+/*
+ * Minimum value for dataoffset, oobdataoffset, and
+ * pktinfooffset.
+ */
+#define	RNDIS_PACKET_MSG_OFFSET_MIN		\
+	(sizeof(struct rndis_packet_msg) -	\
+	 offsetof(struct rndis_packet_msg, dataoffset))
+
+/* Offset from the beginning of rndis_packet_msg. */
+#define	RNDIS_PACKET_MSG_OFFSET_ABS(ofs)	\
+	((ofs) + offsetof(struct rndis_packet_msg, dataoffset))
+
+#define	RNDIS_PACKET_MSG_OFFSET_ALIGN		4
+#define	RNDIS_PACKET_MSG_OFFSET_ALIGNMASK	\
+	(RNDIS_PACKET_MSG_OFFSET_ALIGN - 1)
+
+/* Per-packet-info for RNDIS data message */
+struct rndis_pktinfo {
+	uint32_t size;
+	uint32_t type;		/* NDIS_PKTINFO_TYPE_ */
+	uint32_t offset;
+	uint8_t data[];
+};
+
+#define	RNDIS_PKTINFO_OFFSET		\
+	offsetof(struct rndis_pktinfo, data[0])
+#define	RNDIS_PKTINFO_SIZE_ALIGN	4
+#define	RNDIS_PKTINFO_SIZE_ALIGNMASK	(RNDIS_PKTINFO_SIZE_ALIGN - 1)
+
+#define	NDIS_PKTINFO_TYPE_CSUM		0
+#define	NDIS_PKTINFO_TYPE_IPSEC		1
+#define	NDIS_PKTINFO_TYPE_LSO		2
+#define	NDIS_PKTINFO_TYPE_CLASSIFY	3
+/* reserved 4 */
+#define	NDIS_PKTINFO_TYPE_SGLIST	5
+#define	NDIS_PKTINFO_TYPE_VLAN		6
+#define	NDIS_PKTINFO_TYPE_ORIG		7
+#define	NDIS_PKTINFO_TYPE_PKT_CANCELID	8
+#define	NDIS_PKTINFO_TYPE_ORIG_NBLIST	9
+#define	NDIS_PKTINFO_TYPE_CACHE_NBLIST	10
+#define	NDIS_PKTINFO_TYPE_PKT_PAD	11
+
+/* RNDIS extension */
+
+/* Per-packet hash info */
+#define NDIS_HASH_INFO_SIZE		sizeof(uint32_t)
+#define NDIS_PKTINFO_TYPE_HASHINF	NDIS_PKTINFO_TYPE_ORIG_NBLIST
+/* NDIS_HASH_ */
+
+/* Per-packet hash value */
+#define NDIS_HASH_VALUE_SIZE		sizeof(uint32_t)
+#define NDIS_PKTINFO_TYPE_HASHVAL	NDIS_PKTINFO_TYPE_PKT_CANCELID
+
+/* Per-packet-info size */
+#define RNDIS_PKTINFO_SIZE(dlen)	offsetof(struct rndis_pktinfo, data[dlen])
+
+/*
+ * RNDIS control messages
+ */
+
+/*
+ * Common header for RNDIS completion messages.
+ *
+ * NOTE: It does not apply to RNDIS_RESET_CMPLT.
+ */
+struct rndis_comp_hdr {
+	uint32_t type;
+	uint32_t len;
+	uint32_t rid;
+	uint32_t status;
+};
+
+/* Initialize the device. */
+#define	RNDIS_INITIALIZE_MSG	0x00000002
+#define	RNDIS_INITIALIZE_CMPLT	0x80000002
+
+struct rndis_init_req {
+	uint32_t type;
+	uint32_t len;
+	uint32_t rid;
+	uint32_t ver_major;
+	uint32_t ver_minor;
+	uint32_t max_xfersz;
+};
+
+struct rndis_init_comp {
+	uint32_t type;
+	uint32_t len;
+	uint32_t rid;
+	uint32_t status;
+	uint32_t ver_major;
+	uint32_t ver_minor;
+	uint32_t devflags;
+	uint32_t medium;
+	uint32_t pktmaxcnt;
+	uint32_t pktmaxsz;
+	uint32_t align;
+	uint32_t aflistoffset;
+	uint32_t aflistsz;
+};
+
+#define	RNDIS_INIT_COMP_SIZE_MIN	\
+	offsetof(struct rndis_init_comp, aflistsz)
+
+/* Halt the device.  No response sent. */
+#define	RNDIS_HALT_MSG		0x00000003
+
+struct rndis_halt_req {
+	uint32_t type;
+	uint32_t len;
+	uint32_t rid;
+};
+
+/* Send a query object. */
+#define	RNDIS_QUERY_MSG		0x00000004
+#define	RNDIS_QUERY_CMPLT	0x80000004
+
+struct rndis_query_req {
+	uint32_t type;
+	uint32_t len;
+	uint32_t rid;
+	uint32_t oid;
+	uint32_t infobuflen;
+	uint32_t infobufoffset;
+	uint32_t devicevchdl;
+};
+
+#define	RNDIS_QUERY_REQ_INFOBUFOFFSET		\
+	(sizeof(struct rndis_query_req) -	\
+	 offsetof(struct rndis_query_req, rid))
+
+struct rndis_query_comp {
+	uint32_t type;
+	uint32_t len;
+	uint32_t rid;
+	uint32_t status;
+	uint32_t infobuflen;
+	uint32_t infobufoffset;
+};
+
+/* infobuf offset from the beginning of rndis_query_comp. */
+#define	RNDIS_QUERY_COMP_INFOBUFOFFSET_ABS(ofs)	\
+	((ofs) + offsetof(struct rndis_query_comp, rid))
+
+/* Send a set object request. */
+#define	RNDIS_SET_MSG		0x00000005
+#define	RNDIS_SET_CMPLT		0x80000005
+
+struct rndis_set_req {
+	uint32_t type;
+	uint32_t len;
+	uint32_t rid;
+	uint32_t oid;
+	uint32_t infobuflen;
+	uint32_t infobufoffset;
+	uint32_t devicevchdl;
+};
+
+#define	RNDIS_SET_REQ_INFOBUFOFFSET		\
+	(sizeof(struct rndis_set_req) -		\
+	 offsetof(struct rndis_set_req, rid))
+
+struct rndis_set_comp {
+	uint32_t type;
+	uint32_t len;
+	uint32_t rid;
+	uint32_t status;
+};
+
+/*
+ * Parameter used by OID_GEN_RNDIS_CONFIG_PARAMETER.
+ */
+#define	RNDIS_SET_PARAM_NUMERIC	0x00000000
+#define	RNDIS_SET_PARAM_STRING	0x00000002
+
+struct rndis_set_parameter {
+	uint32_t nameoffset;
+	uint32_t namelen;
+	uint32_t type;
+	uint32_t valueoffset;
+	uint32_t valuelen;
+};
+
+/* Perform a soft reset on the device. */
+#define	RNDIS_RESET_MSG		0x00000006
+#define	RNDIS_RESET_CMPLT		0x80000006
+
+struct rndis_reset_req {
+	uint32_t type;
+	uint32_t len;
+	uint32_t rid;
+};
+
+struct rndis_reset_comp {
+	uint32_t type;
+	uint32_t len;
+	uint32_t status;
+	uint32_t adrreset;
+};
+
+/* 802.3 link-state or undefined message error.  Sent by device. */
+#define	RNDIS_INDICATE_STATUS_MSG	0x00000007
+
+struct rndis_status_msg {
+	uint32_t type;
+	uint32_t len;
+	uint32_t status;
+	uint32_t stbuflen;
+	uint32_t stbufoffset;
+	/* rndis_diag_info */
+};
+
+/* stbuf offset from the beginning of rndis_status_msg. */
+#define	RNDIS_STBUFOFFSET_ABS(ofs)	\
+	((ofs) + offsetof(struct rndis_status_msg, status))
+
+/*
+ * Immediately after rndis_status_msg.stbufoffset, if a control
+ * message is malformatted, or a packet message contains inappropriate
+ * content.
+ */
+struct rndis_diag_info {
+	uint32_t diagstatus;
+	uint32_t erroffset;
+};
+
+/* Keepalive message.  May be sent by device. */
+#define	RNDIS_KEEPALIVE_MSG	0x00000008
+#define	RNDIS_KEEPALIVE_CMPLT	0x80000008
+
+struct rndis_keepalive_req {
+	uint32_t type;
+	uint32_t len;
+	uint32_t rid;
+};
+
+struct rndis_keepalive_comp {
+	uint32_t type;
+	uint32_t len;
+	uint32_t rid;
+	uint32_t status;
+};
+
+/* Packet filter bits used by OID_GEN_CURRENT_PACKET_FILTER */
+#define	NDIS_PACKET_TYPE_NONE			0x00000000
+#define	NDIS_PACKET_TYPE_DIRECTED		0x00000001
+#define	NDIS_PACKET_TYPE_MULTICAST		0x00000002
+#define	NDIS_PACKET_TYPE_ALL_MULTICAST		0x00000004
+#define	NDIS_PACKET_TYPE_BROADCAST		0x00000008
+#define	NDIS_PACKET_TYPE_SOURCE_ROUTING		0x00000010
+#define	NDIS_PACKET_TYPE_PROMISCUOUS		0x00000020
+#define	NDIS_PACKET_TYPE_SMT			0x00000040
+#define	NDIS_PACKET_TYPE_ALL_LOCAL		0x00000080
+#define	NDIS_PACKET_TYPE_GROUP			0x00001000
+#define	NDIS_PACKET_TYPE_ALL_FUNCTIONAL		0x00002000
+#define	NDIS_PACKET_TYPE_FUNCTIONAL		0x00004000
+#define	NDIS_PACKET_TYPE_MAC_FRAME		0x00008000
+
+#endif	/* !_NET_RNDIS_H_ */
diff --git a/drivers/net/netvsc/rte_pmd_netvsc_version.map b/drivers/net/netvsc/rte_pmd_netvsc_version.map
new file mode 100644
index 000000000000..5e5f5c7d5ccc
--- /dev/null
+++ b/drivers/net/netvsc/rte_pmd_netvsc_version.map
@@ -0,0 +1,5 @@
+/* SPDX-License-Identifier: BSD-3-Clause */
+
+DPDK_18.02 {
+	local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index b2be8f23cd96..38d42d6c4269 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -184,6 +184,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_VHOST)      += -lrte_pmd_vhost
 endif # $(CONFIG_RTE_LIBRTE_VHOST)
 _LDLIBS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD)    += -lrte_pmd_vmxnet3_uio
 _LDLIBS-$(CONFIG_RTE_LIBRTE_VMBUS)	    += -lrte_bus_vmbus -luuid
+_LDLIBS-$(CONFIG_RTE_LIBRTE_NETVSC_PMD)	    += -lrte_pmd_netvsc
 
 ifeq ($(CONFIG_RTE_LIBRTE_BBDEV),y)
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_BBDEV_NULL)     += -lrte_pmd_bbdev_null
-- 
2.16.3

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [dpdk-dev] [PATCH 2/3] usertools: add hv_uio_setup script
  2018-04-05 19:13 ` [dpdk-dev] [PATCH 2/3] usertools: add hv_uio_setup script Stephen Hemminger
@ 2018-04-05 20:43   ` Thomas Monjalon
  2018-04-05 21:03     ` Stephen Hemminger
  2018-04-05 21:07     ` Bruce Richardson
  0 siblings, 2 replies; 20+ messages in thread
From: Thomas Monjalon @ 2018-04-05 20:43 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, Stephen Hemminger

05/04/2018 21:13, Stephen Hemminger:
> Small script to rebind netvsc kernel device to Hyper-V
> networking PMD. It could be integrated in dpdk-bind, but dpdk-bind
> is focused on PCI, and that would get messy.
> 
> Eventually, this functionality will be built into netvsc driver
> (see vdev_netvsc as an example).

I believe we should avoid creating such script.
The direction to go, for hotplug, is to remove dpdk-devbind.py,
and implement kernel binding in PMDs (with EAL helpers).

In order to make this change happen, we should not
add this hv_uio_setup.sh script.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [dpdk-dev] [PATCH 3/3] net/netvsc: add hyper-v netvsc network device
  2018-04-05 19:13 ` [dpdk-dev] [PATCH 3/3] net/netvsc: add hyper-v netvsc network device Stephen Hemminger
@ 2018-04-05 20:52   ` Thomas Monjalon
  2018-04-05 20:59     ` Stephen Hemminger
  0 siblings, 1 reply; 20+ messages in thread
From: Thomas Monjalon @ 2018-04-05 20:52 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, Stephen Hemminger

Hi Stephen,

Good to see there is a good progress.

This patch should add an entry in the release notes.
But I guess it is not ready for 18.05?


05/04/2018 21:13, Stephen Hemminger:
> +#
> +# Compile native PMD for Hyper-V/Azure
> +#
> +CONFIG_RTE_LIBRTE_NETVSC_PMD=n
> +CONFIG_RTE_LIBRTE_NETVSC_DEBUG_RX=n
> +CONFIG_RTE_LIBRTE_NETVSC_DEBUG_TX=n
> +CONFIG_RTE_LIBRTE_NETVSC_DEBUG_DUMP=n

Please switch to the new dynamic logging.


[...]
> +the Data Plane Development Kit (DPDK), we provide a Netwwork Virtual

typo: Netwwork


> +The following prerequisites apply:
> +
> +*   Linux kernel uio_hv_generic driver that supports subchannels. This should be present in 4.17 or later.

The DPDK policy is to wait for prerequisite be available for merging.


> --- a/drivers/net/Makefile
> +++ b/drivers/net/Makefile
> @@ -46,6 +46,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_THUNDERX_NICVF_PMD) += thunderx
>  DIRS-$(CONFIG_RTE_LIBRTE_VDEV_NETVSC_PMD) += vdev_netvsc
>  DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio
>  DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += vmxnet3
> +DIRS-$(CONFIG_RTE_LIBRTE_NETVSC_PMD) += netvsc

Please keep the alphabetical order.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [dpdk-dev] [PATCH 3/3] net/netvsc: add hyper-v netvsc network device
  2018-04-05 20:52   ` Thomas Monjalon
@ 2018-04-05 20:59     ` Stephen Hemminger
  2018-04-05 21:07       ` Thomas Monjalon
  0 siblings, 1 reply; 20+ messages in thread
From: Stephen Hemminger @ 2018-04-05 20:59 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: Stephen Hemminger, dev

On Thu, 05 Apr 2018 22:52:31 +0200
Thomas Monjalon <thomas@monjalon.net> wrote:

> Hi Stephen,
> 
> Good to see there is a good progress.
> 
> This patch should add an entry in the release notes.
> But I guess it is not ready for 18.05?
> 
> 
> 05/04/2018 21:13, Stephen Hemminger:
> > +#
> > +# Compile native PMD for Hyper-V/Azure
> > +#
> > +CONFIG_RTE_LIBRTE_NETVSC_PMD=n
> > +CONFIG_RTE_LIBRTE_NETVSC_DEBUG_RX=n
> > +CONFIG_RTE_LIBRTE_NETVSC_DEBUG_TX=n
> > +CONFIG_RTE_LIBRTE_NETVSC_DEBUG_DUMP=n  
> 
> Please switch to the new dynamic logging.

It does use dynamic logging for the normal driver logs. For debug the dump
code is a config option (same as other current drivers).

> 
> 
> [...]
> > +the Data Plane Development Kit (DPDK), we provide a Netwwork Virtual  
> 
> typo: Netwwork
> 
> 
> > +The following prerequisites apply:
> > +
> > +*   Linux kernel uio_hv_generic driver that supports subchannels. This should be present in 4.17 or later.  
> 
> The DPDK policy is to wait for prerequisite be available for merging.

Does linux-next count?

> 
> 
> > --- a/drivers/net/Makefile
> > +++ b/drivers/net/Makefile
> > @@ -46,6 +46,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_THUNDERX_NICVF_PMD) += thunderx
> >  DIRS-$(CONFIG_RTE_LIBRTE_VDEV_NETVSC_PMD) += vdev_netvsc
> >  DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio
> >  DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += vmxnet3
> > +DIRS-$(CONFIG_RTE_LIBRTE_NETVSC_PMD) += netvsc  
> 
> Please keep the alphabetical order.

Ok

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [dpdk-dev] [PATCH 2/3] usertools: add hv_uio_setup script
  2018-04-05 20:43   ` Thomas Monjalon
@ 2018-04-05 21:03     ` Stephen Hemminger
  2018-04-05 21:13       ` Thomas Monjalon
  2018-04-05 21:07     ` Bruce Richardson
  1 sibling, 1 reply; 20+ messages in thread
From: Stephen Hemminger @ 2018-04-05 21:03 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: Stephen Hemminger, dev

On Thu, 05 Apr 2018 22:43:39 +0200
Thomas Monjalon <thomas@monjalon.net> wrote:

> 05/04/2018 21:13, Stephen Hemminger:
> > Small script to rebind netvsc kernel device to Hyper-V
> > networking PMD. It could be integrated in dpdk-bind, but dpdk-bind
> > is focused on PCI, and that would get messy.
> > 
> > Eventually, this functionality will be built into netvsc driver
> > (see vdev_netvsc as an example).  
> 
> I believe we should avoid creating such script.
> The direction to go, for hotplug, is to remove dpdk-devbind.py,
> and implement kernel binding in PMDs (with EAL helpers).
> 
> In order to make this change happen, we should not
> add this hv_uio_setup.sh script.

Yes, this is a temporary script like dpdk-bind, want to get rid of it
and do everything inside driver. That is the next step.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [dpdk-dev] [PATCH 2/3] usertools: add hv_uio_setup script
  2018-04-05 20:43   ` Thomas Monjalon
  2018-04-05 21:03     ` Stephen Hemminger
@ 2018-04-05 21:07     ` Bruce Richardson
  2018-04-05 21:10       ` Thomas Monjalon
  1 sibling, 1 reply; 20+ messages in thread
From: Bruce Richardson @ 2018-04-05 21:07 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: Stephen Hemminger, dev, Stephen Hemminger

On Thu, Apr 05, 2018 at 10:43:39PM +0200, Thomas Monjalon wrote:
> 05/04/2018 21:13, Stephen Hemminger:
> > Small script to rebind netvsc kernel device to Hyper-V
> > networking PMD. It could be integrated in dpdk-bind, but dpdk-bind
> > is focused on PCI, and that would get messy.
> > 
> > Eventually, this functionality will be built into netvsc driver
> > (see vdev_netvsc as an example).
> 
> I believe we should avoid creating such script.
> The direction to go, for hotplug, is to remove dpdk-devbind.py,
> and implement kernel binding in PMDs (with EAL helpers).
>
I'm not convinced at all that that is the direction to go. I instead would
prefer to see all binding happen outside DPDK. I believe having udev or
similar manage bindings, set up via e.g driverctl[1], is a far better path.

Just my 2c.
/Bruce

[1] https://gitlab.com/driverctl/driverctl

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [dpdk-dev] [PATCH 3/3] net/netvsc: add hyper-v netvsc network device
  2018-04-05 20:59     ` Stephen Hemminger
@ 2018-04-05 21:07       ` Thomas Monjalon
  2018-04-05 21:19         ` Stephen Hemminger
  0 siblings, 1 reply; 20+ messages in thread
From: Thomas Monjalon @ 2018-04-05 21:07 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Stephen Hemminger, dev

05/04/2018 22:59, Stephen Hemminger:
> On Thu, 05 Apr 2018 22:52:31 +0200
> Thomas Monjalon <thomas@monjalon.net> wrote:
> 
> > Hi Stephen,
> > 
> > Good to see there is a good progress.
> > 
> > This patch should add an entry in the release notes.
> > But I guess it is not ready for 18.05?

[...]
> > > +The following prerequisites apply:
> > > +
> > > +*   Linux kernel uio_hv_generic driver that supports subchannels. This should be present in 4.17 or later.  
> > 
> > The DPDK policy is to wait for prerequisite be available for merging.
> 
> Does linux-next count?

I would say no, but I could be convinced of the contrary.
Can we have ABI breakage from linux-next to mainline?
What is the benefit of pushing the PMD early?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [dpdk-dev] [PATCH 2/3] usertools: add hv_uio_setup script
  2018-04-05 21:07     ` Bruce Richardson
@ 2018-04-05 21:10       ` Thomas Monjalon
  2018-04-05 22:43         ` Stephen Hemminger
  2018-04-05 23:57         ` Ananyev, Konstantin
  0 siblings, 2 replies; 20+ messages in thread
From: Thomas Monjalon @ 2018-04-05 21:10 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: Stephen Hemminger, dev, Stephen Hemminger

05/04/2018 23:07, Bruce Richardson:
> On Thu, Apr 05, 2018 at 10:43:39PM +0200, Thomas Monjalon wrote:
> > 05/04/2018 21:13, Stephen Hemminger:
> > > Small script to rebind netvsc kernel device to Hyper-V
> > > networking PMD. It could be integrated in dpdk-bind, but dpdk-bind
> > > is focused on PCI, and that would get messy.
> > > 
> > > Eventually, this functionality will be built into netvsc driver
> > > (see vdev_netvsc as an example).
> > 
> > I believe we should avoid creating such script.
> > The direction to go, for hotplug, is to remove dpdk-devbind.py,
> > and implement kernel binding in PMDs (with EAL helpers).
> >
> I'm not convinced at all that that is the direction to go. I instead would
> prefer to see all binding happen outside DPDK. I believe having udev or
> similar manage bindings, set up via e.g driverctl[1], is a far better path.

This is a system admin tool, and only for Linux.
Having the binding logic inside DPDK, allows the application to control
how hotplug behave.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [dpdk-dev] [PATCH 2/3] usertools: add hv_uio_setup script
  2018-04-05 21:03     ` Stephen Hemminger
@ 2018-04-05 21:13       ` Thomas Monjalon
  2018-04-05 21:18         ` Stephen Hemminger
                           ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Thomas Monjalon @ 2018-04-05 21:13 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Stephen Hemminger, dev

05/04/2018 23:03, Stephen Hemminger:
> On Thu, 05 Apr 2018 22:43:39 +0200
> Thomas Monjalon <thomas@monjalon.net> wrote:
> 
> > 05/04/2018 21:13, Stephen Hemminger:
> > > Small script to rebind netvsc kernel device to Hyper-V
> > > networking PMD. It could be integrated in dpdk-bind, but dpdk-bind
> > > is focused on PCI, and that would get messy.
> > > 
> > > Eventually, this functionality will be built into netvsc driver
> > > (see vdev_netvsc as an example).  
> > 
> > I believe we should avoid creating such script.
> > The direction to go, for hotplug, is to remove dpdk-devbind.py,
> > and implement kernel binding in PMDs (with EAL helpers).
> > 
> > In order to make this change happen, we should not
> > add this hv_uio_setup.sh script.
> 
> Yes, this is a temporary script like dpdk-bind, want to get rid of it
> and do everything inside driver. That is the next step.

If this is temporary, it is a step in the wrong direction which
could confuse users.

If it becomes definitive (design discussion in progress), then it should
be merged in dpdk-devbind.py.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [dpdk-dev] [PATCH 2/3] usertools: add hv_uio_setup script
  2018-04-05 21:13       ` Thomas Monjalon
@ 2018-04-05 21:18         ` Stephen Hemminger
  2018-04-05 21:20         ` Stephen Hemminger
  2018-04-05 22:39         ` Stephen Hemminger
  2 siblings, 0 replies; 20+ messages in thread
From: Stephen Hemminger @ 2018-04-05 21:18 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: Stephen Hemminger, dev

On Thu, 05 Apr 2018 23:13:54 +0200
Thomas Monjalon <thomas@monjalon.net> wrote:

> 05/04/2018 23:03, Stephen Hemminger:
> > On Thu, 05 Apr 2018 22:43:39 +0200
> > Thomas Monjalon <thomas@monjalon.net> wrote:
> >   
> > > 05/04/2018 21:13, Stephen Hemminger:  
> > > > Small script to rebind netvsc kernel device to Hyper-V
> > > > networking PMD. It could be integrated in dpdk-bind, but dpdk-bind
> > > > is focused on PCI, and that would get messy.
> > > > 
> > > > Eventually, this functionality will be built into netvsc driver
> > > > (see vdev_netvsc as an example).    
> > > 
> > > I believe we should avoid creating such script.
> > > The direction to go, for hotplug, is to remove dpdk-devbind.py,
> > > and implement kernel binding in PMDs (with EAL helpers).
> > > 
> > > In order to make this change happen, we should not
> > > add this hv_uio_setup.sh script.  
> > 
> > Yes, this is a temporary script like dpdk-bind, want to get rid of it
> > and do everything inside driver. That is the next step.  
> 
> If this is temporary, it is a step in the wrong direction which
> could confuse users.
> 
> If it becomes definitive (design discussion in progress), then it should
> be merged in dpdk-devbind.py.
> 
> 

This is an experimental driver, if everyone waits until everything is done
there would be no review or testing.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [dpdk-dev] [PATCH 3/3] net/netvsc: add hyper-v netvsc network device
  2018-04-05 21:07       ` Thomas Monjalon
@ 2018-04-05 21:19         ` Stephen Hemminger
  0 siblings, 0 replies; 20+ messages in thread
From: Stephen Hemminger @ 2018-04-05 21:19 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: Stephen Hemminger, dev

On Thu, 05 Apr 2018 23:07:45 +0200
Thomas Monjalon <thomas@monjalon.net> wrote:

> 05/04/2018 22:59, Stephen Hemminger:
> > On Thu, 05 Apr 2018 22:52:31 +0200
> > Thomas Monjalon <thomas@monjalon.net> wrote:
> >   
> > > Hi Stephen,
> > > 
> > > Good to see there is a good progress.
> > > 
> > > This patch should add an entry in the release notes.
> > > But I guess it is not ready for 18.05?  
> 
> [...]
> > > > +The following prerequisites apply:
> > > > +
> > > > +*   Linux kernel uio_hv_generic driver that supports subchannels. This should be present in 4.17 or later.    
> > > 
> > > The DPDK policy is to wait for prerequisite be available for merging.  
> > 
> > Does linux-next count?  
> 
> I would say no, but I could be convinced of the contrary.
> Can we have ABI breakage from linux-next to mainline?
> What is the benefit of pushing the PMD early?

There are already people using earlier versions and sending feeback.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [dpdk-dev] [PATCH 2/3] usertools: add hv_uio_setup script
  2018-04-05 21:13       ` Thomas Monjalon
  2018-04-05 21:18         ` Stephen Hemminger
@ 2018-04-05 21:20         ` Stephen Hemminger
  2018-04-05 22:39         ` Stephen Hemminger
  2 siblings, 0 replies; 20+ messages in thread
From: Stephen Hemminger @ 2018-04-05 21:20 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: Stephen Hemminger, dev

On Thu, 05 Apr 2018 23:13:54 +0200
Thomas Monjalon <thomas@monjalon.net> wrote:

> 05/04/2018 23:03, Stephen Hemminger:
> > On Thu, 05 Apr 2018 22:43:39 +0200
> > Thomas Monjalon <thomas@monjalon.net> wrote:
> >   
> > > 05/04/2018 21:13, Stephen Hemminger:  
> > > > Small script to rebind netvsc kernel device to Hyper-V
> > > > networking PMD. It could be integrated in dpdk-bind, but dpdk-bind
> > > > is focused on PCI, and that would get messy.
> > > > 
> > > > Eventually, this functionality will be built into netvsc driver
> > > > (see vdev_netvsc as an example).    
> > > 
> > > I believe we should avoid creating such script.
> > > The direction to go, for hotplug, is to remove dpdk-devbind.py,
> > > and implement kernel binding in PMDs (with EAL helpers).
> > > 
> > > In order to make this change happen, we should not
> > > add this hv_uio_setup.sh script.  
> > 
> > Yes, this is a temporary script like dpdk-bind, want to get rid of it
> > and do everything inside driver. That is the next step.  
> 
> If this is temporary, it is a step in the wrong direction which
> could confuse users.
> 
> If it becomes definitive (design discussion in progress), then it should
> be merged in dpdk-devbind.py.

I looked into changing dpdk-devbind.py but it needed lots of work and to be
honest the time for me to do it python was too much trouble and likely to
break existing users.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [dpdk-dev] [PATCH 2/3] usertools: add hv_uio_setup script
  2018-04-05 21:13       ` Thomas Monjalon
  2018-04-05 21:18         ` Stephen Hemminger
  2018-04-05 21:20         ` Stephen Hemminger
@ 2018-04-05 22:39         ` Stephen Hemminger
  2 siblings, 0 replies; 20+ messages in thread
From: Stephen Hemminger @ 2018-04-05 22:39 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: Stephen Hemminger, dev

On Thu, 05 Apr 2018 23:13:54 +0200
Thomas Monjalon <thomas@monjalon.net> wrote:

> 05/04/2018 23:03, Stephen Hemminger:
> > On Thu, 05 Apr 2018 22:43:39 +0200
> > Thomas Monjalon <thomas@monjalon.net> wrote:
> >   
> > > 05/04/2018 21:13, Stephen Hemminger:  
> > > > Small script to rebind netvsc kernel device to Hyper-V
> > > > networking PMD. It could be integrated in dpdk-bind, but dpdk-bind
> > > > is focused on PCI, and that would get messy.
> > > > 
> > > > Eventually, this functionality will be built into netvsc driver
> > > > (see vdev_netvsc as an example).    
> > > 
> > > I believe we should avoid creating such script.
> > > The direction to go, for hotplug, is to remove dpdk-devbind.py,
> > > and implement kernel binding in PMDs (with EAL helpers).
> > > 
> > > In order to make this change happen, we should not
> > > add this hv_uio_setup.sh script.  
> > 
> > Yes, this is a temporary script like dpdk-bind, want to get rid of it
> > and do everything inside driver. That is the next step.  
> 
> If this is temporary, it is a step in the wrong direction which
> could confuse users.
> 
> If it becomes definitive (design discussion in progress), then it should
> be merged in dpdk-devbind.py.

Right now PCI does it the same way. It doesn't have a good cold plug interface.
The whole PCI probe logic expects that the device will have a vfio/uio already
bound.

When PCI is fixed, then VMBUS can be changed to the same thing.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [dpdk-dev] [PATCH 2/3] usertools: add hv_uio_setup script
  2018-04-05 21:10       ` Thomas Monjalon
@ 2018-04-05 22:43         ` Stephen Hemminger
  2018-04-05 23:57         ` Ananyev, Konstantin
  1 sibling, 0 replies; 20+ messages in thread
From: Stephen Hemminger @ 2018-04-05 22:43 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: Bruce Richardson, Stephen Hemminger, dev

On Thu, 05 Apr 2018 23:10:33 +0200
Thomas Monjalon <thomas@monjalon.net> wrote:

> 05/04/2018 23:07, Bruce Richardson:
> > On Thu, Apr 05, 2018 at 10:43:39PM +0200, Thomas Monjalon wrote:  
> > > 05/04/2018 21:13, Stephen Hemminger:  
> > > > Small script to rebind netvsc kernel device to Hyper-V
> > > > networking PMD. It could be integrated in dpdk-bind, but dpdk-bind
> > > > is focused on PCI, and that would get messy.
> > > > 
> > > > Eventually, this functionality will be built into netvsc driver
> > > > (see vdev_netvsc as an example).  
> > > 
> > > I believe we should avoid creating such script.
> > > The direction to go, for hotplug, is to remove dpdk-devbind.py,
> > > and implement kernel binding in PMDs (with EAL helpers).
> > >  
> > I'm not convinced at all that that is the direction to go. I instead would
> > prefer to see all binding happen outside DPDK. I believe having udev or
> > similar manage bindings, set up via e.g driverctl[1], is a far better path.  
> 
> This is a system admin tool, and only for Linux.
> Having the binding logic inside DPDK, allows the application to control
> how hotplug behave.
> 
> 

What about using driverctl?
That solution would work for both PCI and VMBUS, but not sure how widely
adopted it is by distributions.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [dpdk-dev] [PATCH 2/3] usertools: add hv_uio_setup script
  2018-04-05 21:10       ` Thomas Monjalon
  2018-04-05 22:43         ` Stephen Hemminger
@ 2018-04-05 23:57         ` Ananyev, Konstantin
  2018-04-06  0:22           ` Stephen Hemminger
  1 sibling, 1 reply; 20+ messages in thread
From: Ananyev, Konstantin @ 2018-04-05 23:57 UTC (permalink / raw)
  To: Thomas Monjalon, Richardson, Bruce
  Cc: Stephen Hemminger, dev, Stephen Hemminger



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Thomas Monjalon
> Sent: Thursday, April 5, 2018 10:11 PM
> To: Richardson, Bruce <bruce.richardson@intel.com>
> Cc: Stephen Hemminger <sthemmin@microsoft.com>; dev@dpdk.org; Stephen Hemminger <stephen@networkplumber.org>
> Subject: Re: [dpdk-dev] [PATCH 2/3] usertools: add hv_uio_setup script
> 
> 05/04/2018 23:07, Bruce Richardson:
> > On Thu, Apr 05, 2018 at 10:43:39PM +0200, Thomas Monjalon wrote:
> > > 05/04/2018 21:13, Stephen Hemminger:
> > > > Small script to rebind netvsc kernel device to Hyper-V
> > > > networking PMD. It could be integrated in dpdk-bind, but dpdk-bind
> > > > is focused on PCI, and that would get messy.
> > > >
> > > > Eventually, this functionality will be built into netvsc driver
> > > > (see vdev_netvsc as an example).
> > >
> > > I believe we should avoid creating such script.
> > > The direction to go, for hotplug, is to remove dpdk-devbind.py,
> > > and implement kernel binding in PMDs (with EAL helpers).
> > >
> > I'm not convinced at all that that is the direction to go. I instead would
> > prefer to see all binding happen outside DPDK. I believe having udev or
> > similar manage bindings, set up via e.g driverctl[1], is a far better path.
> 
> This is a system admin tool, and only for Linux.
> Having the binding logic inside DPDK, allows the application to control
> how hotplug behave.

I also don't think that DPDK application should control hotplug behavior logic.
It is clearly up to the system admin to make such decisions. 
Konstantin

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [dpdk-dev] [PATCH 2/3] usertools: add hv_uio_setup script
  2018-04-05 23:57         ` Ananyev, Konstantin
@ 2018-04-06  0:22           ` Stephen Hemminger
  2018-04-06  8:38             ` Bruce Richardson
  0 siblings, 1 reply; 20+ messages in thread
From: Stephen Hemminger @ 2018-04-06  0:22 UTC (permalink / raw)
  To: Ananyev, Konstantin
  Cc: Thomas Monjalon, Richardson, Bruce, Stephen Hemminger, dev

On Thu, 5 Apr 2018 23:57:47 +0000
"Ananyev, Konstantin" <konstantin.ananyev@intel.com> wrote:

> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Thomas Monjalon
> > Sent: Thursday, April 5, 2018 10:11 PM
> > To: Richardson, Bruce <bruce.richardson@intel.com>
> > Cc: Stephen Hemminger <sthemmin@microsoft.com>; dev@dpdk.org; Stephen Hemminger <stephen@networkplumber.org>
> > Subject: Re: [dpdk-dev] [PATCH 2/3] usertools: add hv_uio_setup script
> > 
> > 05/04/2018 23:07, Bruce Richardson:  
> > > On Thu, Apr 05, 2018 at 10:43:39PM +0200, Thomas Monjalon wrote:  
> > > > 05/04/2018 21:13, Stephen Hemminger:  
> > > > > Small script to rebind netvsc kernel device to Hyper-V
> > > > > networking PMD. It could be integrated in dpdk-bind, but dpdk-bind
> > > > > is focused on PCI, and that would get messy.
> > > > >
> > > > > Eventually, this functionality will be built into netvsc driver
> > > > > (see vdev_netvsc as an example).  
> > > >
> > > > I believe we should avoid creating such script.
> > > > The direction to go, for hotplug, is to remove dpdk-devbind.py,
> > > > and implement kernel binding in PMDs (with EAL helpers).
> > > >  
> > > I'm not convinced at all that that is the direction to go. I instead would
> > > prefer to see all binding happen outside DPDK. I believe having udev or
> > > similar manage bindings, set up via e.g driverctl[1], is a far better path.  
> > 
> > This is a system admin tool, and only for Linux.
> > Having the binding logic inside DPDK, allows the application to control
> > how hotplug behave.  
> 
> I also don't think that DPDK application should control hotplug behavior logic.
> It is clearly up to the system admin to make such decisions. 
> Konstantin

My preference would be to get driverctl working as a standard tool.
But it requires kernel changes to work with vmbus.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [dpdk-dev] [PATCH 2/3] usertools: add hv_uio_setup script
  2018-04-06  0:22           ` Stephen Hemminger
@ 2018-04-06  8:38             ` Bruce Richardson
  0 siblings, 0 replies; 20+ messages in thread
From: Bruce Richardson @ 2018-04-06  8:38 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Ananyev, Konstantin, Thomas Monjalon, Stephen Hemminger, dev

On Thu, Apr 05, 2018 at 05:22:42PM -0700, Stephen Hemminger wrote:
> On Thu, 5 Apr 2018 23:57:47 +0000
> "Ananyev, Konstantin" <konstantin.ananyev@intel.com> wrote:
> 
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Thomas Monjalon
> > > Sent: Thursday, April 5, 2018 10:11 PM
> > > To: Richardson, Bruce <bruce.richardson@intel.com>
> > > Cc: Stephen Hemminger <sthemmin@microsoft.com>; dev@dpdk.org; Stephen Hemminger <stephen@networkplumber.org>
> > > Subject: Re: [dpdk-dev] [PATCH 2/3] usertools: add hv_uio_setup script
> > > 
> > > 05/04/2018 23:07, Bruce Richardson:  
> > > > On Thu, Apr 05, 2018 at 10:43:39PM +0200, Thomas Monjalon wrote:  
> > > > > 05/04/2018 21:13, Stephen Hemminger:  
> > > > > > Small script to rebind netvsc kernel device to Hyper-V
> > > > > > networking PMD. It could be integrated in dpdk-bind, but dpdk-bind
> > > > > > is focused on PCI, and that would get messy.
> > > > > >
> > > > > > Eventually, this functionality will be built into netvsc driver
> > > > > > (see vdev_netvsc as an example).  
> > > > >
> > > > > I believe we should avoid creating such script.
> > > > > The direction to go, for hotplug, is to remove dpdk-devbind.py,
> > > > > and implement kernel binding in PMDs (with EAL helpers).
> > > > >  
> > > > I'm not convinced at all that that is the direction to go. I instead would
> > > > prefer to see all binding happen outside DPDK. I believe having udev or
> > > > similar manage bindings, set up via e.g driverctl[1], is a far better path.  
> > > 
> > > This is a system admin tool, and only for Linux.
> > > Having the binding logic inside DPDK, allows the application to control
> > > how hotplug behave.  
> > 
> > I also don't think that DPDK application should control hotplug behavior logic.
> > It is clearly up to the system admin to make such decisions. 
> > Konstantin
> 
> My preference would be to get driverctl working as a standard tool.
> But it requires kernel changes to work with vmbus.
> 
+1

I don't think that binding should be done by DPDK for a couple of reasons:
1. There are already daemons and kernel supports out there, such as udev,
   for managing devices on a system level. I'd rather not see DPDK duplicate
   functionality, when we can re-use what is there. Also there exists the
   possibility of conflict, e.g. what if udev has a rule for a device, and
   DPDK also tries to manage it at the same time.

2. I believe that the app is the wrong place to manage the binding of
   devices, since it's up to the system administrator not the app to determine
   the exact setup for the platform. If apps are to manage binding, then each
   app will have to expose to the user/sysadmin cmdline options to specify
   what devices should be hotplugged into the app or not, and what drivers
   they should be bound too. Not all NICs hotplugged to a platform are for
   DPDK use, and they won't all want to use the igb_uio or the vfio_pci
   drivers. Better that that is configured for each platform on the platform
   itself.

I really feel that the driverctl approach is the best one - yes it's linux
only for now, but architecturally I think it's the proper solution.

/Bruce

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2018-04-06  8:38 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-05 19:13 [dpdk-dev] [PATCH 0/3] add Hyper-V bus and network driver Stephen Hemminger
2018-04-05 19:13 ` [dpdk-dev] [PATCH 1/3] bus/vmbus: add hyper-v virtual bus support Stephen Hemminger
2018-04-05 19:13 ` [dpdk-dev] [PATCH 2/3] usertools: add hv_uio_setup script Stephen Hemminger
2018-04-05 20:43   ` Thomas Monjalon
2018-04-05 21:03     ` Stephen Hemminger
2018-04-05 21:13       ` Thomas Monjalon
2018-04-05 21:18         ` Stephen Hemminger
2018-04-05 21:20         ` Stephen Hemminger
2018-04-05 22:39         ` Stephen Hemminger
2018-04-05 21:07     ` Bruce Richardson
2018-04-05 21:10       ` Thomas Monjalon
2018-04-05 22:43         ` Stephen Hemminger
2018-04-05 23:57         ` Ananyev, Konstantin
2018-04-06  0:22           ` Stephen Hemminger
2018-04-06  8:38             ` Bruce Richardson
2018-04-05 19:13 ` [dpdk-dev] [PATCH 3/3] net/netvsc: add hyper-v netvsc network device Stephen Hemminger
2018-04-05 20:52   ` Thomas Monjalon
2018-04-05 20:59     ` Stephen Hemminger
2018-04-05 21:07       ` Thomas Monjalon
2018-04-05 21:19         ` Stephen Hemminger

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).