[dpdk-dev] [PATCH v4 0/7] Hyper-V Poll Mode driver

DPDK patches and discussions
 help / color / mirror / Atom feed

* [dpdk-dev] [PATCH v4 0/7] Hyper-V Poll Mode driver
@ 2015-04-21 17:32 Stephen Hemminger
  2015-04-21 17:32 ` [dpdk-dev] [PATCH v4 1/7] ether: add function to query for link state interrupt Stephen Hemminger
                   ` (6 more replies)
  0 siblings, 7 replies; 17+ messages in thread
From: Stephen Hemminger @ 2015-04-21 17:32 UTC (permalink / raw)
  To: alexmay; +Cc: dev

Hyper-V Poll Mode Driver.

Only change from v3 is addition of pieces that were missing
for rte_vmbus.h and rte_vmbus.c

Stephen Hemminger (7):
  ether: add function to query for link state interrupt
  pmd: change drivers initialization for pci
  hv: add basic vmbus support
  hv: uio driver
  hv: poll mode driver
  hv: enable driver in common config
  hv: add kernel patch

 config/common_linuxapp                             |    9 +
 lib/Makefile                                       |    1 +
 lib/librte_eal/common/Makefile                     |    2 +-
 lib/librte_eal/common/eal_common_options.c         |    5 +
 lib/librte_eal/common/eal_internal_cfg.h           |    1 +
 lib/librte_eal/common/eal_options.h                |    2 +
 lib/librte_eal/common/eal_private.h                |   10 +
 lib/librte_eal/common/include/rte_vmbus.h          |  159 ++
 lib/librte_eal/linuxapp/Makefile                   |    3 +
 lib/librte_eal/linuxapp/eal/Makefile               |    3 +
 lib/librte_eal/linuxapp/eal/eal.c                  |   11 +
 lib/librte_eal/linuxapp/eal/eal_vmbus.c            |  641 ++++++++
 lib/librte_eal/linuxapp/hv_uio/Makefile            |   57 +
 lib/librte_eal/linuxapp/hv_uio/hv_uio.c            |  551 +++++++
 lib/librte_eal/linuxapp/hv_uio/hyperv_net.h        |  907 +++++++++++
 .../linuxapp/hv_uio/vmbus-get-pages.patch          |   55 +
 lib/librte_ether/rte_ethdev.c                      |  142 +-
 lib/librte_ether/rte_ethdev.h                      |   27 +-
 lib/librte_pmd_e1000/em_ethdev.c                   |    2 +-
 lib/librte_pmd_e1000/igb_ethdev.c                  |    4 +-
 lib/librte_pmd_enic/enic_ethdev.c                  |    2 +-
 lib/librte_pmd_fm10k/fm10k_ethdev.c                |    2 +-
 lib/librte_pmd_hyperv/Makefile                     |   28 +
 lib/librte_pmd_hyperv/hyperv.h                     |  169 ++
 lib/librte_pmd_hyperv/hyperv_drv.c                 | 1653 ++++++++++++++++++++
 lib/librte_pmd_hyperv/hyperv_drv.h                 |  558 +++++++
 lib/librte_pmd_hyperv/hyperv_ethdev.c              |  332 ++++
 lib/librte_pmd_hyperv/hyperv_logs.h                |   69 +
 lib/librte_pmd_hyperv/hyperv_rxtx.c                |  403 +++++
 lib/librte_pmd_hyperv/hyperv_rxtx.h                |   35 +
 lib/librte_pmd_i40e/i40e_ethdev.c                  |    2 +-
 lib/librte_pmd_i40e/i40e_ethdev_vf.c               |    2 +-
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c                |    4 +-
 lib/librte_pmd_virtio/virtio_ethdev.c              |    2 +-
 lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c            |    2 +-
 mk/rte.app.mk                                      |    4 +
 36 files changed, 5839 insertions(+), 20 deletions(-)
 create mode 100644 lib/librte_eal/common/include/rte_vmbus.h
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_vmbus.c
 create mode 100644 lib/librte_eal/linuxapp/hv_uio/Makefile
 create mode 100644 lib/librte_eal/linuxapp/hv_uio/hv_uio.c
 create mode 100644 lib/librte_eal/linuxapp/hv_uio/hyperv_net.h
 create mode 100644 lib/librte_eal/linuxapp/hv_uio/vmbus-get-pages.patch
 create mode 100644 lib/librte_pmd_hyperv/Makefile
 create mode 100644 lib/librte_pmd_hyperv/hyperv.h
 create mode 100644 lib/librte_pmd_hyperv/hyperv_drv.c
 create mode 100644 lib/librte_pmd_hyperv/hyperv_drv.h
 create mode 100644 lib/librte_pmd_hyperv/hyperv_ethdev.c
 create mode 100644 lib/librte_pmd_hyperv/hyperv_logs.h
 create mode 100644 lib/librte_pmd_hyperv/hyperv_rxtx.c
 create mode 100644 lib/librte_pmd_hyperv/hyperv_rxtx.h

-- 
2.1.4

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [dpdk-dev] [PATCH v4 1/7] ether: add function to query for link state interrupt
  2015-04-21 17:32 [dpdk-dev] [PATCH v4 0/7] Hyper-V Poll Mode driver Stephen Hemminger
@ 2015-04-21 17:32 ` Stephen Hemminger
  2015-07-08 23:42   ` Thomas Monjalon
       [not found]   ` <d0360434d10a44dcb9f5c9c7220c3162@HQ1WP-EXMB11.corp.brocade.com>
  2015-04-21 17:32 ` [dpdk-dev] [PATCH v4 2/7] pmd: change drivers initialization for pci Stephen Hemminger
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 17+ messages in thread
From: Stephen Hemminger @ 2015-04-21 17:32 UTC (permalink / raw)
  To: alexmay; +Cc: dev, Stephen Hemminger

From: Stephen Hemminger <shemming@brocade.com>

Allow application to query whether link state will work.
This is also part of abstracting dependency on PCI.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/librte_ether/rte_ethdev.c | 14 ++++++++++++++
 lib/librte_ether/rte_ethdev.h | 12 ++++++++++++
 2 files changed, 26 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index e20cca5..9577d17 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -1340,6 +1340,20 @@ rte_eth_dev_start(uint8_t port_id)
 	return 0;
 }
 
+int
+rte_eth_has_link_state(uint8_t port_id)
+{
+	struct rte_eth_dev *dev;
+
+	if (port_id >= nb_ports) {
+		PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+		return 0;
+	}
+	dev = &rte_eth_devices[port_id];
+
+	return (dev->pci_dev->driver->drv_flags & RTE_PCI_DRV_INTR_LSC) != 0;
+}
+
 void
 rte_eth_dev_stop(uint8_t port_id)
 {
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 4648290..991023b 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -2064,6 +2064,18 @@ extern void rte_eth_link_get_nowait(uint8_t port_id,
 				struct rte_eth_link *link);
 
 /**
+ * Test whether device supports link state interrupt mode.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @return
+ *   - (1) if link state interrupt is supported
+ *   - (0) if link state interrupt is not supported
+ */
+extern int
+rte_eth_has_link_state(uint8_t port_id);
+
+/**
  * Retrieve the general I/O statistics of an Ethernet device.
  *
  * @param port_id
-- 
2.1.4

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [dpdk-dev] [PATCH v4 1/7] ether: add function to query for link state interrupt
  2015-04-21 17:32 ` [dpdk-dev] [PATCH v4 1/7] ether: add function to query for link state interrupt Stephen Hemminger
@ 2015-07-08 23:42   ` Thomas Monjalon
       [not found]   ` <d0360434d10a44dcb9f5c9c7220c3162@HQ1WP-EXMB11.corp.brocade.com>
  1 sibling, 0 replies; 17+ messages in thread
From: Thomas Monjalon @ 2015-07-08 23:42 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, Stephen Hemminger, alexmay

2015-04-21 10:32, Stephen Hemminger:
> Allow application to query whether link state will work.
> This is also part of abstracting dependency on PCI.
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
>  lib/librte_ether/rte_ethdev.c | 14 ++++++++++++++
>  lib/librte_ether/rte_ethdev.h | 12 ++++++++++++
[...]
>  /**
> + * Test whether device supports link state interrupt mode.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @return
> + *   - (1) if link state interrupt is supported
> + *   - (0) if link state interrupt is not supported
> + */
> +extern int
> +rte_eth_has_link_state(uint8_t port_id);

It requires change in map file to work with shared library.

^ permalink raw reply	[flat|nested] 17+ messages in thread

[parent not found: <d0360434d10a44dcb9f5c9c7220c3162@HQ1WP-EXMB11.corp.brocade.com>]

* Re: [dpdk-dev] [PATCH v4 1/7] ether: add function to query for link state interrupt
       [not found]   ` <d0360434d10a44dcb9f5c9c7220c3162@HQ1WP-EXMB11.corp.brocade.com>
@ 2017-02-08 23:25     ` Stephen Hemminger
  0 siblings, 0 replies; 17+ messages in thread
From: Stephen Hemminger @ 2017-02-08 23:25 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, alexmay, Stephen Hemminger

On Wed, 8 Jul 2015 23:42:05 +0000
Thomas Monjalon <thomas.monjalon@6wind.com> wrote:

> 2015-04-21 10:32, Stephen Hemminger:
> > Allow application to query whether link state will work.
> > This is also part of abstracting dependency on PCI.
> > 
> > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> > ---
> >  lib/librte_ether/rte_ethdev.c | 14 ++++++++++++++
> >  lib/librte_ether/rte_ethdev.h | 12 ++++++++++++  
> [...]
> >  /**
> > + * Test whether device supports link state interrupt mode.
> > + *
> > + * @param port_id
> > + *   The port identifier of the Ethernet device.
> > + * @return
> > + *   - (1) if link state interrupt is supported
> > + *   - (0) if link state interrupt is not supported
> > + */
> > +extern int
> > +rte_eth_has_link_state(uint8_t port_id);  
> 
> It requires change in map file to work with shared library.

A better way to solve this to move drv_flags out of PCI driver and into generic driver
data structure.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [dpdk-dev] [PATCH v4 2/7] pmd: change drivers initialization for pci
  2015-04-21 17:32 [dpdk-dev] [PATCH v4 0/7] Hyper-V Poll Mode driver Stephen Hemminger
  2015-04-21 17:32 ` [dpdk-dev] [PATCH v4 1/7] ether: add function to query for link state interrupt Stephen Hemminger
@ 2015-04-21 17:32 ` Stephen Hemminger
  2015-04-21 17:32 ` [dpdk-dev] [PATCH v4 3/7] hv: add basic vmbus support Stephen Hemminger
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: Stephen Hemminger @ 2015-04-21 17:32 UTC (permalink / raw)
  To: alexmay; +Cc: dev, Stephen Hemminger

From: Stephen Hemminger <shemming@brocade.com>

The change to generic ether device structure to support multiple
bus types requires a change to all existing PMD but only in the
initialization (and the change is backwards compatiable).

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/librte_pmd_e1000/em_ethdev.c        | 2 +-
 lib/librte_pmd_e1000/igb_ethdev.c       | 4 ++--
 lib/librte_pmd_enic/enic_ethdev.c       | 2 +-
 lib/librte_pmd_fm10k/fm10k_ethdev.c     | 2 +-
 lib/librte_pmd_i40e/i40e_ethdev.c       | 2 +-
 lib/librte_pmd_i40e/i40e_ethdev_vf.c    | 2 +-
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c     | 4 ++--
 lib/librte_pmd_virtio/virtio_ethdev.c   | 2 +-
 lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c | 2 +-
 9 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/lib/librte_pmd_e1000/em_ethdev.c b/lib/librte_pmd_e1000/em_ethdev.c
index 82e0b7a..e57530e 100644
--- a/lib/librte_pmd_e1000/em_ethdev.c
+++ b/lib/librte_pmd_e1000/em_ethdev.c
@@ -281,7 +281,7 @@ eth_em_dev_init(struct rte_eth_dev *eth_dev)
 }
 
 static struct eth_driver rte_em_pmd = {
-	{
+	.pci_drv = {
 		.name = "rte_em_pmd",
 		.id_table = pci_id_em_map,
 		.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
diff --git a/lib/librte_pmd_e1000/igb_ethdev.c b/lib/librte_pmd_e1000/igb_ethdev.c
index e2b7cf3..67273b0 100644
--- a/lib/librte_pmd_e1000/igb_ethdev.c
+++ b/lib/librte_pmd_e1000/igb_ethdev.c
@@ -680,7 +680,7 @@ eth_igbvf_dev_init(struct rte_eth_dev *eth_dev)
 }
 
 static struct eth_driver rte_igb_pmd = {
-	{
+	.pci_drv = {
 		.name = "rte_igb_pmd",
 		.id_table = pci_id_igb_map,
 		.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
@@ -693,7 +693,7 @@ static struct eth_driver rte_igb_pmd = {
  * virtual function driver struct
  */
 static struct eth_driver rte_igbvf_pmd = {
-	{
+	.pci_drv = {
 		.name = "rte_igbvf_pmd",
 		.id_table = pci_id_igbvf_map,
 		.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
diff --git a/lib/librte_pmd_enic/enic_ethdev.c b/lib/librte_pmd_enic/enic_ethdev.c
index 63a594d..dbef5c6 100644
--- a/lib/librte_pmd_enic/enic_ethdev.c
+++ b/lib/librte_pmd_enic/enic_ethdev.c
@@ -609,7 +609,7 @@ static int eth_enicpmd_dev_init(struct rte_eth_dev *eth_dev)
 }
 
 static struct eth_driver rte_enic_pmd = {
-	{
+	.pci_drv = {
 		.name = "rte_enic_pmd",
 		.id_table = pci_id_enic_map,
 		.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
diff --git a/lib/librte_pmd_fm10k/fm10k_ethdev.c b/lib/librte_pmd_fm10k/fm10k_ethdev.c
index 1a96cf2..ed6aaa6 100644
--- a/lib/librte_pmd_fm10k/fm10k_ethdev.c
+++ b/lib/librte_pmd_fm10k/fm10k_ethdev.c
@@ -1843,7 +1843,7 @@ static struct rte_pci_id pci_id_fm10k_map[] = {
 };
 
 static struct eth_driver rte_pmd_fm10k = {
-	{
+	.pci_drv = {
 		.name = "rte_pmd_fm10k",
 		.id_table = pci_id_fm10k_map,
 		.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
diff --git a/lib/librte_pmd_i40e/i40e_ethdev.c b/lib/librte_pmd_i40e/i40e_ethdev.c
index dc44764..ba13d68 100644
--- a/lib/librte_pmd_i40e/i40e_ethdev.c
+++ b/lib/librte_pmd_i40e/i40e_ethdev.c
@@ -265,7 +265,7 @@ static const struct eth_dev_ops i40e_eth_dev_ops = {
 };
 
 static struct eth_driver rte_i40e_pmd = {
-	{
+	.pci_drv = {
 		.name = "rte_i40e_pmd",
 		.id_table = pci_id_i40e_map,
 		.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
diff --git a/lib/librte_pmd_i40e/i40e_ethdev_vf.c b/lib/librte_pmd_i40e/i40e_ethdev_vf.c
index 4581c5b..0186fbd 100644
--- a/lib/librte_pmd_i40e/i40e_ethdev_vf.c
+++ b/lib/librte_pmd_i40e/i40e_ethdev_vf.c
@@ -1201,7 +1201,7 @@ i40evf_dev_init(struct rte_eth_dev *eth_dev)
  * virtual function driver struct
  */
 static struct eth_driver rte_i40evf_pmd = {
-	{
+	.pci_drv = {
 		.name = "rte_i40evf_pmd",
 		.id_table = pci_id_i40evf_map,
 		.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
index 1b3b4b5..757ae96 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
@@ -1087,7 +1087,7 @@ eth_ixgbevf_dev_init(struct rte_eth_dev *eth_dev)
 }
 
 static struct eth_driver rte_ixgbe_pmd = {
-	{
+	.pci_drv = {
 		.name = "rte_ixgbe_pmd",
 		.id_table = pci_id_ixgbe_map,
 		.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
@@ -1100,7 +1100,7 @@ static struct eth_driver rte_ixgbe_pmd = {
  * virtual function driver struct
  */
 static struct eth_driver rte_ixgbevf_pmd = {
-	{
+	.pci_drv = {
 		.name = "rte_ixgbevf_pmd",
 		.id_table = pci_id_ixgbevf_map,
 		.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c b/lib/librte_pmd_virtio/virtio_ethdev.c
index ffa26a0..e39206d 100644
--- a/lib/librte_pmd_virtio/virtio_ethdev.c
+++ b/lib/librte_pmd_virtio/virtio_ethdev.c
@@ -1238,7 +1238,7 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
 }
 
 static struct eth_driver rte_virtio_pmd = {
-	{
+	.pci_drv = {
 		.name = "rte_virtio_pmd",
 		.id_table = pci_id_virtio_map,
 	},
diff --git a/lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c b/lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c
index 577e0f9..97278cf 100644
--- a/lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c
+++ b/lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c
@@ -261,7 +261,7 @@ eth_vmxnet3_dev_init(struct rte_eth_dev *eth_dev)
 }
 
 static struct eth_driver rte_vmxnet3_pmd = {
-	{
+	.pci_drv = {
 		.name = "rte_vmxnet3_pmd",
 		.id_table = pci_id_vmxnet3_map,
 		.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
-- 
2.1.4

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [dpdk-dev] [PATCH v4 3/7] hv: add basic vmbus support
  2015-04-21 17:32 [dpdk-dev] [PATCH v4 0/7] Hyper-V Poll Mode driver Stephen Hemminger
  2015-04-21 17:32 ` [dpdk-dev] [PATCH v4 1/7] ether: add function to query for link state interrupt Stephen Hemminger
  2015-04-21 17:32 ` [dpdk-dev] [PATCH v4 2/7] pmd: change drivers initialization for pci Stephen Hemminger
@ 2015-04-21 17:32 ` Stephen Hemminger
  2015-07-08 23:51   ` Thomas Monjalon
  2015-04-21 17:32 ` [dpdk-dev] [PATCH v4 4/7] hv: uio driver Stephen Hemminger
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 17+ messages in thread
From: Stephen Hemminger @ 2015-04-21 17:32 UTC (permalink / raw)
  To: alexmay; +Cc: dev

The hyper-v device driver forces the base EAL code to change
to support multiple bus types. This is done changing the pci_device
in ether driver to a generic union.

As much as possible this is done in a backwards source compatiable
way. It will break ABI for device drivers.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/librte_eal/common/Makefile             |   2 +-
 lib/librte_eal/common/eal_common_options.c |   5 +
 lib/librte_eal/common/eal_internal_cfg.h   |   1 +
 lib/librte_eal/common/eal_options.h        |   2 +
 lib/librte_eal/common/eal_private.h        |  10 +
 lib/librte_eal/common/include/rte_vmbus.h  | 159 +++++++
 lib/librte_eal/linuxapp/eal/Makefile       |   3 +
 lib/librte_eal/linuxapp/eal/eal.c          |  11 +
 lib/librte_eal/linuxapp/eal/eal_vmbus.c    | 641 +++++++++++++++++++++++++++++
 lib/librte_ether/rte_ethdev.c              | 128 +++++-
 lib/librte_ether/rte_ethdev.h              |  15 +-
 11 files changed, 968 insertions(+), 9 deletions(-)
 create mode 100644 lib/librte_eal/common/include/rte_vmbus.h
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_vmbus.c

diff --git a/lib/librte_eal/common/Makefile b/lib/librte_eal/common/Makefile
index 3ea3bbf..202485e 100644
--- a/lib/librte_eal/common/Makefile
+++ b/lib/librte_eal/common/Makefile
@@ -33,7 +33,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
 
 INC := rte_branch_prediction.h rte_common.h
 INC += rte_debug.h rte_eal.h rte_errno.h rte_launch.h rte_lcore.h
-INC += rte_log.h rte_memory.h rte_memzone.h rte_pci.h
+INC += rte_log.h rte_memory.h rte_memzone.h rte_pci.h rte_vmbus.h
 INC += rte_pci_dev_ids.h rte_per_lcore.h rte_random.h
 INC += rte_rwlock.h rte_tailq.h rte_interrupts.h rte_alarm.h
 INC += rte_string_fns.h rte_version.h
diff --git a/lib/librte_eal/common/eal_common_options.c b/lib/librte_eal/common/eal_common_options.c
index 8fcb1ab..76a3394 100644
--- a/lib/librte_eal/common/eal_common_options.c
+++ b/lib/librte_eal/common/eal_common_options.c
@@ -80,6 +80,7 @@ eal_long_options[] = {
 	{OPT_NO_HPET,           0, NULL, OPT_NO_HPET_NUM          },
 	{OPT_NO_HUGE,           0, NULL, OPT_NO_HUGE_NUM          },
 	{OPT_NO_PCI,            0, NULL, OPT_NO_PCI_NUM           },
+	{OPT_NO_VMBUS,		0, NULL, OPT_NO_VMBUS_NUM	  },
 	{OPT_NO_SHCONF,         0, NULL, OPT_NO_SHCONF_NUM        },
 	{OPT_PCI_BLACKLIST,     1, NULL, OPT_PCI_BLACKLIST_NUM    },
 	{OPT_PCI_WHITELIST,     1, NULL, OPT_PCI_WHITELIST_NUM    },
@@ -726,6 +727,10 @@ eal_parse_common_option(int opt, const char *optarg,
 		conf->no_pci = 1;
 		break;
 
+	case OPT_NO_VMBUS_NUM:
+		conf->no_vmbus = 1;
+		break;
+
 	case OPT_NO_HPET_NUM:
 		conf->no_hpet = 1;
 		break;
diff --git a/lib/librte_eal/common/eal_internal_cfg.h b/lib/librte_eal/common/eal_internal_cfg.h
index e2ecb0d..0e7de34 100644
--- a/lib/librte_eal/common/eal_internal_cfg.h
+++ b/lib/librte_eal/common/eal_internal_cfg.h
@@ -66,6 +66,7 @@ struct internal_config {
 	volatile unsigned no_hugetlbfs;   /**< true to disable hugetlbfs */
 	volatile unsigned xen_dom0_support; /**< support app running on Xen Dom0*/
 	volatile unsigned no_pci;         /**< true to disable PCI */
+	volatile unsigned no_vmbus;	  /**< true to disable VMBUS */
 	volatile unsigned no_hpet;        /**< true to disable HPET */
 	volatile unsigned vmware_tsc_map; /**< true to use VMware TSC mapping
 										* instead of native TSC */
diff --git a/lib/librte_eal/common/eal_options.h b/lib/librte_eal/common/eal_options.h
index f6714d9..54f03dc 100644
--- a/lib/librte_eal/common/eal_options.h
+++ b/lib/librte_eal/common/eal_options.h
@@ -67,6 +67,8 @@ enum {
 	OPT_NO_HUGE_NUM,
 #define OPT_NO_PCI            "no-pci"
 	OPT_NO_PCI_NUM,
+#define OPT_NO_VMBUS          "no-vmbus"
+	OPT_NO_VMBUS_NUM,
 #define OPT_NO_SHCONF         "no-shconf"
 	OPT_NO_SHCONF_NUM,
 #define OPT_SOCKET_MEM        "socket-mem"
diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index 4acf5a0..039e9f3 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -180,6 +180,16 @@ int rte_eal_pci_close_one_driver(struct rte_pci_driver *dr,
 		struct rte_pci_device *dev);
 
 /**
+ * VMBUS related functions and structures
+ */
+int rte_eal_vmbus_init(void);
+
+struct rte_vmbus_driver;
+struct rte_vmbus_device;
+
+int rte_eal_vmbus_probe_one_driver(struct rte_vmbus_driver *dr,
+		struct rte_vmbus_device *dev);
+/**
  * Init tail queues for non-EAL library structures. This is to allow
  * the rings, mempools, etc. lists to be shared among multiple processes
  *
diff --git a/lib/librte_eal/common/include/rte_vmbus.h b/lib/librte_eal/common/include/rte_vmbus.h
new file mode 100644
index 0000000..e632572
--- /dev/null
+++ b/lib/librte_eal/common/include/rte_vmbus.h
@@ -0,0 +1,159 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2013 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2013-2015 Brocade Communications Systems, Inc.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+
+#ifndef _RTE_VMBUS_H_
+#define _RTE_VMBUS_H_
+
+/**
+ * @file
+ *
+ * RTE VMBUS Interface
+ */
+
+#include <sys/queue.h>
+
+/** Pathname of VMBUS devices directory. */
+#define SYSFS_VMBUS_DEVICES "/sys/bus/vmbus/devices"
+
+/** Formatting string for VMBUS device identifier: Ex: vmbus_0_9 */
+#define VMBUS_PRI_FMT "vmbus_0_%u"
+
+#define VMBUS_ID_ANY 0xFFFF
+
+#define VMBUS_NETWORK_DEVICE "{f8615163-df3e-46c5-913f-f2d2f965ed0e}"
+
+/** Maximum number of VMBUS resources. */
+#define VMBUS_MAX_RESOURCE 7
+
+/**
+ * A structure describing an ID for a VMBUS driver. Each driver provides a
+ * table of these IDs for each device that it supports.
+ */
+struct rte_vmbus_id {
+	uint16_t device_id;           /**< VMBUS Device ID */
+	uint16_t sysfs_num;           /**< vmbus_0_X */
+};
+
+/**
+ * A structure describing a VMBUS memory resource.
+ */
+struct rte_vmbus_resource {
+	uint64_t phys_addr;   /**< Physical address, 0 if no resource. */
+	uint64_t len;         /**< Length of the resource. */
+	void *addr;           /**< Virtual address, NULL when not mapped. */
+};
+
+/**
+ * A structure describing a VMBUS device.
+ */
+struct rte_vmbus_device {
+	TAILQ_ENTRY(rte_vmbus_device) next;     /**< Next probed VMBUS device. */
+	struct rte_vmbus_id id;                 /**< VMBUS ID. */
+	const struct rte_vmbus_driver *driver;  /**< Associated driver */
+	int numa_node;                          /**< NUMA node connection */
+	unsigned int blacklisted:1;             /**< Device is blacklisted */
+	struct rte_vmbus_resource mem_resource[VMBUS_MAX_RESOURCE];   /**< VMBUS Memory Resource */
+	uint32_t vmbus_monitor_id;              /**< VMBus monitor ID for device */
+	int uio_fd;                             /** UIO device file descriptor */
+};
+
+/** Macro used to help building up tables of device IDs */
+#define RTE_VMBUS_DEVICE(dev)          \
+	.device_id = (dev)
+
+struct rte_vmbus_driver;
+
+/**
+ * Initialisation function for the driver called during VMBUS probing.
+ */
+typedef int (vmbus_devinit_t)(struct rte_vmbus_driver *, struct rte_vmbus_device *);
+
+/**
+ * Uninitialisation function for the driver called during hotplugging.
+ */
+typedef int (vmbus_devuninit_t)(struct rte_vmbus_device *);
+
+/**
+ * A structure describing a VMBUS driver.
+ */
+struct rte_vmbus_driver {
+	TAILQ_ENTRY(rte_vmbus_driver) next;     /**< Next in list. */
+	const char *name;                       /**< Driver name. */
+	vmbus_devinit_t *devinit;               /**< Device init. function. */
+	vmbus_devuninit_t *devuninit;           /**< Device uninit function. */
+	const struct rte_vmbus_id *id_table;    /**< ID table, NULL terminated. */
+	uint32_t drv_flags;                     /**< Flags contolling handling of device. */
+	const char *module_name;		/**< Associated kernel module */
+};
+
+/**
+ * Probe the VMBUS device for registered drivers.
+ *
+ * Scan the content of the vmbus, and call the probe() function for
+ * all registered drivers that have a matching entry in its id_table
+ * for discovered devices.
+ *
+ * @return
+ *   - 0 on success.
+ *   - Negative on error.
+ */
+int rte_eal_vmbus_probe(void);
+
+/**
+ * Dump the content of the vmbus.
+ */
+void rte_eal_vmbus_dump(void);
+
+/**
+ * Register a VMBUS driver.
+ *
+ * @param driver
+ *   A pointer to a rte_vmbus_driver structure describing the driver
+ *   to be registered.
+ */
+void rte_eal_vmbus_register(struct rte_vmbus_driver *driver);
+
+/**
+ * Unregister a VMBUS driver.
+ *
+ * @param driver
+ *   A pointer to a rte_vmbus_driver structure describing the driver
+ *   to be unregistered.
+ */
+void rte_eal_vmbus_unregister(struct rte_vmbus_driver *driver);
+
+int vmbus_uio_map_resource(struct rte_vmbus_device *dev);
+
+#endif /* _RTE_VMBUS_H_ */
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index 01f7b70..acd5127 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -74,6 +74,9 @@ SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_alarm.c
 ifeq ($(CONFIG_RTE_LIBRTE_IVSHMEM),y)
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_ivshmem.c
 endif
+ifeq ($(CONFIG_RTE_LIBRTE_HV_PMD),y)
+SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_vmbus.c
+endif
 
 # from common dir
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_common_memzone.c
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index bd770cf..86d0e31 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -70,6 +70,7 @@
 #include <rte_cpuflags.h>
 #include <rte_interrupts.h>
 #include <rte_pci.h>
+#include <rte_vmbus.h>
 #include <rte_devargs.h>
 #include <rte_common.h>
 #include <rte_version.h>
@@ -796,6 +797,11 @@ rte_eal_init(int argc, char **argv)
 
 	rte_eal_mcfg_complete();
 
+#ifdef RTE_LIBRTE_HV_PMD
+	if (rte_eal_vmbus_init() < 0)
+		RTE_LOG(ERR, EAL, "Cannot init VMBUS\n");
+#endif
+
 	TAILQ_FOREACH(solib, &solib_list, next) {
 		RTE_LOG(INFO, EAL, "open shared lib %s\n", solib->name);
 		solib->lib_handle = dlopen(solib->name, RTLD_NOW);
@@ -845,6 +851,11 @@ rte_eal_init(int argc, char **argv)
 	if (rte_eal_pci_probe())
 		rte_panic("Cannot probe PCI\n");
 
+#ifdef RTE_LIBRTE_HV_PMD
+	if (rte_eal_vmbus_probe() < 0)
+		rte_panic("Cannot probe VMBUS\n");
+#endif
+
 	return fctret;
 }
 
diff --git a/lib/librte_eal/linuxapp/eal/eal_vmbus.c b/lib/librte_eal/linuxapp/eal/eal_vmbus.c
new file mode 100644
index 0000000..165edd6
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/eal_vmbus.c
@@ -0,0 +1,641 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2013 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2013-2015 Brocade Communications Systems, Inc.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+
+#include <string.h>
+#include <dirent.h>
+#include <fcntl.h>
+#include <sys/mman.h>
+#include <sys/queue.h>
+
+#include <rte_log.h>
+#include <rte_vmbus.h>
+#include <rte_common.h>
+#include <rte_tailq.h>
+#include <rte_eal.h>
+#include <rte_malloc.h>
+
+#include "eal_filesystem.h"
+#include "eal_private.h"
+
+#define PROC_MODULES "/proc/modules"
+#define VMBUS_DRV_PATH "/sys/bus/vmbus/drivers/%s"
+
+TAILQ_HEAD(vmbus_device_list, rte_vmbus_device); /**< VMBUS devices in D-linked Q. */
+TAILQ_HEAD(vmbus_driver_list, rte_vmbus_driver); /**< VMBUS drivers in D-linked Q. */
+
+static struct vmbus_driver_list vmbus_driver_list =
+	TAILQ_HEAD_INITIALIZER(vmbus_driver_list);
+static struct vmbus_device_list vmbus_device_list =
+	TAILQ_HEAD_INITIALIZER(vmbus_device_list);
+
+struct uio_map {
+	void *addr;
+	uint64_t offset;
+	uint64_t size;
+	uint64_t phaddr;
+};
+
+/*
+ * For multi-process we need to reproduce all vmbus mappings in secondary
+ * processes, so save them in a tailq.
+ */
+struct uio_resource {
+	TAILQ_ENTRY(uio_resource) next;
+
+	struct rte_vmbus_id vmbus_addr;
+	char path[PATH_MAX];
+	size_t nb_maps;
+	struct uio_map maps[VMBUS_MAX_RESOURCE];
+};
+
+/*
+ * parse a sysfs file containing one integer value
+ * different to the eal version, as it needs to work with 64-bit values
+ */
+static int
+vmbus_parse_sysfs_value(const char *filename, uint64_t *val)
+{
+	FILE *f;
+	char buf[BUFSIZ];
+	char *end = NULL;
+
+	f = fopen(filename, "r");
+	if (f == NULL) {
+		RTE_LOG(ERR, EAL, "%s(): cannot open sysfs value %s\n",
+				__func__, filename);
+		return -1;
+	}
+
+	if (fgets(buf, sizeof(buf), f) == NULL) {
+		RTE_LOG(ERR, EAL, "%s(): cannot read sysfs value %s\n",
+				__func__, filename);
+		fclose(f);
+		return -1;
+	}
+	*val = strtoull(buf, &end, 0);
+	if ((buf[0] == '\0') || (end == NULL) || (*end != '\n')) {
+		RTE_LOG(ERR, EAL, "%s(): cannot parse sysfs value %s\n",
+				__func__, filename);
+		fclose(f);
+		return -1;
+	}
+	fclose(f);
+	return 0;
+}
+
+#define OFF_MAX              ((uint64_t)(off_t)-1)
+static ssize_t
+vmbus_uio_get_mappings(const char *devname, struct uio_map maps[], size_t nb_maps)
+{
+	size_t i;
+	char dirname[PATH_MAX];
+	char filename[PATH_MAX];
+	uint64_t offset, size;
+
+	for (i = 0; i != nb_maps; i++) {
+
+		/* check if map directory exists */
+		snprintf(dirname, sizeof(dirname),
+				"%s/maps/map%zu", devname, i);
+
+		RTE_LOG(DEBUG, EAL, "Scanning maps in %s\n", (char *)dirname);
+
+		if (access(dirname, F_OK) != 0)
+			break;
+
+		/* get mapping offset */
+		snprintf(filename, sizeof(filename),
+				"%s/offset", dirname);
+		if (vmbus_parse_sysfs_value(filename, &offset) < 0) {
+			RTE_LOG(ERR, EAL,
+					"%s(): cannot parse offset of %s\n",
+					__func__, dirname);
+			return -1;
+		}
+
+		/* get mapping size */
+		snprintf(filename, sizeof(filename),
+				"%s/size", dirname);
+		if (vmbus_parse_sysfs_value(filename, &size) < 0) {
+			RTE_LOG(ERR, EAL,
+					"%s(): cannot parse size of %s\n",
+					__func__, dirname);
+			return -1;
+		}
+
+		/* get mapping physical address */
+		snprintf(filename, sizeof(filename),
+				"%s/addr", dirname);
+		if (vmbus_parse_sysfs_value(filename, &maps[i].phaddr) < 0) {
+			RTE_LOG(ERR, EAL,
+					"%s(): cannot parse addr of %s\n",
+					__func__, dirname);
+			return -1;
+		}
+
+		if ((offset > OFF_MAX) || (size > SIZE_MAX)) {
+			RTE_LOG(ERR, EAL,
+					"%s(): offset/size exceed system max value\n",
+					__func__);
+			return -1;
+		}
+
+		maps[i].offset = offset;
+		maps[i].size = size;
+	}
+	return i;
+}
+
+/* maximum time to wait that /dev/uioX appears */
+#define UIO_DEV_WAIT_TIMEOUT 3 /* seconds */
+
+/* map a particular resource from a file */
+static void *
+vmbus_map_resource(struct rte_vmbus_device *dev, void *requested_addr,
+		const char *devname, off_t offset, size_t size)
+{
+	int fd;
+	void *mapaddr;
+
+	if (dev->uio_fd <= 0)
+		fd = open(devname, O_RDWR);
+	else
+		fd = dev->uio_fd;
+
+	if (fd < 0) {
+		RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
+				devname, strerror(errno));
+		goto fail;
+	}
+
+	dev->uio_fd = fd;
+	/* Map the memory resource of device */
+	mapaddr = mmap(requested_addr, size, PROT_READ | PROT_WRITE,
+			MAP_SHARED, fd, offset);
+	if (mapaddr == MAP_FAILED ||
+			(requested_addr != NULL && mapaddr != requested_addr)) {
+		RTE_LOG(ERR, EAL,
+			"%s(): cannot mmap(%s(%d), %p, 0x%lx, 0x%lx):"
+			" %s (%p)\n", __func__, devname, fd, requested_addr,
+			(unsigned long)size, (unsigned long)offset,
+			strerror(errno), mapaddr);
+		close(fd);
+		goto fail;
+	}
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+		close(fd);
+
+	RTE_LOG(DEBUG, EAL, "  VMBUS memory mapped at %p\n", mapaddr);
+
+	return mapaddr;
+
+fail:
+	return NULL;
+}
+
+/* map the resources of a vmbus device in virtual memory */
+int
+vmbus_uio_map_resource(struct rte_vmbus_device *dev)
+{
+	int i;
+	struct dirent *e;
+	DIR *dir;
+	char dirname[PATH_MAX];
+	char dirname2[PATH_MAX];
+	char devname[PATH_MAX]; /* contains the /dev/uioX */
+	void *mapaddr;
+	unsigned uio_num;
+	uint64_t phaddr;
+	uint64_t offset;
+	uint64_t pagesz;
+	ssize_t nb_maps;
+	struct rte_vmbus_id *loc = &dev->id;
+	struct uio_resource *uio_res;
+	struct uio_map *maps;
+
+	/* depending on kernel version, uio can be located in uio/uioX
+	 * or uio:uioX */
+	snprintf(dirname, sizeof(dirname),
+			"/sys/bus/vmbus/devices/" VMBUS_PRI_FMT "/uio", loc->sysfs_num);
+
+	dir = opendir(dirname);
+	if (dir == NULL) {
+		/* retry with the parent directory */
+		snprintf(dirname, sizeof(dirname),
+				"/sys/bus/vmbus/devices/" VMBUS_PRI_FMT, loc->sysfs_num);
+		dir = opendir(dirname);
+
+		if (dir == NULL) {
+			RTE_LOG(ERR, EAL, "Cannot opendir %s\n", dirname);
+			return -1;
+		}
+	}
+
+	/* take the first file starting with "uio" */
+	while ((e = readdir(dir)) != NULL) {
+		/* format could be uio%d ...*/
+		int shortprefix_len = sizeof("uio") - 1;
+		/* ... or uio:uio%d */
+		int longprefix_len = sizeof("uio:uio") - 1;
+		char *endptr;
+
+		if (strncmp(e->d_name, "uio", 3) != 0)
+			continue;
+
+		/* first try uio%d */
+		errno = 0;
+		uio_num = strtoull(e->d_name + shortprefix_len, &endptr, 10);
+		if (errno == 0 && endptr != e->d_name) {
+			snprintf(dirname2, sizeof(dirname2),
+					"%s/uio%u", dirname, uio_num);
+			break;
+		}
+
+		/* then try uio:uio%d */
+		errno = 0;
+		uio_num = strtoull(e->d_name + longprefix_len, &endptr, 10);
+		if (errno == 0 && endptr != e->d_name) {
+			snprintf(dirname2, sizeof(dirname2),
+					"%s/uio:uio%u", dirname, uio_num);
+			break;
+		}
+	}
+	closedir(dir);
+
+	/* No uio resource found */
+	if (e == NULL) {
+		RTE_LOG(WARNING, EAL,
+			VMBUS_PRI_FMT" not managed by UIO driver, skipping\n",
+			loc->sysfs_num);
+		return -1;
+	}
+
+	/* allocate the mapping details for secondary processes*/
+	uio_res = rte_zmalloc("UIO_RES", sizeof(*uio_res), 0);
+	if (uio_res == NULL) {
+		RTE_LOG(ERR, EAL,
+				"%s(): cannot store uio mmap details\n", __func__);
+		return -1;
+	}
+
+	snprintf(devname, sizeof(devname), "/dev/uio%u", uio_num);
+	snprintf(uio_res->path, sizeof(uio_res->path), "%s", devname);
+	memcpy(&uio_res->vmbus_addr, &dev->id, sizeof(uio_res->vmbus_addr));
+
+	/* collect info about device mappings */
+	nb_maps = vmbus_uio_get_mappings(dirname2, uio_res->maps,
+			sizeof(uio_res->maps) / sizeof(uio_res->maps[0]));
+	if (nb_maps < 0)
+		return nb_maps;
+
+	RTE_LOG(DEBUG, EAL, "Found %d memory maps for device "VMBUS_PRI_FMT"\n",
+			(int)nb_maps, loc->sysfs_num);
+
+	uio_res->nb_maps = nb_maps;
+
+	pagesz = sysconf(_SC_PAGESIZE);
+
+	maps = uio_res->maps;
+	for (i = 0; i != VMBUS_MAX_RESOURCE; i++) {
+		phaddr = maps[i].phaddr;
+		if (phaddr == 0)
+			continue;
+
+		RTE_LOG(DEBUG, EAL, "	mem_map%d: addr=0x%lx len = %lu\n",
+				i,
+				maps[i].phaddr,
+				maps[i].size);
+
+		if (i != nb_maps) {
+			offset = i * pagesz;
+			mapaddr = vmbus_map_resource(dev, NULL, devname, (off_t)offset,
+					(size_t)maps[i].size);
+			if (mapaddr == NULL)
+				return -1;
+
+			/* Important: offset for mapping can be non-zero, pad the addr */
+			mapaddr = ((char *)mapaddr + maps[i].offset);
+			maps[i].addr = mapaddr;
+			maps[i].offset = offset;
+			dev->mem_resource[i].addr = mapaddr;
+			dev->mem_resource[i].phys_addr = phaddr;
+			dev->mem_resource[i].len = maps[i].size;
+		}
+	}
+
+	return 0;
+}
+
+/* Compare two VMBUS device addresses. */
+static int
+vmbus_compare(struct rte_vmbus_id *id, struct rte_vmbus_id *id2)
+{
+	return id->device_id > id2->device_id;
+}
+
+/* Scan one vmbus sysfs entry, and fill the devices list from it. */
+static int
+vmbus_scan_one(const char *name)
+{
+	char filename[PATH_MAX];
+	char buf[BUFSIZ];
+	char dirname[PATH_MAX];
+	unsigned long tmp;
+	unsigned int sysfs_num;
+	struct rte_vmbus_device *dev;
+	FILE *f;
+
+	dev = rte_zmalloc("vmbus_device", sizeof(*dev), 0);
+	if (dev == NULL)
+		return -1;
+
+	snprintf(dirname, sizeof(dirname), "%s/%s",
+		 SYSFS_VMBUS_DEVICES, name);
+
+	/* parse directory name in sysfs.  this does not always reflect
+	 * the device id read below.
+	 */
+	if (sscanf(name, VMBUS_PRI_FMT, &sysfs_num) != 1) {
+		RTE_LOG(ERR, EAL, "Unable to parse vmbus sysfs name\n");
+		rte_free(dev);
+		return -1;
+	}
+	dev->id.sysfs_num = sysfs_num;
+
+	/* get device id */
+	snprintf(filename, sizeof(filename), "%s/id", dirname);
+	if (eal_parse_sysfs_value(filename, &tmp) < 0) {
+		rte_free(dev);
+		return -1;
+	}
+	dev->id.device_id = (uint16_t)tmp;
+
+	/* get monitor id */
+	snprintf(filename, sizeof(filename), "%s/monitor_id", dirname);
+	if (eal_parse_sysfs_value(filename, &tmp) < 0) {
+		rte_free(dev);
+		return -1;
+	}
+	dev->vmbus_monitor_id = tmp;
+
+	/* compare class_id of device with {f8615163-df3e-46c5-913ff2d2f965ed0e} */
+	snprintf(filename, sizeof(filename), "%s/class_id", dirname);
+	f = fopen(filename, "r");
+	if (f == NULL) {
+		RTE_LOG(ERR, EAL, "%s(): cannot open sysfs value %s\n",
+				__func__, filename);
+		rte_free(dev);
+		return -1;
+	}
+	if (fgets(buf, sizeof(buf), f) == NULL) {
+		RTE_LOG(ERR, EAL, "%s(): cannot read sysfs value %s\n",
+				__func__, filename);
+		fclose(f);
+		rte_free(dev);
+		return -1;
+	}
+	fclose(f);
+
+	if (strncmp(buf, VMBUS_NETWORK_DEVICE, strlen(VMBUS_NETWORK_DEVICE))) {
+		RTE_LOG(DEBUG, EAL, "%s(): skip vmbus_0_%u with class_id = %s",
+				__func__, dev->id.sysfs_num, buf);
+		rte_free(dev);
+		return 0;
+	}
+
+	/* device is valid, add in list (sorted) */
+	RTE_LOG(DEBUG, EAL, "Adding vmbus device %d\n", dev->id.device_id);
+	if (!TAILQ_EMPTY(&vmbus_device_list)) {
+		struct rte_vmbus_device *dev2 = NULL;
+
+		TAILQ_FOREACH(dev2, &vmbus_device_list, next) {
+			if (vmbus_compare(&dev->id, &dev2->id))
+				continue;
+
+			TAILQ_INSERT_BEFORE(dev2, dev, next);
+			return 0;
+		}
+	}
+
+	TAILQ_INSERT_TAIL(&vmbus_device_list, dev, next);
+
+	return 0;
+}
+
+static int
+check_vmbus_device(const char *buf, int bufsize)
+{
+	char *n, *buf_copy, *endp;
+	unsigned long err;
+
+	/* the format is 'vmbus_0_%d' */
+	n = strrchr(buf, '_');
+	if (n == NULL)
+		return -1;
+	n++;
+	buf_copy = strndup(n, bufsize);
+	if (buf_copy == NULL) {
+		RTE_LOG(ERR, EAL, "%s(): failed to strndup: %s\n",
+				__func__, strerror(errno));
+		return -1;
+	}
+
+	err = strtoul(buf_copy, &endp, 10);
+	free(buf_copy);
+
+	if (*endp != '\0' || (err == ULONG_MAX && errno == ERANGE)) {
+		RTE_LOG(ERR, EAL, "%s(): can't parse devid: %s\n",
+				__func__, strerror(errno));
+		return -1;
+	}
+
+	return 0;
+}
+
+/*
+ * Scan the content of the vmbus, and the devices in the devices list
+ */
+static int
+vmbus_scan(void)
+{
+	struct dirent *e;
+	DIR *dir;
+
+	dir = opendir(SYSFS_VMBUS_DEVICES);
+	if (dir == NULL) {
+		if (errno == ENOENT)
+			return 0;
+
+		RTE_LOG(ERR, EAL, "%s(): opendir failed: %s\n",
+			__func__, strerror(errno));
+		return -1;
+	}
+
+	while ((e = readdir(dir)) != NULL) {
+		if (e->d_name[0] == '.')
+			continue;
+
+		if (check_vmbus_device(e->d_name, sizeof(e->d_name)))
+			continue;
+
+		if (vmbus_scan_one(e->d_name) < 0)
+			goto error;
+	}
+	closedir(dir);
+	return 0;
+
+ error:
+	closedir(dir);
+	return -1;
+}
+
+/* Init the VMBUS EAL subsystem */
+int rte_eal_vmbus_init(void)
+{
+	/* VMBUS can be disabled */
+	if (internal_config.no_vmbus)
+		return 0;
+
+	if (vmbus_scan() < 0) {
+		RTE_LOG(ERR, EAL, "%s(): Cannot scan vmbus\n", __func__);
+		return -1;
+	}
+	return 0;
+}
+
+/* Below is PROBE part of eal_vmbus library */
+
+/*
+ * If device ID match, call the devinit() function of the driver.
+ */
+int
+rte_eal_vmbus_probe_one_driver(struct rte_vmbus_driver *dr,
+		struct rte_vmbus_device *dev)
+{
+	const struct rte_vmbus_id *id_table;
+
+	for (id_table = dr->id_table; id_table->device_id != VMBUS_ID_ANY; id_table++) {
+		const struct rte_vmbus_id *loc = &dev->id;
+
+		RTE_LOG(DEBUG, EAL, "VMBUS device "VMBUS_PRI_FMT"\n",
+				loc->sysfs_num);
+		RTE_LOG(DEBUG, EAL, "  probe driver: %s\n", dr->name);
+
+		/* no initialization when blacklisted, return without error */
+		if (dev->blacklisted) {
+			RTE_LOG(DEBUG, EAL, "  Device is blacklisted, not initializing\n");
+			return 0;
+		}
+
+		/* map the resources */
+		if (vmbus_uio_map_resource(dev) < 0)
+			return -1;
+
+		/* reference driver structure */
+		dev->driver = dr;
+
+		/* call the driver devinit() function */
+		return dr->devinit(dr, dev);
+	}
+
+	/* return positive value if driver is not found */
+	return 1;
+}
+
+/*
+ * call the devinit() function of all
+ * registered drivers for the vmbus device. Return -1 if no driver is
+ * found for this class of vmbus device.
+ * The present assumption is that we have drivers only for vmbus network
+ * devices. That's why we don't check driver's id_table now.
+ */
+static int
+vmbus_probe_all_drivers(struct rte_vmbus_device *dev)
+{
+	struct rte_vmbus_driver *dr = NULL;
+	int ret;
+
+	TAILQ_FOREACH(dr, &vmbus_driver_list, next) {
+		ret = rte_eal_vmbus_probe_one_driver(dr, dev);
+		if (ret < 0) {
+			/* negative value is an error */
+			RTE_LOG(ERR, EAL, "Failed to probe driver %s\n", dr->name);
+			break;
+		}
+		if (ret > 0) {
+			/* positive value means driver not found */
+			RTE_LOG(DEBUG, EAL, "Driver %s not found", dr->name);
+			continue;
+		}
+
+		RTE_LOG(DEBUG, EAL, "OK. Driver was found and probed.\n");
+		return 0;
+	}
+	return -1;
+}
+
+
+/*
+ * Scan the vmbus, and call the devinit() function for
+ * all registered drivers that have a matching entry in its id_table
+ * for discovered devices.
+ */
+int
+rte_eal_vmbus_probe(void)
+{
+	struct rte_vmbus_device *dev = NULL;
+
+	TAILQ_FOREACH(dev, &vmbus_device_list, next) {
+		RTE_LOG(DEBUG, EAL, "Probing driver for device %d ...\n",
+				dev->id.device_id);
+		vmbus_probe_all_drivers(dev);
+	}
+	return 0;
+}
+
+/* register vmbus driver */
+void
+rte_eal_vmbus_register(struct rte_vmbus_driver *driver)
+{
+	TAILQ_INSERT_TAIL(&vmbus_driver_list, driver, next);
+}
+
+/* unregister vmbus driver */
+void
+rte_eal_vmbus_unregister(struct rte_vmbus_driver *driver)
+{
+	TAILQ_REMOVE(&vmbus_driver_list, driver, next);
+}
+
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 9577d17..9093966 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -379,6 +379,98 @@ rte_eth_dev_uninit(struct rte_pci_device *pci_dev)
 	return 0;
 }
 
+#ifdef RTE_LIBRTE_HV_PMD
+static int
+rte_vmbus_dev_init(struct rte_vmbus_driver *vmbus_drv,
+		   struct rte_vmbus_device *vmbus_dev)
+{
+	struct eth_driver  *eth_drv = (struct eth_driver *)vmbus_drv;
+	struct rte_eth_dev *eth_dev;
+	char ethdev_name[RTE_ETH_NAME_MAX_LEN];
+	int diag;
+
+	snprintf(ethdev_name, RTE_ETH_NAME_MAX_LEN, "%u_%u",
+		 vmbus_dev->id.device_id, vmbus_dev->id.sysfs_num);
+
+	eth_dev = rte_eth_dev_allocate(ethdev_name, RTE_ETH_DEV_PCI);
+	if (eth_dev == NULL)
+		return -ENOMEM;
+
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+		eth_dev->data->dev_private = rte_zmalloc("ethdev private structure",
+				  eth_drv->dev_private_size,
+				  RTE_CACHE_LINE_SIZE);
+		if (eth_dev->data->dev_private == NULL)
+			rte_panic("Cannot allocate memzone for private port data\n");
+	}
+	eth_dev->vmbus_dev = vmbus_dev;
+	eth_dev->driver = eth_drv;
+	eth_dev->data->rx_mbuf_alloc_failed = 0;
+
+	/* init user callbacks */
+	TAILQ_INIT(&(eth_dev->link_intr_cbs));
+
+	/*
+	 * Set the default maximum frame size.
+	 */
+	eth_dev->data->mtu = ETHER_MTU;
+
+	/* Invoke PMD device initialization function */
+	diag = (*eth_drv->eth_dev_init)(eth_dev);
+	if (diag == 0)
+		return 0;
+
+	PMD_DEBUG_TRACE("driver %s: eth_dev_init(device_id=0x%x)"
+			" failed\n", vmbus_drv->name,
+			(unsigned) vmbus_dev->id.device_id);
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
+		rte_free(eth_dev->data->dev_private);
+	nb_ports--;
+	return diag;
+}
+
+
+static int
+rte_vmbus_dev_uninit(struct rte_vmbus_device *vmbus_dev)
+{
+	const struct eth_driver *eth_drv;
+	struct rte_eth_dev *eth_dev;
+	char ethdev_name[RTE_ETH_NAME_MAX_LEN];
+	int ret;
+
+	if (vmbus_dev == NULL)
+		return -EINVAL;
+
+	snprintf(ethdev_name, RTE_ETH_NAME_MAX_LEN, "%u_%u",
+		 vmbus_dev->id.device_id, vmbus_dev->id.sysfs_num);
+
+	eth_dev = rte_eth_dev_allocated(ethdev_name);
+	if (eth_dev == NULL)
+		return -ENODEV;
+
+	eth_drv = (const struct eth_driver *)vmbus_dev->driver;
+
+	/* Invoke PMD device uninit function */
+	if (*eth_drv->eth_dev_uninit) {
+		ret = (*eth_drv->eth_dev_uninit)(eth_dev);
+		if (ret)
+			return ret;
+	}
+
+	/* free ether device */
+	rte_eth_dev_release_port(eth_dev);
+
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
+		rte_free(eth_dev->data->dev_private);
+
+	eth_dev->pci_dev = NULL;
+	eth_dev->driver = NULL;
+	eth_dev->data = NULL;
+
+	return 0;
+}
+#endif
+
 /**
  * Register an Ethernet [Poll Mode] driver.
  *
@@ -396,9 +488,22 @@ rte_eth_dev_uninit(struct rte_pci_device *pci_dev)
 void
 rte_eth_driver_register(struct eth_driver *eth_drv)
 {
-	eth_drv->pci_drv.devinit = rte_eth_dev_init;
-	eth_drv->pci_drv.devuninit = rte_eth_dev_uninit;
-	rte_eal_pci_register(&eth_drv->pci_drv);
+	switch (eth_drv->bus_type) {
+	case RTE_BUS_PCI:
+		eth_drv->pci_drv.devinit = rte_eth_dev_init;
+		eth_drv->pci_drv.devuninit = rte_eth_dev_uninit;
+		rte_eal_pci_register(&eth_drv->pci_drv);
+		break;
+#ifdef RTE_LIBRTE_HV_PMD
+	case RTE_BUS_VMBUS:
+		eth_drv->vmbus_drv.devinit = rte_vmbus_dev_init;
+		eth_drv->vmbus_drv.devuninit = rte_vmbus_dev_uninit;
+		rte_eal_vmbus_register(&eth_drv->vmbus_drv);
+		break;
+#endif
+	default:
+		rte_panic("unknown bus type %u\n", eth_drv->bus_type);
+	}
 }
 
 static int
@@ -1351,6 +1456,9 @@ rte_eth_has_link_state(uint8_t port_id)
 	}
 	dev = &rte_eth_devices[port_id];
 
+	if (dev->driver->bus_type != RTE_BUS_PCI)
+		return 0;
+
 	return (dev->pci_dev->driver->drv_flags & RTE_PCI_DRV_INTR_LSC) != 0;
 }
 
@@ -1901,9 +2009,17 @@ rte_eth_dev_info_get(uint8_t port_id, struct rte_eth_dev_info *dev_info)
 
 	FUNC_PTR_OR_RET(*dev->dev_ops->dev_infos_get);
 	(*dev->dev_ops->dev_infos_get)(dev, dev_info);
-	dev_info->pci_dev = dev->pci_dev;
-	if (dev->driver)
-		dev_info->driver_name = dev->driver->pci_drv.name;
+
+	if (dev->driver) {
+		switch (dev->driver->bus_type) {
+		case RTE_BUS_PCI:
+			dev_info->driver_name = dev->driver->pci_drv.name;
+			dev_info->pci_dev = dev->pci_dev;
+			break;
+		case RTE_BUS_VMBUS:
+			dev_info->driver_name = dev->driver->vmbus_drv.name;
+		}
+	}
 }
 
 void
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 991023b..9e08f3e 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -178,6 +178,7 @@ extern "C" {
 #include <rte_log.h>
 #include <rte_interrupts.h>
 #include <rte_pci.h>
+#include <rte_vmbus.h>
 #include <rte_dev.h>
 #include <rte_devargs.h>
 #include <rte_mbuf.h>
@@ -1477,7 +1478,10 @@ struct rte_eth_dev {
 	struct rte_eth_dev_data *data;  /**< Pointer to device data */
 	const struct eth_driver *driver;/**< Driver for this device */
 	const struct eth_dev_ops *dev_ops; /**< Functions exported by PMD */
-	struct rte_pci_device *pci_dev; /**< PCI info. supplied by probing */
+	union {
+		struct rte_pci_device *pci_dev; /**< PCI info. supplied by probig */
+		struct rte_vmbus_device *vmbus_dev; /**< VMBUS info. supplied by probing */
+	};
 	/** User application callbacks for NIC interrupts */
 	struct rte_eth_dev_cb_list link_intr_cbs;
 	/**
@@ -1696,7 +1700,14 @@ typedef int (*eth_dev_uninit_t)(struct rte_eth_dev *eth_dev);
  * - The size of the private data to allocate for each matching device.
  */
 struct eth_driver {
-	struct rte_pci_driver pci_drv;    /**< The PMD is also a PCI driver. */
+	union {
+		struct rte_pci_driver pci_drv;    /**< The PMD is also a PCI driver. */
+		struct rte_vmbus_driver vmbus_drv;/**< The PMD is also a VMBUS drv. */
+	};
+	enum {
+		RTE_BUS_PCI=0,
+		RTE_BUS_VMBUS
+	} bus_type;			  /**< Device bus type. */
 	eth_dev_init_t eth_dev_init;      /**< Device init function. */
 	eth_dev_uninit_t eth_dev_uninit;  /**< Device uninit function. */
 	unsigned int dev_private_size;    /**< Size of device private data. */
-- 
2.1.4

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [dpdk-dev] [PATCH v4 3/7] hv: add basic vmbus support
  2015-04-21 17:32 ` [dpdk-dev] [PATCH v4 3/7] hv: add basic vmbus support Stephen Hemminger
@ 2015-07-08 23:51   ` Thomas Monjalon
  0 siblings, 0 replies; 17+ messages in thread
From: Thomas Monjalon @ 2015-07-08 23:51 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, alexmay

2015-04-21 10:32, Stephen Hemminger:
> The hyper-v device driver forces the base EAL code to change
> to support multiple bus types. This is done changing the pci_device
> in ether driver to a generic union.
> 
> As much as possible this is done in a backwards source compatiable
> way. It will break ABI for device drivers.

> --- a/lib/librte_eal/common/eal_common_options.c
> +++ b/lib/librte_eal/common/eal_common_options.c
> @@ -80,6 +80,7 @@ eal_long_options[] = {
>  	{OPT_NO_HPET,           0, NULL, OPT_NO_HPET_NUM          },
>  	{OPT_NO_HUGE,           0, NULL, OPT_NO_HUGE_NUM          },
>  	{OPT_NO_PCI,            0, NULL, OPT_NO_PCI_NUM           },
> +	{OPT_NO_VMBUS,		0, NULL, OPT_NO_VMBUS_NUM	  },

Alignment, please.

> @@ -66,6 +66,7 @@ struct internal_config {
>  	volatile unsigned no_hugetlbfs;   /**< true to disable hugetlbfs */
>  	volatile unsigned xen_dom0_support; /**< support app running on Xen Dom0*/
>  	volatile unsigned no_pci;         /**< true to disable PCI */
> +	volatile unsigned no_vmbus;	  /**< true to disable VMBUS */
>  	volatile unsigned no_hpet;        /**< true to disable HPET */
>  	volatile unsigned vmware_tsc_map; /**< true to use VMware TSC mapping

Alignment may be better.

> +#ifdef RTE_LIBRTE_HV_PMD
> +	case RTE_BUS_VMBUS:
> +		eth_drv->vmbus_drv.devinit = rte_vmbus_dev_init;
> +		eth_drv->vmbus_drv.devuninit = rte_vmbus_dev_uninit;
> +		rte_eal_vmbus_register(&eth_drv->vmbus_drv);
> +		break;
> +#endif

Why ifdef'ing this code?

> --- a/lib/librte_ether/rte_ethdev.h
> +++ b/lib/librte_ether/rte_ethdev.h
> @@ -1477,7 +1478,10 @@ struct rte_eth_dev {
>  	struct rte_eth_dev_data *data;  /**< Pointer to device data */
>  	const struct eth_driver *driver;/**< Driver for this device */
>  	const struct eth_dev_ops *dev_ops; /**< Functions exported by PMD */
> -	struct rte_pci_device *pci_dev; /**< PCI info. supplied by probing */
> +	union {
> +		struct rte_pci_device *pci_dev; /**< PCI info. supplied by probig */
> +		struct rte_vmbus_device *vmbus_dev; /**< VMBUS info. supplied by probing */
> +	};
[...]
>  struct eth_driver {
> -	struct rte_pci_driver pci_drv;    /**< The PMD is also a PCI driver. */
> +	union {
> +		struct rte_pci_driver pci_drv;    /**< The PMD is also a PCI driver. */
> +		struct rte_vmbus_driver vmbus_drv;/**< The PMD is also a VMBUS drv. */
> +	};
> +	enum {
> +		RTE_BUS_PCI=0,
> +		RTE_BUS_VMBUS
> +	} bus_type;			  /**< Device bus type. */

A device may also be virtual.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [dpdk-dev] [PATCH v4 4/7] hv: uio driver
  2015-04-21 17:32 [dpdk-dev] [PATCH v4 0/7] Hyper-V Poll Mode driver Stephen Hemminger
                   ` (2 preceding siblings ...)
  2015-04-21 17:32 ` [dpdk-dev] [PATCH v4 3/7] hv: add basic vmbus support Stephen Hemminger
@ 2015-04-21 17:32 ` Stephen Hemminger
  2015-07-08 23:55   ` Thomas Monjalon
  2015-04-21 17:32 ` [dpdk-dev] [PATCH v4 5/7] hv: poll mode driver Stephen Hemminger
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 17+ messages in thread
From: Stephen Hemminger @ 2015-04-21 17:32 UTC (permalink / raw)
  To: alexmay; +Cc: dev, Stas Egorov, Stephen Hemminger

From: Stephen Hemminger <shemming@brocade.com>

Add new UIO driver in kernel to support DPDK Poll Mode Driver.

Signed-off-by: Stas Egorov <segorov@mirantis.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/librte_eal/linuxapp/Makefile            |   3 +
 lib/librte_eal/linuxapp/hv_uio/Makefile     |  57 ++
 lib/librte_eal/linuxapp/hv_uio/hv_uio.c     | 551 +++++++++++++++++
 lib/librte_eal/linuxapp/hv_uio/hyperv_net.h | 907 ++++++++++++++++++++++++++++
 4 files changed, 1518 insertions(+)
 create mode 100644 lib/librte_eal/linuxapp/hv_uio/Makefile
 create mode 100644 lib/librte_eal/linuxapp/hv_uio/hv_uio.c
 create mode 100644 lib/librte_eal/linuxapp/hv_uio/hyperv_net.h

diff --git a/lib/librte_eal/linuxapp/Makefile b/lib/librte_eal/linuxapp/Makefile
index 8fcfdf6..a28d289 100644
--- a/lib/librte_eal/linuxapp/Makefile
+++ b/lib/librte_eal/linuxapp/Makefile
@@ -41,5 +41,8 @@ endif
 ifeq ($(CONFIG_RTE_LIBRTE_XEN_DOM0),y)
 DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += xen_dom0
 endif
+ifeq ($(CONFIG_RTE_LIBRTE_HV_PMD),y)
+DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += hv_uio
+endif
 
 include $(RTE_SDK)/mk/rte.subdir.mk
diff --git a/lib/librte_eal/linuxapp/hv_uio/Makefile b/lib/librte_eal/linuxapp/hv_uio/Makefile
new file mode 100644
index 0000000..2ed7771
--- /dev/null
+++ b/lib/librte_eal/linuxapp/hv_uio/Makefile
@@ -0,0 +1,57 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+#   Copyright(c) 2013-2015 Brocade Communications Systems, Inc.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of Intel Corporation nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# module name and path
+#
+MODULE = hv_uio
+MODULE_PATH = drivers/net/hv_uio
+
+#
+# CFLAGS
+#
+MODULE_CFLAGS += -I$(SRCDIR) --param max-inline-insns-single=100
+MODULE_CFLAGS += -I$(RTE_OUTPUT)/include
+MODULE_CFLAGS += -Winline -Wall -Werror
+MODULE_CFLAGS += -include $(RTE_OUTPUT)/include/rte_config.h
+ifeq ($(CONFIG_RTE_LIBRTE_HV_DEBUG),y)
+MODULE_CFLAGS += -DDBG
+endif
+
+#
+# all source are stored in SRCS-y
+#
+SRCS-y := hv_uio.c
+
+include $(RTE_SDK)/mk/rte.module.mk
diff --git a/lib/librte_eal/linuxapp/hv_uio/hv_uio.c b/lib/librte_eal/linuxapp/hv_uio/hv_uio.c
new file mode 100644
index 0000000..294b0fd
--- /dev/null
+++ b/lib/librte_eal/linuxapp/hv_uio/hv_uio.c
@@ -0,0 +1,551 @@
+/*
+ * Copyright (c) 2013-2015 Brocade Communications Systems, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/device.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/netdevice.h>
+#include <linux/if_ether.h>
+#include <linux/uio_driver.h>
+#include <linux/slab.h>
+
+#include "hyperv_net.h"
+
+#define HV_DEVICE_ADD	 0
+#define HV_DEVICE_REMOVE 1
+#define HV_RING_SIZE	 512
+
+static uint mtu = ETH_DATA_LEN;
+/*
+ * List of resources to be mapped to uspace
+ * can be extended up to MAX_UIO_MAPS(5) items
+ */
+enum {
+	TXRX_RING_MAP,
+	INT_PAGE_MAP,
+	MON_PAGE_MAP,
+	RECV_BUF_MAP
+};
+
+struct hyperv_private_data {
+	struct netvsc_device *net_device;
+	struct uio_info *info;
+};
+
+extern void vmbus_get_monitor_pages(unsigned long *int_page,
+				    unsigned long monitor_pages[2]);
+
+/* phys addrs of pages in vmbus_connection from hv_vmbus */
+static unsigned long int_page, monitor_pages[2];
+
+static inline int
+hyperv_uio_find_mem_index(struct uio_info *info, struct vm_area_struct *vma)
+{
+	if (vma->vm_pgoff < MAX_UIO_MAPS) {
+		if (unlikely(info->mem[vma->vm_pgoff].size == 0))
+			return -1;
+		return (int)vma->vm_pgoff;
+	}
+	return -1;
+}
+
+static int
+hyperv_uio_mmap(struct uio_info *info, struct vm_area_struct *vma)
+{
+	int mi = hyperv_uio_find_mem_index(info, vma);
+
+	if (mi < 0)
+		return -EINVAL;
+
+	return remap_pfn_range(vma,
+			vma->vm_start,
+			virt_to_phys((void *)info->mem[mi].addr) >> PAGE_SHIFT,
+			vma->vm_end - vma->vm_start,
+			vma->vm_page_prot);
+}
+
+static struct netvsc_device *
+alloc_net_device(struct hv_device *dev)
+{
+	struct netvsc_device *net_device;
+
+	net_device = kzalloc(sizeof(struct netvsc_device), GFP_KERNEL);
+	if (!net_device) {
+		pr_err("unable to allocate memory for netvsc_device\n");
+		return NULL;
+	}
+
+	init_waitqueue_head(&net_device->wait_drain);
+	net_device->start_remove = false;
+	net_device->destroy = false;
+	net_device->dev = dev;
+	net_device->ndev = hv_get_drvdata(dev);
+	net_device->recv_section_cnt = 0;
+
+	return net_device;
+}
+
+/* Negotiate NVSP protocol version */
+static int
+negotiate_nvsp_ver(struct hv_device *dev,
+		   struct netvsc_device *net_device,
+		   struct nvsp_message *init_packet,
+		   u32 nvsp_ver)
+{
+	int ret;
+
+	memset(init_packet, 0, sizeof(struct nvsp_message));
+	init_packet->hdr.msg_type = NVSP_MSG_TYPE_INIT;
+	init_packet->msg.init_msg.init.min_protocol_ver = nvsp_ver;
+	init_packet->msg.init_msg.init.max_protocol_ver = nvsp_ver;
+
+	/* Send the init request */
+	ret = vmbus_sendpacket(dev->channel, init_packet,
+			       sizeof(struct nvsp_message),
+			       (unsigned long)init_packet,
+			       VM_PKT_DATA_INBAND,
+			       VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED);
+
+	if (ret) {
+		pr_err("unable to send nvsp negotiation packet\n");
+		return ret;
+	}
+
+	if (nvsp_ver != NVSP_PROTOCOL_VERSION_2)
+		return 0;
+
+	/* NVSPv2 only: Send NDIS config */
+	memset(init_packet, 0, sizeof(struct nvsp_message));
+	init_packet->hdr.msg_type = NVSP_MSG2_TYPE_SEND_NDIS_CONFIG;
+	init_packet->msg.v2_msg.send_ndis_config.mtu = mtu;
+	init_packet->msg.v2_msg.send_ndis_config.capability.ieee8021q = 1;
+
+	ret = vmbus_sendpacket(dev->channel, init_packet,
+			       sizeof(struct nvsp_message),
+			       (unsigned long)init_packet,
+			       VM_PKT_DATA_INBAND, 0);
+
+	return ret;
+}
+
+static int
+netvsc_destroy_recv_buf(struct netvsc_device *net_device)
+{
+	struct nvsp_message *revoke_packet;
+	int ret = 0;
+
+	/*
+	 * If we got a section count, it means we received a
+	 * SendReceiveBufferComplete msg (ie sent
+	 * NvspMessage1TypeSendReceiveBuffer msg) therefore, we need
+	 * to send a revoke msg here
+	 */
+	if (net_device->recv_section_cnt) {
+		/* Send the revoke receive buffer */
+		revoke_packet = &net_device->revoke_packet;
+		memset(revoke_packet, 0, sizeof(struct nvsp_message));
+
+		revoke_packet->hdr.msg_type =
+			NVSP_MSG1_TYPE_REVOKE_RECV_BUF;
+		revoke_packet->msg.v1_msg.
+		revoke_recv_buf.id = NETVSC_RECEIVE_BUFFER_ID;
+
+		ret = vmbus_sendpacket(net_device->dev->channel,
+				       revoke_packet,
+				       sizeof(struct nvsp_message),
+				       (unsigned long)revoke_packet,
+				       VM_PKT_DATA_INBAND, 0);
+		/*
+		 * If we failed here, we might as well return and
+		 * have a leak rather than continue and a bugchk
+		 */
+		if (ret != 0) {
+			pr_err("unable to send revoke receive buffer to netvsp\n");
+			return ret;
+		}
+	}
+
+	/* Teardown the gpadl on the vsp end */
+	if (net_device->recv_buf_gpadl_handle) {
+		pr_devel("trying to teardown gpadl...\n");
+		ret = vmbus_teardown_gpadl(net_device->dev->channel,
+			   net_device->recv_buf_gpadl_handle);
+
+		if (ret) {
+			pr_err("unable to teardown receive buffer's gpadl\n");
+			return ret;
+		}
+		net_device->recv_buf_gpadl_handle = 0;
+	}
+
+	if (net_device->recv_buf) {
+		/* Free up the receive buffer */
+		free_pages((unsigned long)net_device->recv_buf,
+				get_order(net_device->recv_buf_size));
+		net_device->recv_buf = NULL;
+	}
+
+	if (net_device->recv_section) {
+		net_device->recv_section_cnt = 0;
+		kfree(net_device->recv_section);
+		net_device->recv_section = NULL;
+	}
+
+	return ret;
+}
+
+static int
+netvsc_init_recv_buf(struct hv_device *dev, struct netvsc_device *net_dev)
+{
+	int ret = 0;
+	struct nvsp_message *init_packet;
+
+	if (!net_dev)
+		return -ENODEV;
+
+	net_dev->recv_buf = (void *)__get_free_pages(GFP_KERNEL|__GFP_ZERO,
+			get_order(net_dev->recv_buf_size));
+	if (!net_dev->recv_buf) {
+		pr_err("unable to allocate receive buffer of size %d\n",
+		       net_dev->recv_buf_size);
+		ret = -ENOMEM;
+		goto cleanup;
+	}
+
+	/*
+	 * Establish the gpadl handle for this buffer on this
+	 * channel.  Note: This call uses the vmbus connection rather
+	 * than the channel to establish the gpadl handle.
+	 */
+	ret = vmbus_establish_gpadl(dev->channel, net_dev->recv_buf,
+				    net_dev->recv_buf_size,
+				    &net_dev->recv_buf_gpadl_handle);
+	if (ret != 0) {
+		pr_err("unable to establish receive buffer's gpadl\n");
+		goto cleanup;
+	}
+
+
+	/* Notify the NetVsp of the gpadl handle */
+	init_packet = &net_dev->channel_init_pkt;
+
+	memset(init_packet, 0, sizeof(struct nvsp_message));
+
+	init_packet->hdr.msg_type = NVSP_MSG1_TYPE_SEND_RECV_BUF;
+	init_packet->msg.v1_msg.send_recv_buf.
+		gpadl_handle = net_dev->recv_buf_gpadl_handle;
+	init_packet->msg.v1_msg.
+		send_recv_buf.id = NETVSC_RECEIVE_BUFFER_ID;
+
+	/* Send the gpadl notification request */
+	ret = vmbus_sendpacket(dev->channel, init_packet,
+			       sizeof(struct nvsp_message),
+			       (unsigned long)init_packet,
+			       VM_PKT_DATA_INBAND,
+			       VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED);
+	if (ret != 0) {
+		pr_err("unable to send receive buffer's gpadl to netvsp\n");
+		goto cleanup;
+	}
+
+	net_dev->recv_section_cnt = 1;
+	goto exit;
+
+cleanup:
+	netvsc_destroy_recv_buf(net_dev);
+
+exit:
+	return ret;
+}
+
+static int
+netvsc_connect_vsp(struct hv_device *dev, struct netvsc_device *net_dev)
+{
+	int ret;
+	struct nvsp_message *init_packet;
+	int ndis_version;
+
+	if (!net_dev)
+		return -ENODEV;
+
+	init_packet = &net_dev->channel_init_pkt;
+
+	/* Negotiate the latest NVSP protocol supported */
+	if (negotiate_nvsp_ver(dev, net_dev, init_packet,
+			       NVSP_PROTOCOL_VERSION_2) == 0) {
+		net_dev->nvsp_version = NVSP_PROTOCOL_VERSION_2;
+	} else if (negotiate_nvsp_ver(dev, net_dev, init_packet,
+				    NVSP_PROTOCOL_VERSION_1) == 0) {
+		net_dev->nvsp_version = NVSP_PROTOCOL_VERSION_1;
+	} else {
+		return -EPROTO;
+	}
+
+	pr_devel("Negotiated NVSP version:%x\n", net_dev->nvsp_version);
+
+	/* Send the ndis version */
+	memset(init_packet, 0, sizeof(struct nvsp_message));
+
+	ndis_version = 0x00050001;
+
+	init_packet->hdr.msg_type = NVSP_MSG1_TYPE_SEND_NDIS_VER;
+	init_packet->msg.v1_msg.
+		send_ndis_ver.ndis_major_ver =
+				(ndis_version & 0xFFFF0000) >> 16;
+	init_packet->msg.v1_msg.
+		send_ndis_ver.ndis_minor_ver =
+				ndis_version & 0xFFFF;
+
+	/* Send the init request */
+	ret = vmbus_sendpacket(dev->channel, init_packet,
+				sizeof(struct nvsp_message),
+				(unsigned long)init_packet,
+				VM_PKT_DATA_INBAND, 0);
+	if (ret != 0) {
+		pr_err("unable to send init_packet via vmbus\n");
+		return ret;
+	}
+
+	/* Post the big receive buffer to NetVSP */
+	ret = netvsc_init_recv_buf(dev, net_dev);
+
+	return ret;
+}
+
+static int
+hyperv_dev_add(struct hv_device *dev, struct netvsc_device *net_dev)
+{
+	int ret = 0;
+
+	net_dev->recv_buf_size = NETVSC_RECEIVE_BUFFER_SIZE;
+
+	ret = vmbus_open(dev->channel, HV_RING_SIZE * PAGE_SIZE,
+			HV_RING_SIZE * PAGE_SIZE, NULL, 0, NULL, dev);
+	if (ret) {
+		pr_err("unable to open channel: %d\n", ret);
+		return ret;
+	}
+	dev->channel->inbound.ring_buffer->interrupt_mask = 1;
+
+	ret = netvsc_connect_vsp(dev, net_dev);
+	if (ret) {
+		pr_err("unable to connect to NetVSP: %d\n", ret);
+		goto close;
+	}
+
+	return ret;
+
+close:
+	vmbus_close(dev->channel);
+
+	return ret;
+}
+
+static void
+hyperv_dev_remove(struct hv_device *dev, struct netvsc_device *net_dev)
+{
+	if (net_dev->recv_buf) {
+		netvsc_destroy_recv_buf(net_dev);
+		vmbus_close(dev->channel);
+	}
+}
+
+#define MAX_HV_DEVICE_NUM 256
+static struct hv_device *hv_device_list[MAX_HV_DEVICE_NUM];
+
+/*
+ * This callback is set as irqcontrol for uio, it can be used for mtu changing
+ * The variable arg consists of command, device number(see HV_DEV_ID)
+ * and value of MTU(see HV_MTU)
+ */
+static int
+hyperv_write_cb(struct uio_info *info, s32 arg)
+{
+	struct hv_device *dev;
+	int ret, cmd = arg & 1, dev_num = (arg >> 1) & 0xFF;
+	struct hyperv_private_data *pdata;
+	struct netvsc_device *net_device;
+
+	dev = hv_device_list[dev_num];
+	if (!dev)
+		return 0;
+	pdata = hv_get_drvdata(dev);
+	net_device = pdata->net_device;
+	switch (cmd) {
+	case HV_DEVICE_ADD:
+		mtu = arg >> 9;
+		pr_devel("New mtu = %u\n", mtu);
+		ret = hyperv_dev_add(dev, net_device);
+		if (!ret) {
+			info->mem[TXRX_RING_MAP].addr =
+				(phys_addr_t)(dev->channel->ringbuffer_pages);
+			info->mem[RECV_BUF_MAP].addr = (phys_addr_t)(net_device->recv_buf);
+			return sizeof(s32);
+		}
+		break;
+	case HV_DEVICE_REMOVE:
+		hyperv_dev_remove(dev, net_device);
+		return sizeof(s32);
+	}
+
+	return 0;
+}
+
+static int
+hyperv_probe(struct hv_device *dev,
+		const struct hv_vmbus_device_id *dev_id)
+{
+	int ret;
+	struct hyperv_private_data *pdata;
+	struct uio_info *info;
+	struct netvsc_device *net_device;
+
+	pdata = kzalloc(sizeof(struct hyperv_private_data), GFP_KERNEL);
+	if (!pdata) {
+		pr_err("Failed to allocate hyperv_private_data\n");
+		return -ENOMEM;
+	}
+
+	info = kzalloc(sizeof(struct uio_info), GFP_KERNEL);
+	if (!info) {
+		pr_err("Failed to allocate uio_info\n");
+		kfree(pdata);
+		return -ENOMEM;
+	}
+
+	net_device = alloc_net_device(dev);
+	if (!net_device) {
+		kfree(pdata);
+		kfree(info);
+		return -ENOMEM;
+	}
+
+	ret = hyperv_dev_add(dev, net_device);
+	if (ret) {
+		kfree(pdata);
+		kfree(info);
+		kfree(net_device);
+		return ret;
+	}
+
+	/* Fill general uio info */
+	info->name = "hv_uio";
+	info->version = "1.0";
+	info->irqcontrol = hyperv_write_cb;
+	info->irq = UIO_IRQ_CUSTOM;
+
+	/* mem resources */
+	info->mem[TXRX_RING_MAP].name = "txrx_rings";
+	info->mem[TXRX_RING_MAP].addr =
+		(phys_addr_t)(dev->channel->ringbuffer_pages);
+	info->mem[TXRX_RING_MAP].size = HV_RING_SIZE * PAGE_SIZE * 2;
+	info->mem[TXRX_RING_MAP].memtype = UIO_MEM_LOGICAL;
+
+	info->mem[INT_PAGE_MAP].name = "int_page";
+	info->mem[INT_PAGE_MAP].addr =
+		(phys_addr_t)(int_page);
+	info->mem[INT_PAGE_MAP].size = PAGE_SIZE;
+	info->mem[INT_PAGE_MAP].memtype = UIO_MEM_LOGICAL;
+
+	info->mem[MON_PAGE_MAP].name = "monitor_pages";
+	info->mem[MON_PAGE_MAP].addr =
+		(phys_addr_t)(monitor_pages[1]);
+	info->mem[MON_PAGE_MAP].size = PAGE_SIZE;
+	info->mem[MON_PAGE_MAP].memtype = UIO_MEM_LOGICAL;
+
+	info->mem[RECV_BUF_MAP].name = "recv_buf";
+	info->mem[RECV_BUF_MAP].addr = (phys_addr_t)(net_device->recv_buf);
+	info->mem[RECV_BUF_MAP].size = net_device->recv_buf_size;
+	info->mem[RECV_BUF_MAP].memtype = UIO_MEM_LOGICAL;
+
+	info->mmap = hyperv_uio_mmap;
+
+	pr_devel("register hyperv driver for hv_device {%pUl}\n", dev->dev_instance.b);
+	ret = uio_register_device(&dev->device, info);
+	if (ret)
+		pr_err("Failed to register uio device for hyperv\n");
+	else
+		hv_device_list[dev->channel->offermsg.child_relid] = dev;
+
+	pdata->info = info;
+	pdata->net_device = net_device;
+	hv_set_drvdata(dev, pdata);
+
+	return ret;
+}
+
+static int
+hyperv_remove(struct hv_device *dev)
+{
+	struct hyperv_private_data *pdata;
+	struct uio_info *info;
+	struct netvsc_device *net_dev;
+
+	pr_devel("unregister hyperv driver for hv_device {%pUl}\n",
+	       dev->dev_instance.b);
+
+	pdata = hv_get_drvdata(dev);
+	info = pdata->info;
+	uio_unregister_device(info);
+	kfree(info);
+
+	net_dev = pdata->net_device;
+	hv_set_drvdata(dev, NULL);
+
+	hyperv_dev_remove(dev, net_dev);
+
+	kfree(net_dev);
+	kfree(pdata);
+
+	return 0;
+}
+
+static const struct hv_vmbus_device_id hyperv_id_table[] = {
+	{ HV_NIC_GUID, },
+	{ },
+};
+
+MODULE_DEVICE_TABLE(vmbus, hyperv_id_table);
+
+static struct hv_driver hv_uio_drv = {
+	.name = KBUILD_MODNAME,
+	.id_table = hyperv_id_table,
+	.probe = hyperv_probe,
+	.remove = hyperv_remove,
+};
+
+static int __init
+hyperv_module_init(void)
+{
+	vmbus_get_monitor_pages(&int_page, monitor_pages);
+
+	return vmbus_driver_register(&hv_uio_drv);
+}
+
+static void __exit
+hyperv_module_exit(void)
+{
+	vmbus_driver_unregister(&hv_uio_drv);
+}
+
+module_init(hyperv_module_init);
+module_exit(hyperv_module_exit);
+
+MODULE_DESCRIPTION("UIO driver for Hyper-V netVSC");
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Brocade");
diff --git a/lib/librte_eal/linuxapp/hv_uio/hyperv_net.h b/lib/librte_eal/linuxapp/hv_uio/hyperv_net.h
new file mode 100644
index 0000000..8097779
--- /dev/null
+++ b/lib/librte_eal/linuxapp/hv_uio/hyperv_net.h
@@ -0,0 +1,907 @@
+/*
+ *
+ * Copyright (c) 2011, Microsoft Corporation.
+ * Copyright (c) 2013-2015 Brocade Communications Systems, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ *
+ * Authors:
+ *   Haiyang Zhang <haiyangz@microsoft.com>
+ *   Hank Janssen  <hjanssen@microsoft.com>
+ *   K. Y. Srinivasan <kys@microsoft.com>
+ *
+ */
+
+#ifndef _HYPERV_NET_H
+#define _HYPERV_NET_H
+
+#include <linux/list.h>
+#include <linux/hyperv.h>
+#include <linux/rndis.h>
+
+/* Fwd declaration */
+struct hv_netvsc_packet;
+
+/* Represent the xfer page packet which contains 1 or more netvsc packet */
+struct xferpage_packet {
+	struct list_head list_ent;
+	u32 status;
+
+	/* # of netvsc packets this xfer packet contains */
+	u32 count;
+};
+
+/*
+ * Represent netvsc packet which contains 1 RNDIS and 1 ethernet frame
+ * within the RNDIS
+ */
+struct hv_netvsc_packet {
+	/* Bookkeeping stuff */
+	struct list_head list_ent;
+	u32 status;
+
+	struct hv_device *device;
+	bool is_data_pkt;
+	u16 vlan_tci;
+
+	/*
+	 * Valid only for receives when we break a xfer page packet
+	 * into multiple netvsc packets
+	 */
+	struct xferpage_packet *xfer_page_pkt;
+
+	union {
+		struct {
+			u64 recv_completion_tid;
+			void *recv_completion_ctx;
+			void (*recv_completion)(void *context);
+		} recv;
+		struct {
+			u64 send_completion_tid;
+			void *send_completion_ctx;
+			void (*send_completion)(void *context);
+		} send;
+	} completion;
+
+	/* This points to the memory after page_buf */
+	void *extension;
+
+	u32 total_data_buflen;
+	/* Points to the send/receive buffer where the ethernet frame is */
+	void *data;
+	u32 page_buf_cnt;
+	struct hv_page_buffer page_buf[0];
+};
+
+struct netvsc_device_info {
+	unsigned char mac_adr[ETH_ALEN];
+	bool link_state;	/* 0 - link up, 1 - link down */
+	int  ring_size;
+};
+
+enum rndis_device_state {
+	RNDIS_DEV_UNINITIALIZED = 0,
+	RNDIS_DEV_INITIALIZING,
+	RNDIS_DEV_INITIALIZED,
+	RNDIS_DEV_DATAINITIALIZED,
+};
+
+struct rndis_device {
+	struct netvsc_device *net_dev;
+
+	enum rndis_device_state state;
+	bool link_state;
+	atomic_t new_req_id;
+
+	spinlock_t request_lock;
+	struct list_head req_list;
+
+	unsigned char hw_mac_adr[ETH_ALEN];
+};
+
+
+/* Interface */
+int netvsc_device_add(struct hv_device *device, void *additional_info);
+int netvsc_device_remove(struct hv_device *device);
+int netvsc_send(struct hv_device *device,
+		struct hv_netvsc_packet *packet);
+void netvsc_linkstatus_callback(struct hv_device *device_obj,
+				unsigned int status);
+int netvsc_recv_callback(struct hv_device *device_obj,
+			struct hv_netvsc_packet *packet);
+int rndis_filter_open(struct hv_device *dev);
+int rndis_filter_close(struct hv_device *dev);
+int rndis_filter_device_add(struct hv_device *dev,
+			void *additional_info);
+void rndis_filter_device_remove(struct hv_device *dev);
+int rndis_filter_receive(struct hv_device *dev,
+			struct hv_netvsc_packet *pkt);
+
+
+
+int rndis_filter_send(struct hv_device *dev,
+			struct hv_netvsc_packet *pkt);
+
+int rndis_filter_set_packet_filter(struct rndis_device *dev, u32 new_filter);
+int rndis_filter_set_device_mac(struct hv_device *hdev, char *mac);
+
+
+#define NVSP_INVALID_PROTOCOL_VERSION	((u32)0xFFFFFFFF)
+
+#define NVSP_PROTOCOL_VERSION_1		2
+#define NVSP_PROTOCOL_VERSION_2		0x30002
+
+enum {
+	NVSP_MSG_TYPE_NONE = 0,
+
+	/* Init Messages */
+	NVSP_MSG_TYPE_INIT			= 1,
+	NVSP_MSG_TYPE_INIT_COMPLETE		= 2,
+
+	NVSP_VERSION_MSG_START			= 100,
+
+	/* Version 1 Messages */
+	NVSP_MSG1_TYPE_SEND_NDIS_VER		= NVSP_VERSION_MSG_START,
+
+	NVSP_MSG1_TYPE_SEND_RECV_BUF,
+	NVSP_MSG1_TYPE_SEND_RECV_BUF_COMPLETE,
+	NVSP_MSG1_TYPE_REVOKE_RECV_BUF,
+
+	NVSP_MSG1_TYPE_SEND_SEND_BUF,
+	NVSP_MSG1_TYPE_SEND_SEND_BUF_COMPLETE,
+	NVSP_MSG1_TYPE_REVOKE_SEND_BUF,
+
+	NVSP_MSG1_TYPE_SEND_RNDIS_PKT,
+	NVSP_MSG1_TYPE_SEND_RNDIS_PKT_COMPLETE,
+
+	/* Version 2 messages */
+	NVSP_MSG2_TYPE_SEND_CHIMNEY_DELEGATED_BUF,
+	NVSP_MSG2_TYPE_SEND_CHIMNEY_DELEGATED_BUF_COMP,
+	NVSP_MSG2_TYPE_REVOKE_CHIMNEY_DELEGATED_BUF,
+
+	NVSP_MSG2_TYPE_RESUME_CHIMNEY_RX_INDICATION,
+
+	NVSP_MSG2_TYPE_TERMINATE_CHIMNEY,
+	NVSP_MSG2_TYPE_TERMINATE_CHIMNEY_COMP,
+
+	NVSP_MSG2_TYPE_INDICATE_CHIMNEY_EVENT,
+
+	NVSP_MSG2_TYPE_SEND_CHIMNEY_PKT,
+	NVSP_MSG2_TYPE_SEND_CHIMNEY_PKT_COMP,
+
+	NVSP_MSG2_TYPE_POST_CHIMNEY_RECV_REQ,
+	NVSP_MSG2_TYPE_POST_CHIMNEY_RECV_REQ_COMP,
+
+	NVSP_MSG2_TYPE_ALLOC_RXBUF,
+	NVSP_MSG2_TYPE_ALLOC_RXBUF_COMP,
+
+	NVSP_MSG2_TYPE_FREE_RXBUF,
+
+	NVSP_MSG2_TYPE_SEND_VMQ_RNDIS_PKT,
+	NVSP_MSG2_TYPE_SEND_VMQ_RNDIS_PKT_COMP,
+
+	NVSP_MSG2_TYPE_SEND_NDIS_CONFIG,
+
+	NVSP_MSG2_TYPE_ALLOC_CHIMNEY_HANDLE,
+	NVSP_MSG2_TYPE_ALLOC_CHIMNEY_HANDLE_COMP,
+};
+
+enum {
+	NVSP_STAT_NONE = 0,
+	NVSP_STAT_SUCCESS,
+	NVSP_STAT_FAIL,
+	NVSP_STAT_PROTOCOL_TOO_NEW,
+	NVSP_STAT_PROTOCOL_TOO_OLD,
+	NVSP_STAT_INVALID_RNDIS_PKT,
+	NVSP_STAT_BUSY,
+	NVSP_STAT_PROTOCOL_UNSUPPORTED,
+	NVSP_STAT_MAX,
+};
+
+struct nvsp_message_header {
+	u32 msg_type;
+};
+
+/* Init Messages */
+
+/*
+ * This message is used by the VSC to initialize the channel after the channels
+ * has been opened. This message should never include anything other then
+ * versioning (i.e. this message will be the same for ever).
+ */
+struct nvsp_message_init {
+	u32 min_protocol_ver;
+	u32 max_protocol_ver;
+} __packed;
+
+/*
+ * This message is used by the VSP to complete the initialization of the
+ * channel. This message should never include anything other then versioning
+ * (i.e. this message will be the same for ever).
+ */
+struct nvsp_message_init_complete {
+	u32 negotiated_protocol_ver;
+	u32 max_mdl_chain_len;
+	u32 status;
+} __packed;
+
+union nvsp_message_init_uber {
+	struct nvsp_message_init init;
+	struct nvsp_message_init_complete init_complete;
+} __packed;
+
+/* Version 1 Messages */
+
+/*
+ * This message is used by the VSC to send the NDIS version to the VSP. The VSP
+ * can use this information when handling OIDs sent by the VSC.
+ */
+struct nvsp_1_message_send_ndis_version {
+	u32 ndis_major_ver;
+	u32 ndis_minor_ver;
+} __packed;
+
+/*
+ * This message is used by the VSC to send a receive buffer to the VSP. The VSP
+ * can then use the receive buffer to send data to the VSC.
+ */
+struct nvsp_1_message_send_receive_buffer {
+	u32 gpadl_handle;
+	u16 id;
+} __packed;
+
+struct nvsp_1_receive_buffer_section {
+	u32 offset;
+	u32 sub_alloc_size;
+	u32 num_sub_allocs;
+	u32 end_offset;
+} __packed;
+
+/*
+ * This message is used by the VSP to acknowledge a receive buffer send by the
+ * VSC. This message must be sent by the VSP before the VSP uses the receive
+ * buffer.
+ */
+struct nvsp_1_message_send_receive_buffer_complete {
+	u32 status;
+	u32 num_sections;
+
+	/*
+	 * The receive buffer is split into two parts, a large suballocation
+	 * section and a small suballocation section. These sections are then
+	 * suballocated by a certain size.
+	 */
+
+	/*
+	 * For example, the following break up of the receive buffer has 6
+	 * large suballocations and 10 small suballocations.
+	 */
+
+	/*
+	 * |            Large Section          |  |   Small Section   |
+	 * ------------------------------------------------------------
+	 * |     |     |     |     |     |     |  | | | | | | | | | | |
+	 * |                                      |
+	 *  LargeOffset                            SmallOffset
+	 */
+
+	struct nvsp_1_receive_buffer_section sections[1];
+} __packed;
+
+/*
+ * This message is sent by the VSC to revoke the receive buffer.  After the VSP
+ * completes this transaction, the vsp should never use the receive buffer
+ * again.
+ */
+struct nvsp_1_message_revoke_receive_buffer {
+	u16 id;
+};
+
+/*
+ * This message is used by the VSC to send a send buffer to the VSP. The VSC
+ * can then use the send buffer to send data to the VSP.
+ */
+struct nvsp_1_message_send_send_buffer {
+	u32 gpadl_handle;
+	u16 id;
+} __packed;
+
+/*
+ * This message is used by the VSP to acknowledge a send buffer sent by the
+ * VSC. This message must be sent by the VSP before the VSP uses the sent
+ * buffer.
+ */
+struct nvsp_1_message_send_send_buffer_complete {
+	u32 status;
+
+	/*
+	 * The VSC gets to choose the size of the send buffer and the VSP gets
+	 * to choose the sections size of the buffer.  This was done to enable
+	 * dynamic reconfigurations when the cost of GPA-direct buffers
+	 * decreases.
+	 */
+	u32 section_size;
+} __packed;
+
+/*
+ * This message is sent by the VSC to revoke the send buffer.  After the VSP
+ * completes this transaction, the vsp should never use the send buffer again.
+ */
+struct nvsp_1_message_revoke_send_buffer {
+	u16 id;
+};
+
+/*
+ * This message is used by both the VSP and the VSC to send a RNDIS message to
+ * the opposite channel endpoint.
+ */
+struct nvsp_1_message_send_rndis_packet {
+	/*
+	 * This field is specified by RNIDS. They assume there's two different
+	 * channels of communication. However, the Network VSP only has one.
+	 * Therefore, the channel travels with the RNDIS packet.
+	 */
+	u32 channel_type;
+
+	/*
+	 * This field is used to send part or all of the data through a send
+	 * buffer. This values specifies an index into the send buffer. If the
+	 * index is 0xFFFFFFFF, then the send buffer is not being used and all
+	 * of the data was sent through other VMBus mechanisms.
+	 */
+	u32 send_buf_section_index;
+	u32 send_buf_section_size;
+} __packed;
+
+/*
+ * This message is used by both the VSP and the VSC to complete a RNDIS message
+ * to the opposite channel endpoint. At this point, the initiator of this
+ * message cannot use any resources associated with the original RNDIS packet.
+ */
+struct nvsp_1_message_send_rndis_packet_complete {
+	u32 status;
+};
+
+union nvsp_1_message_uber {
+	struct nvsp_1_message_send_ndis_version send_ndis_ver;
+
+	struct nvsp_1_message_send_receive_buffer send_recv_buf;
+	struct nvsp_1_message_send_receive_buffer_complete
+						send_recv_buf_complete;
+	struct nvsp_1_message_revoke_receive_buffer revoke_recv_buf;
+
+	struct nvsp_1_message_send_send_buffer send_send_buf;
+	struct nvsp_1_message_send_send_buffer_complete send_send_buf_complete;
+	struct nvsp_1_message_revoke_send_buffer revoke_send_buf;
+
+	struct nvsp_1_message_send_rndis_packet send_rndis_pkt;
+	struct nvsp_1_message_send_rndis_packet_complete
+						send_rndis_pkt_complete;
+} __packed;
+
+
+/*
+ * Network VSP protocol version 2 messages:
+ */
+struct nvsp_2_vsc_capability {
+	union {
+		u64 data;
+		struct {
+			u64 vmq:1;
+			u64 chimney:1;
+			u64 sriov:1;
+			u64 ieee8021q:1;
+			u64 correlation_id:1;
+		};
+	};
+} __packed;
+
+struct nvsp_2_send_ndis_config {
+	u32 mtu;
+	u32 reserved;
+	struct nvsp_2_vsc_capability capability;
+} __packed;
+
+/* Allocate receive buffer */
+struct nvsp_2_alloc_rxbuf {
+	/* Allocation ID to match the allocation request and response */
+	u32 alloc_id;
+
+	/* Length of the VM shared memory receive buffer that needs to
+	 * be allocated
+	 */
+	u32 len;
+} __packed;
+
+/* Allocate receive buffer complete */
+struct nvsp_2_alloc_rxbuf_comp {
+	/* The NDIS_STATUS code for buffer allocation */
+	u32 status;
+
+	u32 alloc_id;
+
+	/* GPADL handle for the allocated receive buffer */
+	u32 gpadl_handle;
+
+	/* Receive buffer ID */
+	u64 recv_buf_id;
+} __packed;
+
+struct nvsp_2_free_rxbuf {
+	u64 recv_buf_id;
+} __packed;
+
+union nvsp_2_message_uber {
+	struct nvsp_2_send_ndis_config send_ndis_config;
+	struct nvsp_2_alloc_rxbuf alloc_rxbuf;
+	struct nvsp_2_alloc_rxbuf_comp alloc_rxbuf_comp;
+	struct nvsp_2_free_rxbuf free_rxbuf;
+} __packed;
+
+union nvsp_all_messages {
+	union nvsp_message_init_uber init_msg;
+	union nvsp_1_message_uber v1_msg;
+	union nvsp_2_message_uber v2_msg;
+} __packed;
+
+/* ALL Messages */
+struct nvsp_message {
+	struct nvsp_message_header hdr;
+	union nvsp_all_messages msg;
+} __packed;
+
+
+#define NETVSC_MTU 65536
+
+#define NETVSC_RECEIVE_BUFFER_SIZE (MAX_ORDER_NR_PAGES * PAGE_SIZE)
+
+#define NETVSC_RECEIVE_BUFFER_ID		0xcafe
+
+/* Per netvsc channel-specific */
+struct netvsc_device {
+	struct hv_device *dev;
+
+	u32 nvsp_version;
+
+	atomic_t num_outstanding_sends;
+	wait_queue_head_t wait_drain;
+	bool start_remove;
+	bool destroy;
+	/*
+	 * List of free preallocated hv_netvsc_packet to represent receive
+	 * packet
+	 */
+	struct list_head recv_pkt_list;
+	spinlock_t recv_pkt_list_lock;
+
+	/* Receive buffer allocated by us but manages by NetVSP */
+	void *recv_buf;
+	u32 recv_buf_size;
+	u32 recv_buf_gpadl_handle;
+	u32 recv_section_cnt;
+	struct nvsp_1_receive_buffer_section *recv_section;
+
+	/* Used for NetVSP initialization protocol */
+	struct completion channel_init_wait;
+	struct nvsp_message channel_init_pkt;
+
+	struct nvsp_message revoke_packet;
+	/* unsigned char HwMacAddr[HW_MACADDR_LEN]; */
+
+	struct net_device *ndev;
+
+	/* Holds rndis device info */
+	void *extension;
+};
+
+/* NdisInitialize message */
+struct rndis_initialize_request {
+	u32 req_id;
+	u32 major_ver;
+	u32 minor_ver;
+	u32 max_xfer_size;
+};
+
+/* Response to NdisInitialize */
+struct rndis_initialize_complete {
+	u32 req_id;
+	u32 status;
+	u32 major_ver;
+	u32 minor_ver;
+	u32 dev_flags;
+	u32 medium;
+	u32 max_pkt_per_msg;
+	u32 max_xfer_size;
+	u32 pkt_alignment_factor;
+	u32 af_list_offset;
+	u32 af_list_size;
+};
+
+/* Call manager devices only: Information about an address family */
+/* supported by the device is appended to the response to NdisInitialize. */
+struct rndis_co_address_family {
+	u32 address_family;
+	u32 major_ver;
+	u32 minor_ver;
+};
+
+/* NdisHalt message */
+struct rndis_halt_request {
+	u32 req_id;
+};
+
+/* NdisQueryRequest message */
+struct rndis_query_request {
+	u32 req_id;
+	u32 oid;
+	u32 info_buflen;
+	u32 info_buf_offset;
+	u32 dev_vc_handle;
+};
+
+/* Response to NdisQueryRequest */
+struct rndis_query_complete {
+	u32 req_id;
+	u32 status;
+	u32 info_buflen;
+	u32 info_buf_offset;
+};
+
+/* NdisSetRequest message */
+struct rndis_set_request {
+	u32 req_id;
+	u32 oid;
+	u32 info_buflen;
+	u32 info_buf_offset;
+	u32 dev_vc_handle;
+};
+
+/* Response to NdisSetRequest */
+struct rndis_set_complete {
+	u32 req_id;
+	u32 status;
+};
+
+/* NdisReset message */
+struct rndis_reset_request {
+	u32 reserved;
+};
+
+/* Response to NdisReset */
+struct rndis_reset_complete {
+	u32 status;
+	u32 addressing_reset;
+};
+
+/* NdisMIndicateStatus message */
+struct rndis_indicate_status {
+	u32 status;
+	u32 status_buflen;
+	u32 status_buf_offset;
+};
+
+/* Diagnostic information passed as the status buffer in */
+/* struct rndis_indicate_status messages signifying error conditions. */
+struct rndis_diagnostic_info {
+	u32 diag_status;
+	u32 error_offset;
+};
+
+/* NdisKeepAlive message */
+struct rndis_keepalive_request {
+	u32 req_id;
+};
+
+/* Response to NdisKeepAlive */
+struct rndis_keepalive_complete {
+	u32 req_id;
+	u32 status;
+};
+
+/*
+ * Data message. All Offset fields contain byte offsets from the beginning of
+ * struct rndis_packet. All Length fields are in bytes.  VcHandle is set
+ * to 0 for connectionless data, otherwise it contains the VC handle.
+ */
+struct rndis_packet {
+	u32 data_offset;
+	u32 data_len;
+	u32 oob_data_offset;
+	u32 oob_data_len;
+	u32 num_oob_data_elements;
+	u32 per_pkt_info_offset;
+	u32 per_pkt_info_len;
+	u32 vc_handle;
+	u32 reserved;
+};
+
+/* Optional Out of Band data associated with a Data message. */
+struct rndis_oobd {
+	u32 size;
+	u32 type;
+	u32 class_info_offset;
+};
+
+/* Packet extension field contents associated with a Data message. */
+struct rndis_per_packet_info {
+	u32 size;
+	u32 type;
+	u32 ppi_offset;
+};
+
+enum ndis_per_pkt_info_type {
+	TCPIP_CHKSUM_PKTINFO,
+	IPSEC_PKTINFO,
+	TCP_LARGESEND_PKTINFO,
+	CLASSIFICATION_HANDLE_PKTINFO,
+	NDIS_RESERVED,
+	SG_LIST_PKTINFO,
+	IEEE_8021Q_INFO,
+	ORIGINAL_PKTINFO,
+	PACKET_CANCEL_ID,
+	ORIGINAL_NET_BUFLIST,
+	CACHED_NET_BUFLIST,
+	SHORT_PKT_PADINFO,
+	MAX_PER_PKT_INFO
+};
+
+struct ndis_pkt_8021q_info {
+	union {
+		struct {
+			u32 pri:3; /* User Priority */
+			u32 cfi:1; /* Canonical Format ID */
+			u32 vlanid:12; /* VLAN ID */
+			u32 reserved:16;
+		};
+		u32 value;
+	};
+};
+
+#define NDIS_VLAN_PPI_SIZE (sizeof(struct rndis_per_packet_info) + \
+		sizeof(struct ndis_pkt_8021q_info))
+
+/* Format of Information buffer passed in a SetRequest for the OID */
+/* OID_GEN_RNDIS_CONFIG_PARAMETER. */
+struct rndis_config_parameter_info {
+	u32 parameter_name_offset;
+	u32 parameter_name_length;
+	u32 parameter_type;
+	u32 parameter_value_offset;
+	u32 parameter_value_length;
+};
+
+/* Values for ParameterType in struct rndis_config_parameter_info */
+#define RNDIS_CONFIG_PARAM_TYPE_INTEGER     0
+#define RNDIS_CONFIG_PARAM_TYPE_STRING      2
+
+/* CONDIS Miniport messages for connection oriented devices */
+/* that do not implement a call manager. */
+
+/* CoNdisMiniportCreateVc message */
+struct rcondis_mp_create_vc {
+	u32 req_id;
+	u32 ndis_vc_handle;
+};
+
+/* Response to CoNdisMiniportCreateVc */
+struct rcondis_mp_create_vc_complete {
+	u32 req_id;
+	u32 dev_vc_handle;
+	u32 status;
+};
+
+/* CoNdisMiniportDeleteVc message */
+struct rcondis_mp_delete_vc {
+	u32 req_id;
+	u32 dev_vc_handle;
+};
+
+/* Response to CoNdisMiniportDeleteVc */
+struct rcondis_mp_delete_vc_complete {
+	u32 req_id;
+	u32 status;
+};
+
+/* CoNdisMiniportQueryRequest message */
+struct rcondis_mp_query_request {
+	u32 req_id;
+	u32 request_type;
+	u32 oid;
+	u32 dev_vc_handle;
+	u32 info_buflen;
+	u32 info_buf_offset;
+};
+
+/* CoNdisMiniportSetRequest message */
+struct rcondis_mp_set_request {
+	u32 req_id;
+	u32 request_type;
+	u32 oid;
+	u32 dev_vc_handle;
+	u32 info_buflen;
+	u32 info_buf_offset;
+};
+
+/* CoNdisIndicateStatus message */
+struct rcondis_indicate_status {
+	u32 ndis_vc_handle;
+	u32 status;
+	u32 status_buflen;
+	u32 status_buf_offset;
+};
+
+/* CONDIS Call/VC parameters */
+struct rcondis_specific_parameters {
+	u32 parameter_type;
+	u32 parameter_length;
+	u32 parameter_lffset;
+};
+
+struct rcondis_media_parameters {
+	u32 flags;
+	u32 reserved1;
+	u32 reserved2;
+	struct rcondis_specific_parameters media_specific;
+};
+
+struct rndis_flowspec {
+	u32 token_rate;
+	u32 token_bucket_size;
+	u32 peak_bandwidth;
+	u32 latency;
+	u32 delay_variation;
+	u32 service_type;
+	u32 max_sdu_size;
+	u32 minimum_policed_size;
+};
+
+struct rcondis_call_manager_parameters {
+	struct rndis_flowspec transmit;
+	struct rndis_flowspec receive;
+	struct rcondis_specific_parameters call_mgr_specific;
+};
+
+/* CoNdisMiniportActivateVc message */
+struct rcondis_mp_activate_vc_request {
+	u32 req_id;
+	u32 flags;
+	u32 dev_vc_handle;
+	u32 media_params_offset;
+	u32 media_params_length;
+	u32 call_mgr_params_offset;
+	u32 call_mgr_params_length;
+};
+
+/* Response to CoNdisMiniportActivateVc */
+struct rcondis_mp_activate_vc_complete {
+	u32 req_id;
+	u32 status;
+};
+
+/* CoNdisMiniportDeactivateVc message */
+struct rcondis_mp_deactivate_vc_request {
+	u32 req_id;
+	u32 flags;
+	u32 dev_vc_handle;
+};
+
+/* Response to CoNdisMiniportDeactivateVc */
+struct rcondis_mp_deactivate_vc_complete {
+	u32 req_id;
+	u32 status;
+};
+
+
+/* union with all of the RNDIS messages */
+union rndis_message_container {
+	struct rndis_packet pkt;
+	struct rndis_initialize_request init_req;
+	struct rndis_halt_request halt_req;
+	struct rndis_query_request query_req;
+	struct rndis_set_request set_req;
+	struct rndis_reset_request reset_req;
+	struct rndis_keepalive_request keep_alive_req;
+	struct rndis_indicate_status indicate_status;
+	struct rndis_initialize_complete init_complete;
+	struct rndis_query_complete query_complete;
+	struct rndis_set_complete set_complete;
+	struct rndis_reset_complete reset_complete;
+	struct rndis_keepalive_complete keep_alive_complete;
+	struct rcondis_mp_create_vc co_miniport_create_vc;
+	struct rcondis_mp_delete_vc co_miniport_delete_vc;
+	struct rcondis_indicate_status co_indicate_status;
+	struct rcondis_mp_activate_vc_request co_miniport_activate_vc;
+	struct rcondis_mp_deactivate_vc_request co_miniport_deactivate_vc;
+	struct rcondis_mp_create_vc_complete co_miniport_create_vc_complete;
+	struct rcondis_mp_delete_vc_complete co_miniport_delete_vc_complete;
+	struct rcondis_mp_activate_vc_complete co_miniport_activate_vc_complete;
+	struct rcondis_mp_deactivate_vc_complete
+		co_miniport_deactivate_vc_complete;
+};
+
+/* Remote NDIS message format */
+struct rndis_message {
+	u32 ndis_msg_type;
+
+	/* Total length of this message, from the beginning */
+	/* of the sruct rndis_message, in bytes. */
+	u32 msg_len;
+
+	/* Actual message */
+	union rndis_message_container msg;
+};
+
+
+struct rndis_filter_packet {
+	void *completion_ctx;
+	void (*completion)(void *context);
+	struct rndis_message msg;
+};
+
+/* Handy macros */
+
+/* get the size of an RNDIS message. Pass in the message type, */
+/* struct rndis_set_request, struct rndis_packet for example */
+#define RNDIS_MESSAGE_SIZE(msg)				\
+	(sizeof(msg) + (sizeof(struct rndis_message) -	\
+	 sizeof(union rndis_message_container)))
+
+/* get pointer to info buffer with message pointer */
+#define MESSAGE_TO_INFO_BUFFER(msg)				\
+	(((unsigned char *)(msg)) + msg->info_buf_offset)
+
+/* get pointer to status buffer with message pointer */
+#define MESSAGE_TO_STATUS_BUFFER(msg)			\
+	(((unsigned char *)(msg)) + msg->status_buf_offset)
+
+/* get pointer to OOBD buffer with message pointer */
+#define MESSAGE_TO_OOBD_BUFFER(msg)				\
+	(((unsigned char *)(msg)) + msg->oob_data_offset)
+
+/* get pointer to data buffer with message pointer */
+#define MESSAGE_TO_DATA_BUFFER(msg)				\
+	(((unsigned char *)(msg)) + msg->per_pkt_info_offset)
+
+/* get pointer to contained message from NDIS_MESSAGE pointer */
+#define RNDIS_MESSAGE_PTR_TO_MESSAGE_PTR(rndis_msg)		\
+	((void *) &rndis_msg->msg)
+
+/* get pointer to contained message from NDIS_MESSAGE pointer */
+#define RNDIS_MESSAGE_RAW_PTR_TO_MESSAGE_PTR(rndis_msg)	\
+	((void *) rndis_msg)
+
+
+#define __struct_bcount(x)
+
+
+
+#define RNDIS_HEADER_SIZE	(sizeof(struct rndis_message) - \
+				 sizeof(union rndis_message_container))
+
+#define NDIS_PACKET_TYPE_DIRECTED	0x00000001
+#define NDIS_PACKET_TYPE_MULTICAST	0x00000002
+#define NDIS_PACKET_TYPE_ALL_MULTICAST	0x00000004
+#define NDIS_PACKET_TYPE_BROADCAST	0x00000008
+#define NDIS_PACKET_TYPE_SOURCE_ROUTING	0x00000010
+#define NDIS_PACKET_TYPE_PROMISCUOUS	0x00000020
+#define NDIS_PACKET_TYPE_SMT		0x00000040
+#define NDIS_PACKET_TYPE_ALL_LOCAL	0x00000080
+#define NDIS_PACKET_TYPE_GROUP		0x00000100
+#define NDIS_PACKET_TYPE_ALL_FUNCTIONAL	0x00000200
+#define NDIS_PACKET_TYPE_FUNCTIONAL	0x00000400
+#define NDIS_PACKET_TYPE_MAC_FRAME	0x00000800
+
+
+
+#endif /* _HYPERV_NET_H */
-- 
2.1.4

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [dpdk-dev] [PATCH v4 4/7] hv: uio driver
  2015-04-21 17:32 ` [dpdk-dev] [PATCH v4 4/7] hv: uio driver Stephen Hemminger
@ 2015-07-08 23:55   ` Thomas Monjalon
  0 siblings, 0 replies; 17+ messages in thread
From: Thomas Monjalon @ 2015-07-08 23:55 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, Stas Egorov, Stephen Hemminger, alexmay

2015-04-21 10:32, Stephen Hemminger:
> Add new UIO driver in kernel to support DPDK Poll Mode Driver.
> 
> Signed-off-by: Stas Egorov <segorov@mirantis.com>
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

We should not add a kernel driver in DPDK.
Stephen, you worked on upstreaming things in kernel and you often say that
maintaining an out-of-tree module is a nightmare. So why submitting this one
in DPDK?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [dpdk-dev] [PATCH v4 5/7] hv: poll mode driver
  2015-04-21 17:32 [dpdk-dev] [PATCH v4 0/7] Hyper-V Poll Mode driver Stephen Hemminger
                   ` (3 preceding siblings ...)
  2015-04-21 17:32 ` [dpdk-dev] [PATCH v4 4/7] hv: uio driver Stephen Hemminger
@ 2015-04-21 17:32 ` Stephen Hemminger
  2015-04-21 19:34   ` Butler, Siobhan A
  2015-07-09  0:05   ` Thomas Monjalon
  2015-04-21 17:32 ` [dpdk-dev] [PATCH v4 6/7] hv: enable driver in common config Stephen Hemminger
  2015-04-21 17:32 ` [dpdk-dev] [PATCH v4 7/7] hv: add kernel patch Stephen Hemminger
  6 siblings, 2 replies; 17+ messages in thread
From: Stephen Hemminger @ 2015-04-21 17:32 UTC (permalink / raw)
  To: alexmay; +Cc: dev, Stas Egorov, Stephen Hemminger

From: Stephen Hemminger <shemming@brocade.com>

This is new Poll Mode driver for using hyper-v virtual network
interface.

Signed-off-by: Stas Egorov <segorov@mirantis.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/Makefile                          |    1 +
 lib/librte_pmd_hyperv/Makefile        |   28 +
 lib/librte_pmd_hyperv/hyperv.h        |  169 ++++
 lib/librte_pmd_hyperv/hyperv_drv.c    | 1653 +++++++++++++++++++++++++++++++++
 lib/librte_pmd_hyperv/hyperv_drv.h    |  558 +++++++++++
 lib/librte_pmd_hyperv/hyperv_ethdev.c |  332 +++++++
 lib/librte_pmd_hyperv/hyperv_logs.h   |   69 ++
 lib/librte_pmd_hyperv/hyperv_rxtx.c   |  403 ++++++++
 lib/librte_pmd_hyperv/hyperv_rxtx.h   |   35 +
 mk/rte.app.mk                         |    4 +
 10 files changed, 3252 insertions(+)
 create mode 100644 lib/librte_pmd_hyperv/Makefile
 create mode 100644 lib/librte_pmd_hyperv/hyperv.h
 create mode 100644 lib/librte_pmd_hyperv/hyperv_drv.c
 create mode 100644 lib/librte_pmd_hyperv/hyperv_drv.h
 create mode 100644 lib/librte_pmd_hyperv/hyperv_ethdev.c
 create mode 100644 lib/librte_pmd_hyperv/hyperv_logs.h
 create mode 100644 lib/librte_pmd_hyperv/hyperv_rxtx.c
 create mode 100644 lib/librte_pmd_hyperv/hyperv_rxtx.h

diff --git a/lib/Makefile b/lib/Makefile
index d94355d..6c1daf2 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -47,6 +47,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += librte_pmd_i40e
 DIRS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += librte_pmd_fm10k
 DIRS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += librte_pmd_mlx4
 DIRS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += librte_pmd_enic
+DIRS-$(CONFIG_RTE_LIBRTE_HV_PMD) += librte_pmd_hyperv
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_BOND) += librte_pmd_bond
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_RING) += librte_pmd_ring
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_PCAP) += librte_pmd_pcap
diff --git a/lib/librte_pmd_hyperv/Makefile b/lib/librte_pmd_hyperv/Makefile
new file mode 100644
index 0000000..4ba08c8
--- /dev/null
+++ b/lib/librte_pmd_hyperv/Makefile
@@ -0,0 +1,28 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2013-2015 Brocade Communications Systems, Inc.
+#   All rights reserved.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_pmd_hyperv.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS)
+
+#
+# all source are stored in SRCS-y
+#
+SRCS-$(CONFIG_RTE_LIBRTE_HV_PMD) += hyperv_ethdev.c
+SRCS-$(CONFIG_RTE_LIBRTE_HV_PMD) += hyperv_rxtx.c
+SRCS-$(CONFIG_RTE_LIBRTE_HV_PMD) += hyperv_drv.c
+
+# this lib depends upon:
+DEPDIRS-$(CONFIG_RTE_LIBRTE_HV_PMD) += lib/librte_eal lib/librte_ether
+DEPDIRS-$(CONFIG_RTE_LIBRTE_HV_PMD) += lib/librte_mempool lib/librte_mbuf
+DEPDIRS-$(CONFIG_RTE_LIBRTE_HV_PMD) += lib/librte_malloc
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_pmd_hyperv/hyperv.h b/lib/librte_pmd_hyperv/hyperv.h
new file mode 100644
index 0000000..5f66d8a
--- /dev/null
+++ b/lib/librte_pmd_hyperv/hyperv.h
@@ -0,0 +1,169 @@
+/*-
+ * Copyright (c) 2013-2015 Brocade Communications Systems, Inc.
+ * All rights reserved.
+ */
+
+#ifndef _HYPERV_H_
+#define _HYPERV_H_
+
+#include <sys/param.h>
+#include <rte_log.h>
+#include <rte_debug.h>
+#include <rte_ether.h>
+#include <rte_ethdev.h>
+#include <rte_malloc.h>
+#include <rte_memzone.h>
+#include <rte_cycles.h>
+#include <rte_dev.h>
+
+#include "hyperv_logs.h"
+
+#define PAGE_SHIFT		12
+#define PAGE_SIZE		(1 << PAGE_SHIFT)
+
+/*
+ * Tunable ethdev params
+ */
+#define HV_MIN_RX_BUF_SIZE 1024
+#define HV_MAX_RX_PKT_LEN  4096
+#define HV_MAX_MAC_ADDRS   1
+#define HV_MAX_RX_QUEUES   1
+#define HV_MAX_TX_QUEUES   1
+#define HV_MAX_PKT_BURST   32
+#define HV_MAX_LINK_REQ    10
+
+/*
+ * List of resources mapped from kspace
+ * need to be the same as defined in hv_uio.c
+ */
+enum {
+	TXRX_RING_MAP,
+	INT_PAGE_MAP,
+	MON_PAGE_MAP,
+	RECV_BUF_MAP
+};
+
+/*
+ * Statistics
+ */
+struct hv_stats {
+	uint64_t opkts;
+	uint64_t obytes;
+	uint64_t oerrors;
+
+	uint64_t ipkts;
+	uint64_t ibytes;
+	uint64_t ierrors;
+	uint64_t rx_nombuf;
+};
+
+struct hv_data;
+struct netvsc_packet;
+struct rndis_msg;
+typedef void (*receive_callback_t)(struct hv_data *hv, struct rndis_msg *msg,
+		struct netvsc_packet *pkt);
+
+/*
+ * Main driver structure
+ */
+struct hv_data {
+	int vmbus_device;
+	uint8_t monitor_bit;
+	uint8_t monitor_group;
+	uint8_t kernel_initialized;
+	int uio_fd;
+	/* Flag indicates channel state. If closed, RX/TX shouldn't work further */
+	uint8_t closed;
+	/* Flag indicates whether HALT rndis request was received by host */
+	uint8_t hlt_req_sent;
+	/* Flag indicates pending state for HALT request */
+	uint8_t hlt_req_pending;
+	/* Counter for RNDIS requests */
+	uint32_t new_request_id;
+	/* State of RNDIS device */
+	uint8_t rndis_dev_state;
+	/* Number of transmitted packets but not completed yet by Hyper-V */
+	int num_outstanding_sends;
+	/* Max pkt len to fit in rx mbufs */
+	uint32_t max_rx_pkt_len;
+
+	uint8_t jumbo_frame_support;
+
+	struct hv_vmbus_ring_buffer *in;
+	struct hv_vmbus_ring_buffer *out;
+
+	/* Size of each ring_buffer(in/out) */
+	uint32_t rb_size;
+	/* Size of data in each ring_buffer(in/out) */
+	uint32_t rb_data_size;
+
+	void *int_page;
+	struct hv_vmbus_monitor_page *monitor_pages;
+	void *recv_interrupt_page;
+	void *send_interrupt_page;
+	void *ring_pages;
+	void *recv_buf;
+
+	uint8_t link_req_cnt;
+	uint32_t link_status;
+	uint8_t  hw_mac_addr[ETHER_ADDR_LEN];
+	struct rndis_request *req;
+	struct netvsc_packet *netvsc_packet;
+	struct nvsp_msg *rx_comp_msg;
+	struct hv_rx_queue *rxq;
+	struct hv_tx_queue *txq;
+	struct hv_vm_packet_descriptor *desc;
+	receive_callback_t receive_callback;
+	int pkt_rxed;
+
+	uint32_t debug;
+	struct hv_stats stats;
+};
+
+/*
+ * Extern functions declarations
+ */
+int hyperv_dev_tx_queue_setup(struct rte_eth_dev *dev,
+			 uint16_t queue_idx,
+			 uint16_t nb_desc,
+			 unsigned int socket_id,
+			 const struct rte_eth_txconf *tx_conf);
+
+void hyperv_dev_tx_queue_release(void *ptxq);
+
+int hyperv_dev_rx_queue_setup(struct rte_eth_dev *dev,
+			 uint16_t queue_idx,
+			 uint16_t nb_desc,
+			 unsigned int socket_id,
+			 const struct rte_eth_rxconf *rx_conf,
+			 struct rte_mempool *mp);
+
+void hyperv_dev_rx_queue_release(void *prxq);
+
+uint16_t
+hyperv_recv_pkts(void *prxq,
+		 struct rte_mbuf **rx_pkts, uint16_t nb_pkts);
+
+uint16_t
+hyperv_xmit_pkts(void *ptxq,
+		 struct rte_mbuf **tx_pkts, uint16_t nb_pkts);
+
+int hv_rf_on_device_add(struct hv_data *hv);
+int hv_rf_on_device_remove(struct hv_data *hv);
+int hv_rf_on_send(struct hv_data *hv, struct netvsc_packet *pkt);
+int hv_rf_on_open(struct hv_data *hv);
+int hv_rf_on_close(struct hv_data *hv);
+int hv_rf_set_device_mac(struct hv_data *hv, uint8_t *mac);
+void hyperv_start_rx(struct hv_data *hv);
+void hyperv_stop_rx(struct hv_data *hv);
+int hyperv_get_buffer(struct hv_data *hv, void *buffer, uint32_t bufferlen);
+void hyperv_scan_comps(struct hv_data *hv, int allow_rx_drop);
+uint8_t hyperv_get_link_status(struct hv_data *hv);
+int hyperv_set_rx_mode(struct hv_data *hv, uint8_t promisc, uint8_t mcast);
+
+inline int rte_hv_dev_atomic_write_link_status(struct rte_eth_dev *dev,
+		struct rte_eth_link *link);
+inline int rte_hv_dev_atomic_read_link_status(struct rte_eth_dev *dev,
+		struct rte_eth_link *link);
+
+#endif /* _HYPERV_H_ */
diff --git a/lib/librte_pmd_hyperv/hyperv_drv.c b/lib/librte_pmd_hyperv/hyperv_drv.c
new file mode 100644
index 0000000..4a37966
--- /dev/null
+++ b/lib/librte_pmd_hyperv/hyperv_drv.c
@@ -0,0 +1,1653 @@
+/*-
+ * Copyright (c) 2009-2012 Microsoft Corp.
+ * Copyright (c) 2010-2012 Citrix Inc.
+ * Copyright (c) 2012 NetApp Inc.
+ * Copyright (c) 2013-2015 Brocade Communications Systems, Inc.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice unmodified, this list of conditions, and the following
+ *    disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
+ * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
+ * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
+ * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
+ * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
+ * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
+ * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+
+#include "hyperv.h"
+#include "hyperv_drv.h"
+#include "hyperv_rxtx.h"
+
+#define LOOP_CNT 10000
+#define MAC_STRLEN 14
+#define MAC_PARAM_STR "NetworkAddress"
+
+#define hex "0123456789abcdef"
+#define high(x) hex[(x & 0xf0) >> 4]
+#define low(x) hex[x & 0x0f]
+
+static int hv_rf_on_receive(struct hv_data *hv, struct netvsc_packet *pkt);
+
+/*
+ * Ring buffer
+ */
+
+/* Amount of space to write to */
+#define HV_BYTES_AVAIL_TO_WRITE(r, w, z) \
+	(((w) >= (r)) ? ((z) - ((w) - (r))) : ((r) - (w)))
+
+/*
+ * Get number of bytes available to read and to write to
+ * for the specified ring buffer
+ */
+static inline void
+get_ring_buffer_avail_bytes(
+	struct hv_data               *hv,
+	struct hv_vmbus_ring_buffer  *ring_buffer,
+	uint32_t                     *read,
+	uint32_t                     *write)
+{
+	rte_compiler_barrier();
+
+	/*
+	 * Capture the read/write indices before they changed
+	 */
+	uint32_t read_loc = ring_buffer->read_index;
+	uint32_t write_loc = ring_buffer->write_index;
+
+	*write = HV_BYTES_AVAIL_TO_WRITE(
+			read_loc, write_loc, hv->rb_data_size);
+	*read = hv->rb_data_size - *write;
+}
+
+/*
+ * Helper routine to copy from source to ring buffer.
+ *
+ * Assume there is enough room. Handles wrap-around in dest case only!
+ */
+static uint32_t
+copy_to_ring_buffer(
+	struct hv_vmbus_ring_buffer  *ring_buffer,
+	uint32_t                     ring_buffer_size,
+	uint32_t                     start_write_offset,
+	char                         *src,
+	uint32_t                     src_len)
+{
+	char *ring_buf = (char *)ring_buffer->buffer;
+	uint32_t fragLen;
+
+	if (src_len > ring_buffer_size - start_write_offset)  {
+		/* wrap-around detected! */
+		fragLen = ring_buffer_size - start_write_offset;
+		rte_memcpy(ring_buf + start_write_offset, src, fragLen);
+		rte_memcpy(ring_buf, src + fragLen, src_len - fragLen);
+	} else {
+		rte_memcpy(ring_buf + start_write_offset, src, src_len);
+	}
+
+	start_write_offset += src_len;
+	start_write_offset %= ring_buffer_size;
+
+	return start_write_offset;
+}
+
+/*
+ * Helper routine to copy to dest from ring buffer.
+ *
+ * Assume there is enough room. Handles wrap-around in src case only!
+ */
+static uint32_t
+copy_from_ring_buffer(
+	struct hv_data               *hv,
+	struct hv_vmbus_ring_buffer  *ring_buffer,
+	char                         *dest,
+	uint32_t                     dest_len,
+	uint32_t                     start_read_offset)
+{
+	uint32_t fragLen;
+	char *ring_buf = (char *)ring_buffer->buffer;
+
+	if (dest_len > hv->rb_data_size - start_read_offset) {
+		/*  wrap-around detected at the src */
+		fragLen = hv->rb_data_size - start_read_offset;
+		rte_memcpy(dest, ring_buf + start_read_offset, fragLen);
+		rte_memcpy(dest + fragLen, ring_buf, dest_len - fragLen);
+	} else {
+		rte_memcpy(dest, ring_buf + start_read_offset, dest_len);
+	}
+
+	start_read_offset += dest_len;
+	start_read_offset %= hv->rb_data_size;
+
+	return start_read_offset;
+}
+
+/*
+ * Write to the ring buffer.
+ */
+static int
+hv_ring_buffer_write(
+	struct hv_data                 *hv,
+	struct hv_vmbus_sg_buffer_list sg_buffers[],
+	uint32_t                       sg_buffer_count)
+{
+	struct hv_vmbus_ring_buffer *ring_buffer = hv->out;
+	uint32_t i = 0;
+	uint32_t byte_avail_to_write;
+	uint32_t byte_avail_to_read;
+	uint32_t total_bytes_to_write = 0;
+	volatile uint32_t next_write_location;
+	uint64_t prev_indices = 0;
+
+	for (i = 0; i < sg_buffer_count; i++)
+		total_bytes_to_write += sg_buffers[i].length;
+
+	total_bytes_to_write += sizeof(uint64_t);
+
+	get_ring_buffer_avail_bytes(hv, ring_buffer, &byte_avail_to_read,
+			&byte_avail_to_write);
+
+	/*
+	 * If there is only room for the packet, assume it is full.
+	 * Otherwise, the next time around, we think the ring buffer
+	 * is empty since the read index == write index
+	 */
+	if (byte_avail_to_write <= total_bytes_to_write) {
+		PMD_PERROR_LOG(hv, DBG_RB,
+				"byte_avail_to_write = %u, total_bytes_to_write = %u",
+				byte_avail_to_write, total_bytes_to_write);
+		return -EAGAIN;
+	}
+
+	/*
+	 * Write to the ring buffer
+	 */
+	next_write_location = ring_buffer->write_index;
+
+	for (i = 0; i < sg_buffer_count; i++) {
+		next_write_location = copy_to_ring_buffer(ring_buffer,
+				hv->rb_data_size, next_write_location,
+				(char *) sg_buffers[i].data, sg_buffers[i].length);
+	}
+
+	/*
+	 * Set previous packet start
+	 */
+	prev_indices = (uint64_t)ring_buffer->write_index << 32;
+
+	next_write_location = copy_to_ring_buffer(
+			ring_buffer, hv->rb_data_size, next_write_location,
+			(char *) &prev_indices, sizeof(uint64_t));
+
+	/*
+	 * Make sure we flush all writes before updating the writeIndex
+	 */
+	rte_compiler_barrier();
+
+	/*
+	 * Now, update the write location
+	 */
+	ring_buffer->write_index = next_write_location;
+
+	return 0;
+}
+
+/*
+ * Read without advancing the read index.
+ */
+static int
+hv_ring_buffer_peek(struct hv_data  *hv, void *buffer, uint32_t buffer_len)
+{
+	struct hv_vmbus_ring_buffer *ring_buffer = hv->in;
+	uint32_t bytesAvailToWrite;
+	uint32_t bytesAvailToRead;
+
+	get_ring_buffer_avail_bytes(hv, ring_buffer,
+			&bytesAvailToRead,
+			&bytesAvailToWrite);
+
+	/* Make sure there is something to read */
+	if (bytesAvailToRead < buffer_len)
+		return -EAGAIN;
+
+	copy_from_ring_buffer(hv, ring_buffer,
+		(char *)buffer, buffer_len, ring_buffer->read_index);
+
+	return 0;
+}
+
+/*
+ * Read and advance the read index.
+ */
+static int
+hv_ring_buffer_read(struct hv_data  *hv, void *buffer,
+		    uint32_t buffer_len, uint32_t offset)
+{
+	struct hv_vmbus_ring_buffer *ring_buffer = hv->in;
+	uint32_t bytes_avail_to_write;
+	uint32_t bytes_avail_to_read;
+	uint32_t next_read_location = 0;
+	uint64_t prev_indices = 0;
+
+	if (buffer_len <= 0)
+		return -EINVAL;
+
+	get_ring_buffer_avail_bytes(
+			hv,
+			ring_buffer,
+			&bytes_avail_to_read,
+			&bytes_avail_to_write);
+
+	/*
+	 * Make sure there is something to read
+	 */
+	if (bytes_avail_to_read < buffer_len) {
+		PMD_PERROR_LOG(hv, DBG_RB, "bytes_avail_to_read = %u, buffer_len = %u",
+				bytes_avail_to_read, buffer_len);
+		return -EAGAIN;
+	}
+
+	next_read_location = (ring_buffer->read_index + offset) % hv->rb_data_size;
+
+	next_read_location = copy_from_ring_buffer(
+			hv,
+			ring_buffer,
+			(char *) buffer,
+			buffer_len,
+			next_read_location);
+
+	next_read_location = copy_from_ring_buffer(
+			hv,
+			ring_buffer,
+			(char *) &prev_indices,
+			sizeof(uint64_t),
+			next_read_location);
+
+	/*
+	 * Make sure all reads are done before we update the read index since
+	 * the writer may start writing to the read area once the read index
+	 * is updated.
+	 */
+	rte_compiler_barrier();
+
+	/*
+	 * Update the read index
+	 */
+	ring_buffer->read_index = next_read_location;
+
+	return 0;
+}
+
+/*
+ * VMBus
+ */
+
+/*
+ * Retrieve the raw packet on the specified channel
+ */
+static int
+hv_vmbus_channel_recv_packet_raw(struct hv_data  *hv, void *buffer,
+				 uint32_t        buffer_len,
+				 uint32_t        *buffer_actual_len,
+				 uint64_t        *request_id,
+				 int             mode)
+{
+	int ret;
+	uint32_t packetLen;
+	struct hv_vm_packet_descriptor desc;
+
+	*buffer_actual_len = 0;
+	*request_id = 0;
+
+	ret = hv_ring_buffer_peek(hv, &desc,
+			sizeof(struct hv_vm_packet_descriptor));
+
+	if (ret != 0)
+		return 0;
+
+	if ((desc.type == HV_VMBUS_PACKET_TYPE_DATA_USING_TRANSFER_PAGES
+				&& !(mode & 1)) ||
+			((desc.type == HV_VMBUS_PACKET_TYPE_COMPLETION) && !(mode & 2))) {
+		return -1;
+	}
+
+	packetLen = desc.length8 << 3;
+
+	*buffer_actual_len = packetLen;
+
+	if (unlikely(packetLen > buffer_len)) {
+		PMD_PERROR_LOG(hv, DBG_RX, "The buffer desc is too big, will drop it");
+		return -ENOMEM;
+	}
+
+	*request_id = desc.transaction_id;
+
+	/* Copy over the entire packet to the user buffer */
+	ret = hv_ring_buffer_read(hv, buffer, packetLen, 0);
+
+	return 0;
+}
+
+/*
+ * Trigger an event notification on the specified channel
+ */
+static void
+vmbus_channel_set_event(struct hv_data *hv)
+{
+	/* Here we assume that channel->offer_msg.monitor_allocated == 1,
+	 * in another case our driver will not work */
+	/* Each uint32_t represents 32 channels */
+	__sync_or_and_fetch(((uint32_t *)hv->send_interrupt_page
+		+ ((hv->vmbus_device >> 5))), 1 << (hv->vmbus_device & 31)
+	);
+	__sync_or_and_fetch((uint32_t *)&hv->monitor_pages->
+			trigger_group[hv->monitor_group].u.pending, 1 << hv->monitor_bit);
+}
+
+/**
+ * @brief Send the specified buffer on the given channel
+ */
+static int
+hv_vmbus_channel_send_packet(struct hv_data *hv, void *buffer,
+			     uint32_t buffer_len, uint64_t request_id,
+			     enum hv_vmbus_packet_type type,
+			     uint32_t flags)
+{
+	struct hv_vmbus_sg_buffer_list buffer_list[3];
+	struct hv_vm_packet_descriptor desc;
+	uint32_t packet_len_aligned;
+	uint64_t aligned_data;
+	uint32_t packet_len;
+	int ret = 0;
+	uint32_t old_write = hv->out->write_index;
+
+	packet_len = sizeof(struct hv_vm_packet_descriptor) + buffer_len;
+	packet_len_aligned = HV_ALIGN_UP(packet_len, sizeof(uint64_t));
+	aligned_data = 0;
+
+	/* Setup the descriptor */
+	desc.type = type;   /* HV_VMBUS_PACKET_TYPE_DATA_IN_BAND;             */
+	desc.flags = flags; /* HV_VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED */
+	/* in 8-bytes granularity */
+	desc.data_offset8 = sizeof(struct hv_vm_packet_descriptor) >> 3;
+	desc.length8 = (uint16_t) (packet_len_aligned >> 3);
+	desc.transaction_id = request_id;
+
+	buffer_list[0].data = &desc;
+	buffer_list[0].length = sizeof(struct hv_vm_packet_descriptor);
+
+	buffer_list[1].data = buffer;
+	buffer_list[1].length = buffer_len;
+
+	buffer_list[2].data = &aligned_data;
+	buffer_list[2].length = packet_len_aligned - packet_len;
+
+	ret = hv_ring_buffer_write(hv, buffer_list, 3);
+
+	rte_mb();
+	if (!ret && !hv->out->interrupt_mask && hv->out->read_index == old_write)
+		vmbus_channel_set_event(hv);
+
+	return ret;
+}
+
+/*
+ * Send a range of single-page buffer packets using
+ * a GPADL Direct packet type
+ */
+static int
+hv_vmbus_channel_send_packet_pagebuffer(
+	struct hv_data  *hv,
+	struct hv_vmbus_page_buffer	page_buffers[],
+	uint32_t		page_count,
+	void			*buffer,
+	uint32_t		buffer_len,
+	uint64_t		request_id)
+{
+
+	int ret = 0;
+	uint32_t packet_len, packetLen_aligned, descSize, i = 0;
+	struct hv_vmbus_sg_buffer_list buffer_list[3];
+	struct hv_vmbus_channel_packet_page_buffer desc;
+	uint64_t alignedData = 0;
+	uint32_t old_write = hv->out->write_index;
+
+	if (page_count > HV_MAX_PAGE_BUFFER_COUNT) {
+		PMD_PERROR_LOG(hv, DBG_VMBUS, "page_count %u goes out of the limit",
+				page_count);
+		return -EINVAL;
+	}
+
+	/*
+	 * Adjust the size down since hv_vmbus_channel_packet_page_buffer
+	 * is the largest size we support
+	 */
+	descSize = sizeof(struct hv_vmbus_channel_packet_page_buffer) -
+		((HV_MAX_PAGE_BUFFER_COUNT - page_count) *
+		 sizeof(struct hv_vmbus_page_buffer));
+	packet_len = descSize + buffer_len;
+	packetLen_aligned = HV_ALIGN_UP(packet_len, sizeof(uint64_t));
+
+	/* Setup the descriptor */
+	desc.type = HV_VMBUS_PACKET_TYPE_DATA_USING_GPA_DIRECT;
+	desc.flags = HV_VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED;
+	desc.data_offset8 = descSize >> 3; /* in 8-bytes granularity */
+	desc.length8 = (uint16_t) (packetLen_aligned >> 3);
+	desc.transaction_id = request_id;
+	desc.range_count = page_count;
+
+	for (i = 0; i < page_count; i++) {
+		desc.range[i].length = page_buffers[i].length;
+		desc.range[i].offset = page_buffers[i].offset;
+		desc.range[i].pfn = page_buffers[i].pfn;
+	}
+
+	buffer_list[0].data = &desc;
+	buffer_list[0].length = descSize;
+
+	buffer_list[1].data = buffer;
+	buffer_list[1].length = buffer_len;
+
+	buffer_list[2].data = &alignedData;
+	buffer_list[2].length = packetLen_aligned - packet_len;
+
+	ret = hv_ring_buffer_write(hv, buffer_list, 3);
+	if (likely(ret == 0))
+		++hv->num_outstanding_sends;
+
+	rte_mb();
+	if (!ret && !hv->out->interrupt_mask &&
+			hv->out->read_index == old_write)
+		vmbus_channel_set_event(hv);
+
+	return ret;
+}
+
+/*
+ * NetVSC
+ */
+
+/*
+ * Net VSC on send
+ * Sends a packet on the specified Hyper-V device.
+ * Returns 0 on success, non-zero on failure.
+ */
+static int
+hv_nv_on_send(struct hv_data *hv, struct netvsc_packet *pkt)
+{
+	struct nvsp_msg send_msg;
+	int ret;
+
+	send_msg.msg_type = nvsp_msg_1_type_send_rndis_pkt;
+	if (pkt->is_data_pkt) {
+		/* 0 is RMC_DATA */
+		send_msg.msgs.send_rndis_pkt.chan_type = 0;
+	} else {
+		/* 1 is RMC_CONTROL */
+		send_msg.msgs.send_rndis_pkt.chan_type = 1;
+	}
+
+	/* Not using send buffer section */
+	send_msg.msgs.send_rndis_pkt.send_buf_section_idx =
+	    0xFFFFFFFF;
+	send_msg.msgs.send_rndis_pkt.send_buf_section_size = 0;
+
+	if (likely(pkt->page_buf_count)) {
+		ret = hv_vmbus_channel_send_packet_pagebuffer(hv,
+				pkt->page_buffers, pkt->page_buf_count,
+				&send_msg, sizeof(struct nvsp_msg),
+				(uint64_t)pkt->is_data_pkt ? (hv->txq->tx_tail + 1) : 0);
+	} else {
+		PMD_PERROR_LOG(hv, DBG_TX, "pkt->page_buf_count value can't be zero");
+		ret = -1;
+	}
+
+	return ret;
+}
+
+/*
+ * Net VSC on receive
+ *
+ * This function deals exclusively with virtual addresses.
+ */
+static void
+hv_nv_on_receive(struct hv_data *hv, struct hv_vm_packet_descriptor *pkt)
+{
+	struct hv_vm_transfer_page_packet_header *vm_xfer_page_pkt;
+	struct nvsp_msg *nvsp_msg_pkt;
+	struct netvsc_packet *net_vsc_pkt = NULL;
+	unsigned long start;
+	int count, i;
+
+	nvsp_msg_pkt = (struct nvsp_msg *)((unsigned long)pkt
+			+ (pkt->data_offset8 << 3));
+
+	/* Make sure this is a valid nvsp packet */
+	if (unlikely(nvsp_msg_pkt->msg_type != nvsp_msg_1_type_send_rndis_pkt)) {
+		PMD_PERROR_LOG(hv, DBG_RX, "NVSP packet is not valid");
+		return;
+	}
+
+	vm_xfer_page_pkt = (struct hv_vm_transfer_page_packet_header *)pkt;
+
+	if (unlikely(vm_xfer_page_pkt->transfer_page_set_id
+			!= NETVSC_RECEIVE_BUFFER_ID)) {
+		PMD_PERROR_LOG(hv, DBG_RX, "transfer_page_set_id is not valid");
+		return;
+	}
+
+	count = vm_xfer_page_pkt->range_count;
+
+	/*
+	 * Initialize the netvsc packet
+	 */
+	for (i = 0; i < count; ++i) {
+		net_vsc_pkt = hv->netvsc_packet;
+
+		net_vsc_pkt->tot_data_buf_len =
+			vm_xfer_page_pkt->ranges[i].byte_count;
+		net_vsc_pkt->page_buf_count = 1;
+
+		net_vsc_pkt->page_buffers[0].length =
+			vm_xfer_page_pkt->ranges[i].byte_count;
+
+		/* The virtual address of the packet in the receive buffer */
+		start = ((unsigned long)hv->recv_buf +
+				vm_xfer_page_pkt->ranges[i].byte_offset);
+
+		/* Page number of the virtual page containing packet start */
+		net_vsc_pkt->page_buffers[0].pfn = start >> PAGE_SHIFT;
+
+		/* Calculate the page relative offset */
+		net_vsc_pkt->page_buffers[0].offset =
+			vm_xfer_page_pkt->ranges[i].byte_offset & (PAGE_SIZE - 1);
+
+		/*
+		 * In this implementation, we are dealing with virtual
+		 * addresses exclusively.  Since we aren't using physical
+		 * addresses at all, we don't care if a packet crosses a
+		 * page boundary.  For this reason, the original code to
+		 * check for and handle page crossings has been removed.
+		 */
+
+		/*
+		 * Pass it to the upper layer.  The receive completion call
+		 * has been moved into this function.
+		 */
+		hv_rf_on_receive(hv, net_vsc_pkt);
+	}
+	/* Send a receive completion packet to RNDIS device (ie NetVsp) */
+	hv_vmbus_channel_send_packet(hv, hv->rx_comp_msg, sizeof(struct nvsp_msg),
+			vm_xfer_page_pkt->d.transaction_id,
+			HV_VMBUS_PACKET_TYPE_COMPLETION, 0);
+}
+
+/*
+ * Net VSC on send completion
+ */
+static void
+hv_nv_on_send_completion(struct hv_data *hv, struct hv_vm_packet_descriptor *pkt)
+{
+	struct nvsp_msg *nvsp_msg_pkt;
+
+	nvsp_msg_pkt =
+	    (struct nvsp_msg *)((unsigned long)pkt + (pkt->data_offset8 << 3));
+
+	if (likely(nvsp_msg_pkt->msg_type ==
+				nvsp_msg_1_type_send_rndis_pkt_complete)) {
+
+		if (unlikely(hv->hlt_req_pending))
+			hv->hlt_req_sent = 1;
+		else
+			if (pkt->transaction_id)
+				++hv->txq->tx_free;
+		--hv->num_outstanding_sends;
+		return;
+	}
+	PMD_PINFO_LOG(hv, DBG_TX, "unhandled completion (for kernel req or so)");
+}
+
+/*
+ * Analogue of bsd hv_nv_on_channel_callback
+ */
+static void
+hv_nv_complete_request(struct hv_data *hv, struct rndis_request *request)
+{
+	uint32_t bytes_rxed, cnt = 0;
+	uint64_t request_id;
+	struct hv_vm_packet_descriptor *desc;
+	uint8_t *buffer;
+	int     bufferlen = NETVSC_PACKET_SIZE;
+	int     ret = 0;
+
+	PMD_INIT_FUNC_TRACE();
+
+	hv->req = request;
+
+	buffer = rte_malloc(NULL, bufferlen, RTE_CACHE_LINE_SIZE);
+	if (!buffer) {
+		PMD_PERROR_LOG(hv, DBG_LOAD, "failed to allocate packet");
+		return;
+	}
+
+	do {
+		rte_delay_us(1);
+		ret = hv_vmbus_channel_recv_packet_raw(hv,
+				buffer, bufferlen, &bytes_rxed, &request_id, 3);
+		if (ret == 0) {
+			if (bytes_rxed > 0) {
+				desc = (struct hv_vm_packet_descriptor *)buffer;
+
+				switch (desc->type) {
+				case HV_VMBUS_PACKET_TYPE_COMPLETION:
+					hv_nv_on_send_completion(hv, desc);
+					break;
+				case HV_VMBUS_PACKET_TYPE_DATA_USING_TRANSFER_PAGES:
+					hv_nv_on_receive(hv, desc);
+					break;
+				default:
+					break;
+				}
+				PMD_PDEBUG_LOG(hv, DBG_LOAD,
+					       "Did %d attempts until non-empty data was receieved",
+					       cnt);
+				cnt = 0;
+			} else {
+				cnt++;
+			}
+		} else if (ret == -ENOMEM) {
+			/* Handle large packet */
+			PMD_PDEBUG_LOG(hv, DBG_LOAD,
+				       "recv_packet_raw returned -ENOMEM");
+			rte_free(buffer);
+			buffer = rte_malloc(NULL, bytes_rxed, RTE_CACHE_LINE_SIZE);
+			if (buffer == NULL) {
+				PMD_PERROR_LOG(hv, DBG_LOAD, "failed to allocate buffer");
+				break;
+			}
+			bufferlen = bytes_rxed;
+		} else {
+			PMD_PERROR_LOG(hv, DBG_LOAD, "Unexpected return code (%d)", ret);
+		}
+		if (!hv->req) {
+			PMD_PINFO_LOG(hv, DBG_LOAD, "Single request processed");
+			break;
+		}
+		if (cnt >= LOOP_CNT) {
+			PMD_PERROR_LOG(hv, DBG_LOAD, "Emergency break from the loop");
+			break;
+		}
+		if (hv->hlt_req_sent) {
+			PMD_PINFO_LOG(hv, DBG_LOAD, "Halt request processed");
+			break;
+		}
+		/* The field hv->req->response_msg.ndis_msg_type
+		 * should be set to non-zero value when response received
+		 */
+	} while (!hv->req->response_msg.ndis_msg_type);
+
+	rte_free(buffer);
+}
+
+/*
+ * RNDIS
+ */
+
+/*
+ * Create new RNDIS request
+ */
+static inline struct rndis_request *
+hv_rndis_request(struct hv_data *hv, uint32_t message_type,
+		uint32_t message_length)
+{
+	struct rndis_request *request;
+	struct rndis_msg *rndis_mesg;
+	struct rndis_set_request *set;
+	char mz_name[RTE_MEMZONE_NAMESIZE];
+	uint32_t size;
+
+	PMD_INIT_FUNC_TRACE();
+
+	request = rte_zmalloc("rndis_req", sizeof(struct rndis_request),
+			      RTE_CACHE_LINE_SIZE);
+
+	if (!request)
+		return NULL;
+
+	sprintf(mz_name, "hv_%d_%u_%d_%p", hv->vmbus_device, message_type,
+			hv->new_request_id, request);
+
+	size = MAX(message_length, sizeof(struct rndis_msg));
+
+	request->request_msg_memzone = rte_memzone_reserve_aligned(mz_name,
+			size, rte_lcore_to_socket_id(rte_lcore_id()), 0, PAGE_SIZE);
+	if (!request->request_msg_memzone) {
+		PMD_PERROR_LOG(hv, DBG_LOAD, "memzone_reserve failed");
+		rte_free(request);
+		return NULL;
+	}
+	request->request_msg = request->request_msg_memzone->addr;
+	rndis_mesg = request->request_msg;
+	rndis_mesg->ndis_msg_type = message_type;
+	rndis_mesg->msg_len = message_length;
+
+	/*
+	 * Set the request id. This field is always after the rndis header
+	 * for request/response packet types so we just use the set_request
+	 * as a template.
+	 */
+	set = &rndis_mesg->msg.set_request;
+	hv->new_request_id++;
+	set->request_id = hv->new_request_id;
+
+	return request;
+}
+
+/*
+ * RNDIS filter
+ */
+
+static void
+hv_rf_receive_response(
+	struct hv_data       *hv,
+	struct rndis_msg     *response)
+{
+	struct rndis_request *request = hv->req;
+
+	PMD_INIT_FUNC_TRACE();
+
+	if (response->msg_len <= sizeof(struct rndis_msg)) {
+		rte_memcpy(&request->response_msg, response,
+				response->msg_len);
+	} else {
+		if (response->ndis_msg_type == REMOTE_NDIS_INITIALIZE_CMPLT) {
+			request->response_msg.msg.init_complete.status =
+				STATUS_BUFFER_OVERFLOW;
+		}
+		PMD_PERROR_LOG(hv, DBG_LOAD, "response buffer overflow\n");
+	}
+}
+
+/*
+ * RNDIS filter receive indicate status
+ */
+static void
+hv_rf_receive_indicate_status(struct hv_data *hv, struct rndis_msg *response)
+{
+	struct rndis_indicate_status *indicate = &response->msg.indicate_status;
+
+	PMD_INIT_FUNC_TRACE();
+
+	if (indicate->status == RNDIS_STATUS_MEDIA_CONNECT)
+		hv->link_status = 1;
+	else if (indicate->status == RNDIS_STATUS_MEDIA_DISCONNECT)
+		hv->link_status = 0;
+	else if (indicate->status == RNDIS_STATUS_INVALID_DATA)
+		PMD_PERROR_LOG(hv, DBG_RX, "Invalid data in RNDIS message");
+	else
+		PMD_PERROR_LOG(hv, DBG_RX, "Unsupported status: %u", indicate->status);
+}
+
+/*
+ * RNDIS filter receive data
+ */
+static void
+hv_rf_receive_data(struct hv_data *hv, struct rndis_msg *msg,
+		struct netvsc_packet *pkt)
+{
+	struct rte_mbuf *m_new;
+	struct hv_rx_queue *rxq = hv->rxq;
+	struct rndis_packet *rndis_pkt;
+	uint32_t data_offset;
+
+	if (unlikely(hv->closed))
+		return;
+
+	rndis_pkt = &msg->msg.packet;
+
+	if (unlikely(hv->max_rx_pkt_len < rndis_pkt->data_length)) {
+		PMD_PWARN_LOG(hv, DBG_RX, "Packet is too large (%db), dropping.",
+				rndis_pkt->data_length);
+		++hv->stats.ierrors;
+		return;
+	}
+
+	/* Remove rndis header, then pass data packet up the stack */
+	data_offset = RNDIS_HEADER_SIZE + rndis_pkt->data_offset;
+
+	/* L2 frame length, with L2 header, not including CRC */
+	pkt->tot_data_buf_len        = rndis_pkt->data_length;
+	pkt->page_buffers[0].offset += data_offset;
+	/* Buffer length now L2 frame length plus trailing junk */
+	pkt->page_buffers[0].length -= data_offset;
+
+	pkt->vlan_tci = 0;
+
+	/*
+	 * Just put data into appropriate mbuf, all further work will be done
+	 * by the upper layer (mbuf replacement, index adjustment, etc)
+	 */
+	m_new = rxq->sw_ring[rxq->rx_tail];
+	if (++rxq->rx_tail == rxq->nb_rx_desc)
+		rxq->rx_tail = 0;
+
+	/*
+	 * Copy the received packet to mbuf.
+	 * The copy is required since the memory pointed to by netvsc_packet
+	 * cannot be reallocated
+	 */
+	uint8_t *vaddr = (uint8_t *)
+		(pkt->page_buffers[0].pfn << PAGE_SHIFT)
+		+ pkt->page_buffers[0].offset;
+
+	m_new->nb_segs = 1;
+	m_new->pkt_len = m_new->data_len = pkt->tot_data_buf_len;
+	rte_memcpy(rte_pktmbuf_mtod(m_new, void *), vaddr, m_new->data_len);
+
+	if (pkt->vlan_tci) {
+		m_new->vlan_tci = pkt->vlan_tci;
+		m_new->ol_flags |= PKT_RX_VLAN_PKT;
+	}
+
+	hv->pkt_rxed = 1;
+}
+
+/*
+ * RNDIS filter receive data, jumbo frames support
+ */
+static void
+hv_rf_receive_data_sg(struct hv_data *hv, struct rndis_msg *msg,
+		struct netvsc_packet *pkt)
+{
+	struct rte_mbuf *m_new;
+	struct hv_rx_queue *rxq = hv->rxq;
+	struct rndis_packet *rndis_pkt;
+	uint32_t data_offset;
+
+	if (unlikely(hv->closed))
+		return;
+
+	rndis_pkt = &msg->msg.packet;
+
+	/* Remove rndis header, then pass data packet up the stack */
+	data_offset = RNDIS_HEADER_SIZE + rndis_pkt->data_offset;
+
+	/* L2 frame length, with L2 header, not including CRC */
+	pkt->tot_data_buf_len        = rndis_pkt->data_length;
+	pkt->page_buffers[0].offset += data_offset;
+	/* Buffer length now L2 frame length plus trailing junk */
+	pkt->page_buffers[0].length -= data_offset;
+
+	pkt->vlan_tci = 0;
+
+	/*
+	 * Just put data into appropriate mbuf, all further work will be done
+	 * by the upper layer (mbuf replacement, index adjustment, etc)
+	 */
+	m_new = rxq->sw_ring[rxq->rx_tail];
+	if (++rxq->rx_tail == rxq->nb_rx_desc)
+		rxq->rx_tail = 0;
+
+	/*
+	 * Copy the received packet to mbuf.
+	 * The copy is required since the memory pointed to by netvsc_packet
+	 * cannot be reallocated
+	 */
+	uint8_t *vaddr = (uint8_t *)
+		(pkt->page_buffers[0].pfn << PAGE_SHIFT)
+		+ pkt->page_buffers[0].offset;
+
+	/* Scatter-gather emulation */
+	uint32_t carry_len = pkt->tot_data_buf_len;
+	struct rte_mbuf *m_next;
+
+	m_new->pkt_len = carry_len;
+	m_new->nb_segs = (carry_len - 1) / hv->max_rx_pkt_len + 1;
+
+	while (1) {
+		m_new->data_len = MIN(carry_len, hv->max_rx_pkt_len);
+		rte_memcpy(rte_pktmbuf_mtod(m_new, void *),
+			   vaddr, m_new->data_len);
+		vaddr += m_new->data_len;
+
+		if (carry_len <= hv->max_rx_pkt_len)
+			break;
+
+		carry_len -= hv->max_rx_pkt_len;
+		m_next = rxq->sw_ring[rxq->rx_tail];
+		if (++rxq->rx_tail == rxq->nb_rx_desc)
+			rxq->rx_tail = 0;
+		m_new->next = m_next;
+		m_new = m_next;
+	}
+
+	if (pkt->vlan_tci) {
+		m_new->vlan_tci = pkt->vlan_tci;
+		m_new->ol_flags |= PKT_RX_VLAN_PKT;
+	}
+
+	hv->pkt_rxed = 1;
+}
+
+static int
+hv_rf_send_request(struct hv_data *hv, struct rndis_request *request)
+{
+	struct netvsc_packet *packet;
+
+	PMD_INIT_FUNC_TRACE();
+	/* Set up the packet to send it */
+	packet = &request->pkt;
+
+	packet->is_data_pkt = 0;
+	packet->tot_data_buf_len = request->request_msg->msg_len;
+	packet->page_buf_count = 1;
+
+	packet->page_buffers[0].pfn =
+		(request->request_msg_memzone->phys_addr) >> PAGE_SHIFT;
+	packet->page_buffers[0].length = request->request_msg->msg_len;
+	packet->page_buffers[0].offset =
+	    (unsigned long)request->request_msg & (PAGE_SIZE - 1);
+
+	return hv_nv_on_send(hv, packet);
+}
+
+static void u8_to_u16(const char *src, int len, char *dst)
+{
+	int i;
+
+	for (i = 0; i < len; ++i) {
+		dst[2 * i] = src[i];
+		dst[2 * i + 1] = 0;
+	}
+}
+
+int
+hv_rf_set_device_mac(struct hv_data *hv, uint8_t *macaddr)
+{
+	struct rndis_request *request;
+	struct rndis_set_request *set_request;
+	struct rndis_config_parameter_info *info;
+	struct rndis_set_complete *set_complete;
+	char mac_str[2*ETHER_ADDR_LEN+1];
+	wchar_t *param_value, *param_name;
+	uint32_t status;
+	uint32_t message_len = sizeof(struct rndis_config_parameter_info) +
+		2 * MAC_STRLEN + 4 * ETHER_ADDR_LEN;
+	int ret, i;
+
+	request = hv_rndis_request(hv, REMOTE_NDIS_SET_MSG,
+		RNDIS_MESSAGE_SIZE(struct rndis_set_request) + message_len);
+	if (!request)
+		return -ENOMEM;
+
+	set_request = &request->request_msg->msg.set_request;
+	set_request->oid = RNDIS_OID_GEN_RNDIS_CONFIG_PARAMETER;
+	set_request->device_vc_handle = 0;
+	set_request->info_buffer_offset = sizeof(struct rndis_set_request);
+	set_request->info_buffer_length = message_len;
+
+	info = (struct rndis_config_parameter_info *)((ulong)set_request +
+		set_request->info_buffer_offset);
+	info->parameter_type = RNDIS_CONFIG_PARAM_TYPE_STRING;
+	info->parameter_name_offset =
+		sizeof(struct rndis_config_parameter_info);
+	info->parameter_name_length = 2 * MAC_STRLEN;
+	info->parameter_value_offset =
+		info->parameter_name_offset + info->parameter_name_length;
+	/* Multiply by 2 because of string representation and by 2
+	 * because of utf16 representation
+	 */
+	info->parameter_value_length = 4 * ETHER_ADDR_LEN;
+	param_name = (wchar_t *)((ulong)info + info->parameter_name_offset);
+	param_value = (wchar_t *)((ulong)info + info->parameter_value_offset);
+
+	u8_to_u16(MAC_PARAM_STR, MAC_STRLEN, (char *)param_name);
+	for (i = 0; i < ETHER_ADDR_LEN; ++i) {
+		mac_str[2*i] = high(macaddr[i]);
+		mac_str[2*i+1] = low(macaddr[i]);
+	}
+
+	u8_to_u16((const char *)mac_str, 2 * ETHER_ADDR_LEN, (char *)param_value);
+
+	ret = hv_rf_send_request(hv, request);
+	if (ret)
+		goto cleanup;
+
+	request->response_msg.msg.set_complete.status = 0xFFFF;
+	hv_nv_complete_request(hv, request);
+	set_complete = &request->response_msg.msg.set_complete;
+	if (set_complete->status == 0xFFFF) {
+		/* Host is not responding, we can't free request in this case */
+		ret = -1;
+		PMD_PERROR_LOG(hv, DBG_LOAD, "Host is not responding");
+		goto exit;
+	}
+	/* Response received, check status */
+	status = set_complete->status;
+	if (status) {
+		/* Bad response status, return error */
+		PMD_PERROR_LOG(hv, DBG_LOAD, "set_complete->status = %u\n", status);
+		ret = -EINVAL;
+	}
+
+cleanup:
+	rte_free(request);
+exit:
+	return ret;
+}
+
+/*
+ * RNDIS filter on receive
+ */
+static int
+hv_rf_on_receive(struct hv_data *hv, struct netvsc_packet *pkt)
+{
+	struct rndis_msg rndis_mesg;
+	struct rndis_msg *rndis_hdr;
+
+	/* Shift virtual page number to form virtual page address */
+	rndis_hdr = (struct rndis_msg *)(pkt->page_buffers[0].pfn << PAGE_SHIFT);
+
+	rndis_hdr = (void *)((unsigned long)rndis_hdr
+			+ pkt->page_buffers[0].offset);
+
+	/*
+	 * Make sure we got a valid rndis message
+	 * Fixme:  There seems to be a bug in set completion msg where
+	 * its msg_len is 16 bytes but the byte_count field in the
+	 * xfer page range shows 52 bytes
+	 */
+	if (unlikely(pkt->tot_data_buf_len != rndis_hdr->msg_len)) {
+		++hv->stats.ierrors;
+		PMD_PERROR_LOG(hv, DBG_RX,
+			       "invalid rndis message? (expected %u "
+			       "bytes got %u)... dropping this message",
+			       rndis_hdr->msg_len, pkt->tot_data_buf_len);
+		return -1;
+	}
+
+	rte_memcpy(&rndis_mesg, rndis_hdr,
+	    (rndis_hdr->msg_len > sizeof(struct rndis_msg)) ?
+	    sizeof(struct rndis_msg) : rndis_hdr->msg_len);
+
+	switch (rndis_mesg.ndis_msg_type) {
+
+	/* data message */
+	case REMOTE_NDIS_PACKET_MSG:
+		hv->receive_callback(hv, &rndis_mesg, pkt);
+		break;
+	/* completion messages */
+	case REMOTE_NDIS_INITIALIZE_CMPLT:
+	case REMOTE_NDIS_QUERY_CMPLT:
+	case REMOTE_NDIS_SET_CMPLT:
+	case REMOTE_NDIS_RESET_CMPLT:
+	case REMOTE_NDIS_KEEPALIVE_CMPLT:
+		hv_rf_receive_response(hv, &rndis_mesg);
+		break;
+	/* notification message */
+	case REMOTE_NDIS_INDICATE_STATUS_MSG:
+		hv_rf_receive_indicate_status(hv, &rndis_mesg);
+		break;
+	default:
+		PMD_PERROR_LOG(hv, DBG_RX, "hv_rf_on_receive():  Unknown msg_type 0x%x",
+		    rndis_mesg.ndis_msg_type);
+		break;
+	}
+
+	return 0;
+}
+
+/*
+ * RNDIS filter on send
+ */
+int
+hv_rf_on_send(struct hv_data *hv, struct netvsc_packet *pkt)
+{
+	struct rndis_msg *rndis_mesg;
+	struct rndis_packet *rndis_pkt;
+	uint32_t rndis_msg_size;
+
+	/* Add the rndis header */
+	rndis_mesg = (struct rndis_msg *)pkt->extension;
+
+	memset(rndis_mesg, 0, sizeof(struct rndis_msg));
+
+	rndis_msg_size = RNDIS_MESSAGE_SIZE(struct rndis_packet);
+
+	rndis_mesg->ndis_msg_type = REMOTE_NDIS_PACKET_MSG;
+	rndis_mesg->msg_len = pkt->tot_data_buf_len + rndis_msg_size;
+
+	rndis_pkt = &rndis_mesg->msg.packet;
+	rndis_pkt->data_offset = sizeof(struct rndis_packet);
+	rndis_pkt->data_length = pkt->tot_data_buf_len;
+
+	pkt->is_data_pkt = 1;
+
+	/*
+	 * Invoke netvsc send.  If return status is bad, the caller now
+	 * resets the context pointers before retrying.
+	 */
+	return hv_nv_on_send(hv, pkt);
+}
+
+static int
+hv_rf_init_device(struct hv_data *hv)
+{
+	struct rndis_request *request;
+	struct rndis_initialize_request *init;
+	struct rndis_initialize_complete *init_complete;
+	uint32_t status;
+	int ret;
+
+	PMD_INIT_FUNC_TRACE();
+
+	request = hv_rndis_request(hv, REMOTE_NDIS_INITIALIZE_MSG,
+	    RNDIS_MESSAGE_SIZE(struct rndis_initialize_request));
+	if (!request) {
+		ret = -1;
+		goto cleanup;
+	}
+
+	/* Set up the rndis set */
+	init = &request->request_msg->msg.init_request;
+	init->major_version = RNDIS_MAJOR_VERSION;
+	init->minor_version = RNDIS_MINOR_VERSION;
+	/*
+	 * Per the RNDIS document, this should be set to the max MTU
+	 * plus the header size.  However, 2048 works fine, so leaving
+	 * it as is.
+	 */
+	init->max_xfer_size = 2048;
+
+	hv->rndis_dev_state = RNDIS_DEV_INITIALIZING;
+
+	ret = hv_rf_send_request(hv, request);
+	if (ret != 0) {
+		hv->rndis_dev_state = RNDIS_DEV_UNINITIALIZED;
+		goto cleanup;
+	}
+
+	/* Putting -1 here to ensure that HyperV really answered us */
+	request->response_msg.msg.init_complete.status = -1;
+	hv_nv_complete_request(hv, request);
+
+	init_complete = &request->response_msg.msg.init_complete;
+	status = init_complete->status;
+	if (status == 0) {
+		PMD_PINFO_LOG(hv, DBG_LOAD, "Remote NDIS device is initialized");
+		hv->rndis_dev_state = RNDIS_DEV_INITIALIZED;
+		ret = 0;
+	} else {
+		PMD_PINFO_LOG(hv, DBG_LOAD, "Remote NDIS device left uninitialized");
+		hv->rndis_dev_state = RNDIS_DEV_UNINITIALIZED;
+		ret = -1;
+	}
+
+cleanup:
+	rte_free(request);
+
+	return ret;
+}
+
+/*
+ * RNDIS filter query device
+ */
+static int
+hv_rf_query_device(struct hv_data *hv, uint32_t oid, void *result,
+		   uint32_t result_size)
+{
+	struct rndis_request *request;
+	struct rndis_query_request *query;
+	struct rndis_query_complete *query_complete;
+	int ret = 0;
+
+	PMD_INIT_FUNC_TRACE();
+
+	request = hv_rndis_request(hv, REMOTE_NDIS_QUERY_MSG,
+	    RNDIS_MESSAGE_SIZE(struct rndis_query_request));
+	if (request == NULL) {
+		ret = -1;
+		goto cleanup;
+	}
+
+	/* Set up the rndis query */
+	query = &request->request_msg->msg.query_request;
+	query->oid = oid;
+	query->info_buffer_offset = sizeof(struct rndis_query_request);
+	query->info_buffer_length = 0;
+	query->device_vc_handle = 0;
+
+	ret = hv_rf_send_request(hv, request);
+	if (ret != 0) {
+		PMD_PERROR_LOG(hv, DBG_TX, "RNDISFILTER request failed to Send!");
+		goto cleanup;
+	}
+
+	hv_nv_complete_request(hv, request);
+
+	/* Copy the response back */
+	query_complete = &request->response_msg.msg.query_complete;
+
+	if (query_complete->info_buffer_length > result_size) {
+		ret = -EINVAL;
+		goto cleanup;
+	}
+
+	rte_memcpy(result, (void *)((unsigned long)query_complete +
+	    query_complete->info_buffer_offset),
+	    query_complete->info_buffer_length);
+
+cleanup:
+	rte_free(request);
+
+	return ret;
+}
+
+/*
+ * RNDIS filter query device MAC address
+ */
+static inline int
+hv_rf_query_device_mac(struct hv_data *hv)
+{
+	uint32_t size = HW_MACADDR_LEN;
+
+	int ret = hv_rf_query_device(hv, RNDIS_OID_802_3_PERMANENT_ADDRESS,
+			&hv->hw_mac_addr, size);
+	PMD_PDEBUG_LOG(hv, DBG_TX, "MAC: %02x:%02x:%02x:%02x:%02x:%02x, ret = %d",
+			hv->hw_mac_addr[0], hv->hw_mac_addr[1], hv->hw_mac_addr[2],
+			hv->hw_mac_addr[3], hv->hw_mac_addr[4], hv->hw_mac_addr[5],
+			ret);
+	return ret;
+}
+
+/*
+ * RNDIS filter query device link status
+ */
+static inline int
+hv_rf_query_device_link_status(struct hv_data *hv)
+{
+	uint32_t size = sizeof(uint32_t);
+	/* Set all bits to 1, it's to ensure that the response is actual */
+	uint32_t status = -1;
+
+	int ret = hv_rf_query_device(hv, RNDIS_OID_GEN_MEDIA_CONNECT_STATUS,
+			&status, size);
+	hv->link_status = status ? 0 : 1;
+	PMD_PDEBUG_LOG(hv, DBG_TX, "Link Status: %s",
+			hv->link_status ? "Up" : "Down");
+	return ret;
+}
+
+int
+hv_rf_on_device_add(struct hv_data *hv)
+{
+	int ret;
+
+	PMD_INIT_FUNC_TRACE();
+
+	hv->closed = 0;
+	hv->rb_data_size = hv->rb_size - sizeof(struct hv_vmbus_ring_buffer);
+	PMD_PDEBUG_LOG(hv, DBG_LOAD, "hv->rb_data_size = %u", hv->rb_data_size);
+
+	if (unlikely(hv->in->interrupt_mask == 0)) {
+		PMD_PINFO_LOG(hv, DBG_LOAD, "Disabling interrupts from host");
+		hv->in->interrupt_mask = 1;
+		rte_mb();
+	}
+
+	hv->netvsc_packet = rte_zmalloc("", sizeof(struct netvsc_packet),
+					RTE_CACHE_LINE_SIZE);
+	if (hv->netvsc_packet == NULL)
+		return -ENOMEM;
+	hv->netvsc_packet->is_data_pkt = 1;
+
+	hv->rx_comp_msg = rte_zmalloc("", sizeof(struct nvsp_msg),
+				      RTE_CACHE_LINE_SIZE);
+	if (hv->rx_comp_msg == NULL)
+		return -ENOMEM;
+
+	hv->rx_comp_msg->msg_type = nvsp_msg_1_type_send_rndis_pkt_complete;
+	hv->rx_comp_msg->msgs.send_rndis_pkt_complete.status =
+		nvsp_status_success;
+
+	memset(&hv->stats, 0, sizeof(struct hv_stats));
+
+	hv->receive_callback = hv_rf_receive_data;
+
+	/* It's for completion of requests which were sent from kernel-space part */
+	hv_nv_complete_request(hv, NULL);
+	hv_nv_complete_request(hv, NULL);
+
+	hv->rndis_dev_state = RNDIS_DEV_UNINITIALIZED;
+
+	/* Send the rndis initialization message */
+	ret = hv_rf_init_device(hv);
+	if (ret != 0) {
+		PMD_PERROR_LOG(hv, DBG_LOAD, "rndis init failed!");
+		hv_rf_on_device_remove(hv);
+		return ret;
+	}
+
+	/* Get the mac address */
+	ret = hv_rf_query_device_mac(hv);
+	if (ret != 0) {
+		PMD_PERROR_LOG(hv, DBG_LOAD, "rndis query mac failed!");
+		hv_rf_on_device_remove(hv);
+		return ret;
+	}
+
+	return ret;
+}
+
+#define HALT_COMPLETION_WAIT_COUNT      25
+
+/*
+ * RNDIS filter halt device
+ */
+static int
+hv_rf_halt_device(struct hv_data *hv)
+{
+	struct rndis_request *request;
+	struct rndis_halt_request *halt;
+	int i, ret;
+
+	PMD_INIT_FUNC_TRACE();
+
+	/* Attempt to do a rndis device halt */
+	request = hv_rndis_request(hv, REMOTE_NDIS_HALT_MSG,
+	    RNDIS_MESSAGE_SIZE(struct rndis_halt_request));
+	if (!request) {
+		PMD_PERROR_LOG(hv, DBG_LOAD, "Unable to create RNDIS_HALT request");
+		return -1;
+	}
+
+	/* initialize "poor man's semaphore" */
+	hv->hlt_req_sent = 0;
+
+	/* Set up the rndis set */
+	halt = &request->request_msg->msg.halt_request;
+	hv->new_request_id++;
+	halt->request_id = hv->new_request_id;
+
+	ret = hv_rf_send_request(hv, request);
+	if (ret) {
+		PMD_PERROR_LOG(hv, DBG_LOAD, "Failed to send RNDIS_HALT request: %d",
+				ret);
+		return ret;
+	}
+
+	/*
+	 * Wait for halt response from halt callback.  We must wait for
+	 * the transaction response before freeing the request and other
+	 * resources.
+	 */
+	for (i = HALT_COMPLETION_WAIT_COUNT; i > 0; i--) {
+		hv_nv_complete_request(hv, request);
+		if (hv->hlt_req_sent != 0) {
+			PMD_PDEBUG_LOG(hv, DBG_LOAD, "Completed HALT request at %d try",
+					HALT_COMPLETION_WAIT_COUNT - i + 1);
+			break;
+		}
+	}
+	hv->hlt_req_sent = 0;
+	if (i == 0) {
+		PMD_PERROR_LOG(hv, DBG_LOAD, "RNDIS_HALT request was not completed!");
+		rte_free(request);
+		return -1;
+	}
+
+	hv->rndis_dev_state = RNDIS_DEV_UNINITIALIZED;
+
+	rte_free(request);
+
+	return 0;
+}
+
+#define HV_TX_DRAIN_TRIES 50
+static inline int
+hyperv_tx_drain(struct hv_data *hv)
+{
+	int i = HV_TX_DRAIN_TRIES;
+
+	PMD_PDEBUG_LOG(hv, DBG_LOAD, "Waiting for TXs to be completed...");
+	while (hv->num_outstanding_sends > 0 && --i) {
+		hv_nv_complete_request(hv, NULL);
+		rte_delay_ms(100);
+	}
+
+	return hv->num_outstanding_sends;
+}
+
+/*
+ * RNDIS filter on device remove
+ */
+int
+hv_rf_on_device_remove(struct hv_data *hv)
+{
+	int ret;
+
+	PMD_INIT_FUNC_TRACE();
+	hv->closed = 1;
+	if (hyperv_tx_drain(hv) > 0) {
+		/* Hypervisor is not responding, exit with error here */
+		PMD_PWARN_LOG(hv, DBG_LOAD, "Can't drain TX queue: no response");
+		return -EAGAIN;
+	}
+	PMD_PDEBUG_LOG(hv, DBG_LOAD, "TX queue is empty, can halt the device");
+
+	/* Halt and release the rndis device */
+	hv->hlt_req_pending = 1;
+	ret = hv_rf_halt_device(hv);
+	hv->hlt_req_pending = 0;
+
+	rte_free(hv->netvsc_packet);
+
+	return ret;
+}
+
+/*
+ * RNDIS filter set packet filter
+ * Sends an rndis request with the new filter, then waits for a response
+ * from the host.
+ * Returns zero on success, non-zero on failure.
+ */
+static int
+hv_rf_set_packet_filter(struct hv_data *hv, uint32_t new_filter)
+{
+	struct rndis_request *request;
+	struct rndis_set_request *set;
+	struct rndis_set_complete *set_complete;
+	uint32_t status;
+	int ret;
+
+	PMD_INIT_FUNC_TRACE();
+
+	request = hv_rndis_request(hv, REMOTE_NDIS_SET_MSG,
+			RNDIS_MESSAGE_SIZE(struct rndis_set_request) + sizeof(uint32_t));
+	if (!request) {
+		ret = -1;
+		goto cleanup;
+	}
+
+	/* Set up the rndis set */
+	set = &request->request_msg->msg.set_request;
+	set->oid = RNDIS_OID_GEN_CURRENT_PACKET_FILTER;
+	set->info_buffer_length = sizeof(uint32_t);
+	set->info_buffer_offset = sizeof(struct rndis_set_request);
+
+	rte_memcpy((void *)((unsigned long)set + sizeof(struct rndis_set_request)),
+			&new_filter, sizeof(uint32_t));
+
+	ret = hv_rf_send_request(hv, request);
+	if (ret)
+		goto cleanup;
+
+	/*
+	 * Wait for the response from the host.
+	 */
+	request->response_msg.msg.set_complete.status = 0xFFFF;
+	hv_nv_complete_request(hv, request);
+
+	set_complete = &request->response_msg.msg.set_complete;
+	if (set_complete->status == 0xFFFF) {
+		/* Host is not responding, we can't free request in this case */
+		ret = -1;
+		goto exit;
+	}
+	/* Response received, check status */
+	status = set_complete->status;
+	if (status)
+		/* Bad response status, return error */
+		ret = -2;
+
+cleanup:
+	rte_free(request);
+exit:
+	return ret;
+}
+
+/*
+ * RNDIS filter open device
+ */
+int
+hv_rf_on_open(struct hv_data *hv)
+{
+	int ret;
+
+	if (hv->closed)
+		return 0;
+
+	if (hv->jumbo_frame_support)
+		hv->receive_callback = hv_rf_receive_data_sg;
+
+	ret = hyperv_set_rx_mode(hv, 1, 0);
+	if (!ret) {
+		PMD_PDEBUG_LOG(hv, DBG_LOAD, "RNDIS device opened");
+		hv->rndis_dev_state = RNDIS_DEV_DATAINITIALIZED;
+	} else
+		PMD_PERROR_LOG(hv, DBG_LOAD, "RNDIS device is left unopened");
+
+	return ret;
+}
+
+/*
+ * RNDIS filter on close
+ */
+int
+hv_rf_on_close(struct hv_data *hv)
+{
+	int ret;
+
+	PMD_INIT_FUNC_TRACE();
+
+	if (hv->closed)
+		return 0;
+
+	if (hv->rndis_dev_state != RNDIS_DEV_DATAINITIALIZED) {
+		PMD_PDEBUG_LOG(hv, DBG_LOAD, "RNDIS device state should be"
+				" RNDIS_DEV_DATAINITIALIZED, but now it is %u",
+				hv->rndis_dev_state);
+		return 0;
+	}
+
+	ret = hv_rf_set_packet_filter(hv, 0);
+	if (!ret) {
+		PMD_PDEBUG_LOG(hv, DBG_LOAD, "RNDIS device closed");
+		hv->rndis_dev_state = RNDIS_DEV_INITIALIZED;
+	} else
+		PMD_PDEBUG_LOG(hv, DBG_LOAD, "RNDIS device is left unclosed");
+
+	return ret;
+}
+
+/*
+ * RX Flow
+ */
+int
+hyperv_get_buffer(struct hv_data *hv, void *buffer, uint32_t bufferlen)
+{
+	uint32_t bytes_rxed;
+	uint64_t request_id;
+	struct hv_vm_packet_descriptor *desc;
+
+	int ret = hv_vmbus_channel_recv_packet_raw(hv, buffer, bufferlen,
+			&bytes_rxed, &request_id, 1);
+	if (likely(ret == 0)) {
+		if (bytes_rxed) {
+			desc = (struct hv_vm_packet_descriptor *)buffer;
+
+			if (likely(desc->type ==
+						HV_VMBUS_PACKET_TYPE_DATA_USING_TRANSFER_PAGES)) {
+				hv->pkt_rxed = 0;
+				hv_nv_on_receive(hv, desc);
+				return hv->pkt_rxed;
+			}
+		}
+	}
+
+	return 0;
+}
+
+/*
+ * TX completions handler
+ */
+void
+hyperv_scan_comps(struct hv_data *hv, int allow_rx_drop)
+{
+	uint32_t bytes_rxed;
+	uint64_t request_id;
+
+	while (1) {
+		int ret = hv_vmbus_channel_recv_packet_raw(hv, hv->desc, PAGE_SIZE,
+			&bytes_rxed, &request_id, 2 | allow_rx_drop);
+
+		if (ret != 0 || !bytes_rxed)
+			break;
+
+		if (likely(hv->desc->type == HV_VMBUS_PACKET_TYPE_COMPLETION))
+			hv_nv_on_send_completion(hv, hv->desc);
+	}
+}
+
+/*
+ * Get link status
+ */
+uint8_t
+hyperv_get_link_status(struct hv_data *hv)
+{
+	if (hv_rf_query_device_link_status(hv))
+		return 2;
+	return hv->link_status;
+}
+
+/*
+ * Set/Reset RX mode
+ */
+int
+hyperv_set_rx_mode(struct hv_data *hv, uint8_t promisc, uint8_t mcast)
+{
+	PMD_INIT_FUNC_TRACE();
+
+	if (!promisc) {
+		return hv_rf_set_packet_filter(hv,
+				NDIS_PACKET_TYPE_BROADCAST                   |
+				(mcast ? NDIS_PACKET_TYPE_ALL_MULTICAST : 0) |
+				NDIS_PACKET_TYPE_DIRECTED);
+	}
+
+	return hv_rf_set_packet_filter(hv, NDIS_PACKET_TYPE_PROMISCUOUS);
+}
diff --git a/lib/librte_pmd_hyperv/hyperv_drv.h b/lib/librte_pmd_hyperv/hyperv_drv.h
new file mode 100644
index 0000000..22acad5
--- /dev/null
+++ b/lib/librte_pmd_hyperv/hyperv_drv.h
@@ -0,0 +1,558 @@
+/*-
+ * Copyright (c) 2009-2012 Microsoft Corp.
+ * Copyright (c) 2010-2012 Citrix Inc.
+ * Copyright (c) 2012 NetApp Inc.
+ * Copyright (c) 2013-2015 Brocade Communications Systems, Inc.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice unmodified, this list of conditions, and the following
+ *    disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
+ * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
+ * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
+ * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
+ * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
+ * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
+ * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+
+#ifndef _HYPERV_DRV_H_
+#define _HYPERV_DRV_H_
+
+/*
+ * Definitions from hyperv.h
+ */
+#define HW_MACADDR_LEN	6
+#define HV_MAX_PAGE_BUFFER_COUNT	19
+
+#define HV_ALIGN_UP(value, align) \
+		(((value) & (align-1)) ? \
+		    (((value) + (align-1)) & ~(align-1)) : (value))
+
+/*
+ *  Connection identifier type
+ */
+union hv_vmbus_connection_id {
+	uint32_t                as_uint32_t;
+	struct {
+		uint32_t        id:24;
+		uint32_t        reserved:8;
+	} u;
+
+} __attribute__((packed));
+
+union hv_vmbus_monitor_trigger_state {
+	uint32_t as_uint32_t;
+	struct {
+		uint32_t group_enable:4;
+		uint32_t rsvd_z:28;
+	} u;
+};
+
+union hv_vmbus_monitor_trigger_group {
+	uint64_t as_uint64_t;
+	struct {
+		uint32_t pending;
+		uint32_t armed;
+	} u;
+};
+
+struct hv_vmbus_monitor_parameter {
+	union hv_vmbus_connection_id  connection_id;
+	uint16_t                flag_number;
+	uint16_t                rsvd_z;
+};
+
+/*
+ * hv_vmbus_monitor_page Layout
+ * ------------------------------------------------------
+ * | 0   | trigger_state (4 bytes) | Rsvd1 (4 bytes)     |
+ * | 8   | trigger_group[0]                              |
+ * | 10  | trigger_group[1]                              |
+ * | 18  | trigger_group[2]                              |
+ * | 20  | trigger_group[3]                              |
+ * | 28  | Rsvd2[0]                                      |
+ * | 30  | Rsvd2[1]                                      |
+ * | 38  | Rsvd2[2]                                      |
+ * | 40  | next_check_time[0][0] | next_check_time[0][1] |
+ * | ...                                                 |
+ * | 240 | latency[0][0..3]                              |
+ * | 340 | Rsvz3[0]                                      |
+ * | 440 | parameter[0][0]                               |
+ * | 448 | parameter[0][1]                               |
+ * | ...                                                 |
+ * | 840 | Rsvd4[0]                                      |
+ * ------------------------------------------------------
+ */
+
+struct hv_vmbus_monitor_page {
+	union hv_vmbus_monitor_trigger_state  trigger_state;
+	uint32_t                        rsvd_z1;
+
+	union hv_vmbus_monitor_trigger_group  trigger_group[4];
+	uint64_t                        rsvd_z2[3];
+
+	int32_t                         next_check_time[4][32];
+
+	uint16_t                        latency[4][32];
+	uint64_t                        rsvd_z3[32];
+
+	struct hv_vmbus_monitor_parameter      parameter[4][32];
+
+	uint8_t                         rsvd_z4[1984];
+};
+
+enum hv_vmbus_packet_type {
+	HV_VMBUS_PACKET_TYPE_DATA_USING_TRANSFER_PAGES		= 0x7,
+	HV_VMBUS_PACKET_TYPE_DATA_USING_GPA_DIRECT		= 0x9,
+	HV_VMBUS_PACKET_TYPE_COMPLETION				= 0xb,
+};
+
+#define HV_VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED    1
+
+struct hv_vm_packet_descriptor {
+	uint16_t type;
+	uint16_t data_offset8;
+	uint16_t length8;
+	uint16_t flags;
+	uint64_t transaction_id;
+} __attribute__((packed));
+
+struct hv_vm_transfer_page {
+	uint32_t byte_count;
+	uint32_t byte_offset;
+} __attribute__((packed));
+
+struct hv_vm_transfer_page_packet_header {
+	struct hv_vm_packet_descriptor d;
+	uint16_t                       transfer_page_set_id;
+	uint8_t                        sender_owns_set;
+	uint8_t                        reserved;
+	uint32_t                       range_count;
+	struct hv_vm_transfer_page     ranges[1];
+} __attribute__((packed));
+
+struct hv_vmbus_ring_buffer {
+	volatile uint32_t       write_index;
+	volatile uint32_t       read_index;
+	/*
+	 * NOTE: The interrupt_mask field is used only for channels, but
+	 * vmbus connection also uses this data structure
+	 */
+	volatile uint32_t       interrupt_mask;
+	/* pad it to PAGE_SIZE so that data starts on a page */
+	uint8_t                 reserved[4084];
+
+	/*
+	 * WARNING: Ring data starts here + ring_data_start_offset
+	 *  !!! DO NOT place any fields below this !!!
+	 */
+	uint8_t			buffer[0];	/* doubles as interrupt mask */
+} __attribute__((packed));
+
+struct hv_vmbus_page_buffer {
+	uint32_t	length;
+	uint32_t	offset;
+	uint64_t	pfn;
+} __attribute__((packed));
+
+/*
+ * Definitions from hv_vmbus_priv.h
+ */
+struct hv_vmbus_sg_buffer_list {
+	void		*data;
+	uint32_t	length;
+};
+
+struct hv_vmbus_channel_packet_page_buffer {
+	uint16_t		type;
+	uint16_t		data_offset8;
+	uint16_t		length8;
+	uint16_t		flags;
+	uint64_t		transaction_id;
+	uint32_t		reserved;
+	uint32_t		range_count;
+	struct hv_vmbus_page_buffer	range[HV_MAX_PAGE_BUFFER_COUNT];
+} __attribute__((packed));
+
+/*
+ * Definitions from hv_net_vsc.h
+ */
+#define NETVSC_PACKET_MAXPAGE 16
+#define NETVSC_PACKET_SIZE    256
+
+/*
+ * This message is used by both the VSP and the VSC to complete
+ * a RNDIS message to the opposite channel endpoint.  At this
+ * point, the initiator of this message cannot use any resources
+ * associated with the original RNDIS packet.
+ */
+enum nvsp_status_ {
+	nvsp_status_none = 0,
+	nvsp_status_success,
+	nvsp_status_failure,
+};
+
+struct nvsp_1_msg_send_rndis_pkt_complete {
+	uint32_t                                status;
+} __attribute__((packed));
+
+enum nvsp_msg_type {
+	/*
+	 * Version 1 Messages
+	 */
+	nvsp_msg_1_type_send_ndis_vers          = 100,
+
+	nvsp_msg_1_type_send_rx_buf,
+	nvsp_msg_1_type_send_rx_buf_complete,
+	nvsp_msg_1_type_revoke_rx_buf,
+
+	nvsp_msg_1_type_send_send_buf,
+	nvsp_msg_1_type_send_send_buf_complete,
+	nvsp_msg_1_type_revoke_send_buf,
+
+	nvsp_msg_1_type_send_rndis_pkt,
+	nvsp_msg_1_type_send_rndis_pkt_complete,
+};
+
+struct nvsp_1_msg_send_rndis_pkt {
+	/*
+	 * This field is specified by RNDIS.  They assume there's
+	 * two different channels of communication. However,
+	 * the Network VSP only has one.  Therefore, the channel
+	 * travels with the RNDIS packet.
+	 */
+	uint32_t                                chan_type;
+
+	/*
+	 * This field is used to send part or all of the data
+	 * through a send buffer. This value specifies an
+	 * index into the send buffer.  If the index is
+	 * 0xFFFFFFFF, then the send buffer is not being used
+	 * and all of the data was sent through other VMBus
+	 * mechanisms.
+	 */
+	uint32_t                                send_buf_section_idx;
+	uint32_t                                send_buf_section_size;
+} __attribute__((packed));
+
+/*
+ * ALL Messages
+ */
+struct nvsp_msg {
+	uint32_t                                msg_type;
+	union {
+		struct nvsp_1_msg_send_rndis_pkt               send_rndis_pkt;
+		struct nvsp_1_msg_send_rndis_pkt_complete      send_rndis_pkt_complete;
+		/* size is set like in linux kernel driver */
+		uint8_t raw[24];
+	} msgs;
+} __attribute__((packed));
+
+#define NETVSC_RECEIVE_BUFFER_ID                0xcafe
+
+struct netvsc_packet {
+	uint8_t             is_data_pkt;      /* One byte */
+	uint8_t             ext_pages;
+	uint16_t            vlan_tci;
+
+	void                *extension;
+	uint64_t            extension_phys_addr;
+	uint32_t            tot_data_buf_len;
+	uint32_t            page_buf_count;
+	struct hv_vmbus_page_buffer page_buffers[NETVSC_PACKET_MAXPAGE];
+};
+
+/*
+ * Definitions from hv_rndis.h
+ */
+#define RNDIS_MAJOR_VERSION                             0x00000001
+#define RNDIS_MINOR_VERSION                             0x00000000
+
+#define STATUS_BUFFER_OVERFLOW                          (0x80000005L)
+
+/*
+ * Remote NDIS message types
+ */
+#define REMOTE_NDIS_PACKET_MSG                          0x00000001
+#define REMOTE_NDIS_INITIALIZE_MSG                      0x00000002
+#define REMOTE_NDIS_HALT_MSG                            0x00000003
+#define REMOTE_NDIS_QUERY_MSG                           0x00000004
+#define REMOTE_NDIS_SET_MSG                             0x00000005
+#define REMOTE_NDIS_RESET_MSG                           0x00000006
+#define REMOTE_NDIS_INDICATE_STATUS_MSG                 0x00000007
+#define REMOTE_NDIS_KEEPALIVE_MSG                       0x00000008
+/*
+ * Remote NDIS message completion types
+ */
+#define REMOTE_NDIS_INITIALIZE_CMPLT                    0x80000002
+#define REMOTE_NDIS_QUERY_CMPLT                         0x80000004
+#define REMOTE_NDIS_SET_CMPLT                           0x80000005
+#define REMOTE_NDIS_RESET_CMPLT                         0x80000006
+#define REMOTE_NDIS_KEEPALIVE_CMPLT                     0x80000008
+
+#define RNDIS_OID_GEN_MEDIA_CONNECT_STATUS              0x00010114
+#define RNDIS_OID_GEN_CURRENT_PACKET_FILTER             0x0001010E
+#define RNDIS_OID_802_3_PERMANENT_ADDRESS               0x01010101
+#define RNDIS_OID_802_3_CURRENT_ADDRESS                 0x01010102
+#define RNDIS_OID_GEN_RNDIS_CONFIG_PARAMETER            0x0001021B
+
+#define RNDIS_CONFIG_PARAM_TYPE_STRING      2
+/* extended info after the RNDIS request message */
+#define RNDIS_EXT_LEN                       100
+/*
+ * Packet extension field contents associated with a Data message.
+ */
+struct rndis_per_packet_info {
+	uint32_t            size;
+	uint32_t            type;
+	uint32_t            per_packet_info_offset;
+};
+
+#define ieee_8021q_info 6
+
+struct ndis_8021q_info {
+	union {
+		struct {
+			uint32_t   user_pri:3;  /* User Priority */
+			uint32_t   cfi:1;  /* Canonical Format ID */
+			uint32_t   vlan_id:12;
+			uint32_t   reserved:16;
+		} s1;
+		uint32_t    value;
+	} u1;
+};
+
+/* Format of Information buffer passed in a SetRequest for the OID */
+/* OID_GEN_RNDIS_CONFIG_PARAMETER. */
+struct rndis_config_parameter_info {
+	uint32_t parameter_name_offset;
+	uint32_t parameter_name_length;
+	uint32_t parameter_type;
+	uint32_t parameter_value_offset;
+	uint32_t parameter_value_length;
+};
+
+/*
+ * NdisInitialize message
+ */
+struct rndis_initialize_request {
+	/* RNDIS request ID */
+	uint32_t            request_id;
+	uint32_t            major_version;
+	uint32_t            minor_version;
+	uint32_t            max_xfer_size;
+};
+
+/*
+ * Response to NdisInitialize
+ */
+struct rndis_initialize_complete {
+	/* RNDIS request ID */
+	uint32_t            request_id;
+	/* RNDIS status */
+	uint32_t            status;
+	uint32_t            major_version;
+	uint32_t            minor_version;
+	uint32_t            device_flags;
+	/* RNDIS medium */
+	uint32_t            medium;
+	uint32_t            max_pkts_per_msg;
+	uint32_t            max_xfer_size;
+	uint32_t            pkt_align_factor;
+	uint32_t            af_list_offset;
+	uint32_t            af_list_size;
+};
+
+/*
+ * NdisSetRequest message
+ */
+struct rndis_set_request {
+	/* RNDIS request ID */
+	uint32_t            request_id;
+	/* RNDIS OID */
+	uint32_t            oid;
+	uint32_t            info_buffer_length;
+	uint32_t            info_buffer_offset;
+	/* RNDIS handle */
+	uint32_t            device_vc_handle;
+};
+
+/*
+ * Response to NdisSetRequest
+ */
+struct rndis_set_complete {
+	/* RNDIS request ID */
+	uint32_t            request_id;
+	/* RNDIS status */
+	uint32_t            status;
+};
+
+/*
+ * NdisQueryRequest message
+ */
+struct rndis_query_request {
+	/* RNDIS request ID */
+	uint32_t            request_id;
+	/* RNDIS OID */
+	uint32_t            oid;
+	uint32_t            info_buffer_length;
+	uint32_t            info_buffer_offset;
+	/* RNDIS handle */
+	uint32_t            device_vc_handle;
+};
+
+/*
+ * Response to NdisQueryRequest
+ */
+struct rndis_query_complete {
+	/* RNDIS request ID */
+	uint32_t            request_id;
+	/* RNDIS status */
+	uint32_t            status;
+	uint32_t            info_buffer_length;
+	uint32_t            info_buffer_offset;
+};
+
+/*
+ * Data message. All offset fields contain byte offsets from the beginning
+ * of the rndis_packet structure. All length fields are in bytes.
+ * VcHandle is set to 0 for connectionless data, otherwise it
+ * contains the VC handle.
+ */
+struct rndis_packet {
+	uint32_t            data_offset;
+	uint32_t            data_length;
+	uint32_t            oob_data_offset;
+	uint32_t            oob_data_length;
+	uint32_t            num_oob_data_elements;
+	uint32_t            per_pkt_info_offset;
+	uint32_t            per_pkt_info_length;
+	/* RNDIS handle */
+	uint32_t            vc_handle;
+	uint32_t            reserved;
+};
+
+/*
+ * NdisHalt message
+ */
+struct rndis_halt_request {
+	/* RNDIS request ID */
+	uint32_t            request_id;
+};
+
+/*
+ * NdisMIndicateStatus message
+ */
+struct rndis_indicate_status {
+	/* RNDIS status */
+	uint32_t                                status;
+	uint32_t                                status_buf_length;
+	uint32_t                                status_buf_offset;
+};
+
+#define RNDIS_STATUS_MEDIA_CONNECT              (0x4001000BL)
+#define RNDIS_STATUS_MEDIA_DISCONNECT           (0x4001000CL)
+#define RNDIS_STATUS_INVALID_DATA               (0xC0010015L)
+
+/*
+ * union with all of the RNDIS messages
+ */
+union rndis_msg_container {
+	struct rndis_initialize_request                init_request;
+	struct rndis_initialize_complete               init_complete;
+	struct rndis_set_request                       set_request;
+	struct rndis_set_complete                      set_complete;
+	struct rndis_query_request                     query_request;
+	struct rndis_query_complete                    query_complete;
+	struct rndis_packet                            packet;
+	struct rndis_halt_request                      halt_request;
+	struct rndis_indicate_status                   indicate_status;
+#if 0
+	rndis_keepalive_request                 keepalive_request;
+	rndis_reset_request                     reset_request;
+	rndis_reset_complete                    reset_complete;
+	rndis_keepalive_complete                keepalive_complete;
+	rcondis_mp_create_vc                    co_miniport_create_vc;
+	rcondis_mp_delete_vc                    co_miniport_delete_vc;
+	rcondis_indicate_status                 co_miniport_status;
+	rcondis_mp_activate_vc_request          co_miniport_activate_vc;
+	rcondis_mp_deactivate_vc_request        co_miniport_deactivate_vc;
+	rcondis_mp_create_vc_complete           co_miniport_create_vc_complete;
+	rcondis_mp_delete_vc_complete           co_miniport_delete_vc_complete;
+	rcondis_mp_activate_vc_complete         co_miniport_activate_vc_complete;
+	rcondis_mp_deactivate_vc_complete       co_miniport_deactivate_vc_complete;
+#endif
+	uint32_t packet_ex[16]; /* to pad the union size */
+};
+
+struct rndis_msg {
+	uint32_t         ndis_msg_type;
+
+	/*
+	 * Total length of this message, from the beginning
+	 * of the rndis_msg struct, in bytes.
+	 */
+	uint32_t         msg_len;
+
+	/* Actual message */
+	union rndis_msg_container msg;
+};
+
+#define RNDIS_HEADER_SIZE (sizeof(struct rndis_msg) - sizeof(union rndis_msg_container))
+
+#define NDIS_PACKET_TYPE_DIRECTED       0x00000001
+#define NDIS_PACKET_TYPE_MULTICAST      0x00000002
+#define NDIS_PACKET_TYPE_ALL_MULTICAST  0x00000004
+#define NDIS_PACKET_TYPE_BROADCAST      0x00000008
+#define NDIS_PACKET_TYPE_SOURCE_ROUTING 0x00000010
+#define NDIS_PACKET_TYPE_PROMISCUOUS    0x00000020
+
+/*
+ * get the size of an RNDIS message. Pass in the message type,
+ * rndis_set_request, rndis_packet for example
+ */
+#define RNDIS_MESSAGE_SIZE(message) \
+	(sizeof(message) + (sizeof(struct rndis_msg) - sizeof(union rndis_msg_container)))
+
+
+/*
+ * Definitions from hv_rndis_filter.h
+ */
+enum {
+	RNDIS_DEV_UNINITIALIZED = 0,
+	RNDIS_DEV_INITIALIZING,
+	RNDIS_DEV_INITIALIZED,
+	RNDIS_DEV_DATAINITIALIZED,
+};
+
+struct rndis_request {
+	/* assumed a fixed size response here. */
+	struct rndis_msg    response_msg;
+
+	/* Simplify allocation by having a netvsc packet inline */
+	struct netvsc_packet pkt;
+	/* set additional buffer since packet can cross page boundary */
+	struct hv_vmbus_page_buffer buffer;
+	/* assumed a fixed size request here. */
+	struct rndis_msg    *request_msg;
+	const struct rte_memzone *request_msg_memzone;
+};
+
+struct rndis_filter_packet {
+	struct rndis_msg                       message;
+};
+
+#endif /* _HYPERV_DRV_H_ */
diff --git a/lib/librte_pmd_hyperv/hyperv_ethdev.c b/lib/librte_pmd_hyperv/hyperv_ethdev.c
new file mode 100644
index 0000000..7b909db
--- /dev/null
+++ b/lib/librte_pmd_hyperv/hyperv_ethdev.c
@@ -0,0 +1,332 @@
+/*-
+ * Copyright (c) 2013-2015 Brocade Communications Systems, Inc.
+ * All rights reserved.
+ */
+
+#include <assert.h>
+#include <unistd.h>
+#include "hyperv.h"
+
+static struct rte_vmbus_id vmbus_id_hyperv_map[] = {
+	{
+		.device_id = 0x0,
+	},
+};
+
+static void
+hyperv_dev_info_get(__rte_unused struct rte_eth_dev *dev,
+		struct rte_eth_dev_info *dev_info)
+{
+	PMD_INIT_FUNC_TRACE();
+	dev_info->max_rx_queues  = HV_MAX_RX_QUEUES;
+	dev_info->max_tx_queues  = HV_MAX_TX_QUEUES;
+	dev_info->min_rx_bufsize = HV_MIN_RX_BUF_SIZE;
+	dev_info->max_rx_pktlen  = HV_MAX_RX_PKT_LEN;
+	dev_info->max_mac_addrs  = HV_MAX_MAC_ADDRS;
+}
+
+inline int
+rte_hv_dev_atomic_write_link_status(struct rte_eth_dev *dev,
+		struct rte_eth_link *link)
+{
+	struct rte_eth_link *dst = &(dev->data->dev_link);
+	struct rte_eth_link *src = link;
+
+	if (rte_atomic64_cmpset((uint64_t *)dst, *(uint64_t *)dst,
+				*(uint64_t *)src) == 0)
+		return -1;
+
+	return 0;
+}
+
+inline int
+rte_hv_dev_atomic_read_link_status(struct rte_eth_dev *dev,
+		struct rte_eth_link *link)
+{
+	struct rte_eth_link *dst = link;
+	struct rte_eth_link *src = &(dev->data->dev_link);
+
+	if (rte_atomic64_cmpset((uint64_t *)dst, *(uint64_t *)dst,
+				*(uint64_t *)src) == 0)
+		return -1;
+
+	return 0;
+}
+
+/* return 0 means link status changed, -1 means not changed */
+static int
+hyperv_dev_link_update(struct rte_eth_dev *dev,
+		__rte_unused int wait_to_complete)
+{
+	uint8_t ret;
+	struct rte_eth_link old, link;
+	struct hv_data *hv = dev->data->dev_private;
+
+	PMD_INIT_FUNC_TRACE();
+	memset(&old, 0, sizeof(old));
+	memset(&link, 0, sizeof(link));
+	rte_hv_dev_atomic_read_link_status(dev, &old);
+	if (!hv->link_status && (hv->link_req_cnt == HV_MAX_LINK_REQ)) {
+		ret = hyperv_get_link_status(hv);
+		if (ret > 1)
+			return -1;
+		hv->link_req_cnt = 0;
+	}
+	link.link_duplex = ETH_LINK_FULL_DUPLEX;
+	link.link_speed = ETH_LINK_SPEED_10000;
+	link.link_status = hv->link_status;
+	hv->link_req_cnt++;
+	rte_hv_dev_atomic_write_link_status(dev, &link);
+
+	return (old.link_status == link.link_status) ? -1 : 0;
+}
+
+static int
+hyperv_dev_configure(struct rte_eth_dev *dev)
+{
+	struct hv_data *hv = dev->data->dev_private;
+	const struct rte_eth_rxmode *rxmode = &dev->data->dev_conf.rxmode;
+
+	PMD_INIT_FUNC_TRACE();
+
+	rte_memcpy(dev->data->mac_addrs->addr_bytes, hv->hw_mac_addr,
+			ETHER_ADDR_LEN);
+	hv->jumbo_frame_support = rxmode->jumbo_frame;
+
+	return 0;
+}
+
+static int
+hyperv_init(struct rte_eth_dev *dev)
+{
+	struct hv_data *hv = dev->data->dev_private;
+	struct rte_vmbus_device *vmbus_dev;
+
+	vmbus_dev = dev->vmbus_dev;
+	hv->uio_fd = vmbus_dev->uio_fd;
+	hv->kernel_initialized = 1;
+	hv->vmbus_device = vmbus_dev->id.device_id;
+	hv->monitor_bit = (uint8_t)(vmbus_dev->vmbus_monitor_id % 32);
+	hv->monitor_group = (uint8_t)(vmbus_dev->vmbus_monitor_id / 32);
+	PMD_PDEBUG_LOG(hv, DBG_LOAD, "hyperv_init for vmbus device %d",
+			vmbus_dev->id.device_id);
+
+	/* get the memory mappings */
+	hv->ring_pages = vmbus_dev->mem_resource[TXRX_RING_MAP].addr;
+	hv->int_page = vmbus_dev->mem_resource[INT_PAGE_MAP].addr;
+	hv->monitor_pages =
+		(struct hv_vmbus_monitor_page *)
+		vmbus_dev->mem_resource[MON_PAGE_MAP].addr;
+	hv->recv_buf = vmbus_dev->mem_resource[RECV_BUF_MAP].addr;
+	assert(hv->ring_pages);
+	assert(hv->int_page);
+	assert(hv->monitor_pages);
+	assert(hv->recv_buf);
+
+	/* separate send/recv int_pages */
+	hv->recv_interrupt_page = hv->int_page;
+
+	hv->send_interrupt_page =
+		((uint8_t *) hv->int_page + (PAGE_SIZE >> 1));
+
+	/* retrieve in/out ring_buffers */
+	hv->out = hv->ring_pages;
+	hv->in  = (void *)((uint64_t)hv->out +
+			(vmbus_dev->mem_resource[TXRX_RING_MAP].len / 2));
+	hv->rb_size = (vmbus_dev->mem_resource[TXRX_RING_MAP].len / 2);
+
+	dev->rx_pkt_burst = hyperv_recv_pkts;
+	dev->tx_pkt_burst = hyperv_xmit_pkts;
+
+	return hv_rf_on_device_add(hv);
+}
+
+#define HV_DEV_ID (hv->vmbus_device << 1)
+#define HV_MTU (dev->data->dev_conf.rxmode.max_rx_pkt_len << 9)
+
+static int
+hyperv_dev_start(struct rte_eth_dev *dev)
+{
+	int ret;
+	uint32_t cmd;
+	size_t bytes;
+	struct hv_data *hv = dev->data->dev_private;
+
+	PMD_INIT_FUNC_TRACE();
+	if (!hv->kernel_initialized) {
+		cmd = HV_DEV_ID | HV_MTU;
+		bytes = write(hv->uio_fd, &cmd, sizeof(uint32_t));
+		if (bytes < sizeof(uint32_t)) {
+			PMD_PERROR_LOG(hv, DBG_LOAD, "write on uio_fd %d failed",
+					hv->uio_fd);
+			return -1;
+		}
+		ret = vmbus_uio_map_resource(dev->vmbus_dev);
+		if (ret < 0) {
+			PMD_PERROR_LOG(hv, DBG_LOAD, "Failed to map resources");
+			return ret;
+		}
+		ret = hyperv_init(dev);
+		if (ret)
+			return ret;
+	}
+	ret = hv_rf_on_open(hv);
+	if (ret) {
+		PMD_PERROR_LOG(hv, DBG_LOAD, "hv_rf_on_open failed");
+		return ret;
+	}
+	hv->link_req_cnt = HV_MAX_LINK_REQ;
+
+	return ret;
+}
+
+static void
+hyperv_dev_stop(struct rte_eth_dev *dev)
+{
+	struct hv_data *hv = dev->data->dev_private;
+	uint32_t cmd;
+	size_t bytes;
+
+	PMD_INIT_FUNC_TRACE();
+	if (!hv->closed) {
+		hv_rf_on_close(hv);
+		hv_rf_on_device_remove(hv);
+		if (hv->kernel_initialized) {
+			cmd = 1 | HV_DEV_ID;
+			bytes = write(hv->uio_fd, &cmd, sizeof(uint32_t));
+			if (bytes)
+				hv->kernel_initialized = 0;
+			else
+				PMD_PWARN_LOG(hv, DBG_LOAD, "write to uio_fd %d failed: (%zu)b",
+						hv->uio_fd, bytes);
+		}
+		hv->link_status = 0;
+	}
+}
+
+static void
+hyperv_dev_close(struct rte_eth_dev *dev)
+{
+	PMD_INIT_FUNC_TRACE();
+	hyperv_dev_stop(dev);
+}
+
+static void
+hyperv_dev_promisc_enable(struct rte_eth_dev *dev)
+{
+	struct hv_data *hv = dev->data->dev_private;
+
+	PMD_INIT_FUNC_TRACE();
+	hyperv_set_rx_mode(hv, 1, dev->data->all_multicast);
+}
+
+static void
+hyperv_dev_promisc_disable(struct rte_eth_dev *dev)
+{
+	struct hv_data *hv = dev->data->dev_private;
+
+	PMD_INIT_FUNC_TRACE();
+	hyperv_set_rx_mode(hv, 0, dev->data->all_multicast);
+}
+
+static void
+hyperv_dev_allmulticast_enable(struct rte_eth_dev *dev)
+{
+	struct hv_data *hv = dev->data->dev_private;
+
+	PMD_INIT_FUNC_TRACE();
+	hyperv_set_rx_mode(hv, dev->data->promiscuous, 1);
+}
+
+static void
+hyperv_dev_allmulticast_disable(struct rte_eth_dev *dev)
+{
+	struct hv_data *hv = dev->data->dev_private;
+
+	PMD_INIT_FUNC_TRACE();
+	hyperv_set_rx_mode(hv, dev->data->promiscuous, 0);
+}
+
+static void
+hyperv_dev_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
+{
+	struct hv_data *hv = dev->data->dev_private;
+	struct hv_stats *st = &hv->stats;
+
+	PMD_INIT_FUNC_TRACE();
+
+	memset(stats, 0, sizeof(struct rte_eth_stats));
+
+	stats->opackets = st->opkts;
+	stats->obytes = st->obytes;
+	stats->oerrors = st->oerrors;
+	stats->ipackets = st->ipkts;
+	stats->ibytes = st->ibytes;
+	stats->ierrors = st->ierrors;
+	stats->rx_nombuf = st->rx_nombuf;
+}
+
+static struct eth_dev_ops hyperv_eth_dev_ops = {
+	.dev_configure		= hyperv_dev_configure,
+	.dev_start		= hyperv_dev_start,
+	.dev_stop		= hyperv_dev_stop,
+	.dev_infos_get		= hyperv_dev_info_get,
+	.rx_queue_release	= hyperv_dev_rx_queue_release,
+	.tx_queue_release	= hyperv_dev_tx_queue_release,
+	.rx_queue_setup		= hyperv_dev_rx_queue_setup,
+	.tx_queue_setup		= hyperv_dev_tx_queue_setup,
+	.dev_close		= hyperv_dev_close,
+	.promiscuous_enable	= hyperv_dev_promisc_enable,
+	.promiscuous_disable	= hyperv_dev_promisc_disable,
+	.allmulticast_enable	= hyperv_dev_allmulticast_enable,
+	.allmulticast_disable	= hyperv_dev_allmulticast_disable,
+	.link_update		= hyperv_dev_link_update,
+	.stats_get		= hyperv_dev_stats_get,
+};
+
+static int
+eth_hyperv_dev_init(struct rte_eth_dev *eth_dev)
+{
+	int ret;
+
+	PMD_INIT_FUNC_TRACE();
+
+	eth_dev->dev_ops = &hyperv_eth_dev_ops;
+	eth_dev->data->mac_addrs = rte_malloc("mac_addrs",
+					      sizeof(struct ether_addr),
+					      RTE_CACHE_LINE_SIZE);
+	if (!eth_dev->data->mac_addrs) {
+		PMD_PERROR_LOG(hv, DBG_LOAD, "unable to allocate memory for mac addrs");
+		return -1;
+	}
+
+	ret = hyperv_init(eth_dev);
+
+	return ret;
+}
+
+static struct eth_driver rte_hyperv_pmd = {
+	.vmbus_drv = {
+		.name = "rte_hyperv_pmd",
+		.module_name = "hv_uio",
+		.id_table = vmbus_id_hyperv_map,
+	},
+	.bus_type = RTE_BUS_VMBUS,
+	.eth_dev_init = eth_hyperv_dev_init,
+	.dev_private_size = sizeof(struct hv_data),
+};
+
+static int
+rte_hyperv_pmd_init(const char *name __rte_unused,
+		    const char *param __rte_unused)
+{
+	rte_eth_driver_register(&rte_hyperv_pmd);
+	return 0;
+}
+
+static struct rte_driver rte_hyperv_driver = {
+	.type = PMD_PDEV,
+	.init = rte_hyperv_pmd_init,
+};
+
+PMD_REGISTER_DRIVER(rte_hyperv_driver);
diff --git a/lib/librte_pmd_hyperv/hyperv_logs.h b/lib/librte_pmd_hyperv/hyperv_logs.h
new file mode 100644
index 0000000..1b96468
--- /dev/null
+++ b/lib/librte_pmd_hyperv/hyperv_logs.h
@@ -0,0 +1,69 @@
+/*-
+ *   Copyright(c) 2013-2015 Brocade Communications Systems, Inc.
+ *   All rights reserved.
+ */
+
+#ifndef _HYPERV_LOGS_H_
+#define _HYPERV_LOGS_H_
+
+#ifdef RTE_LIBRTE_HV_DEBUG_INIT
+#define PMD_INIT_LOG(level, fmt, args...) \
+	RTE_LOG(level, PMD, "%s(): " fmt "\n", __func__, ## args)
+#define PMD_INIT_FUNC_TRACE() PMD_INIT_LOG(DEBUG, " >>")
+#else
+#define PMD_INIT_LOG(level, fmt, args...) do { } while (0)
+#define PMD_INIT_FUNC_TRACE() do { } while (0)
+#endif
+
+#ifdef RTE_LIBRTE_HV_DEBUG
+
+#define RTE_DBG_LOAD   INIT
+#define RTE_DBG_STATS  STATS
+#define RTE_DBG_TX     TX
+#define RTE_DBG_RX     RX
+#define RTE_DBG_MBUF   MBUF
+#define RTE_DBG_ASSERT ASRT
+#define RTE_DBG_RB     RB
+#define RTE_DBG_VMBUS  VMBUS
+#define RTE_DBG_ALL    ALL
+
+#define STR(x) #x
+
+#define HV_RTE_LOG(hv, codepath, level, fmt, args...) \
+	RTE_LOG(level, PMD, "[%d]: %-6s: %s: " fmt "\n", \
+		hv->vmbus_device, STR(codepath), __func__, ## args)
+
+#define PMD_PDEBUG_LOG(hv, codepath, fmt, args...) \
+do { \
+	if (unlikely(hv->debug & (codepath))) \
+		HV_RTE_LOG(hv, RTE_##codepath, DEBUG, fmt, ## args) \
+} while (0)
+
+#define PMD_PINFO_LOG(hv, codepath, fmt, args...) \
+do { \
+	if (unlikely(hv->debug & (codepath))) \
+		HV_RTE_LOG(hv, RTE_##codepath, INFO, fmt, ## args) \
+} while (0)
+
+#define PMD_PWARN_LOG(hv, codepath, fmt, args...) \
+do { \
+	if (unlikely(hv->debug & (codepath))) \
+		HV_RTE_LOG(hv, RTE_##codepath, WARNING, fmt, ## args) \
+} while (0)
+
+#define PMD_PERROR_LOG(hv, codepath, fmt, args...) \
+do { \
+	if (unlikely(hv->debug & (codepath))) \
+		HV_RTE_LOG(hv, RTE_##codepath, ERR, fmt, ## args) \
+} while (0)
+#else
+#define HV_RTE_LOG(level, fmt, args...) do { } while (0)
+#define PMD_PDEBUG_LOG(fmt, args...) do { } while (0)
+#define PMD_PINFO_LOG(fmt, args...) do { } while (0)
+#define PMD_PWARN_LOG(fmt, args...) do { } while (0)
+#define PMD_PERROR_LOG(fmt, args...) do { } while (0)
+#undef RTE_LIBRTE_HV_DEBUG_TX
+#undef RTE_LIBRTE_HV_DEBUG_RX
+#endif
+
+#endif /* _HYPERV_LOGS_H_ */
diff --git a/lib/librte_pmd_hyperv/hyperv_rxtx.c b/lib/librte_pmd_hyperv/hyperv_rxtx.c
new file mode 100644
index 0000000..9e423d0
--- /dev/null
+++ b/lib/librte_pmd_hyperv/hyperv_rxtx.c
@@ -0,0 +1,403 @@
+/*-
+ *   Copyright(c) 2013-2015 Brocade Communications Systems, Inc.
+ *   All rights reserved.
+ */
+
+#include "hyperv.h"
+#include "hyperv_rxtx.h"
+#include "hyperv_drv.h"
+
+#define RTE_MBUF_DATA_DMA_ADDR(mb) \
+	((uint64_t)((mb)->buf_physaddr + (mb)->data_off))
+
+#define RPPI_SIZE	(sizeof(struct rndis_per_packet_info)\
+			 + sizeof(struct ndis_8021q_info))
+#define RNDIS_OFF	(sizeof(struct netvsc_packet) + RPPI_SIZE)
+#define TX_PKT_SIZE	(RNDIS_OFF + sizeof(struct rndis_filter_packet) * 2)
+
+static inline struct rte_mbuf *
+hv_rxmbuf_alloc(struct rte_mempool *mp)
+{
+	return	__rte_mbuf_raw_alloc(mp);
+}
+
+static inline int
+hyperv_has_rx_work(struct hv_data *hv)
+{
+	return hv->in->read_index != hv->in->write_index;
+}
+
+#ifndef DEFAULT_TX_FREE_THRESHOLD
+#define DEFAULT_TX_FREE_THRESHOLD 32
+#endif
+
+int
+hyperv_dev_tx_queue_setup(struct rte_eth_dev *dev,
+			  uint16_t queue_idx,
+			  uint16_t nb_desc,
+			  unsigned int socket_id,
+			  const struct rte_eth_txconf *tx_conf)
+
+{
+	struct hv_data *hv = dev->data->dev_private;
+	const struct rte_memzone *tz;
+	struct hv_tx_queue *txq;
+	char tz_name[RTE_MEMZONE_NAMESIZE];
+	uint32_t i, delta = 0, new_delta;
+	struct netvsc_packet *pkt;
+
+	PMD_INIT_FUNC_TRACE();
+
+	txq = rte_zmalloc_socket("ethdev TX queue", sizeof(struct hv_tx_queue),
+				 RTE_CACHE_LINE_SIZE, socket_id);
+	if (txq == NULL) {
+		PMD_PERROR_LOG(hv, DBG_LOAD, "rte_zmalloc for tx_queue failed");
+		return -ENOMEM;
+	}
+
+	if (tx_conf->tx_free_thresh >= nb_desc) {
+		PMD_PERROR_LOG(hv, DBG_LOAD,
+			       "tx_free_thresh should be less then nb_desc");
+		return -EINVAL;
+	}
+	txq->tx_free_thresh = (tx_conf->tx_free_thresh ? tx_conf->tx_free_thresh :
+			       DEFAULT_TX_FREE_THRESHOLD);
+	txq->pkts = rte_calloc_socket("TX pkts", sizeof(void*), nb_desc,
+				       RTE_CACHE_LINE_SIZE, socket_id);
+	if (txq->pkts == NULL) {
+		PMD_PERROR_LOG(hv, DBG_LOAD, "rte_zmalloc for pkts failed");
+		return -ENOMEM;
+	}
+	sprintf(tz_name, "hv_%d_%u_%u", hv->vmbus_device, queue_idx, socket_id);
+	tz = rte_memzone_reserve_aligned(tz_name,
+					 (uint32_t)nb_desc * TX_PKT_SIZE,
+					 rte_lcore_to_socket_id(rte_lcore_id()),
+					 0, PAGE_SIZE);
+	if (tz == NULL) {
+		PMD_PERROR_LOG(hv, DBG_LOAD, "netvsc packet ring alloc fail");
+		return -ENOMEM;
+	}
+	for (i = 0; i < nb_desc; i++) {
+		pkt = txq->pkts[i] = (struct netvsc_packet *)((uint8_t *)tz->addr +
+							      i * TX_PKT_SIZE + delta);
+		pkt->extension = (uint8_t *)tz->addr + i * TX_PKT_SIZE + RNDIS_OFF + delta;
+		if (!pkt->extension) {
+			PMD_PERROR_LOG(hv, DBG_TX,
+				       "pkt->extension is NULL for %d-th pkt", i);
+			return -EINVAL;
+		}
+		pkt->extension_phys_addr =
+			tz->phys_addr + i * TX_PKT_SIZE + RNDIS_OFF + delta;
+		pkt->ext_pages = 1;
+		pkt->page_buffers[0].pfn = pkt->extension_phys_addr >> PAGE_SHIFT;
+		pkt->page_buffers[0].offset =
+			(unsigned long)pkt->extension & (PAGE_SIZE - 1);
+		pkt->page_buffers[0].length = RNDIS_MESSAGE_SIZE(struct rndis_packet);
+		if (pkt->page_buffers[0].offset + pkt->page_buffers[0].length
+		    > PAGE_SIZE) {
+			new_delta = PAGE_SIZE - pkt->page_buffers[0].offset;
+			pkt->page_buffers[0].pfn++;
+			delta += new_delta;
+			pkt->page_buffers[0].offset = 0;
+			pkt->extension = (uint8_t *)pkt->extension + new_delta;
+			pkt->extension_phys_addr += new_delta;
+		}
+	}
+	txq->sw_ring = rte_calloc_socket("txq_sw_ring",
+					 sizeof(struct rte_mbuf *), nb_desc,
+					 RTE_CACHE_LINE_SIZE, socket_id);
+	if (txq->sw_ring == NULL) {
+		hyperv_dev_tx_queue_release(txq);
+		return -ENOMEM;
+	}
+	txq->port_id = dev->data->port_id;
+	txq->nb_tx_desc = txq->tx_avail = nb_desc;
+	txq->tx_free_thresh = tx_conf->tx_free_thresh;
+	txq->hv = hv;
+	dev->data->tx_queues[queue_idx] = txq;
+	hv->txq = txq;
+
+	return 0;
+}
+
+void
+hyperv_dev_tx_queue_release(void *ptxq)
+{
+	struct hv_tx_queue *txq = ptxq;
+
+	PMD_INIT_FUNC_TRACE();
+	if (txq == NULL)
+		return;
+	rte_free(txq->sw_ring);
+	rte_free(txq->pkts);
+	rte_free(txq);
+}
+
+int
+hyperv_dev_rx_queue_setup(struct rte_eth_dev *dev,
+			  uint16_t queue_idx,
+			  uint16_t nb_desc,
+			  unsigned int socket_id,
+			  const struct rte_eth_rxconf *rx_conf,
+			  struct rte_mempool *mp)
+{
+	uint16_t i;
+	struct hv_rx_queue *rxq;
+	struct rte_mbuf *mbuf;
+	struct hv_data *hv = dev->data->dev_private;
+
+	PMD_INIT_FUNC_TRACE();
+
+	rxq = rte_zmalloc_socket("ethdev RX queue", sizeof(struct hv_rx_queue),
+				 RTE_CACHE_LINE_SIZE, socket_id);
+	if (rxq == NULL) {
+		PMD_PERROR_LOG(hv, DBG_LOAD,
+			       "rte_zmalloc for rx_queue failed!");
+		return -ENOMEM;
+	}
+	hv->desc = rxq->desc = rte_zmalloc_socket(NULL, PAGE_SIZE,
+						  RTE_CACHE_LINE_SIZE, socket_id);
+	if (rxq->desc == NULL) {
+		PMD_PERROR_LOG(hv, DBG_LOAD,
+			       "rte_zmalloc for vmbus_desc failed!");
+		hyperv_dev_rx_queue_release(rxq);
+		return -ENOMEM;
+	}
+	rxq->sw_ring = rte_calloc_socket("rxq->sw_ring",
+					 sizeof(struct mbuf *), nb_desc,
+					 RTE_CACHE_LINE_SIZE, socket_id);
+	if (rxq->sw_ring == NULL) {
+		hyperv_dev_rx_queue_release(rxq);
+		return -ENOMEM;
+	}
+
+	for (i = 0; i < nb_desc; i++) {
+		mbuf = hv_rxmbuf_alloc(mp);
+		if (mbuf == NULL) {
+			PMD_PERROR_LOG(hv, DBG_LOAD, "RX mbuf alloc failed");
+			return -ENOMEM;
+		}
+
+		mbuf->nb_segs = 1;
+		mbuf->next = NULL;
+		mbuf->port = rxq->port_id;
+		rxq->sw_ring[i] = mbuf;
+	}
+
+	rxq->mb_pool = mp;
+	rxq->nb_rx_desc = nb_desc;
+	rxq->rx_head = 0;
+	rxq->rx_tail = 0;
+	rxq->rx_free_thresh = rx_conf->rx_free_thresh;
+	rxq->port_id = dev->data->port_id;
+	rxq->hv = hv;
+	dev->data->rx_queues[queue_idx] = rxq;
+	hv->rxq = rxq;
+	hv->max_rx_pkt_len = mp->elt_size -
+		(sizeof(struct rte_mbuf) + RTE_PKTMBUF_HEADROOM);
+
+	return 0;
+}
+
+void
+hyperv_dev_rx_queue_release(void *prxq)
+{
+	struct hv_rx_queue *rxq = prxq;
+
+	PMD_INIT_FUNC_TRACE();
+	if (rxq == NULL)
+		return;
+	rte_free(rxq->sw_ring);
+	rte_free(rxq->desc);
+	rte_free(rxq);
+}
+
+uint16_t
+hyperv_recv_pkts(void *prxq, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
+{
+	struct hv_rx_queue *rxq = prxq;
+	struct hv_data *hv = rxq->hv;
+	struct rte_mbuf *new_mb, *rx_mbuf, *first_mbuf;
+	uint16_t nb_rx = 0;
+	uint16_t segs, i;
+
+	if (unlikely(hv->closed))
+		return 0;
+
+	nb_pkts = MIN(nb_pkts, HV_MAX_PKT_BURST);
+	hyperv_scan_comps(hv, 0);
+
+	while (nb_rx < nb_pkts) {
+		/*
+		 * if there are no mbufs in sw_ring,
+		 * we need to trigger receive procedure
+		 */
+		if (rxq->rx_head == rxq->rx_tail) {
+			if (!hyperv_has_rx_work(hv))
+				break;
+
+			if (unlikely(!hyperv_get_buffer(hv, rxq->desc, PAGE_SIZE))) {
+				hyperv_scan_comps(hv, 0);
+				continue;
+			}
+		}
+
+		/*
+		 * Now the received data is in sw_ring of our rxq
+		 * we need to extract it and replace in sw_ring with new mbuf
+		 */
+		rx_mbuf = first_mbuf = rxq->sw_ring[rxq->rx_head];
+		segs = first_mbuf->nb_segs;
+		for (i = 0; i < segs; ++i) {
+			new_mb = hv_rxmbuf_alloc(rxq->mb_pool);
+			if (unlikely(!new_mb)) {
+				PMD_PERROR_LOG(hv, DBG_RX, "mbuf alloc fail");
+				++hv->stats.rx_nombuf;
+				return nb_rx;
+			}
+
+			rx_mbuf = rxq->sw_ring[rxq->rx_head];
+			rxq->sw_ring[rxq->rx_head] = new_mb;
+
+			if (++rxq->rx_head == rxq->nb_rx_desc)
+				rxq->rx_head = 0;
+
+			rx_mbuf->ol_flags |= PKT_RX_IPV4_HDR;
+			rx_mbuf->port = rxq->port_id;
+		}
+		rx_mbuf->next = NULL;
+
+		rx_pkts[nb_rx++] = first_mbuf;
+		++hv->stats.ipkts;
+		hv->stats.ibytes += first_mbuf->pkt_len;
+	}
+
+	return nb_rx;
+}
+
+static void hyperv_txeof(struct hv_tx_queue *txq)
+{
+	struct rte_mbuf *mb, *mb_next;
+
+	txq->tx_avail += txq->tx_free;
+	while (txq->tx_free) {
+		--txq->tx_free;
+		mb = txq->sw_ring[txq->tx_head];
+		while (mb) {
+			mb_next = mb->next;
+			rte_mempool_put(mb->pool, mb);
+			mb = mb_next;
+		}
+		if (++txq->tx_head == txq->nb_tx_desc)
+			txq->tx_head = 0;
+	}
+}
+
+uint16_t
+hyperv_xmit_pkts(void *ptxq, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	struct hv_tx_queue *txq = ptxq;
+	struct hv_data *hv = txq->hv;
+	struct netvsc_packet *packet;
+	struct rte_mbuf *m;
+	uint32_t data_pages;
+	uint64_t first_data_page;
+	uint32_t total_len;
+	uint32_t len;
+	uint16_t i, nb_tx;
+	uint8_t rndis_pages;
+	int ret;
+
+	if (unlikely(hv->closed))
+		return 0;
+
+	for (nb_tx = 0; nb_tx < nb_pkts; ++nb_tx) {
+		hyperv_scan_comps(hv, 0);
+		/* Determine if the descriptor ring needs to be cleaned. */
+		if (txq->tx_free > txq->tx_free_thresh)
+			hyperv_txeof(txq);
+
+		if (!txq->tx_avail) {
+			hyperv_scan_comps(hv, 1);
+			hyperv_txeof(txq);
+			if (!txq->tx_avail) {
+				PMD_PWARN_LOG(hv, DBG_TX, "No TX mbuf available");
+				break;
+			}
+		}
+		m = tx_pkts[nb_tx];
+		len = m->data_len;
+		total_len = m->pkt_len;
+		first_data_page = RTE_MBUF_DATA_DMA_ADDR(m) >> PAGE_SHIFT;
+		data_pages = ((RTE_MBUF_DATA_DMA_ADDR(m) + len - 1) >> PAGE_SHIFT) -
+			first_data_page + 1;
+
+		packet = txq->pkts[txq->tx_tail];
+		rndis_pages = packet->ext_pages;
+
+		txq->sw_ring[txq->tx_tail] = m;
+		packet->tot_data_buf_len = total_len;
+		packet->page_buffers[rndis_pages].pfn =
+			RTE_MBUF_DATA_DMA_ADDR(m) >> PAGE_SHIFT;
+		packet->page_buffers[rndis_pages].offset =
+			RTE_MBUF_DATA_DMA_ADDR(m) & (PAGE_SIZE - 1);
+		if (data_pages == 1)
+			packet->page_buffers[rndis_pages].length = len;
+		else
+			packet->page_buffers[rndis_pages].length = PAGE_SIZE -
+				packet->page_buffers[rndis_pages].offset;
+
+		for (i = 1; i < data_pages; ++i) {
+			packet->page_buffers[rndis_pages + i].pfn = first_data_page + i;
+			packet->page_buffers[rndis_pages + i].offset = 0;
+			packet->page_buffers[rndis_pages + i].length = PAGE_SIZE;
+		}
+		if (data_pages > 1)
+			packet->page_buffers[rndis_pages - 1 + data_pages].length =
+				((rte_pktmbuf_mtod(m, unsigned long) + len - 1)
+				 & (PAGE_SIZE - 1)) + 1;
+
+		uint16_t index = data_pages + rndis_pages;
+
+		for (i = 1; i < m->nb_segs; ++i) {
+			m = m->next;
+			len = m->data_len;
+			first_data_page = RTE_MBUF_DATA_DMA_ADDR(m) >> PAGE_SHIFT;
+			data_pages = ((RTE_MBUF_DATA_DMA_ADDR(m) + len - 1) >> PAGE_SHIFT) -
+				first_data_page + 1;
+			packet->page_buffers[index].pfn =
+				RTE_MBUF_DATA_DMA_ADDR(m) >> PAGE_SHIFT;
+			packet->page_buffers[index].offset =
+				rte_pktmbuf_mtod(m, unsigned long)
+				& (PAGE_SIZE - 1);
+			packet->page_buffers[index].length = m->data_len;
+			if (data_pages > 1) {
+				/* It can be 2 in case of usual mbuf_size=2048 */
+				packet->page_buffers[index].length = PAGE_SIZE -
+					packet->page_buffers[index].offset;
+				packet->page_buffers[++index].offset = 0;
+				packet->page_buffers[index].pfn =
+					packet->page_buffers[index - 1].pfn + 1;
+				packet->page_buffers[index].length =
+					m->data_len
+					- packet->page_buffers[index - 1].length;
+			}
+			++index;
+		}
+		packet->page_buf_count = index;
+
+		ret = hv_rf_on_send(hv, packet);
+		if (likely(ret == 0)) {
+			++hv->stats.opkts;
+			hv->stats.obytes += total_len;
+			if (++txq->tx_tail == txq->nb_tx_desc)
+				txq->tx_tail = 0;
+			--txq->tx_avail;
+		} else {
+			++hv->stats.oerrors;
+			PMD_PERROR_LOG(hv, DBG_TX, "TX ring buffer is busy");
+		}
+	}
+
+	return nb_tx;
+}
diff --git a/lib/librte_pmd_hyperv/hyperv_rxtx.h b/lib/librte_pmd_hyperv/hyperv_rxtx.h
new file mode 100644
index 0000000..c45a704
--- /dev/null
+++ b/lib/librte_pmd_hyperv/hyperv_rxtx.h
@@ -0,0 +1,35 @@
+/*-
+ *   Copyright(c) 2013-2015 Brocade Communications Systems, Inc.
+ *   All rights reserved.
+ */
+
+/**
+ * Structure associated with each TX queue.
+ */
+struct hv_tx_queue {
+	struct netvsc_packet    **pkts;
+	struct rte_mbuf         **sw_ring;
+	uint16_t                nb_tx_desc;
+	uint16_t                tx_avail;
+	uint16_t                tx_head;
+	uint16_t                tx_tail;
+	uint16_t                tx_free_thresh;
+	uint16_t                tx_free;
+	uint8_t                 port_id;
+	struct hv_data          *hv;
+} __rte_cache_aligned;
+
+/**
+ * Structure associated with each RX queue.
+ */
+struct hv_rx_queue {
+	struct rte_mempool      *mb_pool;
+	struct rte_mbuf         **sw_ring;
+	uint16_t                nb_rx_desc;
+	uint16_t                rx_head;
+	uint16_t                rx_tail;
+	uint16_t                rx_free_thresh;
+	uint8_t                 port_id;
+	struct hv_data          *hv;
+	struct hv_vm_packet_descriptor *desc;
+} __rte_cache_aligned;
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 62a76ae..e0416d1 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -133,6 +133,10 @@ LDLIBS += -lm
 LDLIBS += -lrt
 endif
 
+ifeq ($(CONFIG_RTE_LIBRTE_HV_PMD),y)
+LDLIBS += -lrte_pmd_hyperv
+endif
+
 ifeq ($(CONFIG_RTE_LIBRTE_VHOST), y)
 LDLIBS += -lrte_vhost
 endif
-- 
2.1.4

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [dpdk-dev] [PATCH v4 5/7] hv: poll mode driver
  2015-04-21 17:32 ` [dpdk-dev] [PATCH v4 5/7] hv: poll mode driver Stephen Hemminger
@ 2015-04-21 19:34   ` Butler, Siobhan A
  2015-04-21 21:35     ` Stephen Hemminger
  2015-07-09  0:05   ` Thomas Monjalon
  1 sibling, 1 reply; 17+ messages in thread
From: Butler, Siobhan A @ 2015-04-21 19:34 UTC (permalink / raw)
  To: Stephen Hemminger, alexmay; +Cc: dev, Stas Egorov, Stephen Hemminger

Hi Stephen 
Will you have documentation to go along with these changes?
Thanks
Siobhan

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Stephen
> Hemminger
> Sent: Tuesday, April 21, 2015 6:33 PM
> To: alexmay@microsoft.com
> Cc: dev@dpdk.org; Stas Egorov; Stephen Hemminger
> Subject: [dpdk-dev] [PATCH v4 5/7] hv: poll mode driver
> 
> From: Stephen Hemminger <shemming@brocade.com>
> 
> This is new Poll Mode driver for using hyper-v virtual network
> interface.
> 
> Signed-off-by: Stas Egorov <segorov@mirantis.com>
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
>  lib/Makefile                          |    1 +
>  lib/librte_pmd_hyperv/Makefile        |   28 +
>  lib/librte_pmd_hyperv/hyperv.h        |  169 ++++
>  lib/librte_pmd_hyperv/hyperv_drv.c    | 1653
> +++++++++++++++++++++++++++++++++
>  lib/librte_pmd_hyperv/hyperv_drv.h    |  558 +++++++++++
>  lib/librte_pmd_hyperv/hyperv_ethdev.c |  332 +++++++
>  lib/librte_pmd_hyperv/hyperv_logs.h   |   69 ++
>  lib/librte_pmd_hyperv/hyperv_rxtx.c   |  403 ++++++++
>  lib/librte_pmd_hyperv/hyperv_rxtx.h   |   35 +
>  mk/rte.app.mk                         |    4 +
>  10 files changed, 3252 insertions(+)
>  create mode 100644 lib/librte_pmd_hyperv/Makefile
>  create mode 100644 lib/librte_pmd_hyperv/hyperv.h
>  create mode 100644 lib/librte_pmd_hyperv/hyperv_drv.c
>  create mode 100644 lib/librte_pmd_hyperv/hyperv_drv.h
>  create mode 100644 lib/librte_pmd_hyperv/hyperv_ethdev.c
>  create mode 100644 lib/librte_pmd_hyperv/hyperv_logs.h
>  create mode 100644 lib/librte_pmd_hyperv/hyperv_rxtx.c
>  create mode 100644 lib/librte_pmd_hyperv/hyperv_rxtx.h
> 
> diff --git a/lib/Makefile b/lib/Makefile
> index d94355d..6c1daf2 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -47,6 +47,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_I40E_PMD) +=
> librte_pmd_i40e
>  DIRS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += librte_pmd_fm10k
>  DIRS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += librte_pmd_mlx4
>  DIRS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += librte_pmd_enic
> +DIRS-$(CONFIG_RTE_LIBRTE_HV_PMD) += librte_pmd_hyperv
>  DIRS-$(CONFIG_RTE_LIBRTE_PMD_BOND) += librte_pmd_bond
>  DIRS-$(CONFIG_RTE_LIBRTE_PMD_RING) += librte_pmd_ring
>  DIRS-$(CONFIG_RTE_LIBRTE_PMD_PCAP) += librte_pmd_pcap
> diff --git a/lib/librte_pmd_hyperv/Makefile
> b/lib/librte_pmd_hyperv/Makefile
> new file mode 100644
> index 0000000..4ba08c8
> --- /dev/null
> +++ b/lib/librte_pmd_hyperv/Makefile
> @@ -0,0 +1,28 @@
> +#   BSD LICENSE
> +#
> +#   Copyright(c) 2013-2015 Brocade Communications Systems, Inc.
> +#   All rights reserved.
> +
> +include $(RTE_SDK)/mk/rte.vars.mk
> +
> +#
> +# library name
> +#
> +LIB = librte_pmd_hyperv.a
> +
> +CFLAGS += -O3
> +CFLAGS += $(WERROR_FLAGS)
> +
> +#
> +# all source are stored in SRCS-y
> +#
> +SRCS-$(CONFIG_RTE_LIBRTE_HV_PMD) += hyperv_ethdev.c
> +SRCS-$(CONFIG_RTE_LIBRTE_HV_PMD) += hyperv_rxtx.c
> +SRCS-$(CONFIG_RTE_LIBRTE_HV_PMD) += hyperv_drv.c
> +
> +# this lib depends upon:
> +DEPDIRS-$(CONFIG_RTE_LIBRTE_HV_PMD) += lib/librte_eal lib/librte_ether
> +DEPDIRS-$(CONFIG_RTE_LIBRTE_HV_PMD) += lib/librte_mempool
> lib/librte_mbuf
> +DEPDIRS-$(CONFIG_RTE_LIBRTE_HV_PMD) += lib/librte_malloc
> +
> +include $(RTE_SDK)/mk/rte.lib.mk
> diff --git a/lib/librte_pmd_hyperv/hyperv.h
> b/lib/librte_pmd_hyperv/hyperv.h
> new file mode 100644
> index 0000000..5f66d8a
> --- /dev/null
> +++ b/lib/librte_pmd_hyperv/hyperv.h
> @@ -0,0 +1,169 @@
> +/*-
> + * Copyright (c) 2013-2015 Brocade Communications Systems, Inc.
> + * All rights reserved.
> + */
> +
> +#ifndef _HYPERV_H_
> +#define _HYPERV_H_
> +
> +#include <sys/param.h>
> +#include <rte_log.h>
> +#include <rte_debug.h>
> +#include <rte_ether.h>
> +#include <rte_ethdev.h>
> +#include <rte_malloc.h>
> +#include <rte_memzone.h>
> +#include <rte_cycles.h>
> +#include <rte_dev.h>
> +
> +#include "hyperv_logs.h"
> +
> +#define PAGE_SHIFT		12
> +#define PAGE_SIZE		(1 << PAGE_SHIFT)
> +
> +/*
> + * Tunable ethdev params
> + */
> +#define HV_MIN_RX_BUF_SIZE 1024
> +#define HV_MAX_RX_PKT_LEN  4096
> +#define HV_MAX_MAC_ADDRS   1
> +#define HV_MAX_RX_QUEUES   1
> +#define HV_MAX_TX_QUEUES   1
> +#define HV_MAX_PKT_BURST   32
> +#define HV_MAX_LINK_REQ    10
> +
> +/*
> + * List of resources mapped from kspace
> + * need to be the same as defined in hv_uio.c
> + */
> +enum {
> +	TXRX_RING_MAP,
> +	INT_PAGE_MAP,
> +	MON_PAGE_MAP,
> +	RECV_BUF_MAP
> +};
> +
> +/*
> + * Statistics
> + */
> +struct hv_stats {
> +	uint64_t opkts;
> +	uint64_t obytes;
> +	uint64_t oerrors;
> +
> +	uint64_t ipkts;
> +	uint64_t ibytes;
> +	uint64_t ierrors;
> +	uint64_t rx_nombuf;
> +};
> +
> +struct hv_data;
> +struct netvsc_packet;
> +struct rndis_msg;
> +typedef void (*receive_callback_t)(struct hv_data *hv, struct rndis_msg
> *msg,
> +		struct netvsc_packet *pkt);
> +
> +/*
> + * Main driver structure
> + */
> +struct hv_data {
> +	int vmbus_device;
> +	uint8_t monitor_bit;
> +	uint8_t monitor_group;
> +	uint8_t kernel_initialized;
> +	int uio_fd;
> +	/* Flag indicates channel state. If closed, RX/TX shouldn't work
> further */
> +	uint8_t closed;
> +	/* Flag indicates whether HALT rndis request was received by host */
> +	uint8_t hlt_req_sent;
> +	/* Flag indicates pending state for HALT request */
> +	uint8_t hlt_req_pending;
> +	/* Counter for RNDIS requests */
> +	uint32_t new_request_id;
> +	/* State of RNDIS device */
> +	uint8_t rndis_dev_state;
> +	/* Number of transmitted packets but not completed yet by Hyper-V
> */
> +	int num_outstanding_sends;
> +	/* Max pkt len to fit in rx mbufs */
> +	uint32_t max_rx_pkt_len;
> +
> +	uint8_t jumbo_frame_support;
> +
> +	struct hv_vmbus_ring_buffer *in;
> +	struct hv_vmbus_ring_buffer *out;
> +
> +	/* Size of each ring_buffer(in/out) */
> +	uint32_t rb_size;
> +	/* Size of data in each ring_buffer(in/out) */
> +	uint32_t rb_data_size;
> +
> +	void *int_page;
> +	struct hv_vmbus_monitor_page *monitor_pages;
> +	void *recv_interrupt_page;
> +	void *send_interrupt_page;
> +	void *ring_pages;
> +	void *recv_buf;
> +
> +	uint8_t link_req_cnt;
> +	uint32_t link_status;
> +	uint8_t  hw_mac_addr[ETHER_ADDR_LEN];
> +	struct rndis_request *req;
> +	struct netvsc_packet *netvsc_packet;
> +	struct nvsp_msg *rx_comp_msg;
> +	struct hv_rx_queue *rxq;
> +	struct hv_tx_queue *txq;
> +	struct hv_vm_packet_descriptor *desc;
> +	receive_callback_t receive_callback;
> +	int pkt_rxed;
> +
> +	uint32_t debug;
> +	struct hv_stats stats;
> +};
> +
> +/*
> + * Extern functions declarations
> + */
> +int hyperv_dev_tx_queue_setup(struct rte_eth_dev *dev,
> +			 uint16_t queue_idx,
> +			 uint16_t nb_desc,
> +			 unsigned int socket_id,
> +			 const struct rte_eth_txconf *tx_conf);
> +
> +void hyperv_dev_tx_queue_release(void *ptxq);
> +
> +int hyperv_dev_rx_queue_setup(struct rte_eth_dev *dev,
> +			 uint16_t queue_idx,
> +			 uint16_t nb_desc,
> +			 unsigned int socket_id,
> +			 const struct rte_eth_rxconf *rx_conf,
> +			 struct rte_mempool *mp);
> +
> +void hyperv_dev_rx_queue_release(void *prxq);
> +
> +uint16_t
> +hyperv_recv_pkts(void *prxq,
> +		 struct rte_mbuf **rx_pkts, uint16_t nb_pkts);
> +
> +uint16_t
> +hyperv_xmit_pkts(void *ptxq,
> +		 struct rte_mbuf **tx_pkts, uint16_t nb_pkts);
> +
> +int hv_rf_on_device_add(struct hv_data *hv);
> +int hv_rf_on_device_remove(struct hv_data *hv);
> +int hv_rf_on_send(struct hv_data *hv, struct netvsc_packet *pkt);
> +int hv_rf_on_open(struct hv_data *hv);
> +int hv_rf_on_close(struct hv_data *hv);
> +int hv_rf_set_device_mac(struct hv_data *hv, uint8_t *mac);
> +void hyperv_start_rx(struct hv_data *hv);
> +void hyperv_stop_rx(struct hv_data *hv);
> +int hyperv_get_buffer(struct hv_data *hv, void *buffer, uint32_t
> bufferlen);
> +void hyperv_scan_comps(struct hv_data *hv, int allow_rx_drop);
> +uint8_t hyperv_get_link_status(struct hv_data *hv);
> +int hyperv_set_rx_mode(struct hv_data *hv, uint8_t promisc, uint8_t
> mcast);
> +
> +inline int rte_hv_dev_atomic_write_link_status(struct rte_eth_dev *dev,
> +		struct rte_eth_link *link);
> +inline int rte_hv_dev_atomic_read_link_status(struct rte_eth_dev *dev,
> +		struct rte_eth_link *link);
> +
> +#endif /* _HYPERV_H_ */
> diff --git a/lib/librte_pmd_hyperv/hyperv_drv.c
> b/lib/librte_pmd_hyperv/hyperv_drv.c
> new file mode 100644
> index 0000000..4a37966
> --- /dev/null
> +++ b/lib/librte_pmd_hyperv/hyperv_drv.c
> @@ -0,0 +1,1653 @@
> +/*-
> + * Copyright (c) 2009-2012 Microsoft Corp.
> + * Copyright (c) 2010-2012 Citrix Inc.
> + * Copyright (c) 2012 NetApp Inc.
> + * Copyright (c) 2013-2015 Brocade Communications Systems, Inc.
> + * All rights reserved.
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions
> + * are met:
> + * 1. Redistributions of source code must retain the above copyright
> + *    notice unmodified, this list of conditions, and the following
> + *    disclaimer.
> + * 2. Redistributions in binary form must reproduce the above copyright
> + *    notice, this list of conditions and the following disclaimer in the
> + *    documentation and/or other materials provided with the distribution.
> + *
> + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS
> OR
> + * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
> WARRANTIES
> + * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
> DISCLAIMED.
> + * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
> + * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
> (INCLUDING, BUT
> + * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
> LOSS OF USE,
> + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
> AND ON ANY
> + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
> THE USE OF
> + * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
> DAMAGE.
> + *
> + */
> +
> +#include "hyperv.h"
> +#include "hyperv_drv.h"
> +#include "hyperv_rxtx.h"
> +
> +#define LOOP_CNT 10000
> +#define MAC_STRLEN 14
> +#define MAC_PARAM_STR "NetworkAddress"
> +
> +#define hex "0123456789abcdef"
> +#define high(x) hex[(x & 0xf0) >> 4]
> +#define low(x) hex[x & 0x0f]
> +
> +static int hv_rf_on_receive(struct hv_data *hv, struct netvsc_packet *pkt);
> +
> +/*
> + * Ring buffer
> + */
> +
> +/* Amount of space to write to */
> +#define HV_BYTES_AVAIL_TO_WRITE(r, w, z) \
> +	(((w) >= (r)) ? ((z) - ((w) - (r))) : ((r) - (w)))
> +
> +/*
> + * Get number of bytes available to read and to write to
> + * for the specified ring buffer
> + */
> +static inline void
> +get_ring_buffer_avail_bytes(
> +	struct hv_data               *hv,
> +	struct hv_vmbus_ring_buffer  *ring_buffer,
> +	uint32_t                     *read,
> +	uint32_t                     *write)
> +{
> +	rte_compiler_barrier();
> +
> +	/*
> +	 * Capture the read/write indices before they changed
> +	 */
> +	uint32_t read_loc = ring_buffer->read_index;
> +	uint32_t write_loc = ring_buffer->write_index;
> +
> +	*write = HV_BYTES_AVAIL_TO_WRITE(
> +			read_loc, write_loc, hv->rb_data_size);
> +	*read = hv->rb_data_size - *write;
> +}
> +
> +/*
> + * Helper routine to copy from source to ring buffer.
> + *
> + * Assume there is enough room. Handles wrap-around in dest case only!
> + */
> +static uint32_t
> +copy_to_ring_buffer(
> +	struct hv_vmbus_ring_buffer  *ring_buffer,
> +	uint32_t                     ring_buffer_size,
> +	uint32_t                     start_write_offset,
> +	char                         *src,
> +	uint32_t                     src_len)
> +{
> +	char *ring_buf = (char *)ring_buffer->buffer;
> +	uint32_t fragLen;
> +
> +	if (src_len > ring_buffer_size - start_write_offset)  {
> +		/* wrap-around detected! */
> +		fragLen = ring_buffer_size - start_write_offset;
> +		rte_memcpy(ring_buf + start_write_offset, src, fragLen);
> +		rte_memcpy(ring_buf, src + fragLen, src_len - fragLen);
> +	} else {
> +		rte_memcpy(ring_buf + start_write_offset, src, src_len);
> +	}
> +
> +	start_write_offset += src_len;
> +	start_write_offset %= ring_buffer_size;
> +
> +	return start_write_offset;
> +}
> +
> +/*
> + * Helper routine to copy to dest from ring buffer.
> + *
> + * Assume there is enough room. Handles wrap-around in src case only!
> + */
> +static uint32_t
> +copy_from_ring_buffer(
> +	struct hv_data               *hv,
> +	struct hv_vmbus_ring_buffer  *ring_buffer,
> +	char                         *dest,
> +	uint32_t                     dest_len,
> +	uint32_t                     start_read_offset)
> +{
> +	uint32_t fragLen;
> +	char *ring_buf = (char *)ring_buffer->buffer;
> +
> +	if (dest_len > hv->rb_data_size - start_read_offset) {
> +		/*  wrap-around detected at the src */
> +		fragLen = hv->rb_data_size - start_read_offset;
> +		rte_memcpy(dest, ring_buf + start_read_offset, fragLen);
> +		rte_memcpy(dest + fragLen, ring_buf, dest_len - fragLen);
> +	} else {
> +		rte_memcpy(dest, ring_buf + start_read_offset, dest_len);
> +	}
> +
> +	start_read_offset += dest_len;
> +	start_read_offset %= hv->rb_data_size;
> +
> +	return start_read_offset;
> +}
> +
> +/*
> + * Write to the ring buffer.
> + */
> +static int
> +hv_ring_buffer_write(
> +	struct hv_data                 *hv,
> +	struct hv_vmbus_sg_buffer_list sg_buffers[],
> +	uint32_t                       sg_buffer_count)
> +{
> +	struct hv_vmbus_ring_buffer *ring_buffer = hv->out;
> +	uint32_t i = 0;
> +	uint32_t byte_avail_to_write;
> +	uint32_t byte_avail_to_read;
> +	uint32_t total_bytes_to_write = 0;
> +	volatile uint32_t next_write_location;
> +	uint64_t prev_indices = 0;
> +
> +	for (i = 0; i < sg_buffer_count; i++)
> +		total_bytes_to_write += sg_buffers[i].length;
> +
> +	total_bytes_to_write += sizeof(uint64_t);
> +
> +	get_ring_buffer_avail_bytes(hv, ring_buffer, &byte_avail_to_read,
> +			&byte_avail_to_write);
> +
> +	/*
> +	 * If there is only room for the packet, assume it is full.
> +	 * Otherwise, the next time around, we think the ring buffer
> +	 * is empty since the read index == write index
> +	 */
> +	if (byte_avail_to_write <= total_bytes_to_write) {
> +		PMD_PERROR_LOG(hv, DBG_RB,
> +				"byte_avail_to_write = %u,
> total_bytes_to_write = %u",
> +				byte_avail_to_write, total_bytes_to_write);
> +		return -EAGAIN;
> +	}
> +
> +	/*
> +	 * Write to the ring buffer
> +	 */
> +	next_write_location = ring_buffer->write_index;
> +
> +	for (i = 0; i < sg_buffer_count; i++) {
> +		next_write_location = copy_to_ring_buffer(ring_buffer,
> +				hv->rb_data_size, next_write_location,
> +				(char *) sg_buffers[i].data,
> sg_buffers[i].length);
> +	}
> +
> +	/*
> +	 * Set previous packet start
> +	 */
> +	prev_indices = (uint64_t)ring_buffer->write_index << 32;
> +
> +	next_write_location = copy_to_ring_buffer(
> +			ring_buffer, hv->rb_data_size, next_write_location,
> +			(char *) &prev_indices, sizeof(uint64_t));
> +
> +	/*
> +	 * Make sure we flush all writes before updating the writeIndex
> +	 */
> +	rte_compiler_barrier();
> +
> +	/*
> +	 * Now, update the write location
> +	 */
> +	ring_buffer->write_index = next_write_location;
> +
> +	return 0;
> +}
> +
> +/*
> + * Read without advancing the read index.
> + */
> +static int
> +hv_ring_buffer_peek(struct hv_data  *hv, void *buffer, uint32_t
> buffer_len)
> +{
> +	struct hv_vmbus_ring_buffer *ring_buffer = hv->in;
> +	uint32_t bytesAvailToWrite;
> +	uint32_t bytesAvailToRead;
> +
> +	get_ring_buffer_avail_bytes(hv, ring_buffer,
> +			&bytesAvailToRead,
> +			&bytesAvailToWrite);
> +
> +	/* Make sure there is something to read */
> +	if (bytesAvailToRead < buffer_len)
> +		return -EAGAIN;
> +
> +	copy_from_ring_buffer(hv, ring_buffer,
> +		(char *)buffer, buffer_len, ring_buffer->read_index);
> +
> +	return 0;
> +}
> +
> +/*
> + * Read and advance the read index.
> + */
> +static int
> +hv_ring_buffer_read(struct hv_data  *hv, void *buffer,
> +		    uint32_t buffer_len, uint32_t offset)
> +{
> +	struct hv_vmbus_ring_buffer *ring_buffer = hv->in;
> +	uint32_t bytes_avail_to_write;
> +	uint32_t bytes_avail_to_read;
> +	uint32_t next_read_location = 0;
> +	uint64_t prev_indices = 0;
> +
> +	if (buffer_len <= 0)
> +		return -EINVAL;
> +
> +	get_ring_buffer_avail_bytes(
> +			hv,
> +			ring_buffer,
> +			&bytes_avail_to_read,
> +			&bytes_avail_to_write);
> +
> +	/*
> +	 * Make sure there is something to read
> +	 */
> +	if (bytes_avail_to_read < buffer_len) {
> +		PMD_PERROR_LOG(hv, DBG_RB, "bytes_avail_to_read =
> %u, buffer_len = %u",
> +				bytes_avail_to_read, buffer_len);
> +		return -EAGAIN;
> +	}
> +
> +	next_read_location = (ring_buffer->read_index + offset) % hv-
> >rb_data_size;
> +
> +	next_read_location = copy_from_ring_buffer(
> +			hv,
> +			ring_buffer,
> +			(char *) buffer,
> +			buffer_len,
> +			next_read_location);
> +
> +	next_read_location = copy_from_ring_buffer(
> +			hv,
> +			ring_buffer,
> +			(char *) &prev_indices,
> +			sizeof(uint64_t),
> +			next_read_location);
> +
> +	/*
> +	 * Make sure all reads are done before we update the read index
> since
> +	 * the writer may start writing to the read area once the read index
> +	 * is updated.
> +	 */
> +	rte_compiler_barrier();
> +
> +	/*
> +	 * Update the read index
> +	 */
> +	ring_buffer->read_index = next_read_location;
> +
> +	return 0;
> +}
> +
> +/*
> + * VMBus
> + */
> +
> +/*
> + * Retrieve the raw packet on the specified channel
> + */
> +static int
> +hv_vmbus_channel_recv_packet_raw(struct hv_data  *hv, void *buffer,
> +				 uint32_t        buffer_len,
> +				 uint32_t        *buffer_actual_len,
> +				 uint64_t        *request_id,
> +				 int             mode)
> +{
> +	int ret;
> +	uint32_t packetLen;
> +	struct hv_vm_packet_descriptor desc;
> +
> +	*buffer_actual_len = 0;
> +	*request_id = 0;
> +
> +	ret = hv_ring_buffer_peek(hv, &desc,
> +			sizeof(struct hv_vm_packet_descriptor));
> +
> +	if (ret != 0)
> +		return 0;
> +
> +	if ((desc.type ==
> HV_VMBUS_PACKET_TYPE_DATA_USING_TRANSFER_PAGES
> +				&& !(mode & 1)) ||
> +			((desc.type ==
> HV_VMBUS_PACKET_TYPE_COMPLETION) && !(mode & 2))) {
> +		return -1;
> +	}
> +
> +	packetLen = desc.length8 << 3;
> +
> +	*buffer_actual_len = packetLen;
> +
> +	if (unlikely(packetLen > buffer_len)) {
> +		PMD_PERROR_LOG(hv, DBG_RX, "The buffer desc is too big,
> will drop it");
> +		return -ENOMEM;
> +	}
> +
> +	*request_id = desc.transaction_id;
> +
> +	/* Copy over the entire packet to the user buffer */
> +	ret = hv_ring_buffer_read(hv, buffer, packetLen, 0);
> +
> +	return 0;
> +}
> +
> +/*
> + * Trigger an event notification on the specified channel
> + */
> +static void
> +vmbus_channel_set_event(struct hv_data *hv)
> +{
> +	/* Here we assume that channel->offer_msg.monitor_allocated ==
> 1,
> +	 * in another case our driver will not work */
> +	/* Each uint32_t represents 32 channels */
> +	__sync_or_and_fetch(((uint32_t *)hv->send_interrupt_page
> +		+ ((hv->vmbus_device >> 5))), 1 << (hv->vmbus_device &
> 31)
> +	);
> +	__sync_or_and_fetch((uint32_t *)&hv->monitor_pages->
> +			trigger_group[hv->monitor_group].u.pending, 1 <<
> hv->monitor_bit);
> +}
> +
> +/**
> + * @brief Send the specified buffer on the given channel
> + */
> +static int
> +hv_vmbus_channel_send_packet(struct hv_data *hv, void *buffer,
> +			     uint32_t buffer_len, uint64_t request_id,
> +			     enum hv_vmbus_packet_type type,
> +			     uint32_t flags)
> +{
> +	struct hv_vmbus_sg_buffer_list buffer_list[3];
> +	struct hv_vm_packet_descriptor desc;
> +	uint32_t packet_len_aligned;
> +	uint64_t aligned_data;
> +	uint32_t packet_len;
> +	int ret = 0;
> +	uint32_t old_write = hv->out->write_index;
> +
> +	packet_len = sizeof(struct hv_vm_packet_descriptor) + buffer_len;
> +	packet_len_aligned = HV_ALIGN_UP(packet_len, sizeof(uint64_t));
> +	aligned_data = 0;
> +
> +	/* Setup the descriptor */
> +	desc.type = type;   /* HV_VMBUS_PACKET_TYPE_DATA_IN_BAND;
> */
> +	desc.flags = flags; /*
> HV_VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED */
> +	/* in 8-bytes granularity */
> +	desc.data_offset8 = sizeof(struct hv_vm_packet_descriptor) >> 3;
> +	desc.length8 = (uint16_t) (packet_len_aligned >> 3);
> +	desc.transaction_id = request_id;
> +
> +	buffer_list[0].data = &desc;
> +	buffer_list[0].length = sizeof(struct hv_vm_packet_descriptor);
> +
> +	buffer_list[1].data = buffer;
> +	buffer_list[1].length = buffer_len;
> +
> +	buffer_list[2].data = &aligned_data;
> +	buffer_list[2].length = packet_len_aligned - packet_len;
> +
> +	ret = hv_ring_buffer_write(hv, buffer_list, 3);
> +
> +	rte_mb();
> +	if (!ret && !hv->out->interrupt_mask && hv->out->read_index ==
> old_write)
> +		vmbus_channel_set_event(hv);
> +
> +	return ret;
> +}
> +
> +/*
> + * Send a range of single-page buffer packets using
> + * a GPADL Direct packet type
> + */
> +static int
> +hv_vmbus_channel_send_packet_pagebuffer(
> +	struct hv_data  *hv,
> +	struct hv_vmbus_page_buffer	page_buffers[],
> +	uint32_t		page_count,
> +	void			*buffer,
> +	uint32_t		buffer_len,
> +	uint64_t		request_id)
> +{
> +
> +	int ret = 0;
> +	uint32_t packet_len, packetLen_aligned, descSize, i = 0;
> +	struct hv_vmbus_sg_buffer_list buffer_list[3];
> +	struct hv_vmbus_channel_packet_page_buffer desc;
> +	uint64_t alignedData = 0;
> +	uint32_t old_write = hv->out->write_index;
> +
> +	if (page_count > HV_MAX_PAGE_BUFFER_COUNT) {
> +		PMD_PERROR_LOG(hv, DBG_VMBUS, "page_count %u goes
> out of the limit",
> +				page_count);
> +		return -EINVAL;
> +	}
> +
> +	/*
> +	 * Adjust the size down since
> hv_vmbus_channel_packet_page_buffer
> +	 * is the largest size we support
> +	 */
> +	descSize = sizeof(struct hv_vmbus_channel_packet_page_buffer) -
> +		((HV_MAX_PAGE_BUFFER_COUNT - page_count) *
> +		 sizeof(struct hv_vmbus_page_buffer));
> +	packet_len = descSize + buffer_len;
> +	packetLen_aligned = HV_ALIGN_UP(packet_len, sizeof(uint64_t));
> +
> +	/* Setup the descriptor */
> +	desc.type = HV_VMBUS_PACKET_TYPE_DATA_USING_GPA_DIRECT;
> +	desc.flags =
> HV_VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED;
> +	desc.data_offset8 = descSize >> 3; /* in 8-bytes granularity */
> +	desc.length8 = (uint16_t) (packetLen_aligned >> 3);
> +	desc.transaction_id = request_id;
> +	desc.range_count = page_count;
> +
> +	for (i = 0; i < page_count; i++) {
> +		desc.range[i].length = page_buffers[i].length;
> +		desc.range[i].offset = page_buffers[i].offset;
> +		desc.range[i].pfn = page_buffers[i].pfn;
> +	}
> +
> +	buffer_list[0].data = &desc;
> +	buffer_list[0].length = descSize;
> +
> +	buffer_list[1].data = buffer;
> +	buffer_list[1].length = buffer_len;
> +
> +	buffer_list[2].data = &alignedData;
> +	buffer_list[2].length = packetLen_aligned - packet_len;
> +
> +	ret = hv_ring_buffer_write(hv, buffer_list, 3);
> +	if (likely(ret == 0))
> +		++hv->num_outstanding_sends;
> +
> +	rte_mb();
> +	if (!ret && !hv->out->interrupt_mask &&
> +			hv->out->read_index == old_write)
> +		vmbus_channel_set_event(hv);
> +
> +	return ret;
> +}
> +
> +/*
> + * NetVSC
> + */
> +
> +/*
> + * Net VSC on send
> + * Sends a packet on the specified Hyper-V device.
> + * Returns 0 on success, non-zero on failure.
> + */
> +static int
> +hv_nv_on_send(struct hv_data *hv, struct netvsc_packet *pkt)
> +{
> +	struct nvsp_msg send_msg;
> +	int ret;
> +
> +	send_msg.msg_type = nvsp_msg_1_type_send_rndis_pkt;
> +	if (pkt->is_data_pkt) {
> +		/* 0 is RMC_DATA */
> +		send_msg.msgs.send_rndis_pkt.chan_type = 0;
> +	} else {
> +		/* 1 is RMC_CONTROL */
> +		send_msg.msgs.send_rndis_pkt.chan_type = 1;
> +	}
> +
> +	/* Not using send buffer section */
> +	send_msg.msgs.send_rndis_pkt.send_buf_section_idx =
> +	    0xFFFFFFFF;
> +	send_msg.msgs.send_rndis_pkt.send_buf_section_size = 0;
> +
> +	if (likely(pkt->page_buf_count)) {
> +		ret = hv_vmbus_channel_send_packet_pagebuffer(hv,
> +				pkt->page_buffers, pkt->page_buf_count,
> +				&send_msg, sizeof(struct nvsp_msg),
> +				(uint64_t)pkt->is_data_pkt ? (hv->txq-
> >tx_tail + 1) : 0);
> +	} else {
> +		PMD_PERROR_LOG(hv, DBG_TX, "pkt->page_buf_count
> value can't be zero");
> +		ret = -1;
> +	}
> +
> +	return ret;
> +}
> +
> +/*
> + * Net VSC on receive
> + *
> + * This function deals exclusively with virtual addresses.
> + */
> +static void
> +hv_nv_on_receive(struct hv_data *hv, struct hv_vm_packet_descriptor
> *pkt)
> +{
> +	struct hv_vm_transfer_page_packet_header *vm_xfer_page_pkt;
> +	struct nvsp_msg *nvsp_msg_pkt;
> +	struct netvsc_packet *net_vsc_pkt = NULL;
> +	unsigned long start;
> +	int count, i;
> +
> +	nvsp_msg_pkt = (struct nvsp_msg *)((unsigned long)pkt
> +			+ (pkt->data_offset8 << 3));
> +
> +	/* Make sure this is a valid nvsp packet */
> +	if (unlikely(nvsp_msg_pkt->msg_type !=
> nvsp_msg_1_type_send_rndis_pkt)) {
> +		PMD_PERROR_LOG(hv, DBG_RX, "NVSP packet is not valid");
> +		return;
> +	}
> +
> +	vm_xfer_page_pkt = (struct hv_vm_transfer_page_packet_header
> *)pkt;
> +
> +	if (unlikely(vm_xfer_page_pkt->transfer_page_set_id
> +			!= NETVSC_RECEIVE_BUFFER_ID)) {
> +		PMD_PERROR_LOG(hv, DBG_RX, "transfer_page_set_id is
> not valid");
> +		return;
> +	}
> +
> +	count = vm_xfer_page_pkt->range_count;
> +
> +	/*
> +	 * Initialize the netvsc packet
> +	 */
> +	for (i = 0; i < count; ++i) {
> +		net_vsc_pkt = hv->netvsc_packet;
> +
> +		net_vsc_pkt->tot_data_buf_len =
> +			vm_xfer_page_pkt->ranges[i].byte_count;
> +		net_vsc_pkt->page_buf_count = 1;
> +
> +		net_vsc_pkt->page_buffers[0].length =
> +			vm_xfer_page_pkt->ranges[i].byte_count;
> +
> +		/* The virtual address of the packet in the receive buffer */
> +		start = ((unsigned long)hv->recv_buf +
> +				vm_xfer_page_pkt->ranges[i].byte_offset);
> +
> +		/* Page number of the virtual page containing packet start */
> +		net_vsc_pkt->page_buffers[0].pfn = start >> PAGE_SHIFT;
> +
> +		/* Calculate the page relative offset */
> +		net_vsc_pkt->page_buffers[0].offset =
> +			vm_xfer_page_pkt->ranges[i].byte_offset &
> (PAGE_SIZE - 1);
> +
> +		/*
> +		 * In this implementation, we are dealing with virtual
> +		 * addresses exclusively.  Since we aren't using physical
> +		 * addresses at all, we don't care if a packet crosses a
> +		 * page boundary.  For this reason, the original code to
> +		 * check for and handle page crossings has been removed.
> +		 */
> +
> +		/*
> +		 * Pass it to the upper layer.  The receive completion call
> +		 * has been moved into this function.
> +		 */
> +		hv_rf_on_receive(hv, net_vsc_pkt);
> +	}
> +	/* Send a receive completion packet to RNDIS device (ie NetVsp) */
> +	hv_vmbus_channel_send_packet(hv, hv->rx_comp_msg,
> sizeof(struct nvsp_msg),
> +			vm_xfer_page_pkt->d.transaction_id,
> +			HV_VMBUS_PACKET_TYPE_COMPLETION, 0);
> +}
> +
> +/*
> + * Net VSC on send completion
> + */
> +static void
> +hv_nv_on_send_completion(struct hv_data *hv, struct
> hv_vm_packet_descriptor *pkt)
> +{
> +	struct nvsp_msg *nvsp_msg_pkt;
> +
> +	nvsp_msg_pkt =
> +	    (struct nvsp_msg *)((unsigned long)pkt + (pkt->data_offset8 <<
> 3));
> +
> +	if (likely(nvsp_msg_pkt->msg_type ==
> +
> 	nvsp_msg_1_type_send_rndis_pkt_complete)) {
> +
> +		if (unlikely(hv->hlt_req_pending))
> +			hv->hlt_req_sent = 1;
> +		else
> +			if (pkt->transaction_id)
> +				++hv->txq->tx_free;
> +		--hv->num_outstanding_sends;
> +		return;
> +	}
> +	PMD_PINFO_LOG(hv, DBG_TX, "unhandled completion (for kernel
> req or so)");
> +}
> +
> +/*
> + * Analogue of bsd hv_nv_on_channel_callback
> + */
> +static void
> +hv_nv_complete_request(struct hv_data *hv, struct rndis_request
> *request)
> +{
> +	uint32_t bytes_rxed, cnt = 0;
> +	uint64_t request_id;
> +	struct hv_vm_packet_descriptor *desc;
> +	uint8_t *buffer;
> +	int     bufferlen = NETVSC_PACKET_SIZE;
> +	int     ret = 0;
> +
> +	PMD_INIT_FUNC_TRACE();
> +
> +	hv->req = request;
> +
> +	buffer = rte_malloc(NULL, bufferlen, RTE_CACHE_LINE_SIZE);
> +	if (!buffer) {
> +		PMD_PERROR_LOG(hv, DBG_LOAD, "failed to allocate
> packet");
> +		return;
> +	}
> +
> +	do {
> +		rte_delay_us(1);
> +		ret = hv_vmbus_channel_recv_packet_raw(hv,
> +				buffer, bufferlen, &bytes_rxed,
> &request_id, 3);
> +		if (ret == 0) {
> +			if (bytes_rxed > 0) {
> +				desc = (struct hv_vm_packet_descriptor
> *)buffer;
> +
> +				switch (desc->type) {
> +				case
> HV_VMBUS_PACKET_TYPE_COMPLETION:
> +					hv_nv_on_send_completion(hv,
> desc);
> +					break;
> +				case
> HV_VMBUS_PACKET_TYPE_DATA_USING_TRANSFER_PAGES:
> +					hv_nv_on_receive(hv, desc);
> +					break;
> +				default:
> +					break;
> +				}
> +				PMD_PDEBUG_LOG(hv, DBG_LOAD,
> +					       "Did %d attempts until non-empty
> data was receieved",
> +					       cnt);
> +				cnt = 0;
> +			} else {
> +				cnt++;
> +			}
> +		} else if (ret == -ENOMEM) {
> +			/* Handle large packet */
> +			PMD_PDEBUG_LOG(hv, DBG_LOAD,
> +				       "recv_packet_raw returned -ENOMEM");
> +			rte_free(buffer);
> +			buffer = rte_malloc(NULL, bytes_rxed,
> RTE_CACHE_LINE_SIZE);
> +			if (buffer == NULL) {
> +				PMD_PERROR_LOG(hv, DBG_LOAD, "failed
> to allocate buffer");
> +				break;
> +			}
> +			bufferlen = bytes_rxed;
> +		} else {
> +			PMD_PERROR_LOG(hv, DBG_LOAD, "Unexpected
> return code (%d)", ret);
> +		}
> +		if (!hv->req) {
> +			PMD_PINFO_LOG(hv, DBG_LOAD, "Single request
> processed");
> +			break;
> +		}
> +		if (cnt >= LOOP_CNT) {
> +			PMD_PERROR_LOG(hv, DBG_LOAD, "Emergency
> break from the loop");
> +			break;
> +		}
> +		if (hv->hlt_req_sent) {
> +			PMD_PINFO_LOG(hv, DBG_LOAD, "Halt request
> processed");
> +			break;
> +		}
> +		/* The field hv->req->response_msg.ndis_msg_type
> +		 * should be set to non-zero value when response received
> +		 */
> +	} while (!hv->req->response_msg.ndis_msg_type);
> +
> +	rte_free(buffer);
> +}
> +
> +/*
> + * RNDIS
> + */
> +
> +/*
> + * Create new RNDIS request
> + */
> +static inline struct rndis_request *
> +hv_rndis_request(struct hv_data *hv, uint32_t message_type,
> +		uint32_t message_length)
> +{
> +	struct rndis_request *request;
> +	struct rndis_msg *rndis_mesg;
> +	struct rndis_set_request *set;
> +	char mz_name[RTE_MEMZONE_NAMESIZE];
> +	uint32_t size;
> +
> +	PMD_INIT_FUNC_TRACE();
> +
> +	request = rte_zmalloc("rndis_req", sizeof(struct rndis_request),
> +			      RTE_CACHE_LINE_SIZE);
> +
> +	if (!request)
> +		return NULL;
> +
> +	sprintf(mz_name, "hv_%d_%u_%d_%p", hv->vmbus_device,
> message_type,
> +			hv->new_request_id, request);
> +
> +	size = MAX(message_length, sizeof(struct rndis_msg));
> +
> +	request->request_msg_memzone =
> rte_memzone_reserve_aligned(mz_name,
> +			size, rte_lcore_to_socket_id(rte_lcore_id()), 0,
> PAGE_SIZE);
> +	if (!request->request_msg_memzone) {
> +		PMD_PERROR_LOG(hv, DBG_LOAD, "memzone_reserve
> failed");
> +		rte_free(request);
> +		return NULL;
> +	}
> +	request->request_msg = request->request_msg_memzone->addr;
> +	rndis_mesg = request->request_msg;
> +	rndis_mesg->ndis_msg_type = message_type;
> +	rndis_mesg->msg_len = message_length;
> +
> +	/*
> +	 * Set the request id. This field is always after the rndis header
> +	 * for request/response packet types so we just use the set_request
> +	 * as a template.
> +	 */
> +	set = &rndis_mesg->msg.set_request;
> +	hv->new_request_id++;
> +	set->request_id = hv->new_request_id;
> +
> +	return request;
> +}
> +
> +/*
> + * RNDIS filter
> + */
> +
> +static void
> +hv_rf_receive_response(
> +	struct hv_data       *hv,
> +	struct rndis_msg     *response)
> +{
> +	struct rndis_request *request = hv->req;
> +
> +	PMD_INIT_FUNC_TRACE();
> +
> +	if (response->msg_len <= sizeof(struct rndis_msg)) {
> +		rte_memcpy(&request->response_msg, response,
> +				response->msg_len);
> +	} else {
> +		if (response->ndis_msg_type ==
> REMOTE_NDIS_INITIALIZE_CMPLT) {
> +			request->response_msg.msg.init_complete.status =
> +				STATUS_BUFFER_OVERFLOW;
> +		}
> +		PMD_PERROR_LOG(hv, DBG_LOAD, "response buffer
> overflow\n");
> +	}
> +}
> +
> +/*
> + * RNDIS filter receive indicate status
> + */
> +static void
> +hv_rf_receive_indicate_status(struct hv_data *hv, struct rndis_msg
> *response)
> +{
> +	struct rndis_indicate_status *indicate = &response-
> >msg.indicate_status;
> +
> +	PMD_INIT_FUNC_TRACE();
> +
> +	if (indicate->status == RNDIS_STATUS_MEDIA_CONNECT)
> +		hv->link_status = 1;
> +	else if (indicate->status == RNDIS_STATUS_MEDIA_DISCONNECT)
> +		hv->link_status = 0;
> +	else if (indicate->status == RNDIS_STATUS_INVALID_DATA)
> +		PMD_PERROR_LOG(hv, DBG_RX, "Invalid data in RNDIS
> message");
> +	else
> +		PMD_PERROR_LOG(hv, DBG_RX, "Unsupported status: %u",
> indicate->status);
> +}
> +
> +/*
> + * RNDIS filter receive data
> + */
> +static void
> +hv_rf_receive_data(struct hv_data *hv, struct rndis_msg *msg,
> +		struct netvsc_packet *pkt)
> +{
> +	struct rte_mbuf *m_new;
> +	struct hv_rx_queue *rxq = hv->rxq;
> +	struct rndis_packet *rndis_pkt;
> +	uint32_t data_offset;
> +
> +	if (unlikely(hv->closed))
> +		return;
> +
> +	rndis_pkt = &msg->msg.packet;
> +
> +	if (unlikely(hv->max_rx_pkt_len < rndis_pkt->data_length)) {
> +		PMD_PWARN_LOG(hv, DBG_RX, "Packet is too large (%db),
> dropping.",
> +				rndis_pkt->data_length);
> +		++hv->stats.ierrors;
> +		return;
> +	}
> +
> +	/* Remove rndis header, then pass data packet up the stack */
> +	data_offset = RNDIS_HEADER_SIZE + rndis_pkt->data_offset;
> +
> +	/* L2 frame length, with L2 header, not including CRC */
> +	pkt->tot_data_buf_len        = rndis_pkt->data_length;
> +	pkt->page_buffers[0].offset += data_offset;
> +	/* Buffer length now L2 frame length plus trailing junk */
> +	pkt->page_buffers[0].length -= data_offset;
> +
> +	pkt->vlan_tci = 0;
> +
> +	/*
> +	 * Just put data into appropriate mbuf, all further work will be done
> +	 * by the upper layer (mbuf replacement, index adjustment, etc)
> +	 */
> +	m_new = rxq->sw_ring[rxq->rx_tail];
> +	if (++rxq->rx_tail == rxq->nb_rx_desc)
> +		rxq->rx_tail = 0;
> +
> +	/*
> +	 * Copy the received packet to mbuf.
> +	 * The copy is required since the memory pointed to by
> netvsc_packet
> +	 * cannot be reallocated
> +	 */
> +	uint8_t *vaddr = (uint8_t *)
> +		(pkt->page_buffers[0].pfn << PAGE_SHIFT)
> +		+ pkt->page_buffers[0].offset;
> +
> +	m_new->nb_segs = 1;
> +	m_new->pkt_len = m_new->data_len = pkt->tot_data_buf_len;
> +	rte_memcpy(rte_pktmbuf_mtod(m_new, void *), vaddr, m_new-
> >data_len);
> +
> +	if (pkt->vlan_tci) {
> +		m_new->vlan_tci = pkt->vlan_tci;
> +		m_new->ol_flags |= PKT_RX_VLAN_PKT;
> +	}
> +
> +	hv->pkt_rxed = 1;
> +}
> +
> +/*
> + * RNDIS filter receive data, jumbo frames support
> + */
> +static void
> +hv_rf_receive_data_sg(struct hv_data *hv, struct rndis_msg *msg,
> +		struct netvsc_packet *pkt)
> +{
> +	struct rte_mbuf *m_new;
> +	struct hv_rx_queue *rxq = hv->rxq;
> +	struct rndis_packet *rndis_pkt;
> +	uint32_t data_offset;
> +
> +	if (unlikely(hv->closed))
> +		return;
> +
> +	rndis_pkt = &msg->msg.packet;
> +
> +	/* Remove rndis header, then pass data packet up the stack */
> +	data_offset = RNDIS_HEADER_SIZE + rndis_pkt->data_offset;
> +
> +	/* L2 frame length, with L2 header, not including CRC */
> +	pkt->tot_data_buf_len        = rndis_pkt->data_length;
> +	pkt->page_buffers[0].offset += data_offset;
> +	/* Buffer length now L2 frame length plus trailing junk */
> +	pkt->page_buffers[0].length -= data_offset;
> +
> +	pkt->vlan_tci = 0;
> +
> +	/*
> +	 * Just put data into appropriate mbuf, all further work will be done
> +	 * by the upper layer (mbuf replacement, index adjustment, etc)
> +	 */
> +	m_new = rxq->sw_ring[rxq->rx_tail];
> +	if (++rxq->rx_tail == rxq->nb_rx_desc)
> +		rxq->rx_tail = 0;
> +
> +	/*
> +	 * Copy the received packet to mbuf.
> +	 * The copy is required since the memory pointed to by
> netvsc_packet
> +	 * cannot be reallocated
> +	 */
> +	uint8_t *vaddr = (uint8_t *)
> +		(pkt->page_buffers[0].pfn << PAGE_SHIFT)
> +		+ pkt->page_buffers[0].offset;
> +
> +	/* Scatter-gather emulation */
> +	uint32_t carry_len = pkt->tot_data_buf_len;
> +	struct rte_mbuf *m_next;
> +
> +	m_new->pkt_len = carry_len;
> +	m_new->nb_segs = (carry_len - 1) / hv->max_rx_pkt_len + 1;
> +
> +	while (1) {
> +		m_new->data_len = MIN(carry_len, hv->max_rx_pkt_len);
> +		rte_memcpy(rte_pktmbuf_mtod(m_new, void *),
> +			   vaddr, m_new->data_len);
> +		vaddr += m_new->data_len;
> +
> +		if (carry_len <= hv->max_rx_pkt_len)
> +			break;
> +
> +		carry_len -= hv->max_rx_pkt_len;
> +		m_next = rxq->sw_ring[rxq->rx_tail];
> +		if (++rxq->rx_tail == rxq->nb_rx_desc)
> +			rxq->rx_tail = 0;
> +		m_new->next = m_next;
> +		m_new = m_next;
> +	}
> +
> +	if (pkt->vlan_tci) {
> +		m_new->vlan_tci = pkt->vlan_tci;
> +		m_new->ol_flags |= PKT_RX_VLAN_PKT;
> +	}
> +
> +	hv->pkt_rxed = 1;
> +}
> +
> +static int
> +hv_rf_send_request(struct hv_data *hv, struct rndis_request *request)
> +{
> +	struct netvsc_packet *packet;
> +
> +	PMD_INIT_FUNC_TRACE();
> +	/* Set up the packet to send it */
> +	packet = &request->pkt;
> +
> +	packet->is_data_pkt = 0;
> +	packet->tot_data_buf_len = request->request_msg->msg_len;
> +	packet->page_buf_count = 1;
> +
> +	packet->page_buffers[0].pfn =
> +		(request->request_msg_memzone->phys_addr) >>
> PAGE_SHIFT;
> +	packet->page_buffers[0].length = request->request_msg->msg_len;
> +	packet->page_buffers[0].offset =
> +	    (unsigned long)request->request_msg & (PAGE_SIZE - 1);
> +
> +	return hv_nv_on_send(hv, packet);
> +}
> +
> +static void u8_to_u16(const char *src, int len, char *dst)
> +{
> +	int i;
> +
> +	for (i = 0; i < len; ++i) {
> +		dst[2 * i] = src[i];
> +		dst[2 * i + 1] = 0;
> +	}
> +}
> +
> +int
> +hv_rf_set_device_mac(struct hv_data *hv, uint8_t *macaddr)
> +{
> +	struct rndis_request *request;
> +	struct rndis_set_request *set_request;
> +	struct rndis_config_parameter_info *info;
> +	struct rndis_set_complete *set_complete;
> +	char mac_str[2*ETHER_ADDR_LEN+1];
> +	wchar_t *param_value, *param_name;
> +	uint32_t status;
> +	uint32_t message_len = sizeof(struct rndis_config_parameter_info)
> +
> +		2 * MAC_STRLEN + 4 * ETHER_ADDR_LEN;
> +	int ret, i;
> +
> +	request = hv_rndis_request(hv, REMOTE_NDIS_SET_MSG,
> +		RNDIS_MESSAGE_SIZE(struct rndis_set_request) +
> message_len);
> +	if (!request)
> +		return -ENOMEM;
> +
> +	set_request = &request->request_msg->msg.set_request;
> +	set_request->oid = RNDIS_OID_GEN_RNDIS_CONFIG_PARAMETER;
> +	set_request->device_vc_handle = 0;
> +	set_request->info_buffer_offset = sizeof(struct rndis_set_request);
> +	set_request->info_buffer_length = message_len;
> +
> +	info = (struct rndis_config_parameter_info *)((ulong)set_request +
> +		set_request->info_buffer_offset);
> +	info->parameter_type = RNDIS_CONFIG_PARAM_TYPE_STRING;
> +	info->parameter_name_offset =
> +		sizeof(struct rndis_config_parameter_info);
> +	info->parameter_name_length = 2 * MAC_STRLEN;
> +	info->parameter_value_offset =
> +		info->parameter_name_offset + info-
> >parameter_name_length;
> +	/* Multiply by 2 because of string representation and by 2
> +	 * because of utf16 representation
> +	 */
> +	info->parameter_value_length = 4 * ETHER_ADDR_LEN;
> +	param_name = (wchar_t *)((ulong)info + info-
> >parameter_name_offset);
> +	param_value = (wchar_t *)((ulong)info + info-
> >parameter_value_offset);
> +
> +	u8_to_u16(MAC_PARAM_STR, MAC_STRLEN, (char *)param_name);
> +	for (i = 0; i < ETHER_ADDR_LEN; ++i) {
> +		mac_str[2*i] = high(macaddr[i]);
> +		mac_str[2*i+1] = low(macaddr[i]);
> +	}
> +
> +	u8_to_u16((const char *)mac_str, 2 * ETHER_ADDR_LEN, (char
> *)param_value);
> +
> +	ret = hv_rf_send_request(hv, request);
> +	if (ret)
> +		goto cleanup;
> +
> +	request->response_msg.msg.set_complete.status = 0xFFFF;
> +	hv_nv_complete_request(hv, request);
> +	set_complete = &request->response_msg.msg.set_complete;
> +	if (set_complete->status == 0xFFFF) {
> +		/* Host is not responding, we can't free request in this case
> */
> +		ret = -1;
> +		PMD_PERROR_LOG(hv, DBG_LOAD, "Host is not
> responding");
> +		goto exit;
> +	}
> +	/* Response received, check status */
> +	status = set_complete->status;
> +	if (status) {
> +		/* Bad response status, return error */
> +		PMD_PERROR_LOG(hv, DBG_LOAD, "set_complete->status
> = %u\n", status);
> +		ret = -EINVAL;
> +	}
> +
> +cleanup:
> +	rte_free(request);
> +exit:
> +	return ret;
> +}
> +
> +/*
> + * RNDIS filter on receive
> + */
> +static int
> +hv_rf_on_receive(struct hv_data *hv, struct netvsc_packet *pkt)
> +{
> +	struct rndis_msg rndis_mesg;
> +	struct rndis_msg *rndis_hdr;
> +
> +	/* Shift virtual page number to form virtual page address */
> +	rndis_hdr = (struct rndis_msg *)(pkt->page_buffers[0].pfn <<
> PAGE_SHIFT);
> +
> +	rndis_hdr = (void *)((unsigned long)rndis_hdr
> +			+ pkt->page_buffers[0].offset);
> +
> +	/*
> +	 * Make sure we got a valid rndis message
> +	 * Fixme:  There seems to be a bug in set completion msg where
> +	 * its msg_len is 16 bytes but the byte_count field in the
> +	 * xfer page range shows 52 bytes
> +	 */
> +	if (unlikely(pkt->tot_data_buf_len != rndis_hdr->msg_len)) {
> +		++hv->stats.ierrors;
> +		PMD_PERROR_LOG(hv, DBG_RX,
> +			       "invalid rndis message? (expected %u "
> +			       "bytes got %u)... dropping this message",
> +			       rndis_hdr->msg_len, pkt->tot_data_buf_len);
> +		return -1;
> +	}
> +
> +	rte_memcpy(&rndis_mesg, rndis_hdr,
> +	    (rndis_hdr->msg_len > sizeof(struct rndis_msg)) ?
> +	    sizeof(struct rndis_msg) : rndis_hdr->msg_len);
> +
> +	switch (rndis_mesg.ndis_msg_type) {
> +
> +	/* data message */
> +	case REMOTE_NDIS_PACKET_MSG:
> +		hv->receive_callback(hv, &rndis_mesg, pkt);
> +		break;
> +	/* completion messages */
> +	case REMOTE_NDIS_INITIALIZE_CMPLT:
> +	case REMOTE_NDIS_QUERY_CMPLT:
> +	case REMOTE_NDIS_SET_CMPLT:
> +	case REMOTE_NDIS_RESET_CMPLT:
> +	case REMOTE_NDIS_KEEPALIVE_CMPLT:
> +		hv_rf_receive_response(hv, &rndis_mesg);
> +		break;
> +	/* notification message */
> +	case REMOTE_NDIS_INDICATE_STATUS_MSG:
> +		hv_rf_receive_indicate_status(hv, &rndis_mesg);
> +		break;
> +	default:
> +		PMD_PERROR_LOG(hv, DBG_RX, "hv_rf_on_receive():
> Unknown msg_type 0x%x",
> +		    rndis_mesg.ndis_msg_type);
> +		break;
> +	}
> +
> +	return 0;
> +}
> +
> +/*
> + * RNDIS filter on send
> + */
> +int
> +hv_rf_on_send(struct hv_data *hv, struct netvsc_packet *pkt)
> +{
> +	struct rndis_msg *rndis_mesg;
> +	struct rndis_packet *rndis_pkt;
> +	uint32_t rndis_msg_size;
> +
> +	/* Add the rndis header */
> +	rndis_mesg = (struct rndis_msg *)pkt->extension;
> +
> +	memset(rndis_mesg, 0, sizeof(struct rndis_msg));
> +
> +	rndis_msg_size = RNDIS_MESSAGE_SIZE(struct rndis_packet);
> +
> +	rndis_mesg->ndis_msg_type = REMOTE_NDIS_PACKET_MSG;
> +	rndis_mesg->msg_len = pkt->tot_data_buf_len + rndis_msg_size;
> +
> +	rndis_pkt = &rndis_mesg->msg.packet;
> +	rndis_pkt->data_offset = sizeof(struct rndis_packet);
> +	rndis_pkt->data_length = pkt->tot_data_buf_len;
> +
> +	pkt->is_data_pkt = 1;
> +
> +	/*
> +	 * Invoke netvsc send.  If return status is bad, the caller now
> +	 * resets the context pointers before retrying.
> +	 */
> +	return hv_nv_on_send(hv, pkt);
> +}
> +
> +static int
> +hv_rf_init_device(struct hv_data *hv)
> +{
> +	struct rndis_request *request;
> +	struct rndis_initialize_request *init;
> +	struct rndis_initialize_complete *init_complete;
> +	uint32_t status;
> +	int ret;
> +
> +	PMD_INIT_FUNC_TRACE();
> +
> +	request = hv_rndis_request(hv, REMOTE_NDIS_INITIALIZE_MSG,
> +	    RNDIS_MESSAGE_SIZE(struct rndis_initialize_request));
> +	if (!request) {
> +		ret = -1;
> +		goto cleanup;
> +	}
> +
> +	/* Set up the rndis set */
> +	init = &request->request_msg->msg.init_request;
> +	init->major_version = RNDIS_MAJOR_VERSION;
> +	init->minor_version = RNDIS_MINOR_VERSION;
> +	/*
> +	 * Per the RNDIS document, this should be set to the max MTU
> +	 * plus the header size.  However, 2048 works fine, so leaving
> +	 * it as is.
> +	 */
> +	init->max_xfer_size = 2048;
> +
> +	hv->rndis_dev_state = RNDIS_DEV_INITIALIZING;
> +
> +	ret = hv_rf_send_request(hv, request);
> +	if (ret != 0) {
> +		hv->rndis_dev_state = RNDIS_DEV_UNINITIALIZED;
> +		goto cleanup;
> +	}
> +
> +	/* Putting -1 here to ensure that HyperV really answered us */
> +	request->response_msg.msg.init_complete.status = -1;
> +	hv_nv_complete_request(hv, request);
> +
> +	init_complete = &request->response_msg.msg.init_complete;
> +	status = init_complete->status;
> +	if (status == 0) {
> +		PMD_PINFO_LOG(hv, DBG_LOAD, "Remote NDIS device is
> initialized");
> +		hv->rndis_dev_state = RNDIS_DEV_INITIALIZED;
> +		ret = 0;
> +	} else {
> +		PMD_PINFO_LOG(hv, DBG_LOAD, "Remote NDIS device left
> uninitialized");
> +		hv->rndis_dev_state = RNDIS_DEV_UNINITIALIZED;
> +		ret = -1;
> +	}
> +
> +cleanup:
> +	rte_free(request);
> +
> +	return ret;
> +}
> +
> +/*
> + * RNDIS filter query device
> + */
> +static int
> +hv_rf_query_device(struct hv_data *hv, uint32_t oid, void *result,
> +		   uint32_t result_size)
> +{
> +	struct rndis_request *request;
> +	struct rndis_query_request *query;
> +	struct rndis_query_complete *query_complete;
> +	int ret = 0;
> +
> +	PMD_INIT_FUNC_TRACE();
> +
> +	request = hv_rndis_request(hv, REMOTE_NDIS_QUERY_MSG,
> +	    RNDIS_MESSAGE_SIZE(struct rndis_query_request));
> +	if (request == NULL) {
> +		ret = -1;
> +		goto cleanup;
> +	}
> +
> +	/* Set up the rndis query */
> +	query = &request->request_msg->msg.query_request;
> +	query->oid = oid;
> +	query->info_buffer_offset = sizeof(struct rndis_query_request);
> +	query->info_buffer_length = 0;
> +	query->device_vc_handle = 0;
> +
> +	ret = hv_rf_send_request(hv, request);
> +	if (ret != 0) {
> +		PMD_PERROR_LOG(hv, DBG_TX, "RNDISFILTER request
> failed to Send!");
> +		goto cleanup;
> +	}
> +
> +	hv_nv_complete_request(hv, request);
> +
> +	/* Copy the response back */
> +	query_complete = &request->response_msg.msg.query_complete;
> +
> +	if (query_complete->info_buffer_length > result_size) {
> +		ret = -EINVAL;
> +		goto cleanup;
> +	}
> +
> +	rte_memcpy(result, (void *)((unsigned long)query_complete +
> +	    query_complete->info_buffer_offset),
> +	    query_complete->info_buffer_length);
> +
> +cleanup:
> +	rte_free(request);
> +
> +	return ret;
> +}
> +
> +/*
> + * RNDIS filter query device MAC address
> + */
> +static inline int
> +hv_rf_query_device_mac(struct hv_data *hv)
> +{
> +	uint32_t size = HW_MACADDR_LEN;
> +
> +	int ret = hv_rf_query_device(hv,
> RNDIS_OID_802_3_PERMANENT_ADDRESS,
> +			&hv->hw_mac_addr, size);
> +	PMD_PDEBUG_LOG(hv, DBG_TX, "MAC:
> %02x:%02x:%02x:%02x:%02x:%02x, ret = %d",
> +			hv->hw_mac_addr[0], hv->hw_mac_addr[1], hv-
> >hw_mac_addr[2],
> +			hv->hw_mac_addr[3], hv->hw_mac_addr[4], hv-
> >hw_mac_addr[5],
> +			ret);
> +	return ret;
> +}
> +
> +/*
> + * RNDIS filter query device link status
> + */
> +static inline int
> +hv_rf_query_device_link_status(struct hv_data *hv)
> +{
> +	uint32_t size = sizeof(uint32_t);
> +	/* Set all bits to 1, it's to ensure that the response is actual */
> +	uint32_t status = -1;
> +
> +	int ret = hv_rf_query_device(hv,
> RNDIS_OID_GEN_MEDIA_CONNECT_STATUS,
> +			&status, size);
> +	hv->link_status = status ? 0 : 1;
> +	PMD_PDEBUG_LOG(hv, DBG_TX, "Link Status: %s",
> +			hv->link_status ? "Up" : "Down");
> +	return ret;
> +}
> +
> +int
> +hv_rf_on_device_add(struct hv_data *hv)
> +{
> +	int ret;
> +
> +	PMD_INIT_FUNC_TRACE();
> +
> +	hv->closed = 0;
> +	hv->rb_data_size = hv->rb_size - sizeof(struct
> hv_vmbus_ring_buffer);
> +	PMD_PDEBUG_LOG(hv, DBG_LOAD, "hv->rb_data_size = %u", hv-
> >rb_data_size);
> +
> +	if (unlikely(hv->in->interrupt_mask == 0)) {
> +		PMD_PINFO_LOG(hv, DBG_LOAD, "Disabling interrupts from
> host");
> +		hv->in->interrupt_mask = 1;
> +		rte_mb();
> +	}
> +
> +	hv->netvsc_packet = rte_zmalloc("", sizeof(struct netvsc_packet),
> +					RTE_CACHE_LINE_SIZE);
> +	if (hv->netvsc_packet == NULL)
> +		return -ENOMEM;
> +	hv->netvsc_packet->is_data_pkt = 1;
> +
> +	hv->rx_comp_msg = rte_zmalloc("", sizeof(struct nvsp_msg),
> +				      RTE_CACHE_LINE_SIZE);
> +	if (hv->rx_comp_msg == NULL)
> +		return -ENOMEM;
> +
> +	hv->rx_comp_msg->msg_type =
> nvsp_msg_1_type_send_rndis_pkt_complete;
> +	hv->rx_comp_msg->msgs.send_rndis_pkt_complete.status =
> +		nvsp_status_success;
> +
> +	memset(&hv->stats, 0, sizeof(struct hv_stats));
> +
> +	hv->receive_callback = hv_rf_receive_data;
> +
> +	/* It's for completion of requests which were sent from kernel-space
> part */
> +	hv_nv_complete_request(hv, NULL);
> +	hv_nv_complete_request(hv, NULL);
> +
> +	hv->rndis_dev_state = RNDIS_DEV_UNINITIALIZED;
> +
> +	/* Send the rndis initialization message */
> +	ret = hv_rf_init_device(hv);
> +	if (ret != 0) {
> +		PMD_PERROR_LOG(hv, DBG_LOAD, "rndis init failed!");
> +		hv_rf_on_device_remove(hv);
> +		return ret;
> +	}
> +
> +	/* Get the mac address */
> +	ret = hv_rf_query_device_mac(hv);
> +	if (ret != 0) {
> +		PMD_PERROR_LOG(hv, DBG_LOAD, "rndis query mac
> failed!");
> +		hv_rf_on_device_remove(hv);
> +		return ret;
> +	}
> +
> +	return ret;
> +}
> +
> +#define HALT_COMPLETION_WAIT_COUNT      25
> +
> +/*
> + * RNDIS filter halt device
> + */
> +static int
> +hv_rf_halt_device(struct hv_data *hv)
> +{
> +	struct rndis_request *request;
> +	struct rndis_halt_request *halt;
> +	int i, ret;
> +
> +	PMD_INIT_FUNC_TRACE();
> +
> +	/* Attempt to do a rndis device halt */
> +	request = hv_rndis_request(hv, REMOTE_NDIS_HALT_MSG,
> +	    RNDIS_MESSAGE_SIZE(struct rndis_halt_request));
> +	if (!request) {
> +		PMD_PERROR_LOG(hv, DBG_LOAD, "Unable to create
> RNDIS_HALT request");
> +		return -1;
> +	}
> +
> +	/* initialize "poor man's semaphore" */
> +	hv->hlt_req_sent = 0;
> +
> +	/* Set up the rndis set */
> +	halt = &request->request_msg->msg.halt_request;
> +	hv->new_request_id++;
> +	halt->request_id = hv->new_request_id;
> +
> +	ret = hv_rf_send_request(hv, request);
> +	if (ret) {
> +		PMD_PERROR_LOG(hv, DBG_LOAD, "Failed to send
> RNDIS_HALT request: %d",
> +				ret);
> +		return ret;
> +	}
> +
> +	/*
> +	 * Wait for halt response from halt callback.  We must wait for
> +	 * the transaction response before freeing the request and other
> +	 * resources.
> +	 */
> +	for (i = HALT_COMPLETION_WAIT_COUNT; i > 0; i--) {
> +		hv_nv_complete_request(hv, request);
> +		if (hv->hlt_req_sent != 0) {
> +			PMD_PDEBUG_LOG(hv, DBG_LOAD, "Completed
> HALT request at %d try",
> +					HALT_COMPLETION_WAIT_COUNT - i
> + 1);
> +			break;
> +		}
> +	}
> +	hv->hlt_req_sent = 0;
> +	if (i == 0) {
> +		PMD_PERROR_LOG(hv, DBG_LOAD, "RNDIS_HALT request
> was not completed!");
> +		rte_free(request);
> +		return -1;
> +	}
> +
> +	hv->rndis_dev_state = RNDIS_DEV_UNINITIALIZED;
> +
> +	rte_free(request);
> +
> +	return 0;
> +}
> +
> +#define HV_TX_DRAIN_TRIES 50
> +static inline int
> +hyperv_tx_drain(struct hv_data *hv)
> +{
> +	int i = HV_TX_DRAIN_TRIES;
> +
> +	PMD_PDEBUG_LOG(hv, DBG_LOAD, "Waiting for TXs to be
> completed...");
> +	while (hv->num_outstanding_sends > 0 && --i) {
> +		hv_nv_complete_request(hv, NULL);
> +		rte_delay_ms(100);
> +	}
> +
> +	return hv->num_outstanding_sends;
> +}
> +
> +/*
> + * RNDIS filter on device remove
> + */
> +int
> +hv_rf_on_device_remove(struct hv_data *hv)
> +{
> +	int ret;
> +
> +	PMD_INIT_FUNC_TRACE();
> +	hv->closed = 1;
> +	if (hyperv_tx_drain(hv) > 0) {
> +		/* Hypervisor is not responding, exit with error here */
> +		PMD_PWARN_LOG(hv, DBG_LOAD, "Can't drain TX queue:
> no response");
> +		return -EAGAIN;
> +	}
> +	PMD_PDEBUG_LOG(hv, DBG_LOAD, "TX queue is empty, can halt
> the device");
> +
> +	/* Halt and release the rndis device */
> +	hv->hlt_req_pending = 1;
> +	ret = hv_rf_halt_device(hv);
> +	hv->hlt_req_pending = 0;
> +
> +	rte_free(hv->netvsc_packet);
> +
> +	return ret;
> +}
> +
> +/*
> + * RNDIS filter set packet filter
> + * Sends an rndis request with the new filter, then waits for a response
> + * from the host.
> + * Returns zero on success, non-zero on failure.
> + */
> +static int
> +hv_rf_set_packet_filter(struct hv_data *hv, uint32_t new_filter)
> +{
> +	struct rndis_request *request;
> +	struct rndis_set_request *set;
> +	struct rndis_set_complete *set_complete;
> +	uint32_t status;
> +	int ret;
> +
> +	PMD_INIT_FUNC_TRACE();
> +
> +	request = hv_rndis_request(hv, REMOTE_NDIS_SET_MSG,
> +			RNDIS_MESSAGE_SIZE(struct rndis_set_request) +
> sizeof(uint32_t));
> +	if (!request) {
> +		ret = -1;
> +		goto cleanup;
> +	}
> +
> +	/* Set up the rndis set */
> +	set = &request->request_msg->msg.set_request;
> +	set->oid = RNDIS_OID_GEN_CURRENT_PACKET_FILTER;
> +	set->info_buffer_length = sizeof(uint32_t);
> +	set->info_buffer_offset = sizeof(struct rndis_set_request);
> +
> +	rte_memcpy((void *)((unsigned long)set + sizeof(struct
> rndis_set_request)),
> +			&new_filter, sizeof(uint32_t));
> +
> +	ret = hv_rf_send_request(hv, request);
> +	if (ret)
> +		goto cleanup;
> +
> +	/*
> +	 * Wait for the response from the host.
> +	 */
> +	request->response_msg.msg.set_complete.status = 0xFFFF;
> +	hv_nv_complete_request(hv, request);
> +
> +	set_complete = &request->response_msg.msg.set_complete;
> +	if (set_complete->status == 0xFFFF) {
> +		/* Host is not responding, we can't free request in this case
> */
> +		ret = -1;
> +		goto exit;
> +	}
> +	/* Response received, check status */
> +	status = set_complete->status;
> +	if (status)
> +		/* Bad response status, return error */
> +		ret = -2;
> +
> +cleanup:
> +	rte_free(request);
> +exit:
> +	return ret;
> +}
> +
> +/*
> + * RNDIS filter open device
> + */
> +int
> +hv_rf_on_open(struct hv_data *hv)
> +{
> +	int ret;
> +
> +	if (hv->closed)
> +		return 0;
> +
> +	if (hv->jumbo_frame_support)
> +		hv->receive_callback = hv_rf_receive_data_sg;
> +
> +	ret = hyperv_set_rx_mode(hv, 1, 0);
> +	if (!ret) {
> +		PMD_PDEBUG_LOG(hv, DBG_LOAD, "RNDIS device
> opened");
> +		hv->rndis_dev_state = RNDIS_DEV_DATAINITIALIZED;
> +	} else
> +		PMD_PERROR_LOG(hv, DBG_LOAD, "RNDIS device is left
> unopened");
> +
> +	return ret;
> +}
> +
> +/*
> + * RNDIS filter on close
> + */
> +int
> +hv_rf_on_close(struct hv_data *hv)
> +{
> +	int ret;
> +
> +	PMD_INIT_FUNC_TRACE();
> +
> +	if (hv->closed)
> +		return 0;
> +
> +	if (hv->rndis_dev_state != RNDIS_DEV_DATAINITIALIZED) {
> +		PMD_PDEBUG_LOG(hv, DBG_LOAD, "RNDIS device state
> should be"
> +				" RNDIS_DEV_DATAINITIALIZED, but now it is
> %u",
> +				hv->rndis_dev_state);
> +		return 0;
> +	}
> +
> +	ret = hv_rf_set_packet_filter(hv, 0);
> +	if (!ret) {
> +		PMD_PDEBUG_LOG(hv, DBG_LOAD, "RNDIS device closed");
> +		hv->rndis_dev_state = RNDIS_DEV_INITIALIZED;
> +	} else
> +		PMD_PDEBUG_LOG(hv, DBG_LOAD, "RNDIS device is left
> unclosed");
> +
> +	return ret;
> +}
> +
> +/*
> + * RX Flow
> + */
> +int
> +hyperv_get_buffer(struct hv_data *hv, void *buffer, uint32_t bufferlen)
> +{
> +	uint32_t bytes_rxed;
> +	uint64_t request_id;
> +	struct hv_vm_packet_descriptor *desc;
> +
> +	int ret = hv_vmbus_channel_recv_packet_raw(hv, buffer, bufferlen,
> +			&bytes_rxed, &request_id, 1);
> +	if (likely(ret == 0)) {
> +		if (bytes_rxed) {
> +			desc = (struct hv_vm_packet_descriptor *)buffer;
> +
> +			if (likely(desc->type ==
> +
> 	HV_VMBUS_PACKET_TYPE_DATA_USING_TRANSFER_PAGES)) {
> +				hv->pkt_rxed = 0;
> +				hv_nv_on_receive(hv, desc);
> +				return hv->pkt_rxed;
> +			}
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +/*
> + * TX completions handler
> + */
> +void
> +hyperv_scan_comps(struct hv_data *hv, int allow_rx_drop)
> +{
> +	uint32_t bytes_rxed;
> +	uint64_t request_id;
> +
> +	while (1) {
> +		int ret = hv_vmbus_channel_recv_packet_raw(hv, hv->desc,
> PAGE_SIZE,
> +			&bytes_rxed, &request_id, 2 | allow_rx_drop);
> +
> +		if (ret != 0 || !bytes_rxed)
> +			break;
> +
> +		if (likely(hv->desc->type ==
> HV_VMBUS_PACKET_TYPE_COMPLETION))
> +			hv_nv_on_send_completion(hv, hv->desc);
> +	}
> +}
> +
> +/*
> + * Get link status
> + */
> +uint8_t
> +hyperv_get_link_status(struct hv_data *hv)
> +{
> +	if (hv_rf_query_device_link_status(hv))
> +		return 2;
> +	return hv->link_status;
> +}
> +
> +/*
> + * Set/Reset RX mode
> + */
> +int
> +hyperv_set_rx_mode(struct hv_data *hv, uint8_t promisc, uint8_t mcast)
> +{
> +	PMD_INIT_FUNC_TRACE();
> +
> +	if (!promisc) {
> +		return hv_rf_set_packet_filter(hv,
> +				NDIS_PACKET_TYPE_BROADCAST                   |
> +				(mcast ?
> NDIS_PACKET_TYPE_ALL_MULTICAST : 0) |
> +				NDIS_PACKET_TYPE_DIRECTED);
> +	}
> +
> +	return hv_rf_set_packet_filter(hv,
> NDIS_PACKET_TYPE_PROMISCUOUS);
> +}
> diff --git a/lib/librte_pmd_hyperv/hyperv_drv.h
> b/lib/librte_pmd_hyperv/hyperv_drv.h
> new file mode 100644
> index 0000000..22acad5
> --- /dev/null
> +++ b/lib/librte_pmd_hyperv/hyperv_drv.h
> @@ -0,0 +1,558 @@
> +/*-
> + * Copyright (c) 2009-2012 Microsoft Corp.
> + * Copyright (c) 2010-2012 Citrix Inc.
> + * Copyright (c) 2012 NetApp Inc.
> + * Copyright (c) 2013-2015 Brocade Communications Systems, Inc.
> + * All rights reserved.
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions
> + * are met:
> + * 1. Redistributions of source code must retain the above copyright
> + *    notice unmodified, this list of conditions, and the following
> + *    disclaimer.
> + * 2. Redistributions in binary form must reproduce the above copyright
> + *    notice, this list of conditions and the following disclaimer in the
> + *    documentation and/or other materials provided with the distribution.
> + *
> + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS
> OR
> + * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
> WARRANTIES
> + * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
> DISCLAIMED.
> + * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
> + * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
> (INCLUDING, BUT
> + * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
> LOSS OF USE,
> + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
> AND ON ANY
> + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
> THE USE OF
> + * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
> DAMAGE.
> + *
> + */
> +
> +#ifndef _HYPERV_DRV_H_
> +#define _HYPERV_DRV_H_
> +
> +/*
> + * Definitions from hyperv.h
> + */
> +#define HW_MACADDR_LEN	6
> +#define HV_MAX_PAGE_BUFFER_COUNT	19
> +
> +#define HV_ALIGN_UP(value, align) \
> +		(((value) & (align-1)) ? \
> +		    (((value) + (align-1)) & ~(align-1)) : (value))
> +
> +/*
> + *  Connection identifier type
> + */
> +union hv_vmbus_connection_id {
> +	uint32_t                as_uint32_t;
> +	struct {
> +		uint32_t        id:24;
> +		uint32_t        reserved:8;
> +	} u;
> +
> +} __attribute__((packed));
> +
> +union hv_vmbus_monitor_trigger_state {
> +	uint32_t as_uint32_t;
> +	struct {
> +		uint32_t group_enable:4;
> +		uint32_t rsvd_z:28;
> +	} u;
> +};
> +
> +union hv_vmbus_monitor_trigger_group {
> +	uint64_t as_uint64_t;
> +	struct {
> +		uint32_t pending;
> +		uint32_t armed;
> +	} u;
> +};
> +
> +struct hv_vmbus_monitor_parameter {
> +	union hv_vmbus_connection_id  connection_id;
> +	uint16_t                flag_number;
> +	uint16_t                rsvd_z;
> +};
> +
> +/*
> + * hv_vmbus_monitor_page Layout
> + * ------------------------------------------------------
> + * | 0   | trigger_state (4 bytes) | Rsvd1 (4 bytes)     |
> + * | 8   | trigger_group[0]                              |
> + * | 10  | trigger_group[1]                              |
> + * | 18  | trigger_group[2]                              |
> + * | 20  | trigger_group[3]                              |
> + * | 28  | Rsvd2[0]                                      |
> + * | 30  | Rsvd2[1]                                      |
> + * | 38  | Rsvd2[2]                                      |
> + * | 40  | next_check_time[0][0] | next_check_time[0][1] |
> + * | ...                                                 |
> + * | 240 | latency[0][0..3]                              |
> + * | 340 | Rsvz3[0]                                      |
> + * | 440 | parameter[0][0]                               |
> + * | 448 | parameter[0][1]                               |
> + * | ...                                                 |
> + * | 840 | Rsvd4[0]                                      |
> + * ------------------------------------------------------
> + */
> +
> +struct hv_vmbus_monitor_page {
> +	union hv_vmbus_monitor_trigger_state  trigger_state;
> +	uint32_t                        rsvd_z1;
> +
> +	union hv_vmbus_monitor_trigger_group  trigger_group[4];
> +	uint64_t                        rsvd_z2[3];
> +
> +	int32_t                         next_check_time[4][32];
> +
> +	uint16_t                        latency[4][32];
> +	uint64_t                        rsvd_z3[32];
> +
> +	struct hv_vmbus_monitor_parameter      parameter[4][32];
> +
> +	uint8_t                         rsvd_z4[1984];
> +};
> +
> +enum hv_vmbus_packet_type {
> +	HV_VMBUS_PACKET_TYPE_DATA_USING_TRANSFER_PAGES
> 	= 0x7,
> +	HV_VMBUS_PACKET_TYPE_DATA_USING_GPA_DIRECT
> 	= 0x9,
> +	HV_VMBUS_PACKET_TYPE_COMPLETION
> 	= 0xb,
> +};
> +
> +#define HV_VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED    1
> +
> +struct hv_vm_packet_descriptor {
> +	uint16_t type;
> +	uint16_t data_offset8;
> +	uint16_t length8;
> +	uint16_t flags;
> +	uint64_t transaction_id;
> +} __attribute__((packed));
> +
> +struct hv_vm_transfer_page {
> +	uint32_t byte_count;
> +	uint32_t byte_offset;
> +} __attribute__((packed));
> +
> +struct hv_vm_transfer_page_packet_header {
> +	struct hv_vm_packet_descriptor d;
> +	uint16_t                       transfer_page_set_id;
> +	uint8_t                        sender_owns_set;
> +	uint8_t                        reserved;
> +	uint32_t                       range_count;
> +	struct hv_vm_transfer_page     ranges[1];
> +} __attribute__((packed));
> +
> +struct hv_vmbus_ring_buffer {
> +	volatile uint32_t       write_index;
> +	volatile uint32_t       read_index;
> +	/*
> +	 * NOTE: The interrupt_mask field is used only for channels, but
> +	 * vmbus connection also uses this data structure
> +	 */
> +	volatile uint32_t       interrupt_mask;
> +	/* pad it to PAGE_SIZE so that data starts on a page */
> +	uint8_t                 reserved[4084];
> +
> +	/*
> +	 * WARNING: Ring data starts here + ring_data_start_offset
> +	 *  !!! DO NOT place any fields below this !!!
> +	 */
> +	uint8_t			buffer[0];	/* doubles as interrupt mask
> */
> +} __attribute__((packed));
> +
> +struct hv_vmbus_page_buffer {
> +	uint32_t	length;
> +	uint32_t	offset;
> +	uint64_t	pfn;
> +} __attribute__((packed));
> +
> +/*
> + * Definitions from hv_vmbus_priv.h
> + */
> +struct hv_vmbus_sg_buffer_list {
> +	void		*data;
> +	uint32_t	length;
> +};
> +
> +struct hv_vmbus_channel_packet_page_buffer {
> +	uint16_t		type;
> +	uint16_t		data_offset8;
> +	uint16_t		length8;
> +	uint16_t		flags;
> +	uint64_t		transaction_id;
> +	uint32_t		reserved;
> +	uint32_t		range_count;
> +	struct hv_vmbus_page_buffer
> 	range[HV_MAX_PAGE_BUFFER_COUNT];
> +} __attribute__((packed));
> +
> +/*
> + * Definitions from hv_net_vsc.h
> + */
> +#define NETVSC_PACKET_MAXPAGE 16
> +#define NETVSC_PACKET_SIZE    256
> +
> +/*
> + * This message is used by both the VSP and the VSC to complete
> + * a RNDIS message to the opposite channel endpoint.  At this
> + * point, the initiator of this message cannot use any resources
> + * associated with the original RNDIS packet.
> + */
> +enum nvsp_status_ {
> +	nvsp_status_none = 0,
> +	nvsp_status_success,
> +	nvsp_status_failure,
> +};
> +
> +struct nvsp_1_msg_send_rndis_pkt_complete {
> +	uint32_t                                status;
> +} __attribute__((packed));
> +
> +enum nvsp_msg_type {
> +	/*
> +	 * Version 1 Messages
> +	 */
> +	nvsp_msg_1_type_send_ndis_vers          = 100,
> +
> +	nvsp_msg_1_type_send_rx_buf,
> +	nvsp_msg_1_type_send_rx_buf_complete,
> +	nvsp_msg_1_type_revoke_rx_buf,
> +
> +	nvsp_msg_1_type_send_send_buf,
> +	nvsp_msg_1_type_send_send_buf_complete,
> +	nvsp_msg_1_type_revoke_send_buf,
> +
> +	nvsp_msg_1_type_send_rndis_pkt,
> +	nvsp_msg_1_type_send_rndis_pkt_complete,
> +};
> +
> +struct nvsp_1_msg_send_rndis_pkt {
> +	/*
> +	 * This field is specified by RNDIS.  They assume there's
> +	 * two different channels of communication. However,
> +	 * the Network VSP only has one.  Therefore, the channel
> +	 * travels with the RNDIS packet.
> +	 */
> +	uint32_t                                chan_type;
> +
> +	/*
> +	 * This field is used to send part or all of the data
> +	 * through a send buffer. This value specifies an
> +	 * index into the send buffer.  If the index is
> +	 * 0xFFFFFFFF, then the send buffer is not being used
> +	 * and all of the data was sent through other VMBus
> +	 * mechanisms.
> +	 */
> +	uint32_t                                send_buf_section_idx;
> +	uint32_t                                send_buf_section_size;
> +} __attribute__((packed));
> +
> +/*
> + * ALL Messages
> + */
> +struct nvsp_msg {
> +	uint32_t                                msg_type;
> +	union {
> +		struct nvsp_1_msg_send_rndis_pkt               send_rndis_pkt;
> +		struct nvsp_1_msg_send_rndis_pkt_complete
> send_rndis_pkt_complete;
> +		/* size is set like in linux kernel driver */
> +		uint8_t raw[24];
> +	} msgs;
> +} __attribute__((packed));
> +
> +#define NETVSC_RECEIVE_BUFFER_ID                0xcafe
> +
> +struct netvsc_packet {
> +	uint8_t             is_data_pkt;      /* One byte */
> +	uint8_t             ext_pages;
> +	uint16_t            vlan_tci;
> +
> +	void                *extension;
> +	uint64_t            extension_phys_addr;
> +	uint32_t            tot_data_buf_len;
> +	uint32_t            page_buf_count;
> +	struct hv_vmbus_page_buffer
> page_buffers[NETVSC_PACKET_MAXPAGE];
> +};
> +
> +/*
> + * Definitions from hv_rndis.h
> + */
> +#define RNDIS_MAJOR_VERSION                             0x00000001
> +#define RNDIS_MINOR_VERSION                             0x00000000
> +
> +#define STATUS_BUFFER_OVERFLOW                          (0x80000005L)
> +
> +/*
> + * Remote NDIS message types
> + */
> +#define REMOTE_NDIS_PACKET_MSG                          0x00000001
> +#define REMOTE_NDIS_INITIALIZE_MSG                      0x00000002
> +#define REMOTE_NDIS_HALT_MSG                            0x00000003
> +#define REMOTE_NDIS_QUERY_MSG                           0x00000004
> +#define REMOTE_NDIS_SET_MSG                             0x00000005
> +#define REMOTE_NDIS_RESET_MSG                           0x00000006
> +#define REMOTE_NDIS_INDICATE_STATUS_MSG                 0x00000007
> +#define REMOTE_NDIS_KEEPALIVE_MSG                       0x00000008
> +/*
> + * Remote NDIS message completion types
> + */
> +#define REMOTE_NDIS_INITIALIZE_CMPLT                    0x80000002
> +#define REMOTE_NDIS_QUERY_CMPLT                         0x80000004
> +#define REMOTE_NDIS_SET_CMPLT                           0x80000005
> +#define REMOTE_NDIS_RESET_CMPLT                         0x80000006
> +#define REMOTE_NDIS_KEEPALIVE_CMPLT                     0x80000008
> +
> +#define RNDIS_OID_GEN_MEDIA_CONNECT_STATUS              0x00010114
> +#define RNDIS_OID_GEN_CURRENT_PACKET_FILTER             0x0001010E
> +#define RNDIS_OID_802_3_PERMANENT_ADDRESS               0x01010101
> +#define RNDIS_OID_802_3_CURRENT_ADDRESS                 0x01010102
> +#define RNDIS_OID_GEN_RNDIS_CONFIG_PARAMETER            0x0001021B
> +
> +#define RNDIS_CONFIG_PARAM_TYPE_STRING      2
> +/* extended info after the RNDIS request message */
> +#define RNDIS_EXT_LEN                       100
> +/*
> + * Packet extension field contents associated with a Data message.
> + */
> +struct rndis_per_packet_info {
> +	uint32_t            size;
> +	uint32_t            type;
> +	uint32_t            per_packet_info_offset;
> +};
> +
> +#define ieee_8021q_info 6
> +
> +struct ndis_8021q_info {
> +	union {
> +		struct {
> +			uint32_t   user_pri:3;  /* User Priority */
> +			uint32_t   cfi:1;  /* Canonical Format ID */
> +			uint32_t   vlan_id:12;
> +			uint32_t   reserved:16;
> +		} s1;
> +		uint32_t    value;
> +	} u1;
> +};
> +
> +/* Format of Information buffer passed in a SetRequest for the OID */
> +/* OID_GEN_RNDIS_CONFIG_PARAMETER. */
> +struct rndis_config_parameter_info {
> +	uint32_t parameter_name_offset;
> +	uint32_t parameter_name_length;
> +	uint32_t parameter_type;
> +	uint32_t parameter_value_offset;
> +	uint32_t parameter_value_length;
> +};
> +
> +/*
> + * NdisInitialize message
> + */
> +struct rndis_initialize_request {
> +	/* RNDIS request ID */
> +	uint32_t            request_id;
> +	uint32_t            major_version;
> +	uint32_t            minor_version;
> +	uint32_t            max_xfer_size;
> +};
> +
> +/*
> + * Response to NdisInitialize
> + */
> +struct rndis_initialize_complete {
> +	/* RNDIS request ID */
> +	uint32_t            request_id;
> +	/* RNDIS status */
> +	uint32_t            status;
> +	uint32_t            major_version;
> +	uint32_t            minor_version;
> +	uint32_t            device_flags;
> +	/* RNDIS medium */
> +	uint32_t            medium;
> +	uint32_t            max_pkts_per_msg;
> +	uint32_t            max_xfer_size;
> +	uint32_t            pkt_align_factor;
> +	uint32_t            af_list_offset;
> +	uint32_t            af_list_size;
> +};
> +
> +/*
> + * NdisSetRequest message
> + */
> +struct rndis_set_request {
> +	/* RNDIS request ID */
> +	uint32_t            request_id;
> +	/* RNDIS OID */
> +	uint32_t            oid;
> +	uint32_t            info_buffer_length;
> +	uint32_t            info_buffer_offset;
> +	/* RNDIS handle */
> +	uint32_t            device_vc_handle;
> +};
> +
> +/*
> + * Response to NdisSetRequest
> + */
> +struct rndis_set_complete {
> +	/* RNDIS request ID */
> +	uint32_t            request_id;
> +	/* RNDIS status */
> +	uint32_t            status;
> +};
> +
> +/*
> + * NdisQueryRequest message
> + */
> +struct rndis_query_request {
> +	/* RNDIS request ID */
> +	uint32_t            request_id;
> +	/* RNDIS OID */
> +	uint32_t            oid;
> +	uint32_t            info_buffer_length;
> +	uint32_t            info_buffer_offset;
> +	/* RNDIS handle */
> +	uint32_t            device_vc_handle;
> +};
> +
> +/*
> + * Response to NdisQueryRequest
> + */
> +struct rndis_query_complete {
> +	/* RNDIS request ID */
> +	uint32_t            request_id;
> +	/* RNDIS status */
> +	uint32_t            status;
> +	uint32_t            info_buffer_length;
> +	uint32_t            info_buffer_offset;
> +};
> +
> +/*
> + * Data message. All offset fields contain byte offsets from the beginning
> + * of the rndis_packet structure. All length fields are in bytes.
> + * VcHandle is set to 0 for connectionless data, otherwise it
> + * contains the VC handle.
> + */
> +struct rndis_packet {
> +	uint32_t            data_offset;
> +	uint32_t            data_length;
> +	uint32_t            oob_data_offset;
> +	uint32_t            oob_data_length;
> +	uint32_t            num_oob_data_elements;
> +	uint32_t            per_pkt_info_offset;
> +	uint32_t            per_pkt_info_length;
> +	/* RNDIS handle */
> +	uint32_t            vc_handle;
> +	uint32_t            reserved;
> +};
> +
> +/*
> + * NdisHalt message
> + */
> +struct rndis_halt_request {
> +	/* RNDIS request ID */
> +	uint32_t            request_id;
> +};
> +
> +/*
> + * NdisMIndicateStatus message
> + */
> +struct rndis_indicate_status {
> +	/* RNDIS status */
> +	uint32_t                                status;
> +	uint32_t                                status_buf_length;
> +	uint32_t                                status_buf_offset;
> +};
> +
> +#define RNDIS_STATUS_MEDIA_CONNECT              (0x4001000BL)
> +#define RNDIS_STATUS_MEDIA_DISCONNECT           (0x4001000CL)
> +#define RNDIS_STATUS_INVALID_DATA               (0xC0010015L)
> +
> +/*
> + * union with all of the RNDIS messages
> + */
> +union rndis_msg_container {
> +	struct rndis_initialize_request                init_request;
> +	struct rndis_initialize_complete               init_complete;
> +	struct rndis_set_request                       set_request;
> +	struct rndis_set_complete                      set_complete;
> +	struct rndis_query_request                     query_request;
> +	struct rndis_query_complete                    query_complete;
> +	struct rndis_packet                            packet;
> +	struct rndis_halt_request                      halt_request;
> +	struct rndis_indicate_status                   indicate_status;
> +#if 0
> +	rndis_keepalive_request                 keepalive_request;
> +	rndis_reset_request                     reset_request;
> +	rndis_reset_complete                    reset_complete;
> +	rndis_keepalive_complete                keepalive_complete;
> +	rcondis_mp_create_vc                    co_miniport_create_vc;
> +	rcondis_mp_delete_vc                    co_miniport_delete_vc;
> +	rcondis_indicate_status                 co_miniport_status;
> +	rcondis_mp_activate_vc_request          co_miniport_activate_vc;
> +	rcondis_mp_deactivate_vc_request        co_miniport_deactivate_vc;
> +	rcondis_mp_create_vc_complete
> co_miniport_create_vc_complete;
> +	rcondis_mp_delete_vc_complete
> co_miniport_delete_vc_complete;
> +	rcondis_mp_activate_vc_complete
> co_miniport_activate_vc_complete;
> +	rcondis_mp_deactivate_vc_complete
> co_miniport_deactivate_vc_complete;
> +#endif
> +	uint32_t packet_ex[16]; /* to pad the union size */
> +};
> +
> +struct rndis_msg {
> +	uint32_t         ndis_msg_type;
> +
> +	/*
> +	 * Total length of this message, from the beginning
> +	 * of the rndis_msg struct, in bytes.
> +	 */
> +	uint32_t         msg_len;
> +
> +	/* Actual message */
> +	union rndis_msg_container msg;
> +};
> +
> +#define RNDIS_HEADER_SIZE (sizeof(struct rndis_msg) - sizeof(union
> rndis_msg_container))
> +
> +#define NDIS_PACKET_TYPE_DIRECTED       0x00000001
> +#define NDIS_PACKET_TYPE_MULTICAST      0x00000002
> +#define NDIS_PACKET_TYPE_ALL_MULTICAST  0x00000004
> +#define NDIS_PACKET_TYPE_BROADCAST      0x00000008
> +#define NDIS_PACKET_TYPE_SOURCE_ROUTING 0x00000010
> +#define NDIS_PACKET_TYPE_PROMISCUOUS    0x00000020
> +
> +/*
> + * get the size of an RNDIS message. Pass in the message type,
> + * rndis_set_request, rndis_packet for example
> + */
> +#define RNDIS_MESSAGE_SIZE(message) \
> +	(sizeof(message) + (sizeof(struct rndis_msg) - sizeof(union
> rndis_msg_container)))
> +
> +
> +/*
> + * Definitions from hv_rndis_filter.h
> + */
> +enum {
> +	RNDIS_DEV_UNINITIALIZED = 0,
> +	RNDIS_DEV_INITIALIZING,
> +	RNDIS_DEV_INITIALIZED,
> +	RNDIS_DEV_DATAINITIALIZED,
> +};
> +
> +struct rndis_request {
> +	/* assumed a fixed size response here. */
> +	struct rndis_msg    response_msg;
> +
> +	/* Simplify allocation by having a netvsc packet inline */
> +	struct netvsc_packet pkt;
> +	/* set additional buffer since packet can cross page boundary */
> +	struct hv_vmbus_page_buffer buffer;
> +	/* assumed a fixed size request here. */
> +	struct rndis_msg    *request_msg;
> +	const struct rte_memzone *request_msg_memzone;
> +};
> +
> +struct rndis_filter_packet {
> +	struct rndis_msg                       message;
> +};
> +
> +#endif /* _HYPERV_DRV_H_ */
> diff --git a/lib/librte_pmd_hyperv/hyperv_ethdev.c
> b/lib/librte_pmd_hyperv/hyperv_ethdev.c
> new file mode 100644
> index 0000000..7b909db
> --- /dev/null
> +++ b/lib/librte_pmd_hyperv/hyperv_ethdev.c
> @@ -0,0 +1,332 @@
> +/*-
> + * Copyright (c) 2013-2015 Brocade Communications Systems, Inc.
> + * All rights reserved.
> + */
> +
> +#include <assert.h>
> +#include <unistd.h>
> +#include "hyperv.h"
> +
> +static struct rte_vmbus_id vmbus_id_hyperv_map[] = {
> +	{
> +		.device_id = 0x0,
> +	},
> +};
> +
> +static void
> +hyperv_dev_info_get(__rte_unused struct rte_eth_dev *dev,
> +		struct rte_eth_dev_info *dev_info)
> +{
> +	PMD_INIT_FUNC_TRACE();
> +	dev_info->max_rx_queues  = HV_MAX_RX_QUEUES;
> +	dev_info->max_tx_queues  = HV_MAX_TX_QUEUES;
> +	dev_info->min_rx_bufsize = HV_MIN_RX_BUF_SIZE;
> +	dev_info->max_rx_pktlen  = HV_MAX_RX_PKT_LEN;
> +	dev_info->max_mac_addrs  = HV_MAX_MAC_ADDRS;
> +}
> +
> +inline int
> +rte_hv_dev_atomic_write_link_status(struct rte_eth_dev *dev,
> +		struct rte_eth_link *link)
> +{
> +	struct rte_eth_link *dst = &(dev->data->dev_link);
> +	struct rte_eth_link *src = link;
> +
> +	if (rte_atomic64_cmpset((uint64_t *)dst, *(uint64_t *)dst,
> +				*(uint64_t *)src) == 0)
> +		return -1;
> +
> +	return 0;
> +}
> +
> +inline int
> +rte_hv_dev_atomic_read_link_status(struct rte_eth_dev *dev,
> +		struct rte_eth_link *link)
> +{
> +	struct rte_eth_link *dst = link;
> +	struct rte_eth_link *src = &(dev->data->dev_link);
> +
> +	if (rte_atomic64_cmpset((uint64_t *)dst, *(uint64_t *)dst,
> +				*(uint64_t *)src) == 0)
> +		return -1;
> +
> +	return 0;
> +}
> +
> +/* return 0 means link status changed, -1 means not changed */
> +static int
> +hyperv_dev_link_update(struct rte_eth_dev *dev,
> +		__rte_unused int wait_to_complete)
> +{
> +	uint8_t ret;
> +	struct rte_eth_link old, link;
> +	struct hv_data *hv = dev->data->dev_private;
> +
> +	PMD_INIT_FUNC_TRACE();
> +	memset(&old, 0, sizeof(old));
> +	memset(&link, 0, sizeof(link));
> +	rte_hv_dev_atomic_read_link_status(dev, &old);
> +	if (!hv->link_status && (hv->link_req_cnt == HV_MAX_LINK_REQ)) {
> +		ret = hyperv_get_link_status(hv);
> +		if (ret > 1)
> +			return -1;
> +		hv->link_req_cnt = 0;
> +	}
> +	link.link_duplex = ETH_LINK_FULL_DUPLEX;
> +	link.link_speed = ETH_LINK_SPEED_10000;
> +	link.link_status = hv->link_status;
> +	hv->link_req_cnt++;
> +	rte_hv_dev_atomic_write_link_status(dev, &link);
> +
> +	return (old.link_status == link.link_status) ? -1 : 0;
> +}
> +
> +static int
> +hyperv_dev_configure(struct rte_eth_dev *dev)
> +{
> +	struct hv_data *hv = dev->data->dev_private;
> +	const struct rte_eth_rxmode *rxmode = &dev->data-
> >dev_conf.rxmode;
> +
> +	PMD_INIT_FUNC_TRACE();
> +
> +	rte_memcpy(dev->data->mac_addrs->addr_bytes, hv-
> >hw_mac_addr,
> +			ETHER_ADDR_LEN);
> +	hv->jumbo_frame_support = rxmode->jumbo_frame;
> +
> +	return 0;
> +}
> +
> +static int
> +hyperv_init(struct rte_eth_dev *dev)
> +{
> +	struct hv_data *hv = dev->data->dev_private;
> +	struct rte_vmbus_device *vmbus_dev;
> +
> +	vmbus_dev = dev->vmbus_dev;
> +	hv->uio_fd = vmbus_dev->uio_fd;
> +	hv->kernel_initialized = 1;
> +	hv->vmbus_device = vmbus_dev->id.device_id;
> +	hv->monitor_bit = (uint8_t)(vmbus_dev->vmbus_monitor_id % 32);
> +	hv->monitor_group = (uint8_t)(vmbus_dev->vmbus_monitor_id /
> 32);
> +	PMD_PDEBUG_LOG(hv, DBG_LOAD, "hyperv_init for vmbus device
> %d",
> +			vmbus_dev->id.device_id);
> +
> +	/* get the memory mappings */
> +	hv->ring_pages = vmbus_dev-
> >mem_resource[TXRX_RING_MAP].addr;
> +	hv->int_page = vmbus_dev->mem_resource[INT_PAGE_MAP].addr;
> +	hv->monitor_pages =
> +		(struct hv_vmbus_monitor_page *)
> +		vmbus_dev->mem_resource[MON_PAGE_MAP].addr;
> +	hv->recv_buf = vmbus_dev-
> >mem_resource[RECV_BUF_MAP].addr;
> +	assert(hv->ring_pages);
> +	assert(hv->int_page);
> +	assert(hv->monitor_pages);
> +	assert(hv->recv_buf);
> +
> +	/* separate send/recv int_pages */
> +	hv->recv_interrupt_page = hv->int_page;
> +
> +	hv->send_interrupt_page =
> +		((uint8_t *) hv->int_page + (PAGE_SIZE >> 1));
> +
> +	/* retrieve in/out ring_buffers */
> +	hv->out = hv->ring_pages;
> +	hv->in  = (void *)((uint64_t)hv->out +
> +			(vmbus_dev->mem_resource[TXRX_RING_MAP].len
> / 2));
> +	hv->rb_size = (vmbus_dev->mem_resource[TXRX_RING_MAP].len /
> 2);
> +
> +	dev->rx_pkt_burst = hyperv_recv_pkts;
> +	dev->tx_pkt_burst = hyperv_xmit_pkts;
> +
> +	return hv_rf_on_device_add(hv);
> +}
> +
> +#define HV_DEV_ID (hv->vmbus_device << 1)
> +#define HV_MTU (dev->data->dev_conf.rxmode.max_rx_pkt_len << 9)
> +
> +static int
> +hyperv_dev_start(struct rte_eth_dev *dev)
> +{
> +	int ret;
> +	uint32_t cmd;
> +	size_t bytes;
> +	struct hv_data *hv = dev->data->dev_private;
> +
> +	PMD_INIT_FUNC_TRACE();
> +	if (!hv->kernel_initialized) {
> +		cmd = HV_DEV_ID | HV_MTU;
> +		bytes = write(hv->uio_fd, &cmd, sizeof(uint32_t));
> +		if (bytes < sizeof(uint32_t)) {
> +			PMD_PERROR_LOG(hv, DBG_LOAD, "write on uio_fd
> %d failed",
> +					hv->uio_fd);
> +			return -1;
> +		}
> +		ret = vmbus_uio_map_resource(dev->vmbus_dev);
> +		if (ret < 0) {
> +			PMD_PERROR_LOG(hv, DBG_LOAD, "Failed to map
> resources");
> +			return ret;
> +		}
> +		ret = hyperv_init(dev);
> +		if (ret)
> +			return ret;
> +	}
> +	ret = hv_rf_on_open(hv);
> +	if (ret) {
> +		PMD_PERROR_LOG(hv, DBG_LOAD, "hv_rf_on_open
> failed");
> +		return ret;
> +	}
> +	hv->link_req_cnt = HV_MAX_LINK_REQ;
> +
> +	return ret;
> +}
> +
> +static void
> +hyperv_dev_stop(struct rte_eth_dev *dev)
> +{
> +	struct hv_data *hv = dev->data->dev_private;
> +	uint32_t cmd;
> +	size_t bytes;
> +
> +	PMD_INIT_FUNC_TRACE();
> +	if (!hv->closed) {
> +		hv_rf_on_close(hv);
> +		hv_rf_on_device_remove(hv);
> +		if (hv->kernel_initialized) {
> +			cmd = 1 | HV_DEV_ID;
> +			bytes = write(hv->uio_fd, &cmd, sizeof(uint32_t));
> +			if (bytes)
> +				hv->kernel_initialized = 0;
> +			else
> +				PMD_PWARN_LOG(hv, DBG_LOAD, "write to
> uio_fd %d failed: (%zu)b",
> +						hv->uio_fd, bytes);
> +		}
> +		hv->link_status = 0;
> +	}
> +}
> +
> +static void
> +hyperv_dev_close(struct rte_eth_dev *dev)
> +{
> +	PMD_INIT_FUNC_TRACE();
> +	hyperv_dev_stop(dev);
> +}
> +
> +static void
> +hyperv_dev_promisc_enable(struct rte_eth_dev *dev)
> +{
> +	struct hv_data *hv = dev->data->dev_private;
> +
> +	PMD_INIT_FUNC_TRACE();
> +	hyperv_set_rx_mode(hv, 1, dev->data->all_multicast);
> +}
> +
> +static void
> +hyperv_dev_promisc_disable(struct rte_eth_dev *dev)
> +{
> +	struct hv_data *hv = dev->data->dev_private;
> +
> +	PMD_INIT_FUNC_TRACE();
> +	hyperv_set_rx_mode(hv, 0, dev->data->all_multicast);
> +}
> +
> +static void
> +hyperv_dev_allmulticast_enable(struct rte_eth_dev *dev)
> +{
> +	struct hv_data *hv = dev->data->dev_private;
> +
> +	PMD_INIT_FUNC_TRACE();
> +	hyperv_set_rx_mode(hv, dev->data->promiscuous, 1);
> +}
> +
> +static void
> +hyperv_dev_allmulticast_disable(struct rte_eth_dev *dev)
> +{
> +	struct hv_data *hv = dev->data->dev_private;
> +
> +	PMD_INIT_FUNC_TRACE();
> +	hyperv_set_rx_mode(hv, dev->data->promiscuous, 0);
> +}
> +
> +static void
> +hyperv_dev_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats
> *stats)
> +{
> +	struct hv_data *hv = dev->data->dev_private;
> +	struct hv_stats *st = &hv->stats;
> +
> +	PMD_INIT_FUNC_TRACE();
> +
> +	memset(stats, 0, sizeof(struct rte_eth_stats));
> +
> +	stats->opackets = st->opkts;
> +	stats->obytes = st->obytes;
> +	stats->oerrors = st->oerrors;
> +	stats->ipackets = st->ipkts;
> +	stats->ibytes = st->ibytes;
> +	stats->ierrors = st->ierrors;
> +	stats->rx_nombuf = st->rx_nombuf;
> +}
> +
> +static struct eth_dev_ops hyperv_eth_dev_ops = {
> +	.dev_configure		= hyperv_dev_configure,
> +	.dev_start		= hyperv_dev_start,
> +	.dev_stop		= hyperv_dev_stop,
> +	.dev_infos_get		= hyperv_dev_info_get,
> +	.rx_queue_release	= hyperv_dev_rx_queue_release,
> +	.tx_queue_release	= hyperv_dev_tx_queue_release,
> +	.rx_queue_setup		= hyperv_dev_rx_queue_setup,
> +	.tx_queue_setup		= hyperv_dev_tx_queue_setup,
> +	.dev_close		= hyperv_dev_close,
> +	.promiscuous_enable	= hyperv_dev_promisc_enable,
> +	.promiscuous_disable	= hyperv_dev_promisc_disable,
> +	.allmulticast_enable	= hyperv_dev_allmulticast_enable,
> +	.allmulticast_disable	= hyperv_dev_allmulticast_disable,
> +	.link_update		= hyperv_dev_link_update,
> +	.stats_get		= hyperv_dev_stats_get,
> +};
> +
> +static int
> +eth_hyperv_dev_init(struct rte_eth_dev *eth_dev)
> +{
> +	int ret;
> +
> +	PMD_INIT_FUNC_TRACE();
> +
> +	eth_dev->dev_ops = &hyperv_eth_dev_ops;
> +	eth_dev->data->mac_addrs = rte_malloc("mac_addrs",
> +					      sizeof(struct ether_addr),
> +					      RTE_CACHE_LINE_SIZE);
> +	if (!eth_dev->data->mac_addrs) {
> +		PMD_PERROR_LOG(hv, DBG_LOAD, "unable to allocate
> memory for mac addrs");
> +		return -1;
> +	}
> +
> +	ret = hyperv_init(eth_dev);
> +
> +	return ret;
> +}
> +
> +static struct eth_driver rte_hyperv_pmd = {
> +	.vmbus_drv = {
> +		.name = "rte_hyperv_pmd",
> +		.module_name = "hv_uio",
> +		.id_table = vmbus_id_hyperv_map,
> +	},
> +	.bus_type = RTE_BUS_VMBUS,
> +	.eth_dev_init = eth_hyperv_dev_init,
> +	.dev_private_size = sizeof(struct hv_data),
> +};
> +
> +static int
> +rte_hyperv_pmd_init(const char *name __rte_unused,
> +		    const char *param __rte_unused)
> +{
> +	rte_eth_driver_register(&rte_hyperv_pmd);
> +	return 0;
> +}
> +
> +static struct rte_driver rte_hyperv_driver = {
> +	.type = PMD_PDEV,
> +	.init = rte_hyperv_pmd_init,
> +};
> +
> +PMD_REGISTER_DRIVER(rte_hyperv_driver);
> diff --git a/lib/librte_pmd_hyperv/hyperv_logs.h
> b/lib/librte_pmd_hyperv/hyperv_logs.h
> new file mode 100644
> index 0000000..1b96468
> --- /dev/null
> +++ b/lib/librte_pmd_hyperv/hyperv_logs.h
> @@ -0,0 +1,69 @@
> +/*-
> + *   Copyright(c) 2013-2015 Brocade Communications Systems, Inc.
> + *   All rights reserved.
> + */
> +
> +#ifndef _HYPERV_LOGS_H_
> +#define _HYPERV_LOGS_H_
> +
> +#ifdef RTE_LIBRTE_HV_DEBUG_INIT
> +#define PMD_INIT_LOG(level, fmt, args...) \
> +	RTE_LOG(level, PMD, "%s(): " fmt "\n", __func__, ## args)
> +#define PMD_INIT_FUNC_TRACE() PMD_INIT_LOG(DEBUG, " >>")
> +#else
> +#define PMD_INIT_LOG(level, fmt, args...) do { } while (0)
> +#define PMD_INIT_FUNC_TRACE() do { } while (0)
> +#endif
> +
> +#ifdef RTE_LIBRTE_HV_DEBUG
> +
> +#define RTE_DBG_LOAD   INIT
> +#define RTE_DBG_STATS  STATS
> +#define RTE_DBG_TX     TX
> +#define RTE_DBG_RX     RX
> +#define RTE_DBG_MBUF   MBUF
> +#define RTE_DBG_ASSERT ASRT
> +#define RTE_DBG_RB     RB
> +#define RTE_DBG_VMBUS  VMBUS
> +#define RTE_DBG_ALL    ALL
> +
> +#define STR(x) #x
> +
> +#define HV_RTE_LOG(hv, codepath, level, fmt, args...) \
> +	RTE_LOG(level, PMD, "[%d]: %-6s: %s: " fmt "\n", \
> +		hv->vmbus_device, STR(codepath), __func__, ## args)
> +
> +#define PMD_PDEBUG_LOG(hv, codepath, fmt, args...) \
> +do { \
> +	if (unlikely(hv->debug & (codepath))) \
> +		HV_RTE_LOG(hv, RTE_##codepath, DEBUG, fmt, ## args) \
> +} while (0)
> +
> +#define PMD_PINFO_LOG(hv, codepath, fmt, args...) \
> +do { \
> +	if (unlikely(hv->debug & (codepath))) \
> +		HV_RTE_LOG(hv, RTE_##codepath, INFO, fmt, ## args) \
> +} while (0)
> +
> +#define PMD_PWARN_LOG(hv, codepath, fmt, args...) \
> +do { \
> +	if (unlikely(hv->debug & (codepath))) \
> +		HV_RTE_LOG(hv, RTE_##codepath, WARNING, fmt, ## args)
> \
> +} while (0)
> +
> +#define PMD_PERROR_LOG(hv, codepath, fmt, args...) \
> +do { \
> +	if (unlikely(hv->debug & (codepath))) \
> +		HV_RTE_LOG(hv, RTE_##codepath, ERR, fmt, ## args) \
> +} while (0)
> +#else
> +#define HV_RTE_LOG(level, fmt, args...) do { } while (0)
> +#define PMD_PDEBUG_LOG(fmt, args...) do { } while (0)
> +#define PMD_PINFO_LOG(fmt, args...) do { } while (0)
> +#define PMD_PWARN_LOG(fmt, args...) do { } while (0)
> +#define PMD_PERROR_LOG(fmt, args...) do { } while (0)
> +#undef RTE_LIBRTE_HV_DEBUG_TX
> +#undef RTE_LIBRTE_HV_DEBUG_RX
> +#endif
> +
> +#endif /* _HYPERV_LOGS_H_ */
> diff --git a/lib/librte_pmd_hyperv/hyperv_rxtx.c
> b/lib/librte_pmd_hyperv/hyperv_rxtx.c
> new file mode 100644
> index 0000000..9e423d0
> --- /dev/null
> +++ b/lib/librte_pmd_hyperv/hyperv_rxtx.c
> @@ -0,0 +1,403 @@
> +/*-
> + *   Copyright(c) 2013-2015 Brocade Communications Systems, Inc.
> + *   All rights reserved.
> + */
> +
> +#include "hyperv.h"
> +#include "hyperv_rxtx.h"
> +#include "hyperv_drv.h"
> +
> +#define RTE_MBUF_DATA_DMA_ADDR(mb) \
> +	((uint64_t)((mb)->buf_physaddr + (mb)->data_off))
> +
> +#define RPPI_SIZE	(sizeof(struct rndis_per_packet_info)\
> +			 + sizeof(struct ndis_8021q_info))
> +#define RNDIS_OFF	(sizeof(struct netvsc_packet) + RPPI_SIZE)
> +#define TX_PKT_SIZE	(RNDIS_OFF + sizeof(struct rndis_filter_packet) * 2)
> +
> +static inline struct rte_mbuf *
> +hv_rxmbuf_alloc(struct rte_mempool *mp)
> +{
> +	return	__rte_mbuf_raw_alloc(mp);
> +}
> +
> +static inline int
> +hyperv_has_rx_work(struct hv_data *hv)
> +{
> +	return hv->in->read_index != hv->in->write_index;
> +}
> +
> +#ifndef DEFAULT_TX_FREE_THRESHOLD
> +#define DEFAULT_TX_FREE_THRESHOLD 32
> +#endif
> +
> +int
> +hyperv_dev_tx_queue_setup(struct rte_eth_dev *dev,
> +			  uint16_t queue_idx,
> +			  uint16_t nb_desc,
> +			  unsigned int socket_id,
> +			  const struct rte_eth_txconf *tx_conf)
> +
> +{
> +	struct hv_data *hv = dev->data->dev_private;
> +	const struct rte_memzone *tz;
> +	struct hv_tx_queue *txq;
> +	char tz_name[RTE_MEMZONE_NAMESIZE];
> +	uint32_t i, delta = 0, new_delta;
> +	struct netvsc_packet *pkt;
> +
> +	PMD_INIT_FUNC_TRACE();
> +
> +	txq = rte_zmalloc_socket("ethdev TX queue", sizeof(struct
> hv_tx_queue),
> +				 RTE_CACHE_LINE_SIZE, socket_id);
> +	if (txq == NULL) {
> +		PMD_PERROR_LOG(hv, DBG_LOAD, "rte_zmalloc for
> tx_queue failed");
> +		return -ENOMEM;
> +	}
> +
> +	if (tx_conf->tx_free_thresh >= nb_desc) {
> +		PMD_PERROR_LOG(hv, DBG_LOAD,
> +			       "tx_free_thresh should be less then nb_desc");
> +		return -EINVAL;
> +	}
> +	txq->tx_free_thresh = (tx_conf->tx_free_thresh ? tx_conf-
> >tx_free_thresh :
> +			       DEFAULT_TX_FREE_THRESHOLD);
> +	txq->pkts = rte_calloc_socket("TX pkts", sizeof(void*), nb_desc,
> +				       RTE_CACHE_LINE_SIZE, socket_id);
> +	if (txq->pkts == NULL) {
> +		PMD_PERROR_LOG(hv, DBG_LOAD, "rte_zmalloc for pkts
> failed");
> +		return -ENOMEM;
> +	}
> +	sprintf(tz_name, "hv_%d_%u_%u", hv->vmbus_device, queue_idx,
> socket_id);
> +	tz = rte_memzone_reserve_aligned(tz_name,
> +					 (uint32_t)nb_desc * TX_PKT_SIZE,
> +
> rte_lcore_to_socket_id(rte_lcore_id()),
> +					 0, PAGE_SIZE);
> +	if (tz == NULL) {
> +		PMD_PERROR_LOG(hv, DBG_LOAD, "netvsc packet ring alloc
> fail");
> +		return -ENOMEM;
> +	}
> +	for (i = 0; i < nb_desc; i++) {
> +		pkt = txq->pkts[i] = (struct netvsc_packet *)((uint8_t *)tz-
> >addr +
> +							      i * TX_PKT_SIZE +
> delta);
> +		pkt->extension = (uint8_t *)tz->addr + i * TX_PKT_SIZE +
> RNDIS_OFF + delta;
> +		if (!pkt->extension) {
> +			PMD_PERROR_LOG(hv, DBG_TX,
> +				       "pkt->extension is NULL for %d-th pkt", i);
> +			return -EINVAL;
> +		}
> +		pkt->extension_phys_addr =
> +			tz->phys_addr + i * TX_PKT_SIZE + RNDIS_OFF +
> delta;
> +		pkt->ext_pages = 1;
> +		pkt->page_buffers[0].pfn = pkt->extension_phys_addr >>
> PAGE_SHIFT;
> +		pkt->page_buffers[0].offset =
> +			(unsigned long)pkt->extension & (PAGE_SIZE - 1);
> +		pkt->page_buffers[0].length = RNDIS_MESSAGE_SIZE(struct
> rndis_packet);
> +		if (pkt->page_buffers[0].offset + pkt-
> >page_buffers[0].length
> +		    > PAGE_SIZE) {
> +			new_delta = PAGE_SIZE - pkt-
> >page_buffers[0].offset;
> +			pkt->page_buffers[0].pfn++;
> +			delta += new_delta;
> +			pkt->page_buffers[0].offset = 0;
> +			pkt->extension = (uint8_t *)pkt->extension +
> new_delta;
> +			pkt->extension_phys_addr += new_delta;
> +		}
> +	}
> +	txq->sw_ring = rte_calloc_socket("txq_sw_ring",
> +					 sizeof(struct rte_mbuf *), nb_desc,
> +					 RTE_CACHE_LINE_SIZE, socket_id);
> +	if (txq->sw_ring == NULL) {
> +		hyperv_dev_tx_queue_release(txq);
> +		return -ENOMEM;
> +	}
> +	txq->port_id = dev->data->port_id;
> +	txq->nb_tx_desc = txq->tx_avail = nb_desc;
> +	txq->tx_free_thresh = tx_conf->tx_free_thresh;
> +	txq->hv = hv;
> +	dev->data->tx_queues[queue_idx] = txq;
> +	hv->txq = txq;
> +
> +	return 0;
> +}
> +
> +void
> +hyperv_dev_tx_queue_release(void *ptxq)
> +{
> +	struct hv_tx_queue *txq = ptxq;
> +
> +	PMD_INIT_FUNC_TRACE();
> +	if (txq == NULL)
> +		return;
> +	rte_free(txq->sw_ring);
> +	rte_free(txq->pkts);
> +	rte_free(txq);
> +}
> +
> +int
> +hyperv_dev_rx_queue_setup(struct rte_eth_dev *dev,
> +			  uint16_t queue_idx,
> +			  uint16_t nb_desc,
> +			  unsigned int socket_id,
> +			  const struct rte_eth_rxconf *rx_conf,
> +			  struct rte_mempool *mp)
> +{
> +	uint16_t i;
> +	struct hv_rx_queue *rxq;
> +	struct rte_mbuf *mbuf;
> +	struct hv_data *hv = dev->data->dev_private;
> +
> +	PMD_INIT_FUNC_TRACE();
> +
> +	rxq = rte_zmalloc_socket("ethdev RX queue", sizeof(struct
> hv_rx_queue),
> +				 RTE_CACHE_LINE_SIZE, socket_id);
> +	if (rxq == NULL) {
> +		PMD_PERROR_LOG(hv, DBG_LOAD,
> +			       "rte_zmalloc for rx_queue failed!");
> +		return -ENOMEM;
> +	}
> +	hv->desc = rxq->desc = rte_zmalloc_socket(NULL, PAGE_SIZE,
> +						  RTE_CACHE_LINE_SIZE,
> socket_id);
> +	if (rxq->desc == NULL) {
> +		PMD_PERROR_LOG(hv, DBG_LOAD,
> +			       "rte_zmalloc for vmbus_desc failed!");
> +		hyperv_dev_rx_queue_release(rxq);
> +		return -ENOMEM;
> +	}
> +	rxq->sw_ring = rte_calloc_socket("rxq->sw_ring",
> +					 sizeof(struct mbuf *), nb_desc,
> +					 RTE_CACHE_LINE_SIZE, socket_id);
> +	if (rxq->sw_ring == NULL) {
> +		hyperv_dev_rx_queue_release(rxq);
> +		return -ENOMEM;
> +	}
> +
> +	for (i = 0; i < nb_desc; i++) {
> +		mbuf = hv_rxmbuf_alloc(mp);
> +		if (mbuf == NULL) {
> +			PMD_PERROR_LOG(hv, DBG_LOAD, "RX mbuf alloc
> failed");
> +			return -ENOMEM;
> +		}
> +
> +		mbuf->nb_segs = 1;
> +		mbuf->next = NULL;
> +		mbuf->port = rxq->port_id;
> +		rxq->sw_ring[i] = mbuf;
> +	}
> +
> +	rxq->mb_pool = mp;
> +	rxq->nb_rx_desc = nb_desc;
> +	rxq->rx_head = 0;
> +	rxq->rx_tail = 0;
> +	rxq->rx_free_thresh = rx_conf->rx_free_thresh;
> +	rxq->port_id = dev->data->port_id;
> +	rxq->hv = hv;
> +	dev->data->rx_queues[queue_idx] = rxq;
> +	hv->rxq = rxq;
> +	hv->max_rx_pkt_len = mp->elt_size -
> +		(sizeof(struct rte_mbuf) + RTE_PKTMBUF_HEADROOM);
> +
> +	return 0;
> +}
> +
> +void
> +hyperv_dev_rx_queue_release(void *prxq)
> +{
> +	struct hv_rx_queue *rxq = prxq;
> +
> +	PMD_INIT_FUNC_TRACE();
> +	if (rxq == NULL)
> +		return;
> +	rte_free(rxq->sw_ring);
> +	rte_free(rxq->desc);
> +	rte_free(rxq);
> +}
> +
> +uint16_t
> +hyperv_recv_pkts(void *prxq, struct rte_mbuf **rx_pkts, uint16_t
> nb_pkts)
> +{
> +	struct hv_rx_queue *rxq = prxq;
> +	struct hv_data *hv = rxq->hv;
> +	struct rte_mbuf *new_mb, *rx_mbuf, *first_mbuf;
> +	uint16_t nb_rx = 0;
> +	uint16_t segs, i;
> +
> +	if (unlikely(hv->closed))
> +		return 0;
> +
> +	nb_pkts = MIN(nb_pkts, HV_MAX_PKT_BURST);
> +	hyperv_scan_comps(hv, 0);
> +
> +	while (nb_rx < nb_pkts) {
> +		/*
> +		 * if there are no mbufs in sw_ring,
> +		 * we need to trigger receive procedure
> +		 */
> +		if (rxq->rx_head == rxq->rx_tail) {
> +			if (!hyperv_has_rx_work(hv))
> +				break;
> +
> +			if (unlikely(!hyperv_get_buffer(hv, rxq->desc,
> PAGE_SIZE))) {
> +				hyperv_scan_comps(hv, 0);
> +				continue;
> +			}
> +		}
> +
> +		/*
> +		 * Now the received data is in sw_ring of our rxq
> +		 * we need to extract it and replace in sw_ring with new
> mbuf
> +		 */
> +		rx_mbuf = first_mbuf = rxq->sw_ring[rxq->rx_head];
> +		segs = first_mbuf->nb_segs;
> +		for (i = 0; i < segs; ++i) {
> +			new_mb = hv_rxmbuf_alloc(rxq->mb_pool);
> +			if (unlikely(!new_mb)) {
> +				PMD_PERROR_LOG(hv, DBG_RX, "mbuf alloc
> fail");
> +				++hv->stats.rx_nombuf;
> +				return nb_rx;
> +			}
> +
> +			rx_mbuf = rxq->sw_ring[rxq->rx_head];
> +			rxq->sw_ring[rxq->rx_head] = new_mb;
> +
> +			if (++rxq->rx_head == rxq->nb_rx_desc)
> +				rxq->rx_head = 0;
> +
> +			rx_mbuf->ol_flags |= PKT_RX_IPV4_HDR;
> +			rx_mbuf->port = rxq->port_id;
> +		}
> +		rx_mbuf->next = NULL;
> +
> +		rx_pkts[nb_rx++] = first_mbuf;
> +		++hv->stats.ipkts;
> +		hv->stats.ibytes += first_mbuf->pkt_len;
> +	}
> +
> +	return nb_rx;
> +}
> +
> +static void hyperv_txeof(struct hv_tx_queue *txq)
> +{
> +	struct rte_mbuf *mb, *mb_next;
> +
> +	txq->tx_avail += txq->tx_free;
> +	while (txq->tx_free) {
> +		--txq->tx_free;
> +		mb = txq->sw_ring[txq->tx_head];
> +		while (mb) {
> +			mb_next = mb->next;
> +			rte_mempool_put(mb->pool, mb);
> +			mb = mb_next;
> +		}
> +		if (++txq->tx_head == txq->nb_tx_desc)
> +			txq->tx_head = 0;
> +	}
> +}
> +
> +uint16_t
> +hyperv_xmit_pkts(void *ptxq, struct rte_mbuf **tx_pkts, uint16_t
> nb_pkts)
> +{
> +	struct hv_tx_queue *txq = ptxq;
> +	struct hv_data *hv = txq->hv;
> +	struct netvsc_packet *packet;
> +	struct rte_mbuf *m;
> +	uint32_t data_pages;
> +	uint64_t first_data_page;
> +	uint32_t total_len;
> +	uint32_t len;
> +	uint16_t i, nb_tx;
> +	uint8_t rndis_pages;
> +	int ret;
> +
> +	if (unlikely(hv->closed))
> +		return 0;
> +
> +	for (nb_tx = 0; nb_tx < nb_pkts; ++nb_tx) {
> +		hyperv_scan_comps(hv, 0);
> +		/* Determine if the descriptor ring needs to be cleaned. */
> +		if (txq->tx_free > txq->tx_free_thresh)
> +			hyperv_txeof(txq);
> +
> +		if (!txq->tx_avail) {
> +			hyperv_scan_comps(hv, 1);
> +			hyperv_txeof(txq);
> +			if (!txq->tx_avail) {
> +				PMD_PWARN_LOG(hv, DBG_TX, "No TX
> mbuf available");
> +				break;
> +			}
> +		}
> +		m = tx_pkts[nb_tx];
> +		len = m->data_len;
> +		total_len = m->pkt_len;
> +		first_data_page = RTE_MBUF_DATA_DMA_ADDR(m) >>
> PAGE_SHIFT;
> +		data_pages = ((RTE_MBUF_DATA_DMA_ADDR(m) + len - 1)
> >> PAGE_SHIFT) -
> +			first_data_page + 1;
> +
> +		packet = txq->pkts[txq->tx_tail];
> +		rndis_pages = packet->ext_pages;
> +
> +		txq->sw_ring[txq->tx_tail] = m;
> +		packet->tot_data_buf_len = total_len;
> +		packet->page_buffers[rndis_pages].pfn =
> +			RTE_MBUF_DATA_DMA_ADDR(m) >> PAGE_SHIFT;
> +		packet->page_buffers[rndis_pages].offset =
> +			RTE_MBUF_DATA_DMA_ADDR(m) & (PAGE_SIZE -
> 1);
> +		if (data_pages == 1)
> +			packet->page_buffers[rndis_pages].length = len;
> +		else
> +			packet->page_buffers[rndis_pages].length =
> PAGE_SIZE -
> +				packet->page_buffers[rndis_pages].offset;
> +
> +		for (i = 1; i < data_pages; ++i) {
> +			packet->page_buffers[rndis_pages + i].pfn =
> first_data_page + i;
> +			packet->page_buffers[rndis_pages + i].offset = 0;
> +			packet->page_buffers[rndis_pages + i].length =
> PAGE_SIZE;
> +		}
> +		if (data_pages > 1)
> +			packet->page_buffers[rndis_pages - 1 +
> data_pages].length =
> +				((rte_pktmbuf_mtod(m, unsigned long) + len
> - 1)
> +				 & (PAGE_SIZE - 1)) + 1;
> +
> +		uint16_t index = data_pages + rndis_pages;
> +
> +		for (i = 1; i < m->nb_segs; ++i) {
> +			m = m->next;
> +			len = m->data_len;
> +			first_data_page = RTE_MBUF_DATA_DMA_ADDR(m)
> >> PAGE_SHIFT;
> +			data_pages = ((RTE_MBUF_DATA_DMA_ADDR(m) +
> len - 1) >> PAGE_SHIFT) -
> +				first_data_page + 1;
> +			packet->page_buffers[index].pfn =
> +				RTE_MBUF_DATA_DMA_ADDR(m) >>
> PAGE_SHIFT;
> +			packet->page_buffers[index].offset =
> +				rte_pktmbuf_mtod(m, unsigned long)
> +				& (PAGE_SIZE - 1);
> +			packet->page_buffers[index].length = m->data_len;
> +			if (data_pages > 1) {
> +				/* It can be 2 in case of usual mbuf_size=2048
> */
> +				packet->page_buffers[index].length =
> PAGE_SIZE -
> +					packet->page_buffers[index].offset;
> +				packet->page_buffers[++index].offset = 0;
> +				packet->page_buffers[index].pfn =
> +					packet->page_buffers[index - 1].pfn
> + 1;
> +				packet->page_buffers[index].length =
> +					m->data_len
> +					- packet->page_buffers[index -
> 1].length;
> +			}
> +			++index;
> +		}
> +		packet->page_buf_count = index;
> +
> +		ret = hv_rf_on_send(hv, packet);
> +		if (likely(ret == 0)) {
> +			++hv->stats.opkts;
> +			hv->stats.obytes += total_len;
> +			if (++txq->tx_tail == txq->nb_tx_desc)
> +				txq->tx_tail = 0;
> +			--txq->tx_avail;
> +		} else {
> +			++hv->stats.oerrors;
> +			PMD_PERROR_LOG(hv, DBG_TX, "TX ring buffer is
> busy");
> +		}
> +	}
> +
> +	return nb_tx;
> +}
> diff --git a/lib/librte_pmd_hyperv/hyperv_rxtx.h
> b/lib/librte_pmd_hyperv/hyperv_rxtx.h
> new file mode 100644
> index 0000000..c45a704
> --- /dev/null
> +++ b/lib/librte_pmd_hyperv/hyperv_rxtx.h
> @@ -0,0 +1,35 @@
> +/*-
> + *   Copyright(c) 2013-2015 Brocade Communications Systems, Inc.
> + *   All rights reserved.
> + */
> +
> +/**
> + * Structure associated with each TX queue.
> + */
> +struct hv_tx_queue {
> +	struct netvsc_packet    **pkts;
> +	struct rte_mbuf         **sw_ring;
> +	uint16_t                nb_tx_desc;
> +	uint16_t                tx_avail;
> +	uint16_t                tx_head;
> +	uint16_t                tx_tail;
> +	uint16_t                tx_free_thresh;
> +	uint16_t                tx_free;
> +	uint8_t                 port_id;
> +	struct hv_data          *hv;
> +} __rte_cache_aligned;
> +
> +/**
> + * Structure associated with each RX queue.
> + */
> +struct hv_rx_queue {
> +	struct rte_mempool      *mb_pool;
> +	struct rte_mbuf         **sw_ring;
> +	uint16_t                nb_rx_desc;
> +	uint16_t                rx_head;
> +	uint16_t                rx_tail;
> +	uint16_t                rx_free_thresh;
> +	uint8_t                 port_id;
> +	struct hv_data          *hv;
> +	struct hv_vm_packet_descriptor *desc;
> +} __rte_cache_aligned;
> diff --git a/mk/rte.app.mk b/mk/rte.app.mk
> index 62a76ae..e0416d1 100644
> --- a/mk/rte.app.mk
> +++ b/mk/rte.app.mk
> @@ -133,6 +133,10 @@ LDLIBS += -lm
>  LDLIBS += -lrt
>  endif
> 
> +ifeq ($(CONFIG_RTE_LIBRTE_HV_PMD),y)
> +LDLIBS += -lrte_pmd_hyperv
> +endif
> +
>  ifeq ($(CONFIG_RTE_LIBRTE_VHOST), y)
>  LDLIBS += -lrte_vhost
>  endif
> --
> 2.1.4

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [dpdk-dev] [PATCH v4 5/7] hv: poll mode driver
  2015-04-21 19:34   ` Butler, Siobhan A
@ 2015-04-21 21:35     ` Stephen Hemminger
  2015-07-09  0:01       ` Thomas Monjalon
  0 siblings, 1 reply; 17+ messages in thread
From: Stephen Hemminger @ 2015-04-21 21:35 UTC (permalink / raw)
  To: Butler, Siobhan A; +Cc: dev, Stas Egorov, Stephen Hemminger, alexmay

On Tue, 21 Apr 2015 19:34:39 +0000
"Butler, Siobhan A" <siobhan.a.butler@intel.com> wrote:

> Hi Stephen 
> Will you have documentation to go along with these changes?
> Thanks
> Siobhan

Unlikely. Microsoft or other contributors might add something
in a later version.

The documentation that exists in DPDK related drivers just
won't scale as more drivers are added. It needs to be massively
simplified and generalized.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [dpdk-dev] [PATCH v4 5/7] hv: poll mode driver
  2015-04-21 21:35     ` Stephen Hemminger
@ 2015-07-09  0:01       ` Thomas Monjalon
  0 siblings, 0 replies; 17+ messages in thread
From: Thomas Monjalon @ 2015-07-09  0:01 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, Stas Egorov, alexmay, Stephen Hemminger

2015-04-21 14:35, Stephen Hemminger:
> On Tue, 21 Apr 2015 19:34:39 +0000
> "Butler, Siobhan A" <siobhan.a.butler@intel.com> wrote:
> 
> > Hi Stephen 
> > Will you have documentation to go along with these changes?
> > Thanks
> > Siobhan
> 
> Unlikely. Microsoft or other contributors might add something
> in a later version.
> 
> The documentation that exists in DPDK related drivers just
> won't scale as more drivers are added. It needs to be massively
> simplified and generalized.

I'm afraid you'll need to put a rst file in doc/guides/nics/.
At least, you need to describe how to use the specific bus and explains that
it is supported only on Linux with recent kernels.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [dpdk-dev] [PATCH v4 5/7] hv: poll mode driver
  2015-04-21 17:32 ` [dpdk-dev] [PATCH v4 5/7] hv: poll mode driver Stephen Hemminger
  2015-04-21 19:34   ` Butler, Siobhan A
@ 2015-07-09  0:05   ` Thomas Monjalon
  1 sibling, 0 replies; 17+ messages in thread
From: Thomas Monjalon @ 2015-07-09  0:05 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, Stas Egorov, Stephen Hemminger, alexmay

2015-04-21 10:32, Stephen Hemminger:
> From: Stephen Hemminger <shemming@brocade.com>
> 
> This is new Poll Mode driver for using hyper-v virtual network
> interface.
> 
> Signed-off-by: Stas Egorov <segorov@mirantis.com>
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
>  lib/Makefile                          |    1 +
>  lib/librte_pmd_hyperv/Makefile        |   28 +
>  lib/librte_pmd_hyperv/hyperv.h        |  169 ++++
>  lib/librte_pmd_hyperv/hyperv_drv.c    | 1653 +++++++++++++++++++++++++++++++++
>  lib/librte_pmd_hyperv/hyperv_drv.h    |  558 +++++++++++
>  lib/librte_pmd_hyperv/hyperv_ethdev.c |  332 +++++++
>  lib/librte_pmd_hyperv/hyperv_logs.h   |   69 ++
>  lib/librte_pmd_hyperv/hyperv_rxtx.c   |  403 ++++++++
>  lib/librte_pmd_hyperv/hyperv_rxtx.h   |   35 +
>  mk/rte.app.mk                         |    4 +
>  10 files changed, 3252 insertions(+)

Please split in separate patches:
- setup
- Rx
- Tx
- link state
- stats
- promisc

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [dpdk-dev] [PATCH v4 6/7] hv: enable driver in common config
  2015-04-21 17:32 [dpdk-dev] [PATCH v4 0/7] Hyper-V Poll Mode driver Stephen Hemminger
                   ` (4 preceding siblings ...)
  2015-04-21 17:32 ` [dpdk-dev] [PATCH v4 5/7] hv: poll mode driver Stephen Hemminger
@ 2015-04-21 17:32 ` Stephen Hemminger
  2015-07-08 23:58   ` Thomas Monjalon
  2015-04-21 17:32 ` [dpdk-dev] [PATCH v4 7/7] hv: add kernel patch Stephen Hemminger
  6 siblings, 1 reply; 17+ messages in thread
From: Stephen Hemminger @ 2015-04-21 17:32 UTC (permalink / raw)
  To: alexmay; +Cc: dev, Stephen Hemminger

From: Stephen Hemminger <shemming@brocade.com>

Add hyperv driver config to enable it.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 config/common_linuxapp | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/config/common_linuxapp b/config/common_linuxapp
index 0078dc9..58cc352 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -234,6 +234,15 @@ CONFIG_RTE_LIBRTE_VMXNET3_DEBUG_TX_FREE=n
 CONFIG_RTE_LIBRTE_VMXNET3_DEBUG_DRIVER=n
 
 #
+# Compile burst-mode Hyperv PMD driver
+#
+CONFIG_RTE_LIBRTE_HV_PMD=y
+CONFIG_RTE_LIBRTE_HV_DEBUG=n
+CONFIG_RTE_LIBRTE_HV_DEBUG_INIT=n
+CONFIG_RTE_LIBRTE_HV_DEBUG_RX=n
+CONFIG_RTE_LIBRTE_HV_DEBUG_TX=n
+
+#
 # Compile example software rings based PMD
 #
 CONFIG_RTE_LIBRTE_PMD_RING=y
-- 
2.1.4

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [dpdk-dev] [PATCH v4 6/7] hv: enable driver in common config
  2015-04-21 17:32 ` [dpdk-dev] [PATCH v4 6/7] hv: enable driver in common config Stephen Hemminger
@ 2015-07-08 23:58   ` Thomas Monjalon
  0 siblings, 0 replies; 17+ messages in thread
From: Thomas Monjalon @ 2015-07-08 23:58 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, Stephen Hemminger, alexmay

2015-04-21 10:32, Stephen Hemminger:
> Add hyperv driver config to enable it.
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
>  config/common_linuxapp | 9 +++++++++

It would be clearer to add a disabled config option in bsdapp with
the comment that it is not supported on FreeBSD.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [dpdk-dev] [PATCH v4 7/7] hv: add kernel patch
  2015-04-21 17:32 [dpdk-dev] [PATCH v4 0/7] Hyper-V Poll Mode driver Stephen Hemminger
                   ` (5 preceding siblings ...)
  2015-04-21 17:32 ` [dpdk-dev] [PATCH v4 6/7] hv: enable driver in common config Stephen Hemminger
@ 2015-04-21 17:32 ` Stephen Hemminger
  6 siblings, 0 replies; 17+ messages in thread
From: Stephen Hemminger @ 2015-04-21 17:32 UTC (permalink / raw)
  To: alexmay; +Cc: dev, Stephen Hemminger

From: Stephen Hemminger <shemming@brocade.com>

For users using non latest kernels, put kernel patch in for
them to use.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 .../linuxapp/hv_uio/vmbus-get-pages.patch          | 55 ++++++++++++++++++++++
 1 file changed, 55 insertions(+)
 create mode 100644 lib/librte_eal/linuxapp/hv_uio/vmbus-get-pages.patch

diff --git a/lib/librte_eal/linuxapp/hv_uio/vmbus-get-pages.patch b/lib/librte_eal/linuxapp/hv_uio/vmbus-get-pages.patch
new file mode 100644
index 0000000..ae27fbd
--- /dev/null
+++ b/lib/librte_eal/linuxapp/hv_uio/vmbus-get-pages.patch
@@ -0,0 +1,55 @@
+hyper-v: allow access to vmbus from userspace driver
+
+This is patch from  to allow access to hyper-v vmbus from UIO driver.
+
+Signed-off-by: Stas Egorov <segorov@mirantis.com>
+Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
+
+---
+v2 - simplify and rename to vmbus_get_monitor_pages
+
+ drivers/hv/connection.c |   20 +++++++++++++++++---
+ include/linux/hyperv.h  |    3 +++
+ 2 files changed, 20 insertions(+), 3 deletions(-)
+
+--- a/drivers/hv/connection.c	2015-02-03 10:58:51.751752450 -0800
++++ b/drivers/hv/connection.c	2015-02-04 14:59:51.636194383 -0800
+@@ -64,6 +64,15 @@ static __u32 vmbus_get_next_version(__u3
+ 	}
+ }
+
++void vmbus_get_monitor_pages(unsigned long *int_page,
++			     unsigned long monitor_pages[2])
++{
++	*int_page = (unsigned long)vmbus_connection.int_page;
++	monitor_pages[0] = (unsigned long)vmbus_connection.monitor_pages[0];
++	monitor_pages[1] = (unsigned long)vmbus_connection.monitor_pages[1];
++}
++EXPORT_SYMBOL_GPL(vmbus_get_monitor_pages);
++
+ static int vmbus_negotiate_version(struct vmbus_channel_msginfo *msginfo,
+ 					__u32 version)
+ {
+@@ -347,10 +356,7 @@ static void process_chn_event(u32 relid)
+ 			else
+ 				bytes_to_read = 0;
+ 		} while (read_state && (bytes_to_read != 0));
+-	} else {
+-		pr_err("no channel callback for relid - %u\n", relid);
+ 	}
+-
+ }
+
+ /*
+--- a/include/linux/hyperv.h	2015-02-03 10:58:51.751752450 -0800
++++ b/include/linux/hyperv.h	2015-02-04 15:00:26.388355012 -0800
+@@ -868,6 +868,9 @@ extern int vmbus_recvpacket_raw(struct v
+
+ extern void vmbus_ontimer(unsigned long data);
+
++extern void vmbus_get_monitor_pages(unsigned long *int_page,
++				    unsigned long monitor_pages[2]);
++
+ /* Base driver object */
+ struct hv_driver {
+ 	const char *name;
-- 
2.1.4

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2017-02-08 23:25 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-04-21 17:32 [dpdk-dev] [PATCH v4 0/7] Hyper-V Poll Mode driver Stephen Hemminger
2015-04-21 17:32 ` [dpdk-dev] [PATCH v4 1/7] ether: add function to query for link state interrupt Stephen Hemminger
2015-07-08 23:42   ` Thomas Monjalon
     [not found]   ` <d0360434d10a44dcb9f5c9c7220c3162@HQ1WP-EXMB11.corp.brocade.com>
2017-02-08 23:25     ` Stephen Hemminger
2015-04-21 17:32 ` [dpdk-dev] [PATCH v4 2/7] pmd: change drivers initialization for pci Stephen Hemminger
2015-04-21 17:32 ` [dpdk-dev] [PATCH v4 3/7] hv: add basic vmbus support Stephen Hemminger
2015-07-08 23:51   ` Thomas Monjalon
2015-04-21 17:32 ` [dpdk-dev] [PATCH v4 4/7] hv: uio driver Stephen Hemminger
2015-07-08 23:55   ` Thomas Monjalon
2015-04-21 17:32 ` [dpdk-dev] [PATCH v4 5/7] hv: poll mode driver Stephen Hemminger
2015-04-21 19:34   ` Butler, Siobhan A
2015-04-21 21:35     ` Stephen Hemminger
2015-07-09  0:01       ` Thomas Monjalon
2015-07-09  0:05   ` Thomas Monjalon
2015-04-21 17:32 ` [dpdk-dev] [PATCH v4 6/7] hv: enable driver in common config Stephen Hemminger
2015-07-08 23:58   ` Thomas Monjalon
2015-04-21 17:32 ` [dpdk-dev] [PATCH v4 7/7] hv: add kernel patch Stephen Hemminger

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).