DPDK patches and discussions
* [dpdk-dev] [PATCH 0/4] enable FIFO for NTB
@ 2019-09-05  5:39 Xiaoyun Li
  2019-09-05  5:39 ` [dpdk-dev] [PATCH 1/4] raw/ntb: setup ntb queue Xiaoyun Li
                   ` (5 more replies)
  0 siblings, 6 replies; 42+ messages in thread
From: Xiaoyun Li @ 2019-09-05  5:39 UTC
  To: jingjing.wu, keith.wiles, omkar.maslekar, cunming.liang; +Cc: dev, Xiaoyun Li

Enable a FIFO in the NTB rawdev driver to support packet-based
processing. Also extend the example application to support
txonly, rxonly, and iofwd between an NTB device and an ethdev,
as well as file transmission.

Xiaoyun Li (4):
  raw/ntb: setup ntb queue
  raw/ntb: add xstats support
  raw/ntb: add enqueue and dequeue functions
  examples/ntb: support more functions for NTB

 doc/guides/rawdevs/ntb.rst             |   67 +-
 doc/guides/rel_notes/release_19_11.rst |    4 +
 doc/guides/sample_app_ug/ntb.rst       |   59 +-
 drivers/raw/ntb/Makefile               |    3 +
 drivers/raw/ntb/meson.build            |    1 +
 drivers/raw/ntb/ntb.c                  | 1078 +++++++++++++++-----
 drivers/raw/ntb/ntb.h                  |  162 ++-
 drivers/raw/ntb/ntb_hw_intel.c         |   48 +-
 drivers/raw/ntb/rte_pmd_ntb.h          |   43 +
 examples/ntb/meson.build               |    3 +
 examples/ntb/ntb_fwd.c                 | 1297 +++++++++++++++++++++---
 11 files changed, 2349 insertions(+), 416 deletions(-)
 create mode 100644 drivers/raw/ntb/rte_pmd_ntb.h

-- 
2.17.1



* [dpdk-dev] [PATCH 1/4] raw/ntb: setup ntb queue
  2019-09-05  5:39 [dpdk-dev] [PATCH 0/4] enable FIFO for NTB Xiaoyun Li
@ 2019-09-05  5:39 ` Xiaoyun Li
  2019-09-05  5:39 ` [dpdk-dev] [PATCH 2/4] raw/ntb: add xstats support Xiaoyun Li
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 42+ messages in thread
From: Xiaoyun Li @ 2019-09-05  5:39 UTC
  To: jingjing.wu, keith.wiles, omkar.maslekar, cunming.liang; +Cc: dev, Xiaoyun Li

Set up and initialize the NTB Tx and Rx queues, and negotiate
queue information with the peer. If the queue size or the number
of queues differs between the two sides, return an error.

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
---
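Note for readers: the negotiation described in the commit message can be
pictured with the standalone sketch below. It is not the driver code. The
names sketch_spad_bank, sketch_side_cfg, sketch_publish and
sketch_check_peer are hypothetical, and a plain array stands in for the
32-bit scratchpad registers. It only illustrates the idea of splitting a
64-bit value into high and low halves, publishing the queue parameters,
and rejecting a peer whose settings differ.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scratchpad slots, loosely modeled on the SPAD_* names. */
enum {
	SKETCH_SPAD_Q_SZ,
	SKETCH_SPAD_NUM_QPS,
	SKETCH_SPAD_MW0_SZ_H,
	SKETCH_SPAD_MW0_SZ_L,
	SKETCH_SPAD_CNT
};

/* Stand-in for the 32-bit scratchpad registers visible to the peer. */
struct sketch_spad_bank {
	uint32_t reg[SKETCH_SPAD_CNT];
};

/* Settings that one side wants to agree on with the other. */
struct sketch_side_cfg {
	uint32_t queue_size;
	uint32_t queue_pairs;
	uint64_t mw_size;
};

/* Publish local settings; the 64-bit size is split into two registers. */
static void
sketch_publish(struct sketch_spad_bank *sp, const struct sketch_side_cfg *c)
{
	assert(sp != NULL && c != NULL);
	sp->reg[SKETCH_SPAD_Q_SZ] = c->queue_size;
	sp->reg[SKETCH_SPAD_NUM_QPS] = c->queue_pairs;
	sp->reg[SKETCH_SPAD_MW0_SZ_H] = (uint32_t)(c->mw_size >> 32);
	sp->reg[SKETCH_SPAD_MW0_SZ_L] = (uint32_t)c->mw_size;
}

/* Compare what the peer published against local settings. */
static int
sketch_check_peer(const struct sketch_spad_bank *peer,
		  const struct sketch_side_cfg *local)
{
	uint64_t peer_mw;

	/* Reassemble the 64-bit value from its high and low halves. */
	peer_mw = ((uint64_t)peer->reg[SKETCH_SPAD_MW0_SZ_H] << 32) |
		  peer->reg[SKETCH_SPAD_MW0_SZ_L];

	if (peer_mw != local->mw_size)
		return -EINVAL;
	if (peer->reg[SKETCH_SPAD_Q_SZ] != local->queue_size)
		return -EINVAL;
	if (peer->reg[SKETCH_SPAD_NUM_QPS] != local->queue_pairs)
		return -EINVAL;
	return 0;
}
```

In this sketch each side would call sketch_publish() on its own bank and
sketch_check_peer() on the bank written by the other side. Any mismatch
yields -EINVAL, which mirrors the "return error" behaviour described
above.
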
 doc/guides/rawdevs/ntb.rst             |  39 +-
 doc/guides/rel_notes/release_19_11.rst |   4 +
 drivers/raw/ntb/Makefile               |   3 +
 drivers/raw/ntb/meson.build            |   1 +
 drivers/raw/ntb/ntb.c                  | 705 ++++++++++++++++++-------
 drivers/raw/ntb/ntb.h                  | 151 ++++--
 drivers/raw/ntb/ntb_hw_intel.c         |  26 +-
 drivers/raw/ntb/rte_pmd_ntb.h          |  43 ++
 8 files changed, 718 insertions(+), 254 deletions(-)
 create mode 100644 drivers/raw/ntb/rte_pmd_ntb.h
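
Note for readers: the ring format documented in this patch (a 64-bit
buffer address plus a 16-bit length per descriptor, and a 16-bit length
plus 16-bit flags per used entry) and the per-queue header size computed
in ntb_dev_info_get() can be illustrated with the standalone sketch
below. The struct and function names are hypothetical, the 64-byte cache
line is an assumption, and hdr_fixed stands in for
sizeof(struct ntb_header), whose definition is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CACHE_LINE 64u /* assumption: 64-byte cache line */

/* desc_ring entry: 64-bit buffer address, 16-bit length, 48 reserved bits. */
struct sketch_desc {
	uint64_t addr;
	uint16_t len;
	uint16_t resv[3];
};

/* used_ring entry: 16-bit packet length and 16-bit flags. */
struct sketch_used {
	uint16_t len;
	uint16_t flags;
};

_Static_assert(sizeof(struct sketch_desc) == 16, "desc entry is 128 bits");
_Static_assert(sizeof(struct sketch_used) == 4, "used entry is 32 bits");

/* Round v up to the next multiple of align (align must be a power of two). */
static size_t
sketch_align_up(size_t v, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (v + align - 1) & ~(align - 1);
}

/*
 * Shared memory needed by one queue pair: the fixed header, then
 * queue_size descriptors, then queue_size used entries, rounded up to a
 * cache line.
 */
static size_t
sketch_hdr_size_per_queue(size_t hdr_fixed, uint16_t queue_size)
{
	size_t raw = hdr_fixed +
		     (size_t)queue_size * sizeof(struct sketch_desc) +
		     (size_t)queue_size * sizeof(struct sketch_used);

	return sketch_align_up(raw, SKETCH_CACHE_LINE);
}

/*
 * Byte offset of the used ring from the start of the desc ring. It sits
 * right after queue_size descriptors, as in &hdr->desc_ring[q_size].
 */
static size_t
sketch_used_ring_offset(uint16_t queue_size)
{
	return (size_t)queue_size * sizeof(struct sketch_desc);
}
```

Multiplying the per-queue size by the number of queue pairs gives the
total header area, which corresponds to how ntb_hdr_size is derived in
the patch.
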

diff --git a/doc/guides/rawdevs/ntb.rst b/doc/guides/rawdevs/ntb.rst
index 0a61ec03d..99e7db441 100644
--- a/doc/guides/rawdevs/ntb.rst
+++ b/doc/guides/rawdevs/ntb.rst
@@ -45,8 +45,45 @@ to use, i.e. igb_uio, vfio. The ``dpdk-devbind.py`` script can be used to
 show devices status and to bind them to a suitable kernel driver. They will
 appear under the category of "Misc (rawdev) devices".
 
+Ring Layout
+-----------
+
+Since reading and writing the remote system's memory goes through the PCI
+bus, a remote read is much more expensive than a remote write. Thus, enqueue
+and dequeue on the ntb ring should avoid remote reads. The ring layout for
+ntb is as follows:
+- Ring Format:
+  desc_ring:
+      0               16                                              64
+      +---------------------------------------------------------------+
+      |                        buffer address                         |
+      +---------------+-----------------------------------------------+
+      | buffer length |                      resv                     |
+      +---------------+-----------------------------------------------+
+  used_ring:
+      0               16              32
+      +---------------+---------------+
+      | packet length |     flags     |
+      +---------------+---------------+
+- Ring Layout
+      +------------------------+   +------------------------+
+      | used_ring              |   | desc_ring              |
+      | +---+                  |   | +---+                  |
+      | |   |                  |   | |   |                  |
+      | +---+      +--------+  |   | +---+                  |
+      | |   | ---> | buffer | <+---+-|   |                  |
+      | +---+      +--------+  |   | +---+                  |
+      | |   |                  |   | |   |                  |
+      | +---+                  |   | +---+                  |
+      |  ...                   |   |  ...                   |
+      |                        |   |                        |
+      |            +---------+ |   |            +---------+ |
+      |            | tx_tail | |   |            | rx_tail | |
+      | System A   +---------+ |   | System B   +---------+ |
+      +------------------------+   +------------------------+
+                    <---------traffic---------
+
 Limitation
 ----------
 
-- The FIFO hasn't been introduced and will come in 19.11 release.
 - This PMD only supports Intel Skylake platform.
diff --git a/doc/guides/rel_notes/release_19_11.rst b/doc/guides/rel_notes/release_19_11.rst
index 8490d897c..7ac3d5ca6 100644
--- a/doc/guides/rel_notes/release_19_11.rst
+++ b/doc/guides/rel_notes/release_19_11.rst
@@ -56,6 +56,10 @@ New Features
      Also, make sure to start the actual text at the margin.
      =========================================================
 
+   * **Introduced FIFO for NTB PMD.**
+
+     Introduced FIFO for NTB (Non-transparent Bridge) PMD to support
+     packet based processing.
 
 Removed Items
 -------------
diff --git a/drivers/raw/ntb/Makefile b/drivers/raw/ntb/Makefile
index 6fe2aaf40..814cd05ca 100644
--- a/drivers/raw/ntb/Makefile
+++ b/drivers/raw/ntb/Makefile
@@ -25,4 +25,7 @@ LIBABIVER := 1
 SRCS-$(CONFIG_RTE_LIBRTE_PMD_NTB_RAWDEV) += ntb.c
 SRCS-$(CONFIG_RTE_LIBRTE_PMD_NTB_RAWDEV) += ntb_hw_intel.c
 
+# install this header file
+SYMLINK-$(CONFIG_RTE_LIBRTE_PMD_NTB_RAWDEV)-include := rte_pmd_ntb.h
+
 include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/raw/ntb/meson.build b/drivers/raw/ntb/meson.build
index 7f39437f8..7a7d26126 100644
--- a/drivers/raw/ntb/meson.build
+++ b/drivers/raw/ntb/meson.build
@@ -5,4 +5,5 @@ deps += ['rawdev', 'mbuf', 'mempool',
 	 'pci', 'bus_pci']
 sources = files('ntb.c',
                 'ntb_hw_intel.c')
+install_headers('rte_pmd_ntb.h')
 allow_experimental_apis = true
diff --git a/drivers/raw/ntb/ntb.c b/drivers/raw/ntb/ntb.c
index bfecce1e4..124c82a95 100644
--- a/drivers/raw/ntb/ntb.c
+++ b/drivers/raw/ntb/ntb.c
@@ -12,6 +12,7 @@
 #include <rte_eal.h>
 #include <rte_log.h>
 #include <rte_pci.h>
+#include <rte_mbuf.h>
 #include <rte_bus_pci.h>
 #include <rte_memzone.h>
 #include <rte_memcpy.h>
@@ -19,6 +20,7 @@
 #include <rte_rawdev_pmd.h>
 
 #include "ntb_hw_intel.h"
+#include "rte_pmd_ntb.h"
 #include "ntb.h"
 
 int ntb_logtype;
@@ -28,48 +30,7 @@ static const struct rte_pci_id pci_id_ntb_map[] = {
 	{ .vendor_id = 0, /* sentinel */ },
 };
 
-static int
-ntb_set_mw(struct rte_rawdev *dev, int mw_idx, uint64_t mw_size)
-{
-	struct ntb_hw *hw = dev->dev_private;
-	char mw_name[RTE_MEMZONE_NAMESIZE];
-	const struct rte_memzone *mz;
-	int ret = 0;
-
-	if (hw->ntb_ops->mw_set_trans == NULL) {
-		NTB_LOG(ERR, "Not supported to set mw.");
-		return -ENOTSUP;
-	}
-
-	snprintf(mw_name, sizeof(mw_name), "ntb_%d_mw_%d",
-		 dev->dev_id, mw_idx);
-
-	mz = rte_memzone_lookup(mw_name);
-	if (mz)
-		return 0;
-
-	/**
-	 * Hardware requires that mapped memory base address should be
-	 * aligned with EMBARSZ and needs continuous memzone.
-	 */
-	mz = rte_memzone_reserve_aligned(mw_name, mw_size, dev->socket_id,
-				RTE_MEMZONE_IOVA_CONTIG, hw->mw_size[mw_idx]);
-	if (!mz) {
-		NTB_LOG(ERR, "Cannot allocate aligned memzone.");
-		return -EIO;
-	}
-	hw->mz[mw_idx] = mz;
-
-	ret = (*hw->ntb_ops->mw_set_trans)(dev, mw_idx, mz->iova, mw_size);
-	if (ret) {
-		NTB_LOG(ERR, "Cannot set mw translation.");
-		return ret;
-	}
-
-	return ret;
-}
-
-static void
+static inline void
 ntb_link_cleanup(struct rte_rawdev *dev)
 {
 	struct ntb_hw *hw = dev->dev_private;
@@ -89,20 +50,94 @@ ntb_link_cleanup(struct rte_rawdev *dev)
 	}
 
 	/* Clear mw so that peer cannot access local memory.*/
-	for (i = 0; i < hw->mw_cnt; i++) {
+	for (i = 0; i < hw->used_mw_num; i++) {
 		status = (*hw->ntb_ops->mw_set_trans)(dev, i, 0, 0);
 		if (status)
 			NTB_LOG(ERR, "Failed to clean mw.");
 	}
 }
 
+static inline int
+ntb_handshake_work(const struct rte_rawdev *dev)
+{
+	struct ntb_hw *hw = dev->dev_private;
+	uint32_t val;
+	int ret, i;
+
+	if (hw->ntb_ops->spad_write == NULL ||
+	    hw->ntb_ops->mw_set_trans == NULL) {
+		NTB_LOG(ERR, "Scratchpad/MW setting is not supported.");
+		return -ENOTSUP;
+	}
+
+	/* Tell peer the mw info of local side. */
+	ret = (*hw->ntb_ops->spad_write)(dev, SPAD_NUM_MWS, 1, hw->mw_cnt);
+	if (ret < 0)
+		return ret;
+	for (i = 0; i < hw->mw_cnt; i++) {
+		NTB_LOG(INFO, "Local %u mw size: 0x%"PRIx64"", i,
+				hw->mw_size[i]);
+		val = hw->mw_size[i] >> 32;
+		ret = (*hw->ntb_ops->spad_write)(dev, SPAD_MW0_SZ_H + 2 * i,
+						 1, val);
+		if (ret < 0)
+			return ret;
+		val = hw->mw_size[i];
+		ret = (*hw->ntb_ops->spad_write)(dev, SPAD_MW0_SZ_L + 2 * i,
+						 1, val);
+		if (ret < 0)
+			return ret;
+	}
+
+	/* Tell peer about the queue info and map memory to the peer. */
+	ret = (*hw->ntb_ops->spad_write)(dev, SPAD_Q_SZ, 1, hw->queue_size);
+	if (ret < 0)
+		return ret;
+	ret = (*hw->ntb_ops->spad_write)(dev, SPAD_NUM_QPS, 1,
+					 hw->queue_pairs);
+	if (ret < 0)
+		return ret;
+	ret = (*hw->ntb_ops->spad_write)(dev, SPAD_USED_MWS, 1,
+					 hw->used_mw_num);
+	if (ret < 0)
+		return ret;
+	for (i = 0; i < hw->used_mw_num; i++) {
+		val = (uint64_t)(hw->mz[i]->addr) >> 32;
+		ret = (*hw->ntb_ops->spad_write)(dev, SPAD_MW0_BA_H + 2 * i,
+						 1, val);
+		if (ret < 0)
+			return ret;
+		val = (uint64_t)(hw->mz[i]->addr);
+		ret = (*hw->ntb_ops->spad_write)(dev, SPAD_MW0_BA_L + 2 * i,
+						 1, val);
+		if (ret < 0)
+			return ret;
+	}
+
+	for (i = 0; i < hw->used_mw_num; i++) {
+		ret = (*hw->ntb_ops->mw_set_trans)(dev, i, hw->mz[i]->iova,
+						   hw->mz[i]->len);
+		if (ret < 0)
+			return ret;
+	}
+
+	/* Ring doorbell 0 to tell peer the device is ready. */
+	ret = (*hw->ntb_ops->peer_db_set)(dev, 0);
+	if (ret < 0)
+		return ret;
+
+	return 0;
+}
+
 static void
 ntb_dev_intr_handler(void *param)
 {
 	struct rte_rawdev *dev = (struct rte_rawdev *)param;
 	struct ntb_hw *hw = dev->dev_private;
-	uint32_t mw_size_h, mw_size_l;
+	uint32_t val_h, val_l;
+	uint64_t peer_mw_size;
 	uint64_t db_bits = 0;
+	uint8_t peer_mw_cnt;
 	int i = 0;
 
 	if (hw->ntb_ops->db_read == NULL ||
@@ -118,7 +153,7 @@ ntb_dev_intr_handler(void *param)
 
 	/* Doorbell 0 is for peer device ready. */
 	if (db_bits & 1) {
-		NTB_LOG(DEBUG, "DB0: Peer device is up.");
+		NTB_LOG(INFO, "DB0: Peer device is up.");
 		/* Clear received doorbell. */
 		(*hw->ntb_ops->db_clear)(dev, 1);
 
@@ -129,47 +164,44 @@ ntb_dev_intr_handler(void *param)
 		if (hw->peer_dev_up)
 			return;
 
-		if (hw->ntb_ops->spad_read == NULL ||
-		    hw->ntb_ops->spad_write == NULL) {
-			NTB_LOG(ERR, "Scratchpad is not supported.");
+		if (hw->ntb_ops->spad_read == NULL) {
+			NTB_LOG(ERR, "Scratchpad read is not supported.");
+			return;
+		}
+
+		/* Check if mw setting on the peer is the same as local. */
+		peer_mw_cnt = (*hw->ntb_ops->spad_read)(dev, SPAD_NUM_MWS, 0);
+		if (peer_mw_cnt != hw->mw_cnt) {
+			NTB_LOG(ERR, "Both mw cnt must be the same.");
 			return;
 		}
 
-		hw->peer_mw_cnt = (*hw->ntb_ops->spad_read)
-				  (dev, SPAD_NUM_MWS, 0);
-		hw->peer_mw_size = rte_zmalloc("uint64_t",
-				   hw->peer_mw_cnt * sizeof(uint64_t), 0);
 		for (i = 0; i < hw->mw_cnt; i++) {
-			mw_size_h = (*hw->ntb_ops->spad_read)
-				    (dev, SPAD_MW0_SZ_H + 2 * i, 0);
-			mw_size_l = (*hw->ntb_ops->spad_read)
-				    (dev, SPAD_MW0_SZ_L + 2 * i, 0);
-			hw->peer_mw_size[i] = ((uint64_t)mw_size_h << 32) |
-					      mw_size_l;
+			val_h = (*hw->ntb_ops->spad_read)
+				(dev, SPAD_MW0_SZ_H + 2 * i, 0);
+			val_l = (*hw->ntb_ops->spad_read)
+				(dev, SPAD_MW0_SZ_L + 2 * i, 0);
+			peer_mw_size = ((uint64_t)val_h << 32) | val_l;
 			NTB_LOG(DEBUG, "Peer %u mw size: 0x%"PRIx64"", i,
-					hw->peer_mw_size[i]);
+					peer_mw_size);
+			if (peer_mw_size != hw->mw_size[i]) {
+				NTB_LOG(ERR, "Mw config must be the same.");
+				return;
+			}
 		}
 
 		hw->peer_dev_up = 1;
 
 		/**
-		 * Handshake with peer. Spad_write only works when both
-		 * devices are up. So write spad again when db is received.
-		 * And set db again for the later device who may miss
+		 * Handshake with peer. Spad_write & mw_set_trans only work
+		 * when both devices are up. So write spad again when db is
+		 * received. And set db again for the later device who may miss
 		 * the 1st db.
 		 */
-		for (i = 0; i < hw->mw_cnt; i++) {
-			(*hw->ntb_ops->spad_write)(dev, SPAD_NUM_MWS,
-						   1, hw->mw_cnt);
-			mw_size_h = hw->mw_size[i] >> 32;
-			(*hw->ntb_ops->spad_write)(dev, SPAD_MW0_SZ_H + 2 * i,
-						   1, mw_size_h);
-
-			mw_size_l = hw->mw_size[i];
-			(*hw->ntb_ops->spad_write)(dev, SPAD_MW0_SZ_L + 2 * i,
-						   1, mw_size_l);
+		if (ntb_handshake_work(dev) < 0) {
+			NTB_LOG(ERR, "Handshake work failed.");
+			return;
 		}
-		(*hw->ntb_ops->peer_db_set)(dev, 0);
 
 		/* To get the link info. */
 		if (hw->ntb_ops->get_link_status == NULL) {
@@ -183,7 +215,7 @@ ntb_dev_intr_handler(void *param)
 	}
 
 	if (db_bits & (1 << 1)) {
-		NTB_LOG(DEBUG, "DB1: Peer device is down.");
+		NTB_LOG(INFO, "DB1: Peer device is down.");
 		/* Clear received doorbell. */
 		(*hw->ntb_ops->db_clear)(dev, 2);
 
@@ -197,7 +229,7 @@ ntb_dev_intr_handler(void *param)
 	}
 
 	if (db_bits & (1 << 2)) {
-		NTB_LOG(DEBUG, "DB2: Peer device agrees dev to be down.");
+		NTB_LOG(INFO, "DB2: Peer device agrees dev to be down.");
 		/* Clear received doorbell. */
 		(*hw->ntb_ops->db_clear)(dev, (1 << 2));
 		hw->peer_dev_up = 0;
@@ -206,24 +238,228 @@ ntb_dev_intr_handler(void *param)
 }
 
 static void
-ntb_queue_conf_get(struct rte_rawdev *dev __rte_unused,
-		   uint16_t queue_id __rte_unused,
-		   rte_rawdev_obj_t queue_conf __rte_unused)
+ntb_queue_conf_get(struct rte_rawdev *dev,
+		   uint16_t queue_id,
+		   rte_rawdev_obj_t queue_conf)
+{
+	struct ntb_queue_conf *q_conf = queue_conf;
+	struct ntb_hw *hw = dev->dev_private;
+
+	q_conf->tx_free_thresh = hw->tx_queues[queue_id]->tx_free_thresh;
+	q_conf->nb_desc = hw->rx_queues[queue_id]->nb_rx_desc;
+	q_conf->rx_mp = hw->rx_queues[queue_id]->mpool;
+}
+
+static void
+ntb_rxq_release_mbufs(struct ntb_rx_queue *q)
+{
+	int i;
+
+	if (!q || !q->sw_ring) {
+		NTB_LOG(ERR, "Pointer to rxq or sw_ring is NULL");
+		return;
+	}
+
+	for (i = 0; i < q->nb_rx_desc; i++) {
+		if (q->sw_ring[i].mbuf) {
+			rte_pktmbuf_free_seg(q->sw_ring[i].mbuf);
+			q->sw_ring[i].mbuf = NULL;
+		}
+	}
+}
+
+static void
+ntb_rxq_release(struct ntb_rx_queue *rxq)
+{
+	if (!rxq) {
+		NTB_LOG(ERR, "Pointer to rxq is NULL");
+		return;
+	}
+
+	ntb_rxq_release_mbufs(rxq);
+
+	rte_free(rxq->sw_ring);
+	rte_free(rxq);
+}
+
+static int
+ntb_rxq_setup(struct rte_rawdev *dev,
+	      uint16_t qp_id,
+	      rte_rawdev_obj_t queue_conf)
+{
+	struct ntb_queue_conf *rxq_conf = queue_conf;
+	struct ntb_hw *hw = dev->dev_private;
+	struct ntb_rx_queue *rxq;
+
+	if (rxq_conf->rx_mp == NULL) {
+		NTB_LOG(ERR, "Invalid null mempool pointer.");
+		return -EINVAL;
+	}
+
+	/* Allocate the rx queue data structure */
+	rxq = rte_zmalloc_socket("ntb rx queue",
+				 sizeof(struct ntb_rx_queue),
+				 RTE_CACHE_LINE_SIZE,
+				 dev->socket_id);
+	if (!rxq) {
+		NTB_LOG(ERR, "Failed to allocate memory for "
+			    "rx queue data structure.");
+		return -ENOMEM;
+	}
+	rxq->nb_rx_desc = rxq_conf->nb_desc;
+	rxq->mpool = rxq_conf->rx_mp;
+	rxq->port_id = dev->dev_id;
+	rxq->queue_id = qp_id;
+	rxq->hw = hw;
+
+	/* Allocate the software ring. */
+	rxq->sw_ring =
+		rte_zmalloc_socket("ntb rx sw ring",
+				   sizeof(struct ntb_rx_entry) *
+				   rxq->nb_rx_desc,
+				   RTE_CACHE_LINE_SIZE,
+				   dev->socket_id);
+	if (!rxq->sw_ring) {
+		ntb_rxq_release(rxq);
+		NTB_LOG(ERR, "Failed to allocate memory for SW ring");
+		return -ENOMEM;
+	}
+
+	hw->rx_queues[qp_id] = rxq;
+
+	return 0;
+}
+
+static void
+ntb_txq_release_mbufs(struct ntb_tx_queue *q)
+{
+	int i;
+
+	if (!q || !q->sw_ring) {
+		NTB_LOG(ERR, "Pointer to txq or sw_ring is NULL");
+		return;
+	}
+
+	for (i = 0; i < q->nb_tx_desc; i++) {
+		if (q->sw_ring[i].mbuf) {
+			rte_pktmbuf_free_seg(q->sw_ring[i].mbuf);
+			q->sw_ring[i].mbuf = NULL;
+		}
+	}
+}
+
+static void
+ntb_txq_release(struct ntb_tx_queue *txq)
 {
+	if (!txq) {
+		NTB_LOG(ERR, "Pointer to txq is NULL");
+		return;
+	}
+
+	ntb_txq_release_mbufs(txq);
+
+	rte_free(txq->sw_ring);
+	rte_free(txq);
 }
 
 static int
-ntb_queue_setup(struct rte_rawdev *dev __rte_unused,
-		uint16_t queue_id __rte_unused,
-		rte_rawdev_obj_t queue_conf __rte_unused)
+ntb_txq_setup(struct rte_rawdev *dev,
+	      uint16_t qp_id,
+	      rte_rawdev_obj_t queue_conf)
 {
+	struct ntb_queue_conf *txq_conf = queue_conf;
+	struct ntb_hw *hw = dev->dev_private;
+	struct ntb_tx_queue *txq;
+	uint16_t i, prev;
+
+	/* Allocate the TX queue data structure. */
+	txq = rte_zmalloc_socket("ntb tx queue",
+				  sizeof(struct ntb_tx_queue),
+				  RTE_CACHE_LINE_SIZE,
+				  dev->socket_id);
+	if (!txq) {
+		NTB_LOG(ERR, "Failed to allocate memory for "
+			    "tx queue structure");
+		return -ENOMEM;
+	}
+
+	txq->nb_tx_desc = txq_conf->nb_desc;
+	txq->port_id = dev->dev_id;
+	txq->queue_id = qp_id;
+	txq->hw = hw;
+
+	/* Allocate software ring */
+	txq->sw_ring =
+		rte_zmalloc_socket("ntb tx sw ring",
+				   sizeof(struct ntb_tx_entry) *
+				   txq->nb_tx_desc,
+				   RTE_CACHE_LINE_SIZE,
+				   dev->socket_id);
+	if (!txq->sw_ring) {
+		ntb_txq_release(txq);
+		NTB_LOG(ERR, "Failed to allocate memory for SW TX ring");
+		return -ENOMEM;
+	}
+
+	prev = txq->nb_tx_desc - 1;
+	for (i = 0; i < txq->nb_tx_desc; i++) {
+		txq->sw_ring[i].mbuf = NULL;
+		txq->sw_ring[i].last_id = i;
+		txq->sw_ring[prev].next_id = i;
+		prev = i;
+	}
+
+	txq->tx_free_thresh = txq_conf->tx_free_thresh ?
+			      txq_conf->tx_free_thresh :
+			      NTB_DFLT_TX_FREE_THRESH;
+	if (txq->tx_free_thresh >= txq->nb_tx_desc - 3) {
+		NTB_LOG(ERR, "tx_free_thresh must be less than nb_desc - 3. "
+			"(tx_free_thresh=%u qp_id=%u)", txq->tx_free_thresh,
+			qp_id);
+		return -EINVAL;
+	}
+
+	hw->tx_queues[qp_id] = txq;
+
 	return 0;
 }
 
+
+static int
+ntb_queue_setup(struct rte_rawdev *dev,
+		uint16_t queue_id,
+		rte_rawdev_obj_t queue_conf)
+{
+	struct ntb_hw *hw = dev->dev_private;
+	int ret;
+
+	if (queue_id >= hw->queue_pairs)
+		return -EINVAL;
+
+	ret = ntb_txq_setup(dev, queue_id, queue_conf);
+	if (ret < 0)
+		return ret;
+
+	ret = ntb_rxq_setup(dev, queue_id, queue_conf);
+
+	return ret;
+}
+
 static int
-ntb_queue_release(struct rte_rawdev *dev __rte_unused,
-		  uint16_t queue_id __rte_unused)
+ntb_queue_release(struct rte_rawdev *dev, uint16_t queue_id)
 {
+	struct ntb_hw *hw = dev->dev_private;
+	struct ntb_tx_queue *txq;
+	struct ntb_rx_queue *rxq;
+
+	if (queue_id >= hw->queue_pairs)
+		return -EINVAL;
+
+	txq = hw->tx_queues[queue_id];
+	rxq = hw->rx_queues[queue_id];
+	ntb_txq_release(txq);
+	ntb_rxq_release(rxq);
+
 	return 0;
 }
 
@@ -234,6 +470,77 @@ ntb_queue_count(struct rte_rawdev *dev)
 	return hw->queue_pairs;
 }
 
+static int
+ntb_queue_init(struct rte_rawdev *dev, uint16_t qp_id)
+{
+	struct ntb_hw *hw = dev->dev_private;
+	struct ntb_rx_queue *rxq = hw->rx_queues[qp_id];
+	struct ntb_tx_queue *txq = hw->tx_queues[qp_id];
+	volatile struct ntb_header *local_hdr;
+	struct ntb_header *remote_hdr;
+	uint16_t q_size = hw->queue_size;
+	uint32_t hdr_offset;
+	void *bar_addr;
+	uint16_t i;
+
+	if (hw->ntb_ops->get_peer_mw_addr == NULL) {
+		NTB_LOG(ERR, "Failed to get mapped peer addr.");
+		return -EINVAL;
+	}
+
+	/* Put queue info into the start of shared memory. */
+	hdr_offset = hw->hdr_size_per_queue * qp_id;
+	local_hdr = (volatile struct ntb_header *)
+		    ((uint64_t)hw->mz[0]->addr + hdr_offset);
+	bar_addr = (*hw->ntb_ops->get_peer_mw_addr)(dev, 0);
+	if (bar_addr == NULL)
+		return -EINVAL;
+	remote_hdr = (struct ntb_header *)
+		     ((uint64_t)bar_addr + hdr_offset);
+
+	/* rxq init. */
+	rxq->rx_desc_ring = (struct ntb_desc *)
+			    (&remote_hdr->desc_ring);
+	rxq->rx_used_ring = (volatile struct ntb_used *)
+			    (&local_hdr->desc_ring[q_size]);
+	rxq->avail_cnt = &remote_hdr->avail_cnt;
+	rxq->used_cnt = &local_hdr->used_cnt;
+
+	for (i = 0; i < rxq->nb_rx_desc - 1; i++) {
+		struct rte_mbuf *mbuf = rte_mbuf_raw_alloc(rxq->mpool);
+		if (unlikely(!mbuf)) {
+			NTB_LOG(ERR, "Failed to allocate mbuf for RX");
+			return -ENOMEM;
+		}
+		mbuf->port = dev->dev_id;
+
+		rxq->sw_ring[i].mbuf = mbuf;
+
+		rxq->rx_desc_ring[i].addr = rte_pktmbuf_mtod(mbuf, uint64_t);
+		rxq->rx_desc_ring[i].len = mbuf->buf_len - RTE_PKTMBUF_HEADROOM;
+	}
+	rte_wmb();
+	*rxq->avail_cnt = rxq->nb_rx_desc - 1;
+	rxq->last_avail = rxq->nb_rx_desc - 1;
+	rxq->last_used = 0;
+
+	/* txq init */
+	txq->tx_desc_ring = (volatile struct ntb_desc *)
+			    (&local_hdr->desc_ring);
+	txq->tx_used_ring = (struct ntb_used *)
+			    (&remote_hdr->desc_ring[q_size]);
+	txq->avail_cnt = &local_hdr->avail_cnt;
+	txq->used_cnt = &remote_hdr->used_cnt;
+
+	rte_wmb();
+	*txq->used_cnt = 0;
+	txq->last_used = 0;
+	txq->last_avail = 0;
+	txq->nb_tx_free = txq->nb_tx_desc - 1;
+
+	return 0;
+}
+
 static int
 ntb_enqueue_bufs(struct rte_rawdev *dev,
 		 struct rte_rawdev_buf **buffers,
@@ -278,58 +585,51 @@ static void
 ntb_dev_info_get(struct rte_rawdev *dev, rte_rawdev_obj_t dev_info)
 {
 	struct ntb_hw *hw = dev->dev_private;
-	struct ntb_attr *ntb_attrs = dev_info;
+	struct ntb_dev_info *info = dev_info;
 
-	strncpy(ntb_attrs[NTB_TOPO_ID].name, NTB_TOPO_NAME, NTB_ATTR_NAME_LEN);
-	switch (hw->topo) {
-	case NTB_TOPO_B2B_DSD:
-		strncpy(ntb_attrs[NTB_TOPO_ID].value, "B2B DSD",
-			NTB_ATTR_VAL_LEN);
-		break;
-	case NTB_TOPO_B2B_USD:
-		strncpy(ntb_attrs[NTB_TOPO_ID].value, "B2B USD",
-			NTB_ATTR_VAL_LEN);
-		break;
-	default:
-		strncpy(ntb_attrs[NTB_TOPO_ID].value, "Unsupported",
-			NTB_ATTR_VAL_LEN);
-	}
+	info->mw_cnt = hw->mw_cnt;
+	info->mw_size = hw->mw_size;
 
-	strncpy(ntb_attrs[NTB_LINK_STATUS_ID].name, NTB_LINK_STATUS_NAME,
-		NTB_ATTR_NAME_LEN);
-	snprintf(ntb_attrs[NTB_LINK_STATUS_ID].value, NTB_ATTR_VAL_LEN,
-		 "%d", hw->link_status);
-
-	strncpy(ntb_attrs[NTB_SPEED_ID].name, NTB_SPEED_NAME,
-		NTB_ATTR_NAME_LEN);
-	snprintf(ntb_attrs[NTB_SPEED_ID].value, NTB_ATTR_VAL_LEN,
-		 "%d", hw->link_speed);
-
-	strncpy(ntb_attrs[NTB_WIDTH_ID].name, NTB_WIDTH_NAME,
-		NTB_ATTR_NAME_LEN);
-	snprintf(ntb_attrs[NTB_WIDTH_ID].value, NTB_ATTR_VAL_LEN,
-		 "%d", hw->link_width);
-
-	strncpy(ntb_attrs[NTB_MW_CNT_ID].name, NTB_MW_CNT_NAME,
-		NTB_ATTR_NAME_LEN);
-	snprintf(ntb_attrs[NTB_MW_CNT_ID].value, NTB_ATTR_VAL_LEN,
-		 "%d", hw->mw_cnt);
+	/**
+	 * Intel hardware requires that mapped memory base address should be
+	 * aligned with EMBARSZ and needs continuous memzone.
+	 */
+	info->mw_size_align = (uint8_t)(hw->pci_dev->id.vendor_id ==
+					NTB_INTEL_VENDOR_ID);
 
-	strncpy(ntb_attrs[NTB_DB_CNT_ID].name, NTB_DB_CNT_NAME,
-		NTB_ATTR_NAME_LEN);
-	snprintf(ntb_attrs[NTB_DB_CNT_ID].value, NTB_ATTR_VAL_LEN,
-		 "%d", hw->db_cnt);
+	if (!hw->queue_size || !hw->queue_pairs) {
+		NTB_LOG(ERR, "No queue size and queue num assigned.");
+		return;
+	}
 
-	strncpy(ntb_attrs[NTB_SPAD_CNT_ID].name, NTB_SPAD_CNT_NAME,
-		NTB_ATTR_NAME_LEN);
-	snprintf(ntb_attrs[NTB_SPAD_CNT_ID].value, NTB_ATTR_VAL_LEN,
-		 "%d", hw->spad_cnt);
+	hw->hdr_size_per_queue = RTE_ALIGN(sizeof(struct ntb_header) +
+				hw->queue_size * sizeof(struct ntb_desc) +
+				hw->queue_size * sizeof(struct ntb_used),
+				RTE_CACHE_LINE_SIZE);
+	info->ntb_hdr_size = hw->hdr_size_per_queue * hw->queue_pairs;
 }
 
 static int
-ntb_dev_configure(const struct rte_rawdev *dev __rte_unused,
-		  rte_rawdev_obj_t config __rte_unused)
+ntb_dev_configure(const struct rte_rawdev *dev, rte_rawdev_obj_t config)
 {
+	struct ntb_dev_config *conf = config;
+	struct ntb_hw *hw = dev->dev_private;
+	int ret;
+
+	hw->queue_pairs	= conf->num_queues;
+	hw->queue_size = conf->queue_size;
+	hw->used_mw_num = conf->mz_num;
+	hw->mz = conf->mz_list;
+	hw->rx_queues = rte_zmalloc("ntb_rx_queues",
+			sizeof(struct ntb_rx_queue *) * hw->queue_pairs, 0);
+	hw->tx_queues = rte_zmalloc("ntb_tx_queues",
+			sizeof(struct ntb_tx_queue *) * hw->queue_pairs, 0);
+
+	/* Start handshake with the peer. */
+	ret = ntb_handshake_work(dev);
+	if (ret < 0)
+		return ret;
+
 	return 0;
 }
 
@@ -337,21 +637,52 @@ static int
 ntb_dev_start(struct rte_rawdev *dev)
 {
 	struct ntb_hw *hw = dev->dev_private;
-	int ret, i;
+	uint32_t peer_base_l, peer_val;
+	uint64_t peer_base_h;
+	uint32_t i;
+	int ret;
 
-	/* TODO: init queues and start queues. */
+	if (!hw->link_status || !hw->peer_dev_up)
+		return -EINVAL;
 
-	/* Map memory of bar_size to remote. */
-	hw->mz = rte_zmalloc("struct rte_memzone *",
-			     hw->mw_cnt * sizeof(struct rte_memzone *), 0);
-	for (i = 0; i < hw->mw_cnt; i++) {
-		ret = ntb_set_mw(dev, i, hw->mw_size[i]);
+	for (i = 0; i < hw->queue_pairs; i++) {
+		ret = ntb_queue_init(dev, i);
 		if (ret) {
-			NTB_LOG(ERR, "Fail to set mw.");
+			NTB_LOG(ERR, "Failed to init queue.");
 			return ret;
 		}
 	}
 
+	hw->peer_mw_base = rte_zmalloc("ntb_peer_mw_base", hw->mw_cnt *
+					sizeof(uint64_t), 0);
+
+	if (hw->ntb_ops->spad_read == NULL)
+		return -ENOTSUP;
+
+	peer_val = (*hw->ntb_ops->spad_read)(dev, SPAD_Q_SZ, 0);
+	if (peer_val != hw->queue_size) {
+		NTB_LOG(ERR, "Inconsistent queue size! (local: %u peer: %u)",
+			hw->queue_size, peer_val);
+		return -EINVAL;
+	}
+
+	peer_val = (*hw->ntb_ops->spad_read)(dev, SPAD_NUM_QPS, 0);
+	if (peer_val != hw->queue_pairs) {
+		NTB_LOG(ERR, "Inconsistent number of queues! (local: %u peer:"
+			" %u)", hw->queue_pairs, peer_val);
+		return -EINVAL;
+	}
+
+	hw->peer_used_mws = (*hw->ntb_ops->spad_read)(dev, SPAD_USED_MWS, 0);
+
+	for (i = 0; i < hw->peer_used_mws; i++) {
+		peer_base_h = (*hw->ntb_ops->spad_read)(dev,
+				SPAD_MW0_BA_H + 2 * i, 0);
+		peer_base_l = (*hw->ntb_ops->spad_read)(dev,
+				SPAD_MW0_BA_L + 2 * i, 0);
+		hw->peer_mw_base[i] = (peer_base_h << 32) + peer_base_l;
+	}
+
 	dev->started = 1;
 
 	return 0;
@@ -361,10 +692,10 @@ static void
 ntb_dev_stop(struct rte_rawdev *dev)
 {
 	struct ntb_hw *hw = dev->dev_private;
+	struct ntb_rx_queue *rxq;
+	struct ntb_tx_queue *txq;
 	uint32_t time_out;
-	int status;
-
-	/* TODO: stop rx/tx queues. */
+	int status, i;
 
 	if (!hw->peer_dev_up)
 		goto clean;
@@ -405,6 +736,13 @@ ntb_dev_stop(struct rte_rawdev *dev)
 	if (status)
 		NTB_LOG(ERR, "Failed to clear doorbells.");
 
+	for (i = 0; i < hw->queue_pairs; i++) {
+		rxq = hw->rx_queues[i];
+		txq = hw->tx_queues[i];
+		ntb_rxq_release_mbufs(rxq);
+		ntb_txq_release_mbufs(txq);
+	}
+
 	dev->started = 0;
 }
 
@@ -413,12 +751,15 @@ ntb_dev_close(struct rte_rawdev *dev)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	struct rte_intr_handle *intr_handle;
-	int ret = 0;
+	int i;
 
 	if (dev->started)
 		ntb_dev_stop(dev);
 
-	/* TODO: free queues. */
+	/* free queues */
+	for (i = 0; i < hw->queue_pairs; i++)
+		ntb_queue_release(dev, i);
+	hw->queue_pairs = 0;
 
 	intr_handle = &hw->pci_dev->intr_handle;
 	/* Clean datapath event and vec mapping */
@@ -434,7 +775,7 @@ ntb_dev_close(struct rte_rawdev *dev)
 	rte_intr_callback_unregister(intr_handle,
 				     ntb_dev_intr_handler, dev);
 
-	return ret;
+	return 0;
 }
 
 static int
@@ -445,7 +786,7 @@ ntb_dev_reset(struct rte_rawdev *rawdev __rte_unused)
 
 static int
 ntb_attr_set(struct rte_rawdev *dev, const char *attr_name,
-				 uint64_t attr_value)
+	     uint64_t attr_value)
 {
 	struct ntb_hw *hw;
 	int index;
@@ -463,7 +804,21 @@ ntb_attr_set(struct rte_rawdev *dev, const char *attr_name,
 		index = atoi(&attr_name[NTB_SPAD_USER_LEN]);
 		(*hw->ntb_ops->spad_write)(dev, hw->spad_user_list[index],
 					   1, attr_value);
-		NTB_LOG(INFO, "Set attribute (%s) Value (%" PRIu64 ")",
+		NTB_LOG(DEBUG, "Set attribute (%s) Value (%" PRIu64 ")",
+			attr_name, attr_value);
+		return 0;
+	}
+
+	if (!strncmp(attr_name, NTB_QUEUE_SZ_NAME, NTB_ATTR_NAME_LEN)) {
+		hw->queue_size = attr_value;
+		NTB_LOG(DEBUG, "Set attribute (%s) Value (%" PRIu64 ")",
+			attr_name, attr_value);
+		return 0;
+	}
+
+	if (!strncmp(attr_name, NTB_QUEUE_NUM_NAME, NTB_ATTR_NAME_LEN)) {
+		hw->queue_pairs = attr_value;
+		NTB_LOG(DEBUG, "Set attribute (%s) Value (%" PRIu64 ")",
 			attr_name, attr_value);
 		return 0;
 	}
@@ -475,7 +830,7 @@ ntb_attr_set(struct rte_rawdev *dev, const char *attr_name,
 
 static int
 ntb_attr_get(struct rte_rawdev *dev, const char *attr_name,
-				 uint64_t *attr_value)
+	     uint64_t *attr_value)
 {
 	struct ntb_hw *hw;
 	int index;
@@ -489,49 +844,50 @@ ntb_attr_get(struct rte_rawdev *dev, const char *attr_name,
 
 	if (!strncmp(attr_name, NTB_TOPO_NAME, NTB_ATTR_NAME_LEN)) {
 		*attr_value = hw->topo;
-		NTB_LOG(INFO, "Attribute (%s) Value (%" PRIu64 ")",
+		NTB_LOG(DEBUG, "Attribute (%s) Value (%" PRIu64 ")",
 			attr_name, *attr_value);
 		return 0;
 	}
 
 	if (!strncmp(attr_name, NTB_LINK_STATUS_NAME, NTB_ATTR_NAME_LEN)) {
-		*attr_value = hw->link_status;
-		NTB_LOG(INFO, "Attribute (%s) Value (%" PRIu64 ")",
+		/* hw->link_status only indicates hw link status. */
+		*attr_value = hw->link_status && hw->peer_dev_up;
+		NTB_LOG(DEBUG, "Attribute (%s) Value (%" PRIu64 ")",
 			attr_name, *attr_value);
 		return 0;
 	}
 
 	if (!strncmp(attr_name, NTB_SPEED_NAME, NTB_ATTR_NAME_LEN)) {
 		*attr_value = hw->link_speed;
-		NTB_LOG(INFO, "Attribute (%s) Value (%" PRIu64 ")",
+		NTB_LOG(DEBUG, "Attribute (%s) Value (%" PRIu64 ")",
 			attr_name, *attr_value);
 		return 0;
 	}
 
 	if (!strncmp(attr_name, NTB_WIDTH_NAME, NTB_ATTR_NAME_LEN)) {
 		*attr_value = hw->link_width;
-		NTB_LOG(INFO, "Attribute (%s) Value (%" PRIu64 ")",
+		NTB_LOG(DEBUG, "Attribute (%s) Value (%" PRIu64 ")",
 			attr_name, *attr_value);
 		return 0;
 	}
 
 	if (!strncmp(attr_name, NTB_MW_CNT_NAME, NTB_ATTR_NAME_LEN)) {
 		*attr_value = hw->mw_cnt;
-		NTB_LOG(INFO, "Attribute (%s) Value (%" PRIu64 ")",
+		NTB_LOG(DEBUG, "Attribute (%s) Value (%" PRIu64 ")",
 			attr_name, *attr_value);
 		return 0;
 	}
 
 	if (!strncmp(attr_name, NTB_DB_CNT_NAME, NTB_ATTR_NAME_LEN)) {
 		*attr_value = hw->db_cnt;
-		NTB_LOG(INFO, "Attribute (%s) Value (%" PRIu64 ")",
+		NTB_LOG(DEBUG, "Attribute (%s) Value (%" PRIu64 ")",
 			attr_name, *attr_value);
 		return 0;
 	}
 
 	if (!strncmp(attr_name, NTB_SPAD_CNT_NAME, NTB_ATTR_NAME_LEN)) {
 		*attr_value = hw->spad_cnt;
-		NTB_LOG(INFO, "Attribute (%s) Value (%" PRIu64 ")",
+		NTB_LOG(DEBUG, "Attribute (%s) Value (%" PRIu64 ")",
 			attr_name, *attr_value);
 		return 0;
 	}
@@ -542,7 +898,7 @@ ntb_attr_get(struct rte_rawdev *dev, const char *attr_name,
 		index = atoi(&attr_name[NTB_SPAD_USER_LEN]);
 		*attr_value = (*hw->ntb_ops->spad_read)(dev,
 				hw->spad_user_list[index], 0);
-		NTB_LOG(INFO, "Attribute (%s) Value (%" PRIu64 ")",
+		NTB_LOG(DEBUG, "Attribute (%s) Value (%" PRIu64 ")",
 			attr_name, *attr_value);
 		return 0;
 	}
@@ -585,6 +941,7 @@ ntb_xstats_reset(struct rte_rawdev *dev __rte_unused,
 	return 0;
 }
 
+
 static const struct rte_rawdev_ops ntb_ops = {
 	.dev_info_get         = ntb_dev_info_get,
 	.dev_configure        = ntb_dev_configure,
@@ -615,7 +972,6 @@ ntb_init_hw(struct rte_rawdev *dev, struct rte_pci_device *pci_dev)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	struct rte_intr_handle *intr_handle;
-	uint32_t val;
 	int ret, i;
 
 	hw->pci_dev = pci_dev;
@@ -688,45 +1044,6 @@ ntb_init_hw(struct rte_rawdev *dev, struct rte_pci_device *pci_dev)
 	/* enable uio intr after callback register */
 	rte_intr_enable(intr_handle);
 
-	if (hw->ntb_ops->spad_write == NULL) {
-		NTB_LOG(ERR, "Scratchpad is not supported.");
-		return -ENOTSUP;
-	}
-	/* Tell peer the mw_cnt of local side. */
-	ret = (*hw->ntb_ops->spad_write)(dev, SPAD_NUM_MWS, 1, hw->mw_cnt);
-	if (ret) {
-		NTB_LOG(ERR, "Failed to tell peer mw count.");
-		return ret;
-	}
-
-	/* Tell peer each mw size on local side. */
-	for (i = 0; i < hw->mw_cnt; i++) {
-		NTB_LOG(DEBUG, "Local %u mw size: 0x%"PRIx64"", i,
-				hw->mw_size[i]);
-		val = hw->mw_size[i] >> 32;
-		ret = (*hw->ntb_ops->spad_write)
-				(dev, SPAD_MW0_SZ_H + 2 * i, 1, val);
-		if (ret) {
-			NTB_LOG(ERR, "Failed to tell peer mw size.");
-			return ret;
-		}
-
-		val = hw->mw_size[i];
-		ret = (*hw->ntb_ops->spad_write)
-				(dev, SPAD_MW0_SZ_L + 2 * i, 1, val);
-		if (ret) {
-			NTB_LOG(ERR, "Failed to tell peer mw size.");
-			return ret;
-		}
-	}
-
-	/* Ring doorbell 0 to tell peer the device is ready. */
-	ret = (*hw->ntb_ops->peer_db_set)(dev, 0);
-	if (ret) {
-		NTB_LOG(ERR, "Failed to tell peer device is probed.");
-		return ret;
-	}
-
 	return ret;
 }
 
@@ -839,5 +1156,5 @@ RTE_INIT(ntb_init_log)
 {
 	ntb_logtype = rte_log_register("pmd.raw.ntb");
 	if (ntb_logtype >= 0)
-		rte_log_set_level(ntb_logtype, RTE_LOG_DEBUG);
+		rte_log_set_level(ntb_logtype, RTE_LOG_INFO);
 }
diff --git a/drivers/raw/ntb/ntb.h b/drivers/raw/ntb/ntb.h
index d355231b0..0ad20aed3 100644
--- a/drivers/raw/ntb/ntb.h
+++ b/drivers/raw/ntb/ntb.h
@@ -2,8 +2,8 @@
  * Copyright(c) 2019 Intel Corporation.
  */
 
-#ifndef _NTB_RAWDEV_H_
-#define _NTB_RAWDEV_H_
+#ifndef _NTB_H_
+#define _NTB_H_
 
 #include <stdbool.h>
 
@@ -19,38 +19,13 @@ extern int ntb_logtype;
 /* Device IDs */
 #define NTB_INTEL_DEV_ID_B2B_SKX    0x201C
 
-#define NTB_TOPO_NAME               "topo"
-#define NTB_LINK_STATUS_NAME        "link_status"
-#define NTB_SPEED_NAME              "speed"
-#define NTB_WIDTH_NAME              "width"
-#define NTB_MW_CNT_NAME             "mw_count"
-#define NTB_DB_CNT_NAME             "db_count"
-#define NTB_SPAD_CNT_NAME           "spad_count"
 /* Reserved to app to use. */
 #define NTB_SPAD_USER               "spad_user_"
 #define NTB_SPAD_USER_LEN           (sizeof(NTB_SPAD_USER) - 1)
-#define NTB_SPAD_USER_MAX_NUM       10
+#define NTB_SPAD_USER_MAX_NUM       4
 #define NTB_ATTR_NAME_LEN           30
-#define NTB_ATTR_VAL_LEN            30
-#define NTB_ATTR_MAX                20
-
-/* NTB Attributes */
-struct ntb_attr {
-	/**< Name of the attribute */
-	char name[NTB_ATTR_NAME_LEN];
-	/**< Value or reference of value of attribute */
-	char value[NTB_ATTR_NAME_LEN];
-};
 
-enum ntb_attr_idx {
-	NTB_TOPO_ID = 0,
-	NTB_LINK_STATUS_ID,
-	NTB_SPEED_ID,
-	NTB_WIDTH_ID,
-	NTB_MW_CNT_ID,
-	NTB_DB_CNT_ID,
-	NTB_SPAD_CNT_ID,
-};
+#define NTB_DFLT_TX_FREE_THRESH     256
 
 enum ntb_topo {
 	NTB_TOPO_NONE = 0,
@@ -87,10 +62,15 @@ enum ntb_spad_idx {
 	SPAD_NUM_MWS = 1,
 	SPAD_NUM_QPS,
 	SPAD_Q_SZ,
+	SPAD_USED_MWS,
 	SPAD_MW0_SZ_H,
 	SPAD_MW0_SZ_L,
 	SPAD_MW1_SZ_H,
 	SPAD_MW1_SZ_L,
+	SPAD_MW0_BA_H,
+	SPAD_MW0_BA_L,
+	SPAD_MW1_BA_H,
+	SPAD_MW1_BA_L,
 };
 
 /**
@@ -110,26 +90,97 @@ enum ntb_spad_idx {
  * @vector_bind: Bind vector source [intr] to msix vector [msix].
  */
 struct ntb_dev_ops {
-	int (*ntb_dev_init)(struct rte_rawdev *dev);
-	void *(*get_peer_mw_addr)(struct rte_rawdev *dev, int mw_idx);
-	int (*mw_set_trans)(struct rte_rawdev *dev, int mw_idx,
+	int (*ntb_dev_init)(const struct rte_rawdev *dev);
+	void *(*get_peer_mw_addr)(const struct rte_rawdev *dev, int mw_idx);
+	int (*mw_set_trans)(const struct rte_rawdev *dev, int mw_idx,
 			    uint64_t addr, uint64_t size);
-	int (*get_link_status)(struct rte_rawdev *dev);
-	int (*set_link)(struct rte_rawdev *dev, bool up);
-	uint32_t (*spad_read)(struct rte_rawdev *dev, int spad, bool peer);
-	int (*spad_write)(struct rte_rawdev *dev, int spad,
+	int (*get_link_status)(const struct rte_rawdev *dev);
+	int (*set_link)(const struct rte_rawdev *dev, bool up);
+	uint32_t (*spad_read)(const struct rte_rawdev *dev, int spad,
+			      bool peer);
+	int (*spad_write)(const struct rte_rawdev *dev, int spad,
 			  bool peer, uint32_t spad_v);
-	uint64_t (*db_read)(struct rte_rawdev *dev);
-	int (*db_clear)(struct rte_rawdev *dev, uint64_t db_bits);
-	int (*db_set_mask)(struct rte_rawdev *dev, uint64_t db_mask);
-	int (*peer_db_set)(struct rte_rawdev *dev, uint8_t db_bit);
-	int (*vector_bind)(struct rte_rawdev *dev, uint8_t intr, uint8_t msix);
+	uint64_t (*db_read)(const struct rte_rawdev *dev);
+	int (*db_clear)(const struct rte_rawdev *dev, uint64_t db_bits);
+	int (*db_set_mask)(const struct rte_rawdev *dev, uint64_t db_mask);
+	int (*peer_db_set)(const struct rte_rawdev *dev, uint8_t db_bit);
+	int (*vector_bind)(const struct rte_rawdev *dev, uint8_t intr,
+			   uint8_t msix);
+};
+
+struct ntb_desc {
+	uint64_t addr; /* buffer addr */
+	uint16_t len;  /* buffer length */
+	uint16_t rsv1;
+	uint32_t rsv2;
+};
+
+struct ntb_used {
+	uint16_t len;     /* buffer length */
+#define NTB_FLAG_EOP    1 /* end of packet */
+	uint16_t flags;   /* flags */
+};
+
+struct ntb_rx_entry {
+	struct rte_mbuf *mbuf;
+};
+
+struct ntb_rx_queue {
+	struct ntb_desc *rx_desc_ring;
+	volatile struct ntb_used *rx_used_ring;
+	uint16_t *avail_cnt;
+	volatile uint16_t *used_cnt;
+	uint16_t last_avail;
+	uint16_t last_used;
+	uint16_t nb_rx_desc;
+
+	uint16_t rx_free_thresh;
+
+	struct rte_mempool *mpool; /**< mempool for mbuf allocation */
+	struct ntb_rx_entry *sw_ring;
+
+	uint16_t queue_id;         /**< DPDK queue index. */
+	uint16_t port_id;          /**< Device port identifier. */
+
+	struct ntb_hw *hw;
+};
+
+struct ntb_tx_entry {
+	struct rte_mbuf *mbuf;
+	uint16_t next_id;
+	uint16_t last_id;
+};
+
+struct ntb_tx_queue {
+	volatile struct ntb_desc *tx_desc_ring;
+	struct ntb_used *tx_used_ring;
+	volatile uint16_t *avail_cnt;
+	uint16_t *used_cnt;
+	uint16_t last_avail;          /**< Next descriptor to free. */
+	uint16_t last_used;           /**< Next descriptor to send. */
+	uint16_t nb_tx_desc;
+
+	/**< Total number of TX descriptors ready to be allocated. */
+	uint16_t nb_tx_free;
+	uint16_t tx_free_thresh;
+
+	struct ntb_tx_entry *sw_ring;
+
+	uint16_t queue_id;            /**< DPDK queue index. */
+	uint16_t port_id;             /**< Device port identifier. */
+
+	struct ntb_hw *hw;
+};
+
+struct ntb_header {
+	uint16_t avail_cnt __rte_cache_aligned;
+	uint16_t used_cnt __rte_cache_aligned;
+	struct ntb_desc desc_ring[] __rte_cache_aligned;
 };
 
 /* ntb private data. */
 struct ntb_hw {
 	uint8_t mw_cnt;
-	uint8_t peer_mw_cnt;
 	uint8_t db_cnt;
 	uint8_t spad_cnt;
 
@@ -147,18 +198,26 @@ struct ntb_hw {
 	struct rte_pci_device *pci_dev;
 	char *hw_addr;
 
-	uint64_t *mw_size;
-	uint64_t *peer_mw_size;
 	uint8_t peer_dev_up;
+	uint64_t *mw_size;
+	/* remote mem base addr */
+	uint64_t *peer_mw_base;
 
 	uint16_t queue_pairs;
 	uint16_t queue_size;
+	uint32_t hdr_size_per_queue;
+
+	struct ntb_rx_queue **rx_queues;
+	struct ntb_tx_queue **tx_queues;
 
-	/**< mem zone to populate RX ring. */
+	/* memzone to populate RX ring. */
 	const struct rte_memzone **mz;
+	uint8_t used_mw_num;
+
+	uint8_t peer_used_mws;
 
 	/* Reserve several spad for app to use. */
 	int spad_user_list[NTB_SPAD_USER_MAX_NUM];
 };
 
-#endif /* _NTB_RAWDEV_H_ */
+#endif /* _NTB_H_ */
diff --git a/drivers/raw/ntb/ntb_hw_intel.c b/drivers/raw/ntb/ntb_hw_intel.c
index 21eaa8511..0e73f1609 100644
--- a/drivers/raw/ntb/ntb_hw_intel.c
+++ b/drivers/raw/ntb/ntb_hw_intel.c
@@ -26,7 +26,7 @@ static enum xeon_ntb_bar intel_ntb_bar[] = {
 };
 
 static int
-intel_ntb_dev_init(struct rte_rawdev *dev)
+intel_ntb_dev_init(const struct rte_rawdev *dev)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	uint8_t reg_val, bar;
@@ -77,7 +77,7 @@ intel_ntb_dev_init(struct rte_rawdev *dev)
 	hw->db_cnt = XEON_DB_COUNT;
 	hw->spad_cnt = XEON_SPAD_COUNT;
 
-	hw->mw_size = rte_zmalloc("uint64_t",
+	hw->mw_size = rte_zmalloc("ntb_mw_size",
 				  hw->mw_cnt * sizeof(uint64_t), 0);
 	for (i = 0; i < hw->mw_cnt; i++) {
 		bar = intel_ntb_bar[i];
@@ -94,7 +94,7 @@ intel_ntb_dev_init(struct rte_rawdev *dev)
 }
 
 static void *
-intel_ntb_get_peer_mw_addr(struct rte_rawdev *dev, int mw_idx)
+intel_ntb_get_peer_mw_addr(const struct rte_rawdev *dev, int mw_idx)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	uint8_t bar;
@@ -116,7 +116,7 @@ intel_ntb_get_peer_mw_addr(struct rte_rawdev *dev, int mw_idx)
 }
 
 static int
-intel_ntb_mw_set_trans(struct rte_rawdev *dev, int mw_idx,
+intel_ntb_mw_set_trans(const struct rte_rawdev *dev, int mw_idx,
 		       uint64_t addr, uint64_t size)
 {
 	struct ntb_hw *hw = dev->dev_private;
@@ -163,7 +163,7 @@ intel_ntb_mw_set_trans(struct rte_rawdev *dev, int mw_idx,
 }
 
 static int
-intel_ntb_get_link_status(struct rte_rawdev *dev)
+intel_ntb_get_link_status(const struct rte_rawdev *dev)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	uint16_t reg_val;
@@ -195,7 +195,7 @@ intel_ntb_get_link_status(struct rte_rawdev *dev)
 }
 
 static int
-intel_ntb_set_link(struct rte_rawdev *dev, bool up)
+intel_ntb_set_link(const struct rte_rawdev *dev, bool up)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	uint32_t ntb_ctrl, reg_off;
@@ -221,7 +221,7 @@ intel_ntb_set_link(struct rte_rawdev *dev, bool up)
 }
 
 static uint32_t
-intel_ntb_spad_read(struct rte_rawdev *dev, int spad, bool peer)
+intel_ntb_spad_read(const struct rte_rawdev *dev, int spad, bool peer)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	uint32_t spad_v, reg_off;
@@ -241,7 +241,7 @@ intel_ntb_spad_read(struct rte_rawdev *dev, int spad, bool peer)
 }
 
 static int
-intel_ntb_spad_write(struct rte_rawdev *dev, int spad,
+intel_ntb_spad_write(const struct rte_rawdev *dev, int spad,
 		     bool peer, uint32_t spad_v)
 {
 	struct ntb_hw *hw = dev->dev_private;
@@ -263,7 +263,7 @@ intel_ntb_spad_write(struct rte_rawdev *dev, int spad,
 }
 
 static uint64_t
-intel_ntb_db_read(struct rte_rawdev *dev)
+intel_ntb_db_read(const struct rte_rawdev *dev)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	uint64_t db_off, db_bits;
@@ -278,7 +278,7 @@ intel_ntb_db_read(struct rte_rawdev *dev)
 }
 
 static int
-intel_ntb_db_clear(struct rte_rawdev *dev, uint64_t db_bits)
+intel_ntb_db_clear(const struct rte_rawdev *dev, uint64_t db_bits)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	uint64_t db_off;
@@ -293,7 +293,7 @@ intel_ntb_db_clear(struct rte_rawdev *dev, uint64_t db_bits)
 }
 
 static int
-intel_ntb_db_set_mask(struct rte_rawdev *dev, uint64_t db_mask)
+intel_ntb_db_set_mask(const struct rte_rawdev *dev, uint64_t db_mask)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	uint64_t db_m_off;
@@ -312,7 +312,7 @@ intel_ntb_db_set_mask(struct rte_rawdev *dev, uint64_t db_mask)
 }
 
 static int
-intel_ntb_peer_db_set(struct rte_rawdev *dev, uint8_t db_idx)
+intel_ntb_peer_db_set(const struct rte_rawdev *dev, uint8_t db_idx)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	uint32_t db_off;
@@ -332,7 +332,7 @@ intel_ntb_peer_db_set(struct rte_rawdev *dev, uint8_t db_idx)
 }
 
 static int
-intel_ntb_vector_bind(struct rte_rawdev *dev, uint8_t intr, uint8_t msix)
+intel_ntb_vector_bind(const struct rte_rawdev *dev, uint8_t intr, uint8_t msix)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	uint8_t reg_off;
diff --git a/drivers/raw/ntb/rte_pmd_ntb.h b/drivers/raw/ntb/rte_pmd_ntb.h
new file mode 100644
index 000000000..6591ce793
--- /dev/null
+++ b/drivers/raw/ntb/rte_pmd_ntb.h
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Intel Corporation.
+ */
+
+#ifndef _RTE_PMD_NTB_H_
+#define _RTE_PMD_NTB_H_
+
+/* App needs to set/get these attrs */
+#define NTB_QUEUE_SZ_NAME           "queue_size"
+#define NTB_QUEUE_NUM_NAME          "queue_num"
+#define NTB_TOPO_NAME               "topo"
+#define NTB_LINK_STATUS_NAME        "link_status"
+#define NTB_SPEED_NAME              "speed"
+#define NTB_WIDTH_NAME              "width"
+#define NTB_MW_CNT_NAME             "mw_count"
+#define NTB_DB_CNT_NAME             "db_count"
+#define NTB_SPAD_CNT_NAME           "spad_count"
+
+#define NTB_MAX_DESC_SIZE           1024
+#define NTB_MIN_DESC_SIZE           64
+
+struct ntb_dev_info {
+	uint32_t ntb_hdr_size;
+	/**< Whether the memzone must be aligned to the mw size. */
+	uint8_t mw_size_align;
+	uint8_t mw_cnt;
+	uint64_t *mw_size;
+};
+
+struct ntb_dev_config {
+	uint16_t num_queues;
+	uint16_t queue_size;
+	uint8_t mz_num;
+	const struct rte_memzone **mz_list;
+};
+
+struct ntb_queue_conf {
+	uint16_t nb_desc;
+	uint16_t tx_free_thresh;
+	struct rte_mempool *rx_mp;
+};
+
+#endif /* _RTE_PMD_NTB_H_ */
-- 
2.17.1



* [dpdk-dev] [PATCH 2/4] raw/ntb: add xstats support
  2019-09-05  5:39 [dpdk-dev] [PATCH 0/4] enable FIFO for NTB Xiaoyun Li
  2019-09-05  5:39 ` [dpdk-dev] [PATCH 1/4] raw/ntb: setup ntb queue Xiaoyun Li
@ 2019-09-05  5:39 ` Xiaoyun Li
  2019-09-05  5:39 ` [dpdk-dev] [PATCH 3/4] raw/ntb: add enqueue and dequeue functions Xiaoyun Li
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 42+ messages in thread
From: Xiaoyun Li @ 2019-09-05  5:39 UTC (permalink / raw)
  To: jingjing.wu, keith.wiles, omkar.maslekar, cunming.liang; +Cc: dev, Xiaoyun Li

Add xstats support for ntb rawdev.
Support tx-packets, tx-bytes, tx-errors and
rx-packets, rx-bytes, rx-missed.

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
---
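The xstats array in this patch stores the device totals in the first
NTB_XSTATS_NUM slots, followed by one block of NTB_XSTATS_NUM slots per
queue pair. The layout can be sketched roughly as below. This is only an
illustration: xstats_queue_off() and xstats_update_totals() are hypothetical
helper names that do not exist in the driver, and recomputing each total
from zero on every read is an assumption about the intended behaviour, so
that repeated reads do not accumulate.

```c
#include <stdint.h>

/* Mirrors enum ntb_xstats_idx from the patch; XSTATS_NUM is the count. */
enum {
	TX_PKTS, TX_BYTES, TX_ERRS,
	RX_PKTS, RX_BYTES, RX_MISS,
	XSTATS_NUM
};

/*
 * Hypothetical helper: slot of counter idx for queue pair qp.
 * Slots [0, XSTATS_NUM) hold the totals, so queue 0 starts at XSTATS_NUM.
 */
static inline uint32_t
xstats_queue_off(uint16_t qp, uint32_t idx)
{
	return XSTATS_NUM * (qp + 1u) + idx;
}

/*
 * Hypothetical helper: rebuild every total from the per-queue slots.
 * Each total is summed from zero, so calling this twice gives the
 * same result as calling it once.
 */
static void
xstats_update_totals(uint64_t *xstats, uint16_t queue_pairs)
{
	uint32_t i;
	uint16_t q;

	for (i = 0; i < XSTATS_NUM; i++) {
		uint64_t sum = 0;

		for (q = 0; q < queue_pairs; q++)
			sum += xstats[xstats_queue_off(q, i)];
		xstats[i] = sum;
	}
}
```

With two queue pairs the array has 3 * XSTATS_NUM entries, and an id passed
to the xstats ops indexes this flat array directly.
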
 drivers/raw/ntb/ntb.c | 135 ++++++++++++++++++++++++++++++++++++------
 drivers/raw/ntb/ntb.h |  11 ++++
 2 files changed, 128 insertions(+), 18 deletions(-)

diff --git a/drivers/raw/ntb/ntb.c b/drivers/raw/ntb/ntb.c
index 124c82a95..c104557d4 100644
--- a/drivers/raw/ntb/ntb.c
+++ b/drivers/raw/ntb/ntb.c
@@ -30,6 +30,17 @@ static const struct rte_pci_id pci_id_ntb_map[] = {
 	{ .vendor_id = 0, /* sentinel */ },
 };
 
+/* Align with enum ntb_xstats_idx */
+static struct rte_rawdev_xstats_name ntb_xstats_names[] = {
+	{"Tx-packets"},
+	{"Tx-bytes"},
+	{"Tx-errors"},
+	{"Rx-packets"},
+	{"Rx-bytes"},
+	{"Rx-missed"},
+};
+#define NTB_XSTATS_NUM RTE_DIM(ntb_xstats_names)
+
 static inline void
 ntb_link_cleanup(struct rte_rawdev *dev)
 {
@@ -538,6 +549,10 @@ ntb_queue_init(struct rte_rawdev *dev, uint16_t qp_id)
 	txq->last_avail = 0;
 	txq->nb_tx_free = txq->nb_tx_desc - 1;
 
+	/* Set per queue stats. */
+	for (i = 0; i < NTB_XSTATS_NUM; i++)
+		hw->ntb_xstats[i + NTB_XSTATS_NUM * (qp_id + 1)] = 0;
+
 	return 0;
 }
 
@@ -614,6 +629,7 @@ ntb_dev_configure(const struct rte_rawdev *dev, rte_rawdev_obj_t config)
 {
 	struct ntb_dev_config *conf = config;
 	struct ntb_hw *hw = dev->dev_private;
+	uint32_t xstats_num;
 	int ret;
 
 	hw->queue_pairs	= conf->num_queues;
@@ -624,6 +640,10 @@ ntb_dev_configure(const struct rte_rawdev *dev, rte_rawdev_obj_t config)
 			sizeof(struct ntb_rx_queue *) * hw->queue_pairs, 0);
 	hw->tx_queues = rte_zmalloc("ntb_tx_queues",
 			sizeof(struct ntb_tx_queue *) * hw->queue_pairs, 0);
+	/* First total stats, then per queue stats. */
+	xstats_num = (hw->queue_pairs + 1) * NTB_XSTATS_NUM;
+	hw->ntb_xstats = rte_zmalloc("ntb_xstats", xstats_num *
+				     sizeof(uint64_t), 0);
 
 	/* Start handshake with the peer. */
 	ret = ntb_handshake_work(dev);
@@ -645,6 +665,10 @@ ntb_dev_start(struct rte_rawdev *dev)
 	if (!hw->link_status || !hw->peer_dev_up)
 		return -EINVAL;
 
+	/* Set total stats. */
+	for (i = 0; i < NTB_XSTATS_NUM; i++)
+		hw->ntb_xstats[i] = 0;
+
 	for (i = 0; i < hw->queue_pairs; i++) {
 		ret = ntb_queue_init(dev, i);
 		if (ret) {
@@ -909,38 +933,113 @@ ntb_attr_get(struct rte_rawdev *dev, const char *attr_name,
 }
 
 static int
-ntb_xstats_get(const struct rte_rawdev *dev __rte_unused,
-	       const unsigned int ids[] __rte_unused,
-	       uint64_t values[] __rte_unused,
-	       unsigned int n __rte_unused)
+ntb_xstats_get(const struct rte_rawdev *dev,
+	       const unsigned int ids[],
+	       uint64_t values[],
+	       unsigned int n)
 {
-	return 0;
+	struct ntb_hw *hw = dev->dev_private;
+	uint32_t i, j, off, xstats_num;
+
+	/* Calculate total stats of all queues. */
+	for (i = 0; i < NTB_XSTATS_NUM; i++) {
+		for (j = 0; j < hw->queue_pairs; j++) {
+			off = NTB_XSTATS_NUM * (j + 1) + i;
+			hw->ntb_xstats[i] += hw->ntb_xstats[off];
+		}
+	}
+
+	xstats_num = NTB_XSTATS_NUM * (hw->queue_pairs + 1);
+	for (i = 0; i < n && ids[i] < xstats_num; i++)
+		values[i] = hw->ntb_xstats[ids[i]];
+
+	return i;
 }
 
 static int
-ntb_xstats_get_names(const struct rte_rawdev *dev __rte_unused,
-		     struct rte_rawdev_xstats_name *xstats_names __rte_unused,
-		     unsigned int size __rte_unused)
+ntb_xstats_get_names(const struct rte_rawdev *dev,
+		     struct rte_rawdev_xstats_name *xstats_names,
+		     unsigned int size)
 {
-	return 0;
+	struct ntb_hw *hw = dev->dev_private;
+	uint32_t xstats_num, i, j, off;
+
+	xstats_num = NTB_XSTATS_NUM * (hw->queue_pairs + 1);
+	if (xstats_names == NULL || size < xstats_num)
+		return xstats_num;
+
+	/* Total stats names */
+	for (i = 0; i < NTB_XSTATS_NUM; i++) {
+		strncpy(xstats_names[i].name, ntb_xstats_names[i].name,
+			sizeof(xstats_names[0].name));
+	}
+
+	/* Queue stats names */
+	for (i = 0; i < hw->queue_pairs; i++) {
+		for (j = 0; j < NTB_XSTATS_NUM; j++) {
+			off = j + (i + 1) * NTB_XSTATS_NUM;
+			snprintf(xstats_names[off].name,
+				sizeof(xstats_names[0].name),
+				"%s_q%u", ntb_xstats_names[j].name, i);
+		}
+	}
+
+	return xstats_num;
 }
 
 static uint64_t
-ntb_xstats_get_by_name(const struct rte_rawdev *dev __rte_unused,
-		       const char *name __rte_unused,
-		       unsigned int *id __rte_unused)
+ntb_xstats_get_by_name(const struct rte_rawdev *dev,
+		       const char *name, unsigned int *id)
 {
-	return 0;
+	struct rte_rawdev_xstats_name *xstats_names;
+	struct ntb_hw *hw = dev->dev_private;
+	uint32_t xstats_num, i, j, off;
+
+	if (name == NULL)
+		return -EINVAL;
+
+	xstats_num = NTB_XSTATS_NUM * (hw->queue_pairs + 1);
+	xstats_names = rte_zmalloc("ntb_stats_name",
+				   sizeof(struct rte_rawdev_xstats_name) *
+				   xstats_num, 0);
+	ntb_xstats_get_names(dev, xstats_names, xstats_num);
+
+	/* Calculate total stats of all queues. */
+	for (i = 0; i < NTB_XSTATS_NUM; i++) {
+		for (j = 0; j < hw->queue_pairs; j++) {
+			off = NTB_XSTATS_NUM * (j + 1) + i;
+			hw->ntb_xstats[i] += hw->ntb_xstats[off];
+		}
+	}
+
+	for (i = 0; i < xstats_num; i++) {
+		if (!strncmp(name, xstats_names[i].name,
+		    RTE_RAW_DEV_XSTATS_NAME_SIZE)) {
+			*id = i;
+			rte_free(xstats_names);
+			return hw->ntb_xstats[i];
+		}
+	}
+
+	NTB_LOG(ERR, "Cannot find the xstats name.");
+
+	return -EINVAL;
 }
 
 static int
-ntb_xstats_reset(struct rte_rawdev *dev __rte_unused,
-		 const uint32_t ids[] __rte_unused,
-		 uint32_t nb_ids __rte_unused)
+ntb_xstats_reset(struct rte_rawdev *dev,
+		 const uint32_t ids[],
+		 uint32_t nb_ids)
 {
-	return 0;
-}
+	struct ntb_hw *hw = dev->dev_private;
+	uint32_t i, xstats_num;
 
+	xstats_num = NTB_XSTATS_NUM * (hw->queue_pairs + 1);
+	for (i = 0; i < nb_ids && ids[i] < xstats_num; i++)
+		hw->ntb_xstats[ids[i]] = 0;
+
+	return i;
+}
 
 static const struct rte_rawdev_ops ntb_ops = {
 	.dev_info_get         = ntb_dev_info_get,
diff --git a/drivers/raw/ntb/ntb.h b/drivers/raw/ntb/ntb.h
index 0ad20aed3..09e28050f 100644
--- a/drivers/raw/ntb/ntb.h
+++ b/drivers/raw/ntb/ntb.h
@@ -27,6 +27,15 @@ extern int ntb_logtype;
 
 #define NTB_DFLT_TX_FREE_THRESH     256
 
+enum ntb_xstats_idx {
+	NTB_TX_PKTS_ID = 0,
+	NTB_TX_BYTES_ID,
+	NTB_TX_ERRS_ID,
+	NTB_RX_PKTS_ID,
+	NTB_RX_BYTES_ID,
+	NTB_RX_MISS_ID,
+};
+
 enum ntb_topo {
 	NTB_TOPO_NONE = 0,
 	NTB_TOPO_B2B_USD,
@@ -216,6 +225,8 @@ struct ntb_hw {
 
 	uint8_t peer_used_mws;
 
+	uint64_t *ntb_xstats;
+
 	/* Reserve several spad for app to use. */
 	int spad_user_list[NTB_SPAD_USER_MAX_NUM];
 };
-- 
2.17.1



* [dpdk-dev] [PATCH 3/4] raw/ntb: add enqueue and dequeue functions
  2019-09-05  5:39 [dpdk-dev] [PATCH 0/4] enable FIFO for NTB Xiaoyun Li
  2019-09-05  5:39 ` [dpdk-dev] [PATCH 1/4] raw/ntb: setup ntb queue Xiaoyun Li
  2019-09-05  5:39 ` [dpdk-dev] [PATCH 2/4] raw/ntb: add xstats support Xiaoyun Li
@ 2019-09-05  5:39 ` Xiaoyun Li
  2019-09-05  5:39 ` [dpdk-dev] [PATCH 4/4] examples/ntb: support more functions for NTB Xiaoyun Li
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 42+ messages in thread
From: Xiaoyun Li @ 2019-09-05  5:39 UTC (permalink / raw)
  To: jingjing.wu, keith.wiles, omkar.maslekar, cunming.liang; +Cc: dev, Xiaoyun Li

Introduce enqueue and dequeue functions to support packet-based
processing. Also enable write-combining for the ntb driver, since it
improves performance significantly.

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
---
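Both enqueue and dequeue stage their ring entries in a local array and then
publish them with at most two copies, splitting the batch where it wraps
past the end of the ring. A minimal sketch of that split follows, assuming a
power-of-two ring size and a batch no larger than the ring. The names
used_entry and ring_copy_wrap() are hypothetical, and plain memcpy() stands
in for rte_memcpy() to keep the example self-contained.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct ntb_used. */
struct used_entry {
	uint16_t len;
	uint16_t flags;
};

/*
 * Copy n staged entries into a ring of ring_size slots, starting at
 * slot start and wrapping to slot 0 when the end is reached.
 * Assumes ring_size is a power of two and n <= ring_size.
 * Returns the slot index that follows the last entry written.
 */
static uint16_t
ring_copy_wrap(struct used_entry *ring, uint16_t ring_size, uint16_t start,
	       const struct used_entry *staged, uint16_t n)
{
	uint16_t nb1 = ring_size - start; /* room before the wrap point */
	uint16_t nb2;

	if (nb1 > n)
		nb1 = n;
	nb2 = n - nb1;                    /* entries that wrap to slot 0 */

	memcpy(ring + start, staged, sizeof(*staged) * nb1);
	memcpy(ring, staged + nb1, sizeof(*staged) * nb2);

	return (uint16_t)((start + n) & (ring_size - 1));
}
```

Staging locally keeps the number of writes to the peer's memory window
small, which matters when that window is mapped write-combining.
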
 doc/guides/rawdevs/ntb.rst     |  28 ++++
 drivers/raw/ntb/ntb.c          | 242 ++++++++++++++++++++++++++++++---
 drivers/raw/ntb/ntb.h          |   2 +
 drivers/raw/ntb/ntb_hw_intel.c |  22 +++
 4 files changed, 275 insertions(+), 19 deletions(-)

diff --git a/doc/guides/rawdevs/ntb.rst b/doc/guides/rawdevs/ntb.rst
index 99e7db441..afd5769fc 100644
--- a/doc/guides/rawdevs/ntb.rst
+++ b/doc/guides/rawdevs/ntb.rst
@@ -45,6 +45,24 @@ to use, i.e. igb_uio, vfio. The ``dpdk-devbind.py`` script can be used to
 show devices status and to bind them to a suitable kernel driver. They will
 appear under the category of "Misc (rawdev) devices".
 
+Prerequisites
+-------------
+The NTB PMD needs the kernel PCI driver to support write combining (WC) to
+get better performance. The performance difference is more than 10 times.
+There are two ways to enable WC.
+- Insert igb_uio with the ``wc_active=1`` flag if the igb_uio driver is used.
+     insmod igb_uio.ko wc_active=1
+- Enable WC for the NTB device's Bar 2 and Bar 4 (mapped memory) manually.
+     Get the bar base addresses using ``lspci -vvv -s ae:00.0 | grep Region``.
+        Region 0: Memory at 39bfe0000000 (64-bit, prefetchable) [size=64K]
+        Region 2: Memory at 39bfa0000000 (64-bit, prefetchable) [size=512M]
+        Region 4: Memory at 39bfc0000000 (64-bit, prefetchable) [size=512M]
+     Use the following commands to enable WC.
+     echo "base=0x39bfa0000000 size=0x400000 type=write-combining" >> /proc/mtrr
+     echo "base=0x39bfc0000000 size=0x400000 type=write-combining" >> /proc/mtrr
+     To disable WC for these regions, use the following command.
+     echo "disable=1" >> /proc/mtrr
+
 Ring Layout
 -----------
 
@@ -83,6 +101,16 @@ like the following:
       +------------------------+   +------------------------+
                     <---------traffic---------
 
+- Enqueue and Dequeue
+  Based on this ring layout, enqueue reads rx_tail to learn how many free
+  buffers are available, then writes used_ring and tx_tail to tell the peer
+  which buffers are filled with data.
+  Dequeue reads tx_tail to learn how many packets have arrived, then
+  writes desc_ring and rx_tail to tell the peer about the newly allocated
+  buffers.
+  In this way, only remote writes happen and remote reads are avoided,
+  which gives better performance.
+
 Limitation
 ----------
 
diff --git a/drivers/raw/ntb/ntb.c b/drivers/raw/ntb/ntb.c
index c104557d4..2874d734c 100644
--- a/drivers/raw/ntb/ntb.c
+++ b/drivers/raw/ntb/ntb.c
@@ -556,26 +556,140 @@ ntb_queue_init(struct rte_rawdev *dev, uint16_t qp_id)
 	return 0;
 }
 
+static inline void
+ntb_enqueue_cleanup(struct ntb_tx_queue *txq)
+{
+	struct ntb_tx_entry *sw_ring = txq->sw_ring;
+	uint16_t tx_free = txq->last_avail;
+	uint16_t nb_to_clean, i;
+
+	/* avail_cnt + 1 represents where to rx next in the peer. */
+	nb_to_clean = (*txq->avail_cnt - txq->last_avail + 1 +
+			txq->nb_tx_desc) & (txq->nb_tx_desc - 1);
+	nb_to_clean = RTE_MIN(nb_to_clean, txq->tx_free_thresh);
+	for (i = 0; i < nb_to_clean; i++) {
+		if (sw_ring[tx_free].mbuf)
+			rte_pktmbuf_free_seg(sw_ring[tx_free].mbuf);
+		tx_free = (tx_free + 1) & (txq->nb_tx_desc - 1);
+	}
+
+	txq->nb_tx_free += nb_to_clean;
+	txq->last_avail = tx_free;
+}
+
 static int
 ntb_enqueue_bufs(struct rte_rawdev *dev,
 		 struct rte_rawdev_buf **buffers,
 		 unsigned int count,
 		 rte_rawdev_obj_t context)
 {
-	/* Not FIFO right now. Just for testing memory write. */
 	struct ntb_hw *hw = dev->dev_private;
-	unsigned int i;
-	void *bar_addr;
-	size_t size;
+	struct ntb_tx_queue *txq = hw->tx_queues[(size_t)context];
+	struct ntb_tx_entry *sw_ring = txq->sw_ring;
+	struct rte_mbuf *txm;
+	struct ntb_used tx_used[NTB_MAX_DESC_SIZE];
+	volatile struct ntb_desc *tx_item;
+	uint16_t tx_last, nb_segs, off, last_used, avail_cnt;
+	uint16_t nb_mbufs = 0;
+	uint16_t nb_tx = 0;
+	uint64_t bytes = 0;
+	void *buf_addr;
+	int i;
 
-	if (hw->ntb_ops->get_peer_mw_addr == NULL)
-		return -ENOTSUP;
-	bar_addr = (*hw->ntb_ops->get_peer_mw_addr)(dev, 0);
-	size = (size_t)context;
+	if (unlikely(hw->ntb_ops->ioremap == NULL)) {
+		NTB_LOG(ERR, "Ioremap not supported.");
+		return nb_tx;
+	}
 
-	for (i = 0; i < count; i++)
-		rte_memcpy(bar_addr, buffers[i]->buf_addr, size);
-	return 0;
+	if (unlikely(dev->started == 0 || hw->peer_dev_up == 0)) {
+		NTB_LOG(DEBUG, "Link is not up.");
+		return nb_tx;
+	}
+
+	if (txq->nb_tx_free < txq->tx_free_thresh)
+		ntb_enqueue_cleanup(txq);
+
+	off = NTB_XSTATS_NUM * ((size_t)context + 1);
+	last_used = txq->last_used;
+	avail_cnt = *txq->avail_cnt;/* Where to alloc next. */
+	for (nb_tx = 0; nb_tx < count; nb_tx++) {
+		txm = (struct rte_mbuf *)(buffers[nb_tx]->buf_addr);
+		if (txm == NULL || txq->nb_tx_free < txm->nb_segs)
+			break;
+
+		tx_last = (txq->last_used + txm->nb_segs - 1) &
+			  (txq->nb_tx_desc - 1);
+		nb_segs = txm->nb_segs;
+		for (i = 0; i < nb_segs; i++) {
+			/* Not enough ring space for tx. */
+			if (txq->last_used == avail_cnt)
+				goto end_of_tx;
+			sw_ring[txq->last_used].mbuf = txm;
+			tx_item = txq->tx_desc_ring + txq->last_used;
+
+			if (!tx_item->len) {
+				(hw->ntb_xstats[NTB_TX_ERRS_ID + off])++;
+				goto end_of_tx;
+			}
+			if (txm->data_len > tx_item->len) {
+				NTB_LOG(ERR, "Data length exceeds buf length."
+					" Only %u data would be transmitted.",
+					tx_item->len);
+				txm->data_len = tx_item->len;
+			}
+
+			/* translate remote virtual addr to bar virtual addr */
+			buf_addr = (*hw->ntb_ops->ioremap)(dev, tx_item->addr);
+			if (buf_addr == NULL) {
+				(hw->ntb_xstats[NTB_TX_ERRS_ID + off])++;
+				NTB_LOG(ERR, "Null remap addr.");
+				goto end_of_tx;
+			}
+			rte_memcpy(buf_addr, rte_pktmbuf_mtod(txm, void *),
+				   txm->data_len);
+
+			tx_used[nb_mbufs].len = txm->data_len;
+			tx_used[nb_mbufs++].flags = (txq->last_used ==
+						    tx_last) ?
+						    NTB_FLAG_EOP : 0;
+
+			/* update stats */
+			bytes += txm->data_len;
+
+			txm = txm->next;
+
+			sw_ring[txq->last_used].next_id = (txq->last_used + 1) &
+						  (txq->nb_tx_desc - 1);
+			sw_ring[txq->last_used].last_id = tx_last;
+			txq->last_used = (txq->last_used + 1) &
+					 (txq->nb_tx_desc - 1);
+		}
+		txq->nb_tx_free -= nb_segs;
+	}
+
+end_of_tx:
+	if (nb_tx) {
+		uint16_t nb1, nb2;
+		if (nb_mbufs > txq->nb_tx_desc - last_used) {
+			nb1 = txq->nb_tx_desc - last_used;
+			nb2 = nb_mbufs - txq->nb_tx_desc + last_used;
+		} else {
+			nb1 = nb_mbufs;
+			nb2 = 0;
+		}
+		rte_memcpy(txq->tx_used_ring + last_used, tx_used,
+			   sizeof(struct ntb_used) * nb1);
+		rte_memcpy(txq->tx_used_ring, tx_used + nb1,
+			   sizeof(struct ntb_used) * nb2);
+		*txq->used_cnt = txq->last_used;
+		rte_wmb();
+
+		/* update queue stats */
+		hw->ntb_xstats[NTB_TX_BYTES_ID + off] += bytes;
+		hw->ntb_xstats[NTB_TX_PKTS_ID + off] += nb_tx;
+	}
+
+	return nb_tx;
 }
 
 static int
@@ -584,16 +698,106 @@ ntb_dequeue_bufs(struct rte_rawdev *dev,
 		 unsigned int count,
 		 rte_rawdev_obj_t context)
 {
-	/* Not FIFO. Just for testing memory read. */
 	struct ntb_hw *hw = dev->dev_private;
-	unsigned int i;
-	size_t size;
+	struct ntb_rx_queue *rxq = hw->rx_queues[(size_t)context];
+	struct ntb_rx_entry *sw_ring = rxq->sw_ring;
+	struct ntb_desc rx_desc[NTB_MAX_DESC_SIZE];
+	struct rte_mbuf *first, *rxm_t;
+	struct rte_mbuf *prev = NULL;
+	volatile struct ntb_used *rx_item;
+	uint16_t nb_mbufs = 0;
+	uint16_t nb_rx = 0;
+	uint64_t bytes = 0;
+	uint16_t off, last_avail, used_cnt, used_nb;
+	int i;
 
-	size = (size_t)context;
+	if (unlikely(dev->started == 0 || hw->peer_dev_up == 0)) {
+		NTB_LOG(DEBUG, "Link is not up");
+		return nb_rx;
+	}
+
+	used_cnt = *rxq->used_cnt;
+
+	if (rxq->last_used == used_cnt)
+		return nb_rx;
+
+	last_avail = rxq->last_avail;
+	used_nb = (used_cnt - rxq->last_used) & (rxq->nb_rx_desc - 1);
+	count = RTE_MIN(count, used_nb);
+	for (nb_rx = 0; nb_rx < count; nb_rx++) {
+		i = 0;
+		while (true) {
+			rx_item = rxq->rx_used_ring + rxq->last_used;
+			rxm_t = sw_ring[rxq->last_used].mbuf;
+			rxm_t->data_len = rx_item->len;
+			rxm_t->data_off = RTE_PKTMBUF_HEADROOM;
+			rxm_t->port = rxq->port_id;
+
+			if (!i) {
+				rxm_t->nb_segs = 1;
+				first = rxm_t;
+				first->pkt_len = 0;
+				buffers[nb_rx]->buf_addr = rxm_t;
+			} else {
+				prev->next = rxm_t;
+				first->nb_segs++;
+			}
 
-	for (i = 0; i < count; i++)
-		rte_memcpy(buffers[i]->buf_addr, hw->mz[i]->addr, size);
-	return 0;
+			prev = rxm_t;
+			first->pkt_len += prev->data_len;
+			rxq->last_used = (rxq->last_used + 1) &
+					 (rxq->nb_rx_desc - 1);
+
+			/* alloc new mbuf */
+			rxm_t = rte_mbuf_raw_alloc(rxq->mpool);
+			if (unlikely(rxm_t == NULL)) {
+				NTB_LOG(ERR, "recv alloc mbuf failed.");
+				goto end_of_rx;
+			}
+			rxm_t->port = rxq->port_id;
+			sw_ring[rxq->last_avail].mbuf = rxm_t;
+			i++;
+
+			/* fill new desc */
+			rx_desc[nb_mbufs].addr =
+					rte_pktmbuf_mtod(rxm_t, uint64_t);
+			rx_desc[nb_mbufs++].len = rxm_t->buf_len -
+						  RTE_PKTMBUF_HEADROOM;
+			rxq->last_avail = (rxq->last_avail + 1) &
+					  (rxq->nb_rx_desc - 1);
+
+			if (rx_item->flags & NTB_FLAG_EOP)
+				break;
+		}
+		/* update stats */
+		bytes += first->pkt_len;
+	}
+
+end_of_rx:
+	if (nb_rx) {
+		uint16_t nb1, nb2;
+		if (nb_mbufs > rxq->nb_rx_desc - last_avail) {
+			nb1 = rxq->nb_rx_desc - last_avail;
+			nb2 = nb_mbufs - rxq->nb_rx_desc + last_avail;
+		} else {
+			nb1 = nb_mbufs;
+			nb2 = 0;
+		}
+		rte_memcpy(rxq->rx_desc_ring + last_avail, rx_desc,
+			   sizeof(struct ntb_desc) * nb1);
+		rte_memcpy(rxq->rx_desc_ring, rx_desc + nb1,
+			   sizeof(struct ntb_desc) * nb2);
+		*rxq->avail_cnt = rxq->last_avail;
+		rte_wmb();
+
+		/* update queue stats */
+		off = NTB_XSTATS_NUM * ((size_t)context + 1);
+		hw->ntb_xstats[NTB_RX_BYTES_ID + off] += bytes;
+		hw->ntb_xstats[NTB_RX_PKTS_ID + off] += nb_rx;
+		hw->ntb_xstats[NTB_RX_MISS_ID + off] += (count - nb_rx);
+	}
+
+	return nb_rx;
 }
 
 static void
@@ -1242,7 +1446,7 @@ ntb_remove(struct rte_pci_device *pci_dev)
 
 static struct rte_pci_driver rte_ntb_pmd = {
 	.id_table = pci_id_ntb_map,
-	.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_WC_ACTIVATE,
 	.probe = ntb_probe,
 	.remove = ntb_remove,
 };
diff --git a/drivers/raw/ntb/ntb.h b/drivers/raw/ntb/ntb.h
index 09e28050f..eff1f6f07 100644
--- a/drivers/raw/ntb/ntb.h
+++ b/drivers/raw/ntb/ntb.h
@@ -87,6 +87,7 @@ enum ntb_spad_idx {
  * @ntb_dev_init: Init ntb dev.
  * @get_peer_mw_addr: To get the addr of peer mw[mw_idx].
  * @mw_set_trans: Set translation of internal memory that remote can access.
+ * @ioremap: Translate the remote host address to bar address.
  * @get_link_status: get link status, link speed and link width.
  * @set_link: Set local side up/down.
  * @spad_read: Read local/peer spad register val.
@@ -103,6 +104,7 @@ struct ntb_dev_ops {
 	void *(*get_peer_mw_addr)(const struct rte_rawdev *dev, int mw_idx);
 	int (*mw_set_trans)(const struct rte_rawdev *dev, int mw_idx,
 			    uint64_t addr, uint64_t size);
+	void *(*ioremap)(const struct rte_rawdev *dev, uint64_t addr);
 	int (*get_link_status)(const struct rte_rawdev *dev);
 	int (*set_link)(const struct rte_rawdev *dev, bool up);
 	uint32_t (*spad_read)(const struct rte_rawdev *dev, int spad,
diff --git a/drivers/raw/ntb/ntb_hw_intel.c b/drivers/raw/ntb/ntb_hw_intel.c
index 0e73f1609..455cb4e3f 100644
--- a/drivers/raw/ntb/ntb_hw_intel.c
+++ b/drivers/raw/ntb/ntb_hw_intel.c
@@ -162,6 +162,27 @@ intel_ntb_mw_set_trans(const struct rte_rawdev *dev, int mw_idx,
 	return 0;
 }
 
+static void *
+intel_ntb_ioremap(const struct rte_rawdev *dev, uint64_t addr)
+{
+	struct ntb_hw *hw = dev->dev_private;
+	void *mapped = NULL;
+	void *base;
+	int i;
+
+	for (i = 0; i < hw->peer_used_mws; i++) {
+		if (addr >= hw->peer_mw_base[i] &&
+		    addr < hw->peer_mw_base[i] + hw->mw_size[i]) {
+			base = intel_ntb_get_peer_mw_addr(dev, i);
+			mapped = (void *)(addr - hw->peer_mw_base[i] +
+				(uint64_t)base);
+			break;
+		}
+	}
+
+	return mapped;
+}
+
 static int
 intel_ntb_get_link_status(const struct rte_rawdev *dev)
 {
@@ -357,6 +378,7 @@ const struct ntb_dev_ops intel_ntb_ops = {
 	.ntb_dev_init       = intel_ntb_dev_init,
 	.get_peer_mw_addr   = intel_ntb_get_peer_mw_addr,
 	.mw_set_trans       = intel_ntb_mw_set_trans,
+	.ioremap            = intel_ntb_ioremap,
 	.get_link_status    = intel_ntb_get_link_status,
 	.set_link           = intel_ntb_set_link,
 	.spad_read          = intel_ntb_spad_read,
-- 
2.17.1
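
For reference, the peer-to-local address translation behind the ioremap
op can be sketched in isolation as below. This is only an illustration
under assumptions: struct mw_window and peer_addr_to_local() are
hypothetical names that are not part of the driver, and the sketch
treats each memory window as the half-open range [base, base + size).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one memory window; not the driver's struct. */
struct mw_window {
	uint64_t peer_base;  /* window start in the peer's address space */
	uint64_t size;       /* window length in bytes */
	uint8_t *local_base; /* where the window is mapped locally */
};

/* Return the local pointer for a peer address, or NULL if no window has it. */
static void *
peer_addr_to_local(const struct mw_window *mw, int nb_mw, uint64_t addr)
{
	int i;

	for (i = 0; i < nb_mw; i++) {
		/* Half-open range: base + size is already outside the window. */
		if (addr >= mw[i].peer_base &&
		    addr - mw[i].peer_base < mw[i].size)
			return mw[i].local_base + (addr - mw[i].peer_base);
	}
	return NULL;
}
```

Comparing the offset against the size, rather than the address against
base + size, also keeps the check safe if base + size would wrap.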


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [dpdk-dev] [PATCH 4/4] examples/ntb: support more functions for NTB
  2019-09-05  5:39 [dpdk-dev] [PATCH 0/4] enable FIFO for NTB Xiaoyun Li
                   ` (2 preceding siblings ...)
  2019-09-05  5:39 ` [dpdk-dev] [PATCH 3/4] raw/ntb: add enqueue and dequeue functions Xiaoyun Li
@ 2019-09-05  5:39 ` Xiaoyun Li
  2019-09-05 18:34 ` [dpdk-dev] [PATCH 0/4] enable FIFO " Maslekar, Omkar
  2019-09-06  3:02 ` [dpdk-dev] [PATCH v2 " Xiaoyun Li
  5 siblings, 0 replies; 42+ messages in thread
From: Xiaoyun Li @ 2019-09-05  5:39 UTC (permalink / raw)
  To: jingjing.wu, keith.wiles, omkar.maslekar, cunming.liang; +Cc: dev, Xiaoyun Li

Support file transmission between two systems.
Support iofwd between one ethdev and the NTB device.
Support rxonly and txonly for the NTB device.
Support setting the forwarding mode to file-trans, txonly,
rxonly or iofwd.
Support showing/clearing port stats and throughput.

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
---
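The way streams are spread over the worker lcores can be sketched as
follows. This is a standalone illustration, not the sample's code:
split_streams() and struct stream_slice are hypothetical names, and the
zero-worker check is an addition of the sketch.

```c
#include <stdint.h>

/* Hypothetical per-worker slice: first stream index and stream count. */
struct stream_slice {
	uint16_t first;
	uint16_t count;
};

/*
 * Split nb_streams over nb_workers so that the counts differ by at most
 * one. Returns 0 on success, -1 if there is no worker to run the streams.
 */
static int
split_streams(uint16_t nb_streams, uint16_t nb_workers,
	      struct stream_slice *out)
{
	uint16_t base, extra, next = 0, i;

	if (nb_workers == 0)
		return -1; /* avoids a division by zero */

	base = nb_streams / nb_workers;
	extra = nb_streams % nb_workers;
	for (i = 0; i < nb_workers; i++) {
		out[i].first = next;
		/* The first 'extra' workers take one additional stream. */
		out[i].count = base + (i < extra ? 1 : 0);
		next += out[i].count;
	}
	return 0;
}
```

With 5 streams and 3 workers this gives slices of 2, 2 and 1 streams;
workers beyond the number of streams get an empty slice.
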
 doc/guides/sample_app_ug/ntb.rst |   59 +-
 examples/ntb/meson.build         |    3 +
 examples/ntb/ntb_fwd.c           | 1297 +++++++++++++++++++++++++++---
 3 files changed, 1231 insertions(+), 128 deletions(-)
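
The packet and burst counts used when sending a file come from two
ceiling divisions. A minimal sketch, assuming a fixed payload size per
mbuf; div_round_up(), nb_pkts_for_file() and nb_bursts_for_file() are
hypothetical helper names used only for illustration:

```c
#include <stdint.h>

/* Ceiling division; den must be non-zero. */
static uint64_t
div_round_up(uint64_t num, uint64_t den)
{
	return (num + den - 1) / den;
}

/* Number of mbufs needed to carry file_size bytes. */
static uint64_t
nb_pkts_for_file(uint64_t file_size, uint16_t payload_per_mbuf)
{
	return div_round_up(file_size, payload_per_mbuf);
}

/* Number of enqueue bursts needed to send nb_pkts mbufs. */
static uint64_t
nb_bursts_for_file(uint64_t nb_pkts, uint16_t pkts_per_burst)
{
	return div_round_up(nb_pkts, pkts_per_burst);
}
```

For example, with 2048 payload bytes per mbuf and 32 packets per burst,
a 1 MiB file needs 512 packets in 16 bursts, and one extra byte adds one
packet and one burst.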

diff --git a/doc/guides/sample_app_ug/ntb.rst b/doc/guides/sample_app_ug/ntb.rst
index 079242175..f8291d7d1 100644
--- a/doc/guides/sample_app_ug/ntb.rst
+++ b/doc/guides/sample_app_ug/ntb.rst
@@ -5,8 +5,17 @@ NTB Sample Application
 ======================
 
 The ntb sample application shows how to use ntb rawdev driver.
-This sample provides interactive mode to transmit file between
-two hosts.
+This sample provides interactive mode to do packet based processing
+between two systems.
+
+This sample supports four packet forwarding modes.
+
+* ``file-trans``: transmit files between two systems. The sample polls
+  for files from the peer and saves each one as ``ntb_recv_file[N]``,
+  where [N] is the index of the received file.
+* ``rxonly``: NTB receives packets but doesn't transmit them.
+* ``txonly``: NTB generates and transmits packets without receiving any.
+* ``iofwd``: forward packets between the NTB device and an ethdev.
 
 Compiling the Application
 -------------------------
@@ -29,6 +38,40 @@ Refer to the *DPDK Getting Started Guide* for general information on
 running applications and the Environment Abstraction Layer (EAL)
 options.
 
+Command-line Options
+--------------------
+
+The application supports the following command-line options.
+
+* ``--buf-size=N``
+
+  Set the data size of the mbufs used to N bytes, where N < 65536.
+  The default value is 2048.
+
+* ``--fwd-mode=mode``
+
+  Set the packet forwarding mode as ``file-trans``, ``txonly``,
+  ``rxonly`` or ``iofwd``.
+
+* ``--nb-desc=N``
+
+  Set the number of descriptors per queue (the queue size) to N,
+  where 64 <= N <= 1024. The default value is 1024.
+
+* ``--txfreet=N``
+
+  Set the transmit free threshold of TX rings to N, where 0 <= N <=
+  the value of ``--nb-desc``. The default value is 256.
+
+* ``--burst=N``
+
+  Set the number of packets per burst to N, where 1 <= N <= 32.
+  The default value is 32.
+
+* ``--qp=N``
+
+  Set the number of queues to N, where N > 0. The default value is 1.
+
 Using the application
 ---------------------
 
@@ -41,7 +84,11 @@ The application is console-driven using the cmdline DPDK interface:
 From this interface the available commands and descriptions of what
 they do as as follows:
 
-* ``send [filepath]``: Send file to the peer host.
-* ``receive [filepath]``: Receive file to [filepath]. Need the peer
-  to send file successfully first.
-* ``quit``: Exit program
+* ``send [filepath]``: Send file to the peer host. Requires
+  file-trans forwarding mode.
+* ``start``: Start transmission.
+* ``stop``: Stop transmission.
+* ``show/clear port stats``: Show/Clear port stats and throughput.
+* ``set fwd file-trans/rxonly/txonly/iofwd``: Set packet forwarding
+  mode.
+* ``quit``: Exit program.
diff --git a/examples/ntb/meson.build b/examples/ntb/meson.build
index 9a6288f4f..f5435fe12 100644
--- a/examples/ntb/meson.build
+++ b/examples/ntb/meson.build
@@ -14,3 +14,6 @@ cflags += ['-D_FILE_OFFSET_BITS=64']
 sources = files(
 	'ntb_fwd.c'
 )
+if dpdk_conf.has('RTE_LIBRTE_PMD_NTB_RAWDEV')
+	deps += 'rawdev_ntb'
+endif
diff --git a/examples/ntb/ntb_fwd.c b/examples/ntb/ntb_fwd.c
index f8c970cdb..60b5b3b99 100644
--- a/examples/ntb/ntb_fwd.c
+++ b/examples/ntb/ntb_fwd.c
@@ -14,21 +14,102 @@
 #include <cmdline.h>
 #include <rte_common.h>
 #include <rte_rawdev.h>
+#include <rte_ethdev.h>
+#include <rte_malloc.h>
 #include <rte_lcore.h>
+#include <rte_pmd_ntb.h>
 
-#define NTB_DRV_NAME_LEN	7
-static uint64_t max_file_size = 0x400000;
+/* Per-port statistics struct */
+struct ntb_port_statistics {
+	uint64_t tx;
+	uint64_t rx;
+} __rte_cache_aligned;
+/* Port 0: NTB dev, Port 1: ethdev when iofwd. */
+struct ntb_port_statistics ntb_port_stats[2];
+
+struct ntb_fwd_stream {
+	uint16_t tx_port;
+	uint16_t rx_port;
+	uint16_t qp_id;
+	uint8_t tx_ntb;  /* If ntb device is tx port. */
+};
+
+struct ntb_fwd_lcore_conf {
+	uint16_t stream_id;
+	uint16_t nb_stream;
+	uint8_t stopped;
+};
+
+enum ntb_fwd_mode {
+	FILE_TRANS = 0,
+	RXONLY,
+	TXONLY,
+	IOFWD,
+	MAX_FWD_MODE,
+};
+static const char *const fwd_mode_s[] = {
+	"file-trans",
+	"rxonly",
+	"txonly",
+	"iofwd",
+	NULL,
+};
+static enum ntb_fwd_mode fwd_mode = MAX_FWD_MODE;
+
+static struct ntb_fwd_lcore_conf fwd_lcore_conf[RTE_MAX_LCORE];
+static struct ntb_fwd_stream *fwd_streams;
+
+static struct rte_mempool *mbuf_pool;
+
+#define NTB_DRV_NAME_LEN 7
+#define MEMPOOL_CACHE_SIZE 256
+
+static uint8_t in_test;
 static uint8_t interactive = 1;
+static uint16_t eth_port_id = RTE_MAX_ETHPORTS;
 static uint16_t dev_id;
 
+/* Number of queues, default set as 1 */
+static uint16_t num_queues = 1;
+static uint16_t ntb_buf_size = RTE_MBUF_DEFAULT_BUF_SIZE;
+
+/* Configurable number of descriptors */
+#define NTB_DEFAULT_NUM_DESCS 1024
+static uint16_t nb_desc = NTB_DEFAULT_NUM_DESCS;
+
+static uint16_t tx_free_thresh;
+
+#define NTB_MAX_PKT_BURST 32
+#define NTB_DFLT_PKT_BURST 32
+static uint16_t pkt_burst = NTB_DFLT_PKT_BURST;
+
+#define BURST_TX_RETRIES 64
+
+static struct rte_eth_conf eth_port_conf = {
+	.rxmode = {
+		.mq_mode = ETH_MQ_RX_RSS,
+		.split_hdr_size = 0,
+	},
+	.rx_adv_conf = {
+		.rss_conf = {
+			.rss_key = NULL,
+			.rss_hf = ETH_RSS_IP,
+		},
+	},
+	.txmode = {
+		.mq_mode = ETH_MQ_TX_NONE,
+	},
+};
+
 /* *** Help command with introduction. *** */
 struct cmd_help_result {
 	cmdline_fixed_string_t help;
 };
 
-static void cmd_help_parsed(__attribute__((unused)) void *parsed_result,
-			    struct cmdline *cl,
-			    __attribute__((unused)) void *data)
+static void
+cmd_help_parsed(__attribute__((unused)) void *parsed_result,
+		struct cmdline *cl,
+		__attribute__((unused)) void *data)
 {
 	cmdline_printf(
 		cl,
@@ -37,13 +118,17 @@ static void cmd_help_parsed(__attribute__((unused)) void *parsed_result,
 		"Control:\n"
 		"    quit                                      :"
 		" Quit the application.\n"
-		"\nFile transmit:\n"
+		"\nTransmission:\n"
 		"    send [path]                               :"
-		" Send [path] file. (No more than %"PRIu64")\n"
-		"    recv [path]                            :"
-		" Receive file to [path]. Make sure sending is done"
-		" on the other side.\n",
-		max_file_size
+		" Send [path] file. Only takes effect in file-trans mode.\n"
+		"    start                                     :"
+		" Start transmissions.\n"
+		"    stop                                      :"
+		" Stop transmissions.\n"
+		"    clear/show port stats                     :"
+		" Clear/show port stats.\n"
+		"    set fwd file-trans/rxonly/txonly/iofwd    :"
+		" Set packet forwarding mode.\n"
 	);
 
 }
@@ -66,13 +151,37 @@ struct cmd_quit_result {
 	cmdline_fixed_string_t quit;
 };
 
-static void cmd_quit_parsed(__attribute__((unused)) void *parsed_result,
-			    struct cmdline *cl,
-			    __attribute__((unused)) void *data)
+static void
+cmd_quit_parsed(__attribute__((unused)) void *parsed_result,
+		struct cmdline *cl,
+		__attribute__((unused)) void *data)
 {
+	struct ntb_fwd_lcore_conf *conf;
+	unsigned int lcore_id;
+
+	/* Stop transmission first. */
+	RTE_LCORE_FOREACH_SLAVE(lcore_id) {
+		conf = &fwd_lcore_conf[lcore_id];
+
+		if (!conf->nb_stream)
+			continue;
+
+		if (conf->stopped)
+			continue;
+
+		conf->stopped = 1;
+	}
+	printf("\nWaiting for lcores to finish...\n");
+	rte_eal_mp_wait_lcore();
+	in_test = 0;
+
 	/* Stop traffic and Close port. */
 	rte_rawdev_stop(dev_id);
 	rte_rawdev_close(dev_id);
+	if (eth_port_id < RTE_MAX_ETHPORTS && fwd_mode == IOFWD) {
+		rte_eth_dev_stop(eth_port_id);
+		rte_eth_dev_close(eth_port_id);
+	}
 
 	cmdline_quit(cl);
 }
@@ -102,21 +211,19 @@ cmd_sendfile_parsed(void *parsed_result,
 		    __attribute__((unused)) void *data)
 {
 	struct cmd_sendfile_result *res = parsed_result;
-	struct rte_rawdev_buf *pkts_send[1];
-	uint64_t rsize, size, link;
-	uint8_t *buff;
+	struct rte_rawdev_buf *pkts_send[NTB_MAX_PKT_BURST];
+	struct rte_mbuf *mbuf_send[NTB_MAX_PKT_BURST];
+	uint64_t size, count, i, nb_burst;
+	uint16_t nb_tx, buf_size;
+	unsigned int nb_pkt;
+	size_t queue_id = 0;
+	uint16_t retry = 0;
 	uint32_t val;
 	FILE *file;
 
-	if (!rte_rawdevs[dev_id].started) {
-		printf("Device needs to be up first. Try later.\n");
-		return;
-	}
-
-	rte_rawdev_get_attr(dev_id, "link_status", &link);
-	if (!link) {
-		printf("Link is not up, cannot send file.\n");
-		return;
+	if (num_queues != 1) {
+		printf("File transmission only supports 1 queue.\n");
+		num_queues = 1;
 	}
 
 	file = fopen(res->filepath, "r");
@@ -127,30 +234,13 @@ cmd_sendfile_parsed(void *parsed_result,
 
 	if (fseek(file, 0, SEEK_END) < 0) {
 		printf("Fail to get file size.\n");
+		fclose(file);
 		return;
 	}
 	size = ftell(file);
 	if (fseek(file, 0, SEEK_SET) < 0) {
 		printf("Fail to get file size.\n");
-		return;
-	}
-
-	/**
-	 * No FIFO now. Only test memory. Limit sending file
-	 * size <= max_file_size.
-	 */
-	if (size > max_file_size) {
-		printf("Warning: The file is too large. Only send first"
-		       " %"PRIu64" bits.\n", max_file_size);
-		size = max_file_size;
-	}
-
-	buff = (uint8_t *)malloc(size);
-	rsize = fread(buff, size, 1, file);
-	if (rsize != 1) {
-		printf("Fail to read file.\n");
 		fclose(file);
-		free(buff);
 		return;
 	}
 
@@ -159,22 +249,63 @@ cmd_sendfile_parsed(void *parsed_result,
 	rte_rawdev_set_attr(dev_id, "spad_user_0", val);
 	val = size;
 	rte_rawdev_set_attr(dev_id, "spad_user_1", val);
+	printf("Sending file, size is %"PRIu64"\n", size);
+
+	for (i = 0; i < NTB_MAX_PKT_BURST; i++)
+		pkts_send[i] = (struct rte_rawdev_buf *)
+				malloc(sizeof(struct rte_rawdev_buf));
+
+	buf_size = ntb_buf_size - RTE_PKTMBUF_HEADROOM;
+	count = (size + buf_size - 1) / buf_size;
+	nb_burst = (count + pkt_burst - 1) / pkt_burst;
 
-	pkts_send[0] = (struct rte_rawdev_buf *)malloc
-			(sizeof(struct rte_rawdev_buf));
-	pkts_send[0]->buf_addr = buff;
+	for (i = 0; i < nb_burst; i++) {
+		val = RTE_MIN(count, pkt_burst);
+		if (rte_mempool_get_bulk(mbuf_pool, (void **)mbuf_send,
+					val) == 0) {
+			for (nb_pkt = 0; nb_pkt < val; nb_pkt++) {
+				mbuf_send[nb_pkt]->port = dev_id;
+				mbuf_send[nb_pkt]->data_len =
+				fread(rte_pktmbuf_mtod(mbuf_send[nb_pkt],
+					void *), 1, buf_size, file);
+				mbuf_send[nb_pkt]->pkt_len =
+					mbuf_send[nb_pkt]->data_len;
+				pkts_send[nb_pkt]->buf_addr = mbuf_send[nb_pkt];
+			}
+		} else {
+			for (nb_pkt = 0; nb_pkt < val; nb_pkt++) {
+				mbuf_send[nb_pkt] =
+					rte_mbuf_raw_alloc(mbuf_pool);
+				if (mbuf_send[nb_pkt] == NULL)
+					break;
+				mbuf_send[nb_pkt]->port = dev_id;
+				mbuf_send[nb_pkt]->data_len =
+				fread(rte_pktmbuf_mtod(mbuf_send[nb_pkt],
+					void *), 1, buf_size, file);
+				mbuf_send[nb_pkt]->pkt_len =
+					mbuf_send[nb_pkt]->data_len;
+				pkts_send[nb_pkt]->buf_addr = mbuf_send[nb_pkt];
+			}
+		}
 
-	if (rte_rawdev_enqueue_buffers(dev_id, pkts_send, 1,
-				       (void *)(size_t)size)) {
-		printf("Fail to enqueue.\n");
-		goto clean;
+		nb_tx = rte_rawdev_enqueue_buffers(dev_id, pkts_send, nb_pkt,
+						   (void *)queue_id);
+		while (nb_tx != nb_pkt && retry++ < BURST_TX_RETRIES) {
+			rte_delay_us(1);
+			nb_tx += rte_rawdev_enqueue_buffers(dev_id,
+				&pkts_send[nb_tx], nb_pkt - nb_tx,
+				(void *)queue_id);
+		}
+		count -= nb_pkt;
 	}
+	/* Clear register after file sending done. */
+	rte_rawdev_set_attr(dev_id, "spad_user_0", 0);
+	rte_rawdev_set_attr(dev_id, "spad_user_1", 0);
 	printf("Done sending file.\n");
 
-clean:
+	for (i = 0; i < NTB_MAX_PKT_BURST; i++)
+		free(pkts_send[i]);
 	fclose(file);
-	free(buff);
-	free(pkts_send[0]);
 }
 
 cmdline_parse_token_string_t cmd_send_file_send =
@@ -195,79 +326,680 @@ cmdline_parse_inst_t cmd_send_file = {
 	},
 };
 
-/* *** RECEIVE FILE PARAMETERS *** */
-struct cmd_recvfile_result {
-	cmdline_fixed_string_t recv_string;
-	char filepath[];
-};
+#define RECV_FILE_LEN 30
+static int
+start_polling_recv_file(void *param)
+{
+	struct rte_rawdev_buf *pkts_recv[NTB_MAX_PKT_BURST];
+	struct ntb_fwd_lcore_conf *conf = param;
+	struct rte_mbuf *mbuf;
+	char filepath[RECV_FILE_LEN];
+	uint64_t val, size, file_len;
+	uint16_t nb_rx, i, file_no;
+	size_t queue_id = 0;
+	FILE *file;
+
+	for (i = 0; i < NTB_MAX_PKT_BURST; i++)
+		pkts_recv[i] = (struct rte_rawdev_buf *)
+				malloc(sizeof(struct rte_rawdev_buf));
+
+	file_no = 0;
+	while (!conf->stopped) {
+		snprintf(filepath, RECV_FILE_LEN, "ntb_recv_file%d", file_no);
+		file = fopen(filepath, "w");
+		if (file == NULL) {
+			printf("Fail to open the file.\n");
+			return -EINVAL;
+		}
+
+		rte_rawdev_get_attr(dev_id, "spad_user_0", &val);
+		size = val << 32;
+		rte_rawdev_get_attr(dev_id, "spad_user_1", &val);
+		size |= val;
+
+		if (!size) {
+			fclose(file);
+			continue;
+		}
+
+		file_len = 0;
+		nb_rx = NTB_MAX_PKT_BURST;
+		while (file_len < size && !conf->stopped) {
+			nb_rx = rte_rawdev_dequeue_buffers(dev_id, pkts_recv,
+						pkt_burst, (void *)queue_id);
+			ntb_port_stats[0].rx += nb_rx;
+			for (i = 0; i < nb_rx; i++) {
+				mbuf = pkts_recv[i]->buf_addr;
+				fwrite(rte_pktmbuf_mtod(mbuf, void *), 1,
+					mbuf->data_len, file);
+				file_len += mbuf->data_len;
+				rte_pktmbuf_free(mbuf);
+				pkts_recv[i]->buf_addr = NULL;
+			}
+		}
+
+		printf("Received file (size: %" PRIu64 ") from peer to %s.\n",
+			size, filepath);
+		fclose(file);
+		file_no++;
+	}
+
+	for (i = 0; i < NTB_MAX_PKT_BURST; i++)
+		free(pkts_recv[i]);
+	return 0;
+}
+
+static int
+start_iofwd_per_lcore(void *param)
+{
+	struct rte_rawdev_buf *ntb_buf[NTB_MAX_PKT_BURST];
+	struct rte_mbuf *pkts_burst[NTB_MAX_PKT_BURST];
+	struct ntb_fwd_lcore_conf *conf = param;
+	struct ntb_fwd_stream fs;
+	uint16_t nb_rx, nb_tx;
+	int i, j;
+
+	for (i = 0; i < NTB_MAX_PKT_BURST; i++)
+		ntb_buf[i] = (struct rte_rawdev_buf *)
+			     malloc(sizeof(struct rte_rawdev_buf));
+
+	while (!conf->stopped) {
+		for (i = 0; i < conf->nb_stream; i++) {
+			fs = fwd_streams[conf->stream_id + i];
+			if (fs.tx_ntb) {
+				nb_rx = rte_eth_rx_burst(fs.rx_port,
+						fs.qp_id, pkts_burst,
+						pkt_burst);
+				if (unlikely(nb_rx == 0))
+					continue;
+				for (j = 0; j < nb_rx; j++)
+					ntb_buf[j]->buf_addr = pkts_burst[j];
+				nb_tx =
+				rte_rawdev_enqueue_buffers(fs.tx_port,
+						ntb_buf, nb_rx,
+						(void *)(size_t)fs.qp_id);
+				ntb_port_stats[0].tx += nb_tx;
+				ntb_port_stats[1].rx += nb_rx;
+			} else {
+				nb_rx =
+				rte_rawdev_dequeue_buffers(fs.rx_port,
+						ntb_buf, pkt_burst,
+						(void *)(size_t)fs.qp_id);
+				if (unlikely(nb_rx == 0))
+					continue;
+				for (j = 0; j < nb_rx; j++)
+					pkts_burst[j] = ntb_buf[j]->buf_addr;
+				nb_tx = rte_eth_tx_burst(fs.tx_port,
+					fs.qp_id, pkts_burst, nb_rx);
+				ntb_port_stats[1].tx += nb_tx;
+				ntb_port_stats[0].rx += nb_rx;
+			}
+			if (unlikely(nb_tx < nb_rx)) {
+				do {
+					rte_pktmbuf_free(pkts_burst[nb_tx]);
+				} while (++nb_tx < nb_rx);
+			}
+		}
+	}
+
+	for (i = 0; i < NTB_MAX_PKT_BURST; i++)
+		free(ntb_buf[i]);
+
+	return 0;
+}
+
+static int
+start_rxonly_per_lcore(void *param)
+{
+	struct rte_rawdev_buf *ntb_buf[NTB_MAX_PKT_BURST];
+	struct ntb_fwd_lcore_conf *conf = param;
+	struct ntb_fwd_stream fs;
+	uint16_t nb_rx;
+	int i, j;
+
+	for (i = 0; i < NTB_MAX_PKT_BURST; i++)
+		ntb_buf[i] = (struct rte_rawdev_buf *)
+			     malloc(sizeof(struct rte_rawdev_buf));
+
+	while (!conf->stopped) {
+		for (i = 0; i < conf->nb_stream; i++) {
+			fs = fwd_streams[conf->stream_id + i];
+			nb_rx = rte_rawdev_dequeue_buffers(fs.rx_port,
+				ntb_buf, pkt_burst, (void *)(size_t)fs.qp_id);
+			if (unlikely(nb_rx == 0))
+				continue;
+			ntb_port_stats[0].rx += nb_rx;
+
+			for (j = 0; j < nb_rx; j++)
+				rte_pktmbuf_free(ntb_buf[j]->buf_addr);
+		}
+	}
+
+	for (i = 0; i < NTB_MAX_PKT_BURST; i++)
+		free(ntb_buf[i]);
+
+	return 0;
+}
+
+
+static int
+start_txonly_per_lcore(void *param)
+{
+	struct rte_rawdev_buf *ntb_buf[NTB_MAX_PKT_BURST];
+	struct rte_mbuf *pkts_burst[NTB_MAX_PKT_BURST];
+	struct ntb_fwd_lcore_conf *conf = param;
+	struct ntb_fwd_stream fs;
+	uint16_t nb_pkt, nb_tx;
+	int i;
+
+	for (i = 0; i < NTB_MAX_PKT_BURST; i++)
+		ntb_buf[i] = (struct rte_rawdev_buf *)
+			     malloc(sizeof(struct rte_rawdev_buf));
+
+	while (!conf->stopped) {
+		for (i = 0; i < conf->nb_stream; i++) {
+			fs = fwd_streams[conf->stream_id + i];
+			if (rte_mempool_get_bulk(mbuf_pool, (void **)pkts_burst,
+				  pkt_burst) == 0) {
+				for (nb_pkt = 0; nb_pkt < pkt_burst; nb_pkt++) {
+					pkts_burst[nb_pkt]->port = dev_id;
+					pkts_burst[nb_pkt]->data_len =
+						pkts_burst[nb_pkt]->buf_len -
+						RTE_PKTMBUF_HEADROOM;
+					pkts_burst[nb_pkt]->pkt_len =
+						pkts_burst[nb_pkt]->data_len;
+					ntb_buf[nb_pkt]->buf_addr =
+						pkts_burst[nb_pkt];
+				}
+			} else {
+				for (nb_pkt = 0; nb_pkt < pkt_burst; nb_pkt++) {
+					pkts_burst[nb_pkt] =
+						rte_pktmbuf_alloc(mbuf_pool);
+					if (pkts_burst[nb_pkt] == NULL)
+						break;
+					pkts_burst[nb_pkt]->port = dev_id;
+					pkts_burst[nb_pkt]->data_len =
+						pkts_burst[nb_pkt]->buf_len -
+						RTE_PKTMBUF_HEADROOM;
+					pkts_burst[nb_pkt]->pkt_len =
+						pkts_burst[nb_pkt]->data_len;
+					ntb_buf[nb_pkt]->buf_addr =
+						pkts_burst[nb_pkt];
+				}
+			}
+			nb_tx = rte_rawdev_enqueue_buffers(fs.tx_port,
+				ntb_buf, nb_pkt, (void *)(size_t)fs.qp_id);
+			ntb_port_stats[0].tx += nb_tx;
+			if (unlikely(nb_tx < nb_pkt)) {
+				do {
+					rte_pktmbuf_free(
+						ntb_buf[nb_tx]->buf_addr);
+				} while (++nb_tx < nb_pkt);
+			}
+		}
+	}
+
+	for (i = 0; i < NTB_MAX_PKT_BURST; i++)
+		free(ntb_buf[i]);
+
+	return 0;
+}
+
+static int
+ntb_fwd_config_setup(void)
+{
+	uint16_t i;
+
+	/* Make sure iofwd has valid ethdev. */
+	if (fwd_mode == IOFWD && eth_port_id >= RTE_MAX_ETHPORTS) {
+		printf("No ethdev, cannot be in iofwd mode.\n");
+		return -EINVAL;
+	}
+
+	if (fwd_mode == IOFWD) {
+		fwd_streams = rte_zmalloc("ntb_fwd: fwd_streams",
+			sizeof(struct ntb_fwd_stream) * num_queues * 2,
+			RTE_CACHE_LINE_SIZE);
+		for (i = 0; i < num_queues; i++) {
+			fwd_streams[i * 2].qp_id = i;
+			fwd_streams[i * 2].tx_port = dev_id;
+			fwd_streams[i * 2].rx_port = eth_port_id;
+			fwd_streams[i * 2].tx_ntb = 1;
+
+			fwd_streams[i * 2 + 1].qp_id = i;
+			fwd_streams[i * 2 + 1].tx_port = eth_port_id;
+			fwd_streams[i * 2 + 1].rx_port = dev_id;
+			fwd_streams[i * 2 + 1].tx_ntb = 0;
+		}
+		return 0;
+	}
+
+	if (fwd_mode == RXONLY || fwd_mode == FILE_TRANS) {
+		/* Only support 1 queue in file-trans to keep data in order. */
+		if (fwd_mode == FILE_TRANS)
+			num_queues = 1;
+
+		fwd_streams = rte_zmalloc("ntb_fwd: fwd_streams",
+				sizeof(struct ntb_fwd_stream) * num_queues,
+				RTE_CACHE_LINE_SIZE);
+		for (i = 0; i < num_queues; i++) {
+			fwd_streams[i].qp_id = i;
+			fwd_streams[i].tx_port = RTE_MAX_ETHPORTS;
+			fwd_streams[i].rx_port = dev_id;
+			fwd_streams[i].tx_ntb = 0;
+		}
+		return 0;
+	}
+
+	if (fwd_mode == TXONLY) {
+		fwd_streams = rte_zmalloc("ntb_fwd: fwd_streams",
+				sizeof(struct ntb_fwd_stream) * num_queues,
+				RTE_CACHE_LINE_SIZE);
+		for (i = 0; i < num_queues; i++) {
+			fwd_streams[i].qp_id = i;
+			fwd_streams[i].tx_port = dev_id;
+			fwd_streams[i].rx_port = RTE_MAX_ETHPORTS;
+			fwd_streams[i].tx_ntb = 1;
+		}
+	}
+	return 0;
+}
 
 static void
-cmd_recvfile_parsed(void *parsed_result,
-		    __attribute__((unused)) struct cmdline *cl,
-		    __attribute__((unused)) void *data)
+assign_stream_to_lcores(void)
 {
-	struct cmd_sendfile_result *res = parsed_result;
-	struct rte_rawdev_buf *pkts_recv[1];
-	uint8_t *buff;
-	uint64_t val;
-	size_t size;
-	FILE *file;
+	struct ntb_fwd_lcore_conf *conf;
+	struct ntb_fwd_stream *fs;
+	uint16_t nb_streams, sm_per_lcore, sm_id, i;
+	unsigned int lcore_id, lcore_num, nb_extra;
 
-	if (!rte_rawdevs[dev_id].started) {
-		printf("Device needs to be up first. Try later.\n");
-		return;
+	lcore_num = rte_lcore_count();
+	/* Exclude master core */
+	lcore_num--;
+
+	nb_streams = (fwd_mode == IOFWD) ? num_queues * 2 : num_queues;
+
+	sm_per_lcore = nb_streams / lcore_num;
+	nb_extra = nb_streams % lcore_num;
+	sm_id = 0;
+	i = 0;
+
+	RTE_LCORE_FOREACH_SLAVE(lcore_id) {
+		conf = &fwd_lcore_conf[lcore_id];
+
+		if (i < nb_extra) {
+			conf->nb_stream = sm_per_lcore + 1;
+			conf->stream_id = sm_id;
+			sm_id = sm_id + sm_per_lcore + 1;
+		} else {
+			conf->nb_stream = sm_per_lcore;
+			conf->stream_id = sm_id;
+			sm_id = sm_id + sm_per_lcore;
+		}
+
+		i++;
+		if (sm_id >= nb_streams)
+			break;
+	}
+
+	/* Print packet forwarding config. */
+	RTE_LCORE_FOREACH_SLAVE(lcore_id) {
+		conf = &fwd_lcore_conf[lcore_id];
+
+		if (!conf->nb_stream)
+			continue;
+
+		printf("Streams on Lcore %u :\n", lcore_id);
+		for (i = 0; i < conf->nb_stream; i++) {
+			fs = &fwd_streams[conf->stream_id + i];
+			if (fwd_mode == IOFWD)
+				printf(" + Stream %u : %s%u RX -> %s%u TX,"
+					" Q=%u\n", conf->stream_id + i,
+					fs->tx_ntb ? "Eth" : "NTB", fs->rx_port,
+					fs->tx_ntb ? "NTB" : "Eth", fs->tx_port,
+					fs->qp_id);
+			if (fwd_mode == FILE_TRANS || fwd_mode == RXONLY)
+				printf(" + Stream %u : NTB%u RX only\n",
+					conf->stream_id + i, fs->rx_port);
+			if (fwd_mode == TXONLY)
+				printf(" + Stream %u : NTB%u TX only\n",
+					conf->stream_id + i, fs->tx_port);
+		}
 	}
+}
 
-	rte_rawdev_get_attr(dev_id, "link_status", &val);
-	if (!val) {
-		printf("Link is not up, cannot receive file.\n");
+static void
+start_pkt_fwd(void)
+{
+	struct ntb_fwd_lcore_conf *conf;
+	struct rte_eth_link eth_link;
+	unsigned int lcore_id;
+	int ret, i;
+
+	ret = ntb_fwd_config_setup();
+	if (ret < 0) {
+		printf("Cannot start traffic. Please reset fwd mode.\n");
 		return;
 	}
 
-	file = fopen(res->filepath, "w");
-	if (file == NULL) {
-		printf("Fail to open the file.\n");
+	/* If using iofwd, checking ethdev link status first. */
+	if (fwd_mode == IOFWD) {
+		printf("Checking eth link status...\n");
+		/* Wait for eth link up at most 100 times. */
+		for (i = 0; i < 100; i++) {
+			rte_eth_link_get(eth_port_id, &eth_link);
+			if (eth_link.link_status) {
+				printf("Eth%u Link Up. Speed %u Mbps - %s\n",
+					eth_port_id, eth_link.link_speed,
+					(eth_link.link_duplex ==
+					 ETH_LINK_FULL_DUPLEX) ?
+					("full-duplex") : ("half-duplex"));
+				break;
+			}
+		}
+		if (!eth_link.link_status) {
+			printf("Eth%u link down. Cannot start traffic.\n",
+				eth_port_id);
+			return;
+		}
+	}
+
+	assign_stream_to_lcores();
+	in_test = 1;
+
+	RTE_LCORE_FOREACH_SLAVE(lcore_id) {
+		conf = &fwd_lcore_conf[lcore_id];
+
+		if (!conf->nb_stream)
+			continue;
+
+		conf->stopped = 0;
+		if (fwd_mode == FILE_TRANS)
+			rte_eal_remote_launch(start_polling_recv_file,
+					      conf, lcore_id);
+		else if (fwd_mode == IOFWD)
+			rte_eal_remote_launch(start_iofwd_per_lcore,
+					      conf, lcore_id);
+		else if (fwd_mode == RXONLY)
+			rte_eal_remote_launch(start_rxonly_per_lcore,
+					      conf, lcore_id);
+		else if (fwd_mode == TXONLY)
+			rte_eal_remote_launch(start_txonly_per_lcore,
+					      conf, lcore_id);
+	}
+}
+
+/* *** START FWD PARAMETERS *** */
+struct cmd_start_result {
+	cmdline_fixed_string_t start;
+};
+
+static void
+cmd_start_parsed(__attribute__((unused)) void *parsed_result,
+		 __attribute__((unused)) struct cmdline *cl,
+		 __attribute__((unused)) void *data)
+{
+	start_pkt_fwd();
+}
+
+cmdline_parse_token_string_t cmd_start_start =
+	TOKEN_STRING_INITIALIZER(struct cmd_start_result, start, "start");
+
+cmdline_parse_inst_t cmd_start = {
+	.f = cmd_start_parsed,
+	.data = NULL,
+	.help_str = "start: Start packet forwarding",
+	.tokens = {
+		(void *)&cmd_start_start,
+		NULL,
+	},
+};
+
+/* *** STOP *** */
+struct cmd_stop_result {
+	cmdline_fixed_string_t stop;
+};
+
+static void
+cmd_stop_parsed(__attribute__((unused)) void *parsed_result,
+		__attribute__((unused)) struct cmdline *cl,
+		__attribute__((unused)) void *data)
+{
+	struct ntb_fwd_lcore_conf *conf;
+	unsigned int lcore_id;
+
+	RTE_LCORE_FOREACH_SLAVE(lcore_id) {
+		conf = &fwd_lcore_conf[lcore_id];
+
+		if (!conf->nb_stream)
+			continue;
+
+		if (conf->stopped)
+			continue;
+
+		conf->stopped = 1;
+	}
+	printf("\nWaiting for lcores to finish...\n");
+	rte_eal_mp_wait_lcore();
+	in_test = 0;
+	printf("\nDone.\n");
+}
+
+cmdline_parse_token_string_t cmd_stop_stop =
+	TOKEN_STRING_INITIALIZER(struct cmd_stop_result, stop, "stop");
+
+cmdline_parse_inst_t cmd_stop = {
+	.f = cmd_stop_parsed,
+	.data = NULL,
+	.help_str = "stop: Stop packet forwarding",
+	.tokens = {
+		(void *)&cmd_stop_stop,
+		NULL,
+	},
+};
+
+static void
+ntb_stats_clear(void)
+{
+	int nb_ids, i;
+	uint32_t *ids;
+
+	/* Clear NTB dev stats */
+	nb_ids = rte_rawdev_xstats_names_get(dev_id, NULL, 0);
+	if (nb_ids < 0) {
+		printf("Error: Cannot get count of xstats\n");
 		return;
 	}
+	ids = malloc(sizeof(uint32_t) * nb_ids);
+	for (i = 0; i < nb_ids; i++)
+		ids[i] = i;
+	rte_rawdev_xstats_reset(dev_id, ids, nb_ids);
+	free(ids);
+	printf("\n  statistics for NTB port %d cleared\n", dev_id);
+	/* Clear Ethdev stats if have any */
+	if (fwd_mode == IOFWD && eth_port_id != RTE_MAX_ETHPORTS) {
+		rte_eth_stats_reset(eth_port_id);
+		printf("\n  statistics for ETH port %d cleared\n", eth_port_id);
+	}
+}
+
+static inline void
+ntb_calculate_throughput(uint16_t port)
+{
+	uint64_t diff_pkts_rx, diff_pkts_tx, diff_cycles;
+	uint64_t pps_rx, pps_tx;
+	static uint64_t prev_pkts_rx[2];
+	static uint64_t prev_pkts_tx[2];
+	static uint64_t prev_cycles[2];
+
+	diff_cycles = prev_cycles[port];
+	prev_cycles[port] = rte_rdtsc();
+	if (diff_cycles > 0)
+		diff_cycles = prev_cycles[port] - diff_cycles;
+	diff_pkts_rx = (ntb_port_stats[port].rx > prev_pkts_rx[port]) ?
+		(ntb_port_stats[port].rx - prev_pkts_rx[port]) : 0;
+	diff_pkts_tx = (ntb_port_stats[port].tx > prev_pkts_tx[port]) ?
+		(ntb_port_stats[port].tx - prev_pkts_tx[port]) : 0;
+	prev_pkts_rx[port] = ntb_port_stats[port].rx;
+	prev_pkts_tx[port] = ntb_port_stats[port].tx;
+	pps_rx = diff_cycles > 0 ?
+		diff_pkts_rx * rte_get_tsc_hz() / diff_cycles : 0;
+	pps_tx = diff_cycles > 0 ?
+		diff_pkts_tx * rte_get_tsc_hz() / diff_cycles : 0;
+	printf("  Throughput (since last show)\n");
+	printf("  Rx-pps: %12"PRIu64"\n  Tx-pps: %12"PRIu64"\n",
+			pps_rx, pps_tx);
+}
+
+static void
+ntb_stats_display(void)
+{
+	struct rte_rawdev_xstats_name *xstats_names;
+	struct rte_eth_stats stats;
+	uint64_t *values;
+	uint32_t *ids;
+	int nb_ids, i;
 
-	rte_rawdev_get_attr(dev_id, "spad_user_0", &val);
-	size = val << 32;
-	rte_rawdev_get_attr(dev_id, "spad_user_1", &val);
-	size |= val;
+	printf("###### statistics for NTB port %d #######\n", dev_id);
 
-	buff = (uint8_t *)malloc(size);
-	pkts_recv[0] = (struct rte_rawdev_buf *)malloc
-			(sizeof(struct rte_rawdev_buf));
-	pkts_recv[0]->buf_addr = buff;
+	/* Get NTB dev stats and stats names */
+	nb_ids = rte_rawdev_xstats_names_get(dev_id, NULL, 0);
+	if (nb_ids < 0) {
+		printf("Error: Cannot get count of xstats\n");
+		return;
+	}
+	xstats_names = malloc(sizeof(struct rte_rawdev_xstats_name) * nb_ids);
+	if (xstats_names == NULL) {
+		printf("Cannot allocate memory for xstats lookup\n");
+		return;
+	}
+	if (nb_ids != rte_rawdev_xstats_names_get(
+			dev_id, xstats_names, nb_ids)) {
+		printf("Error: Cannot get xstats lookup\n");
+		free(xstats_names);
+		return;
+	}
+	ids = malloc(sizeof(uint32_t) * nb_ids);
+	for (i = 0; i < nb_ids; i++)
+		ids[i] = i;
+	values = malloc(sizeof(uint64_t) * nb_ids);
+	if (nb_ids != rte_rawdev_xstats_get(dev_id, ids, values, nb_ids)) {
+		printf("Error: Unable to get xstats\n");
+		free(xstats_names);
+		free(values);
+		free(ids);
+		return;
+	}
+
+	/* Display NTB dev stats */
+	for (i = 0; i < nb_ids; i++)
+		printf("  %s: %"PRIu64"\n", xstats_names[i].name, values[i]);
+	ntb_calculate_throughput(0);
 
-	if (rte_rawdev_dequeue_buffers(dev_id, pkts_recv, 1, (void *)size)) {
-		printf("Fail to dequeue.\n");
-		goto clean;
+	/* Get Ethdev stats if have any */
+	if (fwd_mode == IOFWD && eth_port_id != RTE_MAX_ETHPORTS) {
+		printf("###### statistics for ETH port %d ######\n",
+			eth_port_id);
+		rte_eth_stats_get(eth_port_id, &stats);
+		printf("  RX-packets: %"PRIu64"\n", stats.ipackets);
+		printf("  RX-bytes: %"PRIu64"\n", stats.ibytes);
+		printf("  RX-errors: %"PRIu64"\n", stats.ierrors);
+		printf("  RX-missed: %"PRIu64"\n", stats.imissed);
+		printf("  TX-packets: %"PRIu64"\n", stats.opackets);
+		printf("  TX-bytes: %"PRIu64"\n", stats.obytes);
+		printf("  TX-errors: %"PRIu64"\n", stats.oerrors);
+		ntb_calculate_throughput(1);
 	}
 
-	fwrite(buff, size, 1, file);
-	printf("Done receiving to file.\n");
+	free(xstats_names);
+	free(values);
+	free(ids);
+}
 
-clean:
-	fclose(file);
-	free(buff);
-	free(pkts_recv[0]);
+/* *** SHOW/CLEAR PORT STATS *** */
+struct cmd_stats_result {
+	cmdline_fixed_string_t show;
+	cmdline_fixed_string_t port;
+	cmdline_fixed_string_t stats;
+};
+
+static void
+cmd_stats_parsed(void *parsed_result,
+		 __attribute__((unused)) struct cmdline *cl,
+		 __attribute__((unused)) void *data)
+{
+	struct cmd_stats_result *res = parsed_result;
+	if (!strcmp(res->show, "clear"))
+		ntb_stats_clear();
+	else
+		ntb_stats_display();
 }
 
-cmdline_parse_token_string_t cmd_recv_file_recv =
-	TOKEN_STRING_INITIALIZER(struct cmd_recvfile_result, recv_string,
-				 "recv");
-cmdline_parse_token_string_t cmd_recv_file_filepath =
-	TOKEN_STRING_INITIALIZER(struct cmd_recvfile_result, filepath, NULL);
+cmdline_parse_token_string_t cmd_stats_show =
+	TOKEN_STRING_INITIALIZER(struct cmd_stats_result, show, "show#clear");
+cmdline_parse_token_string_t cmd_stats_port =
+	TOKEN_STRING_INITIALIZER(struct cmd_stats_result, port, "port");
+cmdline_parse_token_string_t cmd_stats_stats =
+	TOKEN_STRING_INITIALIZER(struct cmd_stats_result, stats, "stats");
 
 
-cmdline_parse_inst_t cmd_recv_file = {
-	.f = cmd_recvfile_parsed,
+cmdline_parse_inst_t cmd_stats = {
+	.f = cmd_stats_parsed,
 	.data = NULL,
-	.help_str = "recv <file_path>",
+	.help_str = "show|clear port stats",
 	.tokens = {
-		(void *)&cmd_recv_file_recv,
-		(void *)&cmd_recv_file_filepath,
+		(void *)&cmd_stats_show,
+		(void *)&cmd_stats_port,
+		(void *)&cmd_stats_stats,
+		NULL,
+	},
+};
+
+/* *** SET FORWARDING MODE *** */
+struct cmd_set_fwd_mode_result {
+	cmdline_fixed_string_t set;
+	cmdline_fixed_string_t fwd;
+	cmdline_fixed_string_t mode;
+};
+
+static void
+cmd_set_fwd_mode_parsed(void *parsed_result,
+			__attribute__((unused)) struct cmdline *cl,
+			__attribute__((unused)) void *data)
+{
+	struct cmd_set_fwd_mode_result *res = parsed_result;
+	int i;
+
+	if (in_test) {
+		printf("Please stop traffic first.\n");
+		return;
+	}
+
+	for (i = 0; i < MAX_FWD_MODE; i++) {
+		if (!strcmp(res->mode, fwd_mode_s[i])) {
+			fwd_mode = i;
+			return;
+		}
+	}
+	printf("Invalid %s packet forwarding mode.\n", res->mode);
+}
+
+cmdline_parse_token_string_t cmd_setfwd_set =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_fwd_mode_result, set, "set");
+cmdline_parse_token_string_t cmd_setfwd_fwd =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_fwd_mode_result, fwd, "fwd");
+cmdline_parse_token_string_t cmd_setfwd_mode =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_fwd_mode_result, mode,
+				"file-trans#iofwd#txonly#rxonly");
+
+cmdline_parse_inst_t cmd_set_fwd_mode = {
+	.f = cmd_set_fwd_mode_parsed,
+	.data = NULL,
+	.help_str = "set forwarding mode as file-trans|rxonly|txonly|iofwd",
+	.tokens = {
+		(void *)&cmd_setfwd_set,
+		(void *)&cmd_setfwd_fwd,
+		(void *)&cmd_setfwd_mode,
 		NULL,
 	},
 };
@@ -276,7 +1008,10 @@ cmdline_parse_inst_t cmd_recv_file = {
 cmdline_parse_ctx_t main_ctx[] = {
 	(cmdline_parse_inst_t *)&cmd_help,
 	(cmdline_parse_inst_t *)&cmd_send_file,
-	(cmdline_parse_inst_t *)&cmd_recv_file,
+	(cmdline_parse_inst_t *)&cmd_start,
+	(cmdline_parse_inst_t *)&cmd_stop,
+	(cmdline_parse_inst_t *)&cmd_stats,
+	(cmdline_parse_inst_t *)&cmd_set_fwd_mode,
 	(cmdline_parse_inst_t *)&cmd_quit,
 	NULL,
 };
@@ -305,45 +1040,257 @@ signal_handler(int signum)
 	}
 }
 
+#define OPT_BUF_SIZE         "buf-size"
+#define OPT_FWD_MODE         "fwd-mode"
+#define OPT_NB_DESC          "nb-desc"
+#define OPT_TXFREET          "txfreet"
+#define OPT_BURST            "burst"
+#define OPT_QP               "qp"
+
+enum {
+	/* long options mapped to a short option */
+	OPT_NO_ZERO_COPY_NUM = 1,
+	OPT_BUF_SIZE_NUM,
+	OPT_FWD_MODE_NUM,
+	OPT_NB_DESC_NUM,
+	OPT_TXFREET_NUM,
+	OPT_BURST_NUM,
+	OPT_QP_NUM,
+};
+
+static const char short_options[] =
+	"i" /* interactive mode */
+	;
+
+static const struct option lgopts[] = {
+	{OPT_BUF_SIZE,     1, NULL, OPT_BUF_SIZE_NUM     },
+	{OPT_FWD_MODE,     1, NULL, OPT_FWD_MODE_NUM     },
+	{OPT_NB_DESC,      1, NULL, OPT_NB_DESC_NUM      },
+	{OPT_TXFREET,      1, NULL, OPT_TXFREET_NUM      },
+	{OPT_BURST,        1, NULL, OPT_BURST_NUM        },
+	{OPT_QP,           1, NULL, OPT_QP_NUM           },
+	{0,                0, NULL, 0                    }
+};
+
 static void
 ntb_usage(const char *prgname)
 {
 	printf("%s [EAL options] -- [options]\n"
-	       "-i : run in interactive mode (default value is 1)\n",
-	       prgname);
+	       "-i: run in interactive mode.\n"
+	       "--qp=N: set number of queues as N (N > 0, default: 1).\n"
+	       "--fwd-mode=N: set fwd mode (N: file-trans | rxonly | "
+	       "txonly | iofwd, default: file-trans)\n"
+	       "--buf-size=N: set mbuf dataroom size as N (0 < N <= 65535,"
+	       " default: 2048).\n"
+	       "--nb-desc=N: set number of descriptors as N (%u <= N <= %u,"
+	       " default: 1024).\n"
+	       "--txfreet=N: set tx free thresh for NTB driver as N. (N >= 0)\n"
+	       "--burst=N: set pkt burst as N (0 < N <= %u, default: 32).\n",
+	       prgname, NTB_MIN_DESC_SIZE, NTB_MAX_DESC_SIZE,
+	       NTB_MAX_PKT_BURST);
 }
 
-static int
-parse_args(int argc, char **argv)
+static void
+ntb_parse_args(int argc, char **argv)
 {
 	char *prgname = argv[0], **argvopt = argv;
-	int opt, ret;
+	int opt, opt_idx, n, i;
 
-	/* Only support interactive mode to send/recv file first. */
-	while ((opt = getopt(argc, argvopt, "i")) != EOF) {
+	while ((opt = getopt_long(argc, argvopt, short_options,
+				lgopts, &opt_idx)) != EOF) {
 		switch (opt) {
 		case 'i':
-			printf("Interactive-mode selected\n");
+			printf("Interactive-mode selected.\n");
 			interactive = 1;
 			break;
+		case OPT_QP_NUM:
+			n = atoi(optarg);
+			if (n > 0)
+				num_queues = n;
+			else
+				rte_exit(EXIT_FAILURE, "qp must be > 0.\n");
+			break;
+		case OPT_BUF_SIZE_NUM:
+			n = atoi(optarg);
+			if (n > RTE_PKTMBUF_HEADROOM && n <= 0xFFFF)
+				ntb_buf_size = n;
+			else
+				rte_exit(EXIT_FAILURE, "buf-size must be > "
+					"%u and < 65536.\n",
+					RTE_PKTMBUF_HEADROOM);
+			break;
+		case OPT_FWD_MODE_NUM:
+			for (i = 0; i < MAX_FWD_MODE; i++) {
+				if (!strcmp(optarg, fwd_mode_s[i])) {
+					fwd_mode = i;
+					break;
+				}
+			}
+			if (i == MAX_FWD_MODE)
+				rte_exit(EXIT_FAILURE, "Unsupported mode. "
+				"(Should be: file-trans | rxonly | txonly "
+				"| iofwd)\n");
+			break;
+		case OPT_NB_DESC_NUM:
+			n = atoi(optarg);
+			if (n >= NTB_MIN_DESC_SIZE && n <= NTB_MAX_DESC_SIZE)
+				nb_desc = n;
+			else
+				rte_exit(EXIT_FAILURE, "nb-desc must be within"
+					" [%u, %u].\n", NTB_MIN_DESC_SIZE,
+					NTB_MAX_DESC_SIZE);
+			break;
+		case OPT_TXFREET_NUM:
+			n = atoi(optarg);
+			if (n >= 0)
+				tx_free_thresh = n;
+			else
+				rte_exit(EXIT_FAILURE, "txfreet must be"
+					" >= 0\n");
+			break;
+		case OPT_BURST_NUM:
+			n = atoi(optarg);
+			if (n > 0 && n <= NTB_MAX_PKT_BURST)
+				pkt_burst = n;
+			else
+				rte_exit(EXIT_FAILURE, "burst must be within "
+					"(0, %u].\n", NTB_MAX_PKT_BURST);
+			break;
 
 		default:
 			ntb_usage(prgname);
-			return -1;
+			rte_exit(EXIT_FAILURE,
+				 "Command line is incomplete or incorrect.\n");
+			break;
 		}
 	}
+}
 
-	if (optind >= 0)
-		argv[optind-1] = prgname;
+static void
+ntb_mempool_mz_free(__rte_unused struct rte_mempool_memhdr *memhdr,
+		void *opaque)
+{
+	const struct rte_memzone *mz = opaque;
+	rte_memzone_free(mz);
+}
 
-	ret = optind-1;
-	optind = 1; /* reset getopt lib */
-	return ret;
+static struct rte_mempool *
+ntb_mbuf_pool_create(uint16_t mbuf_seg_size, uint32_t nb_mbuf,
+		     struct ntb_dev_info ntb_info,
+		     struct ntb_dev_config *ntb_conf,
+		     unsigned int socket_id)
+{
+	size_t mz_len, total_elt_sz, max_mz_len, left_sz;
+	struct rte_pktmbuf_pool_private mbp_priv;
+	char pool_name[RTE_MEMPOOL_NAMESIZE];
+	char mz_name[RTE_MEMZONE_NAMESIZE];
+	const struct rte_memzone *mz;
+	struct rte_mempool *mp;
+	uint64_t align;
+	uint32_t mz_id;
+	int ret;
+
+	snprintf(pool_name, sizeof(pool_name), "ntb_mbuf_pool_%u", socket_id);
+	mp = rte_mempool_create_empty(pool_name, nb_mbuf,
+				      (mbuf_seg_size + sizeof(struct rte_mbuf)),
+				      MEMPOOL_CACHE_SIZE,
+				      sizeof(struct rte_pktmbuf_pool_private),
+				      socket_id, 0);
+	if (mp == NULL)
+		return NULL;
+
+	mbp_priv.mbuf_data_room_size = mbuf_seg_size;
+	mbp_priv.mbuf_priv_size = 0;
+	rte_pktmbuf_pool_init(mp, &mbp_priv);
+
+	ntb_conf->mz_list = rte_zmalloc("ntb_memzone_list",
+				sizeof(struct rte_memzone *) *
+				ntb_info.mw_cnt, 0);
+	if (ntb_conf->mz_list == NULL)
+		goto fail;
+
+	/* Put ntb header on mw0. */
+	if (ntb_info.mw_size[0] < ntb_info.ntb_hdr_size) {
+		printf("mw0 (size: %" PRIu64 ") is not enough for ntb hdr"
+		       " (size: %u)\n", ntb_info.mw_size[0],
+		       ntb_info.ntb_hdr_size);
+		goto fail;
+	}
+
+	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
+	left_sz = total_elt_sz * nb_mbuf;
+	for (mz_id = 0; mz_id < ntb_info.mw_cnt; mz_id++) {
+		/* If populated mbuf is enough, no need to reserve extra mz. */
+		if (!left_sz)
+			break;
+		snprintf(mz_name, sizeof(mz_name), "ntb_mw_%d", mz_id);
+		align = ntb_info.mw_size_align ? ntb_info.mw_size[mz_id] :
+			RTE_CACHE_LINE_SIZE;
+		/* Reserve ntb header space on memzone 0. */
+		max_mz_len = mz_id ? ntb_info.mw_size[mz_id] :
+			     ntb_info.mw_size[mz_id] - ntb_info.ntb_hdr_size;
+		mz_len = left_sz <= max_mz_len ? left_sz :
+			(max_mz_len / total_elt_sz * total_elt_sz);
+		if (!mz_len)
+			continue;
+		mz = rte_memzone_reserve_aligned(mz_name, mz_len, socket_id,
+					RTE_MEMZONE_IOVA_CONTIG, align);
+		if (mz == NULL) {
+			printf("Cannot allocate %" PRIu64 " aligned memzone"
+				" %u\n", align, mz_id);
+			goto fail;
+		}
+		left_sz -= mz_len;
+
+		/* Reserve ntb header space on memzone 0. */
+		if (mz_id)
+			ret = rte_mempool_populate_iova(mp, mz->addr, mz->iova,
+					mz->len, ntb_mempool_mz_free,
+					(void *)(uintptr_t)mz);
+		else
+			ret = rte_mempool_populate_iova(mp,
+					(void *)((uint64_t)mz->addr +
+					ntb_info.ntb_hdr_size),
+					mz->iova + ntb_info.ntb_hdr_size,
+					mz->len - ntb_info.ntb_hdr_size,
+					ntb_mempool_mz_free,
+					(void *)(uintptr_t)mz);
+		if (ret < 0) {
+			rte_memzone_free(mz);
+			rte_mempool_free(mp);
+			return NULL;
+		}
+
+		ntb_conf->mz_list[mz_id] = mz;
+	}
+	if (left_sz) {
+		printf("mw space is not enough for mempool.\n");
+		goto fail;
+	}
+
+	ntb_conf->mz_num = mz_id;
+	rte_mempool_obj_iter(mp, rte_pktmbuf_init, NULL);
+
+	return mp;
+fail:
+	rte_mempool_free(mp);
+	return NULL;
 }
 
 int
 main(int argc, char **argv)
 {
+	struct rte_eth_conf eth_pconf = eth_port_conf;
+	struct rte_rawdev_info ntb_rawdev_conf;
+	struct rte_rawdev_info ntb_rawdev_info;
+	struct rte_eth_dev_info ethdev_info;
+	struct rte_eth_rxconf eth_rx_conf;
+	struct rte_eth_txconf eth_tx_conf;
+	struct ntb_queue_conf ntb_q_conf;
+	struct ntb_dev_config ntb_conf;
+	struct ntb_dev_info ntb_info;
+	uint64_t ntb_link_status;
+	uint32_t nb_mbuf;
 	int ret, i;
 
 	signal(SIGINT, signal_handler);
@@ -353,6 +1300,9 @@ main(int argc, char **argv)
 	if (ret < 0)
 		rte_exit(EXIT_FAILURE, "Error with EAL initialization.\n");
 
+	if (rte_lcore_count() < 2)
+		rte_exit(EXIT_FAILURE, "Need at least 2 cores\n");
+
 	/* Find 1st ntb rawdev. */
 	for (i = 0; i < RTE_RAWDEV_MAX_DEVS; i++)
 		if (rte_rawdevs[i].driver_name &&
@@ -368,15 +1318,118 @@ main(int argc, char **argv)
 	argc -= ret;
 	argv += ret;
 
-	ret = parse_args(argc, argv);
+	ntb_parse_args(argc, argv);
+
+	rte_rawdev_set_attr(dev_id, NTB_QUEUE_SZ_NAME, nb_desc);
+	printf("Set queue size as %u.\n", nb_desc);
+	rte_rawdev_set_attr(dev_id, NTB_QUEUE_NUM_NAME, num_queues);
+	printf("Set queue number as %u.\n", num_queues);
+	ntb_rawdev_info.dev_private = (rte_rawdev_obj_t)(&ntb_info);
+	rte_rawdev_info_get(dev_id, &ntb_rawdev_info);
+
+	nb_mbuf = nb_desc * num_queues * 2 * 2 + rte_lcore_count() *
+		  MEMPOOL_CACHE_SIZE;
+	mbuf_pool = ntb_mbuf_pool_create(ntb_buf_size, nb_mbuf, ntb_info,
+					 &ntb_conf, rte_socket_id());
+	if (mbuf_pool == NULL)
+		rte_exit(EXIT_FAILURE, "Cannot create mbuf pool.\n");
+
+	ntb_conf.num_queues = num_queues;
+	ntb_conf.queue_size = nb_desc;
+	ntb_rawdev_conf.dev_private = (rte_rawdev_obj_t)(&ntb_conf);
+	ret = rte_rawdev_configure(dev_id, &ntb_rawdev_conf);
+	if (ret)
+		rte_exit(EXIT_FAILURE, "Can't config ntb dev: err=%d, "
+			"port=%u\n", ret, dev_id);
+
+	ntb_q_conf.tx_free_thresh = tx_free_thresh;
+	ntb_q_conf.nb_desc = nb_desc;
+	ntb_q_conf.rx_mp = mbuf_pool;
+	for (i = 0; i < num_queues; i++) {
+		/* Setup rawdev queue */
+		ret = rte_rawdev_queue_setup(dev_id, i, &ntb_q_conf);
+		if (ret < 0)
+			rte_exit(EXIT_FAILURE,
+				"Failed to setup ntb queue %u.\n", i);
+	}
+
+	/* Wait at most 100s for the peer device to come up. */
+	printf("Checking ntb link status...\n");
+	for (i = 0; i < 1000; i++) {
+		rte_rawdev_get_attr(dev_id, NTB_LINK_STATUS_NAME,
+				    &ntb_link_status);
+		if (ntb_link_status) {
+			printf("Peer dev ready, ntb link up.\n");
+			break;
+		}
+		rte_delay_ms(100);
+	}
+	rte_rawdev_get_attr(dev_id, NTB_LINK_STATUS_NAME, &ntb_link_status);
+	if (ntb_link_status == 0)
+		printf("Link is not up after 100s. Please restart app.\n");
+
+	ret = rte_rawdev_start(dev_id);
 	if (ret < 0)
-		rte_exit(EXIT_FAILURE, "Invalid arguments\n");
+		rte_exit(EXIT_FAILURE, "rte_rawdev_start: err=%d, port=%u\n",
+			ret, dev_id);
+
+	/* Find 1st ethdev */
+	eth_port_id = rte_eth_find_next(0);
 
-	rte_rawdev_start(dev_id);
+	if (eth_port_id < RTE_MAX_ETHPORTS) {
+		rte_eth_dev_info_get(eth_port_id, &ethdev_info);
+		eth_pconf.rx_adv_conf.rss_conf.rss_hf &=
+				ethdev_info.flow_type_rss_offloads;
+		ret = rte_eth_dev_configure(eth_port_id, num_queues,
+					    num_queues, &eth_pconf);
+		if (ret)
+			rte_exit(EXIT_FAILURE, "Can't config ethdev: err=%d, "
+				"port=%u\n", ret, eth_port_id);
+		eth_rx_conf = ethdev_info.default_rxconf;
+		eth_rx_conf.offloads = eth_pconf.rxmode.offloads;
+		eth_tx_conf = ethdev_info.default_txconf;
+		eth_tx_conf.offloads = eth_pconf.txmode.offloads;
+
+		/* Setup ethdev queue if ethdev exists */
+		for (i = 0; i < num_queues; i++) {
+			ret = rte_eth_rx_queue_setup(eth_port_id, i, nb_desc,
+					rte_eth_dev_socket_id(eth_port_id),
+					&eth_rx_conf, mbuf_pool);
+			if (ret < 0)
+				rte_exit(EXIT_FAILURE,
+					"Failed to setup eth rxq %u.\n", i);
+			ret = rte_eth_tx_queue_setup(eth_port_id, i, nb_desc,
+					rte_eth_dev_socket_id(eth_port_id),
+					&eth_tx_conf);
+			if (ret < 0)
+				rte_exit(EXIT_FAILURE,
+					"Failed to setup eth txq %u.\n", i);
+		}
+
+		ret = rte_eth_dev_start(eth_port_id);
+		if (ret < 0)
+			rte_exit(EXIT_FAILURE, "rte_eth_dev_start: err=%d, "
+				"port=%u\n", ret, eth_port_id);
+	}
+
+	/* initialize port stats */
+	memset(&ntb_port_stats, 0, sizeof(ntb_port_stats));
+
+	/* Set default fwd mode if user doesn't set it. */
+	if (fwd_mode == MAX_FWD_MODE && eth_port_id < RTE_MAX_ETHPORTS) {
+		printf("Set default fwd mode as iofwd.\n");
+		fwd_mode = IOFWD;
+	}
+	if (fwd_mode == MAX_FWD_MODE) {
+		printf("Set default fwd mode as file-trans.\n");
+		fwd_mode = FILE_TRANS;
+	}
 
 	if (interactive) {
 		sleep(1);
 		prompt();
+	} else {
+		start_pkt_fwd();
 	}
 
 	return 0;
-- 
2.17.1
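
The per-window sizing rule in ntb_mbuf_pool_create() above can be read in
isolation as a small pure function. The following is a minimal sketch, not
code from the patch: the helper name mz_len_for_window() is hypothetical, and
it assumes total_elt_sz is nonzero. It reserves exactly what is still needed
when the window is large enough, and otherwise rounds the window down to a
whole number of mempool elements so that no element straddles two memory
windows.

```c
#include <stddef.h>

/*
 * Hypothetical helper mirroring the mz_len computation in the patch.
 * left_sz:      bytes of mempool elements still to be placed
 * max_mz_len:   usable bytes in the current memory window
 * total_elt_sz: header + element + trailer size of one mempool object
 *               (assumed to be nonzero)
 */
static size_t
mz_len_for_window(size_t left_sz, size_t max_mz_len, size_t total_elt_sz)
{
	/* The remaining elements fit: reserve only what is needed. */
	if (left_sz <= max_mz_len)
		return left_sz;

	/* Otherwise fill the window with as many whole elements as fit. */
	return max_mz_len / total_elt_sz * total_elt_sz;
}
```

With 100-byte elements and a 4096-byte window, a request for 10000 bytes
yields a 4000-byte memzone. A window smaller than one element yields 0, which
the loop in the patch skips with "continue".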


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [dpdk-dev] [PATCH 0/4] enable FIFO for NTB
  2019-09-05  5:39 [dpdk-dev] [PATCH 0/4] enable FIFO for NTB Xiaoyun Li
                   ` (3 preceding siblings ...)
  2019-09-05  5:39 ` [dpdk-dev] [PATCH 4/4] examples/ntb: support more functions for NTB Xiaoyun Li
@ 2019-09-05 18:34 ` " Maslekar, Omkar
  2019-09-06  3:02 ` [dpdk-dev] [PATCH v2 " Xiaoyun Li
  5 siblings, 0 replies; 42+ messages in thread
From: Maslekar, Omkar @ 2019-09-05 18:34 UTC (permalink / raw)
  To: Li, Xiaoyun, Wu, Jingjing, Wiles, Keith, Liang, Cunming; +Cc: dev

Acked-by: Omkar Maslekar <omkar.maslekar@intel.com>

-----Original Message-----
From: Li, Xiaoyun 
Sent: Wednesday, September 4, 2019 10:39 PM
To: Wu, Jingjing <jingjing.wu@intel.com>; Wiles, Keith <keith.wiles@intel.com>; Maslekar, Omkar <omkar.maslekar@intel.com>; Liang, Cunming <cunming.liang@intel.com>
Cc: dev@dpdk.org; Li, Xiaoyun <xiaoyun.li@intel.com>
Subject: [PATCH 0/4] enable FIFO for NTB

Enable FIFO for NTB rawdev driver to support packet based processing. And an example is provided to support txonly, rxonly, iofwd between NTB device and ethdev, and file transmission.

Xiaoyun Li (4):
  raw/ntb: setup ntb queue
  raw/ntb: add xstats support
  raw/ntb: add enqueue and dequeue functions
  examples/ntb: support more functions for NTB

 doc/guides/rawdevs/ntb.rst             |   67 +-
 doc/guides/rel_notes/release_19_11.rst |    4 +
 doc/guides/sample_app_ug/ntb.rst       |   59 +-
 drivers/raw/ntb/Makefile               |    3 +
 drivers/raw/ntb/meson.build            |    1 +
 drivers/raw/ntb/ntb.c                  | 1078 +++++++++++++++-----
 drivers/raw/ntb/ntb.h                  |  162 ++-
 drivers/raw/ntb/ntb_hw_intel.c         |   48 +-
 drivers/raw/ntb/rte_pmd_ntb.h          |   43 +
 examples/ntb/meson.build               |    3 +
 examples/ntb/ntb_fwd.c                 | 1297 +++++++++++++++++++++---
 11 files changed, 2349 insertions(+), 416 deletions(-)
 create mode 100644 drivers/raw/ntb/rte_pmd_ntb.h

--
2.17.1


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [dpdk-dev] [PATCH v2 0/4] enable FIFO for NTB
  2019-09-05  5:39 [dpdk-dev] [PATCH 0/4] enable FIFO for NTB Xiaoyun Li
                   ` (4 preceding siblings ...)
  2019-09-05 18:34 ` [dpdk-dev] [PATCH 0/4] enable FIFO " Maslekar, Omkar
@ 2019-09-06  3:02 ` " Xiaoyun Li
  2019-09-06  3:02   ` [dpdk-dev] [PATCH v2 1/4] raw/ntb: setup ntb queue Xiaoyun Li
                     ` (4 more replies)
  5 siblings, 5 replies; 42+ messages in thread
From: Xiaoyun Li @ 2019-09-06  3:02 UTC (permalink / raw)
  To: jingjing.wu, keith.wiles, omkar.maslekar, cunming.liang; +Cc: dev, Xiaoyun Li

Enable FIFO for the NTB rawdev driver to support packet-based
processing. An example is also provided that supports txonly,
rxonly, and iofwd between an NTB device and an ethdev, as well as
file transmission.

Xiaoyun Li (4):
  raw/ntb: setup ntb queue
  raw/ntb: add xstats support
  raw/ntb: add enqueue and dequeue functions
  examples/ntb: support more functions for NTB

Acked-by: Omkar Maslekar <omkar.maslekar@intel.com>

---
v2:
 * Fixed compile issues on 32-bit machines and a missing include file.
 * Fixed a typo.
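
The notes above do not say which statements broke the 32-bit build. A common
cause in code of this kind, assumed here purely for illustration, is casting a
pointer directly to uint64_t, which triggers a pointer-to-integer size warning
where pointers are 32 bits wide. A minimal sketch of the portable pattern
follows; ptr_add(), addr_hi() and addr_lo() are hypothetical helper names and
are not part of the patch.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Advance a pointer by a byte offset. uintptr_t always matches the
 * pointer width, so this compiles cleanly on 32-bit and 64-bit targets.
 */
static inline void *
ptr_add(void *base, size_t off)
{
	return (void *)((uintptr_t)base + off);
}

/*
 * Split an address into 32-bit halves, e.g. for 32-bit registers.
 * Converting through uintptr_t first avoids the size-mismatch warning;
 * on a 32-bit target the high half is simply 0.
 */
static inline uint32_t
addr_hi(const void *p)
{
	return (uint32_t)((uint64_t)(uintptr_t)p >> 32);
}

static inline uint32_t
addr_lo(const void *p)
{
	return (uint32_t)(uintptr_t)p;
}
```

DPDK's rte_common.h also provides RTE_PTR_ADD() for the first case.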

 doc/guides/rawdevs/ntb.rst             |   67 +-
 doc/guides/rel_notes/release_19_11.rst |    4 +
 doc/guides/sample_app_ug/ntb.rst       |   59 +-
 drivers/raw/ntb/Makefile               |    3 +
 drivers/raw/ntb/meson.build            |    1 +
 drivers/raw/ntb/ntb.c                  | 1078 +++++++++++++++-----
 drivers/raw/ntb/ntb.h                  |  162 ++-
 drivers/raw/ntb/ntb_hw_intel.c         |   48 +-
 drivers/raw/ntb/rte_pmd_ntb.h          |   43 +
 examples/ntb/meson.build               |    3 +
 examples/ntb/ntb_fwd.c                 | 1298 +++++++++++++++++++++---
 11 files changed, 2350 insertions(+), 416 deletions(-)
 create mode 100644 drivers/raw/ntb/rte_pmd_ntb.h

-- 
2.17.1


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [dpdk-dev] [PATCH v2 1/4] raw/ntb: setup ntb queue
  2019-09-06  3:02 ` [dpdk-dev] [PATCH v2 " Xiaoyun Li
@ 2019-09-06  3:02   ` Xiaoyun Li
  2019-09-06  3:02   ` [dpdk-dev] [PATCH v2 2/4] raw/ntb: add xstats support Xiaoyun Li
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 42+ messages in thread
From: Xiaoyun Li @ 2019-09-06  3:02 UTC (permalink / raw)
  To: jingjing.wu, keith.wiles, omkar.maslekar, cunming.liang; +Cc: dev, Xiaoyun Li

Set up and initialize the ntb txq and rxq, and negotiate queue
information with the peer. If the queue size and the number of queues
are not consistent on both sides, return an error.
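
The consistency check described above can be sketched as follows. This is a
minimal illustration, not the driver code: struct peer_params is a
hypothetical stand-in for the values the peer publishes through the
scratchpad registers, and split_u64(), join_u64() and params_match() are
hypothetical helpers. It assumes 32-bit scratchpads, so a 64-bit value such
as a memory window size travels as two halves.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of what the peer wrote to its scratchpads. */
struct peer_params {
	uint32_t queue_size;
	uint32_t num_queues;
	uint32_t mw_size_hi;	/* upper 32 bits of the memory window size */
	uint32_t mw_size_lo;	/* lower 32 bits of the memory window size */
};

/* Split a 64-bit value into two 32-bit scratchpad words. */
static inline void
split_u64(uint64_t v, uint32_t *hi, uint32_t *lo)
{
	*hi = (uint32_t)(v >> 32);
	*lo = (uint32_t)v;
}

/* Reassemble a 64-bit value from its two scratchpad words. */
static inline uint64_t
join_u64(uint32_t hi, uint32_t lo)
{
	return ((uint64_t)hi << 32) | lo;
}

/* Both sides must agree on every parameter, otherwise setup fails. */
static bool
params_match(const struct peer_params *peer, uint32_t queue_size,
	     uint32_t num_queues, uint64_t mw_size)
{
	return peer->queue_size == queue_size &&
	       peer->num_queues == num_queues &&
	       join_u64(peer->mw_size_hi, peer->mw_size_lo) == mw_size;
}
```

A caller would treat a false result from params_match() as a configuration
error rather than silently adopting the peer's values.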

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
---
 doc/guides/rawdevs/ntb.rst             |  39 +-
 doc/guides/rel_notes/release_19_11.rst |   4 +
 drivers/raw/ntb/Makefile               |   3 +
 drivers/raw/ntb/meson.build            |   1 +
 drivers/raw/ntb/ntb.c                  | 705 ++++++++++++++++++-------
 drivers/raw/ntb/ntb.h                  | 151 ++++--
 drivers/raw/ntb/ntb_hw_intel.c         |  26 +-
 drivers/raw/ntb/rte_pmd_ntb.h          |  43 ++
 8 files changed, 718 insertions(+), 254 deletions(-)
 create mode 100644 drivers/raw/ntb/rte_pmd_ntb.h

diff --git a/doc/guides/rawdevs/ntb.rst b/doc/guides/rawdevs/ntb.rst
index 0a61ec03d..99e7db441 100644
--- a/doc/guides/rawdevs/ntb.rst
+++ b/doc/guides/rawdevs/ntb.rst
@@ -45,8 +45,45 @@ to use, i.e. igb_uio, vfio. The ``dpdk-devbind.py`` script can be used to
 show devices status and to bind them to a suitable kernel driver. They will
 appear under the category of "Misc (rawdev) devices".
 
+Ring Layout
+-----------
+
+Since reads and writes to the remote system's memory go through the PCI
+bus, a remote read is much more expensive than a remote write. Thus, the
+enqueue and dequeue paths on the ntb ring should avoid remote reads. The
+ring layout for ntb is as follows:
+- Ring Format:
+  desc_ring:
+      0               16                                              64
+      +---------------------------------------------------------------+
+      |                        buffer address                         |
+      +---------------+-----------------------------------------------+
+      | buffer length |                      resv                     |
+      +---------------+-----------------------------------------------+
+  used_ring:
+      0               16              32
+      +---------------+---------------+
+      | packet length |     flags     |
+      +---------------+---------------+
+- Ring Layout:
+      +------------------------+   +------------------------+
+      | used_ring              |   | desc_ring              |
+      | +---+                  |   | +---+                  |
+      | |   |                  |   | |   |                  |
+      | +---+      +--------+  |   | +---+                  |
+      | |   | ---> | buffer | <+---+-|   |                  |
+      | +---+      +--------+  |   | +---+                  |
+      | |   |                  |   | |   |                  |
+      | +---+                  |   | +---+                  |
+      |  ...                   |   |  ...                   |
+      |                        |   |                        |
+      |            +---------+ |   |            +---------+ |
+      |            | tx_tail | |   |            | rx_tail | |
+      | System A   +---------+ |   | System B   +---------+ |
+      +------------------------+   +------------------------+
+                    <---------traffic---------
+
 Limitation
 ----------
 
-- The FIFO hasn't been introduced and will come in 19.11 release.
 - This PMD only supports Intel Skylake platform.
diff --git a/doc/guides/rel_notes/release_19_11.rst b/doc/guides/rel_notes/release_19_11.rst
index 8490d897c..7ac3d5ca6 100644
--- a/doc/guides/rel_notes/release_19_11.rst
+++ b/doc/guides/rel_notes/release_19_11.rst
@@ -56,6 +56,10 @@ New Features
      Also, make sure to start the actual text at the margin.
      =========================================================
 
+   * **Introduced FIFO for NTB PMD.**
+
+     Introduced FIFO for NTB (Non-transparent Bridge) PMD to support
+     packet based processing.
 
 Removed Items
 -------------
diff --git a/drivers/raw/ntb/Makefile b/drivers/raw/ntb/Makefile
index 6fe2aaf40..814cd05ca 100644
--- a/drivers/raw/ntb/Makefile
+++ b/drivers/raw/ntb/Makefile
@@ -25,4 +25,7 @@ LIBABIVER := 1
 SRCS-$(CONFIG_RTE_LIBRTE_PMD_NTB_RAWDEV) += ntb.c
 SRCS-$(CONFIG_RTE_LIBRTE_PMD_NTB_RAWDEV) += ntb_hw_intel.c
 
+# install this header file
+SYMLINK-$(CONFIG_RTE_LIBRTE_PMD_NTB_RAWDEV)-include := rte_pmd_ntb.h
+
 include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/raw/ntb/meson.build b/drivers/raw/ntb/meson.build
index 7f39437f8..7a7d26126 100644
--- a/drivers/raw/ntb/meson.build
+++ b/drivers/raw/ntb/meson.build
@@ -5,4 +5,5 @@ deps += ['rawdev', 'mbuf', 'mempool',
 	 'pci', 'bus_pci']
 sources = files('ntb.c',
                 'ntb_hw_intel.c')
+install_headers('rte_pmd_ntb.h')
 allow_experimental_apis = true
diff --git a/drivers/raw/ntb/ntb.c b/drivers/raw/ntb/ntb.c
index bfecce1e4..02784a134 100644
--- a/drivers/raw/ntb/ntb.c
+++ b/drivers/raw/ntb/ntb.c
@@ -12,6 +12,7 @@
 #include <rte_eal.h>
 #include <rte_log.h>
 #include <rte_pci.h>
+#include <rte_mbuf.h>
 #include <rte_bus_pci.h>
 #include <rte_memzone.h>
 #include <rte_memcpy.h>
@@ -19,6 +20,7 @@
 #include <rte_rawdev_pmd.h>
 
 #include "ntb_hw_intel.h"
+#include "rte_pmd_ntb.h"
 #include "ntb.h"
 
 int ntb_logtype;
@@ -28,48 +30,7 @@ static const struct rte_pci_id pci_id_ntb_map[] = {
 	{ .vendor_id = 0, /* sentinel */ },
 };
 
-static int
-ntb_set_mw(struct rte_rawdev *dev, int mw_idx, uint64_t mw_size)
-{
-	struct ntb_hw *hw = dev->dev_private;
-	char mw_name[RTE_MEMZONE_NAMESIZE];
-	const struct rte_memzone *mz;
-	int ret = 0;
-
-	if (hw->ntb_ops->mw_set_trans == NULL) {
-		NTB_LOG(ERR, "Not supported to set mw.");
-		return -ENOTSUP;
-	}
-
-	snprintf(mw_name, sizeof(mw_name), "ntb_%d_mw_%d",
-		 dev->dev_id, mw_idx);
-
-	mz = rte_memzone_lookup(mw_name);
-	if (mz)
-		return 0;
-
-	/**
-	 * Hardware requires that mapped memory base address should be
-	 * aligned with EMBARSZ and needs continuous memzone.
-	 */
-	mz = rte_memzone_reserve_aligned(mw_name, mw_size, dev->socket_id,
-				RTE_MEMZONE_IOVA_CONTIG, hw->mw_size[mw_idx]);
-	if (!mz) {
-		NTB_LOG(ERR, "Cannot allocate aligned memzone.");
-		return -EIO;
-	}
-	hw->mz[mw_idx] = mz;
-
-	ret = (*hw->ntb_ops->mw_set_trans)(dev, mw_idx, mz->iova, mw_size);
-	if (ret) {
-		NTB_LOG(ERR, "Cannot set mw translation.");
-		return ret;
-	}
-
-	return ret;
-}
-
-static void
+static inline void
 ntb_link_cleanup(struct rte_rawdev *dev)
 {
 	struct ntb_hw *hw = dev->dev_private;
@@ -89,20 +50,94 @@ ntb_link_cleanup(struct rte_rawdev *dev)
 	}
 
 	/* Clear mw so that peer cannot access local memory.*/
-	for (i = 0; i < hw->mw_cnt; i++) {
+	for (i = 0; i < hw->used_mw_num; i++) {
 		status = (*hw->ntb_ops->mw_set_trans)(dev, i, 0, 0);
 		if (status)
 			NTB_LOG(ERR, "Failed to clean mw.");
 	}
 }
 
+static inline int
+ntb_handshake_work(const struct rte_rawdev *dev)
+{
+	struct ntb_hw *hw = dev->dev_private;
+	uint32_t val;
+	int ret, i;
+
+	if (hw->ntb_ops->spad_write == NULL ||
+	    hw->ntb_ops->mw_set_trans == NULL) {
+		NTB_LOG(ERR, "Scratchpad/MW setting is not supported.");
+		return -ENOTSUP;
+	}
+
+	/* Tell peer the mw info of local side. */
+	ret = (*hw->ntb_ops->spad_write)(dev, SPAD_NUM_MWS, 1, hw->mw_cnt);
+	if (ret < 0)
+		return ret;
+	for (i = 0; i < hw->mw_cnt; i++) {
+		NTB_LOG(INFO, "Local %u mw size: 0x%"PRIx64"", i,
+				hw->mw_size[i]);
+		val = hw->mw_size[i] >> 32;
+		ret = (*hw->ntb_ops->spad_write)(dev, SPAD_MW0_SZ_H + 2 * i,
+						 1, val);
+		if (ret < 0)
+			return ret;
+		val = hw->mw_size[i];
+		ret = (*hw->ntb_ops->spad_write)(dev, SPAD_MW0_SZ_L + 2 * i,
+						 1, val);
+		if (ret < 0)
+			return ret;
+	}
+
+	/* Tell peer about the queue info and map memory to the peer. */
+	ret = (*hw->ntb_ops->spad_write)(dev, SPAD_Q_SZ, 1, hw->queue_size);
+	if (ret < 0)
+		return ret;
+	ret = (*hw->ntb_ops->spad_write)(dev, SPAD_NUM_QPS, 1,
+					 hw->queue_pairs);
+	if (ret < 0)
+		return ret;
+	ret = (*hw->ntb_ops->spad_write)(dev, SPAD_USED_MWS, 1,
+					 hw->used_mw_num);
+	if (ret < 0)
+		return ret;
+	for (i = 0; i < hw->used_mw_num; i++) {
+		val = (uint64_t)(hw->mz[i]->addr) >> 32;
+		ret = (*hw->ntb_ops->spad_write)(dev, SPAD_MW0_BA_H + 2 * i,
+						 1, val);
+		if (ret < 0)
+			return ret;
+		val = (uint64_t)(hw->mz[i]->addr);
+		ret = (*hw->ntb_ops->spad_write)(dev, SPAD_MW0_BA_L + 2 * i,
+						 1, val);
+		if (ret < 0)
+			return ret;
+	}
+
+	for (i = 0; i < hw->used_mw_num; i++) {
+		ret = (*hw->ntb_ops->mw_set_trans)(dev, i, hw->mz[i]->iova,
+						   hw->mz[i]->len);
+		if (ret < 0)
+			return ret;
+	}
+
+	/* Ring doorbell 0 to tell peer the device is ready. */
+	ret = (*hw->ntb_ops->peer_db_set)(dev, 0);
+	if (ret < 0)
+		return ret;
+
+	return 0;
+}
+
 static void
 ntb_dev_intr_handler(void *param)
 {
 	struct rte_rawdev *dev = (struct rte_rawdev *)param;
 	struct ntb_hw *hw = dev->dev_private;
-	uint32_t mw_size_h, mw_size_l;
+	uint32_t val_h, val_l;
+	uint64_t peer_mw_size;
 	uint64_t db_bits = 0;
+	uint8_t peer_mw_cnt;
 	int i = 0;
 
 	if (hw->ntb_ops->db_read == NULL ||
@@ -118,7 +153,7 @@ ntb_dev_intr_handler(void *param)
 
 	/* Doorbell 0 is for peer device ready. */
 	if (db_bits & 1) {
-		NTB_LOG(DEBUG, "DB0: Peer device is up.");
+		NTB_LOG(INFO, "DB0: Peer device is up.");
 		/* Clear received doorbell. */
 		(*hw->ntb_ops->db_clear)(dev, 1);
 
@@ -129,47 +164,44 @@ ntb_dev_intr_handler(void *param)
 		if (hw->peer_dev_up)
 			return;
 
-		if (hw->ntb_ops->spad_read == NULL ||
-		    hw->ntb_ops->spad_write == NULL) {
-			NTB_LOG(ERR, "Scratchpad is not supported.");
+		if (hw->ntb_ops->spad_read == NULL) {
+			NTB_LOG(ERR, "Scratchpad read is not supported.");
+			return;
+		}
+
+		/* Check if mw setting on the peer is the same as local. */
+		peer_mw_cnt = (*hw->ntb_ops->spad_read)(dev, SPAD_NUM_MWS, 0);
+		if (peer_mw_cnt != hw->mw_cnt) {
+			NTB_LOG(ERR, "Both mw cnt must be the same.");
 			return;
 		}
 
-		hw->peer_mw_cnt = (*hw->ntb_ops->spad_read)
-				  (dev, SPAD_NUM_MWS, 0);
-		hw->peer_mw_size = rte_zmalloc("uint64_t",
-				   hw->peer_mw_cnt * sizeof(uint64_t), 0);
 		for (i = 0; i < hw->mw_cnt; i++) {
-			mw_size_h = (*hw->ntb_ops->spad_read)
-				    (dev, SPAD_MW0_SZ_H + 2 * i, 0);
-			mw_size_l = (*hw->ntb_ops->spad_read)
-				    (dev, SPAD_MW0_SZ_L + 2 * i, 0);
-			hw->peer_mw_size[i] = ((uint64_t)mw_size_h << 32) |
-					      mw_size_l;
+			val_h = (*hw->ntb_ops->spad_read)
+				(dev, SPAD_MW0_SZ_H + 2 * i, 0);
+			val_l = (*hw->ntb_ops->spad_read)
+				(dev, SPAD_MW0_SZ_L + 2 * i, 0);
+			peer_mw_size = ((uint64_t)val_h << 32) | val_l;
 			NTB_LOG(DEBUG, "Peer %u mw size: 0x%"PRIx64"", i,
-					hw->peer_mw_size[i]);
+					peer_mw_size);
+			if (peer_mw_size != hw->mw_size[i]) {
+				NTB_LOG(ERR, "Mw config must be the same.");
+				return;
+			}
 		}
 
 		hw->peer_dev_up = 1;
 
 		/**
-		 * Handshake with peer. Spad_write only works when both
-		 * devices are up. So write spad again when db is received.
-		 * And set db again for the later device who may miss
+		 * Handshake with peer. Spad_write & mw_set_trans only works
+		 * when both devices are up. So write spad again when db is
+		 * received. And set db again for the later device who may miss
 		 * the 1st db.
 		 */
-		for (i = 0; i < hw->mw_cnt; i++) {
-			(*hw->ntb_ops->spad_write)(dev, SPAD_NUM_MWS,
-						   1, hw->mw_cnt);
-			mw_size_h = hw->mw_size[i] >> 32;
-			(*hw->ntb_ops->spad_write)(dev, SPAD_MW0_SZ_H + 2 * i,
-						   1, mw_size_h);
-
-			mw_size_l = hw->mw_size[i];
-			(*hw->ntb_ops->spad_write)(dev, SPAD_MW0_SZ_L + 2 * i,
-						   1, mw_size_l);
+		if (ntb_handshake_work(dev) < 0) {
+			NTB_LOG(ERR, "Handshake work failed.");
+			return;
 		}
-		(*hw->ntb_ops->peer_db_set)(dev, 0);
 
 		/* To get the link info. */
 		if (hw->ntb_ops->get_link_status == NULL) {
@@ -183,7 +215,7 @@ ntb_dev_intr_handler(void *param)
 	}
 
 	if (db_bits & (1 << 1)) {
-		NTB_LOG(DEBUG, "DB1: Peer device is down.");
+		NTB_LOG(INFO, "DB1: Peer device is down.");
 		/* Clear received doorbell. */
 		(*hw->ntb_ops->db_clear)(dev, 2);
 
@@ -197,7 +229,7 @@ ntb_dev_intr_handler(void *param)
 	}
 
 	if (db_bits & (1 << 2)) {
-		NTB_LOG(DEBUG, "DB2: Peer device agrees dev to be down.");
+		NTB_LOG(INFO, "DB2: Peer device agrees dev to be down.");
 		/* Clear received doorbell. */
 		(*hw->ntb_ops->db_clear)(dev, (1 << 2));
 		hw->peer_dev_up = 0;
@@ -206,24 +238,228 @@ ntb_dev_intr_handler(void *param)
 }
 
 static void
-ntb_queue_conf_get(struct rte_rawdev *dev __rte_unused,
-		   uint16_t queue_id __rte_unused,
-		   rte_rawdev_obj_t queue_conf __rte_unused)
+ntb_queue_conf_get(struct rte_rawdev *dev,
+		   uint16_t queue_id,
+		   rte_rawdev_obj_t queue_conf)
+{
+	struct ntb_queue_conf *q_conf = queue_conf;
+	struct ntb_hw *hw = dev->dev_private;
+
+	q_conf->tx_free_thresh = hw->tx_queues[queue_id]->tx_free_thresh;
+	q_conf->nb_desc = hw->rx_queues[queue_id]->nb_rx_desc;
+	q_conf->rx_mp = hw->rx_queues[queue_id]->mpool;
+}
+
+static void
+ntb_rxq_release_mbufs(struct ntb_rx_queue *q)
+{
+	int i;
+
+	if (!q || !q->sw_ring) {
+		NTB_LOG(ERR, "Pointer to rxq or sw_ring is NULL");
+		return;
+	}
+
+	for (i = 0; i < q->nb_rx_desc; i++) {
+		if (q->sw_ring[i].mbuf) {
+			rte_pktmbuf_free_seg(q->sw_ring[i].mbuf);
+			q->sw_ring[i].mbuf = NULL;
+		}
+	}
+}
+
+static void
+ntb_rxq_release(struct ntb_rx_queue *rxq)
+{
+	if (!rxq) {
+		NTB_LOG(ERR, "Pointer to rxq is NULL");
+		return;
+	}
+
+	ntb_rxq_release_mbufs(rxq);
+
+	rte_free(rxq->sw_ring);
+	rte_free(rxq);
+}
+
+static int
+ntb_rxq_setup(struct rte_rawdev *dev,
+	      uint16_t qp_id,
+	      rte_rawdev_obj_t queue_conf)
+{
+	struct ntb_queue_conf *rxq_conf = queue_conf;
+	struct ntb_hw *hw = dev->dev_private;
+	struct ntb_rx_queue *rxq;
+
+	/* Allocate the rx queue data structure */
+	rxq = rte_zmalloc_socket("ntb rx queue",
+				 sizeof(struct ntb_rx_queue),
+				 RTE_CACHE_LINE_SIZE,
+				 dev->socket_id);
+	if (!rxq) {
+		NTB_LOG(ERR, "Failed to allocate memory for "
+			    "rx queue data structure.");
+		return -ENOMEM;
+	}
+
+	if (rxq_conf->rx_mp == NULL) {
+		NTB_LOG(ERR, "Invalid null mempool pointer.");
+		return -EINVAL;
+	}
+	rxq->nb_rx_desc = rxq_conf->nb_desc;
+	rxq->mpool = rxq_conf->rx_mp;
+	rxq->port_id = dev->dev_id;
+	rxq->queue_id = qp_id;
+	rxq->hw = hw;
+
+	/* Allocate the software ring. */
+	rxq->sw_ring =
+		rte_zmalloc_socket("ntb rx sw ring",
+				   sizeof(struct ntb_rx_entry) *
+				   rxq->nb_rx_desc,
+				   RTE_CACHE_LINE_SIZE,
+				   dev->socket_id);
+	if (!rxq->sw_ring) {
+		ntb_rxq_release(rxq);
+		NTB_LOG(ERR, "Failed to allocate memory for SW ring");
+		return -ENOMEM;
+	}
+
+	hw->rx_queues[qp_id] = rxq;
+
+	return 0;
+}
+
+static void
+ntb_txq_release_mbufs(struct ntb_tx_queue *q)
+{
+	int i;
+
+	if (!q || !q->sw_ring) {
+		NTB_LOG(ERR, "Pointer to txq or sw_ring is NULL");
+		return;
+	}
+
+	for (i = 0; i < q->nb_tx_desc; i++) {
+		if (q->sw_ring[i].mbuf) {
+			rte_pktmbuf_free_seg(q->sw_ring[i].mbuf);
+			q->sw_ring[i].mbuf = NULL;
+		}
+	}
+}
+
+static void
+ntb_txq_release(struct ntb_tx_queue *txq)
 {
+	if (!txq) {
+		NTB_LOG(ERR, "Pointer to txq is NULL");
+		return;
+	}
+
+	ntb_txq_release_mbufs(txq);
+
+	rte_free(txq->sw_ring);
+	rte_free(txq);
 }
 
 static int
-ntb_queue_setup(struct rte_rawdev *dev __rte_unused,
-		uint16_t queue_id __rte_unused,
-		rte_rawdev_obj_t queue_conf __rte_unused)
+ntb_txq_setup(struct rte_rawdev *dev,
+	      uint16_t qp_id,
+	      rte_rawdev_obj_t queue_conf)
 {
+	struct ntb_queue_conf *txq_conf = queue_conf;
+	struct ntb_hw *hw = dev->dev_private;
+	struct ntb_tx_queue *txq;
+	uint16_t i, prev;
+
+	/* Allocate the TX queue data structure. */
+	txq = rte_zmalloc_socket("ntb tx queue",
+				  sizeof(struct ntb_tx_queue),
+				  RTE_CACHE_LINE_SIZE,
+				  dev->socket_id);
+	if (!txq) {
+		NTB_LOG(ERR, "Failed to allocate memory for "
+			    "tx queue structure");
+		return -ENOMEM;
+	}
+
+	txq->nb_tx_desc = txq_conf->nb_desc;
+	txq->port_id = dev->dev_id;
+	txq->queue_id = qp_id;
+	txq->hw = hw;
+
+	/* Allocate software ring */
+	txq->sw_ring =
+		rte_zmalloc_socket("ntb tx sw ring",
+				   sizeof(struct ntb_tx_entry) *
+				   txq->nb_tx_desc,
+				   RTE_CACHE_LINE_SIZE,
+				   dev->socket_id);
+	if (!txq->sw_ring) {
+		ntb_txq_release(txq);
+		NTB_LOG(ERR, "Failed to allocate memory for SW TX ring");
+		return -ENOMEM;
+	}
+
+	prev = txq->nb_tx_desc - 1;
+	for (i = 0; i < txq->nb_tx_desc; i++) {
+		txq->sw_ring[i].mbuf = NULL;
+		txq->sw_ring[i].last_id = i;
+		txq->sw_ring[prev].next_id = i;
+		prev = i;
+	}
+
+	txq->tx_free_thresh = txq_conf->tx_free_thresh ?
+			      txq_conf->tx_free_thresh :
+			      NTB_DFLT_TX_FREE_THRESH;
+	if (txq->tx_free_thresh >= txq->nb_tx_desc - 3) {
+		NTB_LOG(ERR, "tx_free_thresh must be less than nb_desc - 3. "
+			"(tx_free_thresh=%u qp_id=%u)", txq->tx_free_thresh,
+			qp_id);
+		return -EINVAL;
+	}
+
+	hw->tx_queues[qp_id] = txq;
+
 	return 0;
 }
 
+
+static int
+ntb_queue_setup(struct rte_rawdev *dev,
+		uint16_t queue_id,
+		rte_rawdev_obj_t queue_conf)
+{
+	struct ntb_hw *hw = dev->dev_private;
+	int ret;
+
+	if (queue_id >= hw->queue_pairs)
+		return -EINVAL;
+
+	ret = ntb_txq_setup(dev, queue_id, queue_conf);
+	if (ret < 0)
+		return ret;
+
+	ret = ntb_rxq_setup(dev, queue_id, queue_conf);
+
+	return ret;
+}
+
 static int
-ntb_queue_release(struct rte_rawdev *dev __rte_unused,
-		  uint16_t queue_id __rte_unused)
+ntb_queue_release(struct rte_rawdev *dev, uint16_t queue_id)
 {
+	struct ntb_hw *hw = dev->dev_private;
+	struct ntb_tx_queue *txq;
+	struct ntb_rx_queue *rxq;
+
+	if (queue_id >= hw->queue_pairs)
+		return -EINVAL;
+
+	txq = hw->tx_queues[queue_id];
+	rxq = hw->rx_queues[queue_id];
+	ntb_txq_release(txq);
+	ntb_rxq_release(rxq);
+
 	return 0;
 }
 
@@ -234,6 +470,77 @@ ntb_queue_count(struct rte_rawdev *dev)
 	return hw->queue_pairs;
 }
 
+static int
+ntb_queue_init(struct rte_rawdev *dev, uint16_t qp_id)
+{
+	struct ntb_hw *hw = dev->dev_private;
+	struct ntb_rx_queue *rxq = hw->rx_queues[qp_id];
+	struct ntb_tx_queue *txq = hw->tx_queues[qp_id];
+	volatile struct ntb_header *local_hdr;
+	struct ntb_header *remote_hdr;
+	uint16_t q_size = hw->queue_size;
+	uint32_t hdr_offset;
+	void *bar_addr;
+	uint16_t i;
+
+	if (hw->ntb_ops->get_peer_mw_addr == NULL) {
+		NTB_LOG(ERR, "Failed to get mapped peer addr.");
+		return -EINVAL;
+	}
+
+	/* Put queue info into the start of shared memory. */
+	hdr_offset = hw->hdr_size_per_queue * qp_id;
+	local_hdr = (volatile struct ntb_header *)
+		    ((size_t)hw->mz[0]->addr + hdr_offset);
+	bar_addr = (*hw->ntb_ops->get_peer_mw_addr)(dev, 0);
+	if (bar_addr == NULL)
+		return -EINVAL;
+	remote_hdr = (struct ntb_header *)
+		     ((size_t)bar_addr + hdr_offset);
+
+	/* rxq init. */
+	rxq->rx_desc_ring = (struct ntb_desc *)
+			    (&remote_hdr->desc_ring);
+	rxq->rx_used_ring = (volatile struct ntb_used *)
+			    (&local_hdr->desc_ring[q_size]);
+	rxq->avail_cnt = &remote_hdr->avail_cnt;
+	rxq->used_cnt = &local_hdr->used_cnt;
+
+	for (i = 0; i < rxq->nb_rx_desc - 1; i++) {
+		struct rte_mbuf *mbuf = rte_mbuf_raw_alloc(rxq->mpool);
+		if (unlikely(!mbuf)) {
+			NTB_LOG(ERR, "Failed to allocate mbuf for RX");
+			return -ENOMEM;
+		}
+		mbuf->port = dev->dev_id;
+
+		rxq->sw_ring[i].mbuf = mbuf;
+
+		rxq->rx_desc_ring[i].addr = rte_pktmbuf_mtod(mbuf, uint64_t);
+		rxq->rx_desc_ring[i].len = mbuf->buf_len - RTE_PKTMBUF_HEADROOM;
+	}
+	rte_wmb();
+	*rxq->avail_cnt = rxq->nb_rx_desc - 1;
+	rxq->last_avail = rxq->nb_rx_desc - 1;
+	rxq->last_used = 0;
+
+	/* txq init */
+	txq->tx_desc_ring = (volatile struct ntb_desc *)
+			    (&local_hdr->desc_ring);
+	txq->tx_used_ring = (struct ntb_used *)
+			    (&remote_hdr->desc_ring[q_size]);
+	txq->avail_cnt = &local_hdr->avail_cnt;
+	txq->used_cnt = &remote_hdr->used_cnt;
+
+	rte_wmb();
+	*txq->used_cnt = 0;
+	txq->last_used = 0;
+	txq->last_avail = 0;
+	txq->nb_tx_free = txq->nb_tx_desc - 1;
+
+	return 0;
+}
+
 static int
 ntb_enqueue_bufs(struct rte_rawdev *dev,
 		 struct rte_rawdev_buf **buffers,
@@ -278,58 +585,51 @@ static void
 ntb_dev_info_get(struct rte_rawdev *dev, rte_rawdev_obj_t dev_info)
 {
 	struct ntb_hw *hw = dev->dev_private;
-	struct ntb_attr *ntb_attrs = dev_info;
+	struct ntb_dev_info *info = dev_info;
 
-	strncpy(ntb_attrs[NTB_TOPO_ID].name, NTB_TOPO_NAME, NTB_ATTR_NAME_LEN);
-	switch (hw->topo) {
-	case NTB_TOPO_B2B_DSD:
-		strncpy(ntb_attrs[NTB_TOPO_ID].value, "B2B DSD",
-			NTB_ATTR_VAL_LEN);
-		break;
-	case NTB_TOPO_B2B_USD:
-		strncpy(ntb_attrs[NTB_TOPO_ID].value, "B2B USD",
-			NTB_ATTR_VAL_LEN);
-		break;
-	default:
-		strncpy(ntb_attrs[NTB_TOPO_ID].value, "Unsupported",
-			NTB_ATTR_VAL_LEN);
-	}
+	info->mw_cnt = hw->mw_cnt;
+	info->mw_size = hw->mw_size;
 
-	strncpy(ntb_attrs[NTB_LINK_STATUS_ID].name, NTB_LINK_STATUS_NAME,
-		NTB_ATTR_NAME_LEN);
-	snprintf(ntb_attrs[NTB_LINK_STATUS_ID].value, NTB_ATTR_VAL_LEN,
-		 "%d", hw->link_status);
-
-	strncpy(ntb_attrs[NTB_SPEED_ID].name, NTB_SPEED_NAME,
-		NTB_ATTR_NAME_LEN);
-	snprintf(ntb_attrs[NTB_SPEED_ID].value, NTB_ATTR_VAL_LEN,
-		 "%d", hw->link_speed);
-
-	strncpy(ntb_attrs[NTB_WIDTH_ID].name, NTB_WIDTH_NAME,
-		NTB_ATTR_NAME_LEN);
-	snprintf(ntb_attrs[NTB_WIDTH_ID].value, NTB_ATTR_VAL_LEN,
-		 "%d", hw->link_width);
-
-	strncpy(ntb_attrs[NTB_MW_CNT_ID].name, NTB_MW_CNT_NAME,
-		NTB_ATTR_NAME_LEN);
-	snprintf(ntb_attrs[NTB_MW_CNT_ID].value, NTB_ATTR_VAL_LEN,
-		 "%d", hw->mw_cnt);
+	/**
+	 * Intel hardware requires that the mapped memory base address be
+	 * aligned to EMBARSZ and that the memzone be contiguous.
+	 */
+	info->mw_size_align = (uint8_t)(hw->pci_dev->id.vendor_id ==
+					NTB_INTEL_VENDOR_ID);
 
-	strncpy(ntb_attrs[NTB_DB_CNT_ID].name, NTB_DB_CNT_NAME,
-		NTB_ATTR_NAME_LEN);
-	snprintf(ntb_attrs[NTB_DB_CNT_ID].value, NTB_ATTR_VAL_LEN,
-		 "%d", hw->db_cnt);
+	if (!hw->queue_size || !hw->queue_pairs) {
+		NTB_LOG(ERR, "No queue size and queue num assigned.");
+		return;
+	}
 
-	strncpy(ntb_attrs[NTB_SPAD_CNT_ID].name, NTB_SPAD_CNT_NAME,
-		NTB_ATTR_NAME_LEN);
-	snprintf(ntb_attrs[NTB_SPAD_CNT_ID].value, NTB_ATTR_VAL_LEN,
-		 "%d", hw->spad_cnt);
+	hw->hdr_size_per_queue = RTE_ALIGN(sizeof(struct ntb_header) +
+				hw->queue_size * sizeof(struct ntb_desc) +
+				hw->queue_size * sizeof(struct ntb_used),
+				RTE_CACHE_LINE_SIZE);
+	info->ntb_hdr_size = hw->hdr_size_per_queue * hw->queue_pairs;
 }
 
 static int
-ntb_dev_configure(const struct rte_rawdev *dev __rte_unused,
-		  rte_rawdev_obj_t config __rte_unused)
+ntb_dev_configure(const struct rte_rawdev *dev, rte_rawdev_obj_t config)
 {
+	struct ntb_dev_config *conf = config;
+	struct ntb_hw *hw = dev->dev_private;
+	int ret;
+
+	hw->queue_pairs	= conf->num_queues;
+	hw->queue_size = conf->queue_size;
+	hw->used_mw_num = conf->mz_num;
+	hw->mz = conf->mz_list;
+	hw->rx_queues = rte_zmalloc("ntb_rx_queues",
+			sizeof(struct ntb_rx_queue *) * hw->queue_pairs, 0);
+	hw->tx_queues = rte_zmalloc("ntb_tx_queues",
+			sizeof(struct ntb_tx_queue *) * hw->queue_pairs, 0);
+
+	/* Start handshake with the peer. */
+	ret = ntb_handshake_work(dev);
+	if (ret < 0)
+		return ret;
+
 	return 0;
 }
 
@@ -337,21 +637,52 @@ static int
 ntb_dev_start(struct rte_rawdev *dev)
 {
 	struct ntb_hw *hw = dev->dev_private;
-	int ret, i;
+	uint32_t peer_base_l, peer_val;
+	uint64_t peer_base_h;
+	uint32_t i;
+	int ret;
 
-	/* TODO: init queues and start queues. */
+	if (!hw->link_status || !hw->peer_dev_up)
+		return -EINVAL;
 
-	/* Map memory of bar_size to remote. */
-	hw->mz = rte_zmalloc("struct rte_memzone *",
-			     hw->mw_cnt * sizeof(struct rte_memzone *), 0);
-	for (i = 0; i < hw->mw_cnt; i++) {
-		ret = ntb_set_mw(dev, i, hw->mw_size[i]);
+	for (i = 0; i < hw->queue_pairs; i++) {
+		ret = ntb_queue_init(dev, i);
 		if (ret) {
-			NTB_LOG(ERR, "Fail to set mw.");
+			NTB_LOG(ERR, "Failed to init queue.");
 			return ret;
 		}
 	}
 
+	hw->peer_mw_base = rte_zmalloc("ntb_peer_mw_base", hw->mw_cnt *
+					sizeof(uint64_t), 0);
+
+	if (hw->ntb_ops->spad_read == NULL)
+		return -ENOTSUP;
+
+	peer_val = (*hw->ntb_ops->spad_read)(dev, SPAD_Q_SZ, 0);
+	if (peer_val != hw->queue_size) {
+		NTB_LOG(ERR, "Inconsistent queue size! (local: %u peer: %u)",
+			hw->queue_size, peer_val);
+		return -EINVAL;
+	}
+
+	peer_val = (*hw->ntb_ops->spad_read)(dev, SPAD_NUM_QPS, 0);
+	if (peer_val != hw->queue_pairs) {
+		NTB_LOG(ERR, "Inconsistent number of queues! (local: %u peer:"
+			" %u)", hw->queue_pairs, peer_val);
+		return -EINVAL;
+	}
+
+	hw->peer_used_mws = (*hw->ntb_ops->spad_read)(dev, SPAD_USED_MWS, 0);
+
+	for (i = 0; i < hw->peer_used_mws; i++) {
+		peer_base_h = (*hw->ntb_ops->spad_read)(dev,
+				SPAD_MW0_BA_H + 2 * i, 0);
+		peer_base_l = (*hw->ntb_ops->spad_read)(dev,
+				SPAD_MW0_BA_L + 2 * i, 0);
+		hw->peer_mw_base[i] = (peer_base_h << 32) + peer_base_l;
+	}
+
 	dev->started = 1;
 
 	return 0;
@@ -361,10 +692,10 @@ static void
 ntb_dev_stop(struct rte_rawdev *dev)
 {
 	struct ntb_hw *hw = dev->dev_private;
+	struct ntb_rx_queue *rxq;
+	struct ntb_tx_queue *txq;
 	uint32_t time_out;
-	int status;
-
-	/* TODO: stop rx/tx queues. */
+	int status, i;
 
 	if (!hw->peer_dev_up)
 		goto clean;
@@ -405,6 +736,13 @@ ntb_dev_stop(struct rte_rawdev *dev)
 	if (status)
 		NTB_LOG(ERR, "Failed to clear doorbells.");
 
+	for (i = 0; i < hw->queue_pairs; i++) {
+		rxq = hw->rx_queues[i];
+		txq = hw->tx_queues[i];
+		ntb_rxq_release_mbufs(rxq);
+		ntb_txq_release_mbufs(txq);
+	}
+
 	dev->started = 0;
 }
 
@@ -413,12 +751,15 @@ ntb_dev_close(struct rte_rawdev *dev)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	struct rte_intr_handle *intr_handle;
-	int ret = 0;
+	int i;
 
 	if (dev->started)
 		ntb_dev_stop(dev);
 
-	/* TODO: free queues. */
+	/* free queues */
+	for (i = 0; i < hw->queue_pairs; i++)
+		ntb_queue_release(dev, i);
+	hw->queue_pairs = 0;
 
 	intr_handle = &hw->pci_dev->intr_handle;
 	/* Clean datapath event and vec mapping */
@@ -434,7 +775,7 @@ ntb_dev_close(struct rte_rawdev *dev)
 	rte_intr_callback_unregister(intr_handle,
 				     ntb_dev_intr_handler, dev);
 
-	return ret;
+	return 0;
 }
 
 static int
@@ -445,7 +786,7 @@ ntb_dev_reset(struct rte_rawdev *rawdev __rte_unused)
 
 static int
 ntb_attr_set(struct rte_rawdev *dev, const char *attr_name,
-				 uint64_t attr_value)
+	     uint64_t attr_value)
 {
 	struct ntb_hw *hw;
 	int index;
@@ -463,7 +804,21 @@ ntb_attr_set(struct rte_rawdev *dev, const char *attr_name,
 		index = atoi(&attr_name[NTB_SPAD_USER_LEN]);
 		(*hw->ntb_ops->spad_write)(dev, hw->spad_user_list[index],
 					   1, attr_value);
-		NTB_LOG(INFO, "Set attribute (%s) Value (%" PRIu64 ")",
+		NTB_LOG(DEBUG, "Set attribute (%s) Value (%" PRIu64 ")",
+			attr_name, attr_value);
+		return 0;
+	}
+
+	if (!strncmp(attr_name, NTB_QUEUE_SZ_NAME, NTB_ATTR_NAME_LEN)) {
+		hw->queue_size = attr_value;
+		NTB_LOG(DEBUG, "Set attribute (%s) Value (%" PRIu64 ")",
+			attr_name, attr_value);
+		return 0;
+	}
+
+	if (!strncmp(attr_name, NTB_QUEUE_NUM_NAME, NTB_ATTR_NAME_LEN)) {
+		hw->queue_pairs = attr_value;
+		NTB_LOG(DEBUG, "Set attribute (%s) Value (%" PRIu64 ")",
 			attr_name, attr_value);
 		return 0;
 	}
@@ -475,7 +830,7 @@ ntb_attr_set(struct rte_rawdev *dev, const char *attr_name,
 
 static int
 ntb_attr_get(struct rte_rawdev *dev, const char *attr_name,
-				 uint64_t *attr_value)
+	     uint64_t *attr_value)
 {
 	struct ntb_hw *hw;
 	int index;
@@ -489,49 +844,50 @@ ntb_attr_get(struct rte_rawdev *dev, const char *attr_name,
 
 	if (!strncmp(attr_name, NTB_TOPO_NAME, NTB_ATTR_NAME_LEN)) {
 		*attr_value = hw->topo;
-		NTB_LOG(INFO, "Attribute (%s) Value (%" PRIu64 ")",
+		NTB_LOG(DEBUG, "Attribute (%s) Value (%" PRIu64 ")",
 			attr_name, *attr_value);
 		return 0;
 	}
 
 	if (!strncmp(attr_name, NTB_LINK_STATUS_NAME, NTB_ATTR_NAME_LEN)) {
-		*attr_value = hw->link_status;
-		NTB_LOG(INFO, "Attribute (%s) Value (%" PRIu64 ")",
+		/* hw->link_status only indicates hw link status. */
+		*attr_value = hw->link_status && hw->peer_dev_up;
+		NTB_LOG(DEBUG, "Attribute (%s) Value (%" PRIu64 ")",
 			attr_name, *attr_value);
 		return 0;
 	}
 
 	if (!strncmp(attr_name, NTB_SPEED_NAME, NTB_ATTR_NAME_LEN)) {
 		*attr_value = hw->link_speed;
-		NTB_LOG(INFO, "Attribute (%s) Value (%" PRIu64 ")",
+		NTB_LOG(DEBUG, "Attribute (%s) Value (%" PRIu64 ")",
 			attr_name, *attr_value);
 		return 0;
 	}
 
 	if (!strncmp(attr_name, NTB_WIDTH_NAME, NTB_ATTR_NAME_LEN)) {
 		*attr_value = hw->link_width;
-		NTB_LOG(INFO, "Attribute (%s) Value (%" PRIu64 ")",
+		NTB_LOG(DEBUG, "Attribute (%s) Value (%" PRIu64 ")",
 			attr_name, *attr_value);
 		return 0;
 	}
 
 	if (!strncmp(attr_name, NTB_MW_CNT_NAME, NTB_ATTR_NAME_LEN)) {
 		*attr_value = hw->mw_cnt;
-		NTB_LOG(INFO, "Attribute (%s) Value (%" PRIu64 ")",
+		NTB_LOG(DEBUG, "Attribute (%s) Value (%" PRIu64 ")",
 			attr_name, *attr_value);
 		return 0;
 	}
 
 	if (!strncmp(attr_name, NTB_DB_CNT_NAME, NTB_ATTR_NAME_LEN)) {
 		*attr_value = hw->db_cnt;
-		NTB_LOG(INFO, "Attribute (%s) Value (%" PRIu64 ")",
+		NTB_LOG(DEBUG, "Attribute (%s) Value (%" PRIu64 ")",
 			attr_name, *attr_value);
 		return 0;
 	}
 
 	if (!strncmp(attr_name, NTB_SPAD_CNT_NAME, NTB_ATTR_NAME_LEN)) {
 		*attr_value = hw->spad_cnt;
-		NTB_LOG(INFO, "Attribute (%s) Value (%" PRIu64 ")",
+		NTB_LOG(DEBUG, "Attribute (%s) Value (%" PRIu64 ")",
 			attr_name, *attr_value);
 		return 0;
 	}
@@ -542,7 +898,7 @@ ntb_attr_get(struct rte_rawdev *dev, const char *attr_name,
 		index = atoi(&attr_name[NTB_SPAD_USER_LEN]);
 		*attr_value = (*hw->ntb_ops->spad_read)(dev,
 				hw->spad_user_list[index], 0);
-		NTB_LOG(INFO, "Attribute (%s) Value (%" PRIu64 ")",
+		NTB_LOG(DEBUG, "Attribute (%s) Value (%" PRIu64 ")",
 			attr_name, *attr_value);
 		return 0;
 	}
@@ -585,6 +941,7 @@ ntb_xstats_reset(struct rte_rawdev *dev __rte_unused,
 	return 0;
 }
 
+
 static const struct rte_rawdev_ops ntb_ops = {
 	.dev_info_get         = ntb_dev_info_get,
 	.dev_configure        = ntb_dev_configure,
@@ -615,7 +972,6 @@ ntb_init_hw(struct rte_rawdev *dev, struct rte_pci_device *pci_dev)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	struct rte_intr_handle *intr_handle;
-	uint32_t val;
 	int ret, i;
 
 	hw->pci_dev = pci_dev;
@@ -688,45 +1044,6 @@ ntb_init_hw(struct rte_rawdev *dev, struct rte_pci_device *pci_dev)
 	/* enable uio intr after callback register */
 	rte_intr_enable(intr_handle);
 
-	if (hw->ntb_ops->spad_write == NULL) {
-		NTB_LOG(ERR, "Scratchpad is not supported.");
-		return -ENOTSUP;
-	}
-	/* Tell peer the mw_cnt of local side. */
-	ret = (*hw->ntb_ops->spad_write)(dev, SPAD_NUM_MWS, 1, hw->mw_cnt);
-	if (ret) {
-		NTB_LOG(ERR, "Failed to tell peer mw count.");
-		return ret;
-	}
-
-	/* Tell peer each mw size on local side. */
-	for (i = 0; i < hw->mw_cnt; i++) {
-		NTB_LOG(DEBUG, "Local %u mw size: 0x%"PRIx64"", i,
-				hw->mw_size[i]);
-		val = hw->mw_size[i] >> 32;
-		ret = (*hw->ntb_ops->spad_write)
-				(dev, SPAD_MW0_SZ_H + 2 * i, 1, val);
-		if (ret) {
-			NTB_LOG(ERR, "Failed to tell peer mw size.");
-			return ret;
-		}
-
-		val = hw->mw_size[i];
-		ret = (*hw->ntb_ops->spad_write)
-				(dev, SPAD_MW0_SZ_L + 2 * i, 1, val);
-		if (ret) {
-			NTB_LOG(ERR, "Failed to tell peer mw size.");
-			return ret;
-		}
-	}
-
-	/* Ring doorbell 0 to tell peer the device is ready. */
-	ret = (*hw->ntb_ops->peer_db_set)(dev, 0);
-	if (ret) {
-		NTB_LOG(ERR, "Failed to tell peer device is probed.");
-		return ret;
-	}
-
 	return ret;
 }
 
@@ -839,5 +1156,5 @@ RTE_INIT(ntb_init_log)
 {
 	ntb_logtype = rte_log_register("pmd.raw.ntb");
 	if (ntb_logtype >= 0)
-		rte_log_set_level(ntb_logtype, RTE_LOG_DEBUG);
+		rte_log_set_level(ntb_logtype, RTE_LOG_INFO);
 }
diff --git a/drivers/raw/ntb/ntb.h b/drivers/raw/ntb/ntb.h
index d355231b0..0ad20aed3 100644
--- a/drivers/raw/ntb/ntb.h
+++ b/drivers/raw/ntb/ntb.h
@@ -2,8 +2,8 @@
  * Copyright(c) 2019 Intel Corporation.
  */
 
-#ifndef _NTB_RAWDEV_H_
-#define _NTB_RAWDEV_H_
+#ifndef _NTB_H_
+#define _NTB_H_
 
 #include <stdbool.h>
 
@@ -19,38 +19,13 @@ extern int ntb_logtype;
 /* Device IDs */
 #define NTB_INTEL_DEV_ID_B2B_SKX    0x201C
 
-#define NTB_TOPO_NAME               "topo"
-#define NTB_LINK_STATUS_NAME        "link_status"
-#define NTB_SPEED_NAME              "speed"
-#define NTB_WIDTH_NAME              "width"
-#define NTB_MW_CNT_NAME             "mw_count"
-#define NTB_DB_CNT_NAME             "db_count"
-#define NTB_SPAD_CNT_NAME           "spad_count"
 /* Reserved to app to use. */
 #define NTB_SPAD_USER               "spad_user_"
 #define NTB_SPAD_USER_LEN           (sizeof(NTB_SPAD_USER) - 1)
-#define NTB_SPAD_USER_MAX_NUM       10
+#define NTB_SPAD_USER_MAX_NUM       4
 #define NTB_ATTR_NAME_LEN           30
-#define NTB_ATTR_VAL_LEN            30
-#define NTB_ATTR_MAX                20
-
-/* NTB Attributes */
-struct ntb_attr {
-	/**< Name of the attribute */
-	char name[NTB_ATTR_NAME_LEN];
-	/**< Value or reference of value of attribute */
-	char value[NTB_ATTR_NAME_LEN];
-};
 
-enum ntb_attr_idx {
-	NTB_TOPO_ID = 0,
-	NTB_LINK_STATUS_ID,
-	NTB_SPEED_ID,
-	NTB_WIDTH_ID,
-	NTB_MW_CNT_ID,
-	NTB_DB_CNT_ID,
-	NTB_SPAD_CNT_ID,
-};
+#define NTB_DFLT_TX_FREE_THRESH     256
 
 enum ntb_topo {
 	NTB_TOPO_NONE = 0,
@@ -87,10 +62,15 @@ enum ntb_spad_idx {
 	SPAD_NUM_MWS = 1,
 	SPAD_NUM_QPS,
 	SPAD_Q_SZ,
+	SPAD_USED_MWS,
 	SPAD_MW0_SZ_H,
 	SPAD_MW0_SZ_L,
 	SPAD_MW1_SZ_H,
 	SPAD_MW1_SZ_L,
+	SPAD_MW0_BA_H,
+	SPAD_MW0_BA_L,
+	SPAD_MW1_BA_H,
+	SPAD_MW1_BA_L,
 };
 
 /**
@@ -110,26 +90,97 @@ enum ntb_spad_idx {
  * @vector_bind: Bind vector source [intr] to msix vector [msix].
  */
 struct ntb_dev_ops {
-	int (*ntb_dev_init)(struct rte_rawdev *dev);
-	void *(*get_peer_mw_addr)(struct rte_rawdev *dev, int mw_idx);
-	int (*mw_set_trans)(struct rte_rawdev *dev, int mw_idx,
+	int (*ntb_dev_init)(const struct rte_rawdev *dev);
+	void *(*get_peer_mw_addr)(const struct rte_rawdev *dev, int mw_idx);
+	int (*mw_set_trans)(const struct rte_rawdev *dev, int mw_idx,
 			    uint64_t addr, uint64_t size);
-	int (*get_link_status)(struct rte_rawdev *dev);
-	int (*set_link)(struct rte_rawdev *dev, bool up);
-	uint32_t (*spad_read)(struct rte_rawdev *dev, int spad, bool peer);
-	int (*spad_write)(struct rte_rawdev *dev, int spad,
+	int (*get_link_status)(const struct rte_rawdev *dev);
+	int (*set_link)(const struct rte_rawdev *dev, bool up);
+	uint32_t (*spad_read)(const struct rte_rawdev *dev, int spad,
+			      bool peer);
+	int (*spad_write)(const struct rte_rawdev *dev, int spad,
 			  bool peer, uint32_t spad_v);
-	uint64_t (*db_read)(struct rte_rawdev *dev);
-	int (*db_clear)(struct rte_rawdev *dev, uint64_t db_bits);
-	int (*db_set_mask)(struct rte_rawdev *dev, uint64_t db_mask);
-	int (*peer_db_set)(struct rte_rawdev *dev, uint8_t db_bit);
-	int (*vector_bind)(struct rte_rawdev *dev, uint8_t intr, uint8_t msix);
+	uint64_t (*db_read)(const struct rte_rawdev *dev);
+	int (*db_clear)(const struct rte_rawdev *dev, uint64_t db_bits);
+	int (*db_set_mask)(const struct rte_rawdev *dev, uint64_t db_mask);
+	int (*peer_db_set)(const struct rte_rawdev *dev, uint8_t db_bit);
+	int (*vector_bind)(const struct rte_rawdev *dev, uint8_t intr,
+			   uint8_t msix);
+};
+
+struct ntb_desc {
+	uint64_t addr; /* buffer addr */
+	uint16_t len;  /* buffer length */
+	uint16_t rsv1;
+	uint32_t rsv2;
+};
+
+struct ntb_used {
+	uint16_t len;     /* buffer length */
+#define NTB_FLAG_EOP    1 /* end of packet */
+	uint16_t flags;   /* flags */
+};
+
+struct ntb_rx_entry {
+	struct rte_mbuf *mbuf;
+};
+
+struct ntb_rx_queue {
+	struct ntb_desc *rx_desc_ring;
+	volatile struct ntb_used *rx_used_ring;
+	uint16_t *avail_cnt;
+	volatile uint16_t *used_cnt;
+	uint16_t last_avail;
+	uint16_t last_used;
+	uint16_t nb_rx_desc;
+
+	uint16_t rx_free_thresh;
+
+	struct rte_mempool *mpool; /**< mempool for mbuf allocation */
+	struct ntb_rx_entry *sw_ring;
+
+	uint16_t queue_id;         /**< DPDK queue index. */
+	uint16_t port_id;          /**< Device port identifier. */
+
+	struct ntb_hw *hw;
+};
+
+struct ntb_tx_entry {
+	struct rte_mbuf *mbuf;
+	uint16_t next_id;
+	uint16_t last_id;
+};
+
+struct ntb_tx_queue {
+	volatile struct ntb_desc *tx_desc_ring;
+	struct ntb_used *tx_used_ring;
+	volatile uint16_t *avail_cnt;
+	uint16_t *used_cnt;
+	uint16_t last_avail;          /**< Next to be freed. */
+	uint16_t last_used;           /**< Next to be sent. */
+	uint16_t nb_tx_desc;
+
+	/**< Total number of TX descriptors ready to be allocated. */
+	uint16_t nb_tx_free;
+	uint16_t tx_free_thresh;
+
+	struct ntb_tx_entry *sw_ring;
+
+	uint16_t queue_id;            /**< DPDK queue index. */
+	uint16_t port_id;             /**< Device port identifier. */
+
+	struct ntb_hw *hw;
+};
+
+struct ntb_header {
+	uint16_t avail_cnt __rte_cache_aligned;
+	uint16_t used_cnt __rte_cache_aligned;
+	struct ntb_desc desc_ring[] __rte_cache_aligned;
 };
 
 /* ntb private data. */
 struct ntb_hw {
 	uint8_t mw_cnt;
-	uint8_t peer_mw_cnt;
 	uint8_t db_cnt;
 	uint8_t spad_cnt;
 
@@ -147,18 +198,26 @@ struct ntb_hw {
 	struct rte_pci_device *pci_dev;
 	char *hw_addr;
 
-	uint64_t *mw_size;
-	uint64_t *peer_mw_size;
 	uint8_t peer_dev_up;
+	uint64_t *mw_size;
+	/* remote mem base addr */
+	uint64_t *peer_mw_base;
 
 	uint16_t queue_pairs;
 	uint16_t queue_size;
+	uint32_t hdr_size_per_queue;
+
+	struct ntb_rx_queue **rx_queues;
+	struct ntb_tx_queue **tx_queues;
 
-	/**< mem zone to populate RX ring. */
+	/* memzone to populate RX ring. */
 	const struct rte_memzone **mz;
+	uint8_t used_mw_num;
+
+	uint8_t peer_used_mws;
 
 	/* Reserve several spad for app to use. */
 	int spad_user_list[NTB_SPAD_USER_MAX_NUM];
 };
 
-#endif /* _NTB_RAWDEV_H_ */
+#endif /* _NTB_H_ */
diff --git a/drivers/raw/ntb/ntb_hw_intel.c b/drivers/raw/ntb/ntb_hw_intel.c
index 21eaa8511..0e73f1609 100644
--- a/drivers/raw/ntb/ntb_hw_intel.c
+++ b/drivers/raw/ntb/ntb_hw_intel.c
@@ -26,7 +26,7 @@ static enum xeon_ntb_bar intel_ntb_bar[] = {
 };
 
 static int
-intel_ntb_dev_init(struct rte_rawdev *dev)
+intel_ntb_dev_init(const struct rte_rawdev *dev)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	uint8_t reg_val, bar;
@@ -77,7 +77,7 @@ intel_ntb_dev_init(struct rte_rawdev *dev)
 	hw->db_cnt = XEON_DB_COUNT;
 	hw->spad_cnt = XEON_SPAD_COUNT;
 
-	hw->mw_size = rte_zmalloc("uint64_t",
+	hw->mw_size = rte_zmalloc("ntb_mw_size",
 				  hw->mw_cnt * sizeof(uint64_t), 0);
 	for (i = 0; i < hw->mw_cnt; i++) {
 		bar = intel_ntb_bar[i];
@@ -94,7 +94,7 @@ intel_ntb_dev_init(struct rte_rawdev *dev)
 }
 
 static void *
-intel_ntb_get_peer_mw_addr(struct rte_rawdev *dev, int mw_idx)
+intel_ntb_get_peer_mw_addr(const struct rte_rawdev *dev, int mw_idx)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	uint8_t bar;
@@ -116,7 +116,7 @@ intel_ntb_get_peer_mw_addr(struct rte_rawdev *dev, int mw_idx)
 }
 
 static int
-intel_ntb_mw_set_trans(struct rte_rawdev *dev, int mw_idx,
+intel_ntb_mw_set_trans(const struct rte_rawdev *dev, int mw_idx,
 		       uint64_t addr, uint64_t size)
 {
 	struct ntb_hw *hw = dev->dev_private;
@@ -163,7 +163,7 @@ intel_ntb_mw_set_trans(struct rte_rawdev *dev, int mw_idx,
 }
 
 static int
-intel_ntb_get_link_status(struct rte_rawdev *dev)
+intel_ntb_get_link_status(const struct rte_rawdev *dev)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	uint16_t reg_val;
@@ -195,7 +195,7 @@ intel_ntb_get_link_status(struct rte_rawdev *dev)
 }
 
 static int
-intel_ntb_set_link(struct rte_rawdev *dev, bool up)
+intel_ntb_set_link(const struct rte_rawdev *dev, bool up)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	uint32_t ntb_ctrl, reg_off;
@@ -221,7 +221,7 @@ intel_ntb_set_link(struct rte_rawdev *dev, bool up)
 }
 
 static uint32_t
-intel_ntb_spad_read(struct rte_rawdev *dev, int spad, bool peer)
+intel_ntb_spad_read(const struct rte_rawdev *dev, int spad, bool peer)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	uint32_t spad_v, reg_off;
@@ -241,7 +241,7 @@ intel_ntb_spad_read(struct rte_rawdev *dev, int spad, bool peer)
 }
 
 static int
-intel_ntb_spad_write(struct rte_rawdev *dev, int spad,
+intel_ntb_spad_write(const struct rte_rawdev *dev, int spad,
 		     bool peer, uint32_t spad_v)
 {
 	struct ntb_hw *hw = dev->dev_private;
@@ -263,7 +263,7 @@ intel_ntb_spad_write(struct rte_rawdev *dev, int spad,
 }
 
 static uint64_t
-intel_ntb_db_read(struct rte_rawdev *dev)
+intel_ntb_db_read(const struct rte_rawdev *dev)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	uint64_t db_off, db_bits;
@@ -278,7 +278,7 @@ intel_ntb_db_read(struct rte_rawdev *dev)
 }
 
 static int
-intel_ntb_db_clear(struct rte_rawdev *dev, uint64_t db_bits)
+intel_ntb_db_clear(const struct rte_rawdev *dev, uint64_t db_bits)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	uint64_t db_off;
@@ -293,7 +293,7 @@ intel_ntb_db_clear(struct rte_rawdev *dev, uint64_t db_bits)
 }
 
 static int
-intel_ntb_db_set_mask(struct rte_rawdev *dev, uint64_t db_mask)
+intel_ntb_db_set_mask(const struct rte_rawdev *dev, uint64_t db_mask)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	uint64_t db_m_off;
@@ -312,7 +312,7 @@ intel_ntb_db_set_mask(struct rte_rawdev *dev, uint64_t db_mask)
 }
 
 static int
-intel_ntb_peer_db_set(struct rte_rawdev *dev, uint8_t db_idx)
+intel_ntb_peer_db_set(const struct rte_rawdev *dev, uint8_t db_idx)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	uint32_t db_off;
@@ -332,7 +332,7 @@ intel_ntb_peer_db_set(struct rte_rawdev *dev, uint8_t db_idx)
 }
 
 static int
-intel_ntb_vector_bind(struct rte_rawdev *dev, uint8_t intr, uint8_t msix)
+intel_ntb_vector_bind(const struct rte_rawdev *dev, uint8_t intr, uint8_t msix)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	uint8_t reg_off;
diff --git a/drivers/raw/ntb/rte_pmd_ntb.h b/drivers/raw/ntb/rte_pmd_ntb.h
new file mode 100644
index 000000000..6591ce793
--- /dev/null
+++ b/drivers/raw/ntb/rte_pmd_ntb.h
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Intel Corporation.
+ */
+
+#ifndef _RTE_PMD_NTB_H_
+#define _RTE_PMD_NTB_H_
+
+/* App needs to set/get these attrs */
+#define NTB_QUEUE_SZ_NAME           "queue_size"
+#define NTB_QUEUE_NUM_NAME          "queue_num"
+#define NTB_TOPO_NAME               "topo"
+#define NTB_LINK_STATUS_NAME        "link_status"
+#define NTB_SPEED_NAME              "speed"
+#define NTB_WIDTH_NAME              "width"
+#define NTB_MW_CNT_NAME             "mw_count"
+#define NTB_DB_CNT_NAME             "db_count"
+#define NTB_SPAD_CNT_NAME           "spad_count"
+
+#define NTB_MAX_DESC_SIZE           1024
+#define NTB_MIN_DESC_SIZE           64
+
+struct ntb_dev_info {
+	uint32_t ntb_hdr_size;
+	/**< Whether the memzone must be aligned to the mw size. */
+	uint8_t mw_size_align;
+	uint8_t mw_cnt;
+	uint64_t *mw_size;
+};
+
+struct ntb_dev_config {
+	uint16_t num_queues;
+	uint16_t queue_size;
+	uint8_t mz_num;
+	const struct rte_memzone **mz_list;
+};
+
+struct ntb_queue_conf {
+	uint16_t nb_desc;
+	uint16_t tx_free_thresh;
+	struct rte_mempool *rx_mp;
+};
+
+#endif /* _RTE_PMD_NTB_H_ */
-- 
2.17.1
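
For readers following the shared-memory layout in ntb_dev_info_get() and
ntb_queue_init(): each queue pair gets one header region made of a
struct ntb_header, queue_size descriptors and queue_size used entries,
rounded up to a cache line, and queue qp_id starts at
hdr_size_per_queue * qp_id. The standalone sketch below only reproduces
that arithmetic. The sketch_* names are hypothetical, the 64-byte cache
line is an assumption (the driver uses RTE_CACHE_LINE_SIZE), and the
header size is passed in as a parameter rather than taken from the real
struct. With a 64-byte cache line, sizeof(struct ntb_header) would be
expected to be 128, since avail_cnt and used_cnt each sit on their own
cache line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache line; the driver uses RTE_CACHE_LINE_SIZE. */
#define SKETCH_CACHE_LINE 64u

/* Field layout copied from struct ntb_desc in the patch. */
struct sketch_desc {
	uint64_t addr; /* buffer addr */
	uint16_t len;  /* buffer length */
	uint16_t rsv1;
	uint32_t rsv2;
};

/* Field layout copied from struct ntb_used in the patch. */
struct sketch_used {
	uint16_t len;   /* buffer length */
	uint16_t flags; /* flags */
};

/* Round v up to a multiple of align (align must be a power of two). */
static uint32_t
sketch_align_up(uint32_t v, uint32_t align)
{
	return (v + align - 1) & ~(align - 1);
}

/*
 * Bytes reserved per queue pair at the start of the first memzone.
 * hdr_bytes stands in for sizeof(struct ntb_header).
 */
static uint32_t
sketch_hdr_size_per_queue(uint32_t hdr_bytes, uint16_t queue_size)
{
	uint32_t raw = hdr_bytes +
		(uint32_t)queue_size * (uint32_t)sizeof(struct sketch_desc) +
		(uint32_t)queue_size * (uint32_t)sizeof(struct sketch_used);

	return sketch_align_up(raw, SKETCH_CACHE_LINE);
}

/* Offset of queue qp_id's header from the start of the memzone. */
static uint32_t
sketch_hdr_offset(uint32_t hdr_size_per_queue, uint16_t qp_id)
{
	return hdr_size_per_queue * qp_id;
}
```

Under these assumptions a queue size of 64 needs 128 + 64 * 16 + 64 * 4
= 1408 bytes per queue pair, which is already cache-line aligned, so
queue 2 would start at offset 2816. ntb_hdr_size reported to the
application is this per-queue size multiplied by the number of queue
pairs.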



* [dpdk-dev] [PATCH v2 2/4] raw/ntb: add xstats support
  2019-09-06  3:02 ` [dpdk-dev] [PATCH v2 " Xiaoyun Li
  2019-09-06  3:02   ` [dpdk-dev] [PATCH v2 1/4] raw/ntb: setup ntb queue Xiaoyun Li
@ 2019-09-06  3:02   ` Xiaoyun Li
  2019-09-06  3:02   ` [dpdk-dev] [PATCH v2 3/4] raw/ntb: add enqueue and dequeue functions Xiaoyun Li
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 42+ messages in thread
From: Xiaoyun Li @ 2019-09-06  3:02 UTC (permalink / raw)
  To: jingjing.wu, keith.wiles, omkar.maslekar, cunming.liang; +Cc: dev, Xiaoyun Li

Add xstats support for ntb rawdev.
Support tx-packets, tx-bytes, tx-errors and
rx-packets, rx-bytes, rx-missed.

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
---
 drivers/raw/ntb/ntb.c | 135 ++++++++++++++++++++++++++++++++++++------
 drivers/raw/ntb/ntb.h |  11 ++++
 2 files changed, 128 insertions(+), 18 deletions(-)
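
A note on the xstats id layout this patch uses: the first NTB_XSTATS_NUM
slots of hw->ntb_xstats hold the device totals, and queue pair q owns
the slots starting at NTB_XSTATS_NUM * (q + 1). The standalone sketch
below shows how an application might compute ids under that layout. All
sketch_* and SKETCH_* names are hypothetical; the enum order is assumed
to follow the ntb_xstats_names table. Unlike the summing loop in
ntb_xstats_get() as posted, which adds to the existing total, the
sketch zeroes each total before summing so that repeated calls do not
accumulate. That is a choice made for the sketch, not something taken
from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Six counters per group, matching the ntb_xstats_names table. */
#define SKETCH_XSTATS_NUM 6u

/* Assumed to follow the order of ntb_xstats_names. */
enum sketch_xstat {
	SKETCH_TX_PKTS = 0,
	SKETCH_TX_BYTES,
	SKETCH_TX_ERRS,
	SKETCH_RX_PKTS,
	SKETCH_RX_BYTES,
	SKETCH_RX_MISS,
};

/* Number of xstats ids: one group of totals plus one group per queue. */
static uint32_t
sketch_xstats_count(uint16_t queue_pairs)
{
	return SKETCH_XSTATS_NUM * (queue_pairs + 1u);
}

/* Id of counter s for queue pair qp_id; totals use id s directly. */
static uint32_t
sketch_queue_stat_id(uint16_t qp_id, enum sketch_xstat s)
{
	return SKETCH_XSTATS_NUM * (qp_id + 1u) + (uint32_t)s;
}

/* Recompute totals from the per-queue slots, resetting each one first. */
static void
sketch_sum_totals(uint64_t *xstats, uint16_t queue_pairs)
{
	uint32_t i, j;

	for (i = 0; i < SKETCH_XSTATS_NUM; i++) {
		xstats[i] = 0;
		for (j = 0; j < queue_pairs; j++)
			xstats[i] += xstats[SKETCH_XSTATS_NUM * (j + 1) + i];
	}
}
```

With two queue pairs there are 18 ids in total. Rx-packets for queue 0
would be id 9, and Tx-packets for queue 1 would be id 12. These are the
values an application would pass in the ids[] array of
rte_rawdev_xstats_get().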

diff --git a/drivers/raw/ntb/ntb.c b/drivers/raw/ntb/ntb.c
index 02784a134..7f26f8a4b 100644
--- a/drivers/raw/ntb/ntb.c
+++ b/drivers/raw/ntb/ntb.c
@@ -30,6 +30,17 @@ static const struct rte_pci_id pci_id_ntb_map[] = {
 	{ .vendor_id = 0, /* sentinel */ },
 };
 
+/* Align with enum ntb_xstats_idx */
+static struct rte_rawdev_xstats_name ntb_xstats_names[] = {
+	{"Tx-packets"},
+	{"Tx-bytes"},
+	{"Tx-errors"},
+	{"Rx-packets"},
+	{"Rx-bytes"},
+	{"Rx-missed"},
+};
+#define NTB_XSTATS_NUM RTE_DIM(ntb_xstats_names)
+
 static inline void
 ntb_link_cleanup(struct rte_rawdev *dev)
 {
@@ -538,6 +549,10 @@ ntb_queue_init(struct rte_rawdev *dev, uint16_t qp_id)
 	txq->last_avail = 0;
 	txq->nb_tx_free = txq->nb_tx_desc - 1;
 
+	/* Set per queue stats. */
+	for (i = 0; i < NTB_XSTATS_NUM; i++)
+		hw->ntb_xstats[i + NTB_XSTATS_NUM * (qp_id + 1)] = 0;
+
 	return 0;
 }
 
@@ -614,6 +629,7 @@ ntb_dev_configure(const struct rte_rawdev *dev, rte_rawdev_obj_t config)
 {
 	struct ntb_dev_config *conf = config;
 	struct ntb_hw *hw = dev->dev_private;
+	uint32_t xstats_num;
 	int ret;
 
 	hw->queue_pairs	= conf->num_queues;
@@ -624,6 +640,10 @@ ntb_dev_configure(const struct rte_rawdev *dev, rte_rawdev_obj_t config)
 			sizeof(struct ntb_rx_queue *) * hw->queue_pairs, 0);
 	hw->tx_queues = rte_zmalloc("ntb_tx_queues",
 			sizeof(struct ntb_tx_queue *) * hw->queue_pairs, 0);
+	/* First total stats, then per queue stats. */
+	xstats_num = (hw->queue_pairs + 1) * NTB_XSTATS_NUM;
+	hw->ntb_xstats = rte_zmalloc("ntb_xstats", xstats_num *
+				     sizeof(uint64_t), 0);
 
 	/* Start handshake with the peer. */
 	ret = ntb_handshake_work(dev);
@@ -645,6 +665,10 @@ ntb_dev_start(struct rte_rawdev *dev)
 	if (!hw->link_status || !hw->peer_dev_up)
 		return -EINVAL;
 
+	/* Set total stats. */
+	for (i = 0; i < NTB_XSTATS_NUM; i++)
+		hw->ntb_xstats[i] = 0;
+
 	for (i = 0; i < hw->queue_pairs; i++) {
 		ret = ntb_queue_init(dev, i);
 		if (ret) {
@@ -909,38 +933,113 @@ ntb_attr_get(struct rte_rawdev *dev, const char *attr_name,
 }
 
 static int
-ntb_xstats_get(const struct rte_rawdev *dev __rte_unused,
-	       const unsigned int ids[] __rte_unused,
-	       uint64_t values[] __rte_unused,
-	       unsigned int n __rte_unused)
+ntb_xstats_get(const struct rte_rawdev *dev,
+	       const unsigned int ids[],
+	       uint64_t values[],
+	       unsigned int n)
 {
-	return 0;
+	struct ntb_hw *hw = dev->dev_private;
+	uint32_t i, j, off, xstats_num;
+
+	/* Calculate total stats of all queues. */
+	for (i = 0; i < NTB_XSTATS_NUM; i++) {
+		for (j = 0; j < hw->queue_pairs; j++) {
+			off = NTB_XSTATS_NUM * (j + 1) + i;
+			hw->ntb_xstats[i] += hw->ntb_xstats[off];
+		}
+	}
+
+	xstats_num = NTB_XSTATS_NUM * (hw->queue_pairs + 1);
+	for (i = 0; i < n && ids[i] < xstats_num; i++)
+		values[i] = hw->ntb_xstats[ids[i]];
+
+	return i;
 }
 
 static int
-ntb_xstats_get_names(const struct rte_rawdev *dev __rte_unused,
-		     struct rte_rawdev_xstats_name *xstats_names __rte_unused,
-		     unsigned int size __rte_unused)
+ntb_xstats_get_names(const struct rte_rawdev *dev,
+		     struct rte_rawdev_xstats_name *xstats_names,
+		     unsigned int size)
 {
-	return 0;
+	struct ntb_hw *hw = dev->dev_private;
+	uint32_t xstats_num, i, j, off;
+
+	xstats_num = NTB_XSTATS_NUM * (hw->queue_pairs + 1);
+	if (xstats_names == NULL || size < xstats_num)
+		return xstats_num;
+
+	/* Total stats names */
+	for (i = 0; i < NTB_XSTATS_NUM; i++) {
+		strncpy(xstats_names[i].name, ntb_xstats_names[i].name,
+			RTE_RAW_DEV_XSTATS_NAME_SIZE);
+	}
+
+	/* Queue stats names */
+	for (i = 0; i < hw->queue_pairs; i++) {
+		for (j = 0; j < NTB_XSTATS_NUM; j++) {
+			off = j + (i + 1) * NTB_XSTATS_NUM;
+			snprintf(xstats_names[off].name,
+				sizeof(xstats_names[0].name),
+				"%s_q%u", ntb_xstats_names[j].name, i);
+		}
+	}
+
+	return xstats_num;
 }
 
 static uint64_t
-ntb_xstats_get_by_name(const struct rte_rawdev *dev __rte_unused,
-		       const char *name __rte_unused,
-		       unsigned int *id __rte_unused)
+ntb_xstats_get_by_name(const struct rte_rawdev *dev,
+		       const char *name, unsigned int *id)
 {
-	return 0;
+	struct rte_rawdev_xstats_name *xstats_names;
+	struct ntb_hw *hw = dev->dev_private;
+	uint32_t xstats_num, i, j, off;
+
+	if (name == NULL)
+		return -EINVAL;
+
+	xstats_num = NTB_XSTATS_NUM * (hw->queue_pairs + 1);
+	xstats_names = rte_zmalloc("ntb_stats_name",
+				   sizeof(struct rte_rawdev_xstats_name) *
+				   xstats_num, 0);
+	ntb_xstats_get_names(dev, xstats_names, xstats_num);
+
+	/* Calculate total stats of all queues. */
+	for (i = 0; i < NTB_XSTATS_NUM; i++) {
+		for (j = 0; j < hw->queue_pairs; j++) {
+			off = NTB_XSTATS_NUM * (j + 1) + i;
+			hw->ntb_xstats[i] += hw->ntb_xstats[off];
+		}
+	}
+
+	for (i = 0; i < xstats_num; i++) {
+		if (!strncmp(name, xstats_names[i].name,
+		    RTE_RAW_DEV_XSTATS_NAME_SIZE)) {
+			*id = i;
+			rte_free(xstats_names);
+			return hw->ntb_xstats[i];
+		}
+	}
+
+	NTB_LOG(ERR, "Cannot find the xstats name.");
+
+	return -EINVAL;
 }
 
 static int
-ntb_xstats_reset(struct rte_rawdev *dev __rte_unused,
-		 const uint32_t ids[] __rte_unused,
-		 uint32_t nb_ids __rte_unused)
+ntb_xstats_reset(struct rte_rawdev *dev,
+		 const uint32_t ids[],
+		 uint32_t nb_ids)
 {
-	return 0;
-}
+	struct ntb_hw *hw = dev->dev_private;
+	uint32_t i, xstats_num;
 
+	xstats_num = NTB_XSTATS_NUM * (hw->queue_pairs + 1);
+	for (i = 0; i < nb_ids && ids[i] < xstats_num; i++)
+		hw->ntb_xstats[ids[i]] = 0;
+
+	return i;
+}
 
 static const struct rte_rawdev_ops ntb_ops = {
 	.dev_info_get         = ntb_dev_info_get,
diff --git a/drivers/raw/ntb/ntb.h b/drivers/raw/ntb/ntb.h
index 0ad20aed3..09e28050f 100644
--- a/drivers/raw/ntb/ntb.h
+++ b/drivers/raw/ntb/ntb.h
@@ -27,6 +27,15 @@ extern int ntb_logtype;
 
 #define NTB_DFLT_TX_FREE_THRESH     256
 
+enum ntb_xstats_idx {
+	NTB_TX_PKTS_ID = 0,
+	NTB_TX_BYTES_ID,
+	NTB_TX_ERRS_ID,
+	NTB_RX_PKTS_ID,
+	NTB_RX_BYTES_ID,
+	NTB_RX_MISS_ID,
+};
+
 enum ntb_topo {
 	NTB_TOPO_NONE = 0,
 	NTB_TOPO_B2B_USD,
@@ -216,6 +225,8 @@ struct ntb_hw {
 
 	uint8_t peer_used_mws;
 
+	uint64_t *ntb_xstats;
+
 	/* Reserve several spad for app to use. */
 	int spad_user_list[NTB_SPAD_USER_MAX_NUM];
 };
-- 
2.17.1


* [dpdk-dev] [PATCH v2 3/4] raw/ntb: add enqueue and dequeue functions
  2019-09-06  3:02 ` [dpdk-dev] [PATCH v2 " Xiaoyun Li
  2019-09-06  3:02   ` [dpdk-dev] [PATCH v2 1/4] raw/ntb: setup ntb queue Xiaoyun Li
  2019-09-06  3:02   ` [dpdk-dev] [PATCH v2 2/4] raw/ntb: add xstats support Xiaoyun Li
@ 2019-09-06  3:02   ` Xiaoyun Li
  2019-09-06  3:02   ` [dpdk-dev] [PATCH v2 4/4] examples/ntb: support more functions for NTB Xiaoyun Li
  2019-09-06  7:53   ` [dpdk-dev] [PATCH v3 0/4] enable FIFO " Xiaoyun Li
  4 siblings, 0 replies; 42+ messages in thread
From: Xiaoyun Li @ 2019-09-06  3:02 UTC (permalink / raw)
  To: jingjing.wu, keith.wiles, omkar.maslekar, cunming.liang; +Cc: dev, Xiaoyun Li

Introduce enqueue and dequeue functions to support packet based
processing. Also enable write-combining for the ntb driver, since
it improves performance significantly.

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
---
 doc/guides/rawdevs/ntb.rst     |  28 ++++
 drivers/raw/ntb/ntb.c          | 242 ++++++++++++++++++++++++++++++---
 drivers/raw/ntb/ntb.h          |   2 +
 drivers/raw/ntb/ntb_hw_intel.c |  22 +++
 4 files changed, 275 insertions(+), 19 deletions(-)
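
Reading note on the ring arithmetic used by enqueue and dequeue below:
the descriptor count is assumed to be a power of two, so indices wrap
with a mask instead of a modulo, and a burst that crosses the end of the
ring is copied in two pieces (nb1 before the wrap, nb2 after it). The
following is a minimal standalone sketch of those three steps. The
helper names ring_next, ring_distance and ring_split are hypothetical
and do not appear in the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Advance an index by one slot; size must be a power of two. */
static uint16_t
ring_next(uint16_t idx, uint16_t size)
{
	assert(size != 0 && (size & (size - 1)) == 0);
	return (uint16_t)((idx + 1u) & (size - 1u));
}

/* Number of slots from 'from' up to, but not including, 'to'. */
static uint16_t
ring_distance(uint16_t from, uint16_t to, uint16_t size)
{
	assert(size != 0 && (size & (size - 1)) == 0);
	/* Unsigned subtraction wraps, so the mask yields the modular gap. */
	return (uint16_t)(((unsigned int)to - from) & (size - 1u));
}

/* Split n entries starting at 'start' into the parts before/after the wrap. */
static void
ring_split(uint16_t start, uint16_t n, uint16_t size,
	   uint16_t *nb1, uint16_t *nb2)
{
	uint16_t room = (uint16_t)(size - start);

	*nb1 = n > room ? room : n;
	*nb2 = (uint16_t)(n - *nb1);
}
```

For a 1024-entry ring, eight entries written from index 1020 split into
four at the tail of the ring and four at the start.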

diff --git a/doc/guides/rawdevs/ntb.rst b/doc/guides/rawdevs/ntb.rst
index 99e7db441..afd5769fc 100644
--- a/doc/guides/rawdevs/ntb.rst
+++ b/doc/guides/rawdevs/ntb.rst
@@ -45,6 +45,24 @@ to use, i.e. igb_uio, vfio. The ``dpdk-devbind.py`` script can be used to
 show devices status and to bind them to a suitable kernel driver. They will
 appear under the category of "Misc (rawdev) devices".
 
+Prerequisites
+-------------
+The NTB PMD needs the kernel PCI driver to support write combining (WC)
+to get better performance. The difference is more than 10 times.
+There are 2 ways to enable WC.
+- Insert igb_uio with the ``wc_active=1`` flag if using the igb_uio driver.
+     insmod igb_uio.ko wc_active=1
+- Enable WC for the NTB device's BAR 2 and BAR 4 (mapped memory) manually.
+     Get the BAR base addresses using ``lspci -vvv -s ae:00.0 | grep Region``.
+        Region 0: Memory at 39bfe0000000 (64-bit, prefetchable) [size=64K]
+        Region 2: Memory at 39bfa0000000 (64-bit, prefetchable) [size=512M]
+        Region 4: Memory at 39bfc0000000 (64-bit, prefetchable) [size=512M]
+     Use the following commands to enable WC.
+     echo "base=0x39bfa0000000 size=0x20000000 type=write-combining" >> /proc/mtrr
+     echo "base=0x39bfc0000000 size=0x20000000 type=write-combining" >> /proc/mtrr
+     To disable WC for these regions, use the following.
+     echo "disable=1" >> /proc/mtrr
+
 Ring Layout
 -----------
 
@@ -83,6 +101,16 @@ like the following:
       +------------------------+   +------------------------+
                     <---------traffic---------
 
+- Enqueue and Dequeue
+  Based on this ring layout, enqueue reads rx_tail to learn how many
+  buffers are free, then writes used_ring and tx_tail to tell the peer
+  which buffers are filled with data.
+  Dequeue reads tx_tail to learn how many packets have arrived, then
+  writes desc_ring and rx_tail to tell the peer about the newly
+  allocated buffers.
+  This way only remote writes happen and remote reads are avoided,
+  which gives better performance.
+
 Limitation
 ----------
 
diff --git a/drivers/raw/ntb/ntb.c b/drivers/raw/ntb/ntb.c
index 7f26f8a4b..c227afab9 100644
--- a/drivers/raw/ntb/ntb.c
+++ b/drivers/raw/ntb/ntb.c
@@ -556,26 +556,140 @@ ntb_queue_init(struct rte_rawdev *dev, uint16_t qp_id)
 	return 0;
 }
 
+static inline void
+ntb_enqueue_cleanup(struct ntb_tx_queue *txq)
+{
+	struct ntb_tx_entry *sw_ring = txq->sw_ring;
+	uint16_t tx_free = txq->last_avail;
+	uint16_t nb_to_clean, i;
+
+	/* avail_cnt + 1 represents where to rx next in the peer. */
+	nb_to_clean = (*txq->avail_cnt - txq->last_avail + 1 +
+			txq->nb_tx_desc) & (txq->nb_tx_desc - 1);
+	nb_to_clean = RTE_MIN(nb_to_clean, txq->tx_free_thresh);
+	for (i = 0; i < nb_to_clean; i++) {
+		if (sw_ring[tx_free].mbuf)
+			rte_pktmbuf_free_seg(sw_ring[tx_free].mbuf);
+		tx_free = (tx_free + 1) & (txq->nb_tx_desc - 1);
+	}
+
+	txq->nb_tx_free += nb_to_clean;
+	txq->last_avail = tx_free;
+}
+
 static int
 ntb_enqueue_bufs(struct rte_rawdev *dev,
 		 struct rte_rawdev_buf **buffers,
 		 unsigned int count,
 		 rte_rawdev_obj_t context)
 {
-	/* Not FIFO right now. Just for testing memory write. */
 	struct ntb_hw *hw = dev->dev_private;
-	unsigned int i;
-	void *bar_addr;
-	size_t size;
+	struct ntb_tx_queue *txq = hw->tx_queues[(size_t)context];
+	struct ntb_tx_entry *sw_ring = txq->sw_ring;
+	struct rte_mbuf *txm;
+	struct ntb_used tx_used[NTB_MAX_DESC_SIZE];
+	volatile struct ntb_desc *tx_item;
+	uint16_t tx_last, nb_segs, off, last_used, avail_cnt;
+	uint16_t nb_mbufs = 0;
+	uint16_t nb_tx = 0;
+	uint64_t bytes = 0;
+	void *buf_addr;
+	int i;
 
-	if (hw->ntb_ops->get_peer_mw_addr == NULL)
-		return -ENOTSUP;
-	bar_addr = (*hw->ntb_ops->get_peer_mw_addr)(dev, 0);
-	size = (size_t)context;
+	if (unlikely(hw->ntb_ops->ioremap == NULL)) {
+		NTB_LOG(ERR, "Ioremap not supported.");
+		return nb_tx;
+	}
 
-	for (i = 0; i < count; i++)
-		rte_memcpy(bar_addr, buffers[i]->buf_addr, size);
-	return 0;
+	if (unlikely(dev->started == 0 || hw->peer_dev_up == 0)) {
+		NTB_LOG(DEBUG, "Link is not up.");
+		return nb_tx;
+	}
+
+	if (txq->nb_tx_free < txq->tx_free_thresh)
+		ntb_enqueue_cleanup(txq);
+
+	off = NTB_XSTATS_NUM * ((size_t)context + 1);
+	last_used = txq->last_used;
+	avail_cnt = *txq->avail_cnt;/* Where to alloc next. */
+	for (nb_tx = 0; nb_tx < count; nb_tx++) {
+		txm = (struct rte_mbuf *)(buffers[nb_tx]->buf_addr);
+		if (txm == NULL || txq->nb_tx_free < txm->nb_segs)
+			break;
+
+		tx_last = (txq->last_used + txm->nb_segs - 1) &
+			  (txq->nb_tx_desc - 1);
+		nb_segs = txm->nb_segs;
+		for (i = 0; i < nb_segs; i++) {
+			/* Not enough ring space for tx. */
+			if (txq->last_used == avail_cnt)
+				goto end_of_tx;
+			sw_ring[txq->last_used].mbuf = txm;
+			tx_item = txq->tx_desc_ring + txq->last_used;
+
+			if (!tx_item->len) {
+				(hw->ntb_xstats[NTB_TX_ERRS_ID + off])++;
+				goto end_of_tx;
+			}
+			if (txm->data_len > tx_item->len) {
+				NTB_LOG(ERR, "Data length exceeds buf length."
+					" Only %u data would be transmitted.",
+					tx_item->len);
+				txm->data_len = tx_item->len;
+			}
+
+			/* translate remote virtual addr to bar virtual addr */
+			buf_addr = (*hw->ntb_ops->ioremap)(dev, tx_item->addr);
+			if (buf_addr == NULL) {
+				(hw->ntb_xstats[NTB_TX_ERRS_ID + off])++;
+				NTB_LOG(ERR, "Null remap addr.");
+				goto end_of_tx;
+			}
+			rte_memcpy(buf_addr, rte_pktmbuf_mtod(txm, void *),
+				   txm->data_len);
+
+			tx_used[nb_mbufs].len = txm->data_len;
+			tx_used[nb_mbufs++].flags = (txq->last_used ==
+						    tx_last) ?
+						    NTB_FLAG_EOP : 0;
+
+			/* update stats */
+			bytes += txm->data_len;
+
+			txm = txm->next;
+
+			sw_ring[txq->last_used].next_id = (txq->last_used + 1) &
+						  (txq->nb_tx_desc - 1);
+			sw_ring[txq->last_used].last_id = tx_last;
+			txq->last_used = (txq->last_used + 1) &
+					 (txq->nb_tx_desc - 1);
+		}
+		txq->nb_tx_free -= nb_segs;
+	}
+
+end_of_tx:
+	if (nb_tx) {
+		uint16_t nb1, nb2;
+		if (nb_mbufs > txq->nb_tx_desc - last_used) {
+			nb1 = txq->nb_tx_desc - last_used;
+			nb2 = nb_mbufs - txq->nb_tx_desc + last_used;
+		} else {
+			nb1 = nb_mbufs;
+			nb2 = 0;
+		}
+		rte_memcpy(txq->tx_used_ring + last_used, tx_used,
+			   sizeof(struct ntb_used) * nb1);
+		rte_memcpy(txq->tx_used_ring, tx_used + nb1,
+			   sizeof(struct ntb_used) * nb2);
+		*txq->used_cnt = txq->last_used;
+		rte_wmb();
+
+		/* update queue stats */
+		hw->ntb_xstats[NTB_TX_BYTES_ID + off] += bytes;
+		hw->ntb_xstats[NTB_TX_PKTS_ID + off] += nb_tx;
+	}
+
+	return nb_tx;
 }
 
 static int
@@ -584,16 +698,106 @@ ntb_dequeue_bufs(struct rte_rawdev *dev,
 		 unsigned int count,
 		 rte_rawdev_obj_t context)
 {
-	/* Not FIFO. Just for testing memory read. */
 	struct ntb_hw *hw = dev->dev_private;
-	unsigned int i;
-	size_t size;
+	struct ntb_rx_queue *rxq = hw->rx_queues[(size_t)context];
+	struct ntb_rx_entry *sw_ring = rxq->sw_ring;
+	struct ntb_desc rx_desc[NTB_MAX_DESC_SIZE];
+	struct rte_mbuf *first, *rxm_t;
+	struct rte_mbuf *prev = NULL;
+	volatile struct ntb_used *rx_item;
+	uint16_t nb_mbufs = 0;
+	uint16_t nb_rx = 0;
+	uint64_t bytes = 0;
+	uint16_t off, last_avail, used_cnt, used_nb;
+	int i;
 
-	size = (size_t)context;
+	if (unlikely(dev->started == 0 || hw->peer_dev_up == 0)) {
+		NTB_LOG(DEBUG, "Link is not up");
+		return nb_rx;
+	}
+
+	used_cnt = *rxq->used_cnt;
+
+	if (rxq->last_used == used_cnt)
+		return nb_rx;
+
+	last_avail = rxq->last_avail;
+	used_nb = (used_cnt - rxq->last_used) & (rxq->nb_rx_desc - 1);
+	count = RTE_MIN(count, used_nb);
+	for (nb_rx = 0; nb_rx < count; nb_rx++) {
+		i = 0;
+		while (true) {
+			rx_item = rxq->rx_used_ring + rxq->last_used;
+			rxm_t = sw_ring[rxq->last_used].mbuf;
+			rxm_t->data_len = rx_item->len;
+			rxm_t->data_off = RTE_PKTMBUF_HEADROOM;
+			rxm_t->port = rxq->port_id;
+
+			if (!i) {
+				rxm_t->nb_segs = 1;
+				first = rxm_t;
+				first->pkt_len = 0;
+				buffers[nb_rx]->buf_addr = rxm_t;
+			} else {
+				prev->next = rxm_t;
+				first->nb_segs++;
+			}
 
-	for (i = 0; i < count; i++)
-		rte_memcpy(buffers[i]->buf_addr, hw->mz[i]->addr, size);
-	return 0;
+			prev = rxm_t;
+			first->pkt_len += prev->data_len;
+			rxq->last_used = (rxq->last_used + 1) &
+					 (rxq->nb_rx_desc - 1);
+
+			/* alloc new mbuf */
+			rxm_t = rte_mbuf_raw_alloc(rxq->mpool);
+			if (unlikely(rxm_t == NULL)) {
+				NTB_LOG(ERR, "recv alloc mbuf failed.");
+				goto end_of_rx;
+			}
+			rxm_t->port = rxq->port_id;
+			sw_ring[rxq->last_avail].mbuf = rxm_t;
+			i++;
+
+			/* fill new desc */
+			rx_desc[nb_mbufs].addr =
+					rte_pktmbuf_mtod(rxm_t, uint64_t);
+			rx_desc[nb_mbufs++].len = rxm_t->buf_len -
+						  RTE_PKTMBUF_HEADROOM;
+			rxq->last_avail = (rxq->last_avail + 1) &
+					  (rxq->nb_rx_desc - 1);
+
+			if (rx_item->flags & NTB_FLAG_EOP)
+				break;
+		}
+		/* update stats */
+		bytes += first->pkt_len;
+	}
+
+end_of_rx:
+	if (nb_rx) {
+		uint16_t nb1, nb2;
+		if (nb_mbufs > rxq->nb_rx_desc - last_avail) {
+			nb1 = rxq->nb_rx_desc - last_avail;
+			nb2 = nb_mbufs - rxq->nb_rx_desc + last_avail;
+		} else {
+			nb1 = nb_mbufs;
+			nb2 = 0;
+		}
+		rte_memcpy(rxq->rx_desc_ring + last_avail, rx_desc,
+			   sizeof(struct ntb_desc) * nb1);
+		rte_memcpy(rxq->rx_desc_ring, rx_desc + nb1,
+			   sizeof(struct ntb_desc) * nb2);
+		*rxq->avail_cnt = rxq->last_avail;
+		rte_wmb();
+
+		/* update queue stats */
+		off = NTB_XSTATS_NUM * ((size_t)context + 1);
+		hw->ntb_xstats[NTB_RX_BYTES_ID + off] += bytes;
+		hw->ntb_xstats[NTB_RX_PKTS_ID + off] += nb_rx;
+		hw->ntb_xstats[NTB_RX_MISS_ID + off] += (count - nb_rx);
+	}
+
+	return nb_rx;
 }
 
 static void
@@ -1242,7 +1446,7 @@ ntb_remove(struct rte_pci_device *pci_dev)
 
 static struct rte_pci_driver rte_ntb_pmd = {
 	.id_table = pci_id_ntb_map,
-	.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_WC_ACTIVATE,
 	.probe = ntb_probe,
 	.remove = ntb_remove,
 };
diff --git a/drivers/raw/ntb/ntb.h b/drivers/raw/ntb/ntb.h
index 09e28050f..eff1f6f07 100644
--- a/drivers/raw/ntb/ntb.h
+++ b/drivers/raw/ntb/ntb.h
@@ -87,6 +87,7 @@ enum ntb_spad_idx {
  * @ntb_dev_init: Init ntb dev.
  * @get_peer_mw_addr: To get the addr of peer mw[mw_idx].
  * @mw_set_trans: Set translation of internal memory that remote can access.
+ * @ioremap: Translate the remote host address to bar address.
  * @get_link_status: get link status, link speed and link width.
  * @set_link: Set local side up/down.
  * @spad_read: Read local/peer spad register val.
@@ -103,6 +104,7 @@ struct ntb_dev_ops {
 	void *(*get_peer_mw_addr)(const struct rte_rawdev *dev, int mw_idx);
 	int (*mw_set_trans)(const struct rte_rawdev *dev, int mw_idx,
 			    uint64_t addr, uint64_t size);
+	void *(*ioremap)(const struct rte_rawdev *dev, uint64_t addr);
 	int (*get_link_status)(const struct rte_rawdev *dev);
 	int (*set_link)(const struct rte_rawdev *dev, bool up);
 	uint32_t (*spad_read)(const struct rte_rawdev *dev, int spad,
diff --git a/drivers/raw/ntb/ntb_hw_intel.c b/drivers/raw/ntb/ntb_hw_intel.c
index 0e73f1609..e7f8667cd 100644
--- a/drivers/raw/ntb/ntb_hw_intel.c
+++ b/drivers/raw/ntb/ntb_hw_intel.c
@@ -162,6 +162,27 @@ intel_ntb_mw_set_trans(const struct rte_rawdev *dev, int mw_idx,
 	return 0;
 }
 
+static void *
+intel_ntb_ioremap(const struct rte_rawdev *dev, uint64_t addr)
+{
+	struct ntb_hw *hw = dev->dev_private;
+	void *mapped = NULL;
+	void *base;
+	int i;
+
+	for (i = 0; i < hw->peer_used_mws; i++) {
+		if (addr >= hw->peer_mw_base[i] &&
+		    addr < hw->peer_mw_base[i] + hw->mw_size[i]) {
+			base = intel_ntb_get_peer_mw_addr(dev, i);
+			mapped = (void *)(size_t)(addr - hw->peer_mw_base[i] +
+				 (size_t)base);
+			break;
+		}
+	}
+
+	return mapped;
+}
+
 static int
 intel_ntb_get_link_status(const struct rte_rawdev *dev)
 {
@@ -357,6 +378,7 @@ const struct ntb_dev_ops intel_ntb_ops = {
 	.ntb_dev_init       = intel_ntb_dev_init,
 	.get_peer_mw_addr   = intel_ntb_get_peer_mw_addr,
 	.mw_set_trans       = intel_ntb_mw_set_trans,
+	.ioremap            = intel_ntb_ioremap,
 	.get_link_status    = intel_ntb_get_link_status,
 	.set_link           = intel_ntb_set_link,
 	.spad_read          = intel_ntb_spad_read,
-- 
2.17.1
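
Reading note on the ioremap callback in the patch above: it takes an
address in the peer's address space, finds the memory window that
contains it, and returns the same offset inside the locally mapped BAR.
The following is a minimal standalone sketch of that translation under
stated assumptions. struct mw_window and mw_translate are hypothetical
names, not the driver's types, and the sketch treats each window as the
half-open range [base, base + size), so an address exactly at
base + size is rejected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one memory window; not the driver's struct. */
struct mw_window {
	uint64_t peer_base; /* start of the window in the peer's address space */
	uint64_t size;      /* window length in bytes */
	uint8_t *local_bar; /* where that window is mapped locally */
};

/* Return the local pointer for a peer address, or NULL if no window holds it. */
static void *
mw_translate(const struct mw_window *mw, size_t nb_mw, uint64_t addr)
{
	size_t i;

	assert(mw != NULL || nb_mw == 0);
	for (i = 0; i < nb_mw; i++) {
		/* Comparing the offset against size avoids overflow in base + size. */
		if (addr >= mw[i].peer_base &&
		    addr - mw[i].peer_base < mw[i].size)
			return mw[i].local_bar + (addr - mw[i].peer_base);
	}
	return NULL;
}
```

A caller is expected to treat NULL as a transmit error, which is what
the enqueue path does when the remap fails.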


* [dpdk-dev] [PATCH v2 4/4] examples/ntb: support more functions for NTB
  2019-09-06  3:02 ` [dpdk-dev] [PATCH v2 " Xiaoyun Li
                     ` (2 preceding siblings ...)
  2019-09-06  3:02   ` [dpdk-dev] [PATCH v2 3/4] raw/ntb: add enqueue and dequeue functions Xiaoyun Li
@ 2019-09-06  3:02   ` Xiaoyun Li
  2019-09-06  7:53   ` [dpdk-dev] [PATCH v3 0/4] enable FIFO " Xiaoyun Li
  4 siblings, 0 replies; 42+ messages in thread
From: Xiaoyun Li @ 2019-09-06  3:02 UTC (permalink / raw)
  To: jingjing.wu, keith.wiles, omkar.maslekar, cunming.liang; +Cc: dev, Xiaoyun Li

Support transmitting files between two systems.
Support iofwd between one ethdev and the NTB device.
Support rxonly and txonly for the NTB device.
Support setting the forwarding mode to file-trans, txonly,
rxonly or iofwd.
Support showing/clearing port stats and throughput.

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
---
 doc/guides/sample_app_ug/ntb.rst |   59 +-
 examples/ntb/meson.build         |    3 +
 examples/ntb/ntb_fwd.c           | 1298 +++++++++++++++++++++++++++---
 3 files changed, 1232 insertions(+), 128 deletions(-)
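
Reading note on the file-trans path below: the sender publishes the
64-bit file size through two 32-bit scratchpad attributes (spad_user_0
for the high half, spad_user_1 for the low half), and then cuts the file
into mbuf-sized chunks grouped into bursts. The following is a minimal
standalone sketch of that arithmetic. The helper names are hypothetical,
and ceil_div is written as quotient plus remainder test rather than
(a + b - 1) / b so that it cannot overflow for large sizes.

```c
#include <assert.h>
#include <stdint.h>

/* Ceiling division: how many chunks of 'unit' are needed to cover 'total'. */
static uint64_t
ceil_div(uint64_t total, uint64_t unit)
{
	assert(unit != 0);
	return total / unit + (total % unit != 0);
}

/* Split a 64-bit size across two 32-bit scratchpad values. */
static void
size_to_spads(uint64_t size, uint32_t *hi, uint32_t *lo)
{
	*hi = (uint32_t)(size >> 32);
	*lo = (uint32_t)size;
}

/* Rebuild the size on the receiving side. */
static uint64_t
spads_to_size(uint32_t hi, uint32_t lo)
{
	return ((uint64_t)hi << 32) | lo;
}
```

The packet count is ceil_div(size, buf_size) and the burst count is
ceil_div(count, pkt_burst), matching the two divisions in
cmd_sendfile_parsed.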

diff --git a/doc/guides/sample_app_ug/ntb.rst b/doc/guides/sample_app_ug/ntb.rst
index 079242175..f8291d7d1 100644
--- a/doc/guides/sample_app_ug/ntb.rst
+++ b/doc/guides/sample_app_ug/ntb.rst
@@ -5,8 +5,17 @@ NTB Sample Application
 ======================
 
 The ntb sample application shows how to use ntb rawdev driver.
-This sample provides interactive mode to transmit file between
-two hosts.
+This sample provides an interactive mode for packet based processing
+between two systems.
+
+This sample supports 4 packet forwarding modes.
+
+* ``file-trans``: transmit files between two systems. The sample
+  polls for files from the peer and saves each one as
+  ``ntb_recv_file[N]``, where [N] is the index of the received file.
+* ``rxonly``: NTB receives packets but doesn't transmit them.
+* ``txonly``: NTB generates and transmits packets without receiving any.
+* ``iofwd``: forward packets between the NTB device and an ethdev.
 
 Compiling the Application
 -------------------------
@@ -29,6 +38,40 @@ Refer to the *DPDK Getting Started Guide* for general information on
 running applications and the Environment Abstraction Layer (EAL)
 options.
 
+Command-line Options
+--------------------
+
+The application supports the following command-line options.
+
+* ``--buf-size=N``
+
+  Set the data size of the mbufs used to N bytes, where N < 65536.
+  The default value is 2048.
+
+* ``--fwd-mode=mode``
+
+  Set the packet forwarding mode as ``file-trans``, ``txonly``,
+  ``rxonly`` or ``iofwd``.
+
+* ``--nb-desc=N``
+
+  Set the number of descriptors per queue (the queue size) to N,
+  where 64 <= N <= 1024. The default value is 1024.
+
+* ``--txfreet=N``
+
+  Set the transmit free threshold of TX rings to N, where 0 <= N <=
+  the value of ``--nb-desc``. The default value is 256.
+
+* ``--burst=N``
+
+  Set the number of packets per burst to N, where 1 <= N <= 32.
+  The default value is 32.
+
+* ``--qp=N``
+
+  Set the number of queues to N, where N > 0. The default value is 1.
+
 Using the application
 ---------------------
 
@@ -41,7 +84,11 @@ The application is console-driven using the cmdline DPDK interface:
 From this interface the available commands and descriptions of what
 they do as as follows:
 
-* ``send [filepath]``: Send file to the peer host.
-* ``receive [filepath]``: Receive file to [filepath]. Need the peer
-  to send file successfully first.
-* ``quit``: Exit program
+* ``send [filepath]``: Send a file to the peer host. The forwarding
+  mode must be set to file-trans first.
+* ``start``: Start transmission.
+* ``stop``: Stop transmission.
+* ``show/clear port stats``: Show/Clear port stats and throughput.
+* ``set fwd file-trans/rxonly/txonly/iofwd``: Set packet forwarding
+  mode.
+* ``quit``: Exit program.
diff --git a/examples/ntb/meson.build b/examples/ntb/meson.build
index 9a6288f4f..f5435fe12 100644
--- a/examples/ntb/meson.build
+++ b/examples/ntb/meson.build
@@ -14,3 +14,6 @@ cflags += ['-D_FILE_OFFSET_BITS=64']
 sources = files(
 	'ntb_fwd.c'
 )
+if dpdk_conf.has('RTE_LIBRTE_PMD_NTB_RAWDEV')
+	deps += 'rawdev_ntb'
+endif
diff --git a/examples/ntb/ntb_fwd.c b/examples/ntb/ntb_fwd.c
index f8c970cdb..b1ea71c8f 100644
--- a/examples/ntb/ntb_fwd.c
+++ b/examples/ntb/ntb_fwd.c
@@ -14,21 +14,103 @@
 #include <cmdline.h>
 #include <rte_common.h>
 #include <rte_rawdev.h>
+#include <rte_ethdev.h>
+#include <rte_malloc.h>
 #include <rte_lcore.h>
+#include <rte_cycles.h>
+#include <rte_pmd_ntb.h>
 
-#define NTB_DRV_NAME_LEN	7
-static uint64_t max_file_size = 0x400000;
+/* Per-port statistics struct */
+struct ntb_port_statistics {
+	uint64_t tx;
+	uint64_t rx;
+} __rte_cache_aligned;
+/* Port 0: NTB dev, Port 1: ethdev when iofwd. */
+struct ntb_port_statistics ntb_port_stats[2];
+
+struct ntb_fwd_stream {
+	uint16_t tx_port;
+	uint16_t rx_port;
+	uint16_t qp_id;
+	uint8_t tx_ntb;  /* If ntb device is tx port. */
+};
+
+struct ntb_fwd_lcore_conf {
+	uint16_t stream_id;
+	uint16_t nb_stream;
+	uint8_t stopped;
+};
+
+enum ntb_fwd_mode {
+	FILE_TRANS = 0,
+	RXONLY,
+	TXONLY,
+	IOFWD,
+	MAX_FWD_MODE,
+};
+static const char *const fwd_mode_s[] = {
+	"file-trans",
+	"rxonly",
+	"txonly",
+	"iofwd",
+	NULL,
+};
+static enum ntb_fwd_mode fwd_mode = MAX_FWD_MODE;
+
+static struct ntb_fwd_lcore_conf fwd_lcore_conf[RTE_MAX_LCORE];
+static struct ntb_fwd_stream *fwd_streams;
+
+static struct rte_mempool *mbuf_pool;
+
+#define NTB_DRV_NAME_LEN 7
+#define MEMPOOL_CACHE_SIZE 256
+
+static uint8_t in_test;
 static uint8_t interactive = 1;
+static uint16_t eth_port_id = RTE_MAX_ETHPORTS;
 static uint16_t dev_id;
 
+/* Number of queues, default set as 1 */
+static uint16_t num_queues = 1;
+static uint16_t ntb_buf_size = RTE_MBUF_DEFAULT_BUF_SIZE;
+
+/* Configurable number of descriptors */
+#define NTB_DEFAULT_NUM_DESCS 1024
+static uint16_t nb_desc = NTB_DEFAULT_NUM_DESCS;
+
+static uint16_t tx_free_thresh;
+
+#define NTB_MAX_PKT_BURST 32
+#define NTB_DFLT_PKT_BURST 32
+static uint16_t pkt_burst = NTB_DFLT_PKT_BURST;
+
+#define BURST_TX_RETRIES 64
+
+static struct rte_eth_conf eth_port_conf = {
+	.rxmode = {
+		.mq_mode = ETH_MQ_RX_RSS,
+		.split_hdr_size = 0,
+	},
+	.rx_adv_conf = {
+		.rss_conf = {
+			.rss_key = NULL,
+			.rss_hf = ETH_RSS_IP,
+		},
+	},
+	.txmode = {
+		.mq_mode = ETH_MQ_TX_NONE,
+	},
+};
+
 /* *** Help command with introduction. *** */
 struct cmd_help_result {
 	cmdline_fixed_string_t help;
 };
 
-static void cmd_help_parsed(__attribute__((unused)) void *parsed_result,
-			    struct cmdline *cl,
-			    __attribute__((unused)) void *data)
+static void
+cmd_help_parsed(__attribute__((unused)) void *parsed_result,
+		struct cmdline *cl,
+		__attribute__((unused)) void *data)
 {
 	cmdline_printf(
 		cl,
@@ -37,13 +119,17 @@ static void cmd_help_parsed(__attribute__((unused)) void *parsed_result,
 		"Control:\n"
 		"    quit                                      :"
 		" Quit the application.\n"
-		"\nFile transmit:\n"
+		"\nTransmission:\n"
 		"    send [path]                               :"
-		" Send [path] file. (No more than %"PRIu64")\n"
-		"    recv [path]                            :"
-		" Receive file to [path]. Make sure sending is done"
-		" on the other side.\n",
-		max_file_size
+		" Send [path] file. Only takes effect in file-trans mode.\n"
+		"    start                                     :"
+		" Start transmissions.\n"
+		"    stop                                      :"
+		" Stop transmissions.\n"
+		"    clear/show port stats                     :"
+		" Clear/show port stats.\n"
+		"    set fwd file-trans/rxonly/txonly/iofwd    :"
+		" Set packet forwarding mode.\n"
 	);
 
 }
@@ -66,13 +152,37 @@ struct cmd_quit_result {
 	cmdline_fixed_string_t quit;
 };
 
-static void cmd_quit_parsed(__attribute__((unused)) void *parsed_result,
-			    struct cmdline *cl,
-			    __attribute__((unused)) void *data)
+static void
+cmd_quit_parsed(__attribute__((unused)) void *parsed_result,
+		struct cmdline *cl,
+		__attribute__((unused)) void *data)
 {
+	struct ntb_fwd_lcore_conf *conf;
+	uint8_t lcore_id;
+
+	/* Stop transmission first. */
+	RTE_LCORE_FOREACH_SLAVE(lcore_id) {
+		conf = &fwd_lcore_conf[lcore_id];
+
+		if (!conf->nb_stream)
+			continue;
+
+		if (conf->stopped)
+			continue;
+
+		conf->stopped = 1;
+	}
+	printf("\nWaiting for lcores to finish...\n");
+	rte_eal_mp_wait_lcore();
+	in_test = 0;
+
 	/* Stop traffic and Close port. */
 	rte_rawdev_stop(dev_id);
 	rte_rawdev_close(dev_id);
+	if (eth_port_id < RTE_MAX_ETHPORTS && fwd_mode == IOFWD) {
+		rte_eth_dev_stop(eth_port_id);
+		rte_eth_dev_close(eth_port_id);
+	}
 
 	cmdline_quit(cl);
 }
@@ -102,21 +212,19 @@ cmd_sendfile_parsed(void *parsed_result,
 		    __attribute__((unused)) void *data)
 {
 	struct cmd_sendfile_result *res = parsed_result;
-	struct rte_rawdev_buf *pkts_send[1];
-	uint64_t rsize, size, link;
-	uint8_t *buff;
+	struct rte_rawdev_buf *pkts_send[NTB_MAX_PKT_BURST];
+	struct rte_mbuf *mbuf_send[NTB_MAX_PKT_BURST];
+	uint64_t size, count, i, nb_burst;
+	uint16_t nb_tx, buf_size;
+	unsigned int nb_pkt;
+	size_t queue_id = 0;
+	uint16_t retry = 0;
 	uint32_t val;
 	FILE *file;
 
-	if (!rte_rawdevs[dev_id].started) {
-		printf("Device needs to be up first. Try later.\n");
-		return;
-	}
-
-	rte_rawdev_get_attr(dev_id, "link_status", &link);
-	if (!link) {
-		printf("Link is not up, cannot send file.\n");
-		return;
+	if (num_queues != 1) {
+		printf("File transmission only supports 1 queue.\n");
+		num_queues = 1;
 	}
 
 	file = fopen(res->filepath, "r");
@@ -127,30 +235,13 @@ cmd_sendfile_parsed(void *parsed_result,
 
 	if (fseek(file, 0, SEEK_END) < 0) {
 		printf("Fail to get file size.\n");
+		fclose(file);
 		return;
 	}
 	size = ftell(file);
 	if (fseek(file, 0, SEEK_SET) < 0) {
 		printf("Fail to get file size.\n");
-		return;
-	}
-
-	/**
-	 * No FIFO now. Only test memory. Limit sending file
-	 * size <= max_file_size.
-	 */
-	if (size > max_file_size) {
-		printf("Warning: The file is too large. Only send first"
-		       " %"PRIu64" bits.\n", max_file_size);
-		size = max_file_size;
-	}
-
-	buff = (uint8_t *)malloc(size);
-	rsize = fread(buff, size, 1, file);
-	if (rsize != 1) {
-		printf("Fail to read file.\n");
 		fclose(file);
-		free(buff);
 		return;
 	}
 
@@ -159,22 +250,63 @@ cmd_sendfile_parsed(void *parsed_result,
 	rte_rawdev_set_attr(dev_id, "spad_user_0", val);
 	val = size;
 	rte_rawdev_set_attr(dev_id, "spad_user_1", val);
+	printf("Sending file, size is %"PRIu64"\n", size);
+
+	for (i = 0; i < NTB_MAX_PKT_BURST; i++)
+		pkts_send[i] = (struct rte_rawdev_buf *)
+				malloc(sizeof(struct rte_rawdev_buf));
+
+	buf_size = ntb_buf_size - RTE_PKTMBUF_HEADROOM;
+	count = (size + buf_size - 1) / buf_size;
+	nb_burst = (count + pkt_burst - 1) / pkt_burst;
 
-	pkts_send[0] = (struct rte_rawdev_buf *)malloc
-			(sizeof(struct rte_rawdev_buf));
-	pkts_send[0]->buf_addr = buff;
+	for (i = 0; i < nb_burst; i++) {
+		val = RTE_MIN(count, pkt_burst);
+		if (rte_mempool_get_bulk(mbuf_pool, (void **)mbuf_send,
+					val) == 0) {
+			for (nb_pkt = 0; nb_pkt < val; nb_pkt++) {
+				mbuf_send[nb_pkt]->port = dev_id;
+				mbuf_send[nb_pkt]->data_len =
+				fread(rte_pktmbuf_mtod(mbuf_send[nb_pkt],
+					void *), 1, buf_size, file);
+				mbuf_send[nb_pkt]->pkt_len =
+					mbuf_send[nb_pkt]->data_len;
+				pkts_send[nb_pkt]->buf_addr = mbuf_send[nb_pkt];
+			}
+		} else {
+			for (nb_pkt = 0; nb_pkt < val; nb_pkt++) {
+				mbuf_send[nb_pkt] =
+					rte_mbuf_raw_alloc(mbuf_pool);
+				if (mbuf_send[nb_pkt] == NULL)
+					break;
+				mbuf_send[nb_pkt]->port = dev_id;
+				mbuf_send[nb_pkt]->data_len =
+				fread(rte_pktmbuf_mtod(mbuf_send[nb_pkt],
+					void *), 1, buf_size, file);
+				mbuf_send[nb_pkt]->pkt_len =
+					mbuf_send[nb_pkt]->data_len;
+				pkts_send[nb_pkt]->buf_addr = mbuf_send[nb_pkt];
+			}
+		}
 
-	if (rte_rawdev_enqueue_buffers(dev_id, pkts_send, 1,
-				       (void *)(size_t)size)) {
-		printf("Fail to enqueue.\n");
-		goto clean;
+		nb_tx = rte_rawdev_enqueue_buffers(dev_id, pkts_send, nb_pkt,
+						   (void *)queue_id);
+		while (nb_tx != nb_pkt && retry++ < BURST_TX_RETRIES) {
+			rte_delay_us(1);
+			nb_tx += rte_rawdev_enqueue_buffers(dev_id,
+				&pkts_send[nb_tx], nb_pkt - nb_tx,
+				(void *)queue_id);
+		}
+		count -= nb_pkt;
 	}
+	/* Clear register after file sending done. */
+	rte_rawdev_set_attr(dev_id, "spad_user_0", 0);
+	rte_rawdev_set_attr(dev_id, "spad_user_1", 0);
 	printf("Done sending file.\n");
 
-clean:
+	for (i = 0; i < NTB_MAX_PKT_BURST; i++)
+		free(pkts_send[i]);
 	fclose(file);
-	free(buff);
-	free(pkts_send[0]);
 }
 
 cmdline_parse_token_string_t cmd_send_file_send =
@@ -195,79 +327,680 @@ cmdline_parse_inst_t cmd_send_file = {
 	},
 };
 
-/* *** RECEIVE FILE PARAMETERS *** */
-struct cmd_recvfile_result {
-	cmdline_fixed_string_t recv_string;
-	char filepath[];
-};
+#define RECV_FILE_LEN 30
+static int
+start_polling_recv_file(void *param)
+{
+	struct rte_rawdev_buf *pkts_recv[NTB_MAX_PKT_BURST];
+	struct ntb_fwd_lcore_conf *conf = param;
+	struct rte_mbuf *mbuf;
+	char filepath[RECV_FILE_LEN];
+	uint64_t val, size, file_len;
+	uint16_t nb_rx, i, file_no;
+	size_t queue_id = 0;
+	FILE *file;
+
+	for (i = 0; i < NTB_MAX_PKT_BURST; i++)
+		pkts_recv[i] = (struct rte_rawdev_buf *)
+				malloc(sizeof(struct rte_rawdev_buf));
+
+	file_no = 0;
+	while (!conf->stopped) {
+		snprintf(filepath, RECV_FILE_LEN, "ntb_recv_file%d", file_no);
+		file = fopen(filepath, "w");
+		if (file == NULL) {
+			printf("Fail to open the file.\n");
+			return -EINVAL;
+		}
+
+		rte_rawdev_get_attr(dev_id, "spad_user_0", &val);
+		size = val << 32;
+		rte_rawdev_get_attr(dev_id, "spad_user_1", &val);
+		size |= val;
+
+		if (!size) {
+			fclose(file);
+			continue;
+		}
+
+		file_len = 0;
+		nb_rx = NTB_MAX_PKT_BURST;
+		while (file_len < size && !conf->stopped) {
+			nb_rx = rte_rawdev_dequeue_buffers(dev_id, pkts_recv,
+						pkt_burst, (void *)queue_id);
+			ntb_port_stats[0].rx += nb_rx;
+			for (i = 0; i < nb_rx; i++) {
+				mbuf = pkts_recv[i]->buf_addr;
+				fwrite(rte_pktmbuf_mtod(mbuf, void *), 1,
+					mbuf->data_len, file);
+				file_len += mbuf->data_len;
+				rte_pktmbuf_free(mbuf);
+				pkts_recv[i]->buf_addr = NULL;
+			}
+		}
+
+		printf("Received file (size: %" PRIu64 ") from peer to %s.\n",
+			size, filepath);
+		fclose(file);
+		file_no++;
+	}
+
+	for (i = 0; i < NTB_MAX_PKT_BURST; i++)
+		free(pkts_recv[i]);
+	return 0;
+}
+
+static int
+start_iofwd_per_lcore(void *param)
+{
+	struct rte_rawdev_buf *ntb_buf[NTB_MAX_PKT_BURST];
+	struct rte_mbuf *pkts_burst[NTB_MAX_PKT_BURST];
+	struct ntb_fwd_lcore_conf *conf = param;
+	struct ntb_fwd_stream fs;
+	uint16_t nb_rx, nb_tx;
+	int i, j;
+
+	for (i = 0; i < NTB_MAX_PKT_BURST; i++)
+		ntb_buf[i] = (struct rte_rawdev_buf *)
+			     malloc(sizeof(struct rte_rawdev_buf));
+
+	while (!conf->stopped) {
+		for (i = 0; i < conf->nb_stream; i++) {
+			fs = fwd_streams[conf->stream_id + i];
+			if (fs.tx_ntb) {
+				nb_rx = rte_eth_rx_burst(fs.rx_port,
+						fs.qp_id, pkts_burst,
+						pkt_burst);
+				if (unlikely(nb_rx == 0))
+					continue;
+				for (j = 0; j < nb_rx; j++)
+					ntb_buf[j]->buf_addr = pkts_burst[j];
+				nb_tx =
+				rte_rawdev_enqueue_buffers(fs.tx_port,
+						ntb_buf, nb_rx,
+						(void *)(size_t)fs.qp_id);
+				ntb_port_stats[0].tx += nb_tx;
+				ntb_port_stats[1].rx += nb_rx;
+			} else {
+				nb_rx =
+				rte_rawdev_dequeue_buffers(fs.rx_port,
+						ntb_buf, pkt_burst,
+						(void *)(size_t)fs.qp_id);
+				if (unlikely(nb_rx == 0))
+					continue;
+				for (j = 0; j < nb_rx; j++)
+					pkts_burst[j] = ntb_buf[j]->buf_addr;
+				nb_tx = rte_eth_tx_burst(fs.tx_port,
+					fs.qp_id, pkts_burst, nb_rx);
+				ntb_port_stats[1].tx += nb_tx;
+				ntb_port_stats[0].rx += nb_rx;
+			}
+			if (unlikely(nb_tx < nb_rx)) {
+				do {
+					rte_pktmbuf_free(pkts_burst[nb_tx]);
+				} while (++nb_tx < nb_rx);
+			}
+		}
+	}
+
+	for (i = 0; i < NTB_MAX_PKT_BURST; i++)
+		free(ntb_buf[i]);
+
+	return 0;
+}
+
+static int
+start_rxonly_per_lcore(void *param)
+{
+	struct rte_rawdev_buf *ntb_buf[NTB_MAX_PKT_BURST];
+	struct ntb_fwd_lcore_conf *conf = param;
+	struct ntb_fwd_stream fs;
+	uint16_t nb_rx;
+	int i, j;
+
+	for (i = 0; i < NTB_MAX_PKT_BURST; i++)
+		ntb_buf[i] = (struct rte_rawdev_buf *)
+			     malloc(sizeof(struct rte_rawdev_buf));
+
+	while (!conf->stopped) {
+		for (i = 0; i < conf->nb_stream; i++) {
+			fs = fwd_streams[conf->stream_id + i];
+			nb_rx = rte_rawdev_dequeue_buffers(fs.rx_port,
+				ntb_buf, pkt_burst, (void *)(size_t)fs.qp_id);
+			if (unlikely(nb_rx == 0))
+				continue;
+			ntb_port_stats[0].rx += nb_rx;
+
+			for (j = 0; j < nb_rx; j++)
+				rte_pktmbuf_free(ntb_buf[j]->buf_addr);
+		}
+	}
+
+	for (i = 0; i < NTB_MAX_PKT_BURST; i++)
+		free(ntb_buf[i]);
+
+	return 0;
+}
+
+
+static int
+start_txonly_per_lcore(void *param)
+{
+	struct rte_rawdev_buf *ntb_buf[NTB_MAX_PKT_BURST];
+	struct rte_mbuf *pkts_burst[NTB_MAX_PKT_BURST];
+	struct ntb_fwd_lcore_conf *conf = param;
+	struct ntb_fwd_stream fs;
+	uint16_t nb_pkt, nb_tx;
+	int i;
+
+	for (i = 0; i < NTB_MAX_PKT_BURST; i++)
+		ntb_buf[i] = (struct rte_rawdev_buf *)
+			     malloc(sizeof(struct rte_rawdev_buf));
+
+	while (!conf->stopped) {
+		for (i = 0; i < conf->nb_stream; i++) {
+			fs = fwd_streams[conf->stream_id + i];
+			if (rte_mempool_get_bulk(mbuf_pool, (void **)pkts_burst,
+				  pkt_burst) == 0) {
+				for (nb_pkt = 0; nb_pkt < pkt_burst; nb_pkt++) {
+					pkts_burst[nb_pkt]->port = dev_id;
+					pkts_burst[nb_pkt]->data_len =
+						pkts_burst[nb_pkt]->buf_len -
+						RTE_PKTMBUF_HEADROOM;
+					pkts_burst[nb_pkt]->pkt_len =
+						pkts_burst[nb_pkt]->data_len;
+					ntb_buf[nb_pkt]->buf_addr =
+						pkts_burst[nb_pkt];
+				}
+			} else {
+				for (nb_pkt = 0; nb_pkt < pkt_burst; nb_pkt++) {
+					pkts_burst[nb_pkt] =
+						rte_pktmbuf_alloc(mbuf_pool);
+					if (pkts_burst[nb_pkt] == NULL)
+						break;
+					pkts_burst[nb_pkt]->port = dev_id;
+					pkts_burst[nb_pkt]->data_len =
+						pkts_burst[nb_pkt]->buf_len -
+						RTE_PKTMBUF_HEADROOM;
+					pkts_burst[nb_pkt]->pkt_len =
+						pkts_burst[nb_pkt]->data_len;
+					ntb_buf[nb_pkt]->buf_addr =
+						pkts_burst[nb_pkt];
+				}
+			}
+			nb_tx = rte_rawdev_enqueue_buffers(fs.tx_port,
+				ntb_buf, nb_pkt, (void *)(size_t)fs.qp_id);
+			ntb_port_stats[0].tx += nb_tx;
+			if (unlikely(nb_tx < nb_pkt)) {
+				do {
+					rte_pktmbuf_free(
+						ntb_buf[nb_tx]->buf_addr);
+				} while (++nb_tx < nb_pkt);
+			}
+		}
+	}
+
+	for (i = 0; i < NTB_MAX_PKT_BURST; i++)
+		free(ntb_buf[i]);
+
+	return 0;
+}
+
+static int
+ntb_fwd_config_setup(void)
+{
+	uint16_t i;
+
+	/* Make sure iofwd has valid ethdev. */
+	if (fwd_mode == IOFWD && eth_port_id >= RTE_MAX_ETHPORTS) {
+		printf("No ethdev, cannot be in iofwd mode.\n");
+		return -EINVAL;
+	}
+
+	if (fwd_mode == IOFWD) {
+		fwd_streams = rte_zmalloc("ntb_fwd: fwd_streams",
+			sizeof(struct ntb_fwd_stream) * num_queues * 2,
+			RTE_CACHE_LINE_SIZE);
+		for (i = 0; i < num_queues; i++) {
+			fwd_streams[i * 2].qp_id = i;
+			fwd_streams[i * 2].tx_port = dev_id;
+			fwd_streams[i * 2].rx_port = eth_port_id;
+			fwd_streams[i * 2].tx_ntb = 1;
+
+			fwd_streams[i * 2 + 1].qp_id = i;
+			fwd_streams[i * 2 + 1].tx_port = eth_port_id;
+			fwd_streams[i * 2 + 1].rx_port = dev_id;
+			fwd_streams[i * 2 + 1].tx_ntb = 0;
+		}
+		return 0;
+	}
+
+	if (fwd_mode == RXONLY || fwd_mode == FILE_TRANS) {
+		/* Only 1 queue in file-trans, to keep packets in order. */
+		if (fwd_mode == FILE_TRANS)
+			num_queues = 1;
+
+		fwd_streams = rte_zmalloc("ntb_fwd: fwd_streams",
+				sizeof(struct ntb_fwd_stream) * num_queues,
+				RTE_CACHE_LINE_SIZE);
+		for (i = 0; i < num_queues; i++) {
+			fwd_streams[i].qp_id = i;
+			fwd_streams[i].tx_port = RTE_MAX_ETHPORTS;
+			fwd_streams[i].rx_port = dev_id;
+			fwd_streams[i].tx_ntb = 0;
+		}
+		return 0;
+	}
+
+	if (fwd_mode == TXONLY) {
+		fwd_streams = rte_zmalloc("ntb_fwd: fwd_streams",
+				sizeof(struct ntb_fwd_stream) * num_queues,
+				RTE_CACHE_LINE_SIZE);
+		for (i = 0; i < num_queues; i++) {
+			fwd_streams[i].qp_id = i;
+			fwd_streams[i].tx_port = dev_id;
+			fwd_streams[i].rx_port = RTE_MAX_ETHPORTS;
+			fwd_streams[i].tx_ntb = 1;
+		}
+	}
+	return 0;
+}
 
 static void
-cmd_recvfile_parsed(void *parsed_result,
-		    __attribute__((unused)) struct cmdline *cl,
-		    __attribute__((unused)) void *data)
+assign_stream_to_lcores(void)
 {
-	struct cmd_sendfile_result *res = parsed_result;
-	struct rte_rawdev_buf *pkts_recv[1];
-	uint8_t *buff;
-	uint64_t val;
-	size_t size;
-	FILE *file;
+	struct ntb_fwd_lcore_conf *conf;
+	struct ntb_fwd_stream *fs;
+	uint16_t nb_streams, sm_per_lcore, sm_id, i;
+	uint8_t lcore_id, lcore_num, nb_extra;
 
-	if (!rte_rawdevs[dev_id].started) {
-		printf("Device needs to be up first. Try later.\n");
-		return;
+	lcore_num = rte_lcore_count();
+	/* Exclude master core */
+	lcore_num--;
+
+	nb_streams = (fwd_mode == IOFWD) ? num_queues * 2 : num_queues;
+
+	sm_per_lcore = nb_streams / lcore_num;
+	nb_extra = nb_streams % lcore_num;
+	sm_id = 0;
+	i = 0;
+
+	RTE_LCORE_FOREACH_SLAVE(lcore_id) {
+		conf = &fwd_lcore_conf[lcore_id];
+
+		if (i < nb_extra) {
+			conf->nb_stream = sm_per_lcore + 1;
+			conf->stream_id = sm_id;
+			sm_id = sm_id + sm_per_lcore + 1;
+		} else {
+			conf->nb_stream = sm_per_lcore;
+			conf->stream_id = sm_id;
+			sm_id = sm_id + sm_per_lcore;
+		}
+
+		i++;
+		if (sm_id >= nb_streams)
+			break;
+	}
+
+	/* Print packet forwarding config. */
+	RTE_LCORE_FOREACH_SLAVE(lcore_id) {
+		conf = &fwd_lcore_conf[lcore_id];
+
+		if (!conf->nb_stream)
+			continue;
+
+		printf("Streams on Lcore %u :\n", lcore_id);
+		for (i = 0; i < conf->nb_stream; i++) {
+			fs = &fwd_streams[conf->stream_id + i];
+			if (fwd_mode == IOFWD)
+				printf(" + Stream %u : %s%u RX -> %s%u TX,"
+					" Q=%u\n", conf->stream_id + i,
+					fs->tx_ntb ? "Eth" : "NTB", fs->rx_port,
+					fs->tx_ntb ? "NTB" : "Eth", fs->tx_port,
+					fs->qp_id);
+			if (fwd_mode == FILE_TRANS || fwd_mode == RXONLY)
+				printf(" + Stream %u : %s%u RX only\n",
+					conf->stream_id + i, "NTB", fs->rx_port);
+			if (fwd_mode == TXONLY)
+				printf(" + Stream %u : %s%u TX only\n",
+					conf->stream_id + i, "NTB", fs->tx_port);
+		}
 	}
+}
 
-	rte_rawdev_get_attr(dev_id, "link_status", &val);
-	if (!val) {
-		printf("Link is not up, cannot receive file.\n");
+static void
+start_pkt_fwd(void)
+{
+	struct ntb_fwd_lcore_conf *conf;
+	struct rte_eth_link eth_link;
+	uint8_t lcore_id;
+	int ret, i;
+
+	ret = ntb_fwd_config_setup();
+	if (ret < 0) {
+		printf("Cannot start traffic. Please reset fwd mode.\n");
 		return;
 	}
 
-	file = fopen(res->filepath, "w");
-	if (file == NULL) {
-		printf("Fail to open the file.\n");
+	/* If using iofwd, checking ethdev link status first. */
+	if (fwd_mode == IOFWD) {
+		printf("Checking eth link status...\n");
+		/* Wait for eth link up at most 100 times. */
+		for (i = 0; i < 100; i++) {
+			rte_eth_link_get(eth_port_id, &eth_link);
+			if (eth_link.link_status) {
+				printf("Eth%u Link Up. Speed %u Mbps - %s\n",
+					eth_port_id, eth_link.link_speed,
+					(eth_link.link_duplex ==
+					 ETH_LINK_FULL_DUPLEX) ?
+					("full-duplex") : ("half-duplex"));
+				break;
+			}
+		}
+		if (!eth_link.link_status) {
+			printf("Eth%u link down. Cannot start traffic.\n",
+				eth_port_id);
+			return;
+		}
+	}
+
+	assign_stream_to_lcores();
+	in_test = 1;
+
+	RTE_LCORE_FOREACH_SLAVE(lcore_id) {
+		conf = &fwd_lcore_conf[lcore_id];
+
+		if (!conf->nb_stream)
+			continue;
+
+		conf->stopped = 0;
+		if (fwd_mode == FILE_TRANS)
+			rte_eal_remote_launch(start_polling_recv_file,
+					      conf, lcore_id);
+		else if (fwd_mode == IOFWD)
+			rte_eal_remote_launch(start_iofwd_per_lcore,
+					      conf, lcore_id);
+		else if (fwd_mode == RXONLY)
+			rte_eal_remote_launch(start_rxonly_per_lcore,
+					      conf, lcore_id);
+		else if (fwd_mode == TXONLY)
+			rte_eal_remote_launch(start_txonly_per_lcore,
+					      conf, lcore_id);
+	}
+}
+
+/* *** START FWD PARAMETERS *** */
+struct cmd_start_result {
+	cmdline_fixed_string_t start;
+};
+
+static void
+cmd_start_parsed(__attribute__((unused)) void *parsed_result,
+			    __attribute__((unused)) struct cmdline *cl,
+			    __attribute__((unused)) void *data)
+{
+	start_pkt_fwd();
+}
+
+cmdline_parse_token_string_t cmd_start_start =
+		TOKEN_STRING_INITIALIZER(struct cmd_start_result, start, "start");
+
+cmdline_parse_inst_t cmd_start = {
+	.f = cmd_start_parsed,
+	.data = NULL,
+	.help_str = "start pkt fwd between ntb and ethdev",
+	.tokens = {
+		(void *)&cmd_start_start,
+		NULL,
+	},
+};
+
+/* *** STOP *** */
+struct cmd_stop_result {
+	cmdline_fixed_string_t stop;
+};
+
+static void
+cmd_stop_parsed(__attribute__((unused)) void *parsed_result,
+		__attribute__((unused)) struct cmdline *cl,
+		__attribute__((unused)) void *data)
+{
+	struct ntb_fwd_lcore_conf *conf;
+	uint8_t lcore_id;
+
+	RTE_LCORE_FOREACH_SLAVE(lcore_id) {
+		conf = &fwd_lcore_conf[lcore_id];
+
+		if (!conf->nb_stream)
+			continue;
+
+		if (conf->stopped)
+			continue;
+
+		conf->stopped = 1;
+	}
+	printf("\nWaiting for lcores to finish...\n");
+	rte_eal_mp_wait_lcore();
+	in_test = 0;
+	printf("\nDone.\n");
+}
+
+cmdline_parse_token_string_t cmd_stop_stop =
+		TOKEN_STRING_INITIALIZER(struct cmd_stop_result, stop, "stop");
+
+cmdline_parse_inst_t cmd_stop = {
+	.f = cmd_stop_parsed,
+	.data = NULL,
+	.help_str = "stop: Stop packet forwarding",
+	.tokens = {
+		(void *)&cmd_stop_stop,
+		NULL,
+	},
+};
+
+static void
+ntb_stats_clear(void)
+{
+	int nb_ids, i;
+	uint32_t *ids;
+
+	/* Clear NTB dev stats */
+	nb_ids = rte_rawdev_xstats_names_get(dev_id, NULL, 0);
+	if (nb_ids  < 0) {
+		printf("Error: Cannot get count of xstats\n");
 		return;
 	}
+	ids = malloc(sizeof(uint32_t) * nb_ids);
+	for (i = 0; i < nb_ids; i++)
+		ids[i] = i;
+	rte_rawdev_xstats_reset(dev_id, ids, nb_ids);
+	printf("\n  statistics for NTB port %d cleared\n", dev_id);
+
+	/* Clear ethdev stats, if any */
+	if (fwd_mode == IOFWD && eth_port_id != RTE_MAX_ETHPORTS) {
+		rte_eth_stats_reset(eth_port_id);
+		printf("\n  statistics for ETH port %d cleared\n", eth_port_id);
+	}
+}
+
+static inline void
+ntb_calculate_throughput(uint16_t port) {
+	uint64_t diff_pkts_rx, diff_pkts_tx, diff_cycles;
+	uint64_t mpps_rx, mpps_tx;
+	static uint64_t prev_pkts_rx[2];
+	static uint64_t prev_pkts_tx[2];
+	static uint64_t prev_cycles[2];
+
+	diff_cycles = prev_cycles[port];
+	prev_cycles[port] = rte_rdtsc();
+	if (diff_cycles > 0)
+		diff_cycles = prev_cycles[port] - diff_cycles;
+	diff_pkts_rx = (ntb_port_stats[port].rx > prev_pkts_rx[port]) ?
+		(ntb_port_stats[port].rx - prev_pkts_rx[port]) : 0;
+	diff_pkts_tx = (ntb_port_stats[port].tx > prev_pkts_tx[port]) ?
+		(ntb_port_stats[port].tx - prev_pkts_tx[port]) : 0;
+	prev_pkts_rx[port] = ntb_port_stats[port].rx;
+	prev_pkts_tx[port] = ntb_port_stats[port].tx;
+	mpps_rx = diff_cycles > 0 ?
+		diff_pkts_rx * rte_get_tsc_hz() / diff_cycles : 0;
+	mpps_tx = diff_cycles > 0 ?
+		diff_pkts_tx * rte_get_tsc_hz() / diff_cycles : 0;
+	printf("  Throughput (since last show)\n");
+	printf("  Rx-pps: %12"PRIu64"\n  Tx-pps: %12"PRIu64"\n",
+			mpps_rx, mpps_tx);
+
+}
+
+static void
+ntb_stats_display(void)
+{
+	struct rte_rawdev_xstats_name *xstats_names;
+	struct rte_eth_stats stats;
+	uint64_t *values;
+	uint32_t *ids;
+	int nb_ids, i;
 
-	rte_rawdev_get_attr(dev_id, "spad_user_0", &val);
-	size = val << 32;
-	rte_rawdev_get_attr(dev_id, "spad_user_1", &val);
-	size |= val;
+	printf("###### statistics for NTB port %d #######\n", dev_id);
 
-	buff = (uint8_t *)malloc(size);
-	pkts_recv[0] = (struct rte_rawdev_buf *)malloc
-			(sizeof(struct rte_rawdev_buf));
-	pkts_recv[0]->buf_addr = buff;
+	/* Get NTB dev stats and stats names */
+	nb_ids = rte_rawdev_xstats_names_get(dev_id, NULL, 0);
+	if (nb_ids  < 0) {
+		printf("Error: Cannot get count of xstats\n");
+		return;
+	}
+	xstats_names = malloc(sizeof(struct rte_rawdev_xstats_name) * nb_ids);
+	if (xstats_names == NULL) {
+		printf("Cannot allocate memory for xstats lookup\n");
+		return;
+	}
+	if (nb_ids != rte_rawdev_xstats_names_get(
+			dev_id, xstats_names, nb_ids)) {
+		printf("Error: Cannot get xstats lookup\n");
+		free(xstats_names);
+		return;
+	}
+	ids = malloc(sizeof(uint32_t) * nb_ids);
+	for (i = 0; i < nb_ids; i++)
+		ids[i] = i;
+	values = malloc(sizeof(uint64_t) * nb_ids);
+	if (nb_ids != rte_rawdev_xstats_get(dev_id, ids, values, nb_ids)) {
+		printf("Error: Unable to get xstats\n");
+		free(xstats_names);
+		free(values);
+		free(ids);
+		return;
+	}
+
+	/* Display NTB dev stats */
+	for (i = 0; i < nb_ids; i++)
+		printf("  %s: %"PRIu64"\n", xstats_names[i].name, values[i]);
+	ntb_calculate_throughput(0);
 
-	if (rte_rawdev_dequeue_buffers(dev_id, pkts_recv, 1, (void *)size)) {
-		printf("Fail to dequeue.\n");
-		goto clean;
+	/* Get ethdev stats, if any */
+	if (fwd_mode == IOFWD && eth_port_id != RTE_MAX_ETHPORTS) {
+		printf("###### statistics for ETH port %d ######\n",
+			eth_port_id);
+		rte_eth_stats_get(eth_port_id, &stats);
+		printf("  RX-packets: %"PRIu64"\n", stats.ipackets);
+		printf("  RX-bytes: %"PRIu64"\n", stats.ibytes);
+		printf("  RX-errors: %"PRIu64"\n", stats.ierrors);
+		printf("  RX-missed: %"PRIu64"\n", stats.imissed);
+		printf("  TX-packets: %"PRIu64"\n", stats.opackets);
+		printf("  TX-bytes: %"PRIu64"\n", stats.obytes);
+		printf("  TX-errors: %"PRIu64"\n", stats.oerrors);
+		ntb_calculate_throughput(1);
 	}
 
-	fwrite(buff, size, 1, file);
-	printf("Done receiving to file.\n");
+	free(xstats_names);
+	free(values);
+	free(ids);
+}
 
-clean:
-	fclose(file);
-	free(buff);
-	free(pkts_recv[0]);
+/* *** SHOW/CLEAR PORT STATS *** */
+struct cmd_stats_result {
+	cmdline_fixed_string_t show;
+	cmdline_fixed_string_t port;
+	cmdline_fixed_string_t stats;
+};
+
+static void
+cmd_stats_parsed(void *parsed_result,
+		 __attribute__((unused)) struct cmdline *cl,
+		 __attribute__((unused)) void *data)
+{
+	struct cmd_stats_result *res = parsed_result;
+	if (!strcmp(res->show, "clear"))
+		ntb_stats_clear();
+	else
+		ntb_stats_display();
 }
 
-cmdline_parse_token_string_t cmd_recv_file_recv =
-	TOKEN_STRING_INITIALIZER(struct cmd_recvfile_result, recv_string,
-				 "recv");
-cmdline_parse_token_string_t cmd_recv_file_filepath =
-	TOKEN_STRING_INITIALIZER(struct cmd_recvfile_result, filepath, NULL);
+cmdline_parse_token_string_t cmd_stats_show =
+	TOKEN_STRING_INITIALIZER(struct cmd_stats_result, show, "show#clear");
+cmdline_parse_token_string_t cmd_stats_port =
+	TOKEN_STRING_INITIALIZER(struct cmd_stats_result, port, "port");
+cmdline_parse_token_string_t cmd_stats_stats =
+	TOKEN_STRING_INITIALIZER(struct cmd_stats_result, stats, "stats");
 
 
-cmdline_parse_inst_t cmd_recv_file = {
-	.f = cmd_recvfile_parsed,
+cmdline_parse_inst_t cmd_stats = {
+	.f = cmd_stats_parsed,
 	.data = NULL,
-	.help_str = "recv <file_path>",
+	.help_str = "show|clear port stats",
 	.tokens = {
-		(void *)&cmd_recv_file_recv,
-		(void *)&cmd_recv_file_filepath,
+		(void *)&cmd_stats_show,
+		(void *)&cmd_stats_port,
+		(void *)&cmd_stats_stats,
+		NULL,
+	},
+};
+
+/* *** SET FORWARDING MODE *** */
+struct cmd_set_fwd_mode_result {
+	cmdline_fixed_string_t set;
+	cmdline_fixed_string_t fwd;
+	cmdline_fixed_string_t mode;
+};
+
+static void
+cmd_set_fwd_mode_parsed(void *parsed_result,
+			__attribute__((unused)) struct cmdline *cl,
+			__attribute__((unused)) void *data)
+{
+	struct cmd_set_fwd_mode_result *res = parsed_result;
+	int i;
+
+	if (in_test) {
+		printf("Please stop traffic first.\n");
+		return;
+	}
+
+	for (i = 0; i < MAX_FWD_MODE; i++) {
+		if (!strcmp(res->mode, fwd_mode_s[i])) {
+			fwd_mode = i;
+			return;
+		}
+	}
+	printf("Invalid %s packet forwarding mode.\n", res->mode);
+}
+
+cmdline_parse_token_string_t cmd_setfwd_set =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_fwd_mode_result, set, "set");
+cmdline_parse_token_string_t cmd_setfwd_fwd =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_fwd_mode_result, fwd, "fwd");
+cmdline_parse_token_string_t cmd_setfwd_mode =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_fwd_mode_result, mode,
+				"file-trans#iofwd#txonly#rxonly");
+
+cmdline_parse_inst_t cmd_set_fwd_mode = {
+	.f = cmd_set_fwd_mode_parsed,
+	.data = NULL,
+	.help_str = "set forwarding mode as file-trans|rxonly|txonly|iofwd",
+	.tokens = {
+		(void *)&cmd_setfwd_set,
+		(void *)&cmd_setfwd_fwd,
+		(void *)&cmd_setfwd_mode,
 		NULL,
 	},
 };
@@ -276,7 +1009,10 @@ cmdline_parse_inst_t cmd_recv_file = {
 cmdline_parse_ctx_t main_ctx[] = {
 	(cmdline_parse_inst_t *)&cmd_help,
 	(cmdline_parse_inst_t *)&cmd_send_file,
-	(cmdline_parse_inst_t *)&cmd_recv_file,
+	(cmdline_parse_inst_t *)&cmd_start,
+	(cmdline_parse_inst_t *)&cmd_stop,
+	(cmdline_parse_inst_t *)&cmd_stats,
+	(cmdline_parse_inst_t *)&cmd_set_fwd_mode,
 	(cmdline_parse_inst_t *)&cmd_quit,
 	NULL,
 };
@@ -305,45 +1041,257 @@ signal_handler(int signum)
 	}
 }
 
+#define OPT_BUF_SIZE         "buf-size"
+#define OPT_FWD_MODE         "fwd-mode"
+#define OPT_NB_DESC          "nb-desc"
+#define OPT_TXFREET          "txfreet"
+#define OPT_BURST            "burst"
+#define OPT_QP               "qp"
+
+enum {
+	/* long options mapped to a short option */
+	OPT_NO_ZERO_COPY_NUM = 1,
+	OPT_BUF_SIZE_NUM,
+	OPT_FWD_MODE_NUM,
+	OPT_NB_DESC_NUM,
+	OPT_TXFREET_NUM,
+	OPT_BURST_NUM,
+	OPT_QP_NUM,
+};
+
+static const char short_options[] =
+	"i" /* interactive mode */
+	;
+
+static const struct option lgopts[] = {
+	{OPT_BUF_SIZE,     1, NULL, OPT_BUF_SIZE_NUM     },
+	{OPT_FWD_MODE,     1, NULL, OPT_FWD_MODE_NUM     },
+	{OPT_NB_DESC,      1, NULL, OPT_NB_DESC_NUM      },
+	{OPT_TXFREET,      1, NULL, OPT_TXFREET_NUM      },
+	{OPT_BURST,        1, NULL, OPT_BURST_NUM        },
+	{OPT_QP,           1, NULL, OPT_QP_NUM           },
+	{0,                0, NULL, 0                    }
+};
+
 static void
 ntb_usage(const char *prgname)
 {
 	printf("%s [EAL options] -- [options]\n"
-	       "-i : run in interactive mode (default value is 1)\n",
-	       prgname);
+	       "-i: run in interactive mode.\n"
+	       "--qp=N: set number of queues as N (N > 0, default: 1).\n"
+	       "--fwd-mode=N: set fwd mode (N: file-trans | rxonly | "
+	       "txonly | iofwd, default: file-trans)\n"
+	       "--buf-size=N: set mbuf dataroom size as N (0 < N < 65535,"
+	       " default: 2048).\n"
+	       "--nb-desc=N: set number of descriptors as N (%u <= N <= %u,"
+	       " default: 1024).\n"
+	       "--txfreet=N: set tx free thresh for NTB driver as N. (N >= 0)\n"
+	       "--burst=N: set pkt burst as N (0 < N <= %u, default: 32).\n",
+	       prgname, NTB_MIN_DESC_SIZE, NTB_MAX_DESC_SIZE,
+	       NTB_MAX_PKT_BURST);
 }
 
-static int
-parse_args(int argc, char **argv)
+static void
+ntb_parse_args(int argc, char **argv)
 {
 	char *prgname = argv[0], **argvopt = argv;
-	int opt, ret;
+	int opt, opt_idx, n, i;
 
-	/* Only support interactive mode to send/recv file first. */
-	while ((opt = getopt(argc, argvopt, "i")) != EOF) {
+	while ((opt = getopt_long(argc, argvopt, short_options,
+				lgopts, &opt_idx)) != EOF) {
 		switch (opt) {
 		case 'i':
-			printf("Interactive-mode selected\n");
+			printf("Interactive-mode selected.\n");
 			interactive = 1;
 			break;
+		case OPT_QP_NUM:
+			n = atoi(optarg);
+			if (n > 0)
+				num_queues = n;
+			else
+				rte_exit(EXIT_FAILURE, "qp must be > 0.\n");
+			break;
+		case OPT_BUF_SIZE_NUM:
+			n = atoi(optarg);
+			if (n > RTE_PKTMBUF_HEADROOM && n <= 0xFFFF)
+				ntb_buf_size = n;
+			else
+				rte_exit(EXIT_FAILURE, "buf-size must be > "
+					"%u and < 65536.\n",
+					RTE_PKTMBUF_HEADROOM);
+			break;
+		case OPT_FWD_MODE_NUM:
+			for (i = 0; i < MAX_FWD_MODE; i++) {
+				if (!strcmp(optarg, fwd_mode_s[i])) {
+					fwd_mode = i;
+					break;
+				}
+			}
+			if (i == MAX_FWD_MODE)
+				rte_exit(EXIT_FAILURE, "Unsupported mode. "
+				"(Should be: file-trans | rxonly | txonly "
+				"| iofwd)\n");
+			break;
+		case OPT_NB_DESC_NUM:
+			n = atoi(optarg);
+			if (n >= NTB_MIN_DESC_SIZE && n <= NTB_MAX_DESC_SIZE)
+				nb_desc = n;
+			else
+				rte_exit(EXIT_FAILURE, "nb-desc must be within"
+					" [%u, %u].\n", NTB_MIN_DESC_SIZE,
+					NTB_MAX_DESC_SIZE);
+			break;
+		case OPT_TXFREET_NUM:
+			n = atoi(optarg);
+			if (n >= 0)
+				tx_free_thresh = n;
+			else
+				rte_exit(EXIT_FAILURE, "txfreet must be"
+					" >= 0\n");
+			break;
+		case OPT_BURST_NUM:
+			n = atoi(optarg);
+			if (n > 0 && n <= NTB_MAX_PKT_BURST)
+				pkt_burst = n;
+			else
+				rte_exit(EXIT_FAILURE, "burst must be within "
+					"(0, %u].\n", NTB_MAX_PKT_BURST);
+			break;
 
 		default:
 			ntb_usage(prgname);
-			return -1;
+			rte_exit(EXIT_FAILURE,
+				 "Command line is incomplete or incorrect.\n");
+			break;
 		}
 	}
+}
 
-	if (optind >= 0)
-		argv[optind-1] = prgname;
+static void
+ntb_mempool_mz_free(__rte_unused struct rte_mempool_memhdr *memhdr,
+		void *opaque)
+{
+	const struct rte_memzone *mz = opaque;
+	rte_memzone_free(mz);
+}
 
-	ret = optind-1;
-	optind = 1; /* reset getopt lib */
-	return ret;
+static struct rte_mempool *
+ntb_mbuf_pool_create(uint16_t mbuf_seg_size, uint32_t nb_mbuf,
+		     struct ntb_dev_info ntb_info,
+		     struct ntb_dev_config *ntb_conf,
+		     unsigned int socket_id)
+{
+	size_t mz_len, total_elt_sz, max_mz_len, left_sz;
+	struct rte_pktmbuf_pool_private mbp_priv;
+	char pool_name[RTE_MEMPOOL_NAMESIZE];
+	char mz_name[RTE_MEMZONE_NAMESIZE];
+	const struct rte_memzone *mz;
+	struct rte_mempool *mp;
+	uint64_t align;
+	uint32_t mz_id;
+	int ret;
+
+	snprintf(pool_name, sizeof(pool_name), "ntb_mbuf_pool_%u", socket_id);
+	mp = rte_mempool_create_empty(pool_name, nb_mbuf,
+				      (mbuf_seg_size + sizeof(struct rte_mbuf)),
+				      MEMPOOL_CACHE_SIZE,
+				      sizeof(struct rte_pktmbuf_pool_private),
+				      socket_id, 0);
+	if (mp == NULL)
+		return NULL;
+
+	mbp_priv.mbuf_data_room_size = mbuf_seg_size;
+	mbp_priv.mbuf_priv_size = 0;
+	rte_pktmbuf_pool_init(mp, &mbp_priv);
+
+	ntb_conf->mz_list = rte_zmalloc("ntb_memzone_list",
+				sizeof(struct rte_memzone *) *
+				ntb_info.mw_cnt, 0);
+	if (ntb_conf->mz_list == NULL)
+		goto fail;
+
+	/* Put ntb header on mw0. */
+	if (ntb_info.mw_size[0] < ntb_info.ntb_hdr_size) {
+		printf("mw0 (size: %" PRIu64 ") is not enough for ntb hdr"
+		       " (size: %u)\n", ntb_info.mw_size[0],
+		       ntb_info.ntb_hdr_size);
+		goto fail;
+	}
+
+	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
+	left_sz = total_elt_sz * nb_mbuf;
+	for (mz_id = 0; mz_id < ntb_info.mw_cnt; mz_id++) {
+		/* If populated mbuf is enough, no need to reserve extra mz. */
+		if (!left_sz)
+			break;
+		snprintf(mz_name, sizeof(mz_name), "ntb_mw_%d", mz_id);
+		align = ntb_info.mw_size_align ? ntb_info.mw_size[mz_id] :
+			RTE_CACHE_LINE_SIZE;
+		/* Reserve ntb header space on memzone 0. */
+		max_mz_len = mz_id ? ntb_info.mw_size[mz_id] :
+			     ntb_info.mw_size[mz_id] - ntb_info.ntb_hdr_size;
+		mz_len = left_sz <= max_mz_len ? left_sz :
+			(max_mz_len / total_elt_sz * total_elt_sz);
+		if (!mz_len)
+			continue;
+		mz = rte_memzone_reserve_aligned(mz_name, mz_len, socket_id,
+					RTE_MEMZONE_IOVA_CONTIG, align);
+		if (mz == NULL) {
+			printf("Cannot allocate %" PRIu64 " aligned memzone"
+				" %u\n", align, mz_id);
+			goto fail;
+		}
+		left_sz -= mz_len;
+
+		/* Reserve ntb header space on memzone 0. */
+		if (mz_id)
+			ret = rte_mempool_populate_iova(mp, mz->addr, mz->iova,
+					mz->len, ntb_mempool_mz_free,
+					(void *)(uintptr_t)mz);
+		else
+			ret = rte_mempool_populate_iova(mp,
+					(void *)((size_t)mz->addr +
+					ntb_info.ntb_hdr_size),
+					mz->iova + ntb_info.ntb_hdr_size,
+					mz->len - ntb_info.ntb_hdr_size,
+					ntb_mempool_mz_free,
+					(void *)(uintptr_t)mz);
+		if (ret < 0) {
+			rte_memzone_free(mz);
+			rte_mempool_free(mp);
+			return NULL;
+		}
+
+		ntb_conf->mz_list[mz_id] = mz;
+	}
+	if (left_sz) {
+		printf("mw space is not enough for mempool.\n");
+		goto fail;
+	}
+
+	ntb_conf->mz_num = mz_id;
+	rte_mempool_obj_iter(mp, rte_pktmbuf_init, NULL);
+
+	return mp;
+fail:
+	rte_mempool_free(mp);
+	return NULL;
 }
 
 int
 main(int argc, char **argv)
 {
+	struct rte_eth_conf eth_pconf = eth_port_conf;
+	struct rte_rawdev_info ntb_rawdev_conf;
+	struct rte_rawdev_info ntb_rawdev_info;
+	struct rte_eth_dev_info ethdev_info;
+	struct rte_eth_rxconf eth_rx_conf;
+	struct rte_eth_txconf eth_tx_conf;
+	struct ntb_queue_conf ntb_q_conf;
+	struct ntb_dev_config ntb_conf;
+	struct ntb_dev_info ntb_info;
+	uint64_t ntb_link_status;
+	uint32_t nb_mbuf;
 	int ret, i;
 
 	signal(SIGINT, signal_handler);
@@ -353,6 +1301,9 @@ main(int argc, char **argv)
 	if (ret < 0)
 		rte_exit(EXIT_FAILURE, "Error with EAL initialization.\n");
 
+	if (rte_lcore_count() < 2)
+		rte_exit(EXIT_FAILURE, "Need at least 2 cores\n");
+
 	/* Find 1st ntb rawdev. */
 	for (i = 0; i < RTE_RAWDEV_MAX_DEVS; i++)
 		if (rte_rawdevs[i].driver_name &&
@@ -368,15 +1319,118 @@ main(int argc, char **argv)
 	argc -= ret;
 	argv += ret;
 
-	ret = parse_args(argc, argv);
+	ntb_parse_args(argc, argv);
+
+	rte_rawdev_set_attr(dev_id, NTB_QUEUE_SZ_NAME, nb_desc);
+	printf("Set queue size as %u.\n", nb_desc);
+	rte_rawdev_set_attr(dev_id, NTB_QUEUE_NUM_NAME, num_queues);
+	printf("Set queue number as %u.\n", num_queues);
+	ntb_rawdev_info.dev_private = (rte_rawdev_obj_t)(&ntb_info);
+	rte_rawdev_info_get(dev_id, &ntb_rawdev_info);
+
+	nb_mbuf = nb_desc * num_queues * 2 * 2 + rte_lcore_count() *
+		  MEMPOOL_CACHE_SIZE;
+	mbuf_pool = ntb_mbuf_pool_create(ntb_buf_size, nb_mbuf, ntb_info,
+					 &ntb_conf, rte_socket_id());
+	if (mbuf_pool == NULL)
+		rte_exit(EXIT_FAILURE, "Cannot create mbuf pool.\n");
+
+	ntb_conf.num_queues = num_queues;
+	ntb_conf.queue_size = nb_desc;
+	ntb_rawdev_conf.dev_private = (rte_rawdev_obj_t)(&ntb_conf);
+	ret = rte_rawdev_configure(dev_id, &ntb_rawdev_conf);
+	if (ret)
+		rte_exit(EXIT_FAILURE, "Can't config ntb dev: err=%d, "
+			"port=%u\n", ret, dev_id);
+
+	ntb_q_conf.tx_free_thresh = tx_free_thresh;
+	ntb_q_conf.nb_desc = nb_desc;
+	ntb_q_conf.rx_mp = mbuf_pool;
+	for (i = 0; i < num_queues; i++) {
+		/* Setup rawdev queue */
+		ret = rte_rawdev_queue_setup(dev_id, i, &ntb_q_conf);
+		if (ret < 0)
+			rte_exit(EXIT_FAILURE,
+				"Failed to setup ntb queue %u.\n", i);
+	}
+
+	/* Waiting for peer dev up at most 100s.*/
+	printf("Checking ntb link status...\n");
+	for (i = 0; i < 1000; i++) {
+		rte_rawdev_get_attr(dev_id, NTB_LINK_STATUS_NAME,
+				    &ntb_link_status);
+		if (ntb_link_status) {
+			printf("Peer dev ready, ntb link up.\n");
+			break;
+		}
+		rte_delay_ms(100);
+	}
+	rte_rawdev_get_attr(dev_id, NTB_LINK_STATUS_NAME, &ntb_link_status);
+	if (ntb_link_status == 0)
+		printf("Expire 100s. Link is not up. Please restart app.\n");
+
+	ret = rte_rawdev_start(dev_id);
 	if (ret < 0)
-		rte_exit(EXIT_FAILURE, "Invalid arguments\n");
+		rte_exit(EXIT_FAILURE, "rte_rawdev_start: err=%d, port=%u\n",
+			ret, dev_id);
+
+	/* Find 1st ethdev */
+	eth_port_id = rte_eth_find_next(0);
 
-	rte_rawdev_start(dev_id);
+	if (eth_port_id < RTE_MAX_ETHPORTS) {
+		rte_eth_dev_info_get(eth_port_id, &ethdev_info);
+		eth_pconf.rx_adv_conf.rss_conf.rss_hf &=
+				ethdev_info.flow_type_rss_offloads;
+		ret = rte_eth_dev_configure(eth_port_id, num_queues,
+					    num_queues, &eth_pconf);
+		if (ret)
+			rte_exit(EXIT_FAILURE, "Can't config ethdev: err=%d, "
+				"port=%u\n", ret, eth_port_id);
+		eth_rx_conf = ethdev_info.default_rxconf;
+		eth_rx_conf.offloads = eth_pconf.rxmode.offloads;
+		eth_tx_conf = ethdev_info.default_txconf;
+		eth_tx_conf.offloads = eth_pconf.txmode.offloads;
+
+		/* Setup ethdev queue if ethdev exists */
+		for (i = 0; i < num_queues; i++) {
+			ret = rte_eth_rx_queue_setup(eth_port_id, i, nb_desc,
+					rte_eth_dev_socket_id(eth_port_id),
+					&eth_rx_conf, mbuf_pool);
+			if (ret < 0)
+				rte_exit(EXIT_FAILURE,
+					"Failed to setup eth rxq %u.\n", i);
+			ret = rte_eth_tx_queue_setup(eth_port_id, i, nb_desc,
+					rte_eth_dev_socket_id(eth_port_id),
+					&eth_tx_conf);
+			if (ret < 0)
+				rte_exit(EXIT_FAILURE,
+					"Failed to setup eth txq %u.\n", i);
+		}
+
+		ret = rte_eth_dev_start(eth_port_id);
+		if (ret < 0)
+			rte_exit(EXIT_FAILURE, "rte_eth_dev_start: err=%d, "
+				"port=%u\n", ret, eth_port_id);
+	}
+
+	/* initialize port stats */
+	memset(&ntb_port_stats, 0, sizeof(ntb_port_stats));
+
+	/* Set default fwd mode if user doesn't set it. */
+	if (fwd_mode == MAX_FWD_MODE && eth_port_id < RTE_MAX_ETHPORTS) {
+		printf("Set default fwd mode as iofwd.\n");
+		fwd_mode = IOFWD;
+	}
+	if (fwd_mode == MAX_FWD_MODE) {
+		printf("Set default fwd mode as file-trans.\n");
+		fwd_mode = FILE_TRANS;
+	}
 
 	if (interactive) {
 		sleep(1);
 		prompt();
+	} else {
+		start_pkt_fwd();
 	}
 
 	return 0;
-- 
2.17.1



* [dpdk-dev] [PATCH v3 0/4] enable FIFO for NTB
  2019-09-06  3:02 ` [dpdk-dev] [PATCH v2 " Xiaoyun Li
                     ` (3 preceding siblings ...)
  2019-09-06  3:02   ` [dpdk-dev] [PATCH v2 4/4] examples/ntb: support more functions for NTB Xiaoyun Li
@ 2019-09-06  7:53   ` " Xiaoyun Li
  2019-09-06  7:53     ` [dpdk-dev] [PATCH v3 1/4] raw/ntb: setup ntb queue Xiaoyun Li
                       ` (4 more replies)
  4 siblings, 5 replies; 42+ messages in thread
From: Xiaoyun Li @ 2019-09-06  7:53 UTC (permalink / raw)
  To: jingjing.wu, keith.wiles, omkar.maslekar, cunming.liang; +Cc: dev, Xiaoyun Li

Enable FIFO for the NTB rawdev driver to support packet-based
processing. An example is also provided that supports txonly,
rxonly, iofwd between an NTB device and an ethdev, and file
transmission.

Acked-by: Omkar Maslekar <omkar.maslekar@intel.com>

---
v3:
 * Replace strncpy with memcpy to avoid gcc-9 compile issue.

v2:
 * Fixed compile issues on 32-bit machines and a missing include file.
 * Fixed a typo.

Xiaoyun Li (4):
  raw/ntb: setup ntb queue
  raw/ntb: add xstats support
  raw/ntb: add enqueue and dequeue functions
  examples/ntb: support more functions for NTB

 doc/guides/rawdevs/ntb.rst             |   67 +-
 doc/guides/rel_notes/release_19_11.rst |    4 +
 doc/guides/sample_app_ug/ntb.rst       |   59 +-
 drivers/raw/ntb/Makefile               |    3 +
 drivers/raw/ntb/meson.build            |    1 +
 drivers/raw/ntb/ntb.c                  | 1075 +++++++++++++++-----
 drivers/raw/ntb/ntb.h                  |  162 ++-
 drivers/raw/ntb/ntb_hw_intel.c         |   48 +-
 drivers/raw/ntb/rte_pmd_ntb.h          |   43 +
 examples/ntb/meson.build               |    3 +
 examples/ntb/ntb_fwd.c                 | 1298 +++++++++++++++++++++---
 11 files changed, 2347 insertions(+), 416 deletions(-)
 create mode 100644 drivers/raw/ntb/rte_pmd_ntb.h
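
For readers skimming the series: the FIFO is built from a descriptor
ring and a used ring per queue, both placed in shared memory (see the
ring layout added to doc/guides/rawdevs/ntb.rst in patch 1/4). Below is
a minimal sketch, not the driver code, of how the per-queue shared
memory size follows from the documented entry formats (a 128-bit
descriptor entry and a 32-bit used entry). All sketch_* names, the
header with two 16-bit counters, and the 64-byte cache line are
assumptions made for illustration only; the driver's own definitions
are in this series' ntb.h changes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CACHE_LINE 64u /* assumed cache line size */

/* desc_ring entry as documented: 64-bit buffer address,
 * 16-bit buffer length, 48 reserved bits. */
struct sketch_desc {
	uint64_t addr;
	uint16_t len;
	uint16_t rsv1;
	uint32_t rsv2;
};

/* used_ring entry as documented: 16-bit packet length, 16-bit flags. */
struct sketch_used {
	uint16_t len;
	uint16_t flags;
};

/* Hypothetical per-queue header: two counters, then the descriptor
 * ring; the used ring starts right after the last descriptor. */
struct sketch_header {
	uint16_t avail_cnt;
	uint16_t used_cnt;
	struct sketch_desc desc_ring[];
};

_Static_assert(sizeof(struct sketch_desc) == 16, "desc entry is 128 bits");
_Static_assert(sizeof(struct sketch_used) == 4, "used entry is 32 bits");

/* Round v up to the next multiple of align. */
static size_t sketch_align_up(size_t v, size_t align)
{
	assert(align != 0);
	return (v + align - 1) / align * align;
}

/* Byte offset of the used ring inside one queue's shared region. */
static size_t sketch_used_ring_offset(uint16_t queue_size)
{
	return offsetof(struct sketch_header, desc_ring) +
	       (size_t)queue_size * sizeof(struct sketch_desc);
}

/* Shared memory needed by one queue, rounded up to a cache line. */
static size_t sketch_hdr_size_per_queue(uint16_t queue_size)
{
	return sketch_align_up(sketch_used_ring_offset(queue_size) +
			       (size_t)queue_size * sizeof(struct sketch_used),
			       SKETCH_CACHE_LINE);
}
```

Multiplying sketch_hdr_size_per_queue() by the number of queue pairs
gives the total header area, which is the same shape of calculation
that ntb_dev_info_get() performs for ntb_hdr_size in patch 1/4.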

-- 
2.17.1


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [dpdk-dev] [PATCH v3 1/4] raw/ntb: setup ntb queue
  2019-09-06  7:53   ` [dpdk-dev] [PATCH v3 0/4] enable FIFO " Xiaoyun Li
@ 2019-09-06  7:53     ` Xiaoyun Li
  2019-09-06  7:54     ` [dpdk-dev] [PATCH v3 2/4] raw/ntb: add xstats support Xiaoyun Li
                       ` (3 subsequent siblings)
  4 siblings, 0 replies; 42+ messages in thread
From: Xiaoyun Li @ 2019-09-06  7:53 UTC (permalink / raw)
  To: jingjing.wu, keith.wiles, omkar.maslekar, cunming.liang; +Cc: dev, Xiaoyun Li

Set up and initialize the ntb txq and rxq, and negotiate queue
information with the peer. If the queue size and number of queues
are not consistent on both sides, return an error.

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
---
 doc/guides/rawdevs/ntb.rst             |  39 +-
 doc/guides/rel_notes/release_19_11.rst |   4 +
 drivers/raw/ntb/Makefile               |   3 +
 drivers/raw/ntb/meson.build            |   1 +
 drivers/raw/ntb/ntb.c                  | 705 ++++++++++++++++++-------
 drivers/raw/ntb/ntb.h                  | 151 ++++--
 drivers/raw/ntb/ntb_hw_intel.c         |  26 +-
 drivers/raw/ntb/rte_pmd_ntb.h          |  43 ++
 8 files changed, 718 insertions(+), 254 deletions(-)
 create mode 100644 drivers/raw/ntb/rte_pmd_ntb.h
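
Note on the negotiation: scratchpad registers are 32 bits wide, so each
64-bit value (memory window size, base address) is published as a
high/low pair and recombined by the peer before comparison. The sketch
below illustrates that idea only. It models the scratchpads as a plain
array; the sketch_* names, the slot indices, and the slot count are
hypothetical and do not correspond to the driver's SPAD_* layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical slot layout: window i uses slots H + 2*i and L + 2*i. */
#define SKETCH_SPAD_MW0_SZ_H 0u
#define SKETCH_SPAD_MW0_SZ_L 1u
#define SKETCH_SPAD_SLOTS    8u

/* Publish one 64-bit window size as two 32-bit scratchpad words. */
static void sketch_publish_mw_size(uint32_t *spad, unsigned int mw_idx,
				   uint64_t mw_size)
{
	assert(SKETCH_SPAD_MW0_SZ_L + 2 * mw_idx < SKETCH_SPAD_SLOTS);
	spad[SKETCH_SPAD_MW0_SZ_H + 2 * mw_idx] = (uint32_t)(mw_size >> 32);
	spad[SKETCH_SPAD_MW0_SZ_L + 2 * mw_idx] = (uint32_t)mw_size;
}

/* Recombine the two words written by the peer into one 64-bit value. */
static uint64_t sketch_read_mw_size(const uint32_t *spad, unsigned int mw_idx)
{
	uint32_t hi = spad[SKETCH_SPAD_MW0_SZ_H + 2 * mw_idx];
	uint32_t lo = spad[SKETCH_SPAD_MW0_SZ_L + 2 * mw_idx];

	return ((uint64_t)hi << 32) | lo;
}

/* Return 0 if every peer window size matches the local one, else -1. */
static int sketch_check_peer(const uint32_t *spad,
			     const uint64_t *local_mw_size,
			     unsigned int mw_cnt)
{
	unsigned int i;

	for (i = 0; i < mw_cnt; i++) {
		if (sketch_read_mw_size(spad, i) != local_mw_size[i])
			return -1;
	}
	return 0;
}
```

The queue size and number of queue pairs fit in a single scratchpad
each, so in the patch they are compared directly in ntb_dev_start()
without the high/low split.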

diff --git a/doc/guides/rawdevs/ntb.rst b/doc/guides/rawdevs/ntb.rst
index 0a61ec03d..99e7db441 100644
--- a/doc/guides/rawdevs/ntb.rst
+++ b/doc/guides/rawdevs/ntb.rst
@@ -45,8 +45,45 @@ to use, i.e. igb_uio, vfio. The ``dpdk-devbind.py`` script can be used to
 show devices status and to bind them to a suitable kernel driver. They will
 appear under the category of "Misc (rawdev) devices".
 
+Ring Layout
+-----------
+
+Since reads and writes to the remote system's memory go through the PCI
+bus, a remote read is much more expensive than a remote write. Thus,
+enqueue and dequeue on the ntb ring should avoid remote reads. The ring
+layout for ntb is as follows:
+- Ring Format:
+  desc_ring:
+      0               16                                              64
+      +---------------------------------------------------------------+
+      |                        buffer address                         |
+      +---------------+-----------------------------------------------+
+      | buffer length |                      resv                     |
+      +---------------+-----------------------------------------------+
+  used_ring:
+      0               16              32
+      +---------------+---------------+
+      | packet length |     flags     |
+      +---------------+---------------+
+- Ring Layout
+      +------------------------+   +------------------------+
+      | used_ring              |   | desc_ring              |
+      | +---+                  |   | +---+                  |
+      | |   |                  |   | |   |                  |
+      | +---+      +--------+  |   | +---+                  |
+      | |   | ---> | buffer | <+---+-|   |                  |
+      | +---+      +--------+  |   | +---+                  |
+      | |   |                  |   | |   |                  |
+      | +---+                  |   | +---+                  |
+      |  ...                   |   |  ...                   |
+      |                        |   |                        |
+      |            +---------+ |   |            +---------+ |
+      |            | tx_tail | |   |            | rx_tail | |
+      | System A   +---------+ |   | System B   +---------+ |
+      +------------------------+   +------------------------+
+                    <---------traffic---------
+
 Limitation
 ----------
 
-- The FIFO hasn't been introduced and will come in 19.11 release.
 - This PMD only supports Intel Skylake platform.
diff --git a/doc/guides/rel_notes/release_19_11.rst b/doc/guides/rel_notes/release_19_11.rst
index 8490d897c..7ac3d5ca6 100644
--- a/doc/guides/rel_notes/release_19_11.rst
+++ b/doc/guides/rel_notes/release_19_11.rst
@@ -56,6 +56,10 @@ New Features
      Also, make sure to start the actual text at the margin.
      =========================================================
 
+   * **Introduced FIFO for NTB PMD.**
+
+     Introduced FIFO for NTB (Non-transparent Bridge) PMD to support
+     packet based processing.
 
 Removed Items
 -------------
diff --git a/drivers/raw/ntb/Makefile b/drivers/raw/ntb/Makefile
index 6fe2aaf40..814cd05ca 100644
--- a/drivers/raw/ntb/Makefile
+++ b/drivers/raw/ntb/Makefile
@@ -25,4 +25,7 @@ LIBABIVER := 1
 SRCS-$(CONFIG_RTE_LIBRTE_PMD_NTB_RAWDEV) += ntb.c
 SRCS-$(CONFIG_RTE_LIBRTE_PMD_NTB_RAWDEV) += ntb_hw_intel.c
 
+# install this header file
+SYMLINK-$(CONFIG_RTE_LIBRTE_PMD_NTB_RAWDEV)-include := rte_pmd_ntb.h
+
 include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/raw/ntb/meson.build b/drivers/raw/ntb/meson.build
index 7f39437f8..7a7d26126 100644
--- a/drivers/raw/ntb/meson.build
+++ b/drivers/raw/ntb/meson.build
@@ -5,4 +5,5 @@ deps += ['rawdev', 'mbuf', 'mempool',
 	 'pci', 'bus_pci']
 sources = files('ntb.c',
                 'ntb_hw_intel.c')
+install_headers('rte_pmd_ntb.h')
 allow_experimental_apis = true
diff --git a/drivers/raw/ntb/ntb.c b/drivers/raw/ntb/ntb.c
index bfecce1e4..02784a134 100644
--- a/drivers/raw/ntb/ntb.c
+++ b/drivers/raw/ntb/ntb.c
@@ -12,6 +12,7 @@
 #include <rte_eal.h>
 #include <rte_log.h>
 #include <rte_pci.h>
+#include <rte_mbuf.h>
 #include <rte_bus_pci.h>
 #include <rte_memzone.h>
 #include <rte_memcpy.h>
@@ -19,6 +20,7 @@
 #include <rte_rawdev_pmd.h>
 
 #include "ntb_hw_intel.h"
+#include "rte_pmd_ntb.h"
 #include "ntb.h"
 
 int ntb_logtype;
@@ -28,48 +30,7 @@ static const struct rte_pci_id pci_id_ntb_map[] = {
 	{ .vendor_id = 0, /* sentinel */ },
 };
 
-static int
-ntb_set_mw(struct rte_rawdev *dev, int mw_idx, uint64_t mw_size)
-{
-	struct ntb_hw *hw = dev->dev_private;
-	char mw_name[RTE_MEMZONE_NAMESIZE];
-	const struct rte_memzone *mz;
-	int ret = 0;
-
-	if (hw->ntb_ops->mw_set_trans == NULL) {
-		NTB_LOG(ERR, "Not supported to set mw.");
-		return -ENOTSUP;
-	}
-
-	snprintf(mw_name, sizeof(mw_name), "ntb_%d_mw_%d",
-		 dev->dev_id, mw_idx);
-
-	mz = rte_memzone_lookup(mw_name);
-	if (mz)
-		return 0;
-
-	/**
-	 * Hardware requires that mapped memory base address should be
-	 * aligned with EMBARSZ and needs continuous memzone.
-	 */
-	mz = rte_memzone_reserve_aligned(mw_name, mw_size, dev->socket_id,
-				RTE_MEMZONE_IOVA_CONTIG, hw->mw_size[mw_idx]);
-	if (!mz) {
-		NTB_LOG(ERR, "Cannot allocate aligned memzone.");
-		return -EIO;
-	}
-	hw->mz[mw_idx] = mz;
-
-	ret = (*hw->ntb_ops->mw_set_trans)(dev, mw_idx, mz->iova, mw_size);
-	if (ret) {
-		NTB_LOG(ERR, "Cannot set mw translation.");
-		return ret;
-	}
-
-	return ret;
-}
-
-static void
+static inline void
 ntb_link_cleanup(struct rte_rawdev *dev)
 {
 	struct ntb_hw *hw = dev->dev_private;
@@ -89,20 +50,94 @@ ntb_link_cleanup(struct rte_rawdev *dev)
 	}
 
 	/* Clear mw so that peer cannot access local memory.*/
-	for (i = 0; i < hw->mw_cnt; i++) {
+	for (i = 0; i < hw->used_mw_num; i++) {
 		status = (*hw->ntb_ops->mw_set_trans)(dev, i, 0, 0);
 		if (status)
 			NTB_LOG(ERR, "Failed to clean mw.");
 	}
 }
 
+static inline int
+ntb_handshake_work(const struct rte_rawdev *dev)
+{
+	struct ntb_hw *hw = dev->dev_private;
+	uint32_t val;
+	int ret, i;
+
+	if (hw->ntb_ops->spad_write == NULL ||
+	    hw->ntb_ops->mw_set_trans == NULL) {
+		NTB_LOG(ERR, "Scratchpad/MW setting is not supported.");
+		return -ENOTSUP;
+	}
+
+	/* Tell peer the mw info of local side. */
+	ret = (*hw->ntb_ops->spad_write)(dev, SPAD_NUM_MWS, 1, hw->mw_cnt);
+	if (ret < 0)
+		return ret;
+	for (i = 0; i < hw->mw_cnt; i++) {
+		NTB_LOG(INFO, "Local %u mw size: 0x%"PRIx64"", i,
+				hw->mw_size[i]);
+		val = hw->mw_size[i] >> 32;
+		ret = (*hw->ntb_ops->spad_write)(dev, SPAD_MW0_SZ_H + 2 * i,
+						 1, val);
+		if (ret < 0)
+			return ret;
+		val = hw->mw_size[i];
+		ret = (*hw->ntb_ops->spad_write)(dev, SPAD_MW0_SZ_L + 2 * i,
+						 1, val);
+		if (ret < 0)
+			return ret;
+	}
+
+	/* Tell peer about the queue info and map memory to the peer. */
+	ret = (*hw->ntb_ops->spad_write)(dev, SPAD_Q_SZ, 1, hw->queue_size);
+	if (ret < 0)
+		return ret;
+	ret = (*hw->ntb_ops->spad_write)(dev, SPAD_NUM_QPS, 1,
+					 hw->queue_pairs);
+	if (ret < 0)
+		return ret;
+	ret = (*hw->ntb_ops->spad_write)(dev, SPAD_USED_MWS, 1,
+					 hw->used_mw_num);
+	if (ret < 0)
+		return ret;
+	for (i = 0; i < hw->used_mw_num; i++) {
+		val = (uint64_t)(hw->mz[i]->addr) >> 32;
+		ret = (*hw->ntb_ops->spad_write)(dev, SPAD_MW0_BA_H + 2 * i,
+						 1, val);
+		if (ret < 0)
+			return ret;
+		val = (uint64_t)(hw->mz[i]->addr);
+		ret = (*hw->ntb_ops->spad_write)(dev, SPAD_MW0_BA_L + 2 * i,
+						 1, val);
+		if (ret < 0)
+			return ret;
+	}
+
+	for (i = 0; i < hw->used_mw_num; i++) {
+		ret = (*hw->ntb_ops->mw_set_trans)(dev, i, hw->mz[i]->iova,
+						   hw->mz[i]->len);
+		if (ret < 0)
+			return ret;
+	}
+
+	/* Ring doorbell 0 to tell peer the device is ready. */
+	ret = (*hw->ntb_ops->peer_db_set)(dev, 0);
+	if (ret < 0)
+		return ret;
+
+	return 0;
+}
+
 static void
 ntb_dev_intr_handler(void *param)
 {
 	struct rte_rawdev *dev = (struct rte_rawdev *)param;
 	struct ntb_hw *hw = dev->dev_private;
-	uint32_t mw_size_h, mw_size_l;
+	uint32_t val_h, val_l;
+	uint64_t peer_mw_size;
 	uint64_t db_bits = 0;
+	uint8_t peer_mw_cnt;
 	int i = 0;
 
 	if (hw->ntb_ops->db_read == NULL ||
@@ -118,7 +153,7 @@ ntb_dev_intr_handler(void *param)
 
 	/* Doorbell 0 is for peer device ready. */
 	if (db_bits & 1) {
-		NTB_LOG(DEBUG, "DB0: Peer device is up.");
+		NTB_LOG(INFO, "DB0: Peer device is up.");
 		/* Clear received doorbell. */
 		(*hw->ntb_ops->db_clear)(dev, 1);
 
@@ -129,47 +164,44 @@ ntb_dev_intr_handler(void *param)
 		if (hw->peer_dev_up)
 			return;
 
-		if (hw->ntb_ops->spad_read == NULL ||
-		    hw->ntb_ops->spad_write == NULL) {
-			NTB_LOG(ERR, "Scratchpad is not supported.");
+		if (hw->ntb_ops->spad_read == NULL) {
+			NTB_LOG(ERR, "Scratchpad read is not supported.");
+			return;
+		}
+
+		/* Check if mw setting on the peer is the same as local. */
+		peer_mw_cnt = (*hw->ntb_ops->spad_read)(dev, SPAD_NUM_MWS, 0);
+		if (peer_mw_cnt != hw->mw_cnt) {
+			NTB_LOG(ERR, "Both mw cnt must be the same.");
 			return;
 		}
 
-		hw->peer_mw_cnt = (*hw->ntb_ops->spad_read)
-				  (dev, SPAD_NUM_MWS, 0);
-		hw->peer_mw_size = rte_zmalloc("uint64_t",
-				   hw->peer_mw_cnt * sizeof(uint64_t), 0);
 		for (i = 0; i < hw->mw_cnt; i++) {
-			mw_size_h = (*hw->ntb_ops->spad_read)
-				    (dev, SPAD_MW0_SZ_H + 2 * i, 0);
-			mw_size_l = (*hw->ntb_ops->spad_read)
-				    (dev, SPAD_MW0_SZ_L + 2 * i, 0);
-			hw->peer_mw_size[i] = ((uint64_t)mw_size_h << 32) |
-					      mw_size_l;
+			val_h = (*hw->ntb_ops->spad_read)
+				(dev, SPAD_MW0_SZ_H + 2 * i, 0);
+			val_l = (*hw->ntb_ops->spad_read)
+				(dev, SPAD_MW0_SZ_L + 2 * i, 0);
+			peer_mw_size = ((uint64_t)val_h << 32) | val_l;
 			NTB_LOG(DEBUG, "Peer %u mw size: 0x%"PRIx64"", i,
-					hw->peer_mw_size[i]);
+					peer_mw_size);
+			if (peer_mw_size != hw->mw_size[i]) {
+				NTB_LOG(ERR, "Mw config must be the same.");
+				return;
+			}
 		}
 
 		hw->peer_dev_up = 1;
 
 		/**
-		 * Handshake with peer. Spad_write only works when both
-		 * devices are up. So write spad again when db is received.
-		 * And set db again for the later device who may miss
+		 * Handshake with peer. Spad_write & mw_set_trans only work
+		 * when both devices are up. So write spad again when db is
+		 * received. And set db again for the later device who may miss
 		 * the 1st db.
 		 */
-		for (i = 0; i < hw->mw_cnt; i++) {
-			(*hw->ntb_ops->spad_write)(dev, SPAD_NUM_MWS,
-						   1, hw->mw_cnt);
-			mw_size_h = hw->mw_size[i] >> 32;
-			(*hw->ntb_ops->spad_write)(dev, SPAD_MW0_SZ_H + 2 * i,
-						   1, mw_size_h);
-
-			mw_size_l = hw->mw_size[i];
-			(*hw->ntb_ops->spad_write)(dev, SPAD_MW0_SZ_L + 2 * i,
-						   1, mw_size_l);
+		if (ntb_handshake_work(dev) < 0) {
+			NTB_LOG(ERR, "Handshake work failed.");
+			return;
 		}
-		(*hw->ntb_ops->peer_db_set)(dev, 0);
 
 		/* To get the link info. */
 		if (hw->ntb_ops->get_link_status == NULL) {
@@ -183,7 +215,7 @@ ntb_dev_intr_handler(void *param)
 	}
 
 	if (db_bits & (1 << 1)) {
-		NTB_LOG(DEBUG, "DB1: Peer device is down.");
+		NTB_LOG(INFO, "DB1: Peer device is down.");
 		/* Clear received doorbell. */
 		(*hw->ntb_ops->db_clear)(dev, 2);
 
@@ -197,7 +229,7 @@ ntb_dev_intr_handler(void *param)
 	}
 
 	if (db_bits & (1 << 2)) {
-		NTB_LOG(DEBUG, "DB2: Peer device agrees dev to be down.");
+		NTB_LOG(INFO, "DB2: Peer device agrees dev to be down.");
 		/* Clear received doorbell. */
 		(*hw->ntb_ops->db_clear)(dev, (1 << 2));
 		hw->peer_dev_up = 0;
@@ -206,24 +238,228 @@ ntb_dev_intr_handler(void *param)
 }
 
 static void
-ntb_queue_conf_get(struct rte_rawdev *dev __rte_unused,
-		   uint16_t queue_id __rte_unused,
-		   rte_rawdev_obj_t queue_conf __rte_unused)
+ntb_queue_conf_get(struct rte_rawdev *dev,
+		   uint16_t queue_id,
+		   rte_rawdev_obj_t queue_conf)
+{
+	struct ntb_queue_conf *q_conf = queue_conf;
+	struct ntb_hw *hw = dev->dev_private;
+
+	q_conf->tx_free_thresh = hw->tx_queues[queue_id]->tx_free_thresh;
+	q_conf->nb_desc = hw->rx_queues[queue_id]->nb_rx_desc;
+	q_conf->rx_mp = hw->rx_queues[queue_id]->mpool;
+}
+
+static void
+ntb_rxq_release_mbufs(struct ntb_rx_queue *q)
+{
+	int i;
+
+	if (!q || !q->sw_ring) {
+		NTB_LOG(ERR, "Pointer to rxq or sw_ring is NULL");
+		return;
+	}
+
+	for (i = 0; i < q->nb_rx_desc; i++) {
+		if (q->sw_ring[i].mbuf) {
+			rte_pktmbuf_free_seg(q->sw_ring[i].mbuf);
+			q->sw_ring[i].mbuf = NULL;
+		}
+	}
+}
+
+static void
+ntb_rxq_release(struct ntb_rx_queue *rxq)
+{
+	if (!rxq) {
+		NTB_LOG(ERR, "Pointer to rxq is NULL");
+		return;
+	}
+
+	ntb_rxq_release_mbufs(rxq);
+
+	rte_free(rxq->sw_ring);
+	rte_free(rxq);
+}
+
+static int
+ntb_rxq_setup(struct rte_rawdev *dev,
+	      uint16_t qp_id,
+	      rte_rawdev_obj_t queue_conf)
+{
+	struct ntb_queue_conf *rxq_conf = queue_conf;
+	struct ntb_hw *hw = dev->dev_private;
+	struct ntb_rx_queue *rxq;
+
+	if (rxq_conf->rx_mp == NULL) {
+		NTB_LOG(ERR, "Invalid null mempool pointer.");
+		return -EINVAL;
+	}
+
+	/* Allocate the rx queue data structure */
+	rxq = rte_zmalloc_socket("ntb rx queue",
+				 sizeof(struct ntb_rx_queue),
+				 RTE_CACHE_LINE_SIZE,
+				 dev->socket_id);
+	if (!rxq) {
+		NTB_LOG(ERR, "Failed to allocate memory for "
+			    "rx queue data structure.");
+		return -ENOMEM;
+	}
+	rxq->nb_rx_desc = rxq_conf->nb_desc;
+	rxq->mpool = rxq_conf->rx_mp;
+	rxq->port_id = dev->dev_id;
+	rxq->queue_id = qp_id;
+	rxq->hw = hw;
+
+	/* Allocate the software ring. */
+	rxq->sw_ring =
+		rte_zmalloc_socket("ntb rx sw ring",
+				   sizeof(struct ntb_rx_entry) *
+				   rxq->nb_rx_desc,
+				   RTE_CACHE_LINE_SIZE,
+				   dev->socket_id);
+	if (!rxq->sw_ring) {
+		ntb_rxq_release(rxq);
+		NTB_LOG(ERR, "Failed to allocate memory for SW ring");
+		return -ENOMEM;
+	}
+
+	hw->rx_queues[qp_id] = rxq;
+
+	return 0;
+}
+
+static void
+ntb_txq_release_mbufs(struct ntb_tx_queue *q)
+{
+	int i;
+
+	if (!q || !q->sw_ring) {
+		NTB_LOG(ERR, "Pointer to txq or sw_ring is NULL");
+		return;
+	}
+
+	for (i = 0; i < q->nb_tx_desc; i++) {
+		if (q->sw_ring[i].mbuf) {
+			rte_pktmbuf_free_seg(q->sw_ring[i].mbuf);
+			q->sw_ring[i].mbuf = NULL;
+		}
+	}
+}
+
+static void
+ntb_txq_release(struct ntb_tx_queue *txq)
 {
+	if (!txq) {
+		NTB_LOG(ERR, "Pointer to txq is NULL");
+		return;
+	}
+
+	ntb_txq_release_mbufs(txq);
+
+	rte_free(txq->sw_ring);
+	rte_free(txq);
 }
 
 static int
-ntb_queue_setup(struct rte_rawdev *dev __rte_unused,
-		uint16_t queue_id __rte_unused,
-		rte_rawdev_obj_t queue_conf __rte_unused)
+ntb_txq_setup(struct rte_rawdev *dev,
+	      uint16_t qp_id,
+	      rte_rawdev_obj_t queue_conf)
 {
+	struct ntb_queue_conf *txq_conf = queue_conf;
+	struct ntb_hw *hw = dev->dev_private;
+	struct ntb_tx_queue *txq;
+	uint16_t i, prev;
+
+	/* Allocate the TX queue data structure. */
+	txq = rte_zmalloc_socket("ntb tx queue",
+				  sizeof(struct ntb_tx_queue),
+				  RTE_CACHE_LINE_SIZE,
+				  dev->socket_id);
+	if (!txq) {
+		NTB_LOG(ERR, "Failed to allocate memory for "
+			    "tx queue structure");
+		return -ENOMEM;
+	}
+
+	txq->nb_tx_desc = txq_conf->nb_desc;
+	txq->port_id = dev->dev_id;
+	txq->queue_id = qp_id;
+	txq->hw = hw;
+
+	/* Allocate software ring */
+	txq->sw_ring =
+		rte_zmalloc_socket("ntb tx sw ring",
+				   sizeof(struct ntb_tx_entry) *
+				   txq->nb_tx_desc,
+				   RTE_CACHE_LINE_SIZE,
+				   dev->socket_id);
+	if (!txq->sw_ring) {
+		ntb_txq_release(txq);
+		NTB_LOG(ERR, "Failed to allocate memory for SW TX ring");
+		return -ENOMEM;
+	}
+
+	prev = txq->nb_tx_desc - 1;
+	for (i = 0; i < txq->nb_tx_desc; i++) {
+		txq->sw_ring[i].mbuf = NULL;
+		txq->sw_ring[i].last_id = i;
+		txq->sw_ring[prev].next_id = i;
+		prev = i;
+	}
+
+	txq->tx_free_thresh = txq_conf->tx_free_thresh ?
+			      txq_conf->tx_free_thresh :
+			      NTB_DFLT_TX_FREE_THRESH;
+	if (txq->tx_free_thresh >= txq->nb_tx_desc - 3) {
+		NTB_LOG(ERR, "tx_free_thresh (%u) must be less than "
+			"nb_desc - 3 (qp_id=%u)", txq->tx_free_thresh, qp_id);
+		ntb_txq_release(txq);
+		return -EINVAL;
+	}
+
+	hw->tx_queues[qp_id] = txq;
+
 	return 0;
 }
 
+
+static int
+ntb_queue_setup(struct rte_rawdev *dev,
+		uint16_t queue_id,
+		rte_rawdev_obj_t queue_conf)
+{
+	struct ntb_hw *hw = dev->dev_private;
+	int ret;
+
+	if (queue_id >= hw->queue_pairs)
+		return -EINVAL;
+
+	ret = ntb_txq_setup(dev, queue_id, queue_conf);
+	if (ret < 0)
+		return ret;
+
+	ret = ntb_rxq_setup(dev, queue_id, queue_conf);
+
+	return ret;
+}
+
 static int
-ntb_queue_release(struct rte_rawdev *dev __rte_unused,
-		  uint16_t queue_id __rte_unused)
+ntb_queue_release(struct rte_rawdev *dev, uint16_t queue_id)
 {
+	struct ntb_hw *hw = dev->dev_private;
+	struct ntb_tx_queue *txq;
+	struct ntb_rx_queue *rxq;
+
+	if (queue_id >= hw->queue_pairs)
+		return -EINVAL;
+
+	txq = hw->tx_queues[queue_id];
+	rxq = hw->rx_queues[queue_id];
+	ntb_txq_release(txq);
+	ntb_rxq_release(rxq);
+
 	return 0;
 }
 
@@ -234,6 +470,77 @@ ntb_queue_count(struct rte_rawdev *dev)
 	return hw->queue_pairs;
 }
 
+static int
+ntb_queue_init(struct rte_rawdev *dev, uint16_t qp_id)
+{
+	struct ntb_hw *hw = dev->dev_private;
+	struct ntb_rx_queue *rxq = hw->rx_queues[qp_id];
+	struct ntb_tx_queue *txq = hw->tx_queues[qp_id];
+	volatile struct ntb_header *local_hdr;
+	struct ntb_header *remote_hdr;
+	uint16_t q_size = hw->queue_size;
+	uint32_t hdr_offset;
+	void *bar_addr;
+	uint16_t i;
+
+	if (hw->ntb_ops->get_peer_mw_addr == NULL) {
+		NTB_LOG(ERR, "Failed to get mapped peer addr.");
+		return -EINVAL;
+	}
+
+	/* Put queue info into the start of shared memory. */
+	hdr_offset = hw->hdr_size_per_queue * qp_id;
+	local_hdr = (volatile struct ntb_header *)
+		    ((size_t)hw->mz[0]->addr + hdr_offset);
+	bar_addr = (*hw->ntb_ops->get_peer_mw_addr)(dev, 0);
+	if (bar_addr == NULL)
+		return -EINVAL;
+	remote_hdr = (struct ntb_header *)
+		     ((size_t)bar_addr + hdr_offset);
+
+	/* rxq init. */
+	rxq->rx_desc_ring = (struct ntb_desc *)
+			    (&remote_hdr->desc_ring);
+	rxq->rx_used_ring = (volatile struct ntb_used *)
+			    (&local_hdr->desc_ring[q_size]);
+	rxq->avail_cnt = &remote_hdr->avail_cnt;
+	rxq->used_cnt = &local_hdr->used_cnt;
+
+	for (i = 0; i < rxq->nb_rx_desc - 1; i++) {
+		struct rte_mbuf *mbuf = rte_mbuf_raw_alloc(rxq->mpool);
+		if (unlikely(!mbuf)) {
+			NTB_LOG(ERR, "Failed to allocate mbuf for RX");
+			return -ENOMEM;
+		}
+		mbuf->port = dev->dev_id;
+
+		rxq->sw_ring[i].mbuf = mbuf;
+
+		rxq->rx_desc_ring[i].addr = rte_pktmbuf_mtod(mbuf, uint64_t);
+		rxq->rx_desc_ring[i].len = mbuf->buf_len - RTE_PKTMBUF_HEADROOM;
+	}
+	rte_wmb();
+	*rxq->avail_cnt = rxq->nb_rx_desc - 1;
+	rxq->last_avail = rxq->nb_rx_desc - 1;
+	rxq->last_used = 0;
+
+	/* txq init */
+	txq->tx_desc_ring = (volatile struct ntb_desc *)
+			    (&local_hdr->desc_ring);
+	txq->tx_used_ring = (struct ntb_used *)
+			    (&remote_hdr->desc_ring[q_size]);
+	txq->avail_cnt = &local_hdr->avail_cnt;
+	txq->used_cnt = &remote_hdr->used_cnt;
+
+	rte_wmb();
+	*txq->used_cnt = 0;
+	txq->last_used = 0;
+	txq->last_avail = 0;
+	txq->nb_tx_free = txq->nb_tx_desc - 1;
+
+	return 0;
+}
+
 static int
 ntb_enqueue_bufs(struct rte_rawdev *dev,
 		 struct rte_rawdev_buf **buffers,
@@ -278,58 +585,51 @@ static void
 ntb_dev_info_get(struct rte_rawdev *dev, rte_rawdev_obj_t dev_info)
 {
 	struct ntb_hw *hw = dev->dev_private;
-	struct ntb_attr *ntb_attrs = dev_info;
+	struct ntb_dev_info *info = dev_info;
 
-	strncpy(ntb_attrs[NTB_TOPO_ID].name, NTB_TOPO_NAME, NTB_ATTR_NAME_LEN);
-	switch (hw->topo) {
-	case NTB_TOPO_B2B_DSD:
-		strncpy(ntb_attrs[NTB_TOPO_ID].value, "B2B DSD",
-			NTB_ATTR_VAL_LEN);
-		break;
-	case NTB_TOPO_B2B_USD:
-		strncpy(ntb_attrs[NTB_TOPO_ID].value, "B2B USD",
-			NTB_ATTR_VAL_LEN);
-		break;
-	default:
-		strncpy(ntb_attrs[NTB_TOPO_ID].value, "Unsupported",
-			NTB_ATTR_VAL_LEN);
-	}
+	info->mw_cnt = hw->mw_cnt;
+	info->mw_size = hw->mw_size;
 
-	strncpy(ntb_attrs[NTB_LINK_STATUS_ID].name, NTB_LINK_STATUS_NAME,
-		NTB_ATTR_NAME_LEN);
-	snprintf(ntb_attrs[NTB_LINK_STATUS_ID].value, NTB_ATTR_VAL_LEN,
-		 "%d", hw->link_status);
-
-	strncpy(ntb_attrs[NTB_SPEED_ID].name, NTB_SPEED_NAME,
-		NTB_ATTR_NAME_LEN);
-	snprintf(ntb_attrs[NTB_SPEED_ID].value, NTB_ATTR_VAL_LEN,
-		 "%d", hw->link_speed);
-
-	strncpy(ntb_attrs[NTB_WIDTH_ID].name, NTB_WIDTH_NAME,
-		NTB_ATTR_NAME_LEN);
-	snprintf(ntb_attrs[NTB_WIDTH_ID].value, NTB_ATTR_VAL_LEN,
-		 "%d", hw->link_width);
-
-	strncpy(ntb_attrs[NTB_MW_CNT_ID].name, NTB_MW_CNT_NAME,
-		NTB_ATTR_NAME_LEN);
-	snprintf(ntb_attrs[NTB_MW_CNT_ID].value, NTB_ATTR_VAL_LEN,
-		 "%d", hw->mw_cnt);
+	/**
+	 * Intel hardware requires that mapped memory base address should be
+	 * aligned with EMBARSZ and needs continuous memzone.
+	 */
+	info->mw_size_align = (uint8_t)(hw->pci_dev->id.vendor_id ==
+					NTB_INTEL_VENDOR_ID);
 
-	strncpy(ntb_attrs[NTB_DB_CNT_ID].name, NTB_DB_CNT_NAME,
-		NTB_ATTR_NAME_LEN);
-	snprintf(ntb_attrs[NTB_DB_CNT_ID].value, NTB_ATTR_VAL_LEN,
-		 "%d", hw->db_cnt);
+	if (!hw->queue_size || !hw->queue_pairs) {
+		NTB_LOG(ERR, "No queue size and queue num assigned.");
+		return;
+	}
 
-	strncpy(ntb_attrs[NTB_SPAD_CNT_ID].name, NTB_SPAD_CNT_NAME,
-		NTB_ATTR_NAME_LEN);
-	snprintf(ntb_attrs[NTB_SPAD_CNT_ID].value, NTB_ATTR_VAL_LEN,
-		 "%d", hw->spad_cnt);
+	hw->hdr_size_per_queue = RTE_ALIGN(sizeof(struct ntb_header) +
+				hw->queue_size * sizeof(struct ntb_desc) +
+				hw->queue_size * sizeof(struct ntb_used),
+				RTE_CACHE_LINE_SIZE);
+	info->ntb_hdr_size = hw->hdr_size_per_queue * hw->queue_pairs;
 }
 
 static int
-ntb_dev_configure(const struct rte_rawdev *dev __rte_unused,
-		  rte_rawdev_obj_t config __rte_unused)
+ntb_dev_configure(const struct rte_rawdev *dev, rte_rawdev_obj_t config)
 {
+	struct ntb_dev_config *conf = config;
+	struct ntb_hw *hw = dev->dev_private;
+	int ret;
+
+	hw->queue_pairs	= conf->num_queues;
+	hw->queue_size = conf->queue_size;
+	hw->used_mw_num = conf->mz_num;
+	hw->mz = conf->mz_list;
+	hw->rx_queues = rte_zmalloc("ntb_rx_queues",
+			sizeof(struct ntb_rx_queue *) * hw->queue_pairs, 0);
+	hw->tx_queues = rte_zmalloc("ntb_tx_queues",
+			sizeof(struct ntb_tx_queue *) * hw->queue_pairs, 0);
+
+	/* Start handshake with the peer. */
+	ret = ntb_handshake_work(dev);
+	if (ret < 0)
+		return ret;
+
 	return 0;
 }
 
@@ -337,21 +637,52 @@ static int
 ntb_dev_start(struct rte_rawdev *dev)
 {
 	struct ntb_hw *hw = dev->dev_private;
-	int ret, i;
+	uint32_t peer_base_l, peer_val;
+	uint64_t peer_base_h;
+	uint32_t i;
+	int ret;
 
-	/* TODO: init queues and start queues. */
+	if (!hw->link_status || !hw->peer_dev_up)
+		return -EINVAL;
 
-	/* Map memory of bar_size to remote. */
-	hw->mz = rte_zmalloc("struct rte_memzone *",
-			     hw->mw_cnt * sizeof(struct rte_memzone *), 0);
-	for (i = 0; i < hw->mw_cnt; i++) {
-		ret = ntb_set_mw(dev, i, hw->mw_size[i]);
+	for (i = 0; i < hw->queue_pairs; i++) {
+		ret = ntb_queue_init(dev, i);
 		if (ret) {
-			NTB_LOG(ERR, "Fail to set mw.");
+			NTB_LOG(ERR, "Failed to init queue.");
 			return ret;
 		}
 	}
 
+	hw->peer_mw_base = rte_zmalloc("ntb_peer_mw_base", hw->mw_cnt *
+					sizeof(uint64_t), 0);
+
+	if (hw->ntb_ops->spad_read == NULL)
+		return -ENOTSUP;
+
+	peer_val = (*hw->ntb_ops->spad_read)(dev, SPAD_Q_SZ, 0);
+	if (peer_val != hw->queue_size) {
+		NTB_LOG(ERR, "Inconsistent queue size! (local: %u peer: %u)",
+			hw->queue_size, peer_val);
+		return -EINVAL;
+	}
+
+	peer_val = (*hw->ntb_ops->spad_read)(dev, SPAD_NUM_QPS, 0);
+	if (peer_val != hw->queue_pairs) {
+		NTB_LOG(ERR, "Inconsistent number of queues! (local: %u peer:"
+			" %u)", hw->queue_pairs, peer_val);
+		return -EINVAL;
+	}
+
+	hw->peer_used_mws = (*hw->ntb_ops->spad_read)(dev, SPAD_USED_MWS, 0);
+
+	for (i = 0; i < hw->peer_used_mws; i++) {
+		peer_base_h = (*hw->ntb_ops->spad_read)(dev,
+				SPAD_MW0_BA_H + 2 * i, 0);
+		peer_base_l = (*hw->ntb_ops->spad_read)(dev,
+				SPAD_MW0_BA_L + 2 * i, 0);
+		hw->peer_mw_base[i] = (peer_base_h << 32) + peer_base_l;
+	}
+
 	dev->started = 1;
 
 	return 0;
@@ -361,10 +692,10 @@ static void
 ntb_dev_stop(struct rte_rawdev *dev)
 {
 	struct ntb_hw *hw = dev->dev_private;
+	struct ntb_rx_queue *rxq;
+	struct ntb_tx_queue *txq;
 	uint32_t time_out;
-	int status;
-
-	/* TODO: stop rx/tx queues. */
+	int status, i;
 
 	if (!hw->peer_dev_up)
 		goto clean;
@@ -405,6 +736,13 @@ ntb_dev_stop(struct rte_rawdev *dev)
 	if (status)
 		NTB_LOG(ERR, "Failed to clear doorbells.");
 
+	for (i = 0; i < hw->queue_pairs; i++) {
+		rxq = hw->rx_queues[i];
+		txq = hw->tx_queues[i];
+		ntb_rxq_release_mbufs(rxq);
+		ntb_txq_release_mbufs(txq);
+	}
+
 	dev->started = 0;
 }
 
@@ -413,12 +751,15 @@ ntb_dev_close(struct rte_rawdev *dev)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	struct rte_intr_handle *intr_handle;
-	int ret = 0;
+	int i;
 
 	if (dev->started)
 		ntb_dev_stop(dev);
 
-	/* TODO: free queues. */
+	/* free queues */
+	for (i = 0; i < hw->queue_pairs; i++)
+		ntb_queue_release(dev, i);
+	hw->queue_pairs = 0;
 
 	intr_handle = &hw->pci_dev->intr_handle;
 	/* Clean datapath event and vec mapping */
@@ -434,7 +775,7 @@ ntb_dev_close(struct rte_rawdev *dev)
 	rte_intr_callback_unregister(intr_handle,
 				     ntb_dev_intr_handler, dev);
 
-	return ret;
+	return 0;
 }
 
 static int
@@ -445,7 +786,7 @@ ntb_dev_reset(struct rte_rawdev *rawdev __rte_unused)
 
 static int
 ntb_attr_set(struct rte_rawdev *dev, const char *attr_name,
-				 uint64_t attr_value)
+	     uint64_t attr_value)
 {
 	struct ntb_hw *hw;
 	int index;
@@ -463,7 +804,21 @@ ntb_attr_set(struct rte_rawdev *dev, const char *attr_name,
 		index = atoi(&attr_name[NTB_SPAD_USER_LEN]);
 		(*hw->ntb_ops->spad_write)(dev, hw->spad_user_list[index],
 					   1, attr_value);
-		NTB_LOG(INFO, "Set attribute (%s) Value (%" PRIu64 ")",
+		NTB_LOG(DEBUG, "Set attribute (%s) Value (%" PRIu64 ")",
+			attr_name, attr_value);
+		return 0;
+	}
+
+	if (!strncmp(attr_name, NTB_QUEUE_SZ_NAME, NTB_ATTR_NAME_LEN)) {
+		hw->queue_size = attr_value;
+		NTB_LOG(DEBUG, "Set attribute (%s) Value (%" PRIu64 ")",
+			attr_name, attr_value);
+		return 0;
+	}
+
+	if (!strncmp(attr_name, NTB_QUEUE_NUM_NAME, NTB_ATTR_NAME_LEN)) {
+		hw->queue_pairs = attr_value;
+		NTB_LOG(DEBUG, "Set attribute (%s) Value (%" PRIu64 ")",
 			attr_name, attr_value);
 		return 0;
 	}
@@ -475,7 +830,7 @@ ntb_attr_set(struct rte_rawdev *dev, const char *attr_name,
 
 static int
 ntb_attr_get(struct rte_rawdev *dev, const char *attr_name,
-				 uint64_t *attr_value)
+	     uint64_t *attr_value)
 {
 	struct ntb_hw *hw;
 	int index;
@@ -489,49 +844,50 @@ ntb_attr_get(struct rte_rawdev *dev, const char *attr_name,
 
 	if (!strncmp(attr_name, NTB_TOPO_NAME, NTB_ATTR_NAME_LEN)) {
 		*attr_value = hw->topo;
-		NTB_LOG(INFO, "Attribute (%s) Value (%" PRIu64 ")",
+		NTB_LOG(DEBUG, "Attribute (%s) Value (%" PRIu64 ")",
 			attr_name, *attr_value);
 		return 0;
 	}
 
 	if (!strncmp(attr_name, NTB_LINK_STATUS_NAME, NTB_ATTR_NAME_LEN)) {
-		*attr_value = hw->link_status;
-		NTB_LOG(INFO, "Attribute (%s) Value (%" PRIu64 ")",
+		/* hw->link_status only indicates hw link status. */
+		*attr_value = hw->link_status && hw->peer_dev_up;
+		NTB_LOG(DEBUG, "Attribute (%s) Value (%" PRIu64 ")",
 			attr_name, *attr_value);
 		return 0;
 	}
 
 	if (!strncmp(attr_name, NTB_SPEED_NAME, NTB_ATTR_NAME_LEN)) {
 		*attr_value = hw->link_speed;
-		NTB_LOG(INFO, "Attribute (%s) Value (%" PRIu64 ")",
+		NTB_LOG(DEBUG, "Attribute (%s) Value (%" PRIu64 ")",
 			attr_name, *attr_value);
 		return 0;
 	}
 
 	if (!strncmp(attr_name, NTB_WIDTH_NAME, NTB_ATTR_NAME_LEN)) {
 		*attr_value = hw->link_width;
-		NTB_LOG(INFO, "Attribute (%s) Value (%" PRIu64 ")",
+		NTB_LOG(DEBUG, "Attribute (%s) Value (%" PRIu64 ")",
 			attr_name, *attr_value);
 		return 0;
 	}
 
 	if (!strncmp(attr_name, NTB_MW_CNT_NAME, NTB_ATTR_NAME_LEN)) {
 		*attr_value = hw->mw_cnt;
-		NTB_LOG(INFO, "Attribute (%s) Value (%" PRIu64 ")",
+		NTB_LOG(DEBUG, "Attribute (%s) Value (%" PRIu64 ")",
 			attr_name, *attr_value);
 		return 0;
 	}
 
 	if (!strncmp(attr_name, NTB_DB_CNT_NAME, NTB_ATTR_NAME_LEN)) {
 		*attr_value = hw->db_cnt;
-		NTB_LOG(INFO, "Attribute (%s) Value (%" PRIu64 ")",
+		NTB_LOG(DEBUG, "Attribute (%s) Value (%" PRIu64 ")",
 			attr_name, *attr_value);
 		return 0;
 	}
 
 	if (!strncmp(attr_name, NTB_SPAD_CNT_NAME, NTB_ATTR_NAME_LEN)) {
 		*attr_value = hw->spad_cnt;
-		NTB_LOG(INFO, "Attribute (%s) Value (%" PRIu64 ")",
+		NTB_LOG(DEBUG, "Attribute (%s) Value (%" PRIu64 ")",
 			attr_name, *attr_value);
 		return 0;
 	}
@@ -542,7 +898,7 @@ ntb_attr_get(struct rte_rawdev *dev, const char *attr_name,
 		index = atoi(&attr_name[NTB_SPAD_USER_LEN]);
 		*attr_value = (*hw->ntb_ops->spad_read)(dev,
 				hw->spad_user_list[index], 0);
-		NTB_LOG(INFO, "Attribute (%s) Value (%" PRIu64 ")",
+		NTB_LOG(DEBUG, "Attribute (%s) Value (%" PRIu64 ")",
 			attr_name, *attr_value);
 		return 0;
 	}
@@ -585,6 +941,7 @@ ntb_xstats_reset(struct rte_rawdev *dev __rte_unused,
 	return 0;
 }
 
+
 static const struct rte_rawdev_ops ntb_ops = {
 	.dev_info_get         = ntb_dev_info_get,
 	.dev_configure        = ntb_dev_configure,
@@ -615,7 +972,6 @@ ntb_init_hw(struct rte_rawdev *dev, struct rte_pci_device *pci_dev)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	struct rte_intr_handle *intr_handle;
-	uint32_t val;
 	int ret, i;
 
 	hw->pci_dev = pci_dev;
@@ -688,45 +1044,6 @@ ntb_init_hw(struct rte_rawdev *dev, struct rte_pci_device *pci_dev)
 	/* enable uio intr after callback register */
 	rte_intr_enable(intr_handle);
 
-	if (hw->ntb_ops->spad_write == NULL) {
-		NTB_LOG(ERR, "Scratchpad is not supported.");
-		return -ENOTSUP;
-	}
-	/* Tell peer the mw_cnt of local side. */
-	ret = (*hw->ntb_ops->spad_write)(dev, SPAD_NUM_MWS, 1, hw->mw_cnt);
-	if (ret) {
-		NTB_LOG(ERR, "Failed to tell peer mw count.");
-		return ret;
-	}
-
-	/* Tell peer each mw size on local side. */
-	for (i = 0; i < hw->mw_cnt; i++) {
-		NTB_LOG(DEBUG, "Local %u mw size: 0x%"PRIx64"", i,
-				hw->mw_size[i]);
-		val = hw->mw_size[i] >> 32;
-		ret = (*hw->ntb_ops->spad_write)
-				(dev, SPAD_MW0_SZ_H + 2 * i, 1, val);
-		if (ret) {
-			NTB_LOG(ERR, "Failed to tell peer mw size.");
-			return ret;
-		}
-
-		val = hw->mw_size[i];
-		ret = (*hw->ntb_ops->spad_write)
-				(dev, SPAD_MW0_SZ_L + 2 * i, 1, val);
-		if (ret) {
-			NTB_LOG(ERR, "Failed to tell peer mw size.");
-			return ret;
-		}
-	}
-
-	/* Ring doorbell 0 to tell peer the device is ready. */
-	ret = (*hw->ntb_ops->peer_db_set)(dev, 0);
-	if (ret) {
-		NTB_LOG(ERR, "Failed to tell peer device is probed.");
-		return ret;
-	}
-
 	return ret;
 }
 
@@ -839,5 +1156,5 @@ RTE_INIT(ntb_init_log)
 {
 	ntb_logtype = rte_log_register("pmd.raw.ntb");
 	if (ntb_logtype >= 0)
-		rte_log_set_level(ntb_logtype, RTE_LOG_DEBUG);
+		rte_log_set_level(ntb_logtype, RTE_LOG_INFO);
 }
diff --git a/drivers/raw/ntb/ntb.h b/drivers/raw/ntb/ntb.h
index d355231b0..0ad20aed3 100644
--- a/drivers/raw/ntb/ntb.h
+++ b/drivers/raw/ntb/ntb.h
@@ -2,8 +2,8 @@
  * Copyright(c) 2019 Intel Corporation.
  */
 
-#ifndef _NTB_RAWDEV_H_
-#define _NTB_RAWDEV_H_
+#ifndef _NTB_H_
+#define _NTB_H_
 
 #include <stdbool.h>
 
@@ -19,38 +19,13 @@ extern int ntb_logtype;
 /* Device IDs */
 #define NTB_INTEL_DEV_ID_B2B_SKX    0x201C
 
-#define NTB_TOPO_NAME               "topo"
-#define NTB_LINK_STATUS_NAME        "link_status"
-#define NTB_SPEED_NAME              "speed"
-#define NTB_WIDTH_NAME              "width"
-#define NTB_MW_CNT_NAME             "mw_count"
-#define NTB_DB_CNT_NAME             "db_count"
-#define NTB_SPAD_CNT_NAME           "spad_count"
 /* Reserved to app to use. */
 #define NTB_SPAD_USER               "spad_user_"
 #define NTB_SPAD_USER_LEN           (sizeof(NTB_SPAD_USER) - 1)
-#define NTB_SPAD_USER_MAX_NUM       10
+#define NTB_SPAD_USER_MAX_NUM       4
 #define NTB_ATTR_NAME_LEN           30
-#define NTB_ATTR_VAL_LEN            30
-#define NTB_ATTR_MAX                20
-
-/* NTB Attributes */
-struct ntb_attr {
-	/**< Name of the attribute */
-	char name[NTB_ATTR_NAME_LEN];
-	/**< Value or reference of value of attribute */
-	char value[NTB_ATTR_NAME_LEN];
-};
 
-enum ntb_attr_idx {
-	NTB_TOPO_ID = 0,
-	NTB_LINK_STATUS_ID,
-	NTB_SPEED_ID,
-	NTB_WIDTH_ID,
-	NTB_MW_CNT_ID,
-	NTB_DB_CNT_ID,
-	NTB_SPAD_CNT_ID,
-};
+#define NTB_DFLT_TX_FREE_THRESH     256
 
 enum ntb_topo {
 	NTB_TOPO_NONE = 0,
@@ -87,10 +62,15 @@ enum ntb_spad_idx {
 	SPAD_NUM_MWS = 1,
 	SPAD_NUM_QPS,
 	SPAD_Q_SZ,
+	SPAD_USED_MWS,
 	SPAD_MW0_SZ_H,
 	SPAD_MW0_SZ_L,
 	SPAD_MW1_SZ_H,
 	SPAD_MW1_SZ_L,
+	SPAD_MW0_BA_H,
+	SPAD_MW0_BA_L,
+	SPAD_MW1_BA_H,
+	SPAD_MW1_BA_L,
 };
 
 /**
@@ -110,26 +90,97 @@ enum ntb_spad_idx {
  * @vector_bind: Bind vector source [intr] to msix vector [msix].
  */
 struct ntb_dev_ops {
-	int (*ntb_dev_init)(struct rte_rawdev *dev);
-	void *(*get_peer_mw_addr)(struct rte_rawdev *dev, int mw_idx);
-	int (*mw_set_trans)(struct rte_rawdev *dev, int mw_idx,
+	int (*ntb_dev_init)(const struct rte_rawdev *dev);
+	void *(*get_peer_mw_addr)(const struct rte_rawdev *dev, int mw_idx);
+	int (*mw_set_trans)(const struct rte_rawdev *dev, int mw_idx,
 			    uint64_t addr, uint64_t size);
-	int (*get_link_status)(struct rte_rawdev *dev);
-	int (*set_link)(struct rte_rawdev *dev, bool up);
-	uint32_t (*spad_read)(struct rte_rawdev *dev, int spad, bool peer);
-	int (*spad_write)(struct rte_rawdev *dev, int spad,
+	int (*get_link_status)(const struct rte_rawdev *dev);
+	int (*set_link)(const struct rte_rawdev *dev, bool up);
+	uint32_t (*spad_read)(const struct rte_rawdev *dev, int spad,
+			      bool peer);
+	int (*spad_write)(const struct rte_rawdev *dev, int spad,
 			  bool peer, uint32_t spad_v);
-	uint64_t (*db_read)(struct rte_rawdev *dev);
-	int (*db_clear)(struct rte_rawdev *dev, uint64_t db_bits);
-	int (*db_set_mask)(struct rte_rawdev *dev, uint64_t db_mask);
-	int (*peer_db_set)(struct rte_rawdev *dev, uint8_t db_bit);
-	int (*vector_bind)(struct rte_rawdev *dev, uint8_t intr, uint8_t msix);
+	uint64_t (*db_read)(const struct rte_rawdev *dev);
+	int (*db_clear)(const struct rte_rawdev *dev, uint64_t db_bits);
+	int (*db_set_mask)(const struct rte_rawdev *dev, uint64_t db_mask);
+	int (*peer_db_set)(const struct rte_rawdev *dev, uint8_t db_bit);
+	int (*vector_bind)(const struct rte_rawdev *dev, uint8_t intr,
+			   uint8_t msix);
+};
+
+struct ntb_desc {
+	uint64_t addr; /* buffer addr */
+	uint16_t len;  /* buffer length */
+	uint16_t rsv1;
+	uint32_t rsv2;
+};
+
+struct ntb_used {
+	uint16_t len;     /* buffer length */
+#define NTB_FLAG_EOP    1 /* end of packet */
+	uint16_t flags;   /* flags */
+};
+
+struct ntb_rx_entry {
+	struct rte_mbuf *mbuf;
+};
+
+struct ntb_rx_queue {
+	struct ntb_desc *rx_desc_ring;
+	volatile struct ntb_used *rx_used_ring;
+	uint16_t *avail_cnt;
+	volatile uint16_t *used_cnt;
+	uint16_t last_avail;
+	uint16_t last_used;
+	uint16_t nb_rx_desc;
+
+	uint16_t rx_free_thresh;
+
+	struct rte_mempool *mpool; /**< mempool for mbuf allocation */
+	struct ntb_rx_entry *sw_ring;
+
+	uint16_t queue_id;         /**< DPDK queue index. */
+	uint16_t port_id;          /**< Device port identifier. */
+
+	struct ntb_hw *hw;
+};
+
+struct ntb_tx_entry {
+	struct rte_mbuf *mbuf;
+	uint16_t next_id;
+	uint16_t last_id;
+};
+
+struct ntb_tx_queue {
+	volatile struct ntb_desc *tx_desc_ring;
+	struct ntb_used *tx_used_ring;
+	volatile uint16_t *avail_cnt;
+	uint16_t *used_cnt;
+	uint16_t last_avail;          /**< Next entry to be freed. */
+	uint16_t last_used;           /**< Next entry to be sent. */
+	uint16_t nb_tx_desc;
+
+	/**< Total number of TX descriptors ready to be allocated. */
+	uint16_t nb_tx_free;
+	uint16_t tx_free_thresh;
+
+	struct ntb_tx_entry *sw_ring;
+
+	uint16_t queue_id;            /**< DPDK queue index. */
+	uint16_t port_id;             /**< Device port identifier. */
+
+	struct ntb_hw *hw;
+};
+
+struct ntb_header {
+	uint16_t avail_cnt __rte_cache_aligned;
+	uint16_t used_cnt __rte_cache_aligned;
+	struct ntb_desc desc_ring[] __rte_cache_aligned;
 };
 
 /* ntb private data. */
 struct ntb_hw {
 	uint8_t mw_cnt;
-	uint8_t peer_mw_cnt;
 	uint8_t db_cnt;
 	uint8_t spad_cnt;
 
@@ -147,18 +198,26 @@ struct ntb_hw {
 	struct rte_pci_device *pci_dev;
 	char *hw_addr;
 
-	uint64_t *mw_size;
-	uint64_t *peer_mw_size;
 	uint8_t peer_dev_up;
+	uint64_t *mw_size;
+	/* remote mem base addr */
+	uint64_t *peer_mw_base;
 
 	uint16_t queue_pairs;
 	uint16_t queue_size;
+	uint32_t hdr_size_per_queue;
+
+	struct ntb_rx_queue **rx_queues;
+	struct ntb_tx_queue **tx_queues;
 
-	/**< mem zone to populate RX ring. */
+	/* memzone to populate RX ring. */
 	const struct rte_memzone **mz;
+	uint8_t used_mw_num;
+
+	uint8_t peer_used_mws;
 
 	/* Reserve several spad for app to use. */
 	int spad_user_list[NTB_SPAD_USER_MAX_NUM];
 };
 
-#endif /* _NTB_RAWDEV_H_ */
+#endif /* _NTB_H_ */
diff --git a/drivers/raw/ntb/ntb_hw_intel.c b/drivers/raw/ntb/ntb_hw_intel.c
index 21eaa8511..0e73f1609 100644
--- a/drivers/raw/ntb/ntb_hw_intel.c
+++ b/drivers/raw/ntb/ntb_hw_intel.c
@@ -26,7 +26,7 @@ static enum xeon_ntb_bar intel_ntb_bar[] = {
 };
 
 static int
-intel_ntb_dev_init(struct rte_rawdev *dev)
+intel_ntb_dev_init(const struct rte_rawdev *dev)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	uint8_t reg_val, bar;
@@ -77,7 +77,7 @@ intel_ntb_dev_init(struct rte_rawdev *dev)
 	hw->db_cnt = XEON_DB_COUNT;
 	hw->spad_cnt = XEON_SPAD_COUNT;
 
-	hw->mw_size = rte_zmalloc("uint64_t",
+	hw->mw_size = rte_zmalloc("ntb_mw_size",
 				  hw->mw_cnt * sizeof(uint64_t), 0);
 	for (i = 0; i < hw->mw_cnt; i++) {
 		bar = intel_ntb_bar[i];
@@ -94,7 +94,7 @@ intel_ntb_dev_init(struct rte_rawdev *dev)
 }
 
 static void *
-intel_ntb_get_peer_mw_addr(struct rte_rawdev *dev, int mw_idx)
+intel_ntb_get_peer_mw_addr(const struct rte_rawdev *dev, int mw_idx)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	uint8_t bar;
@@ -116,7 +116,7 @@ intel_ntb_get_peer_mw_addr(struct rte_rawdev *dev, int mw_idx)
 }
 
 static int
-intel_ntb_mw_set_trans(struct rte_rawdev *dev, int mw_idx,
+intel_ntb_mw_set_trans(const struct rte_rawdev *dev, int mw_idx,
 		       uint64_t addr, uint64_t size)
 {
 	struct ntb_hw *hw = dev->dev_private;
@@ -163,7 +163,7 @@ intel_ntb_mw_set_trans(struct rte_rawdev *dev, int mw_idx,
 }
 
 static int
-intel_ntb_get_link_status(struct rte_rawdev *dev)
+intel_ntb_get_link_status(const struct rte_rawdev *dev)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	uint16_t reg_val;
@@ -195,7 +195,7 @@ intel_ntb_get_link_status(struct rte_rawdev *dev)
 }
 
 static int
-intel_ntb_set_link(struct rte_rawdev *dev, bool up)
+intel_ntb_set_link(const struct rte_rawdev *dev, bool up)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	uint32_t ntb_ctrl, reg_off;
@@ -221,7 +221,7 @@ intel_ntb_set_link(struct rte_rawdev *dev, bool up)
 }
 
 static uint32_t
-intel_ntb_spad_read(struct rte_rawdev *dev, int spad, bool peer)
+intel_ntb_spad_read(const struct rte_rawdev *dev, int spad, bool peer)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	uint32_t spad_v, reg_off;
@@ -241,7 +241,7 @@ intel_ntb_spad_read(struct rte_rawdev *dev, int spad, bool peer)
 }
 
 static int
-intel_ntb_spad_write(struct rte_rawdev *dev, int spad,
+intel_ntb_spad_write(const struct rte_rawdev *dev, int spad,
 		     bool peer, uint32_t spad_v)
 {
 	struct ntb_hw *hw = dev->dev_private;
@@ -263,7 +263,7 @@ intel_ntb_spad_write(struct rte_rawdev *dev, int spad,
 }
 
 static uint64_t
-intel_ntb_db_read(struct rte_rawdev *dev)
+intel_ntb_db_read(const struct rte_rawdev *dev)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	uint64_t db_off, db_bits;
@@ -278,7 +278,7 @@ intel_ntb_db_read(struct rte_rawdev *dev)
 }
 
 static int
-intel_ntb_db_clear(struct rte_rawdev *dev, uint64_t db_bits)
+intel_ntb_db_clear(const struct rte_rawdev *dev, uint64_t db_bits)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	uint64_t db_off;
@@ -293,7 +293,7 @@ intel_ntb_db_clear(struct rte_rawdev *dev, uint64_t db_bits)
 }
 
 static int
-intel_ntb_db_set_mask(struct rte_rawdev *dev, uint64_t db_mask)
+intel_ntb_db_set_mask(const struct rte_rawdev *dev, uint64_t db_mask)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	uint64_t db_m_off;
@@ -312,7 +312,7 @@ intel_ntb_db_set_mask(struct rte_rawdev *dev, uint64_t db_mask)
 }
 
 static int
-intel_ntb_peer_db_set(struct rte_rawdev *dev, uint8_t db_idx)
+intel_ntb_peer_db_set(const struct rte_rawdev *dev, uint8_t db_idx)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	uint32_t db_off;
@@ -332,7 +332,7 @@ intel_ntb_peer_db_set(struct rte_rawdev *dev, uint8_t db_idx)
 }
 
 static int
-intel_ntb_vector_bind(struct rte_rawdev *dev, uint8_t intr, uint8_t msix)
+intel_ntb_vector_bind(const struct rte_rawdev *dev, uint8_t intr, uint8_t msix)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	uint8_t reg_off;
diff --git a/drivers/raw/ntb/rte_pmd_ntb.h b/drivers/raw/ntb/rte_pmd_ntb.h
new file mode 100644
index 000000000..6591ce793
--- /dev/null
+++ b/drivers/raw/ntb/rte_pmd_ntb.h
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Intel Corporation.
+ */
+
+#ifndef _RTE_PMD_NTB_H_
+#define _RTE_PMD_NTB_H_
+
+/* App needs to set/get these attrs */
+#define NTB_QUEUE_SZ_NAME           "queue_size"
+#define NTB_QUEUE_NUM_NAME          "queue_num"
+#define NTB_TOPO_NAME               "topo"
+#define NTB_LINK_STATUS_NAME        "link_status"
+#define NTB_SPEED_NAME              "speed"
+#define NTB_WIDTH_NAME              "width"
+#define NTB_MW_CNT_NAME             "mw_count"
+#define NTB_DB_CNT_NAME             "db_count"
+#define NTB_SPAD_CNT_NAME           "spad_count"
+
+#define NTB_MAX_DESC_SIZE           1024
+#define NTB_MIN_DESC_SIZE           64
+
+struct ntb_dev_info {
+	uint32_t ntb_hdr_size;
+	/**< Whether the memzone needs to be aligned to the mw size. */
+	uint8_t mw_size_align;
+	uint8_t mw_cnt;
+	uint64_t *mw_size;
+};
+
+struct ntb_dev_config {
+	uint16_t num_queues;
+	uint16_t queue_size;
+	uint8_t mz_num;
+	const struct rte_memzone **mz_list;
+};
+
+struct ntb_queue_conf {
+	uint16_t nb_desc;
+	uint16_t tx_free_thresh;
+	struct rte_mempool *rx_mp;
+};
+
+#endif /* _RTE_PMD_NTB_H_ */
-- 
2.17.1
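
The SPAD_MW*_SZ_H/_L and SPAD_MW*_BA_H/_L entries in enum ntb_spad_idx suggest that each 64-bit memory window size or base address is exchanged as a pair of 32-bit scratchpad registers, indexed as SPAD_MW0_SZ_H + 2 * i in the code this patch removes from ntb_init_hw(). The standalone sketch below illustrates that split and the matching reassembly. It is only an illustration: the spad array stands in for the hardware scratchpads, and SPAD_COUNT, spad_put64() and spad_get64() are hypothetical names that do not appear in the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Local mirror of the scratchpad indices shown in enum ntb_spad_idx. */
enum spad_idx {
	SPAD_NUM_MWS = 1,
	SPAD_NUM_QPS,
	SPAD_Q_SZ,
	SPAD_USED_MWS,
	SPAD_MW0_SZ_H,
	SPAD_MW0_SZ_L,
	SPAD_MW1_SZ_H,
	SPAD_MW1_SZ_L,
	SPAD_MW0_BA_H,
	SPAD_MW0_BA_L,
	SPAD_MW1_BA_H,
	SPAD_MW1_BA_L,
	SPAD_COUNT, /* hypothetical: only sizes the simulated array */
};

/* Store a 64-bit value as a high/low pair of 32-bit scratchpads. */
static inline void
spad_put64(uint32_t *spad, int hi_idx, uint64_t v)
{
	assert(hi_idx >= 0 && hi_idx + 1 < SPAD_COUNT);
	spad[hi_idx] = (uint32_t)(v >> 32);  /* upper 32 bits */
	spad[hi_idx + 1] = (uint32_t)v;      /* lower 32 bits */
}

/* Rebuild the 64-bit value from the same high/low pair. */
static inline uint64_t
spad_get64(const uint32_t *spad, int hi_idx)
{
	assert(hi_idx >= 0 && hi_idx + 1 < SPAD_COUNT);
	return ((uint64_t)spad[hi_idx] << 32) | spad[hi_idx + 1];
}
```

With this layout, the size of memory window i would be stored with spad_put64(spad, SPAD_MW0_SZ_H + 2 * i, size) and its base address with spad_put64(spad, SPAD_MW0_BA_H + 2 * i, base).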



* [dpdk-dev] [PATCH v3 2/4] raw/ntb: add xstats support
  2019-09-06  7:53   ` [dpdk-dev] [PATCH v3 0/4] enable FIFO " Xiaoyun Li
  2019-09-06  7:53     ` [dpdk-dev] [PATCH v3 1/4] raw/ntb: setup ntb queue Xiaoyun Li
@ 2019-09-06  7:54     ` Xiaoyun Li
  2019-09-06  7:54     ` [dpdk-dev] [PATCH v3 3/4] raw/ntb: add enqueue and dequeue functions Xiaoyun Li
                       ` (2 subsequent siblings)
  4 siblings, 0 replies; 42+ messages in thread
From: Xiaoyun Li @ 2019-09-06  7:54 UTC (permalink / raw)
  To: jingjing.wu, keith.wiles, omkar.maslekar, cunming.liang; +Cc: dev, Xiaoyun Li

Add xstats support for ntb rawdev.
Support tx-packets, tx-bytes, tx-errors and
rx-packets, rx-bytes, rx-missed.

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
---
 drivers/raw/ntb/ntb.c | 132 ++++++++++++++++++++++++++++++++++++------
 drivers/raw/ntb/ntb.h |  11 ++++
 2 files changed, 125 insertions(+), 18 deletions(-)

diff --git a/drivers/raw/ntb/ntb.c b/drivers/raw/ntb/ntb.c
index 02784a134..9c8a3e3cb 100644
--- a/drivers/raw/ntb/ntb.c
+++ b/drivers/raw/ntb/ntb.c
@@ -30,6 +30,17 @@ static const struct rte_pci_id pci_id_ntb_map[] = {
 	{ .vendor_id = 0, /* sentinel */ },
 };
 
+/* Align with enum ntb_xstats_idx */
+static struct rte_rawdev_xstats_name ntb_xstats_names[] = {
+	{"Tx-packets"},
+	{"Tx-bytes"},
+	{"Tx-errors"},
+	{"Rx-packets"},
+	{"Rx-bytes"},
+	{"Rx-missed"},
+};
+#define NTB_XSTATS_NUM RTE_DIM(ntb_xstats_names)
+
 static inline void
 ntb_link_cleanup(struct rte_rawdev *dev)
 {
@@ -538,6 +549,10 @@ ntb_queue_init(struct rte_rawdev *dev, uint16_t qp_id)
 	txq->last_avail = 0;
 	txq->nb_tx_free = txq->nb_tx_desc - 1;
 
+	/* Reset per queue stats. */
+	for (i = 0; i < NTB_XSTATS_NUM; i++)
+		hw->ntb_xstats[i + NTB_XSTATS_NUM * (qp_id + 1)] = 0;
+
 	return 0;
 }
 
@@ -614,6 +629,7 @@ ntb_dev_configure(const struct rte_rawdev *dev, rte_rawdev_obj_t config)
 {
 	struct ntb_dev_config *conf = config;
 	struct ntb_hw *hw = dev->dev_private;
+	uint32_t xstats_num;
 	int ret;
 
 	hw->queue_pairs	= conf->num_queues;
@@ -624,6 +640,10 @@ ntb_dev_configure(const struct rte_rawdev *dev, rte_rawdev_obj_t config)
 			sizeof(struct ntb_rx_queue *) * hw->queue_pairs, 0);
 	hw->tx_queues = rte_zmalloc("ntb_tx_queues",
 			sizeof(struct ntb_tx_queue *) * hw->queue_pairs, 0);
+	/* First total stats, then per queue stats. */
+	xstats_num = (hw->queue_pairs + 1) * NTB_XSTATS_NUM;
+	hw->ntb_xstats = rte_zmalloc("ntb_xstats", xstats_num *
+				     sizeof(uint64_t), 0);
 
 	/* Start handshake with the peer. */
 	ret = ntb_handshake_work(dev);
@@ -645,6 +665,10 @@ ntb_dev_start(struct rte_rawdev *dev)
 	if (!hw->link_status || !hw->peer_dev_up)
 		return -EINVAL;
 
+	/* Reset total stats. */
+	for (i = 0; i < NTB_XSTATS_NUM; i++)
+		hw->ntb_xstats[i] = 0;
+
 	for (i = 0; i < hw->queue_pairs; i++) {
 		ret = ntb_queue_init(dev, i);
 		if (ret) {
@@ -909,38 +933,110 @@ ntb_attr_get(struct rte_rawdev *dev, const char *attr_name,
 }
 
 static int
-ntb_xstats_get(const struct rte_rawdev *dev __rte_unused,
-	       const unsigned int ids[] __rte_unused,
-	       uint64_t values[] __rte_unused,
-	       unsigned int n __rte_unused)
+ntb_xstats_get(const struct rte_rawdev *dev,
+	       const unsigned int ids[],
+	       uint64_t values[],
+	       unsigned int n)
 {
-	return 0;
+	struct ntb_hw *hw = dev->dev_private;
+	uint32_t i, j, off, xstats_num;
+
+	/* Calculate total stats of all queues. */
+	for (i = 0; i < NTB_XSTATS_NUM; i++) {
+		for (j = 0; j < hw->queue_pairs; j++) {
+			off = NTB_XSTATS_NUM * (j + 1) + i;
+			hw->ntb_xstats[i] += hw->ntb_xstats[off];
+		}
+	}
+
+	xstats_num = NTB_XSTATS_NUM * (hw->queue_pairs + 1);
+	for (i = 0; i < n && ids[i] < xstats_num; i++)
+		values[i] = hw->ntb_xstats[ids[i]];
+
+	return i;
 }
 
 static int
-ntb_xstats_get_names(const struct rte_rawdev *dev __rte_unused,
-		     struct rte_rawdev_xstats_name *xstats_names __rte_unused,
-		     unsigned int size __rte_unused)
+ntb_xstats_get_names(const struct rte_rawdev *dev,
+		     struct rte_rawdev_xstats_name *xstats_names,
+		     unsigned int size)
 {
-	return 0;
+	struct ntb_hw *hw = dev->dev_private;
+	uint32_t xstats_num, i, j, off;
+
+	xstats_num = NTB_XSTATS_NUM * (hw->queue_pairs + 1);
+	if (xstats_names == NULL || size < xstats_num)
+		return xstats_num;
+
+	/* Total stats names */
+	memcpy(xstats_names, ntb_xstats_names, sizeof(ntb_xstats_names));
+
+	/* Queue stats names */
+	for (i = 0; i < hw->queue_pairs; i++) {
+		for (j = 0; j < NTB_XSTATS_NUM; j++) {
+			off = j + (i + 1) * NTB_XSTATS_NUM;
+			snprintf(xstats_names[off].name,
+				sizeof(xstats_names[0].name),
+				"%s_q%u", ntb_xstats_names[j].name, i);
+		}
+	}
+
+	return xstats_num;
 }
 
 static uint64_t
-ntb_xstats_get_by_name(const struct rte_rawdev *dev __rte_unused,
-		       const char *name __rte_unused,
-		       unsigned int *id __rte_unused)
+ntb_xstats_get_by_name(const struct rte_rawdev *dev,
+		       const char *name, unsigned int *id)
 {
-	return 0;
+	struct rte_rawdev_xstats_name *xstats_names;
+	struct ntb_hw *hw = dev->dev_private;
+	uint32_t xstats_num, i, j, off;
+
+	if (name == NULL)
+		return -EINVAL;
+
+	xstats_num = NTB_XSTATS_NUM * (hw->queue_pairs + 1);
+	xstats_names = rte_zmalloc("ntb_stats_name",
+				   sizeof(struct rte_rawdev_xstats_name) *
+				   xstats_num, 0);
+	ntb_xstats_get_names(dev, xstats_names, xstats_num);
+
+	/* Calculate total stats of all queues. */
+	for (i = 0; i < NTB_XSTATS_NUM; i++) {
+		for (j = 0; j < hw->queue_pairs; j++) {
+			off = NTB_XSTATS_NUM * (j + 1) + i;
+			hw->ntb_xstats[i] += hw->ntb_xstats[off];
+		}
+	}
+
+	for (i = 0; i < xstats_num; i++) {
+		if (!strncmp(name, xstats_names[i].name,
+		    RTE_RAW_DEV_XSTATS_NAME_SIZE)) {
+			*id = i;
+			rte_free(xstats_names);
+			return hw->ntb_xstats[i];
+		}
+	}
+
+	NTB_LOG(ERR, "Cannot find the xstats name.");
+	rte_free(xstats_names);
+	return -EINVAL;
 }
 
 static int
-ntb_xstats_reset(struct rte_rawdev *dev __rte_unused,
-		 const uint32_t ids[] __rte_unused,
-		 uint32_t nb_ids __rte_unused)
+ntb_xstats_reset(struct rte_rawdev *dev,
+		 const uint32_t ids[],
+		 uint32_t nb_ids)
 {
-	return 0;
-}
+	struct ntb_hw *hw = dev->dev_private;
+	uint32_t i, xstats_num;
 
+	xstats_num = NTB_XSTATS_NUM * (hw->queue_pairs + 1);
+	for (i = 0; i < nb_ids && ids[i] < xstats_num; i++)
+		hw->ntb_xstats[ids[i]] = 0;
+
+	return i;
+}
 
 static const struct rte_rawdev_ops ntb_ops = {
 	.dev_info_get         = ntb_dev_info_get,
diff --git a/drivers/raw/ntb/ntb.h b/drivers/raw/ntb/ntb.h
index 0ad20aed3..09e28050f 100644
--- a/drivers/raw/ntb/ntb.h
+++ b/drivers/raw/ntb/ntb.h
@@ -27,6 +27,15 @@ extern int ntb_logtype;
 
 #define NTB_DFLT_TX_FREE_THRESH     256
 
+enum ntb_xstats_idx {
+	NTB_TX_PKTS_ID = 0,
+	NTB_TX_BYTES_ID,
+	NTB_TX_ERRS_ID,
+	NTB_RX_PKTS_ID,
+	NTB_RX_BYTES_ID,
+	NTB_RX_MISS_ID,
+};
+
 enum ntb_topo {
 	NTB_TOPO_NONE = 0,
 	NTB_TOPO_B2B_USD,
@@ -216,6 +225,8 @@ struct ntb_hw {
 
 	uint8_t peer_used_mws;
 
+	uint64_t *ntb_xstats;
+
 	/* Reserve several spad for app to use. */
 	int spad_user_list[NTB_SPAD_USER_MAX_NUM];
 };
-- 
2.17.1
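
The xstats array added here is laid out as one block of NTB_XSTATS_NUM totals followed by one block per queue pair, so the counter for statistic i of queue q sits at NTB_XSTATS_NUM * (q + 1) + i. A minimal standalone sketch of that indexing follows. XSTATS_NUM, xstats_off() and xstats_refresh_totals() are hypothetical names, and unlike the patch the sketch zeroes the totals before summing, an assumption made here so that repeated reads do not count the same packets twice.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define XSTATS_NUM 6 /* mirrors the six entries of enum ntb_xstats_idx */

enum { TX_PKTS, TX_BYTES, TX_ERRS, RX_PKTS, RX_BYTES, RX_MISS };

/* Slot of statistic `stat` for queue `q`; slots [0, XSTATS_NUM) hold totals. */
static inline uint32_t
xstats_off(uint16_t q, uint32_t stat)
{
	assert(stat < XSTATS_NUM);
	return XSTATS_NUM * (q + 1u) + stat;
}

/*
 * Recompute the totals from the per-queue counters. The totals are cleared
 * first (a choice of this sketch) so that calling it twice is harmless.
 */
static inline void
xstats_refresh_totals(uint64_t *xstats, uint16_t nb_queues)
{
	uint32_t i;
	uint16_t q;

	memset(xstats, 0, XSTATS_NUM * sizeof(xstats[0]));
	for (i = 0; i < XSTATS_NUM; i++)
		for (q = 0; q < nb_queues; q++)
			xstats[i] += xstats[xstats_off(q, i)];
}
```

The array passed in must hold XSTATS_NUM * (nb_queues + 1) entries, which matches the allocation size computed in ntb_dev_configure().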



* [dpdk-dev] [PATCH v3 3/4] raw/ntb: add enqueue and dequeue functions
  2019-09-06  7:53   ` [dpdk-dev] [PATCH v3 0/4] enable FIFO " Xiaoyun Li
  2019-09-06  7:53     ` [dpdk-dev] [PATCH v3 1/4] raw/ntb: setup ntb queue Xiaoyun Li
  2019-09-06  7:54     ` [dpdk-dev] [PATCH v3 2/4] raw/ntb: add xstats support Xiaoyun Li
@ 2019-09-06  7:54     ` Xiaoyun Li
  2019-09-06  7:54     ` [dpdk-dev] [PATCH v3 4/4] examples/ntb: support more functions for NTB Xiaoyun Li
  2019-09-09  3:27     ` [dpdk-dev] [PATCH v4 0/4] enable FIFO " Xiaoyun Li
  4 siblings, 0 replies; 42+ messages in thread
From: Xiaoyun Li @ 2019-09-06  7:54 UTC (permalink / raw)
  To: jingjing.wu, keith.wiles, omkar.maslekar, cunming.liang; +Cc: dev, Xiaoyun Li

Introduce enqueue and dequeue functions to support packet-based
processing. Also enable write-combining for the ntb driver, since it
improves performance considerably.

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
---
 doc/guides/rawdevs/ntb.rst     |  28 ++++
 drivers/raw/ntb/ntb.c          | 242 ++++++++++++++++++++++++++++++---
 drivers/raw/ntb/ntb.h          |   2 +
 drivers/raw/ntb/ntb_hw_intel.c |  22 +++
 4 files changed, 275 insertions(+), 19 deletions(-)

diff --git a/doc/guides/rawdevs/ntb.rst b/doc/guides/rawdevs/ntb.rst
index 99e7db441..afd5769fc 100644
--- a/doc/guides/rawdevs/ntb.rst
+++ b/doc/guides/rawdevs/ntb.rst
@@ -45,6 +45,24 @@ to use, i.e. igb_uio, vfio. The ``dpdk-devbind.py`` script can be used to
 show devices status and to bind them to a suitable kernel driver. They will
 appear under the category of "Misc (rawdev) devices".
 
+Prerequisites
+-------------
+The NTB PMD needs the kernel PCI driver to support write combining (WC) to
+get better performance. The difference can be more than 10 times.
+There are two ways to enable WC.
+- Insert igb_uio with the ``wc_active=1`` flag if the igb_uio driver is used.
+     insmod igb_uio.ko wc_active=1
+- Enable WC for the NTB device's Bar 2 and Bar 4 (mapped memory) manually.
+     Get the bar base addresses using ``lspci -vvv -s ae:00.0 | grep Region``.
+        Region 0: Memory at 39bfe0000000 (64-bit, prefetchable) [size=64K]
+        Region 2: Memory at 39bfa0000000 (64-bit, prefetchable) [size=512M]
+        Region 4: Memory at 39bfc0000000 (64-bit, prefetchable) [size=512M]
+     Use the following commands to enable WC.
+     echo "base=0x39bfa0000000 size=0x400000 type=write-combining" >> /proc/mtrr
+     echo "base=0x39bfc0000000 size=0x400000 type=write-combining" >> /proc/mtrr
+     To disable WC for these regions, use the following.
+     echo "disable=1" >> /proc/mtrr
+
 Ring Layout
 -----------
 
@@ -83,6 +101,16 @@ like the following:
       +------------------------+   +------------------------+
                     <---------traffic---------
 
+- Enqueue and Dequeue
+  Based on this ring layout, enqueue reads rx_tail to get the number of free
+  buffers, and writes used_ring and tx_tail to tell the peer which buffers
+  are filled with data.
+  Dequeue reads tx_tail to get the number of packets that have arrived, and
+  writes desc_ring and rx_tail to tell the peer about the newly allocated
+  buffers.
+  In this way, only remote writes happen and remote reads are avoided,
+  which gives better performance.
+
 Limitation
 ----------
 
diff --git a/drivers/raw/ntb/ntb.c b/drivers/raw/ntb/ntb.c
index 9c8a3e3cb..0398f5ba3 100644
--- a/drivers/raw/ntb/ntb.c
+++ b/drivers/raw/ntb/ntb.c
@@ -556,26 +556,140 @@ ntb_queue_init(struct rte_rawdev *dev, uint16_t qp_id)
 	return 0;
 }
 
+static inline void
+ntb_enqueue_cleanup(struct ntb_tx_queue *txq)
+{
+	struct ntb_tx_entry *sw_ring = txq->sw_ring;
+	uint16_t tx_free = txq->last_avail;
+	uint16_t nb_to_clean, i;
+
+	/* avail_cnt + 1 represents where to rx next in the peer. */
+	nb_to_clean = (*txq->avail_cnt - txq->last_avail + 1 +
+			txq->nb_tx_desc) & (txq->nb_tx_desc - 1);
+	nb_to_clean = RTE_MIN(nb_to_clean, txq->tx_free_thresh);
+	for (i = 0; i < nb_to_clean; i++) {
+		if (sw_ring[tx_free].mbuf)
+			rte_pktmbuf_free_seg(sw_ring[tx_free].mbuf);
+		tx_free = (tx_free + 1) & (txq->nb_tx_desc - 1);
+	}
+
+	txq->nb_tx_free += nb_to_clean;
+	txq->last_avail = tx_free;
+}
+
 static int
 ntb_enqueue_bufs(struct rte_rawdev *dev,
 		 struct rte_rawdev_buf **buffers,
 		 unsigned int count,
 		 rte_rawdev_obj_t context)
 {
-	/* Not FIFO right now. Just for testing memory write. */
 	struct ntb_hw *hw = dev->dev_private;
-	unsigned int i;
-	void *bar_addr;
-	size_t size;
+	struct ntb_tx_queue *txq = hw->tx_queues[(size_t)context];
+	struct ntb_tx_entry *sw_ring = txq->sw_ring;
+	struct rte_mbuf *txm;
+	struct ntb_used tx_used[NTB_MAX_DESC_SIZE];
+	volatile struct ntb_desc *tx_item;
+	uint16_t tx_last, nb_segs, off, last_used, avail_cnt;
+	uint16_t nb_mbufs = 0;
+	uint16_t nb_tx = 0;
+	uint64_t bytes = 0;
+	void *buf_addr;
+	int i;
 
-	if (hw->ntb_ops->get_peer_mw_addr == NULL)
-		return -ENOTSUP;
-	bar_addr = (*hw->ntb_ops->get_peer_mw_addr)(dev, 0);
-	size = (size_t)context;
+	if (unlikely(hw->ntb_ops->ioremap == NULL)) {
+		NTB_LOG(ERR, "Ioremap not supported.");
+		return nb_tx;
+	}
 
-	for (i = 0; i < count; i++)
-		rte_memcpy(bar_addr, buffers[i]->buf_addr, size);
-	return 0;
+	if (unlikely(dev->started == 0 || hw->peer_dev_up == 0)) {
+		NTB_LOG(DEBUG, "Link is not up.");
+		return nb_tx;
+	}
+
+	if (txq->nb_tx_free < txq->tx_free_thresh)
+		ntb_enqueue_cleanup(txq);
+
+	off = NTB_XSTATS_NUM * ((size_t)context + 1);
+	last_used = txq->last_used;
+	avail_cnt = *txq->avail_cnt;/* Where to alloc next. */
+	for (nb_tx = 0; nb_tx < count; nb_tx++) {
+		txm = (struct rte_mbuf *)(buffers[nb_tx]->buf_addr);
+		if (txm == NULL || txq->nb_tx_free < txm->nb_segs)
+			break;
+
+		tx_last = (txq->last_used + txm->nb_segs - 1) &
+			  (txq->nb_tx_desc - 1);
+		nb_segs = txm->nb_segs;
+		for (i = 0; i < nb_segs; i++) {
+			/* Not enough ring space for tx. */
+			if (txq->last_used == avail_cnt)
+				goto end_of_tx;
+			sw_ring[txq->last_used].mbuf = txm;
+			tx_item = txq->tx_desc_ring + txq->last_used;
+
+			if (!tx_item->len) {
+				(hw->ntb_xstats[NTB_TX_ERRS_ID + off])++;
+				goto end_of_tx;
+			}
+			if (txm->data_len > tx_item->len) {
+				NTB_LOG(ERR, "Data length exceeds buf length."
+					" Only %u data would be transmitted.",
+					tx_item->len);
+				txm->data_len = tx_item->len;
+			}
+
+			/* translate remote virtual addr to bar virtual addr */
+			buf_addr = (*hw->ntb_ops->ioremap)(dev, tx_item->addr);
+			if (buf_addr == NULL) {
+				(hw->ntb_xstats[NTB_TX_ERRS_ID + off])++;
+				NTB_LOG(ERR, "Null remap addr.");
+				goto end_of_tx;
+			}
+			rte_memcpy(buf_addr, rte_pktmbuf_mtod(txm, void *),
+				   txm->data_len);
+
+			tx_used[nb_mbufs].len = txm->data_len;
+			tx_used[nb_mbufs++].flags = (txq->last_used ==
+						    tx_last) ?
+						    NTB_FLAG_EOP : 0;
+
+			/* update stats */
+			bytes += txm->data_len;
+
+			txm = txm->next;
+
+			sw_ring[txq->last_used].next_id = (txq->last_used + 1) &
+						  (txq->nb_tx_desc - 1);
+			sw_ring[txq->last_used].last_id = tx_last;
+			txq->last_used = (txq->last_used + 1) &
+					 (txq->nb_tx_desc - 1);
+		}
+		txq->nb_tx_free -= nb_segs;
+	}
+
+end_of_tx:
+	if (nb_tx) {
+		uint16_t nb1, nb2;
+		if (nb_mbufs > txq->nb_tx_desc - last_used) {
+			nb1 = txq->nb_tx_desc - last_used;
+			nb2 = nb_mbufs - txq->nb_tx_desc + last_used;
+		} else {
+			nb1 = nb_mbufs;
+			nb2 = 0;
+		}
+		rte_memcpy(txq->tx_used_ring + last_used, tx_used,
+			   sizeof(struct ntb_used) * nb1);
+		rte_memcpy(txq->tx_used_ring, tx_used + nb1,
+			   sizeof(struct ntb_used) * nb2);
+		*txq->used_cnt = txq->last_used;
+		rte_wmb();
+
+		/* update queue stats */
+		hw->ntb_xstats[NTB_TX_BYTES_ID + off] += bytes;
+		hw->ntb_xstats[NTB_TX_PKTS_ID + off] += nb_tx;
+	}
+
+	return nb_tx;
 }
 
 static int
@@ -584,16 +698,106 @@ ntb_dequeue_bufs(struct rte_rawdev *dev,
 		 unsigned int count,
 		 rte_rawdev_obj_t context)
 {
-	/* Not FIFO. Just for testing memory read. */
 	struct ntb_hw *hw = dev->dev_private;
-	unsigned int i;
-	size_t size;
+	struct ntb_rx_queue *rxq = hw->rx_queues[(size_t)context];
+	struct ntb_rx_entry *sw_ring = rxq->sw_ring;
+	struct ntb_desc rx_desc[NTB_MAX_DESC_SIZE];
+	struct rte_mbuf *first, *rxm_t;
+	struct rte_mbuf *prev = NULL;
+	volatile struct ntb_used *rx_item;
+	uint16_t nb_mbufs = 0;
+	uint16_t nb_rx = 0;
+	uint64_t bytes = 0;
+	uint16_t off, last_avail, used_cnt, used_nb;
+	int i;
 
-	size = (size_t)context;
+	if (unlikely(dev->started == 0 || hw->peer_dev_up == 0)) {
+		NTB_LOG(DEBUG, "Link is not up");
+		return nb_rx;
+	}
+
+	used_cnt = *rxq->used_cnt;
+
+	if (rxq->last_used == used_cnt)
+		return nb_rx;
+
+	last_avail = rxq->last_avail;
+	used_nb = (used_cnt - rxq->last_used) & (rxq->nb_rx_desc - 1);
+	count = RTE_MIN(count, used_nb);
+	for (nb_rx = 0; nb_rx < count; nb_rx++) {
+		i = 0;
+		while (true) {
+			rx_item = rxq->rx_used_ring + rxq->last_used;
+			rxm_t = sw_ring[rxq->last_used].mbuf;
+			rxm_t->data_len = rx_item->len;
+			rxm_t->data_off = RTE_PKTMBUF_HEADROOM;
+			rxm_t->port = rxq->port_id;
+
+			if (!i) {
+				rxm_t->nb_segs = 1;
+				first = rxm_t;
+				first->pkt_len = 0;
+				buffers[nb_rx]->buf_addr = rxm_t;
+			} else {
+				prev->next = rxm_t;
+				first->nb_segs++;
+			}
 
-	for (i = 0; i < count; i++)
-		rte_memcpy(buffers[i]->buf_addr, hw->mz[i]->addr, size);
-	return 0;
+			prev = rxm_t;
+			first->pkt_len += prev->data_len;
+			rxq->last_used = (rxq->last_used + 1) &
+					 (rxq->nb_rx_desc - 1);
+
+			/* alloc new mbuf */
+			rxm_t = rte_mbuf_raw_alloc(rxq->mpool);
+			if (unlikely(rxm_t == NULL)) {
+				NTB_LOG(ERR, "recv alloc mbuf failed.");
+				goto end_of_rx;
+			}
+			rxm_t->port = rxq->port_id;
+			sw_ring[rxq->last_avail].mbuf = rxm_t;
+			i++;
+
+			/* fill new desc */
+			rx_desc[nb_mbufs].addr =
+					rte_pktmbuf_mtod(rxm_t, uint64_t);
+			rx_desc[nb_mbufs++].len = rxm_t->buf_len -
+						  RTE_PKTMBUF_HEADROOM;
+			rxq->last_avail = (rxq->last_avail + 1) &
+					  (rxq->nb_rx_desc - 1);
+
+			if (rx_item->flags & NTB_FLAG_EOP)
+				break;
+		}
+		/* update stats */
+		bytes += first->pkt_len;
+	}
+
+end_of_rx:
+	if (nb_rx) {
+		uint16_t nb1, nb2;
+		if (nb_mbufs > rxq->nb_rx_desc - last_avail) {
+			nb1 = rxq->nb_rx_desc - last_avail;
+			nb2 = nb_mbufs - rxq->nb_rx_desc + last_avail;
+		} else {
+			nb1 = nb_mbufs;
+			nb2 = 0;
+		}
+		rte_memcpy(rxq->rx_desc_ring + last_avail, rx_desc,
+			   sizeof(struct ntb_desc) * nb1);
+		rte_memcpy(rxq->rx_desc_ring, rx_desc + nb1,
+			   sizeof(struct ntb_desc) * nb2);
+		*rxq->avail_cnt = rxq->last_avail;
+		rte_wmb();
+
+		/* update queue stats */
+		off = NTB_XSTATS_NUM * ((size_t)context + 1);
+		hw->ntb_xstats[NTB_RX_BYTES_ID + off] += bytes;
+		hw->ntb_xstats[NTB_RX_PKTS_ID + off] += nb_rx;
+		hw->ntb_xstats[NTB_RX_MISS_ID + off] += (count - nb_rx);
+	}
+
+	return nb_rx;
 }
 
 static void
@@ -1239,7 +1443,7 @@ ntb_remove(struct rte_pci_device *pci_dev)
 
 static struct rte_pci_driver rte_ntb_pmd = {
 	.id_table = pci_id_ntb_map,
-	.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_WC_ACTIVATE,
 	.probe = ntb_probe,
 	.remove = ntb_remove,
 };
diff --git a/drivers/raw/ntb/ntb.h b/drivers/raw/ntb/ntb.h
index 09e28050f..eff1f6f07 100644
--- a/drivers/raw/ntb/ntb.h
+++ b/drivers/raw/ntb/ntb.h
@@ -87,6 +87,7 @@ enum ntb_spad_idx {
  * @ntb_dev_init: Init ntb dev.
  * @get_peer_mw_addr: To get the addr of peer mw[mw_idx].
  * @mw_set_trans: Set translation of internal memory that remote can access.
+ * @ioremap: Translate the remote host address to bar address.
  * @get_link_status: get link status, link speed and link width.
  * @set_link: Set local side up/down.
  * @spad_read: Read local/peer spad register val.
@@ -103,6 +104,7 @@ struct ntb_dev_ops {
 	void *(*get_peer_mw_addr)(const struct rte_rawdev *dev, int mw_idx);
 	int (*mw_set_trans)(const struct rte_rawdev *dev, int mw_idx,
 			    uint64_t addr, uint64_t size);
+	void *(*ioremap)(const struct rte_rawdev *dev, uint64_t addr);
 	int (*get_link_status)(const struct rte_rawdev *dev);
 	int (*set_link)(const struct rte_rawdev *dev, bool up);
 	uint32_t (*spad_read)(const struct rte_rawdev *dev, int spad,
diff --git a/drivers/raw/ntb/ntb_hw_intel.c b/drivers/raw/ntb/ntb_hw_intel.c
index 0e73f1609..e7f8667cd 100644
--- a/drivers/raw/ntb/ntb_hw_intel.c
+++ b/drivers/raw/ntb/ntb_hw_intel.c
@@ -162,6 +162,27 @@ intel_ntb_mw_set_trans(const struct rte_rawdev *dev, int mw_idx,
 	return 0;
 }
 
+static void *
+intel_ntb_ioremap(const struct rte_rawdev *dev, uint64_t addr)
+{
+	struct ntb_hw *hw = dev->dev_private;
+	void *mapped = NULL;
+	void *base;
+	int i;
+
+	for (i = 0; i < hw->peer_used_mws; i++) {
+		if (addr >= hw->peer_mw_base[i] &&
+		    addr < hw->peer_mw_base[i] + hw->mw_size[i]) {
+			base = intel_ntb_get_peer_mw_addr(dev, i);
+			mapped = (void *)(size_t)(addr - hw->peer_mw_base[i] +
+				 (size_t)base);
+			break;
+		}
+	}
+
+	return mapped;
+}
+
 static int
 intel_ntb_get_link_status(const struct rte_rawdev *dev)
 {
@@ -357,6 +378,7 @@ const struct ntb_dev_ops intel_ntb_ops = {
 	.ntb_dev_init       = intel_ntb_dev_init,
 	.get_peer_mw_addr   = intel_ntb_get_peer_mw_addr,
 	.mw_set_trans       = intel_ntb_mw_set_trans,
+	.ioremap            = intel_ntb_ioremap,
 	.get_link_status    = intel_ntb_get_link_status,
 	.set_link           = intel_ntb_set_link,
 	.spad_read          = intel_ntb_spad_read,
-- 
2.17.1



* [dpdk-dev] [PATCH v3 4/4] examples/ntb: support more functions for NTB
  2019-09-06  7:53   ` [dpdk-dev] [PATCH v3 0/4] enable FIFO " Xiaoyun Li
                       ` (2 preceding siblings ...)
  2019-09-06  7:54     ` [dpdk-dev] [PATCH v3 3/4] raw/ntb: add enqueue and dequeue functions Xiaoyun Li
@ 2019-09-06  7:54     ` Xiaoyun Li
  2019-09-09  3:27     ` [dpdk-dev] [PATCH v4 0/4] enable FIFO " Xiaoyun Li
  4 siblings, 0 replies; 42+ messages in thread
From: Xiaoyun Li @ 2019-09-06  7:54 UTC (permalink / raw)
  To: jingjing.wu, keith.wiles, omkar.maslekar, cunming.liang; +Cc: dev, Xiaoyun Li

Support transmitting files between two systems.
Support iofwd between one ethdev and the NTB device.
Support rxonly and txonly for the NTB device.
Support setting the forwarding mode to file-trans, txonly,
rxonly or iofwd.
Support showing/clearing port stats and throughput.

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
---
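Reviewer note: the file-trans path cuts a file into mbuf-sized chunks,
sends them in bursts, and publishes the 64-bit file size through two
32-bit scratchpad registers (spad_user_0 carries the high half and
spad_user_1 the low half, which is how the receive loop reassembles it).
The standalone sketch below only mirrors that arithmetic. The helper
names are hypothetical and are not part of this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers, for illustration only; not part of the patch. */

/* Number of chunks needed to carry 'size' bytes, 'payload' bytes each. */
static uint64_t
chunk_count(uint64_t size, uint16_t payload)
{
	assert(payload > 0);
	return (size + payload - 1) / payload;
}

/* Number of bursts needed to send 'count' chunks, 'burst' per call. */
static uint64_t
burst_count(uint64_t count, uint16_t burst)
{
	assert(burst > 0);
	return (count + burst - 1) / burst;
}

/* Split a 64-bit size into the two 32-bit scratchpad values. */
static void
size_to_spads(uint64_t size, uint32_t *spad0, uint32_t *spad1)
{
	*spad0 = (uint32_t)(size >> 32);
	*spad1 = (uint32_t)size;
}

/* Rebuild the size the same way the receive loop does. */
static uint64_t
spads_to_size(uint32_t spad0, uint32_t spad1)
{
	return ((uint64_t)spad0 << 32) | spad1;
}
```

With a payload of 1920 bytes per mbuf (an assumed value, picked only
for the example), a 5000-byte file needs three chunks and fits in a
single burst of 32.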
 doc/guides/sample_app_ug/ntb.rst |   59 +-
 examples/ntb/meson.build         |    3 +
 examples/ntb/ntb_fwd.c           | 1298 +++++++++++++++++++++++++++---
 3 files changed, 1232 insertions(+), 128 deletions(-)
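
Reviewer note: the send path retries a partially accepted burst. A
bounded retry loop could look roughly like the sketch below, where the
retry counter is local so that every burst gets a fresh budget and the
loop always terminates. The callback type and the fake ring are
hypothetical stand-ins for the rawdev enqueue call, and the short delay
a real loop would insert between attempts is left out.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_RETRIES 64	/* same budget as BURST_TX_RETRIES in the patch */

/* Hypothetical enqueue callback: returns how many of 'n' items it took. */
typedef uint16_t (*enqueue_fn)(void *ctx, uint16_t n);

/*
 * Try to enqueue 'n' items, retrying only the unsent tail.
 * Returns the number of items that were accepted in total.
 */
static uint16_t
enqueue_with_retry(enqueue_fn fn, void *ctx, uint16_t n)
{
	uint16_t sent = fn(ctx, n);
	uint16_t retry;

	for (retry = 0; sent < n && retry < MAX_RETRIES; retry++)
		sent += fn(ctx, n - sent);

	return sent;
}

/* Test double: accepts at most 'room' items per call. */
struct fake_ring {
	uint16_t room;
	unsigned int calls;
};

static uint16_t
fake_enqueue(void *ctx, uint16_t n)
{
	struct fake_ring *r = ctx;

	r->calls++;
	return n < r->room ? n : r->room;
}
```

The caller still has to decide what to do with items that were never
accepted, for example free the corresponding mbufs and report a short
transfer.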

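Reviewer note: streams are spread over the worker lcores so that each
worker gets nb_streams / nb_workers streams and the first
nb_streams % nb_workers workers get one more. The sketch below is a
standalone rendering of that split, not the code in this patch: the
struct and function names are hypothetical, it rejects the case of zero
worker lcores, and it writes every slot so that workers beyond the last
stream end up with a count of 0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-worker slice, mirroring stream_id/nb_stream. */
struct slice {
	uint16_t first;	/* index of the first stream owned by the worker */
	uint16_t count;	/* number of consecutive streams owned */
};

/*
 * Spread nb_streams over nb_workers as evenly as possible.
 * Returns -1 when there is no worker lcore to run on.
 */
static int
distribute_streams(uint16_t nb_streams, uint16_t nb_workers,
		   struct slice *out)
{
	uint16_t per_worker, nb_extra, next = 0, i;

	if (nb_workers == 0)
		return -1;

	per_worker = nb_streams / nb_workers;
	nb_extra = nb_streams % nb_workers;

	for (i = 0; i < nb_workers; i++) {
		out[i].first = next;
		out[i].count = per_worker + (i < nb_extra ? 1 : 0);
		next += out[i].count;
	}
	return 0;
}
```

For example, eight streams on three workers are split as 3, 3 and 2,
starting at stream indexes 0, 3 and 6.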
diff --git a/doc/guides/sample_app_ug/ntb.rst b/doc/guides/sample_app_ug/ntb.rst
index 079242175..f8291d7d1 100644
--- a/doc/guides/sample_app_ug/ntb.rst
+++ b/doc/guides/sample_app_ug/ntb.rst
@@ -5,8 +5,17 @@ NTB Sample Application
 ======================
 
 The ntb sample application shows how to use ntb rawdev driver.
-This sample provides interactive mode to transmit file between
-two hosts.
+This sample provides interactive mode to do packet based processing
+between two systems.
+
+This sample supports four packet forwarding modes.
+
+* ``file-trans``: transmit files between two systems. The sample
+  polls for files from the peer and saves each one as
+  ``ntb_recv_file[N]``, where [N] is the index of the received file.
+* ``rxonly``: NTB receives packets but doesn't transmit them.
+* ``txonly``: NTB generates and transmits packets without receiving any.
+* ``iofwd``: forwards packets between the NTB device and an ethdev.
 
 Compiling the Application
 -------------------------
@@ -29,6 +38,40 @@ Refer to the *DPDK Getting Started Guide* for general information on
 running applications and the Environment Abstraction Layer (EAL)
 options.
 
+Command-line Options
+--------------------
+
+The application supports the following command-line options.
+
+* ``--buf-size=N``
+
+  Set the data size of the mbufs used to N bytes, where N < 65536.
+  The default value is 2048.
+
+* ``--fwd-mode=mode``
+
+  Set the packet forwarding mode as ``file-trans``, ``txonly``,
+  ``rxonly`` or ``iofwd``.
+
+* ``--nb-desc=N``
+
+  Set the number of descriptors per queue (the queue size) to N,
+  where 64 <= N <= 1024. The default value is 1024.
+
+* ``--txfreet=N``
+
+  Set the transmit free threshold of TX rings to N, where 0 <= N <=
+  the value of ``--nb-desc``. The default value is 256.
+
+* ``--burst=N``
+
+  Set the number of packets per burst to N, where 1 <= N <= 32.
+  The default value is 32.
+
+* ``--qp=N``
+
+  Set the number of queues to N, where N > 0. The default value is 1.
+
 Using the application
 ---------------------
 
@@ -41,7 +84,11 @@ The application is console-driven using the cmdline DPDK interface:
 From this interface the available commands and descriptions of what
 they do as as follows:
 
-* ``send [filepath]``: Send file to the peer host.
-* ``receive [filepath]``: Receive file to [filepath]. Need the peer
-  to send file successfully first.
-* ``quit``: Exit program
+* ``send [filepath]``: Send a file to the peer host. The application
+  must be in file-trans forwarding mode first.
+* ``start``: Start transmission.
+* ``stop``: Stop transmission.
+* ``show/clear port stats``: Show/Clear port stats and throughput.
+* ``set fwd file-trans/rxonly/txonly/iofwd``: Set packet forwarding
+  mode.
+* ``quit``: Exit program.
diff --git a/examples/ntb/meson.build b/examples/ntb/meson.build
index 9a6288f4f..f5435fe12 100644
--- a/examples/ntb/meson.build
+++ b/examples/ntb/meson.build
@@ -14,3 +14,6 @@ cflags += ['-D_FILE_OFFSET_BITS=64']
 sources = files(
 	'ntb_fwd.c'
 )
+if dpdk_conf.has('RTE_LIBRTE_PMD_NTB_RAWDEV')
+	deps += 'rawdev_ntb'
+endif
diff --git a/examples/ntb/ntb_fwd.c b/examples/ntb/ntb_fwd.c
index f8c970cdb..b1ea71c8f 100644
--- a/examples/ntb/ntb_fwd.c
+++ b/examples/ntb/ntb_fwd.c
@@ -14,21 +14,103 @@
 #include <cmdline.h>
 #include <rte_common.h>
 #include <rte_rawdev.h>
+#include <rte_ethdev.h>
+#include <rte_malloc.h>
 #include <rte_lcore.h>
+#include <rte_cycles.h>
+#include <rte_pmd_ntb.h>
 
-#define NTB_DRV_NAME_LEN	7
-static uint64_t max_file_size = 0x400000;
+/* Per-port statistics struct */
+struct ntb_port_statistics {
+	uint64_t tx;
+	uint64_t rx;
+} __rte_cache_aligned;
+/* Port 0: NTB dev, Port 1: ethdev when iofwd. */
+struct ntb_port_statistics ntb_port_stats[2];
+
+struct ntb_fwd_stream {
+	uint16_t tx_port;
+	uint16_t rx_port;
+	uint16_t qp_id;
+	uint8_t tx_ntb;  /* If ntb device is tx port. */
+};
+
+struct ntb_fwd_lcore_conf {
+	uint16_t stream_id;
+	uint16_t nb_stream;
+	uint8_t stopped;
+};
+
+enum ntb_fwd_mode {
+	FILE_TRANS = 0,
+	RXONLY,
+	TXONLY,
+	IOFWD,
+	MAX_FWD_MODE,
+};
+static const char *const fwd_mode_s[] = {
+	"file-trans",
+	"rxonly",
+	"txonly",
+	"iofwd",
+	NULL,
+};
+static enum ntb_fwd_mode fwd_mode = MAX_FWD_MODE;
+
+static struct ntb_fwd_lcore_conf fwd_lcore_conf[RTE_MAX_LCORE];
+static struct ntb_fwd_stream *fwd_streams;
+
+static struct rte_mempool *mbuf_pool;
+
+#define NTB_DRV_NAME_LEN 7
+#define MEMPOOL_CACHE_SIZE 256
+
+static uint8_t in_test;
 static uint8_t interactive = 1;
+static uint16_t eth_port_id = RTE_MAX_ETHPORTS;
 static uint16_t dev_id;
 
+/* Number of queues, default set as 1 */
+static uint16_t num_queues = 1;
+static uint16_t ntb_buf_size = RTE_MBUF_DEFAULT_BUF_SIZE;
+
+/* Configurable number of descriptors */
+#define NTB_DEFAULT_NUM_DESCS 1024
+static uint16_t nb_desc = NTB_DEFAULT_NUM_DESCS;
+
+static uint16_t tx_free_thresh;
+
+#define NTB_MAX_PKT_BURST 32
+#define NTB_DFLT_PKT_BURST 32
+static uint16_t pkt_burst = NTB_DFLT_PKT_BURST;
+
+#define BURST_TX_RETRIES 64
+
+static struct rte_eth_conf eth_port_conf = {
+	.rxmode = {
+		.mq_mode = ETH_MQ_RX_RSS,
+		.split_hdr_size = 0,
+	},
+	.rx_adv_conf = {
+		.rss_conf = {
+			.rss_key = NULL,
+			.rss_hf = ETH_RSS_IP,
+		},
+	},
+	.txmode = {
+		.mq_mode = ETH_MQ_TX_NONE,
+	},
+};
+
 /* *** Help command with introduction. *** */
 struct cmd_help_result {
 	cmdline_fixed_string_t help;
 };
 
-static void cmd_help_parsed(__attribute__((unused)) void *parsed_result,
-			    struct cmdline *cl,
-			    __attribute__((unused)) void *data)
+static void
+cmd_help_parsed(__attribute__((unused)) void *parsed_result,
+		struct cmdline *cl,
+		__attribute__((unused)) void *data)
 {
 	cmdline_printf(
 		cl,
@@ -37,13 +119,17 @@ static void cmd_help_parsed(__attribute__((unused)) void *parsed_result,
 		"Control:\n"
 		"    quit                                      :"
 		" Quit the application.\n"
-		"\nFile transmit:\n"
+		"\nTransmission:\n"
 		"    send [path]                               :"
-		" Send [path] file. (No more than %"PRIu64")\n"
-		"    recv [path]                            :"
-		" Receive file to [path]. Make sure sending is done"
-		" on the other side.\n",
-		max_file_size
+		" Send [path] file. Only takes effect in file-trans mode.\n"
+		"    start                                     :"
+		" Start transmissions.\n"
+		"    stop                                      :"
+		" Stop transmissions.\n"
+		"    clear/show port stats                     :"
+		" Clear/show port stats.\n"
+		"    set fwd file-trans/rxonly/txonly/iofwd    :"
+		" Set packet forwarding mode.\n"
 	);
 
 }
@@ -66,13 +152,37 @@ struct cmd_quit_result {
 	cmdline_fixed_string_t quit;
 };
 
-static void cmd_quit_parsed(__attribute__((unused)) void *parsed_result,
-			    struct cmdline *cl,
-			    __attribute__((unused)) void *data)
+static void
+cmd_quit_parsed(__attribute__((unused)) void *parsed_result,
+		struct cmdline *cl,
+		__attribute__((unused)) void *data)
 {
+	struct ntb_fwd_lcore_conf *conf;
+	unsigned int lcore_id;
+
+	/* Stop transmission first. */
+	RTE_LCORE_FOREACH_SLAVE(lcore_id) {
+		conf = &fwd_lcore_conf[lcore_id];
+
+		if (!conf->nb_stream)
+			continue;
+
+		if (conf->stopped)
+			continue;
+
+		conf->stopped = 1;
+	}
+	printf("\nWaiting for lcores to finish...\n");
+	rte_eal_mp_wait_lcore();
+	in_test = 0;
+
 	/* Stop traffic and Close port. */
 	rte_rawdev_stop(dev_id);
 	rte_rawdev_close(dev_id);
+	if (eth_port_id < RTE_MAX_ETHPORTS && fwd_mode == IOFWD) {
+		rte_eth_dev_stop(eth_port_id);
+		rte_eth_dev_close(eth_port_id);
+	}
 
 	cmdline_quit(cl);
 }
@@ -102,21 +212,19 @@ cmd_sendfile_parsed(void *parsed_result,
 		    __attribute__((unused)) void *data)
 {
 	struct cmd_sendfile_result *res = parsed_result;
-	struct rte_rawdev_buf *pkts_send[1];
-	uint64_t rsize, size, link;
-	uint8_t *buff;
+	struct rte_rawdev_buf *pkts_send[NTB_MAX_PKT_BURST];
+	struct rte_mbuf *mbuf_send[NTB_MAX_PKT_BURST];
+	uint64_t size, count, i, nb_burst;
+	uint16_t nb_tx, buf_size;
+	unsigned int nb_pkt;
+	size_t queue_id = 0;
+	uint16_t retry = 0;
 	uint32_t val;
 	FILE *file;
 
-	if (!rte_rawdevs[dev_id].started) {
-		printf("Device needs to be up first. Try later.\n");
-		return;
-	}
-
-	rte_rawdev_get_attr(dev_id, "link_status", &link);
-	if (!link) {
-		printf("Link is not up, cannot send file.\n");
-		return;
+	if (num_queues != 1) {
+		printf("File transmission only supports 1 queue.\n");
+		num_queues = 1;
 	}
 
 	file = fopen(res->filepath, "r");
@@ -127,30 +235,13 @@ cmd_sendfile_parsed(void *parsed_result,
 
 	if (fseek(file, 0, SEEK_END) < 0) {
 		printf("Fail to get file size.\n");
+		fclose(file);
 		return;
 	}
 	size = ftell(file);
 	if (fseek(file, 0, SEEK_SET) < 0) {
 		printf("Fail to get file size.\n");
-		return;
-	}
-
-	/**
-	 * No FIFO now. Only test memory. Limit sending file
-	 * size <= max_file_size.
-	 */
-	if (size > max_file_size) {
-		printf("Warning: The file is too large. Only send first"
-		       " %"PRIu64" bits.\n", max_file_size);
-		size = max_file_size;
-	}
-
-	buff = (uint8_t *)malloc(size);
-	rsize = fread(buff, size, 1, file);
-	if (rsize != 1) {
-		printf("Fail to read file.\n");
 		fclose(file);
-		free(buff);
 		return;
 	}
 
@@ -159,22 +250,63 @@ cmd_sendfile_parsed(void *parsed_result,
 	rte_rawdev_set_attr(dev_id, "spad_user_0", val);
 	val = size;
 	rte_rawdev_set_attr(dev_id, "spad_user_1", val);
+	printf("Sending file, size is %"PRIu64"\n", size);
+
+	for (i = 0; i < NTB_MAX_PKT_BURST; i++)
+		pkts_send[i] = (struct rte_rawdev_buf *)
+				malloc(sizeof(struct rte_rawdev_buf));
+
+	buf_size = ntb_buf_size - RTE_PKTMBUF_HEADROOM;
+	count = (size + buf_size - 1) / buf_size;
+	nb_burst = (count + pkt_burst - 1) / pkt_burst;
 
-	pkts_send[0] = (struct rte_rawdev_buf *)malloc
-			(sizeof(struct rte_rawdev_buf));
-	pkts_send[0]->buf_addr = buff;
+	for (i = 0; i < nb_burst; i++) {
+		val = RTE_MIN(count, pkt_burst);
+		if (rte_mempool_get_bulk(mbuf_pool, (void **)mbuf_send,
+					val) == 0) {
+			for (nb_pkt = 0; nb_pkt < val; nb_pkt++) {
+				mbuf_send[nb_pkt]->port = dev_id;
+				mbuf_send[nb_pkt]->data_len =
+				fread(rte_pktmbuf_mtod(mbuf_send[nb_pkt],
+					void *), 1, buf_size, file);
+				mbuf_send[nb_pkt]->pkt_len =
+					mbuf_send[nb_pkt]->data_len;
+				pkts_send[nb_pkt]->buf_addr = mbuf_send[nb_pkt];
+			}
+		} else {
+			for (nb_pkt = 0; nb_pkt < val; nb_pkt++) {
+				mbuf_send[nb_pkt] =
+					rte_mbuf_raw_alloc(mbuf_pool);
+				if (mbuf_send[nb_pkt] == NULL)
+					break;
+				mbuf_send[nb_pkt]->port = dev_id;
+				mbuf_send[nb_pkt]->data_len =
+				fread(rte_pktmbuf_mtod(mbuf_send[nb_pkt],
+					void *), 1, buf_size, file);
+				mbuf_send[nb_pkt]->pkt_len =
+					mbuf_send[nb_pkt]->data_len;
+				pkts_send[nb_pkt]->buf_addr = mbuf_send[nb_pkt];
+			}
+		}
 
-	if (rte_rawdev_enqueue_buffers(dev_id, pkts_send, 1,
-				       (void *)(size_t)size)) {
-		printf("Fail to enqueue.\n");
-		goto clean;
+		for (nb_tx = 0, retry = 0; nb_tx != nb_pkt &&
+		     retry <= BURST_TX_RETRIES; retry++) {
+			if (retry > 0)
+				rte_delay_us(1);
+			nb_tx += rte_rawdev_enqueue_buffers(dev_id,
+				&pkts_send[nb_tx], nb_pkt - nb_tx,
+				(void *)queue_id);
+		}
+		count -= nb_pkt;
 	}
+	/* Clear register after file sending done. */
+	rte_rawdev_set_attr(dev_id, "spad_user_0", 0);
+	rte_rawdev_set_attr(dev_id, "spad_user_1", 0);
 	printf("Done sending file.\n");
 
-clean:
+	for (i = 0; i < NTB_MAX_PKT_BURST; i++)
+		free(pkts_send[i]);
 	fclose(file);
-	free(buff);
-	free(pkts_send[0]);
 }
 
 cmdline_parse_token_string_t cmd_send_file_send =
@@ -195,79 +327,680 @@ cmdline_parse_inst_t cmd_send_file = {
 	},
 };
 
-/* *** RECEIVE FILE PARAMETERS *** */
-struct cmd_recvfile_result {
-	cmdline_fixed_string_t recv_string;
-	char filepath[];
-};
+#define RECV_FILE_LEN 30
+static int
+start_polling_recv_file(void *param)
+{
+	struct rte_rawdev_buf *pkts_recv[NTB_MAX_PKT_BURST];
+	struct ntb_fwd_lcore_conf *conf = param;
+	struct rte_mbuf *mbuf;
+	char filepath[RECV_FILE_LEN];
+	uint64_t val, size, file_len;
+	uint16_t nb_rx, i, file_no;
+	size_t queue_id = 0;
+	FILE *file;
+
+	for (i = 0; i < NTB_MAX_PKT_BURST; i++)
+		pkts_recv[i] = (struct rte_rawdev_buf *)
+				malloc(sizeof(struct rte_rawdev_buf));
+
+	file_no = 0;
+	while (!conf->stopped) {
+		snprintf(filepath, RECV_FILE_LEN, "ntb_recv_file%d", file_no);
+		file = fopen(filepath, "w");
+		if (file == NULL) {
+			printf("Fail to open the file.\n");
+			return -EINVAL;
+		}
+
+		rte_rawdev_get_attr(dev_id, "spad_user_0", &val);
+		size = val << 32;
+		rte_rawdev_get_attr(dev_id, "spad_user_1", &val);
+		size |= val;
+
+		if (!size) {
+			fclose(file);
+			continue;
+		}
+
+		file_len = 0;
+		nb_rx = NTB_MAX_PKT_BURST;
+		while (file_len < size && !conf->stopped) {
+			nb_rx = rte_rawdev_dequeue_buffers(dev_id, pkts_recv,
+						pkt_burst, (void *)queue_id);
+			ntb_port_stats[0].rx += nb_rx;
+			for (i = 0; i < nb_rx; i++) {
+				mbuf = pkts_recv[i]->buf_addr;
+				fwrite(rte_pktmbuf_mtod(mbuf, void *), 1,
+					mbuf->data_len, file);
+				file_len += mbuf->data_len;
+				rte_pktmbuf_free(mbuf);
+				pkts_recv[i]->buf_addr = NULL;
+			}
+		}
+
+		printf("Received file (size: %" PRIu64 ") from peer to %s.\n",
+			size, filepath);
+		fclose(file);
+		file_no++;
+	}
+
+	for (i = 0; i < NTB_MAX_PKT_BURST; i++)
+		free(pkts_recv[i]);
+	return 0;
+}
+
+static int
+start_iofwd_per_lcore(void *param)
+{
+	struct rte_rawdev_buf *ntb_buf[NTB_MAX_PKT_BURST];
+	struct rte_mbuf *pkts_burst[NTB_MAX_PKT_BURST];
+	struct ntb_fwd_lcore_conf *conf = param;
+	struct ntb_fwd_stream fs;
+	uint16_t nb_rx, nb_tx;
+	int i, j;
+
+	for (i = 0; i < NTB_MAX_PKT_BURST; i++)
+		ntb_buf[i] = (struct rte_rawdev_buf *)
+			     malloc(sizeof(struct rte_rawdev_buf));
+
+	while (!conf->stopped) {
+		for (i = 0; i < conf->nb_stream; i++) {
+			fs = fwd_streams[conf->stream_id + i];
+			if (fs.tx_ntb) {
+				nb_rx = rte_eth_rx_burst(fs.rx_port,
+						fs.qp_id, pkts_burst,
+						pkt_burst);
+				if (unlikely(nb_rx == 0))
+					continue;
+				for (j = 0; j < nb_rx; j++)
+					ntb_buf[j]->buf_addr = pkts_burst[j];
+				nb_tx =
+				rte_rawdev_enqueue_buffers(fs.tx_port,
+						ntb_buf, nb_rx,
+						(void *)(size_t)fs.qp_id);
+				ntb_port_stats[0].tx += nb_tx;
+				ntb_port_stats[1].rx += nb_rx;
+			} else {
+				nb_rx =
+				rte_rawdev_dequeue_buffers(fs.rx_port,
+						ntb_buf, pkt_burst,
+						(void *)(size_t)fs.qp_id);
+				if (unlikely(nb_rx == 0))
+					continue;
+				for (j = 0; j < nb_rx; j++)
+					pkts_burst[j] = ntb_buf[j]->buf_addr;
+				nb_tx = rte_eth_tx_burst(fs.tx_port,
+					fs.qp_id, pkts_burst, nb_rx);
+				ntb_port_stats[1].tx += nb_tx;
+				ntb_port_stats[0].rx += nb_rx;
+			}
+			if (unlikely(nb_tx < nb_rx)) {
+				do {
+					rte_pktmbuf_free(pkts_burst[nb_tx]);
+				} while (++nb_tx < nb_rx);
+			}
+		}
+	}
+
+	for (i = 0; i < NTB_MAX_PKT_BURST; i++)
+		free(ntb_buf[i]);
+
+	return 0;
+}
+
+static int
+start_rxonly_per_lcore(void *param)
+{
+	struct rte_rawdev_buf *ntb_buf[NTB_MAX_PKT_BURST];
+	struct ntb_fwd_lcore_conf *conf = param;
+	struct ntb_fwd_stream fs;
+	uint16_t nb_rx;
+	int i, j;
+
+	for (i = 0; i < NTB_MAX_PKT_BURST; i++)
+		ntb_buf[i] = (struct rte_rawdev_buf *)
+			     malloc(sizeof(struct rte_rawdev_buf));
+
+	while (!conf->stopped) {
+		for (i = 0; i < conf->nb_stream; i++) {
+			fs = fwd_streams[conf->stream_id + i];
+			nb_rx = rte_rawdev_dequeue_buffers(fs.rx_port,
+				ntb_buf, pkt_burst, (void *)(size_t)fs.qp_id);
+			if (unlikely(nb_rx == 0))
+				continue;
+			ntb_port_stats[0].rx += nb_rx;
+
+			for (j = 0; j < nb_rx; j++)
+				rte_pktmbuf_free(ntb_buf[j]->buf_addr);
+		}
+	}
+
+	for (i = 0; i < NTB_MAX_PKT_BURST; i++)
+		free(ntb_buf[i]);
+
+	return 0;
+}
+
+
+static int
+start_txonly_per_lcore(void *param)
+{
+	struct rte_rawdev_buf *ntb_buf[NTB_MAX_PKT_BURST];
+	struct rte_mbuf *pkts_burst[NTB_MAX_PKT_BURST];
+	struct ntb_fwd_lcore_conf *conf = param;
+	struct ntb_fwd_stream fs;
+	uint16_t nb_pkt, nb_tx;
+	int i;
+
+	for (i = 0; i < NTB_MAX_PKT_BURST; i++)
+		ntb_buf[i] = (struct rte_rawdev_buf *)
+			     malloc(sizeof(struct rte_rawdev_buf));
+
+	while (!conf->stopped) {
+		for (i = 0; i < conf->nb_stream; i++) {
+			fs = fwd_streams[conf->stream_id + i];
+			if (rte_mempool_get_bulk(mbuf_pool, (void **)pkts_burst,
+				  pkt_burst) == 0) {
+				for (nb_pkt = 0; nb_pkt < pkt_burst; nb_pkt++) {
+					pkts_burst[nb_pkt]->port = dev_id;
+					pkts_burst[nb_pkt]->data_len =
+						pkts_burst[nb_pkt]->buf_len -
+						RTE_PKTMBUF_HEADROOM;
+					pkts_burst[nb_pkt]->pkt_len =
+						pkts_burst[nb_pkt]->data_len;
+					ntb_buf[nb_pkt]->buf_addr =
+						pkts_burst[nb_pkt];
+				}
+			} else {
+				for (nb_pkt = 0; nb_pkt < pkt_burst; nb_pkt++) {
+					pkts_burst[nb_pkt] =
+						rte_pktmbuf_alloc(mbuf_pool);
+					if (pkts_burst[nb_pkt] == NULL)
+						break;
+					pkts_burst[nb_pkt]->port = dev_id;
+					pkts_burst[nb_pkt]->data_len =
+						pkts_burst[nb_pkt]->buf_len -
+						RTE_PKTMBUF_HEADROOM;
+					pkts_burst[nb_pkt]->pkt_len =
+						pkts_burst[nb_pkt]->data_len;
+					ntb_buf[nb_pkt]->buf_addr =
+						pkts_burst[nb_pkt];
+				}
+			}
+			nb_tx = rte_rawdev_enqueue_buffers(fs.tx_port,
+				ntb_buf, nb_pkt, (void *)(size_t)fs.qp_id);
+			ntb_port_stats[0].tx += nb_tx;
+			if (unlikely(nb_tx < nb_pkt)) {
+				do {
+					rte_pktmbuf_free(
+						ntb_buf[nb_tx]->buf_addr);
+				} while (++nb_tx < nb_pkt);
+			}
+		}
+	}
+
+	for (i = 0; i < NTB_MAX_PKT_BURST; i++)
+		free(ntb_buf[i]);
+
+	return 0;
+}
+
+static int
+ntb_fwd_config_setup(void)
+{
+	uint16_t i;
+
+	/* Make sure iofwd has valid ethdev. */
+	if (fwd_mode == IOFWD && eth_port_id >= RTE_MAX_ETHPORTS) {
+		printf("No ethdev, cannot be in iofwd mode.\n");
+		return -EINVAL;
+	}
+
+	if (fwd_mode == IOFWD) {
+		fwd_streams = rte_zmalloc("ntb_fwd: fwd_streams",
+			sizeof(struct ntb_fwd_stream) * num_queues * 2,
+			RTE_CACHE_LINE_SIZE);
+		for (i = 0; i < num_queues; i++) {
+			fwd_streams[i * 2].qp_id = i;
+			fwd_streams[i * 2].tx_port = dev_id;
+			fwd_streams[i * 2].rx_port = eth_port_id;
+			fwd_streams[i * 2].tx_ntb = 1;
+
+			fwd_streams[i * 2 + 1].qp_id = i;
+			fwd_streams[i * 2 + 1].tx_port = eth_port_id;
+			fwd_streams[i * 2 + 1].rx_port = dev_id;
+			fwd_streams[i * 2 + 1].tx_ntb = 0;
+		}
+		return 0;
+	}
+
+	if (fwd_mode == RXONLY || fwd_mode == FILE_TRANS) {
+		/* Only 1 queue in file-trans, to keep packets in order. */
+		if (fwd_mode == FILE_TRANS)
+			num_queues = 1;
+
+		fwd_streams = rte_zmalloc("ntb_fwd: fwd_streams",
+				sizeof(struct ntb_fwd_stream) * num_queues,
+				RTE_CACHE_LINE_SIZE);
+		for (i = 0; i < num_queues; i++) {
+			fwd_streams[i].qp_id = i;
+			fwd_streams[i].tx_port = RTE_MAX_ETHPORTS;
+			fwd_streams[i].rx_port = dev_id;
+			fwd_streams[i].tx_ntb = 0;
+		}
+		return 0;
+	}
+
+	if (fwd_mode == TXONLY) {
+		fwd_streams = rte_zmalloc("ntb_fwd: fwd_streams",
+				sizeof(struct ntb_fwd_stream) * num_queues,
+				RTE_CACHE_LINE_SIZE);
+		for (i = 0; i < num_queues; i++) {
+			fwd_streams[i].qp_id = i;
+			fwd_streams[i].tx_port = dev_id;
+			fwd_streams[i].rx_port = RTE_MAX_ETHPORTS;
+			fwd_streams[i].tx_ntb = 1;
+		}
+	}
+	return 0;
+}
 
 static void
-cmd_recvfile_parsed(void *parsed_result,
-		    __attribute__((unused)) struct cmdline *cl,
-		    __attribute__((unused)) void *data)
+assign_stream_to_lcores(void)
 {
-	struct cmd_sendfile_result *res = parsed_result;
-	struct rte_rawdev_buf *pkts_recv[1];
-	uint8_t *buff;
-	uint64_t val;
-	size_t size;
-	FILE *file;
+	struct ntb_fwd_lcore_conf *conf;
+	struct ntb_fwd_stream *fs;
+	uint16_t nb_streams, sm_per_lcore, sm_id, i;
+	unsigned int lcore_id, lcore_num, nb_extra;
 
-	if (!rte_rawdevs[dev_id].started) {
-		printf("Device needs to be up first. Try later.\n");
-		return;
+	lcore_num = rte_lcore_count();
+	/* Exclude master core */
+	lcore_num--;
+
+	nb_streams = (fwd_mode == IOFWD) ? num_queues * 2 : num_queues;
+
+	sm_per_lcore = nb_streams / lcore_num;
+	nb_extra = nb_streams % lcore_num;
+	sm_id = 0;
+	i = 0;
+
+	RTE_LCORE_FOREACH_SLAVE(lcore_id) {
+		conf = &fwd_lcore_conf[lcore_id];
+
+		if (i < nb_extra) {
+			conf->nb_stream = sm_per_lcore + 1;
+			conf->stream_id = sm_id;
+			sm_id = sm_id + sm_per_lcore + 1;
+		} else {
+			conf->nb_stream = sm_per_lcore;
+			conf->stream_id = sm_id;
+			sm_id = sm_id + sm_per_lcore;
+		}
+
+		i++;
+		if (sm_id >= nb_streams)
+			break;
+	}
+
+	/* Print packet forwarding config. */
+	RTE_LCORE_FOREACH_SLAVE(lcore_id) {
+		conf = &fwd_lcore_conf[lcore_id];
+
+		if (!conf->nb_stream)
+			continue;
+
+		printf("Streams on Lcore %u :\n", lcore_id);
+		for (i = 0; i < conf->nb_stream; i++) {
+			fs = &fwd_streams[conf->stream_id + i];
+			if (fwd_mode == IOFWD)
+				printf(" + Stream %u : %s%u RX -> %s%u TX,"
+					" Q=%u\n", conf->stream_id + i,
+					fs->tx_ntb ? "Eth" : "NTB", fs->rx_port,
+					fs->tx_ntb ? "NTB" : "Eth", fs->tx_port,
+					fs->qp_id);
+			if (fwd_mode == FILE_TRANS || fwd_mode == RXONLY)
+				printf(" + Stream %u : NTB%u RX only\n",
+					conf->stream_id + i, fs->rx_port);
+			if (fwd_mode == TXONLY)
+				printf(" + Stream %u : NTB%u TX only\n",
+					conf->stream_id + i, fs->tx_port);
+		}
 	}
+}
 
-	rte_rawdev_get_attr(dev_id, "link_status", &val);
-	if (!val) {
-		printf("Link is not up, cannot receive file.\n");
+static void
+start_pkt_fwd(void)
+{
+	struct ntb_fwd_lcore_conf *conf;
+	struct rte_eth_link eth_link;
+	unsigned int lcore_id;
+	int ret, i;
+
+	ret = ntb_fwd_config_setup();
+	if (ret < 0) {
+		printf("Cannot start traffic. Please reset fwd mode.\n");
 		return;
 	}
 
-	file = fopen(res->filepath, "w");
-	if (file == NULL) {
-		printf("Fail to open the file.\n");
+	/* If using iofwd, checking ethdev link status first. */
+	if (fwd_mode == IOFWD) {
+		printf("Checking eth link status...\n");
+		/* Wait for eth link up at most 100 times. */
+		for (i = 0; i < 100; i++) {
+			rte_eth_link_get(eth_port_id, &eth_link);
+			if (eth_link.link_status) {
+				printf("Eth%u Link Up. Speed %u Mbps - %s\n",
+					eth_port_id, eth_link.link_speed,
+					(eth_link.link_duplex ==
+					 ETH_LINK_FULL_DUPLEX) ?
+					("full-duplex") : ("half-duplex"));
+				break;
+			}
+		}
+		if (!eth_link.link_status) {
+			printf("Eth%u link down. Cannot start traffic.\n",
+				eth_port_id);
+			return;
+		}
+	}
+
+	assign_stream_to_lcores();
+	in_test = 1;
+
+	RTE_LCORE_FOREACH_SLAVE(lcore_id) {
+		conf = &fwd_lcore_conf[lcore_id];
+
+		if (!conf->nb_stream)
+			continue;
+
+		conf->stopped = 0;
+		if (fwd_mode == FILE_TRANS)
+			rte_eal_remote_launch(start_polling_recv_file,
+					      conf, lcore_id);
+		else if (fwd_mode == IOFWD)
+			rte_eal_remote_launch(start_iofwd_per_lcore,
+					      conf, lcore_id);
+		else if (fwd_mode == RXONLY)
+			rte_eal_remote_launch(start_rxonly_per_lcore,
+					      conf, lcore_id);
+		else if (fwd_mode == TXONLY)
+			rte_eal_remote_launch(start_txonly_per_lcore,
+					      conf, lcore_id);
+	}
+}
+
+/* *** START FWD PARAMETERS *** */
+struct cmd_start_result {
+	cmdline_fixed_string_t start;
+};
+
+static void
+cmd_start_parsed(__attribute__((unused)) void *parsed_result,
+		 __attribute__((unused)) struct cmdline *cl,
+		 __attribute__((unused)) void *data)
+{
+	start_pkt_fwd();
+}
+
+cmdline_parse_token_string_t cmd_start_start =
+	TOKEN_STRING_INITIALIZER(struct cmd_start_result, start, "start");
+
+cmdline_parse_inst_t cmd_start = {
+	.f = cmd_start_parsed,
+	.data = NULL,
+	.help_str = "start: Start packet forwarding",
+	.tokens = {
+		(void *)&cmd_start_start,
+		NULL,
+	},
+};
+
+/* *** STOP *** */
+struct cmd_stop_result {
+	cmdline_fixed_string_t stop;
+};
+
+static void
+cmd_stop_parsed(__attribute__((unused)) void *parsed_result,
+		__attribute__((unused)) struct cmdline *cl,
+		__attribute__((unused)) void *data)
+{
+	struct ntb_fwd_lcore_conf *conf;
+	unsigned int lcore_id;
+
+	RTE_LCORE_FOREACH_SLAVE(lcore_id) {
+		conf = &fwd_lcore_conf[lcore_id];
+
+		if (!conf->nb_stream)
+			continue;
+
+		if (conf->stopped)
+			continue;
+
+		conf->stopped = 1;
+	}
+	printf("\nWaiting for lcores to finish...\n");
+	rte_eal_mp_wait_lcore();
+	in_test = 0;
+	printf("\nDone.\n");
+}
+
+cmdline_parse_token_string_t cmd_stop_stop =
+	TOKEN_STRING_INITIALIZER(struct cmd_stop_result, stop, "stop");
+
+cmdline_parse_inst_t cmd_stop = {
+	.f = cmd_stop_parsed,
+	.data = NULL,
+	.help_str = "stop: Stop packet forwarding",
+	.tokens = {
+		(void *)&cmd_stop_stop,
+		NULL,
+	},
+};
+
+static void
+ntb_stats_clear(void)
+{
+	int nb_ids, i;
+	uint32_t *ids;
+
+	/* Clear NTB dev stats */
+	nb_ids = rte_rawdev_xstats_names_get(dev_id, NULL, 0);
+	if (nb_ids < 0) {
+		printf("Error: Cannot get count of xstats\n");
 		return;
 	}
+	ids = malloc(sizeof(uint32_t) * nb_ids);
+	for (i = 0; i < nb_ids; i++)
+		ids[i] = i;
+	rte_rawdev_xstats_reset(dev_id, ids, nb_ids);
+	free(ids);
+	printf("\n  statistics for NTB port %d cleared\n", dev_id);
+	/* Clear ethdev stats, if any */
+	if (fwd_mode == IOFWD && eth_port_id != RTE_MAX_ETHPORTS) {
+		rte_eth_stats_reset(eth_port_id);
+		printf("\n  statistics for ETH port %d cleared\n", eth_port_id);
+	}
+}
+
+static inline void
+ntb_calculate_throughput(uint16_t port) {
+	uint64_t diff_pkts_rx, diff_pkts_tx, diff_cycles;
+	uint64_t mpps_rx, mpps_tx;
+	static uint64_t prev_pkts_rx[2];
+	static uint64_t prev_pkts_tx[2];
+	static uint64_t prev_cycles[2];
+
+	diff_cycles = prev_cycles[port];
+	prev_cycles[port] = rte_rdtsc();
+	if (diff_cycles > 0)
+		diff_cycles = prev_cycles[port] - diff_cycles;
+	diff_pkts_rx = (ntb_port_stats[port].rx > prev_pkts_rx[port]) ?
+		(ntb_port_stats[port].rx - prev_pkts_rx[port]) : 0;
+	diff_pkts_tx = (ntb_port_stats[port].tx > prev_pkts_tx[port]) ?
+		(ntb_port_stats[port].tx - prev_pkts_tx[port]) : 0;
+	prev_pkts_rx[port] = ntb_port_stats[port].rx;
+	prev_pkts_tx[port] = ntb_port_stats[port].tx;
+	mpps_rx = diff_cycles > 0 ?
+		diff_pkts_rx * rte_get_tsc_hz() / diff_cycles : 0;
+	mpps_tx = diff_cycles > 0 ?
+		diff_pkts_tx * rte_get_tsc_hz() / diff_cycles : 0;
+	printf("  Throughput (since last show)\n");
+	printf("  Rx-pps: %12"PRIu64"\n  Tx-pps: %12"PRIu64"\n",
+			mpps_rx, mpps_tx);
+
+}
+
+static void
+ntb_stats_display(void)
+{
+	struct rte_rawdev_xstats_name *xstats_names;
+	struct rte_eth_stats stats;
+	uint64_t *values;
+	uint32_t *ids;
+	int nb_ids, i;
 
-	rte_rawdev_get_attr(dev_id, "spad_user_0", &val);
-	size = val << 32;
-	rte_rawdev_get_attr(dev_id, "spad_user_1", &val);
-	size |= val;
+	printf("###### statistics for NTB port %d #######\n", dev_id);
 
-	buff = (uint8_t *)malloc(size);
-	pkts_recv[0] = (struct rte_rawdev_buf *)malloc
-			(sizeof(struct rte_rawdev_buf));
-	pkts_recv[0]->buf_addr = buff;
+	/* Get NTB dev stats and stats names */
+	nb_ids = rte_rawdev_xstats_names_get(dev_id, NULL, 0);
+	if (nb_ids < 0) {
+		printf("Error: Cannot get count of xstats\n");
+		return;
+	}
+	xstats_names = malloc(sizeof(struct rte_rawdev_xstats_name) * nb_ids);
+	if (xstats_names == NULL) {
+		printf("Cannot allocate memory for xstats lookup\n");
+		return;
+	}
+	if (nb_ids != rte_rawdev_xstats_names_get(
+			dev_id, xstats_names, nb_ids)) {
+		printf("Error: Cannot get xstats lookup\n");
+		free(xstats_names);
+		return;
+	}
+	ids = malloc(sizeof(uint32_t) * nb_ids);
+	for (i = 0; i < nb_ids; i++)
+		ids[i] = i;
+	values = malloc(sizeof(uint64_t) * nb_ids);
+	if (nb_ids != rte_rawdev_xstats_get(dev_id, ids, values, nb_ids)) {
+		printf("Error: Unable to get xstats\n");
+		free(xstats_names);
+		free(values);
+		free(ids);
+		return;
+	}
+
+	/* Display NTB dev stats */
+	for (i = 0; i < nb_ids; i++)
+		printf("  %s: %"PRIu64"\n", xstats_names[i].name, values[i]);
+	ntb_calculate_throughput(0);
 
-	if (rte_rawdev_dequeue_buffers(dev_id, pkts_recv, 1, (void *)size)) {
-		printf("Fail to dequeue.\n");
-		goto clean;
+	/* Get ethdev stats, if any */
+	if (fwd_mode == IOFWD && eth_port_id != RTE_MAX_ETHPORTS) {
+		printf("###### statistics for ETH port %d ######\n",
+			eth_port_id);
+		rte_eth_stats_get(eth_port_id, &stats);
+		printf("  RX-packets: %"PRIu64"\n", stats.ipackets);
+		printf("  RX-bytes: %"PRIu64"\n", stats.ibytes);
+		printf("  RX-errors: %"PRIu64"\n", stats.ierrors);
+		printf("  RX-missed: %"PRIu64"\n", stats.imissed);
+		printf("  TX-packets: %"PRIu64"\n", stats.opackets);
+		printf("  TX-bytes: %"PRIu64"\n", stats.obytes);
+		printf("  TX-errors: %"PRIu64"\n", stats.oerrors);
+		ntb_calculate_throughput(1);
 	}
 
-	fwrite(buff, size, 1, file);
-	printf("Done receiving to file.\n");
+	free(xstats_names);
+	free(values);
+	free(ids);
+}
 
-clean:
-	fclose(file);
-	free(buff);
-	free(pkts_recv[0]);
+/* *** SHOW/CLEAR PORT STATS *** */
+struct cmd_stats_result {
+	cmdline_fixed_string_t show;
+	cmdline_fixed_string_t port;
+	cmdline_fixed_string_t stats;
+};
+
+static void
+cmd_stats_parsed(void *parsed_result,
+		 __attribute__((unused)) struct cmdline *cl,
+		 __attribute__((unused)) void *data)
+{
+	struct cmd_stats_result *res = parsed_result;
+	if (!strcmp(res->show, "clear"))
+		ntb_stats_clear();
+	else
+		ntb_stats_display();
 }
 
-cmdline_parse_token_string_t cmd_recv_file_recv =
-	TOKEN_STRING_INITIALIZER(struct cmd_recvfile_result, recv_string,
-				 "recv");
-cmdline_parse_token_string_t cmd_recv_file_filepath =
-	TOKEN_STRING_INITIALIZER(struct cmd_recvfile_result, filepath, NULL);
+cmdline_parse_token_string_t cmd_stats_show =
+	TOKEN_STRING_INITIALIZER(struct cmd_stats_result, show, "show#clear");
+cmdline_parse_token_string_t cmd_stats_port =
+	TOKEN_STRING_INITIALIZER(struct cmd_stats_result, port, "port");
+cmdline_parse_token_string_t cmd_stats_stats =
+	TOKEN_STRING_INITIALIZER(struct cmd_stats_result, stats, "stats");
 
 
-cmdline_parse_inst_t cmd_recv_file = {
-	.f = cmd_recvfile_parsed,
+cmdline_parse_inst_t cmd_stats = {
+	.f = cmd_stats_parsed,
 	.data = NULL,
-	.help_str = "recv <file_path>",
+	.help_str = "show|clear port stats",
 	.tokens = {
-		(void *)&cmd_recv_file_recv,
-		(void *)&cmd_recv_file_filepath,
+		(void *)&cmd_stats_show,
+		(void *)&cmd_stats_port,
+		(void *)&cmd_stats_stats,
+		NULL,
+	},
+};
+
+/* *** SET FORWARDING MODE *** */
+struct cmd_set_fwd_mode_result {
+	cmdline_fixed_string_t set;
+	cmdline_fixed_string_t fwd;
+	cmdline_fixed_string_t mode;
+};
+
+static void
+cmd_set_fwd_mode_parsed(void *parsed_result,
+			__attribute__((unused)) struct cmdline *cl,
+			__attribute__((unused)) void *data)
+{
+	struct cmd_set_fwd_mode_result *res = parsed_result;
+	int i;
+
+	if (in_test) {
+		printf("Please stop traffic first.\n");
+		return;
+	}
+
+	for (i = 0; i < MAX_FWD_MODE; i++) {
+		if (!strcmp(res->mode, fwd_mode_s[i])) {
+			fwd_mode = i;
+			return;
+		}
+	}
+	printf("Invalid %s packet forwarding mode.\n", res->mode);
+}
+
+cmdline_parse_token_string_t cmd_setfwd_set =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_fwd_mode_result, set, "set");
+cmdline_parse_token_string_t cmd_setfwd_fwd =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_fwd_mode_result, fwd, "fwd");
+cmdline_parse_token_string_t cmd_setfwd_mode =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_fwd_mode_result, mode,
+				"file-trans#iofwd#txonly#rxonly");
+
+cmdline_parse_inst_t cmd_set_fwd_mode = {
+	.f = cmd_set_fwd_mode_parsed,
+	.data = NULL,
+	.help_str = "set forwarding mode as file-trans|rxonly|txonly|iofwd",
+	.tokens = {
+		(void *)&cmd_setfwd_set,
+		(void *)&cmd_setfwd_fwd,
+		(void *)&cmd_setfwd_mode,
 		NULL,
 	},
 };
@@ -276,7 +1009,10 @@ cmdline_parse_inst_t cmd_recv_file = {
 cmdline_parse_ctx_t main_ctx[] = {
 	(cmdline_parse_inst_t *)&cmd_help,
 	(cmdline_parse_inst_t *)&cmd_send_file,
-	(cmdline_parse_inst_t *)&cmd_recv_file,
+	(cmdline_parse_inst_t *)&cmd_start,
+	(cmdline_parse_inst_t *)&cmd_stop,
+	(cmdline_parse_inst_t *)&cmd_stats,
+	(cmdline_parse_inst_t *)&cmd_set_fwd_mode,
 	(cmdline_parse_inst_t *)&cmd_quit,
 	NULL,
 };
@@ -305,45 +1041,257 @@ signal_handler(int signum)
 	}
 }
 
+#define OPT_BUF_SIZE         "buf-size"
+#define OPT_FWD_MODE         "fwd-mode"
+#define OPT_NB_DESC          "nb-desc"
+#define OPT_TXFREET          "txfreet"
+#define OPT_BURST            "burst"
+#define OPT_QP               "qp"
+
+enum {
+	/* long options mapped to a short option */
+	OPT_NO_ZERO_COPY_NUM = 1,
+	OPT_BUF_SIZE_NUM,
+	OPT_FWD_MODE_NUM,
+	OPT_NB_DESC_NUM,
+	OPT_TXFREET_NUM,
+	OPT_BURST_NUM,
+	OPT_QP_NUM,
+};
+
+static const char short_options[] =
+	"i" /* interactive mode */
+	;
+
+static const struct option lgopts[] = {
+	{OPT_BUF_SIZE,     1, NULL, OPT_BUF_SIZE_NUM     },
+	{OPT_FWD_MODE,     1, NULL, OPT_FWD_MODE_NUM     },
+	{OPT_NB_DESC,      1, NULL, OPT_NB_DESC_NUM      },
+	{OPT_TXFREET,      1, NULL, OPT_TXFREET_NUM      },
+	{OPT_BURST,        1, NULL, OPT_BURST_NUM        },
+	{OPT_QP,           1, NULL, OPT_QP_NUM           },
+	{0,                0, NULL, 0                    }
+};
+
 static void
 ntb_usage(const char *prgname)
 {
 	printf("%s [EAL options] -- [options]\n"
-	       "-i : run in interactive mode (default value is 1)\n",
-	       prgname);
+	       "-i: run in interactive mode.\n"
+	       "--qp=N: set number of queues as N (N > 0, default: 1).\n"
+	       "--fwd-mode=N: set fwd mode (N: file-trans | rxonly | "
+	       "txonly | iofwd, default: file-trans)\n"
+	       "--buf-size=N: set mbuf dataroom size as N (headroom < N <= 65535,"
+	       " default: 2048).\n"
+	       "--nb-desc=N: set number of descriptors as N (%u <= N <= %u,"
+	       " default: 1024).\n"
+	       "--txfreet=N: set tx free thresh for NTB driver as N. (N >= 0)\n"
+	       "--burst=N: set pkt burst as N (0 < N <= %u, default: 32).\n",
+	       prgname, NTB_MIN_DESC_SIZE, NTB_MAX_DESC_SIZE,
+	       NTB_MAX_PKT_BURST);
 }
 
-static int
-parse_args(int argc, char **argv)
+static void
+ntb_parse_args(int argc, char **argv)
 {
 	char *prgname = argv[0], **argvopt = argv;
-	int opt, ret;
+	int opt, opt_idx, n, i;
 
-	/* Only support interactive mode to send/recv file first. */
-	while ((opt = getopt(argc, argvopt, "i")) != EOF) {
+	while ((opt = getopt_long(argc, argvopt, short_options,
+				lgopts, &opt_idx)) != EOF) {
 		switch (opt) {
 		case 'i':
-			printf("Interactive-mode selected\n");
+			printf("Interactive-mode selected.\n");
 			interactive = 1;
 			break;
+		case OPT_QP_NUM:
+			n = atoi(optarg);
+			if (n > 0)
+				num_queues = n;
+			else
+				rte_exit(EXIT_FAILURE, "qp must be > 0.\n");
+			break;
+		case OPT_BUF_SIZE_NUM:
+			n = atoi(optarg);
+			if (n > RTE_PKTMBUF_HEADROOM && n <= 0xFFFF)
+				ntb_buf_size = n;
+			else
+				rte_exit(EXIT_FAILURE, "buf-size must be > "
+					"%u and < 65536.\n",
+					RTE_PKTMBUF_HEADROOM);
+			break;
+		case OPT_FWD_MODE_NUM:
+			for (i = 0; i < MAX_FWD_MODE; i++) {
+				if (!strcmp(optarg, fwd_mode_s[i])) {
+					fwd_mode = i;
+					break;
+				}
+			}
+			if (i == MAX_FWD_MODE)
+				rte_exit(EXIT_FAILURE, "Unsupported mode. "
+				"(Should be: file-trans | rxonly | txonly "
+				"| iofwd)\n");
+			break;
+		case OPT_NB_DESC_NUM:
+			n = atoi(optarg);
+			if (n >= NTB_MIN_DESC_SIZE && n <= NTB_MAX_DESC_SIZE)
+				nb_desc = n;
+			else
+				rte_exit(EXIT_FAILURE, "nb-desc must be within"
+					" [%u, %u].\n", NTB_MIN_DESC_SIZE,
+					NTB_MAX_DESC_SIZE);
+			break;
+		case OPT_TXFREET_NUM:
+			n = atoi(optarg);
+			if (n >= 0)
+				tx_free_thresh = n;
+			else
+				rte_exit(EXIT_FAILURE, "txfreet must be"
+					" >= 0\n");
+			break;
+		case OPT_BURST_NUM:
+			n = atoi(optarg);
+			if (n > 0 && n <= NTB_MAX_PKT_BURST)
+				pkt_burst = n;
+			else
+				rte_exit(EXIT_FAILURE, "burst must be within "
+					"(0, %u].\n", NTB_MAX_PKT_BURST);
+			break;
 
 		default:
 			ntb_usage(prgname);
-			return -1;
+			rte_exit(EXIT_FAILURE,
+				 "Command line is incomplete or incorrect.\n");
+			break;
 		}
 	}
+}
 
-	if (optind >= 0)
-		argv[optind-1] = prgname;
+static void
+ntb_mempool_mz_free(__rte_unused struct rte_mempool_memhdr *memhdr,
+		void *opaque)
+{
+	const struct rte_memzone *mz = opaque;
+	rte_memzone_free(mz);
+}
 
-	ret = optind-1;
-	optind = 1; /* reset getopt lib */
-	return ret;
+static struct rte_mempool *
+ntb_mbuf_pool_create(uint16_t mbuf_seg_size, uint32_t nb_mbuf,
+		     struct ntb_dev_info ntb_info,
+		     struct ntb_dev_config *ntb_conf,
+		     unsigned int socket_id)
+{
+	size_t mz_len, total_elt_sz, max_mz_len, left_sz;
+	struct rte_pktmbuf_pool_private mbp_priv;
+	char pool_name[RTE_MEMPOOL_NAMESIZE];
+	char mz_name[RTE_MEMZONE_NAMESIZE];
+	const struct rte_memzone *mz;
+	struct rte_mempool *mp;
+	uint64_t align;
+	uint32_t mz_id;
+	int ret;
+
+	snprintf(pool_name, sizeof(pool_name), "ntb_mbuf_pool_%u", socket_id);
+	mp = rte_mempool_create_empty(pool_name, nb_mbuf,
+				      (mbuf_seg_size + sizeof(struct rte_mbuf)),
+				      MEMPOOL_CACHE_SIZE,
+				      sizeof(struct rte_pktmbuf_pool_private),
+				      socket_id, 0);
+	if (mp == NULL)
+		return NULL;
+
+	mbp_priv.mbuf_data_room_size = mbuf_seg_size;
+	mbp_priv.mbuf_priv_size = 0;
+	rte_pktmbuf_pool_init(mp, &mbp_priv);
+
+	ntb_conf->mz_list = rte_zmalloc("ntb_memzone_list",
+				sizeof(struct rte_memzone *) *
+				ntb_info.mw_cnt, 0);
+	if (ntb_conf->mz_list == NULL)
+		goto fail;
+
+	/* Put ntb header on mw0. */
+	if (ntb_info.mw_size[0] < ntb_info.ntb_hdr_size) {
+		printf("mw0 (size: %" PRIu64 ") is not enough for ntb hdr"
+		       " (size: %u)\n", ntb_info.mw_size[0],
+		       ntb_info.ntb_hdr_size);
+		goto fail;
+	}
+
+	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
+	left_sz = total_elt_sz * nb_mbuf;
+	for (mz_id = 0; mz_id < ntb_info.mw_cnt; mz_id++) {
+		/* If populated mbuf is enough, no need to reserve extra mz. */
+		if (!left_sz)
+			break;
+		snprintf(mz_name, sizeof(mz_name), "ntb_mw_%d", mz_id);
+		align = ntb_info.mw_size_align ? ntb_info.mw_size[mz_id] :
+			RTE_CACHE_LINE_SIZE;
+		/* Reserve ntb header space on memzone 0. */
+		max_mz_len = mz_id ? ntb_info.mw_size[mz_id] :
+			     ntb_info.mw_size[mz_id] - ntb_info.ntb_hdr_size;
+		mz_len = left_sz <= max_mz_len ? left_sz :
+			(max_mz_len / total_elt_sz * total_elt_sz);
+		if (!mz_len)
+			continue;
+		mz = rte_memzone_reserve_aligned(mz_name, mz_len, socket_id,
+					RTE_MEMZONE_IOVA_CONTIG, align);
+		if (mz == NULL) {
+			printf("Cannot allocate %" PRIu64 " aligned memzone"
+				" %u\n", align, mz_id);
+			goto fail;
+		}
+		left_sz -= mz_len;
+
+		/* Reserve ntb header space on memzone 0. */
+		if (mz_id)
+			ret = rte_mempool_populate_iova(mp, mz->addr, mz->iova,
+					mz->len, ntb_mempool_mz_free,
+					(void *)(uintptr_t)mz);
+		else
+			ret = rte_mempool_populate_iova(mp,
+					(void *)((size_t)mz->addr +
+					ntb_info.ntb_hdr_size),
+					mz->iova + ntb_info.ntb_hdr_size,
+					mz->len - ntb_info.ntb_hdr_size,
+					ntb_mempool_mz_free,
+					(void *)(uintptr_t)mz);
+		if (ret < 0) {
+			rte_memzone_free(mz);
+			rte_mempool_free(mp);
+			return NULL;
+		}
+
+		ntb_conf->mz_list[mz_id] = mz;
+	}
+	if (left_sz) {
+		printf("mw space is not enough for mempool.\n");
+		goto fail;
+	}
+
+	ntb_conf->mz_num = mz_id;
+	rte_mempool_obj_iter(mp, rte_pktmbuf_init, NULL);
+
+	return mp;
+fail:
+	rte_mempool_free(mp);
+	return NULL;
 }
 
 int
 main(int argc, char **argv)
 {
+	struct rte_eth_conf eth_pconf = eth_port_conf;
+	struct rte_rawdev_info ntb_rawdev_conf;
+	struct rte_rawdev_info ntb_rawdev_info;
+	struct rte_eth_dev_info ethdev_info;
+	struct rte_eth_rxconf eth_rx_conf;
+	struct rte_eth_txconf eth_tx_conf;
+	struct ntb_queue_conf ntb_q_conf;
+	struct ntb_dev_config ntb_conf;
+	struct ntb_dev_info ntb_info;
+	uint64_t ntb_link_status;
+	uint32_t nb_mbuf;
 	int ret, i;
 
 	signal(SIGINT, signal_handler);
@@ -353,6 +1301,9 @@ main(int argc, char **argv)
 	if (ret < 0)
 		rte_exit(EXIT_FAILURE, "Error with EAL initialization.\n");
 
+	if (rte_lcore_count() < 2)
+		rte_exit(EXIT_FAILURE, "Need at least 2 cores\n");
+
 	/* Find 1st ntb rawdev. */
 	for (i = 0; i < RTE_RAWDEV_MAX_DEVS; i++)
 		if (rte_rawdevs[i].driver_name &&
@@ -368,15 +1319,118 @@ main(int argc, char **argv)
 	argc -= ret;
 	argv += ret;
 
-	ret = parse_args(argc, argv);
+	ntb_parse_args(argc, argv);
+
+	rte_rawdev_set_attr(dev_id, NTB_QUEUE_SZ_NAME, nb_desc);
+	printf("Set queue size as %u.\n", nb_desc);
+	rte_rawdev_set_attr(dev_id, NTB_QUEUE_NUM_NAME, num_queues);
+	printf("Set queue number as %u.\n", num_queues);
+	ntb_rawdev_info.dev_private = (rte_rawdev_obj_t)(&ntb_info);
+	rte_rawdev_info_get(dev_id, &ntb_rawdev_info);
+
+	nb_mbuf = nb_desc * num_queues * 2 * 2 + rte_lcore_count() *
+		  MEMPOOL_CACHE_SIZE;
+	mbuf_pool = ntb_mbuf_pool_create(ntb_buf_size, nb_mbuf, ntb_info,
+					 &ntb_conf, rte_socket_id());
+	if (mbuf_pool == NULL)
+		rte_exit(EXIT_FAILURE, "Cannot create mbuf pool.\n");
+
+	ntb_conf.num_queues = num_queues;
+	ntb_conf.queue_size = nb_desc;
+	ntb_rawdev_conf.dev_private = (rte_rawdev_obj_t)(&ntb_conf);
+	ret = rte_rawdev_configure(dev_id, &ntb_rawdev_conf);
+	if (ret)
+		rte_exit(EXIT_FAILURE, "Can't config ntb dev: err=%d, "
+			"port=%u\n", ret, dev_id);
+
+	ntb_q_conf.tx_free_thresh = tx_free_thresh;
+	ntb_q_conf.nb_desc = nb_desc;
+	ntb_q_conf.rx_mp = mbuf_pool;
+	for (i = 0; i < num_queues; i++) {
+		/* Setup rawdev queue */
+		ret = rte_rawdev_queue_setup(dev_id, i, &ntb_q_conf);
+		if (ret < 0)
+			rte_exit(EXIT_FAILURE,
+				"Failed to setup ntb queue %u.\n", i);
+	}
+
+	/* Wait up to 100s for the peer device to come up. */
+	printf("Checking ntb link status...\n");
+	for (i = 0; i < 1000; i++) {
+		rte_rawdev_get_attr(dev_id, NTB_LINK_STATUS_NAME,
+				    &ntb_link_status);
+		if (ntb_link_status) {
+			printf("Peer dev ready, ntb link up.\n");
+			break;
+		}
+		rte_delay_ms(100);
+	}
+	rte_rawdev_get_attr(dev_id, NTB_LINK_STATUS_NAME, &ntb_link_status);
+	if (ntb_link_status == 0)
+		printf("Link is not up after 100s. Please restart app.\n");
+
+	ret = rte_rawdev_start(dev_id);
 	if (ret < 0)
-		rte_exit(EXIT_FAILURE, "Invalid arguments\n");
+		rte_exit(EXIT_FAILURE, "rte_rawdev_start: err=%d, port=%u\n",
+			ret, dev_id);
+
+	/* Find 1st ethdev */
+	eth_port_id = rte_eth_find_next(0);
 
-	rte_rawdev_start(dev_id);
+	if (eth_port_id < RTE_MAX_ETHPORTS) {
+		rte_eth_dev_info_get(eth_port_id, &ethdev_info);
+		eth_pconf.rx_adv_conf.rss_conf.rss_hf &=
+				ethdev_info.flow_type_rss_offloads;
+		ret = rte_eth_dev_configure(eth_port_id, num_queues,
+					    num_queues, &eth_pconf);
+		if (ret)
+			rte_exit(EXIT_FAILURE, "Can't config ethdev: err=%d, "
+				"port=%u\n", ret, eth_port_id);
+		eth_rx_conf = ethdev_info.default_rxconf;
+		eth_rx_conf.offloads = eth_pconf.rxmode.offloads;
+		eth_tx_conf = ethdev_info.default_txconf;
+		eth_tx_conf.offloads = eth_pconf.txmode.offloads;
+
+		/* Setup ethdev queue if ethdev exists */
+		for (i = 0; i < num_queues; i++) {
+			ret = rte_eth_rx_queue_setup(eth_port_id, i, nb_desc,
+					rte_eth_dev_socket_id(eth_port_id),
+					&eth_rx_conf, mbuf_pool);
+			if (ret < 0)
+				rte_exit(EXIT_FAILURE,
+					"Failed to setup eth rxq %u.\n", i);
+			ret = rte_eth_tx_queue_setup(eth_port_id, i, nb_desc,
+					rte_eth_dev_socket_id(eth_port_id),
+					&eth_tx_conf);
+			if (ret < 0)
+				rte_exit(EXIT_FAILURE,
+					"Failed to setup eth txq %u.\n", i);
+		}
+
+		ret = rte_eth_dev_start(eth_port_id);
+		if (ret < 0)
+			rte_exit(EXIT_FAILURE, "rte_eth_dev_start: err=%d, "
+				"port=%u\n", ret, eth_port_id);
+	}
+
+	/* initialize port stats */
+	memset(&ntb_port_stats, 0, sizeof(ntb_port_stats));
+
+	/* Set default fwd mode if user doesn't set it. */
+	if (fwd_mode == MAX_FWD_MODE && eth_port_id < RTE_MAX_ETHPORTS) {
+		printf("Set default fwd mode as iofwd.\n");
+		fwd_mode = IOFWD;
+	}
+	if (fwd_mode == MAX_FWD_MODE) {
+		printf("Set default fwd mode as file-trans.\n");
+		fwd_mode = FILE_TRANS;
+	}
 
 	if (interactive) {
 		sleep(1);
 		prompt();
+	} else {
+		start_pkt_fwd();
 	}
 
 	return 0;
-- 
2.17.1
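
Note (illustrative only, not part of the patch): ntb_mbuf_pool_create()
above spreads the mempool over the memory windows. Window 0 gives up
ntb_hdr_size bytes to the NTB header, and each window holds a whole number
of elements. The sizing arithmetic can be sketched as below. The helper
name split_pool() and its parameters are hypothetical, and the sketch
omits the memzone reservation and alignment handling.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not in the patch: decide how many bytes of
 * element storage go into each memory window.
 * Returns the number of bytes that did not fit (0 means success).
 */
static size_t
split_pool(const size_t *mw_size, size_t mw_cnt, size_t hdr_size,
	   size_t elt_size, size_t nb_elt, size_t *out_len)
{
	size_t left = elt_size * nb_elt;
	size_t i;

	assert(elt_size > 0);
	/* The patch rejects a window 0 that cannot hold the header. */
	assert(mw_cnt > 0 && mw_size[0] >= hdr_size);

	for (i = 0; i < mw_cnt; i++) {
		/* Window 0 loses hdr_size bytes to the NTB header. */
		size_t max_len = i ? mw_size[i] : mw_size[i] - hdr_size;
		/* Take what is left, or round down to whole elements. */
		size_t len = left <= max_len ? left :
			     max_len / elt_size * elt_size;

		out_len[i] = len;
		left -= len;
	}

	return left;
}
```

For example, with windows of 1024 and 4096 bytes, a 256-byte header and
ten 300-byte elements, window 0 takes 600 bytes, window 1 takes 2400
bytes, and nothing is left over.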



* [dpdk-dev] [PATCH v4 0/4] enable FIFO for NTB
  2019-09-06  7:53   ` [dpdk-dev] [PATCH v3 0/4] enable FIFO " Xiaoyun Li
                       ` (3 preceding siblings ...)
  2019-09-06  7:54     ` [dpdk-dev] [PATCH v3 4/4] examples/ntb: support more functions for NTB Xiaoyun Li
@ 2019-09-09  3:27     ` " Xiaoyun Li
  2019-09-09  3:27       ` [dpdk-dev] [PATCH v4 1/4] raw/ntb: setup ntb queue Xiaoyun Li
                         ` (4 more replies)
  4 siblings, 5 replies; 42+ messages in thread
From: Xiaoyun Li @ 2019-09-09  3:27 UTC (permalink / raw)
  To: jingjing.wu, keith.wiles, omkar.maslekar, cunming.liang; +Cc: dev, Xiaoyun Li

Enable FIFO for the NTB rawdev driver to support packet-based
processing. An example application is also provided; it supports
txonly, rxonly, and iofwd between an NTB device and an ethdev, as
well as file transmission.

Acked-by: Omkar Maslekar <omkar.maslekar@intel.com>

---
v4:
 * Fixed compile issues on 32-bit machines.
 * Fixed total xstats issue.

v3:
 * Replace strncpy with memcpy to avoid gcc-9 compile issue.

v2:
 * Fixed compile issues on 32-bit machines and a missing include file.
 * Fixed a typo.

Xiaoyun Li (4):
  raw/ntb: setup ntb queue
  raw/ntb: add xstats support
  raw/ntb: add enqueue and dequeue functions
  examples/ntb: support more functions for NTB

 doc/guides/rawdevs/ntb.rst             |   67 +-
 doc/guides/rel_notes/release_19_11.rst |    4 +
 doc/guides/sample_app_ug/ntb.rst       |   59 +-
 drivers/raw/ntb/Makefile               |    3 +
 drivers/raw/ntb/meson.build            |    1 +
 drivers/raw/ntb/ntb.c                  | 1076 +++++++++++++++-----
 drivers/raw/ntb/ntb.h                  |  162 ++-
 drivers/raw/ntb/ntb_hw_intel.c         |   48 +-
 drivers/raw/ntb/rte_pmd_ntb.h          |   43 +
 examples/ntb/meson.build               |    3 +
 examples/ntb/ntb_fwd.c                 | 1298 +++++++++++++++++++++---
 11 files changed, 2348 insertions(+), 416 deletions(-)
 create mode 100644 drivers/raw/ntb/rte_pmd_ntb.h

-- 
2.17.1



* [dpdk-dev] [PATCH v4 1/4] raw/ntb: setup ntb queue
  2019-09-09  3:27     ` [dpdk-dev] [PATCH v4 0/4] enable FIFO " Xiaoyun Li
@ 2019-09-09  3:27       ` Xiaoyun Li
  2019-09-23  2:50         ` Wu, Jingjing
  2019-09-09  3:27       ` [dpdk-dev] [PATCH v4 2/4] raw/ntb: add xstats support Xiaoyun Li
                         ` (3 subsequent siblings)
  4 siblings, 1 reply; 42+ messages in thread
From: Xiaoyun Li @ 2019-09-09  3:27 UTC (permalink / raw)
  To: jingjing.wu, keith.wiles, omkar.maslekar, cunming.liang; +Cc: dev, Xiaoyun Li

Set up and initialize the NTB Tx and Rx queues, and negotiate queue
information with the peer. Return an error if the queue size or the
number of queues differs between the two sides.

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
---
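Note (illustrative only, not part of the patch): the handshake passes each
64-bit memory window size through two 32-bit scratchpad registers (high
word, then low word), and the peer rejects the link when the window count
or any window size differs. A minimal sketch of that encode/compare step
follows. The names spad_pair, spad_encode(), spad_decode() and
mw_config_matches() are hypothetical stand-ins for the spad_write and
spad_read ops.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a pair of 32-bit scratchpad registers. */
struct spad_pair {
	uint32_t hi;
	uint32_t lo;
};

/* Split a 64-bit value into high and low 32-bit words. */
static struct spad_pair
spad_encode(uint64_t v)
{
	struct spad_pair p = { (uint32_t)(v >> 32), (uint32_t)v };

	return p;
}

/* Rebuild the 64-bit value from the two words. */
static uint64_t
spad_decode(struct spad_pair p)
{
	return ((uint64_t)p.hi << 32) | p.lo;
}

/* Return 0 if the peer advertises the same windows, -1 otherwise. */
static int
mw_config_matches(const uint64_t *local, uint8_t local_cnt,
		  const struct spad_pair *peer, uint8_t peer_cnt)
{
	uint8_t i;

	assert(local != NULL && peer != NULL);

	if (peer_cnt != local_cnt)
		return -1;
	for (i = 0; i < local_cnt; i++)
		if (spad_decode(peer[i]) != local[i])
			return -1;

	return 0;
}
```
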
 doc/guides/rawdevs/ntb.rst             |  39 +-
 doc/guides/rel_notes/release_19_11.rst |   4 +
 drivers/raw/ntb/Makefile               |   3 +
 drivers/raw/ntb/meson.build            |   1 +
 drivers/raw/ntb/ntb.c                  | 705 ++++++++++++++++++-------
 drivers/raw/ntb/ntb.h                  | 151 ++++--
 drivers/raw/ntb/ntb_hw_intel.c         |  26 +-
 drivers/raw/ntb/rte_pmd_ntb.h          |  43 ++
 8 files changed, 718 insertions(+), 254 deletions(-)
 create mode 100644 drivers/raw/ntb/rte_pmd_ntb.h
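
Note (illustrative only, not part of the patch): the ring formats added to
doc/guides/rawdevs/ntb.rst describe a 16-byte descriptor entry and a
4-byte used entry. One way to express those layouts in C is sketched
below. The struct and field names are hypothetical and only the field
widths are taken from the diagrams; the definitions in ntb.h may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor entry: 64-bit address, 16-bit length, 48 bits reserved. */
struct desc_entry {
	uint64_t addr;   /* buffer address */
	uint16_t len;    /* buffer length */
	uint16_t rsv[3]; /* reserved */
};

/* Hypothetical used entry: 16-bit packet length, 16-bit flags. */
struct used_entry {
	uint16_t len;    /* packet length */
	uint16_t flags;
};

/* Compile-time checks that the layouts match the documented widths. */
static_assert(sizeof(struct desc_entry) == 16, "desc entry is 128 bits");
static_assert(offsetof(struct desc_entry, len) == 8, "len follows addr");
static_assert(sizeof(struct used_entry) == 4, "used entry is 32 bits");
```

Keeping the used entry at 4 bytes means a completion can be published to
the peer with a single small remote write, which fits the stated goal of
avoiding remote reads.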

diff --git a/doc/guides/rawdevs/ntb.rst b/doc/guides/rawdevs/ntb.rst
index 0a61ec03d..99e7db441 100644
--- a/doc/guides/rawdevs/ntb.rst
+++ b/doc/guides/rawdevs/ntb.rst
@@ -45,8 +45,45 @@ to use, i.e. igb_uio, vfio. The ``dpdk-devbind.py`` script can be used to
 show devices status and to bind them to a suitable kernel driver. They will
 appear under the category of "Misc (rawdev) devices".
 
+Ring Layout
+-----------
+
+Since reads and writes of the remote system's memory go through the PCI
+bus, a remote read is much more expensive than a remote write. Thus, the
+enqueue and dequeue operations on the ntb ring should avoid remote reads.
+The ring layout for ntb is as follows:
+- Ring Format:
+  desc_ring:
+      0               16                                              64
+      +---------------------------------------------------------------+
+      |                        buffer address                         |
+      +---------------+-----------------------------------------------+
+      | buffer length |                      resv                     |
+      +---------------+-----------------------------------------------+
+  used_ring:
+      0               16              32
+      +---------------+---------------+
+      | packet length |     flags     |
+      +---------------+---------------+
+- Ring Layout:
+      +------------------------+   +------------------------+
+      | used_ring              |   | desc_ring              |
+      | +---+                  |   | +---+                  |
+      | |   |                  |   | |   |                  |
+      | +---+      +--------+  |   | +---+                  |
+      | |   | ---> | buffer | <+---+-|   |                  |
+      | +---+      +--------+  |   | +---+                  |
+      | |   |                  |   | |   |                  |
+      | +---+                  |   | +---+                  |
+      |  ...                   |   |  ...                   |
+      |                        |   |                        |
+      |            +---------+ |   |            +---------+ |
+      |            | tx_tail | |   |            | rx_tail | |
+      | System A   +---------+ |   | System B   +---------+ |
+      +------------------------+   +------------------------+
+                    <---------traffic---------
+
 Limitation
 ----------
 
-- The FIFO hasn't been introduced and will come in 19.11 release.
 - This PMD only supports Intel Skylake platform.
diff --git a/doc/guides/rel_notes/release_19_11.rst b/doc/guides/rel_notes/release_19_11.rst
index 8490d897c..7ac3d5ca6 100644
--- a/doc/guides/rel_notes/release_19_11.rst
+++ b/doc/guides/rel_notes/release_19_11.rst
@@ -56,6 +56,10 @@ New Features
      Also, make sure to start the actual text at the margin.
      =========================================================
 
+   * **Introduced FIFO for NTB PMD.**
+
+     Introduced FIFO for NTB (Non-transparent Bridge) PMD to support
+     packet-based processing.
 
 Removed Items
 -------------
diff --git a/drivers/raw/ntb/Makefile b/drivers/raw/ntb/Makefile
index 6fe2aaf40..814cd05ca 100644
--- a/drivers/raw/ntb/Makefile
+++ b/drivers/raw/ntb/Makefile
@@ -25,4 +25,7 @@ LIBABIVER := 1
 SRCS-$(CONFIG_RTE_LIBRTE_PMD_NTB_RAWDEV) += ntb.c
 SRCS-$(CONFIG_RTE_LIBRTE_PMD_NTB_RAWDEV) += ntb_hw_intel.c
 
+# install this header file
+SYMLINK-$(CONFIG_RTE_LIBRTE_PMD_NTB_RAWDEV)-include := rte_pmd_ntb.h
+
 include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/raw/ntb/meson.build b/drivers/raw/ntb/meson.build
index 7f39437f8..7a7d26126 100644
--- a/drivers/raw/ntb/meson.build
+++ b/drivers/raw/ntb/meson.build
@@ -5,4 +5,5 @@ deps += ['rawdev', 'mbuf', 'mempool',
 	 'pci', 'bus_pci']
 sources = files('ntb.c',
                 'ntb_hw_intel.c')
+install_headers('rte_pmd_ntb.h')
 allow_experimental_apis = true
diff --git a/drivers/raw/ntb/ntb.c b/drivers/raw/ntb/ntb.c
index bfecce1e4..728deccdf 100644
--- a/drivers/raw/ntb/ntb.c
+++ b/drivers/raw/ntb/ntb.c
@@ -12,6 +12,7 @@
 #include <rte_eal.h>
 #include <rte_log.h>
 #include <rte_pci.h>
+#include <rte_mbuf.h>
 #include <rte_bus_pci.h>
 #include <rte_memzone.h>
 #include <rte_memcpy.h>
@@ -19,6 +20,7 @@
 #include <rte_rawdev_pmd.h>
 
 #include "ntb_hw_intel.h"
+#include "rte_pmd_ntb.h"
 #include "ntb.h"
 
 int ntb_logtype;
@@ -28,48 +30,7 @@ static const struct rte_pci_id pci_id_ntb_map[] = {
 	{ .vendor_id = 0, /* sentinel */ },
 };
 
-static int
-ntb_set_mw(struct rte_rawdev *dev, int mw_idx, uint64_t mw_size)
-{
-	struct ntb_hw *hw = dev->dev_private;
-	char mw_name[RTE_MEMZONE_NAMESIZE];
-	const struct rte_memzone *mz;
-	int ret = 0;
-
-	if (hw->ntb_ops->mw_set_trans == NULL) {
-		NTB_LOG(ERR, "Not supported to set mw.");
-		return -ENOTSUP;
-	}
-
-	snprintf(mw_name, sizeof(mw_name), "ntb_%d_mw_%d",
-		 dev->dev_id, mw_idx);
-
-	mz = rte_memzone_lookup(mw_name);
-	if (mz)
-		return 0;
-
-	/**
-	 * Hardware requires that mapped memory base address should be
-	 * aligned with EMBARSZ and needs continuous memzone.
-	 */
-	mz = rte_memzone_reserve_aligned(mw_name, mw_size, dev->socket_id,
-				RTE_MEMZONE_IOVA_CONTIG, hw->mw_size[mw_idx]);
-	if (!mz) {
-		NTB_LOG(ERR, "Cannot allocate aligned memzone.");
-		return -EIO;
-	}
-	hw->mz[mw_idx] = mz;
-
-	ret = (*hw->ntb_ops->mw_set_trans)(dev, mw_idx, mz->iova, mw_size);
-	if (ret) {
-		NTB_LOG(ERR, "Cannot set mw translation.");
-		return ret;
-	}
-
-	return ret;
-}
-
-static void
+static inline void
 ntb_link_cleanup(struct rte_rawdev *dev)
 {
 	struct ntb_hw *hw = dev->dev_private;
@@ -89,20 +50,94 @@ ntb_link_cleanup(struct rte_rawdev *dev)
 	}
 
 	/* Clear mw so that peer cannot access local memory.*/
-	for (i = 0; i < hw->mw_cnt; i++) {
+	for (i = 0; i < hw->used_mw_num; i++) {
 		status = (*hw->ntb_ops->mw_set_trans)(dev, i, 0, 0);
 		if (status)
 			NTB_LOG(ERR, "Failed to clean mw.");
 	}
 }
 
+static inline int
+ntb_handshake_work(const struct rte_rawdev *dev)
+{
+	struct ntb_hw *hw = dev->dev_private;
+	uint32_t val;
+	int ret, i;
+
+	if (hw->ntb_ops->spad_write == NULL ||
+	    hw->ntb_ops->mw_set_trans == NULL) {
+		NTB_LOG(ERR, "Scratchpad/MW setting is not supported.");
+		return -ENOTSUP;
+	}
+
+	/* Tell peer the mw info of local side. */
+	ret = (*hw->ntb_ops->spad_write)(dev, SPAD_NUM_MWS, 1, hw->mw_cnt);
+	if (ret < 0)
+		return ret;
+	for (i = 0; i < hw->mw_cnt; i++) {
+		NTB_LOG(INFO, "Local %u mw size: 0x%"PRIx64"", i,
+				hw->mw_size[i]);
+		val = hw->mw_size[i] >> 32;
+		ret = (*hw->ntb_ops->spad_write)(dev, SPAD_MW0_SZ_H + 2 * i,
+						 1, val);
+		if (ret < 0)
+			return ret;
+		val = hw->mw_size[i];
+		ret = (*hw->ntb_ops->spad_write)(dev, SPAD_MW0_SZ_L + 2 * i,
+						 1, val);
+		if (ret < 0)
+			return ret;
+	}
+
+	/* Tell peer about the queue info and map memory to the peer. */
+	ret = (*hw->ntb_ops->spad_write)(dev, SPAD_Q_SZ, 1, hw->queue_size);
+	if (ret < 0)
+		return ret;
+	ret = (*hw->ntb_ops->spad_write)(dev, SPAD_NUM_QPS, 1,
+					 hw->queue_pairs);
+	if (ret < 0)
+		return ret;
+	ret = (*hw->ntb_ops->spad_write)(dev, SPAD_USED_MWS, 1,
+					 hw->used_mw_num);
+	if (ret < 0)
+		return ret;
+	for (i = 0; i < hw->used_mw_num; i++) {
+		val = (uint64_t)(size_t)(hw->mz[i]->addr) >> 32;
+		ret = (*hw->ntb_ops->spad_write)(dev, SPAD_MW0_BA_H + 2 * i,
+						 1, val);
+		if (ret < 0)
+			return ret;
+		val = (uint64_t)(size_t)(hw->mz[i]->addr);
+		ret = (*hw->ntb_ops->spad_write)(dev, SPAD_MW0_BA_L + 2 * i,
+						 1, val);
+		if (ret < 0)
+			return ret;
+	}
+
+	for (i = 0; i < hw->used_mw_num; i++) {
+		ret = (*hw->ntb_ops->mw_set_trans)(dev, i, hw->mz[i]->iova,
+						   hw->mz[i]->len);
+		if (ret < 0)
+			return ret;
+	}
+
+	/* Ring doorbell 0 to tell peer the device is ready. */
+	ret = (*hw->ntb_ops->peer_db_set)(dev, 0);
+	if (ret < 0)
+		return ret;
+
+	return 0;
+}
+
 static void
 ntb_dev_intr_handler(void *param)
 {
 	struct rte_rawdev *dev = (struct rte_rawdev *)param;
 	struct ntb_hw *hw = dev->dev_private;
-	uint32_t mw_size_h, mw_size_l;
+	uint32_t val_h, val_l;
+	uint64_t peer_mw_size;
 	uint64_t db_bits = 0;
+	uint8_t peer_mw_cnt;
 	int i = 0;
 
 	if (hw->ntb_ops->db_read == NULL ||
@@ -118,7 +153,7 @@ ntb_dev_intr_handler(void *param)
 
 	/* Doorbell 0 is for peer device ready. */
 	if (db_bits & 1) {
-		NTB_LOG(DEBUG, "DB0: Peer device is up.");
+		NTB_LOG(INFO, "DB0: Peer device is up.");
 		/* Clear received doorbell. */
 		(*hw->ntb_ops->db_clear)(dev, 1);
 
@@ -129,47 +164,44 @@ ntb_dev_intr_handler(void *param)
 		if (hw->peer_dev_up)
 			return;
 
-		if (hw->ntb_ops->spad_read == NULL ||
-		    hw->ntb_ops->spad_write == NULL) {
-			NTB_LOG(ERR, "Scratchpad is not supported.");
+		if (hw->ntb_ops->spad_read == NULL) {
+			NTB_LOG(ERR, "Scratchpad read is not supported.");
+			return;
+		}
+
+		/* Check if mw setting on the peer is the same as local. */
+		peer_mw_cnt = (*hw->ntb_ops->spad_read)(dev, SPAD_NUM_MWS, 0);
+		if (peer_mw_cnt != hw->mw_cnt) {
+			NTB_LOG(ERR, "Both mw cnt must be the same.");
 			return;
 		}
 
-		hw->peer_mw_cnt = (*hw->ntb_ops->spad_read)
-				  (dev, SPAD_NUM_MWS, 0);
-		hw->peer_mw_size = rte_zmalloc("uint64_t",
-				   hw->peer_mw_cnt * sizeof(uint64_t), 0);
 		for (i = 0; i < hw->mw_cnt; i++) {
-			mw_size_h = (*hw->ntb_ops->spad_read)
-				    (dev, SPAD_MW0_SZ_H + 2 * i, 0);
-			mw_size_l = (*hw->ntb_ops->spad_read)
-				    (dev, SPAD_MW0_SZ_L + 2 * i, 0);
-			hw->peer_mw_size[i] = ((uint64_t)mw_size_h << 32) |
-					      mw_size_l;
+			val_h = (*hw->ntb_ops->spad_read)
+				(dev, SPAD_MW0_SZ_H + 2 * i, 0);
+			val_l = (*hw->ntb_ops->spad_read)
+				(dev, SPAD_MW0_SZ_L + 2 * i, 0);
+			peer_mw_size = ((uint64_t)val_h << 32) | val_l;
 			NTB_LOG(DEBUG, "Peer %u mw size: 0x%"PRIx64"", i,
-					hw->peer_mw_size[i]);
+					peer_mw_size);
+			if (peer_mw_size != hw->mw_size[i]) {
+				NTB_LOG(ERR, "Mw config must be the same.");
+				return;
+			}
 		}
 
 		hw->peer_dev_up = 1;
 
 		/**
-		 * Handshake with peer. Spad_write only works when both
-		 * devices are up. So write spad again when db is received.
-		 * And set db again for the later device who may miss
+		 * Handshake with peer. Spad_write & mw_set_trans only work
+		 * when both devices are up. So write spad again when db is
+		 * received. And set db again for the later device who may miss
 		 * the 1st db.
 		 */
-		for (i = 0; i < hw->mw_cnt; i++) {
-			(*hw->ntb_ops->spad_write)(dev, SPAD_NUM_MWS,
-						   1, hw->mw_cnt);
-			mw_size_h = hw->mw_size[i] >> 32;
-			(*hw->ntb_ops->spad_write)(dev, SPAD_MW0_SZ_H + 2 * i,
-						   1, mw_size_h);
-
-			mw_size_l = hw->mw_size[i];
-			(*hw->ntb_ops->spad_write)(dev, SPAD_MW0_SZ_L + 2 * i,
-						   1, mw_size_l);
+		if (ntb_handshake_work(dev) < 0) {
+			NTB_LOG(ERR, "Handshake work failed.");
+			return;
 		}
-		(*hw->ntb_ops->peer_db_set)(dev, 0);
 
 		/* To get the link info. */
 		if (hw->ntb_ops->get_link_status == NULL) {
@@ -183,7 +215,7 @@ ntb_dev_intr_handler(void *param)
 	}
 
 	if (db_bits & (1 << 1)) {
-		NTB_LOG(DEBUG, "DB1: Peer device is down.");
+		NTB_LOG(INFO, "DB1: Peer device is down.");
 		/* Clear received doorbell. */
 		(*hw->ntb_ops->db_clear)(dev, 2);
 
@@ -197,7 +229,7 @@ ntb_dev_intr_handler(void *param)
 	}
 
 	if (db_bits & (1 << 2)) {
-		NTB_LOG(DEBUG, "DB2: Peer device agrees dev to be down.");
+		NTB_LOG(INFO, "DB2: Peer device agrees dev to be down.");
 		/* Clear received doorbell. */
 		(*hw->ntb_ops->db_clear)(dev, (1 << 2));
 		hw->peer_dev_up = 0;
@@ -206,24 +238,228 @@ ntb_dev_intr_handler(void *param)
 }
 
 static void
-ntb_queue_conf_get(struct rte_rawdev *dev __rte_unused,
-		   uint16_t queue_id __rte_unused,
-		   rte_rawdev_obj_t queue_conf __rte_unused)
+ntb_queue_conf_get(struct rte_rawdev *dev,
+		   uint16_t queue_id,
+		   rte_rawdev_obj_t queue_conf)
+{
+	struct ntb_queue_conf *q_conf = queue_conf;
+	struct ntb_hw *hw = dev->dev_private;
+
+	q_conf->tx_free_thresh = hw->tx_queues[queue_id]->tx_free_thresh;
+	q_conf->nb_desc = hw->rx_queues[queue_id]->nb_rx_desc;
+	q_conf->rx_mp = hw->rx_queues[queue_id]->mpool;
+}
+
+static void
+ntb_rxq_release_mbufs(struct ntb_rx_queue *q)
+{
+	int i;
+
+	if (!q || !q->sw_ring) {
+		NTB_LOG(ERR, "Pointer to rxq or sw_ring is NULL");
+		return;
+	}
+
+	for (i = 0; i < q->nb_rx_desc; i++) {
+		if (q->sw_ring[i].mbuf) {
+			rte_pktmbuf_free_seg(q->sw_ring[i].mbuf);
+			q->sw_ring[i].mbuf = NULL;
+		}
+	}
+}
+
+static void
+ntb_rxq_release(struct ntb_rx_queue *rxq)
+{
+	if (!rxq) {
+		NTB_LOG(ERR, "Pointer to rxq is NULL");
+		return;
+	}
+
+	ntb_rxq_release_mbufs(rxq);
+
+	rte_free(rxq->sw_ring);
+	rte_free(rxq);
+}
+
+static int
+ntb_rxq_setup(struct rte_rawdev *dev,
+	      uint16_t qp_id,
+	      rte_rawdev_obj_t queue_conf)
+{
+	struct ntb_queue_conf *rxq_conf = queue_conf;
+	struct ntb_hw *hw = dev->dev_private;
+	struct ntb_rx_queue *rxq;
+
+	if (rxq_conf->rx_mp == NULL) {
+		NTB_LOG(ERR, "Invalid null mempool pointer.");
+		return -EINVAL;
+	}
+
+	/* Allocate the rx queue data structure */
+	rxq = rte_zmalloc_socket("ntb rx queue",
+				 sizeof(struct ntb_rx_queue),
+				 RTE_CACHE_LINE_SIZE,
+				 dev->socket_id);
+	if (!rxq) {
+		NTB_LOG(ERR, "Failed to allocate memory for "
+			    "rx queue data structure.");
+		return -ENOMEM;
+	}
+	rxq->nb_rx_desc = rxq_conf->nb_desc;
+	rxq->mpool = rxq_conf->rx_mp;
+	rxq->port_id = dev->dev_id;
+	rxq->queue_id = qp_id;
+	rxq->hw = hw;
+
+	/* Allocate the software ring. */
+	rxq->sw_ring =
+		rte_zmalloc_socket("ntb rx sw ring",
+				   sizeof(struct ntb_rx_entry) *
+				   rxq->nb_rx_desc,
+				   RTE_CACHE_LINE_SIZE,
+				   dev->socket_id);
+	if (!rxq->sw_ring) {
+		ntb_rxq_release(rxq);
+		NTB_LOG(ERR, "Failed to allocate memory for SW ring");
+		return -ENOMEM;
+	}
+
+	hw->rx_queues[qp_id] = rxq;
+
+	return 0;
+}
+
+static void
+ntb_txq_release_mbufs(struct ntb_tx_queue *q)
+{
+	int i;
+
+	if (!q || !q->sw_ring) {
+		NTB_LOG(ERR, "Pointer to txq or sw_ring is NULL");
+		return;
+	}
+
+	for (i = 0; i < q->nb_tx_desc; i++) {
+		if (q->sw_ring[i].mbuf) {
+			rte_pktmbuf_free_seg(q->sw_ring[i].mbuf);
+			q->sw_ring[i].mbuf = NULL;
+		}
+	}
+}
+
+static void
+ntb_txq_release(struct ntb_tx_queue *txq)
 {
+	if (!txq) {
+		NTB_LOG(ERR, "Pointer to txq is NULL");
+		return;
+	}
+
+	ntb_txq_release_mbufs(txq);
+
+	rte_free(txq->sw_ring);
+	rte_free(txq);
 }
 
 static int
-ntb_queue_setup(struct rte_rawdev *dev __rte_unused,
-		uint16_t queue_id __rte_unused,
-		rte_rawdev_obj_t queue_conf __rte_unused)
+ntb_txq_setup(struct rte_rawdev *dev,
+	      uint16_t qp_id,
+	      rte_rawdev_obj_t queue_conf)
 {
+	struct ntb_queue_conf *txq_conf = queue_conf;
+	struct ntb_hw *hw = dev->dev_private;
+	struct ntb_tx_queue *txq;
+	uint16_t i, prev;
+
+	/* Allocate the TX queue data structure. */
+	txq = rte_zmalloc_socket("ntb tx queue",
+				  sizeof(struct ntb_tx_queue),
+				  RTE_CACHE_LINE_SIZE,
+				  dev->socket_id);
+	if (!txq) {
+		NTB_LOG(ERR, "Failed to allocate memory for "
+			    "tx queue structure");
+		return -ENOMEM;
+	}
+
+	txq->nb_tx_desc = txq_conf->nb_desc;
+	txq->port_id = dev->dev_id;
+	txq->queue_id = qp_id;
+	txq->hw = hw;
+
+	/* Allocate software ring */
+	txq->sw_ring =
+		rte_zmalloc_socket("ntb tx sw ring",
+				   sizeof(struct ntb_tx_entry) *
+				   txq->nb_tx_desc,
+				   RTE_CACHE_LINE_SIZE,
+				   dev->socket_id);
+	if (!txq->sw_ring) {
+		ntb_txq_release(txq);
+		NTB_LOG(ERR, "Failed to allocate memory for SW TX ring");
+		return -ENOMEM;
+	}
+
+	prev = txq->nb_tx_desc - 1;
+	for (i = 0; i < txq->nb_tx_desc; i++) {
+		txq->sw_ring[i].mbuf = NULL;
+		txq->sw_ring[i].last_id = i;
+		txq->sw_ring[prev].next_id = i;
+		prev = i;
+	}
+
+	txq->tx_free_thresh = txq_conf->tx_free_thresh ?
+			      txq_conf->tx_free_thresh :
+			      NTB_DFLT_TX_FREE_THRESH;
+	if (txq->tx_free_thresh >= txq->nb_tx_desc - 3) {
+		NTB_LOG(ERR, "tx_free_thresh must be less than nb_desc - 3. "
+			"(tx_free_thresh=%u qp_id=%u)", txq->tx_free_thresh,
+			qp_id);
+		return -EINVAL;
+	}
+
+	hw->tx_queues[qp_id] = txq;
+
 	return 0;
 }
 
+
+static int
+ntb_queue_setup(struct rte_rawdev *dev,
+		uint16_t queue_id,
+		rte_rawdev_obj_t queue_conf)
+{
+	struct ntb_hw *hw = dev->dev_private;
+	int ret;
+
+	if (queue_id >= hw->queue_pairs)
+		return -EINVAL;
+
+	ret = ntb_txq_setup(dev, queue_id, queue_conf);
+	if (ret < 0)
+		return ret;
+
+	ret = ntb_rxq_setup(dev, queue_id, queue_conf);
+
+	return ret;
+}
+
 static int
-ntb_queue_release(struct rte_rawdev *dev __rte_unused,
-		  uint16_t queue_id __rte_unused)
+ntb_queue_release(struct rte_rawdev *dev, uint16_t queue_id)
 {
+	struct ntb_hw *hw = dev->dev_private;
+	struct ntb_tx_queue *txq;
+	struct ntb_rx_queue *rxq;
+
+	if (queue_id >= hw->queue_pairs)
+		return -EINVAL;
+
+	txq = hw->tx_queues[queue_id];
+	rxq = hw->rx_queues[queue_id];
+	ntb_txq_release(txq);
+	ntb_rxq_release(rxq);
+
 	return 0;
 }
 
@@ -234,6 +470,77 @@ ntb_queue_count(struct rte_rawdev *dev)
 	return hw->queue_pairs;
 }
 
+static int
+ntb_queue_init(struct rte_rawdev *dev, uint16_t qp_id)
+{
+	struct ntb_hw *hw = dev->dev_private;
+	struct ntb_rx_queue *rxq = hw->rx_queues[qp_id];
+	struct ntb_tx_queue *txq = hw->tx_queues[qp_id];
+	volatile struct ntb_header *local_hdr;
+	struct ntb_header *remote_hdr;
+	uint16_t q_size = hw->queue_size;
+	uint32_t hdr_offset;
+	void *bar_addr;
+	uint16_t i;
+
+	if (hw->ntb_ops->get_peer_mw_addr == NULL) {
+		NTB_LOG(ERR, "Failed to get mapped peer addr.");
+		return -EINVAL;
+	}
+
+	/* Put queue info into the start of shared memory. */
+	hdr_offset = hw->hdr_size_per_queue * qp_id;
+	local_hdr = (volatile struct ntb_header *)
+		    ((size_t)hw->mz[0]->addr + hdr_offset);
+	bar_addr = (*hw->ntb_ops->get_peer_mw_addr)(dev, 0);
+	if (bar_addr == NULL)
+		return -EINVAL;
+	remote_hdr = (struct ntb_header *)
+		     ((size_t)bar_addr + hdr_offset);
+
+	/* rxq init. */
+	rxq->rx_desc_ring = (struct ntb_desc *)
+			    (&remote_hdr->desc_ring);
+	rxq->rx_used_ring = (volatile struct ntb_used *)
+			    (&local_hdr->desc_ring[q_size]);
+	rxq->avail_cnt = &remote_hdr->avail_cnt;
+	rxq->used_cnt = &local_hdr->used_cnt;
+
+	for (i = 0; i < rxq->nb_rx_desc - 1; i++) {
+		struct rte_mbuf *mbuf = rte_mbuf_raw_alloc(rxq->mpool);
+		if (unlikely(!mbuf)) {
+			NTB_LOG(ERR, "Failed to allocate mbuf for RX");
+			return -ENOMEM;
+		}
+		mbuf->port = dev->dev_id;
+
+		rxq->sw_ring[i].mbuf = mbuf;
+
+		rxq->rx_desc_ring[i].addr = rte_pktmbuf_mtod(mbuf, size_t);
+		rxq->rx_desc_ring[i].len = mbuf->buf_len - RTE_PKTMBUF_HEADROOM;
+	}
+	rte_wmb();
+	*rxq->avail_cnt = rxq->nb_rx_desc - 1;
+	rxq->last_avail = rxq->nb_rx_desc - 1;
+	rxq->last_used = 0;
+
+	/* txq init */
+	txq->tx_desc_ring = (volatile struct ntb_desc *)
+			    (&local_hdr->desc_ring);
+	txq->tx_used_ring = (struct ntb_used *)
+			    (&remote_hdr->desc_ring[q_size]);
+	txq->avail_cnt = &local_hdr->avail_cnt;
+	txq->used_cnt = &remote_hdr->used_cnt;
+
+	rte_wmb();
+	*txq->used_cnt = 0;
+	txq->last_used = 0;
+	txq->last_avail = 0;
+	txq->nb_tx_free = txq->nb_tx_desc - 1;
+
+	return 0;
+}
+
 static int
 ntb_enqueue_bufs(struct rte_rawdev *dev,
 		 struct rte_rawdev_buf **buffers,
@@ -278,58 +585,51 @@ static void
 ntb_dev_info_get(struct rte_rawdev *dev, rte_rawdev_obj_t dev_info)
 {
 	struct ntb_hw *hw = dev->dev_private;
-	struct ntb_attr *ntb_attrs = dev_info;
+	struct ntb_dev_info *info = dev_info;
 
-	strncpy(ntb_attrs[NTB_TOPO_ID].name, NTB_TOPO_NAME, NTB_ATTR_NAME_LEN);
-	switch (hw->topo) {
-	case NTB_TOPO_B2B_DSD:
-		strncpy(ntb_attrs[NTB_TOPO_ID].value, "B2B DSD",
-			NTB_ATTR_VAL_LEN);
-		break;
-	case NTB_TOPO_B2B_USD:
-		strncpy(ntb_attrs[NTB_TOPO_ID].value, "B2B USD",
-			NTB_ATTR_VAL_LEN);
-		break;
-	default:
-		strncpy(ntb_attrs[NTB_TOPO_ID].value, "Unsupported",
-			NTB_ATTR_VAL_LEN);
-	}
+	info->mw_cnt = hw->mw_cnt;
+	info->mw_size = hw->mw_size;
 
-	strncpy(ntb_attrs[NTB_LINK_STATUS_ID].name, NTB_LINK_STATUS_NAME,
-		NTB_ATTR_NAME_LEN);
-	snprintf(ntb_attrs[NTB_LINK_STATUS_ID].value, NTB_ATTR_VAL_LEN,
-		 "%d", hw->link_status);
-
-	strncpy(ntb_attrs[NTB_SPEED_ID].name, NTB_SPEED_NAME,
-		NTB_ATTR_NAME_LEN);
-	snprintf(ntb_attrs[NTB_SPEED_ID].value, NTB_ATTR_VAL_LEN,
-		 "%d", hw->link_speed);
-
-	strncpy(ntb_attrs[NTB_WIDTH_ID].name, NTB_WIDTH_NAME,
-		NTB_ATTR_NAME_LEN);
-	snprintf(ntb_attrs[NTB_WIDTH_ID].value, NTB_ATTR_VAL_LEN,
-		 "%d", hw->link_width);
-
-	strncpy(ntb_attrs[NTB_MW_CNT_ID].name, NTB_MW_CNT_NAME,
-		NTB_ATTR_NAME_LEN);
-	snprintf(ntb_attrs[NTB_MW_CNT_ID].value, NTB_ATTR_VAL_LEN,
-		 "%d", hw->mw_cnt);
+	/**
+	 * Intel hardware requires the mapped memory base address to be
+	 * aligned to EMBARSZ, so a contiguous memzone is needed.
+	 */
+	info->mw_size_align = (uint8_t)(hw->pci_dev->id.vendor_id ==
+					NTB_INTEL_VENDOR_ID);
 
-	strncpy(ntb_attrs[NTB_DB_CNT_ID].name, NTB_DB_CNT_NAME,
-		NTB_ATTR_NAME_LEN);
-	snprintf(ntb_attrs[NTB_DB_CNT_ID].value, NTB_ATTR_VAL_LEN,
-		 "%d", hw->db_cnt);
+	if (!hw->queue_size || !hw->queue_pairs) {
+		NTB_LOG(ERR, "No queue size and queue num assigned.");
+		return;
+	}
 
-	strncpy(ntb_attrs[NTB_SPAD_CNT_ID].name, NTB_SPAD_CNT_NAME,
-		NTB_ATTR_NAME_LEN);
-	snprintf(ntb_attrs[NTB_SPAD_CNT_ID].value, NTB_ATTR_VAL_LEN,
-		 "%d", hw->spad_cnt);
+	hw->hdr_size_per_queue = RTE_ALIGN(sizeof(struct ntb_header) +
+				hw->queue_size * sizeof(struct ntb_desc) +
+				hw->queue_size * sizeof(struct ntb_used),
+				RTE_CACHE_LINE_SIZE);
+	info->ntb_hdr_size = hw->hdr_size_per_queue * hw->queue_pairs;
 }
 
 static int
-ntb_dev_configure(const struct rte_rawdev *dev __rte_unused,
-		  rte_rawdev_obj_t config __rte_unused)
+ntb_dev_configure(const struct rte_rawdev *dev, rte_rawdev_obj_t config)
 {
+	struct ntb_dev_config *conf = config;
+	struct ntb_hw *hw = dev->dev_private;
+	int ret;
+
+	hw->queue_pairs	= conf->num_queues;
+	hw->queue_size = conf->queue_size;
+	hw->used_mw_num = conf->mz_num;
+	hw->mz = conf->mz_list;
+	hw->rx_queues = rte_zmalloc("ntb_rx_queues",
+			sizeof(struct ntb_rx_queue *) * hw->queue_pairs, 0);
+	hw->tx_queues = rte_zmalloc("ntb_tx_queues",
+			sizeof(struct ntb_tx_queue *) * hw->queue_pairs, 0);
+
+	/* Start handshake with the peer. */
+	ret = ntb_handshake_work(dev);
+	if (ret < 0)
+		return ret;
+
 	return 0;
 }
 
@@ -337,21 +637,52 @@ static int
 ntb_dev_start(struct rte_rawdev *dev)
 {
 	struct ntb_hw *hw = dev->dev_private;
-	int ret, i;
+	uint32_t peer_base_l, peer_val;
+	uint64_t peer_base_h;
+	uint32_t i;
+	int ret;
 
-	/* TODO: init queues and start queues. */
+	if (!hw->link_status || !hw->peer_dev_up)
+		return -EINVAL;
 
-	/* Map memory of bar_size to remote. */
-	hw->mz = rte_zmalloc("struct rte_memzone *",
-			     hw->mw_cnt * sizeof(struct rte_memzone *), 0);
-	for (i = 0; i < hw->mw_cnt; i++) {
-		ret = ntb_set_mw(dev, i, hw->mw_size[i]);
+	for (i = 0; i < hw->queue_pairs; i++) {
+		ret = ntb_queue_init(dev, i);
 		if (ret) {
-			NTB_LOG(ERR, "Fail to set mw.");
+			NTB_LOG(ERR, "Failed to init queue.");
 			return ret;
 		}
 	}
 
+	hw->peer_mw_base = rte_zmalloc("ntb_peer_mw_base", hw->mw_cnt *
+					sizeof(uint64_t), 0);
+
+	if (hw->ntb_ops->spad_read == NULL)
+		return -ENOTSUP;
+
+	peer_val = (*hw->ntb_ops->spad_read)(dev, SPAD_Q_SZ, 0);
+	if (peer_val != hw->queue_size) {
+		NTB_LOG(ERR, "Inconsistent queue size! (local: %u peer: %u)",
+			hw->queue_size, peer_val);
+		return -EINVAL;
+	}
+
+	peer_val = (*hw->ntb_ops->spad_read)(dev, SPAD_NUM_QPS, 0);
+	if (peer_val != hw->queue_pairs) {
+		NTB_LOG(ERR, "Inconsistent number of queues! (local: %u peer:"
+			" %u)", hw->queue_pairs, peer_val);
+		return -EINVAL;
+	}
+
+	hw->peer_used_mws = (*hw->ntb_ops->spad_read)(dev, SPAD_USED_MWS, 0);
+
+	for (i = 0; i < hw->peer_used_mws; i++) {
+		peer_base_h = (*hw->ntb_ops->spad_read)(dev,
+				SPAD_MW0_BA_H + 2 * i, 0);
+		peer_base_l = (*hw->ntb_ops->spad_read)(dev,
+				SPAD_MW0_BA_L + 2 * i, 0);
+		hw->peer_mw_base[i] = (peer_base_h << 32) + peer_base_l;
+	}
+
 	dev->started = 1;
 
 	return 0;
@@ -361,10 +692,10 @@ static void
 ntb_dev_stop(struct rte_rawdev *dev)
 {
 	struct ntb_hw *hw = dev->dev_private;
+	struct ntb_rx_queue *rxq;
+	struct ntb_tx_queue *txq;
 	uint32_t time_out;
-	int status;
-
-	/* TODO: stop rx/tx queues. */
+	int status, i;
 
 	if (!hw->peer_dev_up)
 		goto clean;
@@ -405,6 +736,13 @@ ntb_dev_stop(struct rte_rawdev *dev)
 	if (status)
 		NTB_LOG(ERR, "Failed to clear doorbells.");
 
+	for (i = 0; i < hw->queue_pairs; i++) {
+		rxq = hw->rx_queues[i];
+		txq = hw->tx_queues[i];
+		ntb_rxq_release_mbufs(rxq);
+		ntb_txq_release_mbufs(txq);
+	}
+
 	dev->started = 0;
 }
 
@@ -413,12 +751,15 @@ ntb_dev_close(struct rte_rawdev *dev)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	struct rte_intr_handle *intr_handle;
-	int ret = 0;
+	int i;
 
 	if (dev->started)
 		ntb_dev_stop(dev);
 
-	/* TODO: free queues. */
+	/* free queues */
+	for (i = 0; i < hw->queue_pairs; i++)
+		ntb_queue_release(dev, i);
+	hw->queue_pairs = 0;
 
 	intr_handle = &hw->pci_dev->intr_handle;
 	/* Clean datapath event and vec mapping */
@@ -434,7 +775,7 @@ ntb_dev_close(struct rte_rawdev *dev)
 	rte_intr_callback_unregister(intr_handle,
 				     ntb_dev_intr_handler, dev);
 
-	return ret;
+	return 0;
 }
 
 static int
@@ -445,7 +786,7 @@ ntb_dev_reset(struct rte_rawdev *rawdev __rte_unused)
 
 static int
 ntb_attr_set(struct rte_rawdev *dev, const char *attr_name,
-				 uint64_t attr_value)
+	     uint64_t attr_value)
 {
 	struct ntb_hw *hw;
 	int index;
@@ -463,7 +804,21 @@ ntb_attr_set(struct rte_rawdev *dev, const char *attr_name,
 		index = atoi(&attr_name[NTB_SPAD_USER_LEN]);
 		(*hw->ntb_ops->spad_write)(dev, hw->spad_user_list[index],
 					   1, attr_value);
-		NTB_LOG(INFO, "Set attribute (%s) Value (%" PRIu64 ")",
+		NTB_LOG(DEBUG, "Set attribute (%s) Value (%" PRIu64 ")",
+			attr_name, attr_value);
+		return 0;
+	}
+
+	if (!strncmp(attr_name, NTB_QUEUE_SZ_NAME, NTB_ATTR_NAME_LEN)) {
+		hw->queue_size = attr_value;
+		NTB_LOG(DEBUG, "Set attribute (%s) Value (%" PRIu64 ")",
+			attr_name, attr_value);
+		return 0;
+	}
+
+	if (!strncmp(attr_name, NTB_QUEUE_NUM_NAME, NTB_ATTR_NAME_LEN)) {
+		hw->queue_pairs = attr_value;
+		NTB_LOG(DEBUG, "Set attribute (%s) Value (%" PRIu64 ")",
 			attr_name, attr_value);
 		return 0;
 	}
@@ -475,7 +830,7 @@ ntb_attr_set(struct rte_rawdev *dev, const char *attr_name,
 
 static int
 ntb_attr_get(struct rte_rawdev *dev, const char *attr_name,
-				 uint64_t *attr_value)
+	     uint64_t *attr_value)
 {
 	struct ntb_hw *hw;
 	int index;
@@ -489,49 +844,50 @@ ntb_attr_get(struct rte_rawdev *dev, const char *attr_name,
 
 	if (!strncmp(attr_name, NTB_TOPO_NAME, NTB_ATTR_NAME_LEN)) {
 		*attr_value = hw->topo;
-		NTB_LOG(INFO, "Attribute (%s) Value (%" PRIu64 ")",
+		NTB_LOG(DEBUG, "Attribute (%s) Value (%" PRIu64 ")",
 			attr_name, *attr_value);
 		return 0;
 	}
 
 	if (!strncmp(attr_name, NTB_LINK_STATUS_NAME, NTB_ATTR_NAME_LEN)) {
-		*attr_value = hw->link_status;
-		NTB_LOG(INFO, "Attribute (%s) Value (%" PRIu64 ")",
+		/* hw->link_status only indicates hw link status. */
+		*attr_value = hw->link_status && hw->peer_dev_up;
+		NTB_LOG(DEBUG, "Attribute (%s) Value (%" PRIu64 ")",
 			attr_name, *attr_value);
 		return 0;
 	}
 
 	if (!strncmp(attr_name, NTB_SPEED_NAME, NTB_ATTR_NAME_LEN)) {
 		*attr_value = hw->link_speed;
-		NTB_LOG(INFO, "Attribute (%s) Value (%" PRIu64 ")",
+		NTB_LOG(DEBUG, "Attribute (%s) Value (%" PRIu64 ")",
 			attr_name, *attr_value);
 		return 0;
 	}
 
 	if (!strncmp(attr_name, NTB_WIDTH_NAME, NTB_ATTR_NAME_LEN)) {
 		*attr_value = hw->link_width;
-		NTB_LOG(INFO, "Attribute (%s) Value (%" PRIu64 ")",
+		NTB_LOG(DEBUG, "Attribute (%s) Value (%" PRIu64 ")",
 			attr_name, *attr_value);
 		return 0;
 	}
 
 	if (!strncmp(attr_name, NTB_MW_CNT_NAME, NTB_ATTR_NAME_LEN)) {
 		*attr_value = hw->mw_cnt;
-		NTB_LOG(INFO, "Attribute (%s) Value (%" PRIu64 ")",
+		NTB_LOG(DEBUG, "Attribute (%s) Value (%" PRIu64 ")",
 			attr_name, *attr_value);
 		return 0;
 	}
 
 	if (!strncmp(attr_name, NTB_DB_CNT_NAME, NTB_ATTR_NAME_LEN)) {
 		*attr_value = hw->db_cnt;
-		NTB_LOG(INFO, "Attribute (%s) Value (%" PRIu64 ")",
+		NTB_LOG(DEBUG, "Attribute (%s) Value (%" PRIu64 ")",
 			attr_name, *attr_value);
 		return 0;
 	}
 
 	if (!strncmp(attr_name, NTB_SPAD_CNT_NAME, NTB_ATTR_NAME_LEN)) {
 		*attr_value = hw->spad_cnt;
-		NTB_LOG(INFO, "Attribute (%s) Value (%" PRIu64 ")",
+		NTB_LOG(DEBUG, "Attribute (%s) Value (%" PRIu64 ")",
 			attr_name, *attr_value);
 		return 0;
 	}
@@ -542,7 +898,7 @@ ntb_attr_get(struct rte_rawdev *dev, const char *attr_name,
 		index = atoi(&attr_name[NTB_SPAD_USER_LEN]);
 		*attr_value = (*hw->ntb_ops->spad_read)(dev,
 				hw->spad_user_list[index], 0);
-		NTB_LOG(INFO, "Attribute (%s) Value (%" PRIu64 ")",
+		NTB_LOG(DEBUG, "Attribute (%s) Value (%" PRIu64 ")",
 			attr_name, *attr_value);
 		return 0;
 	}
@@ -585,6 +941,7 @@ ntb_xstats_reset(struct rte_rawdev *dev __rte_unused,
 	return 0;
 }
 
+
 static const struct rte_rawdev_ops ntb_ops = {
 	.dev_info_get         = ntb_dev_info_get,
 	.dev_configure        = ntb_dev_configure,
@@ -615,7 +972,6 @@ ntb_init_hw(struct rte_rawdev *dev, struct rte_pci_device *pci_dev)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	struct rte_intr_handle *intr_handle;
-	uint32_t val;
 	int ret, i;
 
 	hw->pci_dev = pci_dev;
@@ -688,45 +1044,6 @@ ntb_init_hw(struct rte_rawdev *dev, struct rte_pci_device *pci_dev)
 	/* enable uio intr after callback register */
 	rte_intr_enable(intr_handle);
 
-	if (hw->ntb_ops->spad_write == NULL) {
-		NTB_LOG(ERR, "Scratchpad is not supported.");
-		return -ENOTSUP;
-	}
-	/* Tell peer the mw_cnt of local side. */
-	ret = (*hw->ntb_ops->spad_write)(dev, SPAD_NUM_MWS, 1, hw->mw_cnt);
-	if (ret) {
-		NTB_LOG(ERR, "Failed to tell peer mw count.");
-		return ret;
-	}
-
-	/* Tell peer each mw size on local side. */
-	for (i = 0; i < hw->mw_cnt; i++) {
-		NTB_LOG(DEBUG, "Local %u mw size: 0x%"PRIx64"", i,
-				hw->mw_size[i]);
-		val = hw->mw_size[i] >> 32;
-		ret = (*hw->ntb_ops->spad_write)
-				(dev, SPAD_MW0_SZ_H + 2 * i, 1, val);
-		if (ret) {
-			NTB_LOG(ERR, "Failed to tell peer mw size.");
-			return ret;
-		}
-
-		val = hw->mw_size[i];
-		ret = (*hw->ntb_ops->spad_write)
-				(dev, SPAD_MW0_SZ_L + 2 * i, 1, val);
-		if (ret) {
-			NTB_LOG(ERR, "Failed to tell peer mw size.");
-			return ret;
-		}
-	}
-
-	/* Ring doorbell 0 to tell peer the device is ready. */
-	ret = (*hw->ntb_ops->peer_db_set)(dev, 0);
-	if (ret) {
-		NTB_LOG(ERR, "Failed to tell peer device is probed.");
-		return ret;
-	}
-
 	return ret;
 }
 
@@ -839,5 +1156,5 @@ RTE_INIT(ntb_init_log)
 {
 	ntb_logtype = rte_log_register("pmd.raw.ntb");
 	if (ntb_logtype >= 0)
-		rte_log_set_level(ntb_logtype, RTE_LOG_DEBUG);
+		rte_log_set_level(ntb_logtype, RTE_LOG_INFO);
 }
diff --git a/drivers/raw/ntb/ntb.h b/drivers/raw/ntb/ntb.h
index d355231b0..0ad20aed3 100644
--- a/drivers/raw/ntb/ntb.h
+++ b/drivers/raw/ntb/ntb.h
@@ -2,8 +2,8 @@
  * Copyright(c) 2019 Intel Corporation.
  */
 
-#ifndef _NTB_RAWDEV_H_
-#define _NTB_RAWDEV_H_
+#ifndef _NTB_H_
+#define _NTB_H_
 
 #include <stdbool.h>
 
@@ -19,38 +19,13 @@ extern int ntb_logtype;
 /* Device IDs */
 #define NTB_INTEL_DEV_ID_B2B_SKX    0x201C
 
-#define NTB_TOPO_NAME               "topo"
-#define NTB_LINK_STATUS_NAME        "link_status"
-#define NTB_SPEED_NAME              "speed"
-#define NTB_WIDTH_NAME              "width"
-#define NTB_MW_CNT_NAME             "mw_count"
-#define NTB_DB_CNT_NAME             "db_count"
-#define NTB_SPAD_CNT_NAME           "spad_count"
 /* Reserved to app to use. */
 #define NTB_SPAD_USER               "spad_user_"
 #define NTB_SPAD_USER_LEN           (sizeof(NTB_SPAD_USER) - 1)
-#define NTB_SPAD_USER_MAX_NUM       10
+#define NTB_SPAD_USER_MAX_NUM       4
 #define NTB_ATTR_NAME_LEN           30
-#define NTB_ATTR_VAL_LEN            30
-#define NTB_ATTR_MAX                20
-
-/* NTB Attributes */
-struct ntb_attr {
-	/**< Name of the attribute */
-	char name[NTB_ATTR_NAME_LEN];
-	/**< Value or reference of value of attribute */
-	char value[NTB_ATTR_NAME_LEN];
-};
 
-enum ntb_attr_idx {
-	NTB_TOPO_ID = 0,
-	NTB_LINK_STATUS_ID,
-	NTB_SPEED_ID,
-	NTB_WIDTH_ID,
-	NTB_MW_CNT_ID,
-	NTB_DB_CNT_ID,
-	NTB_SPAD_CNT_ID,
-};
+#define NTB_DFLT_TX_FREE_THRESH     256
 
 enum ntb_topo {
 	NTB_TOPO_NONE = 0,
@@ -87,10 +62,15 @@ enum ntb_spad_idx {
 	SPAD_NUM_MWS = 1,
 	SPAD_NUM_QPS,
 	SPAD_Q_SZ,
+	SPAD_USED_MWS,
 	SPAD_MW0_SZ_H,
 	SPAD_MW0_SZ_L,
 	SPAD_MW1_SZ_H,
 	SPAD_MW1_SZ_L,
+	SPAD_MW0_BA_H,
+	SPAD_MW0_BA_L,
+	SPAD_MW1_BA_H,
+	SPAD_MW1_BA_L,
 };
 
 /**
@@ -110,26 +90,97 @@ enum ntb_spad_idx {
  * @vector_bind: Bind vector source [intr] to msix vector [msix].
  */
 struct ntb_dev_ops {
-	int (*ntb_dev_init)(struct rte_rawdev *dev);
-	void *(*get_peer_mw_addr)(struct rte_rawdev *dev, int mw_idx);
-	int (*mw_set_trans)(struct rte_rawdev *dev, int mw_idx,
+	int (*ntb_dev_init)(const struct rte_rawdev *dev);
+	void *(*get_peer_mw_addr)(const struct rte_rawdev *dev, int mw_idx);
+	int (*mw_set_trans)(const struct rte_rawdev *dev, int mw_idx,
 			    uint64_t addr, uint64_t size);
-	int (*get_link_status)(struct rte_rawdev *dev);
-	int (*set_link)(struct rte_rawdev *dev, bool up);
-	uint32_t (*spad_read)(struct rte_rawdev *dev, int spad, bool peer);
-	int (*spad_write)(struct rte_rawdev *dev, int spad,
+	int (*get_link_status)(const struct rte_rawdev *dev);
+	int (*set_link)(const struct rte_rawdev *dev, bool up);
+	uint32_t (*spad_read)(const struct rte_rawdev *dev, int spad,
+			      bool peer);
+	int (*spad_write)(const struct rte_rawdev *dev, int spad,
 			  bool peer, uint32_t spad_v);
-	uint64_t (*db_read)(struct rte_rawdev *dev);
-	int (*db_clear)(struct rte_rawdev *dev, uint64_t db_bits);
-	int (*db_set_mask)(struct rte_rawdev *dev, uint64_t db_mask);
-	int (*peer_db_set)(struct rte_rawdev *dev, uint8_t db_bit);
-	int (*vector_bind)(struct rte_rawdev *dev, uint8_t intr, uint8_t msix);
+	uint64_t (*db_read)(const struct rte_rawdev *dev);
+	int (*db_clear)(const struct rte_rawdev *dev, uint64_t db_bits);
+	int (*db_set_mask)(const struct rte_rawdev *dev, uint64_t db_mask);
+	int (*peer_db_set)(const struct rte_rawdev *dev, uint8_t db_bit);
+	int (*vector_bind)(const struct rte_rawdev *dev, uint8_t intr,
+			   uint8_t msix);
+};
+
+struct ntb_desc {
+	uint64_t addr; /* buffer addr */
+	uint16_t len;  /* buffer length */
+	uint16_t rsv1;
+	uint32_t rsv2;
+};
+
+struct ntb_used {
+	uint16_t len;     /* buffer length */
+#define NTB_FLAG_EOP    1 /* end of packet */
+	uint16_t flags;   /* flags */
+};
+
+struct ntb_rx_entry {
+	struct rte_mbuf *mbuf;
+};
+
+struct ntb_rx_queue {
+	struct ntb_desc *rx_desc_ring;
+	volatile struct ntb_used *rx_used_ring;
+	uint16_t *avail_cnt;
+	volatile uint16_t *used_cnt;
+	uint16_t last_avail;
+	uint16_t last_used;
+	uint16_t nb_rx_desc;
+
+	uint16_t rx_free_thresh;
+
+	struct rte_mempool *mpool; /**< mempool for mbuf allocation */
+	struct ntb_rx_entry *sw_ring;
+
+	uint16_t queue_id;         /**< DPDK queue index. */
+	uint16_t port_id;          /**< Device port identifier. */
+
+	struct ntb_hw *hw;
+};
+
+struct ntb_tx_entry {
+	struct rte_mbuf *mbuf;
+	uint16_t next_id;
+	uint16_t last_id;
+};
+
+struct ntb_tx_queue {
+	volatile struct ntb_desc *tx_desc_ring;
+	struct ntb_used *tx_used_ring;
+	volatile uint16_t *avail_cnt;
+	uint16_t *used_cnt;
+	uint16_t last_avail;          /**< Next one to be freed. */
+	uint16_t last_used;           /**< Next one to be sent. */
+	uint16_t nb_tx_desc;
+
+	/**< Total number of TX descriptors ready to be allocated. */
+	uint16_t nb_tx_free;
+	uint16_t tx_free_thresh;
+
+	struct ntb_tx_entry *sw_ring;
+
+	uint16_t queue_id;            /**< DPDK queue index. */
+	uint16_t port_id;             /**< Device port identifier. */
+
+	struct ntb_hw *hw;
+};
+
+struct ntb_header {
+	uint16_t avail_cnt __rte_cache_aligned;
+	uint16_t used_cnt __rte_cache_aligned;
+	struct ntb_desc desc_ring[] __rte_cache_aligned;
 };
 
 /* ntb private data. */
 struct ntb_hw {
 	uint8_t mw_cnt;
-	uint8_t peer_mw_cnt;
 	uint8_t db_cnt;
 	uint8_t spad_cnt;
 
@@ -147,18 +198,26 @@ struct ntb_hw {
 	struct rte_pci_device *pci_dev;
 	char *hw_addr;
 
-	uint64_t *mw_size;
-	uint64_t *peer_mw_size;
 	uint8_t peer_dev_up;
+	uint64_t *mw_size;
+	/* remote mem base addr */
+	uint64_t *peer_mw_base;
 
 	uint16_t queue_pairs;
 	uint16_t queue_size;
+	uint32_t hdr_size_per_queue;
+
+	struct ntb_rx_queue **rx_queues;
+	struct ntb_tx_queue **tx_queues;
 
-	/**< mem zone to populate RX ring. */
+	/* memzone to populate RX ring. */
 	const struct rte_memzone **mz;
+	uint8_t used_mw_num;
+
+	uint8_t peer_used_mws;
 
 	/* Reserve several spad for app to use. */
 	int spad_user_list[NTB_SPAD_USER_MAX_NUM];
 };
 
-#endif /* _NTB_RAWDEV_H_ */
+#endif /* _NTB_H_ */
diff --git a/drivers/raw/ntb/ntb_hw_intel.c b/drivers/raw/ntb/ntb_hw_intel.c
index 21eaa8511..0e73f1609 100644
--- a/drivers/raw/ntb/ntb_hw_intel.c
+++ b/drivers/raw/ntb/ntb_hw_intel.c
@@ -26,7 +26,7 @@ static enum xeon_ntb_bar intel_ntb_bar[] = {
 };
 
 static int
-intel_ntb_dev_init(struct rte_rawdev *dev)
+intel_ntb_dev_init(const struct rte_rawdev *dev)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	uint8_t reg_val, bar;
@@ -77,7 +77,7 @@ intel_ntb_dev_init(struct rte_rawdev *dev)
 	hw->db_cnt = XEON_DB_COUNT;
 	hw->spad_cnt = XEON_SPAD_COUNT;
 
-	hw->mw_size = rte_zmalloc("uint64_t",
+	hw->mw_size = rte_zmalloc("ntb_mw_size",
 				  hw->mw_cnt * sizeof(uint64_t), 0);
 	for (i = 0; i < hw->mw_cnt; i++) {
 		bar = intel_ntb_bar[i];
@@ -94,7 +94,7 @@ intel_ntb_dev_init(struct rte_rawdev *dev)
 }
 
 static void *
-intel_ntb_get_peer_mw_addr(struct rte_rawdev *dev, int mw_idx)
+intel_ntb_get_peer_mw_addr(const struct rte_rawdev *dev, int mw_idx)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	uint8_t bar;
@@ -116,7 +116,7 @@ intel_ntb_get_peer_mw_addr(struct rte_rawdev *dev, int mw_idx)
 }
 
 static int
-intel_ntb_mw_set_trans(struct rte_rawdev *dev, int mw_idx,
+intel_ntb_mw_set_trans(const struct rte_rawdev *dev, int mw_idx,
 		       uint64_t addr, uint64_t size)
 {
 	struct ntb_hw *hw = dev->dev_private;
@@ -163,7 +163,7 @@ intel_ntb_mw_set_trans(struct rte_rawdev *dev, int mw_idx,
 }
 
 static int
-intel_ntb_get_link_status(struct rte_rawdev *dev)
+intel_ntb_get_link_status(const struct rte_rawdev *dev)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	uint16_t reg_val;
@@ -195,7 +195,7 @@ intel_ntb_get_link_status(struct rte_rawdev *dev)
 }
 
 static int
-intel_ntb_set_link(struct rte_rawdev *dev, bool up)
+intel_ntb_set_link(const struct rte_rawdev *dev, bool up)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	uint32_t ntb_ctrl, reg_off;
@@ -221,7 +221,7 @@ intel_ntb_set_link(struct rte_rawdev *dev, bool up)
 }
 
 static uint32_t
-intel_ntb_spad_read(struct rte_rawdev *dev, int spad, bool peer)
+intel_ntb_spad_read(const struct rte_rawdev *dev, int spad, bool peer)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	uint32_t spad_v, reg_off;
@@ -241,7 +241,7 @@ intel_ntb_spad_read(struct rte_rawdev *dev, int spad, bool peer)
 }
 
 static int
-intel_ntb_spad_write(struct rte_rawdev *dev, int spad,
+intel_ntb_spad_write(const struct rte_rawdev *dev, int spad,
 		     bool peer, uint32_t spad_v)
 {
 	struct ntb_hw *hw = dev->dev_private;
@@ -263,7 +263,7 @@ intel_ntb_spad_write(struct rte_rawdev *dev, int spad,
 }
 
 static uint64_t
-intel_ntb_db_read(struct rte_rawdev *dev)
+intel_ntb_db_read(const struct rte_rawdev *dev)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	uint64_t db_off, db_bits;
@@ -278,7 +278,7 @@ intel_ntb_db_read(struct rte_rawdev *dev)
 }
 
 static int
-intel_ntb_db_clear(struct rte_rawdev *dev, uint64_t db_bits)
+intel_ntb_db_clear(const struct rte_rawdev *dev, uint64_t db_bits)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	uint64_t db_off;
@@ -293,7 +293,7 @@ intel_ntb_db_clear(struct rte_rawdev *dev, uint64_t db_bits)
 }
 
 static int
-intel_ntb_db_set_mask(struct rte_rawdev *dev, uint64_t db_mask)
+intel_ntb_db_set_mask(const struct rte_rawdev *dev, uint64_t db_mask)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	uint64_t db_m_off;
@@ -312,7 +312,7 @@ intel_ntb_db_set_mask(struct rte_rawdev *dev, uint64_t db_mask)
 }
 
 static int
-intel_ntb_peer_db_set(struct rte_rawdev *dev, uint8_t db_idx)
+intel_ntb_peer_db_set(const struct rte_rawdev *dev, uint8_t db_idx)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	uint32_t db_off;
@@ -332,7 +332,7 @@ intel_ntb_peer_db_set(struct rte_rawdev *dev, uint8_t db_idx)
 }
 
 static int
-intel_ntb_vector_bind(struct rte_rawdev *dev, uint8_t intr, uint8_t msix)
+intel_ntb_vector_bind(const struct rte_rawdev *dev, uint8_t intr, uint8_t msix)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	uint8_t reg_off;
diff --git a/drivers/raw/ntb/rte_pmd_ntb.h b/drivers/raw/ntb/rte_pmd_ntb.h
new file mode 100644
index 000000000..6591ce793
--- /dev/null
+++ b/drivers/raw/ntb/rte_pmd_ntb.h
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Intel Corporation.
+ */
+
+#ifndef _RTE_PMD_NTB_H_
+#define _RTE_PMD_NTB_H_
+
+/* App needs to set/get these attrs */
+#define NTB_QUEUE_SZ_NAME           "queue_size"
+#define NTB_QUEUE_NUM_NAME          "queue_num"
+#define NTB_TOPO_NAME               "topo"
+#define NTB_LINK_STATUS_NAME        "link_status"
+#define NTB_SPEED_NAME              "speed"
+#define NTB_WIDTH_NAME              "width"
+#define NTB_MW_CNT_NAME             "mw_count"
+#define NTB_DB_CNT_NAME             "db_count"
+#define NTB_SPAD_CNT_NAME           "spad_count"
+
+#define NTB_MAX_DESC_SIZE           1024
+#define NTB_MIN_DESC_SIZE           64
+
+struct ntb_dev_info {
+	uint32_t ntb_hdr_size;
+	/**< Whether the memzone must be aligned to the mw size. */
+	uint8_t mw_size_align;
+	uint8_t mw_cnt;
+	uint64_t *mw_size;
+};
+
+struct ntb_dev_config {
+	uint16_t num_queues;
+	uint16_t queue_size;
+	uint8_t mz_num;
+	const struct rte_memzone **mz_list;
+};
+
+struct ntb_queue_conf {
+	uint16_t nb_desc;
+	uint16_t tx_free_thresh;
+	struct rte_mempool *rx_mp;
+};
+
+#endif /* _RTE_PMD_NTB_H_ */
-- 
2.17.1
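
The per-queue header sizing done in ntb_dev_info_get() and the offset
used in ntb_queue_init() above can be sketched in isolation as follows.
This is only an illustration, not driver code: CACHE_LINE, HDR_FIXED,
align_up(), hdr_size_per_queue() and queue_hdr_offset() are hypothetical
names, a 64-byte cache line is assumed, and the fixed part of the header
is assumed to be two cache lines (one each for avail_cnt and used_cnt).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64              /* assumed RTE_CACHE_LINE_SIZE */
#define HDR_FIXED  (2 * CACHE_LINE) /* assumed: avail_cnt + used_cnt lines */

/* Same field layout as struct ntb_desc and struct ntb_used in the patch. */
struct desc { uint64_t addr; uint16_t len; uint16_t rsv1; uint32_t rsv2; };
struct used { uint16_t len; uint16_t flags; };

/* Round v up to a multiple of align, which must be a power of two. */
static inline size_t
align_up(size_t v, size_t align)
{
	assert((align & (align - 1)) == 0);
	return (v + align - 1) & ~(align - 1);
}

/* Bytes of shared memory one queue pair needs: header, desc ring, used ring. */
static inline size_t
hdr_size_per_queue(uint16_t queue_size)
{
	return align_up(HDR_FIXED +
			queue_size * sizeof(struct desc) +
			queue_size * sizeof(struct used),
			CACHE_LINE);
}

/* Byte offset of queue pair qp_id's header from the start of the memzone. */
static inline size_t
queue_hdr_offset(uint16_t queue_size, uint16_t qp_id)
{
	return hdr_size_per_queue(queue_size) * qp_id;
}
```

Under these assumptions the used ring of a queue starts right after its
descriptor ring, which matches the desc_ring[q_size] addressing in
ntb_queue_init().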



* [dpdk-dev] [PATCH v4 2/4] raw/ntb: add xstats support
  2019-09-09  3:27     ` [dpdk-dev] [PATCH v4 0/4] enable FIFO " Xiaoyun Li
  2019-09-09  3:27       ` [dpdk-dev] [PATCH v4 1/4] raw/ntb: setup ntb queue Xiaoyun Li
@ 2019-09-09  3:27       ` Xiaoyun Li
  2019-09-23  3:30         ` Wu, Jingjing
  2019-09-09  3:27       ` [dpdk-dev] [PATCH v4 3/4] raw/ntb: add enqueue and dequeue functions Xiaoyun Li
                         ` (2 subsequent siblings)
  4 siblings, 1 reply; 42+ messages in thread
From: Xiaoyun Li @ 2019-09-09  3:27 UTC (permalink / raw)
  To: jingjing.wu, keith.wiles, omkar.maslekar, cunming.liang; +Cc: dev, Xiaoyun Li

Add xstats support for the NTB rawdev driver.
The supported statistics are Tx-packets, Tx-bytes, Tx-errors,
Rx-packets, Rx-bytes and Rx-missed, as totals and per queue.

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
---
 drivers/raw/ntb/ntb.c | 133 ++++++++++++++++++++++++++++++++++++------
 drivers/raw/ntb/ntb.h |  11 ++++
 2 files changed, 126 insertions(+), 18 deletions(-)
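
For readers of the diff below, the flat xstats array it introduces can
be pictured with this minimal sketch: the first block of counters holds
the totals, and each queue pair owns one following block. The names
XSTATS_PER_QUEUE, xstats_queue_off() and xstats_update_totals() are
hypothetical and exist only for this illustration; the driver itself
uses NTB_XSTATS_NUM and open-coded offsets.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed count: Tx-packets, Tx-bytes, Tx-errors, Rx-packets, Rx-bytes, Rx-missed. */
enum { XSTATS_PER_QUEUE = 6 };

/* Flat index of counter `stat` for queue pair `qp`; block 0 holds the totals. */
static inline uint32_t
xstats_queue_off(uint16_t qp, uint32_t stat)
{
	assert(stat < XSTATS_PER_QUEUE);
	return XSTATS_PER_QUEUE * ((uint32_t)qp + 1) + stat;
}

/* Recompute the totals block from scratch by summing every queue's block. */
static void
xstats_update_totals(uint64_t *xstats, uint16_t queue_pairs)
{
	uint32_t i;
	uint16_t q;

	for (i = 0; i < XSTATS_PER_QUEUE; i++) {
		xstats[i] = 0; /* reset first so repeated calls do not accumulate */
		for (q = 0; q < queue_pairs; q++)
			xstats[i] += xstats[xstats_queue_off(q, i)];
	}
}
```

The array passed in is expected to hold (queue_pairs + 1) *
XSTATS_PER_QUEUE entries. Resetting each total before summing keeps
repeated reads from adding the per-queue values more than once.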

diff --git a/drivers/raw/ntb/ntb.c b/drivers/raw/ntb/ntb.c
index 728deccdf..3ddfa2afb 100644
--- a/drivers/raw/ntb/ntb.c
+++ b/drivers/raw/ntb/ntb.c
@@ -30,6 +30,17 @@ static const struct rte_pci_id pci_id_ntb_map[] = {
 	{ .vendor_id = 0, /* sentinel */ },
 };
 
+/* Align with enum ntb_xstats_idx */
+static struct rte_rawdev_xstats_name ntb_xstats_names[] = {
+	{"Tx-packets"},
+	{"Tx-bytes"},
+	{"Tx-errors"},
+	{"Rx-packets"},
+	{"Rx-bytes"},
+	{"Rx-missed"},
+};
+#define NTB_XSTATS_NUM RTE_DIM(ntb_xstats_names)
+
 static inline void
 ntb_link_cleanup(struct rte_rawdev *dev)
 {
@@ -538,6 +549,10 @@ ntb_queue_init(struct rte_rawdev *dev, uint16_t qp_id)
 	txq->last_avail = 0;
 	txq->nb_tx_free = txq->nb_tx_desc - 1;
 
+	/* Set per queue stats. */
+	for (i = 0; i < NTB_XSTATS_NUM; i++)
+		hw->ntb_xstats[i + NTB_XSTATS_NUM * (qp_id + 1)] = 0;
+
 	return 0;
 }
 
@@ -614,6 +629,7 @@ ntb_dev_configure(const struct rte_rawdev *dev, rte_rawdev_obj_t config)
 {
 	struct ntb_dev_config *conf = config;
 	struct ntb_hw *hw = dev->dev_private;
+	uint32_t xstats_num;
 	int ret;
 
 	hw->queue_pairs	= conf->num_queues;
@@ -624,6 +640,10 @@ ntb_dev_configure(const struct rte_rawdev *dev, rte_rawdev_obj_t config)
 			sizeof(struct ntb_rx_queue *) * hw->queue_pairs, 0);
 	hw->tx_queues = rte_zmalloc("ntb_tx_queues",
 			sizeof(struct ntb_tx_queue *) * hw->queue_pairs, 0);
+	/* First total stats, then per queue stats. */
+	xstats_num = (hw->queue_pairs + 1) * NTB_XSTATS_NUM;
+	hw->ntb_xstats = rte_zmalloc("ntb_xstats", xstats_num *
+				     sizeof(uint64_t), 0);
 
 	/* Start handshake with the peer. */
 	ret = ntb_handshake_work(dev);
@@ -645,6 +665,10 @@ ntb_dev_start(struct rte_rawdev *dev)
 	if (!hw->link_status || !hw->peer_dev_up)
 		return -EINVAL;
 
+	/* Set total stats. */
+	for (i = 0; i < NTB_XSTATS_NUM; i++)
+		hw->ntb_xstats[i] = 0;
+
 	for (i = 0; i < hw->queue_pairs; i++) {
 		ret = ntb_queue_init(dev, i);
 		if (ret) {
@@ -909,38 +933,111 @@ ntb_attr_get(struct rte_rawdev *dev, const char *attr_name,
 }
 
 static int
-ntb_xstats_get(const struct rte_rawdev *dev __rte_unused,
-	       const unsigned int ids[] __rte_unused,
-	       uint64_t values[] __rte_unused,
-	       unsigned int n __rte_unused)
+ntb_xstats_get(const struct rte_rawdev *dev,
+	       const unsigned int ids[],
+	       uint64_t values[],
+	       unsigned int n)
 {
-	return 0;
+	struct ntb_hw *hw = dev->dev_private;
+	uint32_t i, j, off, xstats_num;
+
+	/* Calculate total stats of all queues. */
+	for (i = 0; i < NTB_XSTATS_NUM; i++) {
+		hw->ntb_xstats[i] = 0;
+		for (j = 0; j < hw->queue_pairs; j++) {
+			off = NTB_XSTATS_NUM * (j + 1) + i;
+			hw->ntb_xstats[i] += hw->ntb_xstats[off];
+		}
+	}
+
+	xstats_num = NTB_XSTATS_NUM * (hw->queue_pairs + 1);
+	for (i = 0; i < n && ids[i] < xstats_num; i++)
+		values[i] = hw->ntb_xstats[ids[i]];
+
+	return i;
 }
 
 static int
-ntb_xstats_get_names(const struct rte_rawdev *dev __rte_unused,
-		     struct rte_rawdev_xstats_name *xstats_names __rte_unused,
-		     unsigned int size __rte_unused)
+ntb_xstats_get_names(const struct rte_rawdev *dev,
+		     struct rte_rawdev_xstats_name *xstats_names,
+		     unsigned int size)
 {
-	return 0;
+	struct ntb_hw *hw = dev->dev_private;
+	uint32_t xstats_num, i, j, off;
+
+	xstats_num = NTB_XSTATS_NUM * (hw->queue_pairs + 1);
+	if (xstats_names == NULL || size < xstats_num)
+		return xstats_num;
+
+	/* Total stats names */
+	memcpy(xstats_names, ntb_xstats_names, sizeof(ntb_xstats_names));
+
+	/* Queue stats names */
+	for (i = 0; i < hw->queue_pairs; i++) {
+		for (j = 0; j < NTB_XSTATS_NUM; j++) {
+			off = j + (i + 1) * NTB_XSTATS_NUM;
+			snprintf(xstats_names[off].name,
+				sizeof(xstats_names[0].name),
+				"%s_q%u", ntb_xstats_names[j].name, i);
+		}
+	}
+
+	return xstats_num;
 }
 
 static uint64_t
-ntb_xstats_get_by_name(const struct rte_rawdev *dev __rte_unused,
-		       const char *name __rte_unused,
-		       unsigned int *id __rte_unused)
+ntb_xstats_get_by_name(const struct rte_rawdev *dev,
+		       const char *name, unsigned int *id)
 {
-	return 0;
+	struct rte_rawdev_xstats_name *xstats_names;
+	struct ntb_hw *hw = dev->dev_private;
+	uint32_t xstats_num, i, j, off;
+
+	if (name == NULL)
+		return -EINVAL;
+
+	xstats_num = NTB_XSTATS_NUM * (hw->queue_pairs + 1);
+	xstats_names = rte_zmalloc("ntb_stats_name",
+				   sizeof(struct rte_rawdev_xstats_name) *
+				   xstats_num, 0);
+	ntb_xstats_get_names(dev, xstats_names, xstats_num);
+	/* Recalculate total stats of all queues from zero. */
+	for (i = 0; i < NTB_XSTATS_NUM; i++) {
+		hw->ntb_xstats[i] = 0;
+		for (j = 0; j < hw->queue_pairs; j++) {
+			off = NTB_XSTATS_NUM * (j + 1) + i;
+			hw->ntb_xstats[i] += hw->ntb_xstats[off];
+		}
+	}
+
+	for (i = 0; i < xstats_num; i++) {
+		if (!strncmp(name, xstats_names[i].name,
+		    RTE_RAW_DEV_XSTATS_NAME_SIZE)) {
+			*id = i;
+			rte_free(xstats_names);
+			return hw->ntb_xstats[i];
+		}
+	}
+	rte_free(xstats_names);
+	NTB_LOG(ERR, "Cannot find the xstats name.");
+
+	return -EINVAL;
 }
 
 static int
-ntb_xstats_reset(struct rte_rawdev *dev __rte_unused,
-		 const uint32_t ids[] __rte_unused,
-		 uint32_t nb_ids __rte_unused)
+ntb_xstats_reset(struct rte_rawdev *dev,
+		 const uint32_t ids[],
+		 uint32_t nb_ids)
 {
-	return 0;
-}
+	struct ntb_hw *hw = dev->dev_private;
+	uint32_t i, xstats_num;
 
+	xstats_num = NTB_XSTATS_NUM * (hw->queue_pairs + 1);
+	for (i = 0; i < nb_ids && ids[i] < xstats_num; i++)
+		hw->ntb_xstats[ids[i]] = 0;
+
+	return i;
+}
 
 static const struct rte_rawdev_ops ntb_ops = {
 	.dev_info_get         = ntb_dev_info_get,
diff --git a/drivers/raw/ntb/ntb.h b/drivers/raw/ntb/ntb.h
index 0ad20aed3..09e28050f 100644
--- a/drivers/raw/ntb/ntb.h
+++ b/drivers/raw/ntb/ntb.h
@@ -27,6 +27,15 @@ extern int ntb_logtype;
 
 #define NTB_DFLT_TX_FREE_THRESH     256
 
+enum ntb_xstats_idx {
+	NTB_TX_PKTS_ID = 0,
+	NTB_TX_BYTES_ID,
+	NTB_TX_ERRS_ID,
+	NTB_RX_PKTS_ID,
+	NTB_RX_BYTES_ID,
+	NTB_RX_MISS_ID,
+};
+
 enum ntb_topo {
 	NTB_TOPO_NONE = 0,
 	NTB_TOPO_B2B_USD,
@@ -216,6 +225,8 @@ struct ntb_hw {
 
 	uint8_t peer_used_mws;
 
+	uint64_t *ntb_xstats;
+
 	/* Reserve several spad for app to use. */
 	int spad_user_list[NTB_SPAD_USER_MAX_NUM];
 };
-- 
2.17.1
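
Note on the xstats layout used in this patch: the counters live in one
flat array, where group 0 holds the device totals and group q + 1 holds
the counters of queue q. The following is only a minimal standalone
sketch of that indexing, assuming six counters per group as in
enum ntb_xstats_idx. XSTATS_NUM, xstat_off() and sum_totals() are
hypothetical names used for illustration, not driver API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define XSTATS_NUM 6	/* assumed: six counters per group */

/* Offset of counter `id` of queue `q`; group 0 is reserved for totals. */
static uint32_t
xstat_off(uint32_t q, uint32_t id)
{
	return XSTATS_NUM * (q + 1) + id;
}

/* Rebuild the totals from zero so repeated calls do not double count. */
static void
sum_totals(uint64_t *stats, uint32_t nb_queues)
{
	uint32_t i, q;

	assert(stats != NULL);

	for (i = 0; i < XSTATS_NUM; i++) {
		stats[i] = 0;
		for (q = 0; q < nb_queues; q++)
			stats[i] += stats[xstat_off(q, i)];
	}
}
```

With two queues the array needs XSTATS_NUM * 3 entries, and calling
sum_totals() twice in a row leaves the totals unchanged.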



* [dpdk-dev] [PATCH v4 3/4] raw/ntb: add enqueue and dequeue functions
  2019-09-09  3:27     ` [dpdk-dev] [PATCH v4 0/4] enable FIFO " Xiaoyun Li
  2019-09-09  3:27       ` [dpdk-dev] [PATCH v4 1/4] raw/ntb: setup ntb queue Xiaoyun Li
  2019-09-09  3:27       ` [dpdk-dev] [PATCH v4 2/4] raw/ntb: add xstats support Xiaoyun Li
@ 2019-09-09  3:27       ` Xiaoyun Li
  2019-09-23  5:25         ` Wu, Jingjing
  2019-09-09  3:27       ` [dpdk-dev] [PATCH v4 4/4] examples/ntb: support more functions for NTB Xiaoyun Li
  2019-09-24  8:43       ` [dpdk-dev] [PATCH v5 0/4] enable FIFO " Xiaoyun Li
  4 siblings, 1 reply; 42+ messages in thread
From: Xiaoyun Li @ 2019-09-09  3:27 UTC (permalink / raw)
  To: jingjing.wu, keith.wiles, omkar.maslekar, cunming.liang; +Cc: dev, Xiaoyun Li

Introduce enqueue and dequeue functions to support packet based
processing. Also enable write-combining for the ntb driver, since
it improves performance significantly.

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
---
 doc/guides/rawdevs/ntb.rst     |  28 ++++
 drivers/raw/ntb/ntb.c          | 242 ++++++++++++++++++++++++++++++---
 drivers/raw/ntb/ntb.h          |   2 +
 drivers/raw/ntb/ntb_hw_intel.c |  22 +++
 4 files changed, 275 insertions(+), 19 deletions(-)
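
Note for readers: both enqueue and dequeue in this patch stage their
ring updates in a local array and then publish them with at most two
copies, splitting the copy where the ring wraps (the nb1/nb2 logic).
The sketch below shows only that wraparound copy in isolation, assuming
a power-of-two ring size. ring_copy_wrap() and the uint32_t entry type
are hypothetical and are not part of the driver.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy n entries from a linear staging array into a ring of ring_size
 * entries (a power of two), starting at index start and wrapping at
 * most once. Returns the ring index following the last written entry.
 */
static uint16_t
ring_copy_wrap(uint32_t *ring, uint16_t ring_size, uint16_t start,
	       const uint32_t *src, uint16_t n)
{
	uint16_t nb1, nb2;

	assert(ring_size != 0 && (ring_size & (ring_size - 1)) == 0);
	assert(start < ring_size && n <= ring_size);

	if (n > ring_size - start) {
		nb1 = ring_size - start;	/* entries up to the ring end */
		nb2 = n - nb1;			/* entries after the wrap */
	} else {
		nb1 = n;
		nb2 = 0;
	}
	memcpy(ring + start, src, sizeof(*ring) * nb1);
	memcpy(ring, src + nb1, sizeof(*ring) * nb2);

	return (start + n) & (ring_size - 1);
}
```

For example, writing three entries at index 6 of an eight-entry ring
fills slots 6, 7 and 0, and the next free index is 1.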

diff --git a/doc/guides/rawdevs/ntb.rst b/doc/guides/rawdevs/ntb.rst
index 99e7db441..afd5769fc 100644
--- a/doc/guides/rawdevs/ntb.rst
+++ b/doc/guides/rawdevs/ntb.rst
@@ -45,6 +45,24 @@ to use, i.e. igb_uio, vfio. The ``dpdk-devbind.py`` script can be used to
 show devices status and to bind them to a suitable kernel driver. They will
 appear under the category of "Misc (rawdev) devices".
 
+Prerequisites
+-------------
+The NTB PMD needs the kernel PCI driver to support write combining (WC)
+for better performance. The difference can be more than 10 times.
+There are two ways to enable WC.
+- If the igb_uio driver is used, insert it with the ``wc_activate=1`` flag.
+     insmod igb_uio.ko wc_activate=1
+- Enable WC manually for BAR 2 and BAR 4 (mapped memory) of the NTB device.
+     Get the BAR base addresses using ``lspci -vvv -s ae:00.0 | grep Region``.
+        Region 0: Memory at 39bfe0000000 (64-bit, prefetchable) [size=64K]
+        Region 2: Memory at 39bfa0000000 (64-bit, prefetchable) [size=512M]
+        Region 4: Memory at 39bfc0000000 (64-bit, prefetchable) [size=512M]
+     Use the following commands to enable WC.
+     echo "base=0x39bfa0000000 size=0x20000000 type=write-combining" >> /proc/mtrr
+     echo "base=0x39bfc0000000 size=0x20000000 type=write-combining" >> /proc/mtrr
+     To disable WC for these regions, use the following.
+     echo "disable=1" >> /proc/mtrr
+
 Ring Layout
 -----------
 
@@ -83,6 +101,16 @@ like the following:
       +------------------------+   +------------------------+
                     <---------traffic---------
 
+- Enqueue and Dequeue
+  Based on this ring layout, enqueue reads rx_tail to learn how many
+  buffers are free, then writes used_ring and tx_tail to tell the peer
+  which buffers are filled with data.
+  Dequeue reads tx_tail to learn how many packets have arrived, then
+  writes desc_ring and rx_tail to tell the peer about the newly
+  allocated buffers.
+  In this way only remote writes happen and remote reads are avoided,
+  which gives better performance.
+
 Limitation
 ----------
 
diff --git a/drivers/raw/ntb/ntb.c b/drivers/raw/ntb/ntb.c
index 3ddfa2afb..a34f3f9ee 100644
--- a/drivers/raw/ntb/ntb.c
+++ b/drivers/raw/ntb/ntb.c
@@ -556,26 +556,140 @@ ntb_queue_init(struct rte_rawdev *dev, uint16_t qp_id)
 	return 0;
 }
 
+static inline void
+ntb_enqueue_cleanup(struct ntb_tx_queue *txq)
+{
+	struct ntb_tx_entry *sw_ring = txq->sw_ring;
+	uint16_t tx_free = txq->last_avail;
+	uint16_t nb_to_clean, i;
+
+	/* avail_cnt + 1 represents where to rx next in the peer. */
+	nb_to_clean = (*txq->avail_cnt - txq->last_avail + 1 +
+			txq->nb_tx_desc) & (txq->nb_tx_desc - 1);
+	nb_to_clean = RTE_MIN(nb_to_clean, txq->tx_free_thresh);
+	for (i = 0; i < nb_to_clean; i++) {
+		if (sw_ring[tx_free].mbuf)
+			rte_pktmbuf_free_seg(sw_ring[tx_free].mbuf);
+		tx_free = (tx_free + 1) & (txq->nb_tx_desc - 1);
+	}
+
+	txq->nb_tx_free += nb_to_clean;
+	txq->last_avail = tx_free;
+}
+
 static int
 ntb_enqueue_bufs(struct rte_rawdev *dev,
 		 struct rte_rawdev_buf **buffers,
 		 unsigned int count,
 		 rte_rawdev_obj_t context)
 {
-	/* Not FIFO right now. Just for testing memory write. */
 	struct ntb_hw *hw = dev->dev_private;
-	unsigned int i;
-	void *bar_addr;
-	size_t size;
+	struct ntb_tx_queue *txq = hw->tx_queues[(size_t)context];
+	struct ntb_tx_entry *sw_ring = txq->sw_ring;
+	struct rte_mbuf *txm;
+	struct ntb_used tx_used[NTB_MAX_DESC_SIZE];
+	volatile struct ntb_desc *tx_item;
+	uint16_t tx_last, nb_segs, off, last_used, avail_cnt;
+	uint16_t nb_mbufs = 0;
+	uint16_t nb_tx = 0;
+	uint64_t bytes = 0;
+	void *buf_addr;
+	int i;
 
-	if (hw->ntb_ops->get_peer_mw_addr == NULL)
-		return -ENOTSUP;
-	bar_addr = (*hw->ntb_ops->get_peer_mw_addr)(dev, 0);
-	size = (size_t)context;
+	if (unlikely(hw->ntb_ops->ioremap == NULL)) {
+		NTB_LOG(ERR, "Ioremap not supported.");
+		return nb_tx;
+	}
 
-	for (i = 0; i < count; i++)
-		rte_memcpy(bar_addr, buffers[i]->buf_addr, size);
-	return 0;
+	if (unlikely(dev->started == 0 || hw->peer_dev_up == 0)) {
+		NTB_LOG(DEBUG, "Link is not up.");
+		return nb_tx;
+	}
+
+	if (txq->nb_tx_free < txq->tx_free_thresh)
+		ntb_enqueue_cleanup(txq);
+
+	off = NTB_XSTATS_NUM * ((size_t)context + 1);
+	last_used = txq->last_used;
+	avail_cnt = *txq->avail_cnt;/* Where to alloc next. */
+	for (nb_tx = 0; nb_tx < count; nb_tx++) {
+		txm = (struct rte_mbuf *)(buffers[nb_tx]->buf_addr);
+		if (txm == NULL || txq->nb_tx_free < txm->nb_segs)
+			break;
+
+		tx_last = (txq->last_used + txm->nb_segs - 1) &
+			  (txq->nb_tx_desc - 1);
+		nb_segs = txm->nb_segs;
+		for (i = 0; i < nb_segs; i++) {
+			/* Not enough ring space for tx. */
+			if (txq->last_used == avail_cnt)
+				goto end_of_tx;
+			sw_ring[txq->last_used].mbuf = txm;
+			tx_item = txq->tx_desc_ring + txq->last_used;
+
+			if (!tx_item->len) {
+				(hw->ntb_xstats[NTB_TX_ERRS_ID + off])++;
+				goto end_of_tx;
+			}
+			if (txm->data_len > tx_item->len) {
+				NTB_LOG(ERR, "Data length exceeds buf length."
+					" Only %u bytes will be transmitted.",
+					tx_item->len);
+				txm->data_len = tx_item->len;
+			}
+
+			/* translate remote virtual addr to bar virtual addr */
+			buf_addr = (*hw->ntb_ops->ioremap)(dev, tx_item->addr);
+			if (buf_addr == NULL) {
+				(hw->ntb_xstats[NTB_TX_ERRS_ID + off])++;
+				NTB_LOG(ERR, "Null remap addr.");
+				goto end_of_tx;
+			}
+			rte_memcpy(buf_addr, rte_pktmbuf_mtod(txm, void *),
+				   txm->data_len);
+
+			tx_used[nb_mbufs].len = txm->data_len;
+			tx_used[nb_mbufs++].flags = (txq->last_used ==
+						    tx_last) ?
+						    NTB_FLAG_EOP : 0;
+
+			/* update stats */
+			bytes += txm->data_len;
+
+			txm = txm->next;
+
+			sw_ring[txq->last_used].next_id = (txq->last_used + 1) &
+						  (txq->nb_tx_desc - 1);
+			sw_ring[txq->last_used].last_id = tx_last;
+			txq->last_used = (txq->last_used + 1) &
+					 (txq->nb_tx_desc - 1);
+		}
+		txq->nb_tx_free -= nb_segs;
+	}
+
+end_of_tx:
+	if (nb_tx) {
+		uint16_t nb1, nb2;
+		if (nb_mbufs > txq->nb_tx_desc - last_used) {
+			nb1 = txq->nb_tx_desc - last_used;
+			nb2 = nb_mbufs - txq->nb_tx_desc + last_used;
+		} else {
+			nb1 = nb_mbufs;
+			nb2 = 0;
+		}
+		rte_memcpy(txq->tx_used_ring + last_used, tx_used,
+			   sizeof(struct ntb_used) * nb1);
+		rte_memcpy(txq->tx_used_ring, tx_used + nb1,
+			   sizeof(struct ntb_used) * nb2);
+		*txq->used_cnt = txq->last_used;
+		rte_wmb();
+
+		/* update queue stats */
+		hw->ntb_xstats[NTB_TX_BYTES_ID + off] += bytes;
+		hw->ntb_xstats[NTB_TX_PKTS_ID + off] += nb_tx;
+	}
+
+	return nb_tx;
 }
 
 static int
@@ -584,16 +698,106 @@ ntb_dequeue_bufs(struct rte_rawdev *dev,
 		 unsigned int count,
 		 rte_rawdev_obj_t context)
 {
-	/* Not FIFO. Just for testing memory read. */
 	struct ntb_hw *hw = dev->dev_private;
-	unsigned int i;
-	size_t size;
+	struct ntb_rx_queue *rxq = hw->rx_queues[(size_t)context];
+	struct ntb_rx_entry *sw_ring = rxq->sw_ring;
+	struct ntb_desc rx_desc[NTB_MAX_DESC_SIZE];
+	struct rte_mbuf *first, *rxm_t;
+	struct rte_mbuf *prev = NULL;
+	volatile struct ntb_used *rx_item;
+	uint16_t nb_mbufs = 0;
+	uint16_t nb_rx = 0;
+	uint64_t bytes = 0;
+	uint16_t off, last_avail, used_cnt, used_nb;
+	int i;
 
-	size = (size_t)context;
+	if (unlikely(dev->started == 0 || hw->peer_dev_up == 0)) {
+		NTB_LOG(DEBUG, "Link is not up");
+		return nb_rx;
+	}
+
+	used_cnt = *rxq->used_cnt;
+
+	if (rxq->last_used == used_cnt)
+		return nb_rx;
+
+	last_avail = rxq->last_avail;
+	used_nb = (used_cnt - rxq->last_used) & (rxq->nb_rx_desc - 1);
+	count = RTE_MIN(count, used_nb);
+	for (nb_rx = 0; nb_rx < count; nb_rx++) {
+		i = 0;
+		while (true) {
+			rx_item = rxq->rx_used_ring + rxq->last_used;
+			rxm_t = sw_ring[rxq->last_used].mbuf;
+			rxm_t->data_len = rx_item->len;
+			rxm_t->data_off = RTE_PKTMBUF_HEADROOM;
+			rxm_t->port = rxq->port_id;
+
+			if (!i) {
+				rxm_t->nb_segs = 1;
+				first = rxm_t;
+				first->pkt_len = 0;
+				buffers[nb_rx]->buf_addr = rxm_t;
+			} else {
+				prev->next = rxm_t;
+				first->nb_segs++;
+			}
 
-	for (i = 0; i < count; i++)
-		rte_memcpy(buffers[i]->buf_addr, hw->mz[i]->addr, size);
-	return 0;
+			prev = rxm_t;
+			first->pkt_len += prev->data_len;
+			rxq->last_used = (rxq->last_used + 1) &
+					 (rxq->nb_rx_desc - 1);
+
+			/* alloc new mbuf */
+			rxm_t = rte_mbuf_raw_alloc(rxq->mpool);
+			if (unlikely(rxm_t == NULL)) {
+				NTB_LOG(ERR, "recv alloc mbuf failed.");
+				goto end_of_rx;
+			}
+			rxm_t->port = rxq->port_id;
+			sw_ring[rxq->last_avail].mbuf = rxm_t;
+			i++;
+
+			/* fill new desc */
+			rx_desc[nb_mbufs].addr =
+					rte_pktmbuf_mtod(rxm_t, size_t);
+			rx_desc[nb_mbufs++].len = rxm_t->buf_len -
+						  RTE_PKTMBUF_HEADROOM;
+			rxq->last_avail = (rxq->last_avail + 1) &
+					  (rxq->nb_rx_desc - 1);
+
+			if (rx_item->flags & NTB_FLAG_EOP)
+				break;
+		}
+		/* update stats */
+		bytes += first->pkt_len;
+	}
+
+end_of_rx:
+	if (nb_rx) {
+		uint16_t nb1, nb2;
+		if (nb_mbufs > rxq->nb_rx_desc - last_avail) {
+			nb1 = rxq->nb_rx_desc - last_avail;
+			nb2 = nb_mbufs - rxq->nb_rx_desc + last_avail;
+		} else {
+			nb1 = nb_mbufs;
+			nb2 = 0;
+		}
+		rte_memcpy(rxq->rx_desc_ring + last_avail, rx_desc,
+			   sizeof(struct ntb_desc) * nb1);
+		rte_memcpy(rxq->rx_desc_ring, rx_desc + nb1,
+			   sizeof(struct ntb_desc) * nb2);
+		*rxq->avail_cnt = rxq->last_avail;
+		rte_wmb();
+
+		/* update queue stats */
+		off = NTB_XSTATS_NUM * ((size_t)context + 1);
+		hw->ntb_xstats[NTB_RX_BYTES_ID + off] += bytes;
+		hw->ntb_xstats[NTB_RX_PKTS_ID + off] += nb_rx;
+		hw->ntb_xstats[NTB_RX_MISS_ID + off] += (count - nb_rx);
+	}
+
+	return nb_rx;
 }
 
 static void
@@ -1240,7 +1444,7 @@ ntb_remove(struct rte_pci_device *pci_dev)
 
 static struct rte_pci_driver rte_ntb_pmd = {
 	.id_table = pci_id_ntb_map,
-	.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_WC_ACTIVATE,
 	.probe = ntb_probe,
 	.remove = ntb_remove,
 };
diff --git a/drivers/raw/ntb/ntb.h b/drivers/raw/ntb/ntb.h
index 09e28050f..eff1f6f07 100644
--- a/drivers/raw/ntb/ntb.h
+++ b/drivers/raw/ntb/ntb.h
@@ -87,6 +87,7 @@ enum ntb_spad_idx {
  * @ntb_dev_init: Init ntb dev.
  * @get_peer_mw_addr: To get the addr of peer mw[mw_idx].
  * @mw_set_trans: Set translation of internal memory that remote can access.
+ * @ioremap: Translate the remote host address to bar address.
  * @get_link_status: get link status, link speed and link width.
  * @set_link: Set local side up/down.
  * @spad_read: Read local/peer spad register val.
@@ -103,6 +104,7 @@ struct ntb_dev_ops {
 	void *(*get_peer_mw_addr)(const struct rte_rawdev *dev, int mw_idx);
 	int (*mw_set_trans)(const struct rte_rawdev *dev, int mw_idx,
 			    uint64_t addr, uint64_t size);
+	void *(*ioremap)(const struct rte_rawdev *dev, uint64_t addr);
 	int (*get_link_status)(const struct rte_rawdev *dev);
 	int (*set_link)(const struct rte_rawdev *dev, bool up);
 	uint32_t (*spad_read)(const struct rte_rawdev *dev, int spad,
diff --git a/drivers/raw/ntb/ntb_hw_intel.c b/drivers/raw/ntb/ntb_hw_intel.c
index 0e73f1609..e7f8667cd 100644
--- a/drivers/raw/ntb/ntb_hw_intel.c
+++ b/drivers/raw/ntb/ntb_hw_intel.c
@@ -162,6 +162,27 @@ intel_ntb_mw_set_trans(const struct rte_rawdev *dev, int mw_idx,
 	return 0;
 }
 
+static void *
+intel_ntb_ioremap(const struct rte_rawdev *dev, uint64_t addr)
+{
+	struct ntb_hw *hw = dev->dev_private;
+	void *mapped = NULL;
+	void *base;
+	int i;
+
+	for (i = 0; i < hw->peer_used_mws; i++) {
+		if (addr >= hw->peer_mw_base[i] &&
+		    addr < hw->peer_mw_base[i] + hw->mw_size[i]) {
+			base = intel_ntb_get_peer_mw_addr(dev, i);
+			mapped = (void *)(size_t)(addr - hw->peer_mw_base[i] +
+				 (size_t)base);
+			break;
+		}
+	}
+
+	return mapped;
+}
+
 static int
 intel_ntb_get_link_status(const struct rte_rawdev *dev)
 {
@@ -357,6 +378,7 @@ const struct ntb_dev_ops intel_ntb_ops = {
 	.ntb_dev_init       = intel_ntb_dev_init,
 	.get_peer_mw_addr   = intel_ntb_get_peer_mw_addr,
 	.mw_set_trans       = intel_ntb_mw_set_trans,
+	.ioremap            = intel_ntb_ioremap,
 	.get_link_status    = intel_ntb_get_link_status,
 	.set_link           = intel_ntb_set_link,
 	.spad_read          = intel_ntb_spad_read,
-- 
2.17.1
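
Note on the ioremap callback added in this patch: it translates an
address in the peer's address space into a pointer inside the locally
mapped memory window. A minimal standalone sketch of that translation
follows, assuming each window is described by a peer base, a size and a
local mapping. struct mw_map and peer_to_local() are hypothetical names
for illustration. The sketch treats the window as a half-open range, so
an address equal to base + size is rejected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one memory window. */
struct mw_map {
	uint64_t peer_base;	/* window start in the peer's address space */
	uint64_t size;		/* window size in bytes */
	uint8_t *local_base;	/* where the window is mapped locally */
};

/* Translate a peer address to a local pointer; NULL if in no window. */
static void *
peer_to_local(const struct mw_map *mw, int nb_mw, uint64_t addr)
{
	int i;

	assert(mw != NULL);

	for (i = 0; i < nb_mw; i++) {
		/* Half-open range: base + size is one past the window. */
		if (addr >= mw[i].peer_base &&
		    addr - mw[i].peer_base < mw[i].size)
			return mw[i].local_base + (addr - mw[i].peer_base);
	}

	return NULL;
}
```

A caller is expected to check the result for NULL before copying, as
the enqueue path above does.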



* [dpdk-dev] [PATCH v4 4/4] examples/ntb: support more functions for NTB
  2019-09-09  3:27     ` [dpdk-dev] [PATCH v4 0/4] enable FIFO " Xiaoyun Li
                         ` (2 preceding siblings ...)
  2019-09-09  3:27       ` [dpdk-dev] [PATCH v4 3/4] raw/ntb: add enqueue and dequeue functions Xiaoyun Li
@ 2019-09-09  3:27       ` Xiaoyun Li
  2019-09-23  7:18         ` Wu, Jingjing
  2019-09-24  8:43       ` [dpdk-dev] [PATCH v5 0/4] enable FIFO " Xiaoyun Li
  4 siblings, 1 reply; 42+ messages in thread
From: Xiaoyun Li @ 2019-09-09  3:27 UTC (permalink / raw)
  To: jingjing.wu, keith.wiles, omkar.maslekar, cunming.liang; +Cc: dev, Xiaoyun Li

Support transmitting files between two systems.
Support iofwd between one ethdev and the NTB device.
Support rxonly and txonly for the NTB device.
Support setting the forwarding mode to file-trans, txonly,
rxonly or iofwd.
Support showing and clearing port stats and throughput.

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
---
 doc/guides/sample_app_ug/ntb.rst |   59 +-
 examples/ntb/meson.build         |    3 +
 examples/ntb/ntb_fwd.c           | 1298 +++++++++++++++++++++++++++---
 3 files changed, 1232 insertions(+), 128 deletions(-)
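
Note on the file-trans arithmetic in this patch: the sender announces
the 64-bit file size through two 32-bit scratchpad registers, and the
file is cut into payload-sized packets that are sent in bursts. The
sketch below reproduces only that arithmetic as a standalone example.
plan_send(), size_to_spads() and spads_to_size() are hypothetical
helpers, and placing the high half in the first register is an
assumption taken from how the receiver rebuilds the size.

```c
#include <assert.h>
#include <stdint.h>

struct send_plan {
	uint64_t nb_pkts;	/* packets needed for the whole file */
	uint64_t nb_bursts;	/* bursts needed for those packets */
};

/* Round-up division, as used for both packet and burst counts. */
static uint64_t
div_round_up(uint64_t a, uint64_t b)
{
	return (a + b - 1) / b;
}

static struct send_plan
plan_send(uint64_t file_size, uint16_t payload, uint16_t burst)
{
	struct send_plan p;

	assert(payload != 0 && burst != 0);

	p.nb_pkts = div_round_up(file_size, payload);
	p.nb_bursts = div_round_up(p.nb_pkts, burst);
	return p;
}

/* Split a 64-bit size into two 32-bit register values. */
static void
size_to_spads(uint64_t size, uint32_t *hi, uint32_t *lo)
{
	*hi = (uint32_t)(size >> 32);
	*lo = (uint32_t)size;
}

/* Rebuild the size on the receiving side. */
static uint64_t
spads_to_size(uint32_t hi, uint32_t lo)
{
	return ((uint64_t)hi << 32) | lo;
}
```

For instance, a 5000-byte file with a 2048-byte payload and a burst of
32 needs three packets in a single burst.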

diff --git a/doc/guides/sample_app_ug/ntb.rst b/doc/guides/sample_app_ug/ntb.rst
index 079242175..f8291d7d1 100644
--- a/doc/guides/sample_app_ug/ntb.rst
+++ b/doc/guides/sample_app_ug/ntb.rst
@@ -5,8 +5,17 @@ NTB Sample Application
 ======================
 
 The ntb sample application shows how to use ntb rawdev driver.
-This sample provides interactive mode to transmit file between
-two hosts.
+This sample provides an interactive mode for packet based processing
+between two systems.
+
+This sample supports four packet forwarding modes.
+
+* ``file-trans``: transmit files between two systems. The sample
+  polls for files from the peer and saves each one as
+  ``ntb_recv_file[N]``, where [N] is the index of the received file.
+* ``rxonly``: NTB receives packets but doesn't transmit them.
+* ``txonly``: NTB generates and transmits packets without receiving any.
+* ``iofwd``: forward packets between the NTB device and an ethdev.
 
 Compiling the Application
 -------------------------
@@ -29,6 +38,40 @@ Refer to the *DPDK Getting Started Guide* for general information on
 running applications and the Environment Abstraction Layer (EAL)
 options.
 
+Command-line Options
+--------------------
+
+The application supports the following command-line options.
+
+* ``--buf-size=N``
+
+  Set the data size of the mbufs used to N bytes, where N < 65536.
+  The default value is 2048.
+
+* ``--fwd-mode=mode``
+
+  Set the packet forwarding mode as ``file-trans``, ``txonly``,
+  ``rxonly`` or ``iofwd``.
+
+* ``--nb-desc=N``
+
+  Set the number of descriptors per queue (the queue size) to N,
+  where 64 <= N <= 1024. The default value is 1024.
+
+* ``--txfreet=N``
+
+  Set the transmit free threshold of TX rings to N, where 0 <= N <=
+  the value of ``--nb-desc``. The default value is 256.
+
+* ``--burst=N``
+
+  Set the number of packets per burst to N, where 1 <= N <= 32.
+  The default value is 32.
+
+* ``--qp=N``
+
+  Set the number of queues to N, where N > 0. The default value is 1.
+
 Using the application
 ---------------------
 
@@ -41,7 +84,11 @@ The application is console-driven using the cmdline DPDK interface:
 From this interface the available commands and descriptions of what
 they do as as follows:
 
-* ``send [filepath]``: Send file to the peer host.
-* ``receive [filepath]``: Receive file to [filepath]. Need the peer
-  to send file successfully first.
-* ``quit``: Exit program
+* ``send [filepath]``: Send file to the peer host. Requires the
+  file-trans forwarding mode.
+* ``start``: Start transmission.
+* ``stop``: Stop transmission.
+* ``show/clear port stats``: Show/Clear port stats and throughput.
+* ``set fwd file-trans/rxonly/txonly/iofwd``: Set packet forwarding
+  mode.
+* ``quit``: Exit program.
diff --git a/examples/ntb/meson.build b/examples/ntb/meson.build
index 9a6288f4f..f5435fe12 100644
--- a/examples/ntb/meson.build
+++ b/examples/ntb/meson.build
@@ -14,3 +14,6 @@ cflags += ['-D_FILE_OFFSET_BITS=64']
 sources = files(
 	'ntb_fwd.c'
 )
+if dpdk_conf.has('RTE_LIBRTE_PMD_NTB_RAWDEV')
+	deps += 'rawdev_ntb'
+endif
diff --git a/examples/ntb/ntb_fwd.c b/examples/ntb/ntb_fwd.c
index f8c970cdb..b1ea71c8f 100644
--- a/examples/ntb/ntb_fwd.c
+++ b/examples/ntb/ntb_fwd.c
@@ -14,21 +14,103 @@
 #include <cmdline.h>
 #include <rte_common.h>
 #include <rte_rawdev.h>
+#include <rte_ethdev.h>
+#include <rte_malloc.h>
 #include <rte_lcore.h>
+#include <rte_cycles.h>
+#include <rte_pmd_ntb.h>
 
-#define NTB_DRV_NAME_LEN	7
-static uint64_t max_file_size = 0x400000;
+/* Per-port statistics struct */
+struct ntb_port_statistics {
+	uint64_t tx;
+	uint64_t rx;
+} __rte_cache_aligned;
+/* Port 0: NTB dev, Port 1: ethdev when iofwd. */
+struct ntb_port_statistics ntb_port_stats[2];
+
+struct ntb_fwd_stream {
+	uint16_t tx_port;
+	uint16_t rx_port;
+	uint16_t qp_id;
+	uint8_t tx_ntb;  /* If ntb device is tx port. */
+};
+
+struct ntb_fwd_lcore_conf {
+	uint16_t stream_id;
+	uint16_t nb_stream;
+	uint8_t stopped;
+};
+
+enum ntb_fwd_mode {
+	FILE_TRANS = 0,
+	RXONLY,
+	TXONLY,
+	IOFWD,
+	MAX_FWD_MODE,
+};
+static const char *const fwd_mode_s[] = {
+	"file-trans",
+	"rxonly",
+	"txonly",
+	"iofwd",
+	NULL,
+};
+static enum ntb_fwd_mode fwd_mode = MAX_FWD_MODE;
+
+static struct ntb_fwd_lcore_conf fwd_lcore_conf[RTE_MAX_LCORE];
+static struct ntb_fwd_stream *fwd_streams;
+
+static struct rte_mempool *mbuf_pool;
+
+#define NTB_DRV_NAME_LEN 7
+#define MEMPOOL_CACHE_SIZE 256
+
+static uint8_t in_test;
 static uint8_t interactive = 1;
+static uint16_t eth_port_id = RTE_MAX_ETHPORTS;
 static uint16_t dev_id;
 
+/* Number of queues, default set as 1 */
+static uint16_t num_queues = 1;
+static uint16_t ntb_buf_size = RTE_MBUF_DEFAULT_BUF_SIZE;
+
+/* Configurable number of descriptors */
+#define NTB_DEFAULT_NUM_DESCS 1024
+static uint16_t nb_desc = NTB_DEFAULT_NUM_DESCS;
+
+static uint16_t tx_free_thresh;
+
+#define NTB_MAX_PKT_BURST 32
+#define NTB_DFLT_PKT_BURST 32
+static uint16_t pkt_burst = NTB_DFLT_PKT_BURST;
+
+#define BURST_TX_RETRIES 64
+
+static struct rte_eth_conf eth_port_conf = {
+	.rxmode = {
+		.mq_mode = ETH_MQ_RX_RSS,
+		.split_hdr_size = 0,
+	},
+	.rx_adv_conf = {
+		.rss_conf = {
+			.rss_key = NULL,
+			.rss_hf = ETH_RSS_IP,
+		},
+	},
+	.txmode = {
+		.mq_mode = ETH_MQ_TX_NONE,
+	},
+};
+
 /* *** Help command with introduction. *** */
 struct cmd_help_result {
 	cmdline_fixed_string_t help;
 };
 
-static void cmd_help_parsed(__attribute__((unused)) void *parsed_result,
-			    struct cmdline *cl,
-			    __attribute__((unused)) void *data)
+static void
+cmd_help_parsed(__attribute__((unused)) void *parsed_result,
+		struct cmdline *cl,
+		__attribute__((unused)) void *data)
 {
 	cmdline_printf(
 		cl,
@@ -37,13 +119,17 @@ static void cmd_help_parsed(__attribute__((unused)) void *parsed_result,
 		"Control:\n"
 		"    quit                                      :"
 		" Quit the application.\n"
-		"\nFile transmit:\n"
+		"\nTransmission:\n"
 		"    send [path]                               :"
-		" Send [path] file. (No more than %"PRIu64")\n"
-		"    recv [path]                            :"
-		" Receive file to [path]. Make sure sending is done"
-		" on the other side.\n",
-		max_file_size
+		" Send [path] file. Only takes effect in file-trans mode.\n"
+		"    start                                     :"
+		" Start transmissions.\n"
+		"    stop                                      :"
+		" Stop transmissions.\n"
+		"    clear/show port stats                     :"
+		" Clear/show port stats.\n"
+		"    set fwd file-trans/rxonly/txonly/iofwd    :"
+		" Set packet forwarding mode.\n"
 	);
 
 }
@@ -66,13 +152,37 @@ struct cmd_quit_result {
 	cmdline_fixed_string_t quit;
 };
 
-static void cmd_quit_parsed(__attribute__((unused)) void *parsed_result,
-			    struct cmdline *cl,
-			    __attribute__((unused)) void *data)
+static void
+cmd_quit_parsed(__attribute__((unused)) void *parsed_result,
+		struct cmdline *cl,
+		__attribute__((unused)) void *data)
 {
+	struct ntb_fwd_lcore_conf *conf;
+	unsigned int lcore_id;
+
+	/* Stop transmission first. */
+	RTE_LCORE_FOREACH_SLAVE(lcore_id) {
+		conf = &fwd_lcore_conf[lcore_id];
+
+		if (!conf->nb_stream)
+			continue;
+
+		if (conf->stopped)
+			continue;
+
+		conf->stopped = 1;
+	}
+	printf("\nWaiting for lcores to finish...\n");
+	rte_eal_mp_wait_lcore();
+	in_test = 0;
+
 	/* Stop traffic and Close port. */
 	rte_rawdev_stop(dev_id);
 	rte_rawdev_close(dev_id);
+	if (eth_port_id < RTE_MAX_ETHPORTS && fwd_mode == IOFWD) {
+		rte_eth_dev_stop(eth_port_id);
+		rte_eth_dev_close(eth_port_id);
+	}
 
 	cmdline_quit(cl);
 }
@@ -102,21 +212,19 @@ cmd_sendfile_parsed(void *parsed_result,
 		    __attribute__((unused)) void *data)
 {
 	struct cmd_sendfile_result *res = parsed_result;
-	struct rte_rawdev_buf *pkts_send[1];
-	uint64_t rsize, size, link;
-	uint8_t *buff;
+	struct rte_rawdev_buf *pkts_send[NTB_MAX_PKT_BURST];
+	struct rte_mbuf *mbuf_send[NTB_MAX_PKT_BURST];
+	uint64_t size, count, i, nb_burst;
+	uint16_t nb_tx, buf_size;
+	unsigned int nb_pkt;
+	size_t queue_id = 0;
+	uint16_t retry = 0;
 	uint32_t val;
 	FILE *file;
 
-	if (!rte_rawdevs[dev_id].started) {
-		printf("Device needs to be up first. Try later.\n");
-		return;
-	}
-
-	rte_rawdev_get_attr(dev_id, "link_status", &link);
-	if (!link) {
-		printf("Link is not up, cannot send file.\n");
-		return;
+	if (num_queues != 1) {
+		printf("File transmission only supports 1 queue.\n");
+		num_queues = 1;
 	}
 
 	file = fopen(res->filepath, "r");
@@ -127,30 +235,13 @@ cmd_sendfile_parsed(void *parsed_result,
 
 	if (fseek(file, 0, SEEK_END) < 0) {
 		printf("Fail to get file size.\n");
+		fclose(file);
 		return;
 	}
 	size = ftell(file);
 	if (fseek(file, 0, SEEK_SET) < 0) {
 		printf("Fail to get file size.\n");
-		return;
-	}
-
-	/**
-	 * No FIFO now. Only test memory. Limit sending file
-	 * size <= max_file_size.
-	 */
-	if (size > max_file_size) {
-		printf("Warning: The file is too large. Only send first"
-		       " %"PRIu64" bits.\n", max_file_size);
-		size = max_file_size;
-	}
-
-	buff = (uint8_t *)malloc(size);
-	rsize = fread(buff, size, 1, file);
-	if (rsize != 1) {
-		printf("Fail to read file.\n");
 		fclose(file);
-		free(buff);
 		return;
 	}
 
@@ -159,22 +250,63 @@ cmd_sendfile_parsed(void *parsed_result,
 	rte_rawdev_set_attr(dev_id, "spad_user_0", val);
 	val = size;
 	rte_rawdev_set_attr(dev_id, "spad_user_1", val);
+	printf("Sending file, size is %"PRIu64"\n", size);
+
+	for (i = 0; i < NTB_MAX_PKT_BURST; i++)
+		pkts_send[i] = (struct rte_rawdev_buf *)
+				malloc(sizeof(struct rte_rawdev_buf));
+
+	buf_size = ntb_buf_size - RTE_PKTMBUF_HEADROOM;
+	count = (size + buf_size - 1) / buf_size;
+	nb_burst = (count + pkt_burst - 1) / pkt_burst;
 
-	pkts_send[0] = (struct rte_rawdev_buf *)malloc
-			(sizeof(struct rte_rawdev_buf));
-	pkts_send[0]->buf_addr = buff;
+	for (i = 0; i < nb_burst; i++) {
+		val = RTE_MIN(count, pkt_burst);
+		if (rte_mempool_get_bulk(mbuf_pool, (void **)mbuf_send,
+					val) == 0) {
+			for (nb_pkt = 0; nb_pkt < val; nb_pkt++) {
+				mbuf_send[nb_pkt]->port = dev_id;
+				mbuf_send[nb_pkt]->data_len =
+				fread(rte_pktmbuf_mtod(mbuf_send[nb_pkt],
+					void *), 1, buf_size, file);
+				mbuf_send[nb_pkt]->pkt_len =
+					mbuf_send[nb_pkt]->data_len;
+				pkts_send[nb_pkt]->buf_addr = mbuf_send[nb_pkt];
+			}
+		} else {
+			for (nb_pkt = 0; nb_pkt < val; nb_pkt++) {
+				mbuf_send[nb_pkt] =
+					rte_mbuf_raw_alloc(mbuf_pool);
+				if (mbuf_send[nb_pkt] == NULL)
+					break;
+				mbuf_send[nb_pkt]->port = dev_id;
+				mbuf_send[nb_pkt]->data_len =
+				fread(rte_pktmbuf_mtod(mbuf_send[nb_pkt],
+					void *), 1, buf_size, file);
+				mbuf_send[nb_pkt]->pkt_len =
+					mbuf_send[nb_pkt]->data_len;
+				pkts_send[nb_pkt]->buf_addr = mbuf_send[nb_pkt];
+			}
+		}
 
-	if (rte_rawdev_enqueue_buffers(dev_id, pkts_send, 1,
-				       (void *)(size_t)size)) {
-		printf("Fail to enqueue.\n");
-		goto clean;
+		nb_tx = rte_rawdev_enqueue_buffers(dev_id, pkts_send, nb_pkt,
+						   (void *)queue_id);
+		while (nb_tx != nb_pkt && retry < BURST_TX_RETRIES) {
+			rte_delay_us(1);
+			nb_tx += rte_rawdev_enqueue_buffers(dev_id,
+				&pkts_send[nb_tx], nb_pkt - nb_tx,
+				(void *)queue_id);
+		}
+		count -= nb_pkt;
 	}
+	/* Clear register after file sending done. */
+	rte_rawdev_set_attr(dev_id, "spad_user_0", 0);
+	rte_rawdev_set_attr(dev_id, "spad_user_1", 0);
 	printf("Done sending file.\n");
 
-clean:
+	for (i = 0; i < NTB_MAX_PKT_BURST; i++)
+		free(pkts_send[i]);
 	fclose(file);
-	free(buff);
-	free(pkts_send[0]);
 }
 
 cmdline_parse_token_string_t cmd_send_file_send =
@@ -195,79 +327,680 @@ cmdline_parse_inst_t cmd_send_file = {
 	},
 };
 
-/* *** RECEIVE FILE PARAMETERS *** */
-struct cmd_recvfile_result {
-	cmdline_fixed_string_t recv_string;
-	char filepath[];
-};
+#define RECV_FILE_LEN 30
+static int
+start_polling_recv_file(void *param)
+{
+	struct rte_rawdev_buf *pkts_recv[NTB_MAX_PKT_BURST];
+	struct ntb_fwd_lcore_conf *conf = param;
+	struct rte_mbuf *mbuf;
+	char filepath[RECV_FILE_LEN];
+	uint64_t val, size, file_len;
+	uint16_t nb_rx, i, file_no;
+	size_t queue_id = 0;
+	FILE *file;
+
+	for (i = 0; i < NTB_MAX_PKT_BURST; i++)
+		pkts_recv[i] = (struct rte_rawdev_buf *)
+				malloc(sizeof(struct rte_rawdev_buf));
+
+	file_no = 0;
+	while (!conf->stopped) {
+		snprintf(filepath, RECV_FILE_LEN, "ntb_recv_file%d", file_no);
+		file = fopen(filepath, "w");
+		if (file == NULL) {
+			printf("Fail to open the file.\n");
+			return -EINVAL;
+		}
+
+		rte_rawdev_get_attr(dev_id, "spad_user_0", &val);
+		size = val << 32;
+		rte_rawdev_get_attr(dev_id, "spad_user_1", &val);
+		size |= val;
+
+		if (!size) {
+			fclose(file);
+			continue;
+		}
+
+		file_len = 0;
+		nb_rx = NTB_MAX_PKT_BURST;
+		while (file_len < size && !conf->stopped) {
+			nb_rx = rte_rawdev_dequeue_buffers(dev_id, pkts_recv,
+						pkt_burst, (void *)queue_id);
+			ntb_port_stats[0].rx += nb_rx;
+			for (i = 0; i < nb_rx; i++) {
+				mbuf = pkts_recv[i]->buf_addr;
+				fwrite(rte_pktmbuf_mtod(mbuf, void *), 1,
+					mbuf->data_len, file);
+				file_len += mbuf->data_len;
+				rte_pktmbuf_free(mbuf);
+				pkts_recv[i]->buf_addr = NULL;
+			}
+		}
+
+		printf("Received file (size: %" PRIu64 ") from peer to %s.\n",
+			size, filepath);
+		fclose(file);
+		file_no++;
+	}
+
+	for (i = 0; i < NTB_MAX_PKT_BURST; i++)
+		free(pkts_recv[i]);
+	return 0;
+}
+
+static int
+start_iofwd_per_lcore(void *param)
+{
+	struct rte_rawdev_buf *ntb_buf[NTB_MAX_PKT_BURST];
+	struct rte_mbuf *pkts_burst[NTB_MAX_PKT_BURST];
+	struct ntb_fwd_lcore_conf *conf = param;
+	struct ntb_fwd_stream fs;
+	uint16_t nb_rx, nb_tx;
+	int i, j;
+
+	for (i = 0; i < NTB_MAX_PKT_BURST; i++)
+		ntb_buf[i] = (struct rte_rawdev_buf *)
+			     malloc(sizeof(struct rte_rawdev_buf));
+
+	while (!conf->stopped) {
+		for (i = 0; i < conf->nb_stream; i++) {
+			fs = fwd_streams[conf->stream_id + i];
+			if (fs.tx_ntb) {
+				nb_rx = rte_eth_rx_burst(fs.rx_port,
+						fs.qp_id, pkts_burst,
+						pkt_burst);
+				if (unlikely(nb_rx == 0))
+					continue;
+				for (j = 0; j < nb_rx; j++)
+					ntb_buf[j]->buf_addr = pkts_burst[j];
+				nb_tx =
+				rte_rawdev_enqueue_buffers(fs.tx_port,
+						ntb_buf, nb_rx,
+						(void *)(size_t)fs.qp_id);
+				ntb_port_stats[0].tx += nb_tx;
+				ntb_port_stats[1].rx += nb_rx;
+			} else {
+				nb_rx =
+				rte_rawdev_dequeue_buffers(fs.rx_port,
+						ntb_buf, pkt_burst,
+						(void *)(size_t)fs.qp_id);
+				if (unlikely(nb_rx == 0))
+					continue;
+				for (j = 0; j < nb_rx; j++)
+					pkts_burst[j] = ntb_buf[j]->buf_addr;
+				nb_tx = rte_eth_tx_burst(fs.tx_port,
+					fs.qp_id, pkts_burst, nb_rx);
+				ntb_port_stats[1].tx += nb_tx;
+				ntb_port_stats[0].rx += nb_rx;
+			}
+			if (unlikely(nb_tx < nb_rx)) {
+				do {
+					rte_pktmbuf_free(pkts_burst[nb_tx]);
+				} while (++nb_tx < nb_rx);
+			}
+		}
+	}
+
+	for (i = 0; i < NTB_MAX_PKT_BURST; i++)
+		free(ntb_buf[i]);
+
+	return 0;
+}
+
+static int
+start_rxonly_per_lcore(void *param)
+{
+	struct rte_rawdev_buf *ntb_buf[NTB_MAX_PKT_BURST];
+	struct ntb_fwd_lcore_conf *conf = param;
+	struct ntb_fwd_stream fs;
+	uint16_t nb_rx;
+	int i, j;
+
+	for (i = 0; i < NTB_MAX_PKT_BURST; i++)
+		ntb_buf[i] = (struct rte_rawdev_buf *)
+			     malloc(sizeof(struct rte_rawdev_buf));
+
+	while (!conf->stopped) {
+		for (i = 0; i < conf->nb_stream; i++) {
+			fs = fwd_streams[conf->stream_id + i];
+			nb_rx = rte_rawdev_dequeue_buffers(fs.rx_port,
+				ntb_buf, pkt_burst, (void *)(size_t)fs.qp_id);
+			if (unlikely(nb_rx == 0))
+				continue;
+			ntb_port_stats[0].rx += nb_rx;
+
+			for (j = 0; j < nb_rx; j++)
+				rte_pktmbuf_free(ntb_buf[j]->buf_addr);
+		}
+	}
+
+	for (i = 0; i < NTB_MAX_PKT_BURST; i++)
+		free(ntb_buf[i]);
+
+	return 0;
+}
+
+
+static int
+start_txonly_per_lcore(void *param)
+{
+	struct rte_rawdev_buf *ntb_buf[NTB_MAX_PKT_BURST];
+	struct rte_mbuf *pkts_burst[NTB_MAX_PKT_BURST];
+	struct ntb_fwd_lcore_conf *conf = param;
+	struct ntb_fwd_stream fs;
+	uint16_t nb_pkt, nb_tx;
+	int i;
+
+	for (i = 0; i < NTB_MAX_PKT_BURST; i++)
+		ntb_buf[i] = (struct rte_rawdev_buf *)
+			     malloc(sizeof(struct rte_rawdev_buf));
+
+	while (!conf->stopped) {
+		for (i = 0; i < conf->nb_stream; i++) {
+			fs = fwd_streams[conf->stream_id + i];
+			if (rte_mempool_get_bulk(mbuf_pool, (void **)pkts_burst,
+				  pkt_burst) == 0) {
+				for (nb_pkt = 0; nb_pkt < pkt_burst; nb_pkt++) {
+					pkts_burst[nb_pkt]->port = dev_id;
+					pkts_burst[nb_pkt]->data_len =
+						pkts_burst[nb_pkt]->buf_len -
+						RTE_PKTMBUF_HEADROOM;
+					pkts_burst[nb_pkt]->pkt_len =
+						pkts_burst[nb_pkt]->data_len;
+					ntb_buf[nb_pkt]->buf_addr =
+						pkts_burst[nb_pkt];
+				}
+			} else {
+				for (nb_pkt = 0; nb_pkt < pkt_burst; nb_pkt++) {
+					pkts_burst[nb_pkt] =
+						rte_pktmbuf_alloc(mbuf_pool);
+					if (pkts_burst[nb_pkt] == NULL)
+						break;
+					pkts_burst[nb_pkt]->port = dev_id;
+					pkts_burst[nb_pkt]->data_len =
+						pkts_burst[nb_pkt]->buf_len -
+						RTE_PKTMBUF_HEADROOM;
+					pkts_burst[nb_pkt]->pkt_len =
+						pkts_burst[nb_pkt]->data_len;
+					ntb_buf[nb_pkt]->buf_addr =
+						pkts_burst[nb_pkt];
+				}
+			}
+			nb_tx = rte_rawdev_enqueue_buffers(fs.tx_port,
+				ntb_buf, nb_pkt, (void *)(size_t)fs.qp_id);
+			ntb_port_stats[0].tx += nb_tx;
+			if (unlikely(nb_tx < nb_pkt)) {
+				do {
+					rte_pktmbuf_free(
+						ntb_buf[nb_tx]->buf_addr);
+				} while (++nb_tx < nb_pkt);
+			}
+		}
+	}
+
+	for (i = 0; i < NTB_MAX_PKT_BURST; i++)
+		free(ntb_buf[i]);
+
+	return 0;
+}
+
+static int
+ntb_fwd_config_setup(void)
+{
+	uint16_t i;
+
+	/* Make sure iofwd has valid ethdev. */
+	if (fwd_mode == IOFWD && eth_port_id >= RTE_MAX_ETHPORTS) {
+		printf("No ethdev, cannot be in iofwd mode.\n");
+		return -EINVAL;
+	}
+
+	if (fwd_mode == IOFWD) {
+		fwd_streams = rte_zmalloc("ntb_fwd: fwd_streams",
+			sizeof(struct ntb_fwd_stream) * num_queues * 2,
+			RTE_CACHE_LINE_SIZE);
+		for (i = 0; i < num_queues; i++) {
+			fwd_streams[i * 2].qp_id = i;
+			fwd_streams[i * 2].tx_port = dev_id;
+			fwd_streams[i * 2].rx_port = eth_port_id;
+			fwd_streams[i * 2].tx_ntb = 1;
+
+			fwd_streams[i * 2 + 1].qp_id = i;
+			fwd_streams[i * 2 + 1].tx_port = eth_port_id;
+			fwd_streams[i * 2 + 1].rx_port = dev_id;
+			fwd_streams[i * 2 + 1].tx_ntb = 0;
+		}
+		return 0;
+	}
+
+	if (fwd_mode == RXONLY || fwd_mode == FILE_TRANS) {
+		/* file-trans supports only 1 queue to keep data in order. */
+		if (fwd_mode == FILE_TRANS)
+			num_queues = 1;
+
+		fwd_streams = rte_zmalloc("ntb_fwd: fwd_streams",
+				sizeof(struct ntb_fwd_stream) * num_queues,
+				RTE_CACHE_LINE_SIZE);
+		for (i = 0; i < num_queues; i++) {
+			fwd_streams[i].qp_id = i;
+			fwd_streams[i].tx_port = RTE_MAX_ETHPORTS;
+			fwd_streams[i].rx_port = dev_id;
+			fwd_streams[i].tx_ntb = 0;
+		}
+		return 0;
+	}
+
+	if (fwd_mode == TXONLY) {
+		fwd_streams = rte_zmalloc("ntb_fwd: fwd_streams",
+				sizeof(struct ntb_fwd_stream) * num_queues,
+				RTE_CACHE_LINE_SIZE);
+		for (i = 0; i < num_queues; i++) {
+			fwd_streams[i].qp_id = i;
+			fwd_streams[i].tx_port = dev_id;
+			fwd_streams[i].rx_port = RTE_MAX_ETHPORTS;
+			fwd_streams[i].tx_ntb = 1;
+		}
+	}
+	return 0;
+}
 
 static void
-cmd_recvfile_parsed(void *parsed_result,
-		    __attribute__((unused)) struct cmdline *cl,
-		    __attribute__((unused)) void *data)
+assign_stream_to_lcores(void)
 {
-	struct cmd_sendfile_result *res = parsed_result;
-	struct rte_rawdev_buf *pkts_recv[1];
-	uint8_t *buff;
-	uint64_t val;
-	size_t size;
-	FILE *file;
+	struct ntb_fwd_lcore_conf *conf;
+	struct ntb_fwd_stream *fs;
+	uint16_t nb_streams, sm_per_lcore, sm_id, i;
+	uint8_t lcore_id, lcore_num, nb_extra;
 
-	if (!rte_rawdevs[dev_id].started) {
-		printf("Device needs to be up first. Try later.\n");
-		return;
+	lcore_num = rte_lcore_count();
+	/* Exclude master core */
+	lcore_num--;
+
+	nb_streams = (fwd_mode == IOFWD) ? num_queues * 2 : num_queues;
+
+	sm_per_lcore = nb_streams / lcore_num;
+	nb_extra = nb_streams % lcore_num;
+	sm_id = 0;
+	i = 0;
+
+	RTE_LCORE_FOREACH_SLAVE(lcore_id) {
+		conf = &fwd_lcore_conf[lcore_id];
+
+		if (i < nb_extra) {
+			conf->nb_stream = sm_per_lcore + 1;
+			conf->stream_id = sm_id;
+			sm_id = sm_id + sm_per_lcore + 1;
+		} else {
+			conf->nb_stream = sm_per_lcore;
+			conf->stream_id = sm_id;
+			sm_id = sm_id + sm_per_lcore;
+		}
+
+		i++;
+		if (sm_id >= nb_streams)
+			break;
+	}
+
+	/* Print packet forwarding config. */
+	RTE_LCORE_FOREACH_SLAVE(lcore_id) {
+		conf = &fwd_lcore_conf[lcore_id];
+
+		if (!conf->nb_stream)
+			continue;
+
+		printf("Streams on Lcore %u :\n", lcore_id);
+		for (i = 0; i < conf->nb_stream; i++) {
+			fs = &fwd_streams[conf->stream_id + i];
+			if (fwd_mode == IOFWD)
+				printf(" + Stream %u : %s%u RX -> %s%u TX,"
+					" Q=%u\n", conf->stream_id + i,
+					fs->tx_ntb ? "Eth" : "NTB", fs->rx_port,
+					fs->tx_ntb ? "NTB" : "Eth", fs->tx_port,
+					fs->qp_id);
+			if (fwd_mode == FILE_TRANS || fwd_mode == RXONLY)
+				printf(" + Stream %u : %s%u RX only\n",
+					conf->stream_id + i, "NTB", fs->rx_port);
+			if (fwd_mode == TXONLY)
+				printf(" + Stream %u : %s%u TX only\n",
+					conf->stream_id + i, "NTB", fs->tx_port);
+		}
 	}
+}
 
-	rte_rawdev_get_attr(dev_id, "link_status", &val);
-	if (!val) {
-		printf("Link is not up, cannot receive file.\n");
+static void
+start_pkt_fwd(void)
+{
+	struct ntb_fwd_lcore_conf *conf;
+	struct rte_eth_link eth_link;
+	uint8_t lcore_id;
+	int ret, i;
+
+	ret = ntb_fwd_config_setup();
+	if (ret < 0) {
+		printf("Cannot start traffic. Please reset fwd mode.\n");
 		return;
 	}
 
-	file = fopen(res->filepath, "w");
-	if (file == NULL) {
-		printf("Fail to open the file.\n");
+	/* If using iofwd, check ethdev link status first. */
+	if (fwd_mode == IOFWD) {
+		printf("Checking eth link status...\n");
+		/* Wait for eth link up at most 100 times. */
+		for (i = 0; i < 100; i++) {
+			rte_eth_link_get(eth_port_id, &eth_link);
+			if (eth_link.link_status) {
+				printf("Eth%u Link Up. Speed %u Mbps - %s\n",
+					eth_port_id, eth_link.link_speed,
+					(eth_link.link_duplex ==
+					 ETH_LINK_FULL_DUPLEX) ?
+					("full-duplex") : ("half-duplex"));
+				break;
+			}
+		}
+		if (!eth_link.link_status) {
+			printf("Eth%u link down. Cannot start traffic.\n",
+				eth_port_id);
+			return;
+		}
+	}
+
+	assign_stream_to_lcores();
+	in_test = 1;
+
+	RTE_LCORE_FOREACH_SLAVE(lcore_id) {
+		conf = &fwd_lcore_conf[lcore_id];
+
+		if (!conf->nb_stream)
+			continue;
+
+		conf->stopped = 0;
+		if (fwd_mode == FILE_TRANS)
+			rte_eal_remote_launch(start_polling_recv_file,
+					      conf, lcore_id);
+		else if (fwd_mode == IOFWD)
+			rte_eal_remote_launch(start_iofwd_per_lcore,
+					      conf, lcore_id);
+		else if (fwd_mode == RXONLY)
+			rte_eal_remote_launch(start_rxonly_per_lcore,
+					      conf, lcore_id);
+		else if (fwd_mode == TXONLY)
+			rte_eal_remote_launch(start_txonly_per_lcore,
+					      conf, lcore_id);
+	}
+}
+
+/* *** START FWD PARAMETERS *** */
+struct cmd_start_result {
+	cmdline_fixed_string_t start;
+};
+
+static void
+cmd_start_parsed(__attribute__((unused)) void *parsed_result,
+			    __attribute__((unused)) struct cmdline *cl,
+			    __attribute__((unused)) void *data)
+{
+	start_pkt_fwd();
+}
+
+cmdline_parse_token_string_t cmd_start_start =
+		TOKEN_STRING_INITIALIZER(struct cmd_start_result, start, "start");
+
+cmdline_parse_inst_t cmd_start = {
+	.f = cmd_start_parsed,
+	.data = NULL,
+	.help_str = "start pkt fwd between ntb and ethdev",
+	.tokens = {
+		(void *)&cmd_start_start,
+		NULL,
+	},
+};
+
+/* *** STOP *** */
+struct cmd_stop_result {
+	cmdline_fixed_string_t stop;
+};
+
+static void
+cmd_stop_parsed(__attribute__((unused)) void *parsed_result,
+		__attribute__((unused)) struct cmdline *cl,
+		__attribute__((unused)) void *data)
+{
+	struct ntb_fwd_lcore_conf *conf;
+	uint8_t lcore_id;
+
+	RTE_LCORE_FOREACH_SLAVE(lcore_id) {
+		conf = &fwd_lcore_conf[lcore_id];
+
+		if (!conf->nb_stream)
+			continue;
+
+		if (conf->stopped)
+			continue;
+
+		conf->stopped = 1;
+	}
+	printf("\nWaiting for lcores to finish...\n");
+	rte_eal_mp_wait_lcore();
+	in_test = 0;
+	printf("\nDone.\n");
+}
+
+cmdline_parse_token_string_t cmd_stop_stop =
+		TOKEN_STRING_INITIALIZER(struct cmd_stop_result, stop, "stop");
+
+cmdline_parse_inst_t cmd_stop = {
+	.f = cmd_stop_parsed,
+	.data = NULL,
+	.help_str = "stop: Stop packet forwarding",
+	.tokens = {
+		(void *)&cmd_stop_stop,
+		NULL,
+	},
+};
+
+static void
+ntb_stats_clear(void)
+{
+	int nb_ids, i;
+	uint32_t *ids;
+
+	/* Clear NTB dev stats */
+	nb_ids = rte_rawdev_xstats_names_get(dev_id, NULL, 0);
+	if (nb_ids  < 0) {
+		printf("Error: Cannot get count of xstats\n");
 		return;
 	}
+	ids = malloc(sizeof(uint32_t) * nb_ids);
+	for (i = 0; i < nb_ids; i++)
+		ids[i] = i;
+	rte_rawdev_xstats_reset(dev_id, ids, nb_ids);
+	printf("\n  statistics for NTB port %d cleared\n", dev_id);
+
+	/* Clear ethdev stats, if any */
+	if (fwd_mode == IOFWD && eth_port_id != RTE_MAX_ETHPORTS) {
+		rte_eth_stats_reset(eth_port_id);
+		printf("\n  statistics for ETH port %d cleared\n", eth_port_id);
+	}
+}
+
+static inline void
+ntb_calculate_throughput(uint16_t port) {
+	uint64_t diff_pkts_rx, diff_pkts_tx, diff_cycles;
+	uint64_t mpps_rx, mpps_tx;
+	static uint64_t prev_pkts_rx[2];
+	static uint64_t prev_pkts_tx[2];
+	static uint64_t prev_cycles[2];
+
+	diff_cycles = prev_cycles[port];
+	prev_cycles[port] = rte_rdtsc();
+	if (diff_cycles > 0)
+		diff_cycles = prev_cycles[port] - diff_cycles;
+	diff_pkts_rx = (ntb_port_stats[port].rx > prev_pkts_rx[port]) ?
+		(ntb_port_stats[port].rx - prev_pkts_rx[port]) : 0;
+	diff_pkts_tx = (ntb_port_stats[port].tx > prev_pkts_tx[port]) ?
+		(ntb_port_stats[port].tx - prev_pkts_tx[port]) : 0;
+	prev_pkts_rx[port] = ntb_port_stats[port].rx;
+	prev_pkts_tx[port] = ntb_port_stats[port].tx;
+	mpps_rx = diff_cycles > 0 ?
+		diff_pkts_rx * rte_get_tsc_hz() / diff_cycles : 0;
+	mpps_tx = diff_cycles > 0 ?
+		diff_pkts_tx * rte_get_tsc_hz() / diff_cycles : 0;
+	printf("  Throughput (since last show)\n");
+	printf("  Rx-pps: %12"PRIu64"\n  Tx-pps: %12"PRIu64"\n",
+			mpps_rx, mpps_tx);
+
+}
+
+static void
+ntb_stats_display(void)
+{
+	struct rte_rawdev_xstats_name *xstats_names;
+	struct rte_eth_stats stats;
+	uint64_t *values;
+	uint32_t *ids;
+	int nb_ids, i;
 
-	rte_rawdev_get_attr(dev_id, "spad_user_0", &val);
-	size = val << 32;
-	rte_rawdev_get_attr(dev_id, "spad_user_1", &val);
-	size |= val;
+	printf("###### statistics for NTB port %d #######\n", dev_id);
 
-	buff = (uint8_t *)malloc(size);
-	pkts_recv[0] = (struct rte_rawdev_buf *)malloc
-			(sizeof(struct rte_rawdev_buf));
-	pkts_recv[0]->buf_addr = buff;
+	/* Get NTB dev stats and stats names */
+	nb_ids = rte_rawdev_xstats_names_get(dev_id, NULL, 0);
+	if (nb_ids  < 0) {
+		printf("Error: Cannot get count of xstats\n");
+		return;
+	}
+	xstats_names = malloc(sizeof(struct rte_rawdev_xstats_name) * nb_ids);
+	if (xstats_names == NULL) {
+		printf("Cannot allocate memory for xstats lookup\n");
+		return;
+	}
+	if (nb_ids != rte_rawdev_xstats_names_get(
+			dev_id, xstats_names, nb_ids)) {
+		printf("Error: Cannot get xstats lookup\n");
+		free(xstats_names);
+		return;
+	}
+	ids = malloc(sizeof(uint32_t) * nb_ids);
+	for (i = 0; i < nb_ids; i++)
+		ids[i] = i;
+	values = malloc(sizeof(uint64_t) * nb_ids);
+	if (nb_ids != rte_rawdev_xstats_get(dev_id, ids, values, nb_ids)) {
+		printf("Error: Unable to get xstats\n");
+		free(xstats_names);
+		free(values);
+		free(ids);
+		return;
+	}
+
+	/* Display NTB dev stats */
+	for (i = 0; i < nb_ids; i++)
+		printf("  %s: %"PRIu64"\n", xstats_names[i].name, values[i]);
+	ntb_calculate_throughput(0);
 
-	if (rte_rawdev_dequeue_buffers(dev_id, pkts_recv, 1, (void *)size)) {
-		printf("Fail to dequeue.\n");
-		goto clean;
+	/* Get ethdev stats, if any */
+	if (fwd_mode == IOFWD && eth_port_id != RTE_MAX_ETHPORTS) {
+		printf("###### statistics for ETH port %d ######\n",
+			eth_port_id);
+		rte_eth_stats_get(eth_port_id, &stats);
+		printf("  RX-packets: %"PRIu64"\n", stats.ipackets);
+		printf("  RX-bytes: %"PRIu64"\n", stats.ibytes);
+		printf("  RX-errors: %"PRIu64"\n", stats.ierrors);
+		printf("  RX-missed: %"PRIu64"\n", stats.imissed);
+		printf("  TX-packets: %"PRIu64"\n", stats.opackets);
+		printf("  TX-bytes: %"PRIu64"\n", stats.obytes);
+		printf("  TX-errors: %"PRIu64"\n", stats.oerrors);
+		ntb_calculate_throughput(1);
 	}
 
-	fwrite(buff, size, 1, file);
-	printf("Done receiving to file.\n");
+	free(xstats_names);
+	free(values);
+	free(ids);
+}
 
-clean:
-	fclose(file);
-	free(buff);
-	free(pkts_recv[0]);
+/* *** SHOW/CLEAR PORT STATS *** */
+struct cmd_stats_result {
+	cmdline_fixed_string_t show;
+	cmdline_fixed_string_t port;
+	cmdline_fixed_string_t stats;
+};
+
+static void
+cmd_stats_parsed(void *parsed_result,
+		 __attribute__((unused)) struct cmdline *cl,
+		 __attribute__((unused)) void *data)
+{
+	struct cmd_stats_result *res = parsed_result;
+	if (!strcmp(res->show, "clear"))
+		ntb_stats_clear();
+	else
+		ntb_stats_display();
 }
 
-cmdline_parse_token_string_t cmd_recv_file_recv =
-	TOKEN_STRING_INITIALIZER(struct cmd_recvfile_result, recv_string,
-				 "recv");
-cmdline_parse_token_string_t cmd_recv_file_filepath =
-	TOKEN_STRING_INITIALIZER(struct cmd_recvfile_result, filepath, NULL);
+cmdline_parse_token_string_t cmd_stats_show =
+	TOKEN_STRING_INITIALIZER(struct cmd_stats_result, show, "show#clear");
+cmdline_parse_token_string_t cmd_stats_port =
+	TOKEN_STRING_INITIALIZER(struct cmd_stats_result, port, "port");
+cmdline_parse_token_string_t cmd_stats_stats =
+	TOKEN_STRING_INITIALIZER(struct cmd_stats_result, stats, "stats");
 
 
-cmdline_parse_inst_t cmd_recv_file = {
-	.f = cmd_recvfile_parsed,
+cmdline_parse_inst_t cmd_stats = {
+	.f = cmd_stats_parsed,
 	.data = NULL,
-	.help_str = "recv <file_path>",
+	.help_str = "show|clear port stats",
 	.tokens = {
-		(void *)&cmd_recv_file_recv,
-		(void *)&cmd_recv_file_filepath,
+		(void *)&cmd_stats_show,
+		(void *)&cmd_stats_port,
+		(void *)&cmd_stats_stats,
+		NULL,
+	},
+};
+
+/* *** SET FORWARDING MODE *** */
+struct cmd_set_fwd_mode_result {
+	cmdline_fixed_string_t set;
+	cmdline_fixed_string_t fwd;
+	cmdline_fixed_string_t mode;
+};
+
+static void
+cmd_set_fwd_mode_parsed(void *parsed_result,
+			__attribute__((unused)) struct cmdline *cl,
+			__attribute__((unused)) void *data)
+{
+	struct cmd_set_fwd_mode_result *res = parsed_result;
+	int i;
+
+	if (in_test) {
+		printf("Please stop traffic first.\n");
+		return;
+	}
+
+	for (i = 0; i < MAX_FWD_MODE; i++) {
+		if (!strcmp(res->mode, fwd_mode_s[i])) {
+			fwd_mode = i;
+			return;
+		}
+	}
+	printf("Invalid %s packet forwarding mode.\n", res->mode);
+}
+
+cmdline_parse_token_string_t cmd_setfwd_set =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_fwd_mode_result, set, "set");
+cmdline_parse_token_string_t cmd_setfwd_fwd =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_fwd_mode_result, fwd, "fwd");
+cmdline_parse_token_string_t cmd_setfwd_mode =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_fwd_mode_result, mode,
+				"file-trans#iofwd#txonly#rxonly");
+
+cmdline_parse_inst_t cmd_set_fwd_mode = {
+	.f = cmd_set_fwd_mode_parsed,
+	.data = NULL,
+	.help_str = "set forwarding mode as file-trans|rxonly|txonly|iofwd",
+	.tokens = {
+		(void *)&cmd_setfwd_set,
+		(void *)&cmd_setfwd_fwd,
+		(void *)&cmd_setfwd_mode,
 		NULL,
 	},
 };
@@ -276,7 +1009,10 @@ cmdline_parse_inst_t cmd_recv_file = {
 cmdline_parse_ctx_t main_ctx[] = {
 	(cmdline_parse_inst_t *)&cmd_help,
 	(cmdline_parse_inst_t *)&cmd_send_file,
-	(cmdline_parse_inst_t *)&cmd_recv_file,
+	(cmdline_parse_inst_t *)&cmd_start,
+	(cmdline_parse_inst_t *)&cmd_stop,
+	(cmdline_parse_inst_t *)&cmd_stats,
+	(cmdline_parse_inst_t *)&cmd_set_fwd_mode,
 	(cmdline_parse_inst_t *)&cmd_quit,
 	NULL,
 };
@@ -305,45 +1041,257 @@ signal_handler(int signum)
 	}
 }
 
+#define OPT_BUF_SIZE         "buf-size"
+#define OPT_FWD_MODE         "fwd-mode"
+#define OPT_NB_DESC          "nb-desc"
+#define OPT_TXFREET          "txfreet"
+#define OPT_BURST            "burst"
+#define OPT_QP               "qp"
+
+enum {
+	/* long options mapped to a short option */
+	OPT_NO_ZERO_COPY_NUM = 1,
+	OPT_BUF_SIZE_NUM,
+	OPT_FWD_MODE_NUM,
+	OPT_NB_DESC_NUM,
+	OPT_TXFREET_NUM,
+	OPT_BURST_NUM,
+	OPT_QP_NUM,
+};
+
+static const char short_options[] =
+	"i" /* interactive mode */
+	;
+
+static const struct option lgopts[] = {
+	{OPT_BUF_SIZE,     1, NULL, OPT_BUF_SIZE_NUM     },
+	{OPT_FWD_MODE,     1, NULL, OPT_FWD_MODE_NUM     },
+	{OPT_NB_DESC,      1, NULL, OPT_NB_DESC_NUM      },
+	{OPT_TXFREET,      1, NULL, OPT_TXFREET_NUM      },
+	{OPT_BURST,        1, NULL, OPT_BURST_NUM        },
+	{OPT_QP,           1, NULL, OPT_QP_NUM           },
+	{0,                0, NULL, 0                    }
+};
+
 static void
 ntb_usage(const char *prgname)
 {
 	printf("%s [EAL options] -- [options]\n"
-	       "-i : run in interactive mode (default value is 1)\n",
-	       prgname);
+	       "-i: run in interactive mode.\n"
+	       "--qp=N: set number of queues as N (N > 0, default: 1).\n"
+	       "--fwd-mode=N: set fwd mode (N: file-trans | rxonly | "
+	       "txonly | iofwd, default: file-trans)\n"
+	       "--buf-size=N: set mbuf dataroom size as N (0 < N < 65535,"
+	       " default: 2048).\n"
+	       "--nb-desc=N: set number of descriptors as N (%u <= N <= %u,"
+	       " default: 1024).\n"
+	       "--txfreet=N: set tx free thresh for NTB driver as N. (N >= 0)\n"
+	       "--burst=N: set pkt burst as N (0 < N <= %u, default: 32).\n",
+	       prgname, NTB_MIN_DESC_SIZE, NTB_MAX_DESC_SIZE,
+	       NTB_MAX_PKT_BURST);
 }
 
-static int
-parse_args(int argc, char **argv)
+static void
+ntb_parse_args(int argc, char **argv)
 {
 	char *prgname = argv[0], **argvopt = argv;
-	int opt, ret;
+	int opt, opt_idx, n, i;
 
-	/* Only support interactive mode to send/recv file first. */
-	while ((opt = getopt(argc, argvopt, "i")) != EOF) {
+	while ((opt = getopt_long(argc, argvopt, short_options,
+				lgopts, &opt_idx)) != EOF) {
 		switch (opt) {
 		case 'i':
-			printf("Interactive-mode selected\n");
+			printf("Interactive-mode selected.\n");
 			interactive = 1;
 			break;
+		case OPT_QP_NUM:
+			n = atoi(optarg);
+			if (n > 0)
+				num_queues = n;
+			else
+				rte_exit(EXIT_FAILURE, "qp must be > 0.\n");
+			break;
+		case OPT_BUF_SIZE_NUM:
+			n = atoi(optarg);
+			if (n > RTE_PKTMBUF_HEADROOM && n <= 0xFFFF)
+				ntb_buf_size = n;
+			else
+				rte_exit(EXIT_FAILURE, "buf-size must be > "
+					"%u and < 65536.\n",
+					RTE_PKTMBUF_HEADROOM);
+			break;
+		case OPT_FWD_MODE_NUM:
+			for (i = 0; i < MAX_FWD_MODE; i++) {
+				if (!strcmp(optarg, fwd_mode_s[i])) {
+					fwd_mode = i;
+					break;
+				}
+			}
+			if (i == MAX_FWD_MODE)
+				rte_exit(EXIT_FAILURE, "Unsupported mode. "
+				"(Should be: file-trans | rxonly | txonly "
+				"| iofwd)\n");
+			break;
+		case OPT_NB_DESC_NUM:
+			n = atoi(optarg);
+			if (n >= NTB_MIN_DESC_SIZE && n <= NTB_MAX_DESC_SIZE)
+				nb_desc = n;
+			else
+				rte_exit(EXIT_FAILURE, "nb-desc must be within"
+					" [%u, %u].\n", NTB_MIN_DESC_SIZE,
+					NTB_MAX_DESC_SIZE);
+			break;
+		case OPT_TXFREET_NUM:
+			n = atoi(optarg);
+			if (n >= 0)
+				tx_free_thresh = n;
+			else
+				rte_exit(EXIT_FAILURE, "txfreet must be"
+					" >= 0\n");
+			break;
+		case OPT_BURST_NUM:
+			n = atoi(optarg);
+			if (n > 0 && n <= NTB_MAX_PKT_BURST)
+				pkt_burst = n;
+			else
+				rte_exit(EXIT_FAILURE, "burst must be within "
+					"(0, %u].\n", NTB_MAX_PKT_BURST);
+			break;
 
 		default:
 			ntb_usage(prgname);
-			return -1;
+			rte_exit(EXIT_FAILURE,
+				 "Command line is incomplete or incorrect.\n");
+			break;
 		}
 	}
+}
 
-	if (optind >= 0)
-		argv[optind-1] = prgname;
+static void
+ntb_mempool_mz_free(__rte_unused struct rte_mempool_memhdr *memhdr,
+		void *opaque)
+{
+	const struct rte_memzone *mz = opaque;
+	rte_memzone_free(mz);
+}
 
-	ret = optind-1;
-	optind = 1; /* reset getopt lib */
-	return ret;
+static struct rte_mempool *
+ntb_mbuf_pool_create(uint16_t mbuf_seg_size, uint32_t nb_mbuf,
+		     struct ntb_dev_info ntb_info,
+		     struct ntb_dev_config *ntb_conf,
+		     unsigned int socket_id)
+{
+	size_t mz_len, total_elt_sz, max_mz_len, left_sz;
+	struct rte_pktmbuf_pool_private mbp_priv;
+	char pool_name[RTE_MEMPOOL_NAMESIZE];
+	char mz_name[RTE_MEMZONE_NAMESIZE];
+	const struct rte_memzone *mz;
+	struct rte_mempool *mp;
+	uint64_t align;
+	uint32_t mz_id;
+	int ret;
+
+	snprintf(pool_name, sizeof(pool_name), "ntb_mbuf_pool_%u", socket_id);
+	mp = rte_mempool_create_empty(pool_name, nb_mbuf,
+				      (mbuf_seg_size + sizeof(struct rte_mbuf)),
+				      MEMPOOL_CACHE_SIZE,
+				      sizeof(struct rte_pktmbuf_pool_private),
+				      socket_id, 0);
+	if (mp == NULL)
+		return NULL;
+
+	mbp_priv.mbuf_data_room_size = mbuf_seg_size;
+	mbp_priv.mbuf_priv_size = 0;
+	rte_pktmbuf_pool_init(mp, &mbp_priv);
+
+	ntb_conf->mz_list = rte_zmalloc("ntb_memzone_list",
+				sizeof(struct rte_memzone *) *
+				ntb_info.mw_cnt, 0);
+	if (ntb_conf->mz_list == NULL)
+		goto fail;
+
+	/* Put ntb header on mw0. */
+	if (ntb_info.mw_size[0] < ntb_info.ntb_hdr_size) {
+		printf("mw0 (size: %" PRIu64 ") is not enough for ntb hdr"
+		       " (size: %u)\n", ntb_info.mw_size[0],
+		       ntb_info.ntb_hdr_size);
+		goto fail;
+	}
+
+	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
+	left_sz = total_elt_sz * nb_mbuf;
+	for (mz_id = 0; mz_id < ntb_info.mw_cnt; mz_id++) {
+		/* If populated mbuf is enough, no need to reserve extra mz. */
+		if (!left_sz)
+			break;
+		snprintf(mz_name, sizeof(mz_name), "ntb_mw_%d", mz_id);
+		align = ntb_info.mw_size_align ? ntb_info.mw_size[mz_id] :
+			RTE_CACHE_LINE_SIZE;
+		/* Reserve ntb header space on memzone 0. */
+		max_mz_len = mz_id ? ntb_info.mw_size[mz_id] :
+			     ntb_info.mw_size[mz_id] - ntb_info.ntb_hdr_size;
+		mz_len = left_sz <= max_mz_len ? left_sz :
+			(max_mz_len / total_elt_sz * total_elt_sz);
+		if (!mz_len)
+			continue;
+		mz = rte_memzone_reserve_aligned(mz_name, mz_len, socket_id,
+					RTE_MEMZONE_IOVA_CONTIG, align);
+		if (mz == NULL) {
+			printf("Cannot allocate %" PRIu64 " aligned memzone"
+				" %u\n", align, mz_id);
+			goto fail;
+		}
+		left_sz -= mz_len;
+
+		/* Reserve ntb header space on memzone 0. */
+		if (mz_id)
+			ret = rte_mempool_populate_iova(mp, mz->addr, mz->iova,
+					mz->len, ntb_mempool_mz_free,
+					(void *)(uintptr_t)mz);
+		else
+			ret = rte_mempool_populate_iova(mp,
+					(void *)((size_t)mz->addr +
+					ntb_info.ntb_hdr_size),
+					mz->iova + ntb_info.ntb_hdr_size,
+					mz->len - ntb_info.ntb_hdr_size,
+					ntb_mempool_mz_free,
+					(void *)(uintptr_t)mz);
+		if (ret < 0) {
+			rte_memzone_free(mz);
+			rte_mempool_free(mp);
+			return NULL;
+		}
+
+		ntb_conf->mz_list[mz_id] = mz;
+	}
+	if (left_sz) {
+		printf("mw space is not enough for mempool.\n");
+		goto fail;
+	}
+
+	ntb_conf->mz_num = mz_id;
+	rte_mempool_obj_iter(mp, rte_pktmbuf_init, NULL);
+
+	return mp;
+fail:
+	rte_mempool_free(mp);
+	return NULL;
 }
 
 int
 main(int argc, char **argv)
 {
+	struct rte_eth_conf eth_pconf = eth_port_conf;
+	struct rte_rawdev_info ntb_rawdev_conf;
+	struct rte_rawdev_info ntb_rawdev_info;
+	struct rte_eth_dev_info ethdev_info;
+	struct rte_eth_rxconf eth_rx_conf;
+	struct rte_eth_txconf eth_tx_conf;
+	struct ntb_queue_conf ntb_q_conf;
+	struct ntb_dev_config ntb_conf;
+	struct ntb_dev_info ntb_info;
+	uint64_t ntb_link_status;
+	uint32_t nb_mbuf;
 	int ret, i;
 
 	signal(SIGINT, signal_handler);
@@ -353,6 +1301,9 @@ main(int argc, char **argv)
 	if (ret < 0)
 		rte_exit(EXIT_FAILURE, "Error with EAL initialization.\n");
 
+	if (rte_lcore_count() < 2)
+		rte_exit(EXIT_FAILURE, "Need at least 2 cores\n");
+
 	/* Find 1st ntb rawdev. */
 	for (i = 0; i < RTE_RAWDEV_MAX_DEVS; i++)
 		if (rte_rawdevs[i].driver_name &&
@@ -368,15 +1319,118 @@ main(int argc, char **argv)
 	argc -= ret;
 	argv += ret;
 
-	ret = parse_args(argc, argv);
+	ntb_parse_args(argc, argv);
+
+	rte_rawdev_set_attr(dev_id, NTB_QUEUE_SZ_NAME, nb_desc);
+	printf("Set queue size as %u.\n", nb_desc);
+	rte_rawdev_set_attr(dev_id, NTB_QUEUE_NUM_NAME, num_queues);
+	printf("Set queue number as %u.\n", num_queues);
+	ntb_rawdev_info.dev_private = (rte_rawdev_obj_t)(&ntb_info);
+	rte_rawdev_info_get(dev_id, &ntb_rawdev_info);
+
+	nb_mbuf = nb_desc * num_queues * 2 * 2 + rte_lcore_count() *
+		  MEMPOOL_CACHE_SIZE;
+	mbuf_pool = ntb_mbuf_pool_create(ntb_buf_size, nb_mbuf, ntb_info,
+					 &ntb_conf, rte_socket_id());
+	if (mbuf_pool == NULL)
+		rte_exit(EXIT_FAILURE, "Cannot create mbuf pool.\n");
+
+	ntb_conf.num_queues = num_queues;
+	ntb_conf.queue_size = nb_desc;
+	ntb_rawdev_conf.dev_private = (rte_rawdev_obj_t)(&ntb_conf);
+	ret = rte_rawdev_configure(dev_id, &ntb_rawdev_conf);
+	if (ret)
+		rte_exit(EXIT_FAILURE, "Can't config ntb dev: err=%d, "
+			"port=%u\n", ret, dev_id);
+
+	ntb_q_conf.tx_free_thresh = tx_free_thresh;
+	ntb_q_conf.nb_desc = nb_desc;
+	ntb_q_conf.rx_mp = mbuf_pool;
+	for (i = 0; i < num_queues; i++) {
+		/* Setup rawdev queue */
+		ret = rte_rawdev_queue_setup(dev_id, i, &ntb_q_conf);
+		if (ret < 0)
+			rte_exit(EXIT_FAILURE,
+				"Failed to setup ntb queue %u.\n", i);
+	}
+
+	/* Wait up to 100s for the peer dev to come up. */
+	printf("Checking ntb link status...\n");
+	for (i = 0; i < 1000; i++) {
+		rte_rawdev_get_attr(dev_id, NTB_LINK_STATUS_NAME,
+				    &ntb_link_status);
+		if (ntb_link_status) {
+			printf("Peer dev ready, ntb link up.\n");
+			break;
+		}
+		rte_delay_ms(100);
+	}
+	rte_rawdev_get_attr(dev_id, NTB_LINK_STATUS_NAME, &ntb_link_status);
+	if (ntb_link_status == 0)
+		printf("Timed out (100s). Link is not up. Restart the app.\n");
+
+	ret = rte_rawdev_start(dev_id);
 	if (ret < 0)
-		rte_exit(EXIT_FAILURE, "Invalid arguments\n");
+		rte_exit(EXIT_FAILURE, "rte_rawdev_start: err=%d, port=%u\n",
+			ret, dev_id);
+
+	/* Find 1st ethdev */
+	eth_port_id = rte_eth_find_next(0);
 
-	rte_rawdev_start(dev_id);
+	if (eth_port_id < RTE_MAX_ETHPORTS) {
+		rte_eth_dev_info_get(eth_port_id, &ethdev_info);
+		eth_pconf.rx_adv_conf.rss_conf.rss_hf &=
+				ethdev_info.flow_type_rss_offloads;
+		ret = rte_eth_dev_configure(eth_port_id, num_queues,
+					    num_queues, &eth_pconf);
+		if (ret)
+			rte_exit(EXIT_FAILURE, "Can't config ethdev: err=%d, "
+				"port=%u\n", ret, eth_port_id);
+		eth_rx_conf = ethdev_info.default_rxconf;
+		eth_rx_conf.offloads = eth_pconf.rxmode.offloads;
+		eth_tx_conf = ethdev_info.default_txconf;
+		eth_tx_conf.offloads = eth_pconf.txmode.offloads;
+
+		/* Setup ethdev queue if ethdev exists */
+		for (i = 0; i < num_queues; i++) {
+			ret = rte_eth_rx_queue_setup(eth_port_id, i, nb_desc,
+					rte_eth_dev_socket_id(eth_port_id),
+					&eth_rx_conf, mbuf_pool);
+			if (ret < 0)
+				rte_exit(EXIT_FAILURE,
+					"Failed to setup eth rxq %u.\n", i);
+			ret = rte_eth_tx_queue_setup(eth_port_id, i, nb_desc,
+					rte_eth_dev_socket_id(eth_port_id),
+					&eth_tx_conf);
+			if (ret < 0)
+				rte_exit(EXIT_FAILURE,
+					"Failed to setup eth txq %u.\n", i);
+		}
+
+		ret = rte_eth_dev_start(eth_port_id);
+		if (ret < 0)
+			rte_exit(EXIT_FAILURE, "rte_eth_dev_start: err=%d, "
+				"port=%u\n", ret, eth_port_id);
+	}
+
+	/* initialize port stats */
+	memset(&ntb_port_stats, 0, sizeof(ntb_port_stats));
+
+	/* Set default fwd mode if user doesn't set it. */
+	if (fwd_mode == MAX_FWD_MODE && eth_port_id < RTE_MAX_ETHPORTS) {
+		printf("Set default fwd mode as iofwd.\n");
+		fwd_mode = IOFWD;
+	}
+	if (fwd_mode == MAX_FWD_MODE) {
+		printf("Set default fwd mode as file-trans.\n");
+		fwd_mode = FILE_TRANS;
+	}
 
 	if (interactive) {
 		sleep(1);
 		prompt();
+	} else {
+		start_pkt_fwd();
 	}
 
 	return 0;
-- 
2.17.1



* Re: [dpdk-dev] [PATCH v4 1/4] raw/ntb: setup ntb queue
  2019-09-09  3:27       ` [dpdk-dev] [PATCH v4 1/4] raw/ntb: setup ntb queue Xiaoyun Li
@ 2019-09-23  2:50         ` Wu, Jingjing
  2019-09-23  3:28           ` Li, Xiaoyun
  0 siblings, 1 reply; 42+ messages in thread
From: Wu, Jingjing @ 2019-09-23  2:50 UTC (permalink / raw)
  To: Li, Xiaoyun, Wiles, Keith, Maslekar, Omkar, Liang, Cunming; +Cc: dev

<...>
> +static void
> +ntb_rxq_release(struct ntb_rx_queue *rxq)
> +{
> +	if (!rxq) {
> +		NTB_LOG(ERR, "Pointer to rxq is NULL");
> +		return;
> +	}
> +
> +	ntb_rxq_release_mbufs(rxq);
> +
> +	rte_free(rxq->sw_ring);
> +	rte_free(rxq);
It's better to free rxq outside of this function, as the caller's "rxq" pointer cannot be set to NULL from inside this func.

> +}
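
A minimal sketch of the release pattern suggested above. All names here (demo_rxq, demo_rxq_release, demo_queue_release) are hypothetical stand-ins, not the driver's real types, and plain malloc/free replace rte_zmalloc/rte_free so the snippet builds without DPDK. The helper frees only what the queue owns; the caller, which holds the slot, frees the structure and clears the pointer.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct ntb_rx_queue; only the field used here. */
struct demo_rxq {
	void **sw_ring;
};

/* Free what the queue owns, but not the queue structure itself. */
static void
demo_rxq_release(struct demo_rxq *rxq)
{
	if (rxq == NULL)
		return;
	free(rxq->sw_ring);
	rxq->sw_ring = NULL;
}

/* The caller owns the slot, so it frees the queue and clears the pointer. */
static void
demo_queue_release(struct demo_rxq **slot)
{
	assert(slot != NULL);
	demo_rxq_release(*slot);
	free(*slot);
	*slot = NULL;
}
```

With this split a second release of the same slot is harmless, because the slot already holds NULL.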
> +
> +static int
> +ntb_rxq_setup(struct rte_rawdev *dev,
> +	      uint16_t qp_id,
> +	      rte_rawdev_obj_t queue_conf)
> +{
> +	struct ntb_queue_conf *rxq_conf = queue_conf;
> +	struct ntb_hw *hw = dev->dev_private;
> +	struct ntb_rx_queue *rxq;
> +
> +	/* Allocate the rx queue data structure */
> +	rxq = rte_zmalloc_socket("ntb rx queue",
> +				 sizeof(struct ntb_rx_queue),
> +				 RTE_CACHE_LINE_SIZE,
> +				 dev->socket_id);
> +	if (!rxq) {
> +		NTB_LOG(ERR, "Failed to allocate memory for "
> +			    "rx queue data structure.");
> +		return -ENOMEM;
Need to free rxq here.
<...>

> +static void
> +ntb_txq_release(struct ntb_tx_queue *txq)
>  {
> +	if (!txq) {
> +		NTB_LOG(ERR, "Pointer to txq is NULL");
> +		return;
> +	}
> +
> +	ntb_txq_release_mbufs(txq);
> +
> +	rte_free(txq->sw_ring);
> +	rte_free(txq);
The same as above "ntb_rxq_release".

<...>

> +static int
> +ntb_queue_setup(struct rte_rawdev *dev,
> +		uint16_t queue_id,
> +		rte_rawdev_obj_t queue_conf)
> +{
> +	struct ntb_hw *hw = dev->dev_private;
> +	int ret;
> +
> +	if (queue_id > hw->queue_pairs)
Should be ">=" ?

> +		return -EINVAL;
> +
> +	ret = ntb_txq_setup(dev, queue_id, queue_conf);
> +	if (ret < 0)
> +		return ret;
> +
> +	ret = ntb_rxq_setup(dev, queue_id, queue_conf);
> +
> +	return ret;
> +}
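
For the bound check questioned above: if queue ids run from 0 to queue_pairs - 1, then queue_id == queue_pairs is already out of range, so the comparison needs ">=". A small sketch with a hypothetical helper name (demo_check_queue_id is not part of the driver):

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical helper: valid ids are 0 .. queue_pairs - 1. */
static int
demo_check_queue_id(uint16_t queue_id, uint16_t queue_pairs)
{
	if (queue_id >= queue_pairs)
		return -EINVAL;
	return 0;
}
```

With ">" instead of ">=", an id equal to queue_pairs would pass the check and index one element past the end of the queue arrays.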
> +
>  static int
> -ntb_queue_release(struct rte_rawdev *dev __rte_unused,
> -		  uint16_t queue_id __rte_unused)
> +ntb_queue_release(struct rte_rawdev *dev, uint16_t queue_id)
>  {
> +	struct ntb_hw *hw = dev->dev_private;
> +	struct ntb_tx_queue *txq;
> +	struct ntb_rx_queue *rxq;
> +
> +	if (queue_id > hw->queue_pairs)
Should be ">=" ?

> +		return -EINVAL;
> +
> +	txq = hw->tx_queues[queue_id];
> +	rxq = hw->rx_queues[queue_id];
> +	ntb_txq_release(txq);
> +	ntb_rxq_release(rxq);
> +
>  	return 0;
>  }
> 
> @@ -234,6 +470,77 @@ ntb_queue_count(struct rte_rawdev *dev)
>  	return hw->queue_pairs;
>  }
> 
> +static int
> +ntb_queue_init(struct rte_rawdev *dev, uint16_t qp_id)
> +{
> +	struct ntb_hw *hw = dev->dev_private;
> +	struct ntb_rx_queue *rxq = hw->rx_queues[qp_id];
> +	struct ntb_tx_queue *txq = hw->tx_queues[qp_id];
> +	volatile struct ntb_header *local_hdr;
> +	struct ntb_header *remote_hdr;
> +	uint16_t q_size = hw->queue_size;
> +	uint32_t hdr_offset;
> +	void *bar_addr;
> +	uint16_t i;
> +
> +	if (hw->ntb_ops->get_peer_mw_addr == NULL) {
> +		NTB_LOG(ERR, "Failed to get mapped peer addr.");
Would it be better to log "XX ops is not supported", to stay consistent with the others?

> +		return -EINVAL;
> +	}
> +
> +	/* Put queue info into the start of shared memory. */
> +	hdr_offset = hw->hdr_size_per_queue * qp_id;
> +	local_hdr = (volatile struct ntb_header *)
> +		    ((size_t)hw->mz[0]->addr + hdr_offset);
> +	bar_addr = (*hw->ntb_ops->get_peer_mw_addr)(dev, 0);
> +	if (bar_addr == NULL)
> +		return -EINVAL;
> +	remote_hdr = (struct ntb_header *)
> +		     ((size_t)bar_addr + hdr_offset);
> +
> +	/* rxq init. */
> +	rxq->rx_desc_ring = (struct ntb_desc *)
> +			    (&remote_hdr->desc_ring);
> +	rxq->rx_used_ring = (volatile struct ntb_used *)
> +			    (&local_hdr->desc_ring[q_size]);
> +	rxq->avail_cnt = &remote_hdr->avail_cnt;
> +	rxq->used_cnt = &local_hdr->used_cnt;
> +
> +	for (i = 0; i < rxq->nb_rx_desc - 1; i++) {
> +		struct rte_mbuf *mbuf = rte_mbuf_raw_alloc(rxq->mpool);
> +		if (unlikely(!mbuf)) {
> +			NTB_LOG(ERR, "Failed to allocate mbuf for RX");
Need to release the mbufs already allocated, either here or in "ntb_dev_start".

<...>

> +	hw->hdr_size_per_queue = RTE_ALIGN(sizeof(struct ntb_header) +
> +				hw->queue_size * sizeof(struct ntb_desc) +
> +				hw->queue_size * sizeof(struct ntb_used),
> +				RTE_CACHE_LINE_SIZE);
hw->hdr_size_per_queue is internal information, why put the assignment in ntb_dev_info_get?

> +	info->ntb_hdr_size = hw->hdr_size_per_queue * hw->queue_pairs;
>  }
> 
>  static int
> -ntb_dev_configure(const struct rte_rawdev *dev __rte_unused,
> -		  rte_rawdev_obj_t config __rte_unused)
> +ntb_dev_configure(const struct rte_rawdev *dev, rte_rawdev_obj_t config)
>  {
> +	struct ntb_dev_config *conf = config;
> +	struct ntb_hw *hw = dev->dev_private;
> +	int ret;
> +
> +	hw->queue_pairs	= conf->num_queues;
> +	hw->queue_size = conf->queue_size;
> +	hw->used_mw_num = conf->mz_num;
> +	hw->mz = conf->mz_list;
> +	hw->rx_queues = rte_zmalloc("ntb_rx_queues",
> +			sizeof(struct ntb_rx_queue *) * hw->queue_pairs, 0);
> +	hw->tx_queues = rte_zmalloc("ntb_tx_queues",
> +			sizeof(struct ntb_tx_queue *) * hw->queue_pairs, 0);
> +
> +	/* Start handshake with the peer. */
> +	ret = ntb_handshake_work(dev);
> +	if (ret < 0)
Free?

> +		return ret;
> +
>  	return 0;
>  }
> 
> @@ -337,21 +637,52 @@ static int
>  ntb_dev_start(struct rte_rawdev *dev)
>  {
>  	struct ntb_hw *hw = dev->dev_private;
> -	int ret, i;
> +	uint32_t peer_base_l, peer_val;
> +	uint64_t peer_base_h;
> +	uint32_t i;
> +	int ret;
> 
> -	/* TODO: init queues and start queues. */
> +	if (!hw->link_status || !hw->peer_dev_up)
> +		return -EINVAL;
> 
> -	/* Map memory of bar_size to remote. */
> -	hw->mz = rte_zmalloc("struct rte_memzone *",
> -			     hw->mw_cnt * sizeof(struct rte_memzone *), 0);
> -	for (i = 0; i < hw->mw_cnt; i++) {
> -		ret = ntb_set_mw(dev, i, hw->mw_size[i]);
> +	for (i = 0; i < hw->queue_pairs; i++) {
> +		ret = ntb_queue_init(dev, i);
>  		if (ret) {
> -			NTB_LOG(ERR, "Fail to set mw.");
> +			NTB_LOG(ERR, "Failed to init queue.");
Free when error.

<...>

> +struct ntb_used {
> +	uint16_t len;     /* buffer length */
> +#define NTB_FLAG_EOP    1 /* end of packet */
Better to 
> +	uint16_t flags;   /* flags */
> +};
> +
> +struct ntb_rx_entry {
> +	struct rte_mbuf *mbuf;
> +};
> +
> +struct ntb_rx_queue {
> +	struct ntb_desc *rx_desc_ring;
> +	volatile struct ntb_used *rx_used_ring;
> +	uint16_t *avail_cnt;
> +	volatile uint16_t *used_cnt;
> +	uint16_t last_avail;
> +	uint16_t last_used;
> +	uint16_t nb_rx_desc;
> +
> +	uint16_t rx_free_thresh;
> +
> +	struct rte_mempool *mpool; /**< mempool for mbuf allocation */
General comment: comments in an internal header don't need to be wrapped with "/**< */"; keeping one style consistent within a file would be nice.


* Re: [dpdk-dev] [PATCH v4 1/4] raw/ntb: setup ntb queue
  2019-09-23  2:50         ` Wu, Jingjing
@ 2019-09-23  3:28           ` Li, Xiaoyun
  0 siblings, 0 replies; 42+ messages in thread
From: Li, Xiaoyun @ 2019-09-23  3:28 UTC (permalink / raw)
  To: Wu, Jingjing, Wiles, Keith, Maslekar, Omkar, Liang, Cunming; +Cc: dev

Hi

> -----Original Message-----
> From: Wu, Jingjing
> Sent: Monday, September 23, 2019 10:50
> To: Li, Xiaoyun <xiaoyun.li@intel.com>; Wiles, Keith <keith.wiles@intel.com>;
> Maslekar, Omkar <omkar.maslekar@intel.com>; Liang, Cunming
> <cunming.liang@intel.com>
> Cc: dev@dpdk.org
> Subject: RE: [PATCH v4 1/4] raw/ntb: setup ntb queue
> 
> <...>
> > +static void
> > +ntb_rxq_release(struct ntb_rx_queue *rxq) {
> > +	if (!rxq) {
> > +		NTB_LOG(ERR, "Pointer to rxq is NULL");
> > +		return;
> > +	}
> > +
> > +	ntb_rxq_release_mbufs(rxq);
> > +
> > +	rte_free(rxq->sw_ring);
> > +	rte_free(rxq);
> It's better to free rxq outside this function, as the pointer param "rxq"
> cannot be set to NULL in this func.
OK.
> 
> > +}
> > +
> > +static int
> > +ntb_rxq_setup(struct rte_rawdev *dev,
> > +	      uint16_t qp_id,
> > +	      rte_rawdev_obj_t queue_conf)
> > +{
> > +	struct ntb_queue_conf *rxq_conf = queue_conf;
> > +	struct ntb_hw *hw = dev->dev_private;
> > +	struct ntb_rx_queue *rxq;
> > +
> > +	/* Allocate the rx queue data structure */
> > +	rxq = rte_zmalloc_socket("ntb rx queue",
> > +				 sizeof(struct ntb_rx_queue),
> > +				 RTE_CACHE_LINE_SIZE,
> > +				 dev->socket_id);
> > +	if (!rxq) {
> > +		NTB_LOG(ERR, "Failed to allocate memory for "
> > +			    "rx queue data structure.");
> > +		return -ENOMEM;
> Need to free rxq here.
This path only allocates memory. The error means the allocation failed and rxq == NULL, so there is nothing to free here.

> <...>
> 
> > +static void
> > +ntb_txq_release(struct ntb_tx_queue *txq)
> >  {
> > +	if (!txq) {
> > +		NTB_LOG(ERR, "Pointer to txq is NULL");
> > +		return;
> > +	}
> > +
> > +	ntb_txq_release_mbufs(txq);
> > +
> > +	rte_free(txq->sw_ring);
> > +	rte_free(txq);
> The same as above "ntb_rxq_release".
OK.
> 
> <...>
> 
> > +static int
> > +ntb_queue_setup(struct rte_rawdev *dev,
> > +		uint16_t queue_id,
> > +		rte_rawdev_obj_t queue_conf)
> > +{
> > +	struct ntb_hw *hw = dev->dev_private;
> > +	int ret;
> > +
> > +	if (queue_id > hw->queue_pairs)
> Should be ">=" ?
Yes.
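
For reference, a minimal sketch of the corrected bound check. Valid queue indices run from 0 to queue_pairs - 1, so queue_id == queue_pairs must be rejected as well. The names demo_hw and demo_check_queue_id are made up for illustration and are not part of the patch.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's private data (not struct ntb_hw). */
struct demo_hw {
	uint16_t queue_pairs;	/* number of configured queue pairs */
};

/*
 * Valid indices are 0 .. queue_pairs - 1, so the comparison must be ">=".
 * With ">" the value queue_id == queue_pairs would pass the check and
 * index one element past the end of the queue arrays.
 */
static int
demo_check_queue_id(const struct demo_hw *hw, uint16_t queue_id)
{
	if (queue_id >= hw->queue_pairs)
		return -EINVAL;

	return 0;
}
```

The same check applies to ntb_queue_release below.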
> 
> > +		return -EINVAL;
> > +
> > +	ret = ntb_txq_setup(dev, queue_id, queue_conf);
> > +	if (ret < 0)
> > +		return ret;
> > +
> > +	ret = ntb_rxq_setup(dev, queue_id, queue_conf);
> > +
> > +	return ret;
> > +}
> > +
> >  static int
> > -ntb_queue_release(struct rte_rawdev *dev __rte_unused,
> > -		  uint16_t queue_id __rte_unused)
> > +ntb_queue_release(struct rte_rawdev *dev, uint16_t queue_id)
> >  {
> > +	struct ntb_hw *hw = dev->dev_private;
> > +	struct ntb_tx_queue *txq;
> > +	struct ntb_rx_queue *rxq;
> > +
> > +	if (queue_id > hw->queue_pairs)
> Should be ">=" ?
> 
> > +		return -EINVAL;
> > +
> > +	txq = hw->tx_queues[queue_id];
> > +	rxq = hw->rx_queues[queue_id];
> > +	ntb_txq_release(txq);
> > +	ntb_rxq_release(rxq);
> > +
> >  	return 0;
> >  }
> >
> > @@ -234,6 +470,77 @@ ntb_queue_count(struct rte_rawdev *dev)
> >  	return hw->queue_pairs;
> >  }
> >
> > +static int
> > +ntb_queue_init(struct rte_rawdev *dev, uint16_t qp_id)
> > +{
> > +	struct ntb_hw *hw = dev->dev_private;
> > +	struct ntb_rx_queue *rxq = hw->rx_queues[qp_id];
> > +	struct ntb_tx_queue *txq = hw->tx_queues[qp_id];
> > +	volatile struct ntb_header *local_hdr;
> > +	struct ntb_header *remote_hdr;
> > +	uint16_t q_size = hw->queue_size;
> > +	uint32_t hdr_offset;
> > +	void *bar_addr;
> > +	uint16_t i;
> > +
> > +	if (hw->ntb_ops->get_peer_mw_addr == NULL) {
> > +		NTB_LOG(ERR, "Failed to get mapped peer addr.");
> Would it be better to log as "XX ops is not supported" to keep consistent as
> others?
OK.
> 
> > +		return -EINVAL;
> > +	}
> > +
> > +	/* Put queue info into the start of shared memory. */
> > +	hdr_offset = hw->hdr_size_per_queue * qp_id;
> > +	local_hdr = (volatile struct ntb_header *)
> > +		    ((size_t)hw->mz[0]->addr + hdr_offset);
> > +	bar_addr = (*hw->ntb_ops->get_peer_mw_addr)(dev, 0);
> > +	if (bar_addr == NULL)
> > +		return -EINVAL;
> > +	remote_hdr = (struct ntb_header *)
> > +		     ((size_t)bar_addr + hdr_offset);
> > +
> > +	/* rxq init. */
> > +	rxq->rx_desc_ring = (struct ntb_desc *)
> > +			    (&remote_hdr->desc_ring);
> > +	rxq->rx_used_ring = (volatile struct ntb_used *)
> > +			    (&local_hdr->desc_ring[q_size]);
> > +	rxq->avail_cnt = &remote_hdr->avail_cnt;
> > +	rxq->used_cnt = &local_hdr->used_cnt;
> > +
> > +	for (i = 0; i < rxq->nb_rx_desc - 1; i++) {
> > +		struct rte_mbuf *mbuf = rte_mbuf_raw_alloc(rxq->mpool);
> > +		if (unlikely(!mbuf)) {
> > +			NTB_LOG(ERR, "Failed to allocate mbuf for RX");
> Need release mbufs allocated here or in " ntb_dev_start".
OK.
> 
> <...>
> 
> > +	hw->hdr_size_per_queue = RTE_ALIGN(sizeof(struct ntb_header) +
> > +				hw->queue_size * sizeof(struct ntb_desc) +
> > +				hw->queue_size * sizeof(struct ntb_used),
> > +				RTE_CACHE_LINE_SIZE);
> hw->hdr_size_per_queue is internal information, why put the assignment in
> ntb_dev_info_get?
Because the calculation needs params passed by the app (queue size and number of queues), and the app needs the result to populate the mempool before it configures the driver.
There is no other place where the calculation can be done.
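
To make the calculation concrete, here is a minimal standalone sketch of it. The struct layouts are assumptions: demo_desc and demo_used follow the ring format drawn in the ntb guide, demo_header is a hypothetical header with two counters, and a 64-byte cache line is assumed. The real struct ntb_header may differ.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_CACHE_LINE 64u	/* assumed cache line size */

/* Assumed layouts for illustration only. */
struct demo_desc {		/* 64-bit address, 16-bit length, 48 bits reserved */
	uint64_t addr;
	uint16_t len;
	uint16_t rsv[3];
};

struct demo_used {		/* 16-bit packet length, 16-bit flags */
	uint16_t len;
	uint16_t flags;
};

struct demo_header {		/* hypothetical per-queue header */
	uint16_t avail_cnt;
	uint16_t used_cnt;
};

/* Round v up to a multiple of align; align must be a power of two. */
static size_t
demo_align_up(size_t v, size_t align)
{
	return (v + align - 1) & ~(align - 1);
}

/* Header + desc ring + used ring for one queue, cache-line aligned. */
static size_t
demo_hdr_size_per_queue(uint16_t queue_size)
{
	size_t raw = sizeof(struct demo_header) +
		     (size_t)queue_size * sizeof(struct demo_desc) +
		     (size_t)queue_size * sizeof(struct demo_used);

	return demo_align_up(raw, DEMO_CACHE_LINE);
}

/* Total space the app has to reserve at the start of the shared memory. */
static size_t
demo_total_hdr_size(uint16_t queue_size, uint16_t queue_pairs)
{
	return demo_hdr_size_per_queue(queue_size) * queue_pairs;
}
```

The point is that both inputs come from the app, and the total has to be known before the memzones are sized, which is before the driver is configured.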
> 
> > +	info->ntb_hdr_size = hw->hdr_size_per_queue * hw->queue_pairs;
> >  }
> >
> >  static int
> > -ntb_dev_configure(const struct rte_rawdev *dev __rte_unused,
> > -		  rte_rawdev_obj_t config __rte_unused)
> > +ntb_dev_configure(const struct rte_rawdev *dev, rte_rawdev_obj_t config)
> >  {
> > +	struct ntb_dev_config *conf = config;
> > +	struct ntb_hw *hw = dev->dev_private;
> > +	int ret;
> > +
> > +	hw->queue_pairs	= conf->num_queues;
> > +	hw->queue_size = conf->queue_size;
> > +	hw->used_mw_num = conf->mz_num;
> > +	hw->mz = conf->mz_list;
> > +	hw->rx_queues = rte_zmalloc("ntb_rx_queues",
> > +			sizeof(struct ntb_rx_queue *) * hw->queue_pairs, 0);
> > +	hw->tx_queues = rte_zmalloc("ntb_tx_queues",
> > +			sizeof(struct ntb_tx_queue *) * hw->queue_pairs, 0);
> > +
> > +	/* Start handshake with the peer. */
> > +	ret = ntb_handshake_work(dev);
> > +	if (ret < 0)
> Free?
OK.
> 
> > +		return ret;
> > +
> >  	return 0;
> >  }
> >
> > @@ -337,21 +637,52 @@ static int
> >  ntb_dev_start(struct rte_rawdev *dev)
> >  {
> >  	struct ntb_hw *hw = dev->dev_private;
> > -	int ret, i;
> > +	uint32_t peer_base_l, peer_val;
> > +	uint64_t peer_base_h;
> > +	uint32_t i;
> > +	int ret;
> >
> > -	/* TODO: init queues and start queues. */
> > +	if (!hw->link_status || !hw->peer_dev_up)
> > +		return -EINVAL;
> >
> > -	/* Map memory of bar_size to remote. */
> > -	hw->mz = rte_zmalloc("struct rte_memzone *",
> > -			     hw->mw_cnt * sizeof(struct rte_memzone *), 0);
> > -	for (i = 0; i < hw->mw_cnt; i++) {
> > -		ret = ntb_set_mw(dev, i, hw->mw_size[i]);
> > +	for (i = 0; i < hw->queue_pairs; i++) {
> > +		ret = ntb_queue_init(dev, i);
> >  		if (ret) {
> > -			NTB_LOG(ERR, "Fail to set mw.");
> > +			NTB_LOG(ERR, "Failed to init queue.");
> Free when error.
OK.
> 
> <...>
> 
> > +struct ntb_used {
> > +	uint16_t len;     /* buffer length */
> > +#define NTB_FLAG_EOP    1 /* end of packet */
> Better to
> > +	uint16_t flags;   /* flags */
> > +};
> > +
> > +struct ntb_rx_entry {
> > +	struct rte_mbuf *mbuf;
> > +};
> > +
> > +struct ntb_rx_queue {
> > +	struct ntb_desc *rx_desc_ring;
> > +	volatile struct ntb_used *rx_used_ring;
> > +	uint16_t *avail_cnt;
> > +	volatile uint16_t *used_cnt;
> > +	uint16_t last_avail;
> > +	uint16_t last_used;
> > +	uint16_t nb_rx_desc;
> > +
> > +	uint16_t rx_free_thresh;
> > +
> > +	struct rte_mempool *mpool; /**< mempool for mbuf allocation */
> Generally comments: comments in internal header doesn't need to be wrapped
> with "/**< */", keep consistent in one file would be nice.
OK.


* Re: [dpdk-dev] [PATCH v4 2/4] raw/ntb: add xstats support
  2019-09-09  3:27       ` [dpdk-dev] [PATCH v4 2/4] raw/ntb: add xstats support Xiaoyun Li
@ 2019-09-23  3:30         ` Wu, Jingjing
  0 siblings, 0 replies; 42+ messages in thread
From: Wu, Jingjing @ 2019-09-23  3:30 UTC (permalink / raw)
  To: Li, Xiaoyun, Wiles, Keith, Maslekar, Omkar, Liang, Cunming; +Cc: dev

>  static int
> -ntb_xstats_reset(struct rte_rawdev *dev __rte_unused,
> -		 const uint32_t ids[] __rte_unused,
> -		 uint32_t nb_ids __rte_unused)
> +ntb_xstats_reset(struct rte_rawdev *dev,
> +		 const uint32_t ids[],
> +		 uint32_t nb_ids)
>  {
> -	return 0;
> -}
> +	struct ntb_hw *hw = dev->dev_private;
> +	uint32_t i, xstats_num;
> 
> +	xstats_num = NTB_XSTATS_NUM * (hw->queue_pairs + 1);
> +	for (i = 0; i < nb_ids && ids[i] < xstats_num; i++)
> +		hw->ntb_xstats[ids[i]] = 0;
> +
There is no lock for the xstats, and the enqueue and dequeue threads are updating these values, so zeroing them here will race with the datapath.
Suggest saving a copy of ntb_xstats on reset, and letting only enqueue and dequeue update the live values.
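
One common lock-free way to do this is sketched below; it is not necessarily what the next revision of the patch will implement. The datapath is the only writer of the running counters, reset only records the current value as an offset, and get returns the difference. All demo_* names are hypothetical, and C11 atomics are used here only to keep the standalone sketch free of data races.

```c
#include <stdatomic.h>
#include <stdint.h>

#define DEMO_XSTATS_NUM 4	/* hypothetical number of counters */

struct demo_xstats {
	_Atomic uint64_t count[DEMO_XSTATS_NUM];	/* written only by the datapath */
	_Atomic uint64_t off[DEMO_XSTATS_NUM];		/* written only by reset */
};

/* Datapath (enqueue/dequeue): bump the running counter. */
static inline void
demo_xstats_add(struct demo_xstats *s, unsigned int id, uint64_t n)
{
	atomic_fetch_add_explicit(&s->count[id], n, memory_order_relaxed);
}

/* Control path: report the value accumulated since the last reset. */
static inline uint64_t
demo_xstats_get(struct demo_xstats *s, unsigned int id)
{
	uint64_t c = atomic_load_explicit(&s->count[id], memory_order_relaxed);
	uint64_t o = atomic_load_explicit(&s->off[id], memory_order_relaxed);

	/* Unsigned subtraction stays correct even if the counter wraps. */
	return c - o;
}

/* Control path: never touches count[], only remembers where it was. */
static inline void
demo_xstats_reset(struct demo_xstats *s, unsigned int id)
{
	uint64_t c = atomic_load_explicit(&s->count[id], memory_order_relaxed);

	atomic_store_explicit(&s->off[id], c, memory_order_relaxed);
}
```

Because reset never writes the running counter, an update from the datapath cannot be lost while a reset is in progress.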

Thanks
Jingjing


* Re: [dpdk-dev] [PATCH v4 3/4] raw/ntb: add enqueue and dequeue functions
  2019-09-09  3:27       ` [dpdk-dev] [PATCH v4 3/4] raw/ntb: add enqueue and dequeue functions Xiaoyun Li
@ 2019-09-23  5:25         ` Wu, Jingjing
  0 siblings, 0 replies; 42+ messages in thread
From: Wu, Jingjing @ 2019-09-23  5:25 UTC (permalink / raw)
  To: Li, Xiaoyun, Wiles, Keith, Maslekar, Omkar, Liang, Cunming; +Cc: dev



> -----Original Message-----
> From: Li, Xiaoyun
> Sent: Monday, September 9, 2019 11:27 AM
> To: Wu, Jingjing <jingjing.wu@intel.com>; Wiles, Keith <keith.wiles@intel.com>; Maslekar,
> Omkar <omkar.maslekar@intel.com>; Liang, Cunming <cunming.liang@intel.com>
> Cc: dev@dpdk.org; Li, Xiaoyun <xiaoyun.li@intel.com>
> Subject: [PATCH v4 3/4] raw/ntb: add enqueue and dequeue functions
> 
> Introduce enqueue and dequeue functions to support packet based
> processing. And enable write-combining for ntb driver since it
> can improve the performance a lot.
> 
> Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Acked-by: Jingjing Wu <jingjing.wu@intel.com>



* Re: [dpdk-dev] [PATCH v4 4/4] examples/ntb: support more functions for NTB
  2019-09-09  3:27       ` [dpdk-dev] [PATCH v4 4/4] examples/ntb: support more functions for NTB Xiaoyun Li
@ 2019-09-23  7:18         ` Wu, Jingjing
  2019-09-23  7:26           ` Li, Xiaoyun
  2019-09-24  8:24           ` Li, Xiaoyun
  0 siblings, 2 replies; 42+ messages in thread
From: Wu, Jingjing @ 2019-09-23  7:18 UTC (permalink / raw)
  To: Li, Xiaoyun, Wiles, Keith, Maslekar, Omkar, Liang, Cunming; +Cc: dev

<...>

> +* ``--qp=N``
> +
> +  Set the number of queues as N, where qp > 0.
The default value is 1?

<...>

> +
> +	/* Set default fwd mode if user doesn't set it. */
> +	if (fwd_mode == MAX_FWD_MODE && eth_port_id < RTE_MAX_ETHPORTS) {
> +		printf("Set default fwd mode as iofwd.\n");
> +		fwd_mode = IOFWD;
> +	}
> +	if (fwd_mode == MAX_FWD_MODE) {

Use "else if"? Because (fwd_mode == MAX_FWD_MODE) includes (fwd_mode == MAX_FWD_MODE && eth_port_id < RTE_MAX_ETHPORTS).

Thanks
Jingjing


* Re: [dpdk-dev] [PATCH v4 4/4] examples/ntb: support more functions for NTB
  2019-09-23  7:18         ` Wu, Jingjing
@ 2019-09-23  7:26           ` Li, Xiaoyun
  2019-09-24  8:24           ` Li, Xiaoyun
  1 sibling, 0 replies; 42+ messages in thread
From: Li, Xiaoyun @ 2019-09-23  7:26 UTC (permalink / raw)
  To: Wu, Jingjing, Wiles, Keith, Maslekar, Omkar, Liang, Cunming; +Cc: dev

Hi

> -----Original Message-----
> From: Wu, Jingjing
> Sent: Monday, September 23, 2019 15:19
> To: Li, Xiaoyun <xiaoyun.li@intel.com>; Wiles, Keith <keith.wiles@intel.com>;
> Maslekar, Omkar <omkar.maslekar@intel.com>; Liang, Cunming
> <cunming.liang@intel.com>
> Cc: dev@dpdk.org
> Subject: RE: [PATCH v4 4/4] examples/ntb: support more functions for NTB
> 
> <...>
> 
> > +* ``--qp=N``
> > +
> > +  Set the number of queues as N, where qp > 0.
> The default value is 1?
Yes. Will add respective comment and guide.
> 
> <...>
> 
> > +
> > +	/* Set default fwd mode if user doesn't set it. */
> > +	if (fwd_mode == MAX_FWD_MODE && eth_port_id < RTE_MAX_ETHPORTS) {
> > +		printf("Set default fwd mode as iofwd.\n");
> > +		fwd_mode = IOFWD;
> > +	}
> > +	if (fwd_mode == MAX_FWD_MODE) {
> 
> use "else if"? because (fwd_mode == MAX_FWD_MODE) including (fwd_mode
> == MAX_FWD_MODE && eth_port_id < RTE_MAX_ETHPORTS)
Yes. Thanks.
> 
> Thanks
> Jingjing


* Re: [dpdk-dev] [PATCH v4 4/4] examples/ntb: support more functions for NTB
  2019-09-23  7:18         ` Wu, Jingjing
  2019-09-23  7:26           ` Li, Xiaoyun
@ 2019-09-24  8:24           ` Li, Xiaoyun
  1 sibling, 0 replies; 42+ messages in thread
From: Li, Xiaoyun @ 2019-09-24  8:24 UTC (permalink / raw)
  To: Wu, Jingjing, Wiles, Keith, Maslekar, Omkar, Liang, Cunming; +Cc: dev

Hi

> -----Original Message-----
> From: Wu, Jingjing
> Sent: Monday, September 23, 2019 15:19
> To: Li, Xiaoyun <xiaoyun.li@intel.com>; Wiles, Keith <keith.wiles@intel.com>;
> Maslekar, Omkar <omkar.maslekar@intel.com>; Liang, Cunming
> <cunming.liang@intel.com>
> Cc: dev@dpdk.org
> Subject: RE: [PATCH v4 4/4] examples/ntb: support more functions for NTB
> 
> <...>
> 
> > +* ``--qp=N``
> > +
> > +  Set the number of queues as N, where qp > 0.
> The default value is 1?

Yes. Will add missing info in doc.
> 
> <...>
> 
> > +
> > +	/* Set default fwd mode if user doesn't set it. */
> > +	if (fwd_mode == MAX_FWD_MODE && eth_port_id < RTE_MAX_ETHPORTS) {
> > +		printf("Set default fwd mode as iofwd.\n");
> > +		fwd_mode = IOFWD;
> > +	}
> > +	if (fwd_mode == MAX_FWD_MODE) {
> 
> use "else if"? because (fwd_mode == MAX_FWD_MODE) including (fwd_mode
> == MAX_FWD_MODE && eth_port_id < RTE_MAX_ETHPORTS)

No. In that case fwd_mode is already set to IOFWD by the first if, so the second if is not hit.
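
A small standalone sketch of why the two forms behave the same: the first branch assigns fwd_mode, so the second condition can no longer be true afterwards. The enum, the port limit and the fallback to file transmission in the second branch are assumptions for illustration, since the body of that branch is not quoted above.

```c
/* Hypothetical mirror of the example's forwarding modes. */
enum demo_fwd_mode {
	DEMO_FILE_TRANS,
	DEMO_RXONLY,
	DEMO_TXONLY,
	DEMO_IOFWD,
	DEMO_MAX_FWD_MODE	/* "not set by the user" */
};

#define DEMO_MAX_ETHPORTS 32u	/* assumed port limit */

/* Two separate ifs, as in the patch. */
static enum demo_fwd_mode
demo_default_mode_if(enum demo_fwd_mode fwd_mode, unsigned int eth_port_id)
{
	if (fwd_mode == DEMO_MAX_FWD_MODE && eth_port_id < DEMO_MAX_ETHPORTS)
		fwd_mode = DEMO_IOFWD;
	/* Not reached with MAX_FWD_MODE if the branch above was taken. */
	if (fwd_mode == DEMO_MAX_FWD_MODE)
		fwd_mode = DEMO_FILE_TRANS;	/* assumed fallback */

	return fwd_mode;
}

/* The "else if" form suggested in the review. */
static enum demo_fwd_mode
demo_default_mode_else_if(enum demo_fwd_mode fwd_mode, unsigned int eth_port_id)
{
	if (fwd_mode == DEMO_MAX_FWD_MODE && eth_port_id < DEMO_MAX_ETHPORTS)
		fwd_mode = DEMO_IOFWD;
	else if (fwd_mode == DEMO_MAX_FWD_MODE)
		fwd_mode = DEMO_FILE_TRANS;	/* assumed fallback */

	return fwd_mode;
}
```

Both functions return the same mode for every input, so "else if" would only be a readability change here.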
> 
> Thanks
> Jingjing


* [dpdk-dev] [PATCH v5 0/4] enable FIFO for NTB
  2019-09-09  3:27     ` [dpdk-dev] [PATCH v4 0/4] enable FIFO " Xiaoyun Li
                         ` (3 preceding siblings ...)
  2019-09-09  3:27       ` [dpdk-dev] [PATCH v4 4/4] examples/ntb: support more functions for NTB Xiaoyun Li
@ 2019-09-24  8:43       ` " Xiaoyun Li
  2019-09-24  8:43         ` [dpdk-dev] [PATCH v5 1/4] raw/ntb: setup ntb queue Xiaoyun Li
                           ` (4 more replies)
  4 siblings, 5 replies; 42+ messages in thread
From: Xiaoyun Li @ 2019-09-24  8:43 UTC (permalink / raw)
  To: jingjing.wu, keith.wiles, omkar.maslekar, cunming.liang; +Cc: dev, Xiaoyun Li

Enable FIFO for NTB rawdev driver to support packet based
processing. And an example is provided to support txonly,
rxonly, iofwd between NTB device and ethdev, and file
transmission.

Acked-by: Omkar Maslekar <omkar.maslekar@intel.com>

---
v5:
 * Added missing free function when error happens.
 * Reworked xstats reset and get to avoid a race between reset and
   en/dequeue.
 * Added missing info in doc.

v4:
 * Fixed compile issues with 32-bit machine.
 * Fixed total xstats issue.

v3:
 * Replace strncpy with memcpy to avoid gcc-9 compile issue.

v2:
 * Fixed compile issues with 32-bit machine and lack of including file.
 * Fixed a typo.

Xiaoyun Li (4):
  raw/ntb: setup ntb queue
  raw/ntb: add xstats support
  raw/ntb: add enqueue and dequeue functions
  examples/ntb: support more functions for NTB

 doc/guides/rawdevs/ntb.rst             |   67 +-
 doc/guides/rel_notes/release_19_11.rst |    4 +
 doc/guides/sample_app_ug/ntb.rst       |   59 +-
 drivers/raw/ntb/Makefile               |    3 +
 drivers/raw/ntb/meson.build            |    1 +
 drivers/raw/ntb/ntb.c                  | 1134 ++++++++++++++++-----
 drivers/raw/ntb/ntb.h                  |  163 ++-
 drivers/raw/ntb/ntb_hw_intel.c         |   48 +-
 drivers/raw/ntb/rte_pmd_ntb.h          |   43 +
 examples/ntb/meson.build               |    3 +
 examples/ntb/ntb_fwd.c                 | 1298 +++++++++++++++++++++---
 11 files changed, 2406 insertions(+), 417 deletions(-)
 create mode 100644 drivers/raw/ntb/rte_pmd_ntb.h

-- 
2.17.1



* [dpdk-dev] [PATCH v5 1/4] raw/ntb: setup ntb queue
  2019-09-24  8:43       ` [dpdk-dev] [PATCH v5 0/4] enable FIFO " Xiaoyun Li
@ 2019-09-24  8:43         ` Xiaoyun Li
  2019-09-24  8:43         ` [dpdk-dev] [PATCH v5 2/4] raw/ntb: add xstats support Xiaoyun Li
                           ` (3 subsequent siblings)
  4 siblings, 0 replies; 42+ messages in thread
From: Xiaoyun Li @ 2019-09-24  8:43 UTC (permalink / raw)
  To: jingjing.wu, keith.wiles, omkar.maslekar, cunming.liang; +Cc: dev, Xiaoyun Li

Setup and init ntb txq and rxq. And negotiate queue information
with the peer. If queue size and number of queues are not
consistent on both sides, return error.

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
---
 doc/guides/rawdevs/ntb.rst             |  39 +-
 doc/guides/rel_notes/release_19_11.rst |   4 +
 drivers/raw/ntb/Makefile               |   3 +
 drivers/raw/ntb/meson.build            |   1 +
 drivers/raw/ntb/ntb.c                  | 726 ++++++++++++++++++-------
 drivers/raw/ntb/ntb.h                  | 151 +++--
 drivers/raw/ntb/ntb_hw_intel.c         |  26 +-
 drivers/raw/ntb/rte_pmd_ntb.h          |  43 ++
 8 files changed, 738 insertions(+), 255 deletions(-)
 create mode 100644 drivers/raw/ntb/rte_pmd_ntb.h

diff --git a/doc/guides/rawdevs/ntb.rst b/doc/guides/rawdevs/ntb.rst
index 0a61ec03d..99e7db441 100644
--- a/doc/guides/rawdevs/ntb.rst
+++ b/doc/guides/rawdevs/ntb.rst
@@ -45,8 +45,45 @@ to use, i.e. igb_uio, vfio. The ``dpdk-devbind.py`` script can be used to
 show devices status and to bind them to a suitable kernel driver. They will
 appear under the category of "Misc (rawdev) devices".
 
+Ring Layout
+-----------
+
+Since read/write remote system's memory are through PCI bus, remote read
+is much more expensive than remote write. Thus, the enqueue and dequeue
+based on ntb ring should avoid remote read. The ring layout for ntb is
+like the following:
+- Ring Format:
+  desc_ring:
+      0               16                                              64
+      +---------------------------------------------------------------+
+      |                        buffer address                         |
+      +---------------+-----------------------------------------------+
+      | buffer length |                      resv                     |
+      +---------------+-----------------------------------------------+
+  used_ring:
+      0               16              32
+      +---------------+---------------+
+      | packet length |     flags     |
+      +---------------+---------------+
+- Ring Layout
+      +------------------------+   +------------------------+
+      | used_ring              |   | desc_ring              |
+      | +---+                  |   | +---+                  |
+      | |   |                  |   | |   |                  |
+      | +---+      +--------+  |   | +---+                  |
+      | |   | ---> | buffer | <+---+-|   |                  |
+      | +---+      +--------+  |   | +---+                  |
+      | |   |                  |   | |   |                  |
+      | +---+                  |   | +---+                  |
+      |  ...                   |   |  ...                   |
+      |                        |   |                        |
+      |            +---------+ |   |            +---------+ |
+      |            | tx_tail | |   |            | rx_tail | |
+      | System A   +---------+ |   | System B   +---------+ |
+      +------------------------+   +------------------------+
+                    <---------traffic---------
+
 Limitation
 ----------
 
-- The FIFO hasn't been introduced and will come in 19.11 release.
 - This PMD only supports Intel Skylake platform.
diff --git a/doc/guides/rel_notes/release_19_11.rst b/doc/guides/rel_notes/release_19_11.rst
index 27cfbd9e3..efd0fb825 100644
--- a/doc/guides/rel_notes/release_19_11.rst
+++ b/doc/guides/rel_notes/release_19_11.rst
@@ -56,6 +56,10 @@ New Features
      Also, make sure to start the actual text at the margin.
      =========================================================
 
+   * **Introduced FIFO for NTB PMD.**
+
+     Introduced FIFO for NTB (Non-transparent Bridge) PMD to support
+     packet based processing.
 
 Removed Items
 -------------
diff --git a/drivers/raw/ntb/Makefile b/drivers/raw/ntb/Makefile
index 6fe2aaf40..814cd05ca 100644
--- a/drivers/raw/ntb/Makefile
+++ b/drivers/raw/ntb/Makefile
@@ -25,4 +25,7 @@ LIBABIVER := 1
 SRCS-$(CONFIG_RTE_LIBRTE_PMD_NTB_RAWDEV) += ntb.c
 SRCS-$(CONFIG_RTE_LIBRTE_PMD_NTB_RAWDEV) += ntb_hw_intel.c
 
+# install this header file
+SYMLINK-$(CONFIG_RTE_LIBRTE_PMD_NTB_RAWDEV)-include := rte_pmd_ntb.h
+
 include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/raw/ntb/meson.build b/drivers/raw/ntb/meson.build
index 7f39437f8..7a7d26126 100644
--- a/drivers/raw/ntb/meson.build
+++ b/drivers/raw/ntb/meson.build
@@ -5,4 +5,5 @@ deps += ['rawdev', 'mbuf', 'mempool',
 	 'pci', 'bus_pci']
 sources = files('ntb.c',
                 'ntb_hw_intel.c')
+install_headers('rte_pmd_ntb.h')
 allow_experimental_apis = true
diff --git a/drivers/raw/ntb/ntb.c b/drivers/raw/ntb/ntb.c
index bfecce1e4..fdc6c8a30 100644
--- a/drivers/raw/ntb/ntb.c
+++ b/drivers/raw/ntb/ntb.c
@@ -12,6 +12,7 @@
 #include <rte_eal.h>
 #include <rte_log.h>
 #include <rte_pci.h>
+#include <rte_mbuf.h>
 #include <rte_bus_pci.h>
 #include <rte_memzone.h>
 #include <rte_memcpy.h>
@@ -19,6 +20,7 @@
 #include <rte_rawdev_pmd.h>
 
 #include "ntb_hw_intel.h"
+#include "rte_pmd_ntb.h"
 #include "ntb.h"
 
 int ntb_logtype;
@@ -28,48 +30,7 @@ static const struct rte_pci_id pci_id_ntb_map[] = {
 	{ .vendor_id = 0, /* sentinel */ },
 };
 
-static int
-ntb_set_mw(struct rte_rawdev *dev, int mw_idx, uint64_t mw_size)
-{
-	struct ntb_hw *hw = dev->dev_private;
-	char mw_name[RTE_MEMZONE_NAMESIZE];
-	const struct rte_memzone *mz;
-	int ret = 0;
-
-	if (hw->ntb_ops->mw_set_trans == NULL) {
-		NTB_LOG(ERR, "Not supported to set mw.");
-		return -ENOTSUP;
-	}
-
-	snprintf(mw_name, sizeof(mw_name), "ntb_%d_mw_%d",
-		 dev->dev_id, mw_idx);
-
-	mz = rte_memzone_lookup(mw_name);
-	if (mz)
-		return 0;
-
-	/**
-	 * Hardware requires that mapped memory base address should be
-	 * aligned with EMBARSZ and needs continuous memzone.
-	 */
-	mz = rte_memzone_reserve_aligned(mw_name, mw_size, dev->socket_id,
-				RTE_MEMZONE_IOVA_CONTIG, hw->mw_size[mw_idx]);
-	if (!mz) {
-		NTB_LOG(ERR, "Cannot allocate aligned memzone.");
-		return -EIO;
-	}
-	hw->mz[mw_idx] = mz;
-
-	ret = (*hw->ntb_ops->mw_set_trans)(dev, mw_idx, mz->iova, mw_size);
-	if (ret) {
-		NTB_LOG(ERR, "Cannot set mw translation.");
-		return ret;
-	}
-
-	return ret;
-}
-
-static void
+static inline void
 ntb_link_cleanup(struct rte_rawdev *dev)
 {
 	struct ntb_hw *hw = dev->dev_private;
@@ -89,20 +50,94 @@ ntb_link_cleanup(struct rte_rawdev *dev)
 	}
 
 	/* Clear mw so that peer cannot access local memory.*/
-	for (i = 0; i < hw->mw_cnt; i++) {
+	for (i = 0; i < hw->used_mw_num; i++) {
 		status = (*hw->ntb_ops->mw_set_trans)(dev, i, 0, 0);
 		if (status)
 			NTB_LOG(ERR, "Failed to clean mw.");
 	}
 }
 
+static inline int
+ntb_handshake_work(const struct rte_rawdev *dev)
+{
+	struct ntb_hw *hw = dev->dev_private;
+	uint32_t val;
+	int ret, i;
+
+	if (hw->ntb_ops->spad_write == NULL ||
+	    hw->ntb_ops->mw_set_trans == NULL) {
+		NTB_LOG(ERR, "Scratchpad/MW setting is not supported.");
+		return -ENOTSUP;
+	}
+
+	/* Tell peer the mw info of local side. */
+	ret = (*hw->ntb_ops->spad_write)(dev, SPAD_NUM_MWS, 1, hw->mw_cnt);
+	if (ret < 0)
+		return ret;
+	for (i = 0; i < hw->mw_cnt; i++) {
+		NTB_LOG(INFO, "Local %u mw size: 0x%"PRIx64"", i,
+				hw->mw_size[i]);
+		val = hw->mw_size[i] >> 32;
+		ret = (*hw->ntb_ops->spad_write)(dev, SPAD_MW0_SZ_H + 2 * i,
+						 1, val);
+		if (ret < 0)
+			return ret;
+		val = hw->mw_size[i];
+		ret = (*hw->ntb_ops->spad_write)(dev, SPAD_MW0_SZ_L + 2 * i,
+						 1, val);
+		if (ret < 0)
+			return ret;
+	}
+
+	/* Tell peer about the queue info and map memory to the peer. */
+	ret = (*hw->ntb_ops->spad_write)(dev, SPAD_Q_SZ, 1, hw->queue_size);
+	if (ret < 0)
+		return ret;
+	ret = (*hw->ntb_ops->spad_write)(dev, SPAD_NUM_QPS, 1,
+					 hw->queue_pairs);
+	if (ret < 0)
+		return ret;
+	ret = (*hw->ntb_ops->spad_write)(dev, SPAD_USED_MWS, 1,
+					 hw->used_mw_num);
+	if (ret < 0)
+		return ret;
+	for (i = 0; i < hw->used_mw_num; i++) {
+		val = (uint64_t)(size_t)(hw->mz[i]->addr) >> 32;
+		ret = (*hw->ntb_ops->spad_write)(dev, SPAD_MW0_BA_H + 2 * i,
+						 1, val);
+		if (ret < 0)
+			return ret;
+		val = (uint64_t)(size_t)(hw->mz[i]->addr);
+		ret = (*hw->ntb_ops->spad_write)(dev, SPAD_MW0_BA_L + 2 * i,
+						 1, val);
+		if (ret < 0)
+			return ret;
+	}
+
+	for (i = 0; i < hw->used_mw_num; i++) {
+		ret = (*hw->ntb_ops->mw_set_trans)(dev, i, hw->mz[i]->iova,
+						   hw->mz[i]->len);
+		if (ret < 0)
+			return ret;
+	}
+
+	/* Ring doorbell 0 to tell peer the device is ready. */
+	ret = (*hw->ntb_ops->peer_db_set)(dev, 0);
+	if (ret < 0)
+		return ret;
+
+	return 0;
+}
+
 static void
 ntb_dev_intr_handler(void *param)
 {
 	struct rte_rawdev *dev = (struct rte_rawdev *)param;
 	struct ntb_hw *hw = dev->dev_private;
-	uint32_t mw_size_h, mw_size_l;
+	uint32_t val_h, val_l;
+	uint64_t peer_mw_size;
 	uint64_t db_bits = 0;
+	uint8_t peer_mw_cnt;
 	int i = 0;
 
 	if (hw->ntb_ops->db_read == NULL ||
@@ -118,7 +153,7 @@ ntb_dev_intr_handler(void *param)
 
 	/* Doorbell 0 is for peer device ready. */
 	if (db_bits & 1) {
-		NTB_LOG(DEBUG, "DB0: Peer device is up.");
+		NTB_LOG(INFO, "DB0: Peer device is up.");
 		/* Clear received doorbell. */
 		(*hw->ntb_ops->db_clear)(dev, 1);
 
@@ -129,47 +164,44 @@ ntb_dev_intr_handler(void *param)
 		if (hw->peer_dev_up)
 			return;
 
-		if (hw->ntb_ops->spad_read == NULL ||
-		    hw->ntb_ops->spad_write == NULL) {
-			NTB_LOG(ERR, "Scratchpad is not supported.");
+		if (hw->ntb_ops->spad_read == NULL) {
+			NTB_LOG(ERR, "Scratchpad read is not supported.");
+			return;
+		}
+
+		/* Check if mw setting on the peer is the same as local. */
+		peer_mw_cnt = (*hw->ntb_ops->spad_read)(dev, SPAD_NUM_MWS, 0);
+		if (peer_mw_cnt != hw->mw_cnt) {
+			NTB_LOG(ERR, "Both mw cnt must be the same.");
 			return;
 		}
 
-		hw->peer_mw_cnt = (*hw->ntb_ops->spad_read)
-				  (dev, SPAD_NUM_MWS, 0);
-		hw->peer_mw_size = rte_zmalloc("uint64_t",
-				   hw->peer_mw_cnt * sizeof(uint64_t), 0);
 		for (i = 0; i < hw->mw_cnt; i++) {
-			mw_size_h = (*hw->ntb_ops->spad_read)
-				    (dev, SPAD_MW0_SZ_H + 2 * i, 0);
-			mw_size_l = (*hw->ntb_ops->spad_read)
-				    (dev, SPAD_MW0_SZ_L + 2 * i, 0);
-			hw->peer_mw_size[i] = ((uint64_t)mw_size_h << 32) |
-					      mw_size_l;
+			val_h = (*hw->ntb_ops->spad_read)
+				(dev, SPAD_MW0_SZ_H + 2 * i, 0);
+			val_l = (*hw->ntb_ops->spad_read)
+				(dev, SPAD_MW0_SZ_L + 2 * i, 0);
+			peer_mw_size = ((uint64_t)val_h << 32) | val_l;
 			NTB_LOG(DEBUG, "Peer %u mw size: 0x%"PRIx64"", i,
-					hw->peer_mw_size[i]);
+					peer_mw_size);
+			if (peer_mw_size != hw->mw_size[i]) {
+				NTB_LOG(ERR, "Mw config must be the same.");
+				return;
+			}
 		}
 
 		hw->peer_dev_up = 1;
 
 		/**
-		 * Handshake with peer. Spad_write only works when both
-		 * devices are up. So write spad again when db is received.
-		 * And set db again for the later device who may miss
+		 * Handshake with peer. Spad_write & mw_set_trans only works
+		 * when both devices are up. So write spad again when db is
+		 * received. And set db again for the later device who may miss
 		 * the 1st db.
 		 */
-		for (i = 0; i < hw->mw_cnt; i++) {
-			(*hw->ntb_ops->spad_write)(dev, SPAD_NUM_MWS,
-						   1, hw->mw_cnt);
-			mw_size_h = hw->mw_size[i] >> 32;
-			(*hw->ntb_ops->spad_write)(dev, SPAD_MW0_SZ_H + 2 * i,
-						   1, mw_size_h);
-
-			mw_size_l = hw->mw_size[i];
-			(*hw->ntb_ops->spad_write)(dev, SPAD_MW0_SZ_L + 2 * i,
-						   1, mw_size_l);
+		if (ntb_handshake_work(dev) < 0) {
+			NTB_LOG(ERR, "Handshake work failed.");
+			return;
 		}
-		(*hw->ntb_ops->peer_db_set)(dev, 0);
 
 		/* To get the link info. */
 		if (hw->ntb_ops->get_link_status == NULL) {
@@ -183,7 +215,7 @@ ntb_dev_intr_handler(void *param)
 	}
 
 	if (db_bits & (1 << 1)) {
-		NTB_LOG(DEBUG, "DB1: Peer device is down.");
+		NTB_LOG(INFO, "DB1: Peer device is down.");
 		/* Clear received doorbell. */
 		(*hw->ntb_ops->db_clear)(dev, 2);
 
@@ -197,7 +229,7 @@ ntb_dev_intr_handler(void *param)
 	}
 
 	if (db_bits & (1 << 2)) {
-		NTB_LOG(DEBUG, "DB2: Peer device agrees dev to be down.");
+		NTB_LOG(INFO, "DB2: Peer device agrees dev to be down.");
 		/* Clear received doorbell. */
 		(*hw->ntb_ops->db_clear)(dev, (1 << 2));
 		hw->peer_dev_up = 0;
@@ -206,24 +238,232 @@ ntb_dev_intr_handler(void *param)
 }
 
 static void
-ntb_queue_conf_get(struct rte_rawdev *dev __rte_unused,
-		   uint16_t queue_id __rte_unused,
-		   rte_rawdev_obj_t queue_conf __rte_unused)
+ntb_queue_conf_get(struct rte_rawdev *dev,
+		   uint16_t queue_id,
+		   rte_rawdev_obj_t queue_conf)
+{
+	struct ntb_queue_conf *q_conf = queue_conf;
+	struct ntb_hw *hw = dev->dev_private;
+
+	q_conf->tx_free_thresh = hw->tx_queues[queue_id]->tx_free_thresh;
+	q_conf->nb_desc = hw->rx_queues[queue_id]->nb_rx_desc;
+	q_conf->rx_mp = hw->rx_queues[queue_id]->mpool;
+}
+
+static void
+ntb_rxq_release_mbufs(struct ntb_rx_queue *q)
 {
+	int i;
+
+	if (!q || !q->sw_ring) {
+		NTB_LOG(ERR, "Pointer to rxq or sw_ring is NULL");
+		return;
+	}
+
+	for (i = 0; i < q->nb_rx_desc; i++) {
+		if (q->sw_ring[i].mbuf) {
+			rte_pktmbuf_free_seg(q->sw_ring[i].mbuf);
+			q->sw_ring[i].mbuf = NULL;
+		}
+	}
+}
+
+static void
+ntb_rxq_release(struct ntb_rx_queue *rxq)
+{
+	if (!rxq) {
+		NTB_LOG(ERR, "Pointer to rxq is NULL");
+		return;
+	}
+
+	ntb_rxq_release_mbufs(rxq);
+
+	rte_free(rxq->sw_ring);
+	rte_free(rxq);
 }
 
 static int
-ntb_queue_setup(struct rte_rawdev *dev __rte_unused,
-		uint16_t queue_id __rte_unused,
-		rte_rawdev_obj_t queue_conf __rte_unused)
+ntb_rxq_setup(struct rte_rawdev *dev,
+	      uint16_t qp_id,
+	      rte_rawdev_obj_t queue_conf)
 {
+	struct ntb_queue_conf *rxq_conf = queue_conf;
+	struct ntb_hw *hw = dev->dev_private;
+	struct ntb_rx_queue *rxq;
+
+	if (rxq_conf->rx_mp == NULL) {
+		NTB_LOG(ERR, "Invalid null mempool pointer.");
+		return -EINVAL;
+	}
+
+	/* Allocate the rx queue data structure */
+	rxq = rte_zmalloc_socket("ntb rx queue",
+				 sizeof(struct ntb_rx_queue),
+				 RTE_CACHE_LINE_SIZE,
+				 dev->socket_id);
+	if (!rxq) {
+		NTB_LOG(ERR, "Failed to allocate memory for "
+			    "rx queue data structure.");
+		return -ENOMEM;
+	}
+	rxq->nb_rx_desc = rxq_conf->nb_desc;
+	rxq->mpool = rxq_conf->rx_mp;
+	rxq->port_id = dev->dev_id;
+	rxq->queue_id = qp_id;
+	rxq->hw = hw;
+
+	/* Allocate the software ring. */
+	rxq->sw_ring =
+		rte_zmalloc_socket("ntb rx sw ring",
+				   sizeof(struct ntb_rx_entry) *
+				   rxq->nb_rx_desc,
+				   RTE_CACHE_LINE_SIZE,
+				   dev->socket_id);
+	if (!rxq->sw_ring) {
+		ntb_rxq_release(rxq);
+		rxq = NULL;
+		NTB_LOG(ERR, "Failed to allocate memory for SW ring");
+		return -ENOMEM;
+	}
+
+	hw->rx_queues[qp_id] = rxq;
+
+	return 0;
+}
+
+static void
+ntb_txq_release_mbufs(struct ntb_tx_queue *q)
+{
+	int i;
+
+	if (!q || !q->sw_ring) {
+		NTB_LOG(ERR, "Pointer to txq or sw_ring is NULL");
+		return;
+	}
+
+	for (i = 0; i < q->nb_tx_desc; i++) {
+		if (q->sw_ring[i].mbuf) {
+			rte_pktmbuf_free_seg(q->sw_ring[i].mbuf);
+			q->sw_ring[i].mbuf = NULL;
+		}
+	}
+}
+
+static void
+ntb_txq_release(struct ntb_tx_queue *txq)
+{
+	if (!txq) {
+		NTB_LOG(ERR, "Pointer to txq is NULL");
+		return;
+	}
+
+	ntb_txq_release_mbufs(txq);
+
+	rte_free(txq->sw_ring);
+	rte_free(txq);
+}
+
+static int
+ntb_txq_setup(struct rte_rawdev *dev,
+	      uint16_t qp_id,
+	      rte_rawdev_obj_t queue_conf)
+{
+	struct ntb_queue_conf *txq_conf = queue_conf;
+	struct ntb_hw *hw = dev->dev_private;
+	struct ntb_tx_queue *txq;
+	uint16_t i, prev;
+
+	/* Allocate the TX queue data structure. */
+	txq = rte_zmalloc_socket("ntb tx queue",
+				  sizeof(struct ntb_tx_queue),
+				  RTE_CACHE_LINE_SIZE,
+				  dev->socket_id);
+	if (!txq) {
+		NTB_LOG(ERR, "Failed to allocate memory for "
+			    "tx queue structure");
+		return -ENOMEM;
+	}
+
+	txq->nb_tx_desc = txq_conf->nb_desc;
+	txq->port_id = dev->dev_id;
+	txq->queue_id = qp_id;
+	txq->hw = hw;
+
+	/* Allocate software ring */
+	txq->sw_ring =
+		rte_zmalloc_socket("ntb tx sw ring",
+				   sizeof(struct ntb_tx_entry) *
+				   txq->nb_tx_desc,
+				   RTE_CACHE_LINE_SIZE,
+				   dev->socket_id);
+	if (!txq->sw_ring) {
+		ntb_txq_release(txq);
+		txq = NULL;
+		NTB_LOG(ERR, "Failed to allocate memory for SW TX ring");
+		return -ENOMEM;
+	}
+
+	prev = txq->nb_tx_desc - 1;
+	for (i = 0; i < txq->nb_tx_desc; i++) {
+		txq->sw_ring[i].mbuf = NULL;
+		txq->sw_ring[i].last_id = i;
+		txq->sw_ring[prev].next_id = i;
+		prev = i;
+	}
+
+	txq->tx_free_thresh = txq_conf->tx_free_thresh ?
+			      txq_conf->tx_free_thresh :
+			      NTB_DFLT_TX_FREE_THRESH;
+	if (txq->tx_free_thresh >= txq->nb_tx_desc - 3) {
+		NTB_LOG(ERR, "tx_free_thresh (%u) must be less than "
+			"nb_desc - 3. (qp_id=%u)", txq->tx_free_thresh, qp_id);
+		ntb_txq_release(txq);
+		return -EINVAL;
+	}
+
+	hw->tx_queues[qp_id] = txq;
+
 	return 0;
 }
 
+
 static int
-ntb_queue_release(struct rte_rawdev *dev __rte_unused,
-		  uint16_t queue_id __rte_unused)
+ntb_queue_setup(struct rte_rawdev *dev,
+		uint16_t queue_id,
+		rte_rawdev_obj_t queue_conf)
 {
+	struct ntb_hw *hw = dev->dev_private;
+	int ret;
+
+	if (queue_id >= hw->queue_pairs)
+		return -EINVAL;
+
+	ret = ntb_txq_setup(dev, queue_id, queue_conf);
+	if (ret < 0)
+		return ret;
+
+	ret = ntb_rxq_setup(dev, queue_id, queue_conf);
+
+	return ret;
+}
+
+static int
+ntb_queue_release(struct rte_rawdev *dev, uint16_t queue_id)
+{
+	struct ntb_hw *hw = dev->dev_private;
+	struct ntb_tx_queue *txq;
+	struct ntb_rx_queue *rxq;
+
+	if (queue_id >= hw->queue_pairs)
+		return -EINVAL;
+
+	txq = hw->tx_queues[queue_id];
+	rxq = hw->rx_queues[queue_id];
+	ntb_txq_release(txq);
+	hw->tx_queues[queue_id] = NULL;
+	ntb_rxq_release(rxq);
+	hw->rx_queues[queue_id] = NULL;
+
 	return 0;
 }
 
@@ -234,6 +474,77 @@ ntb_queue_count(struct rte_rawdev *dev)
 	return hw->queue_pairs;
 }
 
+static int
+ntb_queue_init(struct rte_rawdev *dev, uint16_t qp_id)
+{
+	struct ntb_hw *hw = dev->dev_private;
+	struct ntb_rx_queue *rxq = hw->rx_queues[qp_id];
+	struct ntb_tx_queue *txq = hw->tx_queues[qp_id];
+	volatile struct ntb_header *local_hdr;
+	struct ntb_header *remote_hdr;
+	uint16_t q_size = hw->queue_size;
+	uint32_t hdr_offset;
+	void *bar_addr;
+	uint16_t i;
+
+	if (hw->ntb_ops->get_peer_mw_addr == NULL) {
+		NTB_LOG(ERR, "Getting peer mw addr is not supported.");
+		return -EINVAL;
+	}
+
+	/* Put queue info into the start of shared memory. */
+	hdr_offset = hw->hdr_size_per_queue * qp_id;
+	local_hdr = (volatile struct ntb_header *)
+		    ((size_t)hw->mz[0]->addr + hdr_offset);
+	bar_addr = (*hw->ntb_ops->get_peer_mw_addr)(dev, 0);
+	if (bar_addr == NULL)
+		return -EINVAL;
+	remote_hdr = (struct ntb_header *)
+		     ((size_t)bar_addr + hdr_offset);
+
+	/* rxq init. */
+	rxq->rx_desc_ring = (struct ntb_desc *)
+			    (&remote_hdr->desc_ring);
+	rxq->rx_used_ring = (volatile struct ntb_used *)
+			    (&local_hdr->desc_ring[q_size]);
+	rxq->avail_cnt = &remote_hdr->avail_cnt;
+	rxq->used_cnt = &local_hdr->used_cnt;
+
+	for (i = 0; i < rxq->nb_rx_desc - 1; i++) {
+		struct rte_mbuf *mbuf = rte_mbuf_raw_alloc(rxq->mpool);
+		if (unlikely(!mbuf)) {
+			NTB_LOG(ERR, "Failed to allocate mbuf for RX");
+			return -ENOMEM;
+		}
+		mbuf->port = dev->dev_id;
+
+		rxq->sw_ring[i].mbuf = mbuf;
+
+		rxq->rx_desc_ring[i].addr = rte_pktmbuf_mtod(mbuf, size_t);
+		rxq->rx_desc_ring[i].len = mbuf->buf_len - RTE_PKTMBUF_HEADROOM;
+	}
+	rte_wmb();
+	*rxq->avail_cnt = rxq->nb_rx_desc - 1;
+	rxq->last_avail = rxq->nb_rx_desc - 1;
+	rxq->last_used = 0;
+
+	/* txq init */
+	txq->tx_desc_ring = (volatile struct ntb_desc *)
+			    (&local_hdr->desc_ring);
+	txq->tx_used_ring = (struct ntb_used *)
+			    (&remote_hdr->desc_ring[q_size]);
+	txq->avail_cnt = &local_hdr->avail_cnt;
+	txq->used_cnt = &remote_hdr->used_cnt;
+
+	rte_wmb();
+	*txq->used_cnt = 0;
+	txq->last_used = 0;
+	txq->last_avail = 0;
+	txq->nb_tx_free = txq->nb_tx_desc - 1;
+
+	return 0;
+}
+
 static int
 ntb_enqueue_bufs(struct rte_rawdev *dev,
 		 struct rte_rawdev_buf **buffers,
@@ -278,58 +589,56 @@ static void
 ntb_dev_info_get(struct rte_rawdev *dev, rte_rawdev_obj_t dev_info)
 {
 	struct ntb_hw *hw = dev->dev_private;
-	struct ntb_attr *ntb_attrs = dev_info;
-
-	strncpy(ntb_attrs[NTB_TOPO_ID].name, NTB_TOPO_NAME, NTB_ATTR_NAME_LEN);
-	switch (hw->topo) {
-	case NTB_TOPO_B2B_DSD:
-		strncpy(ntb_attrs[NTB_TOPO_ID].value, "B2B DSD",
-			NTB_ATTR_VAL_LEN);
-		break;
-	case NTB_TOPO_B2B_USD:
-		strncpy(ntb_attrs[NTB_TOPO_ID].value, "B2B USD",
-			NTB_ATTR_VAL_LEN);
-		break;
-	default:
-		strncpy(ntb_attrs[NTB_TOPO_ID].value, "Unsupported",
-			NTB_ATTR_VAL_LEN);
-	}
+	struct ntb_dev_info *info = dev_info;
 
-	strncpy(ntb_attrs[NTB_LINK_STATUS_ID].name, NTB_LINK_STATUS_NAME,
-		NTB_ATTR_NAME_LEN);
-	snprintf(ntb_attrs[NTB_LINK_STATUS_ID].value, NTB_ATTR_VAL_LEN,
-		 "%d", hw->link_status);
+	info->mw_cnt = hw->mw_cnt;
+	info->mw_size = hw->mw_size;
 
-	strncpy(ntb_attrs[NTB_SPEED_ID].name, NTB_SPEED_NAME,
-		NTB_ATTR_NAME_LEN);
-	snprintf(ntb_attrs[NTB_SPEED_ID].value, NTB_ATTR_VAL_LEN,
-		 "%d", hw->link_speed);
-
-	strncpy(ntb_attrs[NTB_WIDTH_ID].name, NTB_WIDTH_NAME,
-		NTB_ATTR_NAME_LEN);
-	snprintf(ntb_attrs[NTB_WIDTH_ID].value, NTB_ATTR_VAL_LEN,
-		 "%d", hw->link_width);
-
-	strncpy(ntb_attrs[NTB_MW_CNT_ID].name, NTB_MW_CNT_NAME,
-		NTB_ATTR_NAME_LEN);
-	snprintf(ntb_attrs[NTB_MW_CNT_ID].value, NTB_ATTR_VAL_LEN,
-		 "%d", hw->mw_cnt);
+	/**
+	 * Intel hardware requires that mapped memory base address should be
+	 * aligned with EMBARSZ and needs contiguous memzone.
+	 */
+	info->mw_size_align = (uint8_t)(hw->pci_dev->id.vendor_id ==
+					NTB_INTEL_VENDOR_ID);
 
-	strncpy(ntb_attrs[NTB_DB_CNT_ID].name, NTB_DB_CNT_NAME,
-		NTB_ATTR_NAME_LEN);
-	snprintf(ntb_attrs[NTB_DB_CNT_ID].value, NTB_ATTR_VAL_LEN,
-		 "%d", hw->db_cnt);
+	if (!hw->queue_size || !hw->queue_pairs) {
+		NTB_LOG(ERR, "No queue size or queue num assigned.");
+		return;
+	}
 
-	strncpy(ntb_attrs[NTB_SPAD_CNT_ID].name, NTB_SPAD_CNT_NAME,
-		NTB_ATTR_NAME_LEN);
-	snprintf(ntb_attrs[NTB_SPAD_CNT_ID].value, NTB_ATTR_VAL_LEN,
-		 "%d", hw->spad_cnt);
+	hw->hdr_size_per_queue = RTE_ALIGN(sizeof(struct ntb_header) +
+				hw->queue_size * sizeof(struct ntb_desc) +
+				hw->queue_size * sizeof(struct ntb_used),
+				RTE_CACHE_LINE_SIZE);
+	info->ntb_hdr_size = hw->hdr_size_per_queue * hw->queue_pairs;
 }
 
 static int
-ntb_dev_configure(const struct rte_rawdev *dev __rte_unused,
-		  rte_rawdev_obj_t config __rte_unused)
+ntb_dev_configure(const struct rte_rawdev *dev, rte_rawdev_obj_t config)
 {
+	struct ntb_dev_config *conf = config;
+	struct ntb_hw *hw = dev->dev_private;
+	int ret;
+
+	hw->queue_pairs	= conf->num_queues;
+	hw->queue_size = conf->queue_size;
+	hw->used_mw_num = conf->mz_num;
+	hw->mz = conf->mz_list;
+	hw->rx_queues = rte_zmalloc("ntb_rx_queues",
+			sizeof(struct ntb_rx_queue *) * hw->queue_pairs, 0);
+	hw->tx_queues = rte_zmalloc("ntb_tx_queues",
+			sizeof(struct ntb_tx_queue *) * hw->queue_pairs, 0);
+
+	/* Start handshake with the peer. */
+	ret = ntb_handshake_work(dev);
+	if (ret < 0) {
+		rte_free(hw->rx_queues);
+		rte_free(hw->tx_queues);
+		hw->rx_queues = NULL;
+		hw->tx_queues = NULL;
+		return ret;
+	}
+
 	return 0;
 }
 
@@ -337,34 +646,75 @@ static int
 ntb_dev_start(struct rte_rawdev *dev)
 {
 	struct ntb_hw *hw = dev->dev_private;
-	int ret, i;
+	uint32_t peer_base_l, peer_val;
+	uint64_t peer_base_h;
+	uint32_t i;
+	int ret;
 
-	/* TODO: init queues and start queues. */
+	if (!hw->link_status || !hw->peer_dev_up)
+		return -EINVAL;
 
-	/* Map memory of bar_size to remote. */
-	hw->mz = rte_zmalloc("struct rte_memzone *",
-			     hw->mw_cnt * sizeof(struct rte_memzone *), 0);
-	for (i = 0; i < hw->mw_cnt; i++) {
-		ret = ntb_set_mw(dev, i, hw->mw_size[i]);
+	for (i = 0; i < hw->queue_pairs; i++) {
+		ret = ntb_queue_init(dev, i);
 		if (ret) {
-			NTB_LOG(ERR, "Fail to set mw.");
-			return ret;
+			NTB_LOG(ERR, "Failed to init queue.");
+			goto err_up;
 		}
 	}
 
+	hw->peer_mw_base = rte_zmalloc("ntb_peer_mw_base", hw->mw_cnt *
+					sizeof(uint64_t), 0);
+
+	if (hw->ntb_ops->spad_read == NULL) {
+		ret = -ENOTSUP;
+		goto err_up;
+	}
+
+	peer_val = (*hw->ntb_ops->spad_read)(dev, SPAD_Q_SZ, 0);
+	if (peer_val != hw->queue_size) {
+		NTB_LOG(ERR, "Inconsistent queue size! (local: %u peer: %u)",
+			hw->queue_size, peer_val);
+		ret = -EINVAL;
+		goto err_up;
+	}
+
+	peer_val = (*hw->ntb_ops->spad_read)(dev, SPAD_NUM_QPS, 0);
+	if (peer_val != hw->queue_pairs) {
+		NTB_LOG(ERR, "Inconsistent number of queues! (local: %u peer:"
+			" %u)", hw->queue_pairs, peer_val);
+		ret = -EINVAL;
+		goto err_up;
+	}
+
+	hw->peer_used_mws = (*hw->ntb_ops->spad_read)(dev, SPAD_USED_MWS, 0);
+
+	for (i = 0; i < hw->peer_used_mws; i++) {
+		peer_base_h = (*hw->ntb_ops->spad_read)(dev,
+				SPAD_MW0_BA_H + 2 * i, 0);
+		peer_base_l = (*hw->ntb_ops->spad_read)(dev,
+				SPAD_MW0_BA_L + 2 * i, 0);
+		hw->peer_mw_base[i] = (peer_base_h << 32) + peer_base_l;
+	}
+
 	dev->started = 1;
 
 	return 0;
+
+err_up:
+	for (i = 0; i < hw->queue_pairs; i++)
+		ntb_queue_release(dev, i);
+
+	return ret;
 }
 
 static void
 ntb_dev_stop(struct rte_rawdev *dev)
 {
 	struct ntb_hw *hw = dev->dev_private;
+	struct ntb_rx_queue *rxq;
+	struct ntb_tx_queue *txq;
 	uint32_t time_out;
-	int status;
-
-	/* TODO: stop rx/tx queues. */
+	int status, i;
 
 	if (!hw->peer_dev_up)
 		goto clean;
@@ -405,6 +755,13 @@ ntb_dev_stop(struct rte_rawdev *dev)
 	if (status)
 		NTB_LOG(ERR, "Failed to clear doorbells.");
 
+	for (i = 0; i < hw->queue_pairs; i++) {
+		rxq = hw->rx_queues[i];
+		txq = hw->tx_queues[i];
+		ntb_rxq_release_mbufs(rxq);
+		ntb_txq_release_mbufs(txq);
+	}
+
 	dev->started = 0;
 }
 
@@ -413,12 +770,15 @@ ntb_dev_close(struct rte_rawdev *dev)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	struct rte_intr_handle *intr_handle;
-	int ret = 0;
+	int i;
 
 	if (dev->started)
 		ntb_dev_stop(dev);
 
-	/* TODO: free queues. */
+	/* free queues */
+	for (i = 0; i < hw->queue_pairs; i++)
+		ntb_queue_release(dev, i);
+	hw->queue_pairs = 0;
 
 	intr_handle = &hw->pci_dev->intr_handle;
 	/* Clean datapath event and vec mapping */
@@ -434,7 +794,7 @@ ntb_dev_close(struct rte_rawdev *dev)
 	rte_intr_callback_unregister(intr_handle,
 				     ntb_dev_intr_handler, dev);
 
-	return ret;
+	return 0;
 }
 
 static int
@@ -445,7 +805,7 @@ ntb_dev_reset(struct rte_rawdev *rawdev __rte_unused)
 
 static int
 ntb_attr_set(struct rte_rawdev *dev, const char *attr_name,
-				 uint64_t attr_value)
+	     uint64_t attr_value)
 {
 	struct ntb_hw *hw;
 	int index;
@@ -463,7 +823,21 @@ ntb_attr_set(struct rte_rawdev *dev, const char *attr_name,
 		index = atoi(&attr_name[NTB_SPAD_USER_LEN]);
 		(*hw->ntb_ops->spad_write)(dev, hw->spad_user_list[index],
 					   1, attr_value);
-		NTB_LOG(INFO, "Set attribute (%s) Value (%" PRIu64 ")",
+		NTB_LOG(DEBUG, "Set attribute (%s) Value (%" PRIu64 ")",
+			attr_name, attr_value);
+		return 0;
+	}
+
+	if (!strncmp(attr_name, NTB_QUEUE_SZ_NAME, NTB_ATTR_NAME_LEN)) {
+		hw->queue_size = attr_value;
+		NTB_LOG(DEBUG, "Set attribute (%s) Value (%" PRIu64 ")",
+			attr_name, attr_value);
+		return 0;
+	}
+
+	if (!strncmp(attr_name, NTB_QUEUE_NUM_NAME, NTB_ATTR_NAME_LEN)) {
+		hw->queue_pairs = attr_value;
+		NTB_LOG(DEBUG, "Set attribute (%s) Value (%" PRIu64 ")",
 			attr_name, attr_value);
 		return 0;
 	}
@@ -475,7 +849,7 @@ ntb_attr_set(struct rte_rawdev *dev, const char *attr_name,
 
 static int
 ntb_attr_get(struct rte_rawdev *dev, const char *attr_name,
-				 uint64_t *attr_value)
+	     uint64_t *attr_value)
 {
 	struct ntb_hw *hw;
 	int index;
@@ -489,49 +863,50 @@ ntb_attr_get(struct rte_rawdev *dev, const char *attr_name,
 
 	if (!strncmp(attr_name, NTB_TOPO_NAME, NTB_ATTR_NAME_LEN)) {
 		*attr_value = hw->topo;
-		NTB_LOG(INFO, "Attribute (%s) Value (%" PRIu64 ")",
+		NTB_LOG(DEBUG, "Attribute (%s) Value (%" PRIu64 ")",
 			attr_name, *attr_value);
 		return 0;
 	}
 
 	if (!strncmp(attr_name, NTB_LINK_STATUS_NAME, NTB_ATTR_NAME_LEN)) {
-		*attr_value = hw->link_status;
-		NTB_LOG(INFO, "Attribute (%s) Value (%" PRIu64 ")",
+		/* hw->link_status only indicates hw link status. */
+		*attr_value = hw->link_status && hw->peer_dev_up;
+		NTB_LOG(DEBUG, "Attribute (%s) Value (%" PRIu64 ")",
 			attr_name, *attr_value);
 		return 0;
 	}
 
 	if (!strncmp(attr_name, NTB_SPEED_NAME, NTB_ATTR_NAME_LEN)) {
 		*attr_value = hw->link_speed;
-		NTB_LOG(INFO, "Attribute (%s) Value (%" PRIu64 ")",
+		NTB_LOG(DEBUG, "Attribute (%s) Value (%" PRIu64 ")",
 			attr_name, *attr_value);
 		return 0;
 	}
 
 	if (!strncmp(attr_name, NTB_WIDTH_NAME, NTB_ATTR_NAME_LEN)) {
 		*attr_value = hw->link_width;
-		NTB_LOG(INFO, "Attribute (%s) Value (%" PRIu64 ")",
+		NTB_LOG(DEBUG, "Attribute (%s) Value (%" PRIu64 ")",
 			attr_name, *attr_value);
 		return 0;
 	}
 
 	if (!strncmp(attr_name, NTB_MW_CNT_NAME, NTB_ATTR_NAME_LEN)) {
 		*attr_value = hw->mw_cnt;
-		NTB_LOG(INFO, "Attribute (%s) Value (%" PRIu64 ")",
+		NTB_LOG(DEBUG, "Attribute (%s) Value (%" PRIu64 ")",
 			attr_name, *attr_value);
 		return 0;
 	}
 
 	if (!strncmp(attr_name, NTB_DB_CNT_NAME, NTB_ATTR_NAME_LEN)) {
 		*attr_value = hw->db_cnt;
-		NTB_LOG(INFO, "Attribute (%s) Value (%" PRIu64 ")",
+		NTB_LOG(DEBUG, "Attribute (%s) Value (%" PRIu64 ")",
 			attr_name, *attr_value);
 		return 0;
 	}
 
 	if (!strncmp(attr_name, NTB_SPAD_CNT_NAME, NTB_ATTR_NAME_LEN)) {
 		*attr_value = hw->spad_cnt;
-		NTB_LOG(INFO, "Attribute (%s) Value (%" PRIu64 ")",
+		NTB_LOG(DEBUG, "Attribute (%s) Value (%" PRIu64 ")",
 			attr_name, *attr_value);
 		return 0;
 	}
@@ -542,7 +917,7 @@ ntb_attr_get(struct rte_rawdev *dev, const char *attr_name,
 		index = atoi(&attr_name[NTB_SPAD_USER_LEN]);
 		*attr_value = (*hw->ntb_ops->spad_read)(dev,
 				hw->spad_user_list[index], 0);
-		NTB_LOG(INFO, "Attribute (%s) Value (%" PRIu64 ")",
+		NTB_LOG(DEBUG, "Attribute (%s) Value (%" PRIu64 ")",
 			attr_name, *attr_value);
 		return 0;
 	}
@@ -585,6 +960,7 @@ ntb_xstats_reset(struct rte_rawdev *dev __rte_unused,
 	return 0;
 }
 
+
 static const struct rte_rawdev_ops ntb_ops = {
 	.dev_info_get         = ntb_dev_info_get,
 	.dev_configure        = ntb_dev_configure,
@@ -615,7 +991,6 @@ ntb_init_hw(struct rte_rawdev *dev, struct rte_pci_device *pci_dev)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	struct rte_intr_handle *intr_handle;
-	uint32_t val;
 	int ret, i;
 
 	hw->pci_dev = pci_dev;
@@ -688,45 +1063,6 @@ ntb_init_hw(struct rte_rawdev *dev, struct rte_pci_device *pci_dev)
 	/* enable uio intr after callback register */
 	rte_intr_enable(intr_handle);
 
-	if (hw->ntb_ops->spad_write == NULL) {
-		NTB_LOG(ERR, "Scratchpad is not supported.");
-		return -ENOTSUP;
-	}
-	/* Tell peer the mw_cnt of local side. */
-	ret = (*hw->ntb_ops->spad_write)(dev, SPAD_NUM_MWS, 1, hw->mw_cnt);
-	if (ret) {
-		NTB_LOG(ERR, "Failed to tell peer mw count.");
-		return ret;
-	}
-
-	/* Tell peer each mw size on local side. */
-	for (i = 0; i < hw->mw_cnt; i++) {
-		NTB_LOG(DEBUG, "Local %u mw size: 0x%"PRIx64"", i,
-				hw->mw_size[i]);
-		val = hw->mw_size[i] >> 32;
-		ret = (*hw->ntb_ops->spad_write)
-				(dev, SPAD_MW0_SZ_H + 2 * i, 1, val);
-		if (ret) {
-			NTB_LOG(ERR, "Failed to tell peer mw size.");
-			return ret;
-		}
-
-		val = hw->mw_size[i];
-		ret = (*hw->ntb_ops->spad_write)
-				(dev, SPAD_MW0_SZ_L + 2 * i, 1, val);
-		if (ret) {
-			NTB_LOG(ERR, "Failed to tell peer mw size.");
-			return ret;
-		}
-	}
-
-	/* Ring doorbell 0 to tell peer the device is ready. */
-	ret = (*hw->ntb_ops->peer_db_set)(dev, 0);
-	if (ret) {
-		NTB_LOG(ERR, "Failed to tell peer device is probed.");
-		return ret;
-	}
-
 	return ret;
 }
 
@@ -839,5 +1175,5 @@ RTE_INIT(ntb_init_log)
 {
 	ntb_logtype = rte_log_register("pmd.raw.ntb");
 	if (ntb_logtype >= 0)
-		rte_log_set_level(ntb_logtype, RTE_LOG_DEBUG);
+		rte_log_set_level(ntb_logtype, RTE_LOG_INFO);
 }
diff --git a/drivers/raw/ntb/ntb.h b/drivers/raw/ntb/ntb.h
index d355231b0..69f200b99 100644
--- a/drivers/raw/ntb/ntb.h
+++ b/drivers/raw/ntb/ntb.h
@@ -2,8 +2,8 @@
  * Copyright(c) 2019 Intel Corporation.
  */
 
-#ifndef _NTB_RAWDEV_H_
-#define _NTB_RAWDEV_H_
+#ifndef _NTB_H_
+#define _NTB_H_
 
 #include <stdbool.h>
 
@@ -19,38 +19,13 @@ extern int ntb_logtype;
 /* Device IDs */
 #define NTB_INTEL_DEV_ID_B2B_SKX    0x201C
 
-#define NTB_TOPO_NAME               "topo"
-#define NTB_LINK_STATUS_NAME        "link_status"
-#define NTB_SPEED_NAME              "speed"
-#define NTB_WIDTH_NAME              "width"
-#define NTB_MW_CNT_NAME             "mw_count"
-#define NTB_DB_CNT_NAME             "db_count"
-#define NTB_SPAD_CNT_NAME           "spad_count"
 /* Reserved to app to use. */
 #define NTB_SPAD_USER               "spad_user_"
 #define NTB_SPAD_USER_LEN           (sizeof(NTB_SPAD_USER) - 1)
-#define NTB_SPAD_USER_MAX_NUM       10
+#define NTB_SPAD_USER_MAX_NUM       4
 #define NTB_ATTR_NAME_LEN           30
-#define NTB_ATTR_VAL_LEN            30
-#define NTB_ATTR_MAX                20
-
-/* NTB Attributes */
-struct ntb_attr {
-	/**< Name of the attribute */
-	char name[NTB_ATTR_NAME_LEN];
-	/**< Value or reference of value of attribute */
-	char value[NTB_ATTR_NAME_LEN];
-};
 
-enum ntb_attr_idx {
-	NTB_TOPO_ID = 0,
-	NTB_LINK_STATUS_ID,
-	NTB_SPEED_ID,
-	NTB_WIDTH_ID,
-	NTB_MW_CNT_ID,
-	NTB_DB_CNT_ID,
-	NTB_SPAD_CNT_ID,
-};
+#define NTB_DFLT_TX_FREE_THRESH     256
 
 enum ntb_topo {
 	NTB_TOPO_NONE = 0,
@@ -87,10 +62,15 @@ enum ntb_spad_idx {
 	SPAD_NUM_MWS = 1,
 	SPAD_NUM_QPS,
 	SPAD_Q_SZ,
+	SPAD_USED_MWS,
 	SPAD_MW0_SZ_H,
 	SPAD_MW0_SZ_L,
 	SPAD_MW1_SZ_H,
 	SPAD_MW1_SZ_L,
+	SPAD_MW0_BA_H,
+	SPAD_MW0_BA_L,
+	SPAD_MW1_BA_H,
+	SPAD_MW1_BA_L,
 };
 
 /**
@@ -110,26 +90,97 @@ enum ntb_spad_idx {
  * @vector_bind: Bind vector source [intr] to msix vector [msix].
  */
 struct ntb_dev_ops {
-	int (*ntb_dev_init)(struct rte_rawdev *dev);
-	void *(*get_peer_mw_addr)(struct rte_rawdev *dev, int mw_idx);
-	int (*mw_set_trans)(struct rte_rawdev *dev, int mw_idx,
+	int (*ntb_dev_init)(const struct rte_rawdev *dev);
+	void *(*get_peer_mw_addr)(const struct rte_rawdev *dev, int mw_idx);
+	int (*mw_set_trans)(const struct rte_rawdev *dev, int mw_idx,
 			    uint64_t addr, uint64_t size);
-	int (*get_link_status)(struct rte_rawdev *dev);
-	int (*set_link)(struct rte_rawdev *dev, bool up);
-	uint32_t (*spad_read)(struct rte_rawdev *dev, int spad, bool peer);
-	int (*spad_write)(struct rte_rawdev *dev, int spad,
+	int (*get_link_status)(const struct rte_rawdev *dev);
+	int (*set_link)(const struct rte_rawdev *dev, bool up);
+	uint32_t (*spad_read)(const struct rte_rawdev *dev, int spad,
+			      bool peer);
+	int (*spad_write)(const struct rte_rawdev *dev, int spad,
 			  bool peer, uint32_t spad_v);
-	uint64_t (*db_read)(struct rte_rawdev *dev);
-	int (*db_clear)(struct rte_rawdev *dev, uint64_t db_bits);
-	int (*db_set_mask)(struct rte_rawdev *dev, uint64_t db_mask);
-	int (*peer_db_set)(struct rte_rawdev *dev, uint8_t db_bit);
-	int (*vector_bind)(struct rte_rawdev *dev, uint8_t intr, uint8_t msix);
+	uint64_t (*db_read)(const struct rte_rawdev *dev);
+	int (*db_clear)(const struct rte_rawdev *dev, uint64_t db_bits);
+	int (*db_set_mask)(const struct rte_rawdev *dev, uint64_t db_mask);
+	int (*peer_db_set)(const struct rte_rawdev *dev, uint8_t db_bit);
+	int (*vector_bind)(const struct rte_rawdev *dev, uint8_t intr,
+			   uint8_t msix);
+};
+
+struct ntb_desc {
+	uint64_t addr; /* buffer addr */
+	uint16_t len;  /* buffer length */
+	uint16_t rsv1;
+	uint32_t rsv2;
+};
+
+#define NTB_FLAG_EOP    1 /* end of packet */
+struct ntb_used {
+	uint16_t len;     /* buffer length */
+	uint16_t flags;   /* flags */
+};
+
+struct ntb_rx_entry {
+	struct rte_mbuf *mbuf;
+};
+
+struct ntb_rx_queue {
+	struct ntb_desc *rx_desc_ring;
+	volatile struct ntb_used *rx_used_ring;
+	uint16_t *avail_cnt;
+	volatile uint16_t *used_cnt;
+	uint16_t last_avail;
+	uint16_t last_used;
+	uint16_t nb_rx_desc;
+
+	uint16_t rx_free_thresh;
+
+	struct rte_mempool *mpool; /* mempool for mbuf allocation */
+	struct ntb_rx_entry *sw_ring;
+
+	uint16_t queue_id;         /* DPDK queue index. */
+	uint16_t port_id;          /* Device port identifier. */
+
+	struct ntb_hw *hw;
+};
+
+struct ntb_tx_entry {
+	struct rte_mbuf *mbuf;
+	uint16_t next_id;
+	uint16_t last_id;
+};
+
+struct ntb_tx_queue {
+	volatile struct ntb_desc *tx_desc_ring;
+	struct ntb_used *tx_used_ring;
+	volatile uint16_t *avail_cnt;
+	uint16_t *used_cnt;
+	uint16_t last_avail;          /* Next entry to be freed. */
+	uint16_t last_used;           /* Next entry to be sent. */
+	uint16_t nb_tx_desc;
+
+	/* Total number of TX descriptors ready to be allocated. */
+	uint16_t nb_tx_free;
+	uint16_t tx_free_thresh;
+
+	struct ntb_tx_entry *sw_ring;
+
+	uint16_t queue_id;            /* DPDK queue index. */
+	uint16_t port_id;             /* Device port identifier. */
+
+	struct ntb_hw *hw;
+};
+
+struct ntb_header {
+	uint16_t avail_cnt __rte_cache_aligned;
+	uint16_t used_cnt __rte_cache_aligned;
+	struct ntb_desc desc_ring[] __rte_cache_aligned;
 };
 
 /* ntb private data. */
 struct ntb_hw {
 	uint8_t mw_cnt;
-	uint8_t peer_mw_cnt;
 	uint8_t db_cnt;
 	uint8_t spad_cnt;
 
@@ -147,18 +198,26 @@ struct ntb_hw {
 	struct rte_pci_device *pci_dev;
 	char *hw_addr;
 
-	uint64_t *mw_size;
-	uint64_t *peer_mw_size;
 	uint8_t peer_dev_up;
+	uint64_t *mw_size;
+	/* remote mem base addr */
+	uint64_t *peer_mw_base;
 
 	uint16_t queue_pairs;
 	uint16_t queue_size;
+	uint32_t hdr_size_per_queue;
+
+	struct ntb_rx_queue **rx_queues;
+	struct ntb_tx_queue **tx_queues;
 
-	/**< mem zone to populate RX ring. */
+	/* memzone to populate RX ring. */
 	const struct rte_memzone **mz;
+	uint8_t used_mw_num;
+
+	uint8_t peer_used_mws;
 
 	/* Reserve several spad for app to use. */
 	int spad_user_list[NTB_SPAD_USER_MAX_NUM];
 };
 
-#endif /* _NTB_RAWDEV_H_ */
+#endif /* _NTB_H_ */
diff --git a/drivers/raw/ntb/ntb_hw_intel.c b/drivers/raw/ntb/ntb_hw_intel.c
index 21eaa8511..0e73f1609 100644
--- a/drivers/raw/ntb/ntb_hw_intel.c
+++ b/drivers/raw/ntb/ntb_hw_intel.c
@@ -26,7 +26,7 @@ static enum xeon_ntb_bar intel_ntb_bar[] = {
 };
 
 static int
-intel_ntb_dev_init(struct rte_rawdev *dev)
+intel_ntb_dev_init(const struct rte_rawdev *dev)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	uint8_t reg_val, bar;
@@ -77,7 +77,7 @@ intel_ntb_dev_init(struct rte_rawdev *dev)
 	hw->db_cnt = XEON_DB_COUNT;
 	hw->spad_cnt = XEON_SPAD_COUNT;
 
-	hw->mw_size = rte_zmalloc("uint64_t",
+	hw->mw_size = rte_zmalloc("ntb_mw_size",
 				  hw->mw_cnt * sizeof(uint64_t), 0);
 	for (i = 0; i < hw->mw_cnt; i++) {
 		bar = intel_ntb_bar[i];
@@ -94,7 +94,7 @@ intel_ntb_dev_init(struct rte_rawdev *dev)
 }
 
 static void *
-intel_ntb_get_peer_mw_addr(struct rte_rawdev *dev, int mw_idx)
+intel_ntb_get_peer_mw_addr(const struct rte_rawdev *dev, int mw_idx)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	uint8_t bar;
@@ -116,7 +116,7 @@ intel_ntb_get_peer_mw_addr(struct rte_rawdev *dev, int mw_idx)
 }
 
 static int
-intel_ntb_mw_set_trans(struct rte_rawdev *dev, int mw_idx,
+intel_ntb_mw_set_trans(const struct rte_rawdev *dev, int mw_idx,
 		       uint64_t addr, uint64_t size)
 {
 	struct ntb_hw *hw = dev->dev_private;
@@ -163,7 +163,7 @@ intel_ntb_mw_set_trans(struct rte_rawdev *dev, int mw_idx,
 }
 
 static int
-intel_ntb_get_link_status(struct rte_rawdev *dev)
+intel_ntb_get_link_status(const struct rte_rawdev *dev)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	uint16_t reg_val;
@@ -195,7 +195,7 @@ intel_ntb_get_link_status(struct rte_rawdev *dev)
 }
 
 static int
-intel_ntb_set_link(struct rte_rawdev *dev, bool up)
+intel_ntb_set_link(const struct rte_rawdev *dev, bool up)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	uint32_t ntb_ctrl, reg_off;
@@ -221,7 +221,7 @@ intel_ntb_set_link(struct rte_rawdev *dev, bool up)
 }
 
 static uint32_t
-intel_ntb_spad_read(struct rte_rawdev *dev, int spad, bool peer)
+intel_ntb_spad_read(const struct rte_rawdev *dev, int spad, bool peer)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	uint32_t spad_v, reg_off;
@@ -241,7 +241,7 @@ intel_ntb_spad_read(struct rte_rawdev *dev, int spad, bool peer)
 }
 
 static int
-intel_ntb_spad_write(struct rte_rawdev *dev, int spad,
+intel_ntb_spad_write(const struct rte_rawdev *dev, int spad,
 		     bool peer, uint32_t spad_v)
 {
 	struct ntb_hw *hw = dev->dev_private;
@@ -263,7 +263,7 @@ intel_ntb_spad_write(struct rte_rawdev *dev, int spad,
 }
 
 static uint64_t
-intel_ntb_db_read(struct rte_rawdev *dev)
+intel_ntb_db_read(const struct rte_rawdev *dev)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	uint64_t db_off, db_bits;
@@ -278,7 +278,7 @@ intel_ntb_db_read(struct rte_rawdev *dev)
 }
 
 static int
-intel_ntb_db_clear(struct rte_rawdev *dev, uint64_t db_bits)
+intel_ntb_db_clear(const struct rte_rawdev *dev, uint64_t db_bits)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	uint64_t db_off;
@@ -293,7 +293,7 @@ intel_ntb_db_clear(struct rte_rawdev *dev, uint64_t db_bits)
 }
 
 static int
-intel_ntb_db_set_mask(struct rte_rawdev *dev, uint64_t db_mask)
+intel_ntb_db_set_mask(const struct rte_rawdev *dev, uint64_t db_mask)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	uint64_t db_m_off;
@@ -312,7 +312,7 @@ intel_ntb_db_set_mask(struct rte_rawdev *dev, uint64_t db_mask)
 }
 
 static int
-intel_ntb_peer_db_set(struct rte_rawdev *dev, uint8_t db_idx)
+intel_ntb_peer_db_set(const struct rte_rawdev *dev, uint8_t db_idx)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	uint32_t db_off;
@@ -332,7 +332,7 @@ intel_ntb_peer_db_set(struct rte_rawdev *dev, uint8_t db_idx)
 }
 
 static int
-intel_ntb_vector_bind(struct rte_rawdev *dev, uint8_t intr, uint8_t msix)
+intel_ntb_vector_bind(const struct rte_rawdev *dev, uint8_t intr, uint8_t msix)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	uint8_t reg_off;
diff --git a/drivers/raw/ntb/rte_pmd_ntb.h b/drivers/raw/ntb/rte_pmd_ntb.h
new file mode 100644
index 000000000..6591ce793
--- /dev/null
+++ b/drivers/raw/ntb/rte_pmd_ntb.h
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Intel Corporation.
+ */
+
+#ifndef _RTE_PMD_NTB_H_
+#define _RTE_PMD_NTB_H_
+
+/* App needs to set/get these attrs */
+#define NTB_QUEUE_SZ_NAME           "queue_size"
+#define NTB_QUEUE_NUM_NAME          "queue_num"
+#define NTB_TOPO_NAME               "topo"
+#define NTB_LINK_STATUS_NAME        "link_status"
+#define NTB_SPEED_NAME              "speed"
+#define NTB_WIDTH_NAME              "width"
+#define NTB_MW_CNT_NAME             "mw_count"
+#define NTB_DB_CNT_NAME             "db_count"
+#define NTB_SPAD_CNT_NAME           "spad_count"
+
+#define NTB_MAX_DESC_SIZE           1024
+#define NTB_MIN_DESC_SIZE           64
+
+struct ntb_dev_info {
+	uint32_t ntb_hdr_size;
+	/** Whether the memzone must be aligned to the mw size. */
+	uint8_t mw_size_align;
+	uint8_t mw_cnt;
+	uint64_t *mw_size;
+};
+
+struct ntb_dev_config {
+	uint16_t num_queues;
+	uint16_t queue_size;
+	uint8_t mz_num;
+	const struct rte_memzone **mz_list;
+};
+
+struct ntb_queue_conf {
+	uint16_t nb_desc;
+	uint16_t tx_free_thresh;
+	struct rte_mempool *rx_mp;
+};
+
+#endif /* _RTE_PMD_NTB_H_ */
-- 
2.17.1
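
For readers following the handshake and shared-header sizing in the
patch above, here is a minimal standalone sketch of the two pieces of
arithmetic involved: a 64-bit memory window size is exchanged as two
32-bit scratchpad values, and the per-queue header is the fixed header
plus one descriptor and one used entry per ring slot, rounded up to a
cache line. This is not driver code. All sketch_* names are
hypothetical, and the 64-byte cache line is an assumption (the driver
uses RTE_CACHE_LINE_SIZE and RTE_ALIGN).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CACHE_LINE 64u /* assumed cache line size */

/* Split a 64-bit value into the high/low 32-bit scratchpad words. */
static inline void
sketch_spad_split(uint64_t v, uint32_t *hi, uint32_t *lo)
{
	*hi = (uint32_t)(v >> 32);
	*lo = (uint32_t)v;
}

/* Rebuild the 64-bit value from the two scratchpad words. */
static inline uint64_t
sketch_spad_join(uint32_t hi, uint32_t lo)
{
	return ((uint64_t)hi << 32) | lo;
}

/* Round size up to a multiple of align; align must be a power of two. */
static inline size_t
sketch_align_up(size_t size, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (size + align - 1) & ~(align - 1);
}

/*
 * Per-queue header bytes: the fixed header, then q_size descriptors,
 * then q_size used entries, rounded up to a cache line.
 */
static inline size_t
sketch_hdr_size(size_t fixed_hdr, size_t desc_sz, size_t used_sz,
		uint16_t q_size)
{
	return sketch_align_up(fixed_hdr + (size_t)q_size * desc_sz +
			       (size_t)q_size * used_sz,
			       SKETCH_CACHE_LINE);
}
```

Assuming a 128-byte fixed header, 16-byte descriptors and 4-byte used
entries, a 64-entry queue would need sketch_hdr_size(128, 16, 4, 64)
bytes, and the total reported to the application would be that value
times the number of queue pairs.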



* [dpdk-dev] [PATCH v5 2/4] raw/ntb: add xstats support
  2019-09-24  8:43       ` [dpdk-dev] [PATCH v5 0/4] enable FIFO " Xiaoyun Li
  2019-09-24  8:43         ` [dpdk-dev] [PATCH v5 1/4] raw/ntb: setup ntb queue Xiaoyun Li
@ 2019-09-24  8:43         ` Xiaoyun Li
  2019-09-24  8:43         ` [dpdk-dev] [PATCH v5 3/4] raw/ntb: add enqueue and dequeue functions Xiaoyun Li
                           ` (2 subsequent siblings)
  4 siblings, 0 replies; 42+ messages in thread
From: Xiaoyun Li @ 2019-09-24  8:43 UTC (permalink / raw)
  To: jingjing.wu, keith.wiles, omkar.maslekar, cunming.liang; +Cc: dev, Xiaoyun Li

Add xstats support for ntb rawdev.
Support tx-packets, tx-bytes, tx-errors and
rx-packets, rx-bytes, rx-missed.

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
---
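Note on the counter layout: the stats array is flat, with the totals in
the first NTB_XSTATS_NUM slots and one block of NTB_XSTATS_NUM slots per
queue after that. A reset does not clear the raw counter; it records the
current value as an offset, and readers report the difference. The
following is a minimal standalone sketch of that arithmetic, not the
driver code. The names xstat_index, xstat_since_reset and XSTATS_NUM are
hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define XSTATS_NUM 6 /* assumption: mirrors the six counters in this patch */

/* Flat index of counter `stat` for queue `qp`; block 0 holds the totals. */
static inline uint32_t
xstat_index(uint16_t qp, uint32_t stat)
{
	assert(stat < XSTATS_NUM);
	return XSTATS_NUM * ((uint32_t)qp + 1) + stat;
}

/*
 * Value accumulated since the last reset. Unsigned subtraction is
 * modulo 2^64, so a raw counter that wrapped past the offset still
 * yields the right difference.
 */
static inline uint64_t
xstat_since_reset(uint64_t offset, uint64_t raw)
{
	return raw - offset;
}
```

With this layout, resetting a per-queue counter amounts to copying the
raw value into the offset slot at the same index.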
 drivers/raw/ntb/ntb.c | 170 +++++++++++++++++++++++++++++++++++++-----
 drivers/raw/ntb/ntb.h |  12 +++
 2 files changed, 164 insertions(+), 18 deletions(-)

diff --git a/drivers/raw/ntb/ntb.c b/drivers/raw/ntb/ntb.c
index fdc6c8a30..1367b2f59 100644
--- a/drivers/raw/ntb/ntb.c
+++ b/drivers/raw/ntb/ntb.c
@@ -30,6 +30,17 @@ static const struct rte_pci_id pci_id_ntb_map[] = {
 	{ .vendor_id = 0, /* sentinel */ },
 };
 
+/* Align with enum ntb_xstats_idx */
+static struct rte_rawdev_xstats_name ntb_xstats_names[] = {
+	{"Tx-packets"},
+	{"Tx-bytes"},
+	{"Tx-errors"},
+	{"Rx-packets"},
+	{"Rx-bytes"},
+	{"Rx-missed"},
+};
+#define NTB_XSTATS_NUM RTE_DIM(ntb_xstats_names)
+
 static inline void
 ntb_link_cleanup(struct rte_rawdev *dev)
 {
@@ -542,6 +553,12 @@ ntb_queue_init(struct rte_rawdev *dev, uint16_t qp_id)
 	txq->last_avail = 0;
 	txq->nb_tx_free = txq->nb_tx_desc - 1;
 
+	/* Set per queue stats. */
+	for (i = 0; i < NTB_XSTATS_NUM; i++) {
+		hw->ntb_xstats[i + NTB_XSTATS_NUM * (qp_id + 1)] = 0;
+		hw->ntb_xstats_off[i + NTB_XSTATS_NUM * (qp_id + 1)] = 0;
+	}
+
 	return 0;
 }
 
@@ -618,6 +635,7 @@ ntb_dev_configure(const struct rte_rawdev *dev, rte_rawdev_obj_t config)
 {
 	struct ntb_dev_config *conf = config;
 	struct ntb_hw *hw = dev->dev_private;
+	uint32_t xstats_num;
 	int ret;
 
 	hw->queue_pairs	= conf->num_queues;
@@ -628,6 +646,12 @@ ntb_dev_configure(const struct rte_rawdev *dev, rte_rawdev_obj_t config)
 			sizeof(struct ntb_rx_queue *) * hw->queue_pairs, 0);
 	hw->tx_queues = rte_zmalloc("ntb_tx_queues",
 			sizeof(struct ntb_tx_queue *) * hw->queue_pairs, 0);
+	/* First total stats, then per queue stats. */
+	xstats_num = (hw->queue_pairs + 1) * NTB_XSTATS_NUM;
+	hw->ntb_xstats = rte_zmalloc("ntb_xstats", xstats_num *
+				     sizeof(uint64_t), 0);
+	hw->ntb_xstats_off = rte_zmalloc("ntb_xstats_off", xstats_num *
+					 sizeof(uint64_t), 0);
 
 	/* Start handshake with the peer. */
 	ret = ntb_handshake_work(dev);
@@ -654,6 +678,12 @@ ntb_dev_start(struct rte_rawdev *dev)
 	if (!hw->link_status || !hw->peer_dev_up)
 		return -EINVAL;
 
+	/* Set total stats. */
+	for (i = 0; i < NTB_XSTATS_NUM; i++) {
+		hw->ntb_xstats[i] = 0;
+		hw->ntb_xstats_off[i] = 0;
+	}
+
 	for (i = 0; i < hw->queue_pairs; i++) {
 		ret = ntb_queue_init(dev, i);
 		if (ret) {
@@ -927,39 +957,143 @@ ntb_attr_get(struct rte_rawdev *dev, const char *attr_name,
 	return -EINVAL;
 }
 
+static inline uint64_t
+ntb_stats_update(uint64_t offset, uint64_t stat)
+{
+	if (stat >= offset)
+		return (stat - offset);
+	else
+		return (uint64_t)(((uint64_t)-1) - offset + stat + 1);
+}
+
 static int
-ntb_xstats_get(const struct rte_rawdev *dev __rte_unused,
-	       const unsigned int ids[] __rte_unused,
-	       uint64_t values[] __rte_unused,
-	       unsigned int n __rte_unused)
+ntb_xstats_get(const struct rte_rawdev *dev,
+	       const unsigned int ids[],
+	       uint64_t values[],
+	       unsigned int n)
 {
-	return 0;
+	struct ntb_hw *hw = dev->dev_private;
+	uint32_t i, j, off, xstats_num;
+
+	/* Calculate total stats of all queues. */
+	for (i = 0; i < NTB_XSTATS_NUM; i++) {
+		hw->ntb_xstats[i] = 0;
+		for (j = 0; j < hw->queue_pairs; j++) {
+			off = NTB_XSTATS_NUM * (j + 1) + i;
+			hw->ntb_xstats[i] +=
+			ntb_stats_update(hw->ntb_xstats_off[off],
+					 hw->ntb_xstats[off]);
+		}
+	}
+
+	xstats_num = NTB_XSTATS_NUM * (hw->queue_pairs + 1);
+	for (i = 0; i < n && ids[i] < xstats_num; i++) {
+		if (ids[i] < NTB_XSTATS_NUM)
+			values[i] = hw->ntb_xstats[ids[i]];
+		else
+			values[i] =
+			ntb_stats_update(hw->ntb_xstats_off[ids[i]],
+					 hw->ntb_xstats[ids[i]]);
+	}
+
+	return i;
 }
 
 static int
-ntb_xstats_get_names(const struct rte_rawdev *dev __rte_unused,
-		     struct rte_rawdev_xstats_name *xstats_names __rte_unused,
-		     unsigned int size __rte_unused)
+ntb_xstats_get_names(const struct rte_rawdev *dev,
+		     struct rte_rawdev_xstats_name *xstats_names,
+		     unsigned int size)
 {
-	return 0;
+	struct ntb_hw *hw = dev->dev_private;
+	uint32_t xstats_num, i, j, off;
+
+	xstats_num = NTB_XSTATS_NUM * (hw->queue_pairs + 1);
+	if (xstats_names == NULL || size < xstats_num)
+		return xstats_num;
+
+	/* Total stats names */
+	memcpy(xstats_names, ntb_xstats_names, sizeof(ntb_xstats_names));
+
+	/* Queue stats names */
+	for (i = 0; i < hw->queue_pairs; i++) {
+		for (j = 0; j < NTB_XSTATS_NUM; j++) {
+			off = j + (i + 1) * NTB_XSTATS_NUM;
+			snprintf(xstats_names[off].name,
+				sizeof(xstats_names[0].name),
+				"%s_q%u", ntb_xstats_names[j].name, i);
+		}
+	}
+
+	return xstats_num;
 }
 
 static uint64_t
-ntb_xstats_get_by_name(const struct rte_rawdev *dev __rte_unused,
-		       const char *name __rte_unused,
-		       unsigned int *id __rte_unused)
+ntb_xstats_get_by_name(const struct rte_rawdev *dev,
+		       const char *name, unsigned int *id)
 {
-	return 0;
+	struct rte_rawdev_xstats_name *xstats_names;
+	struct ntb_hw *hw = dev->dev_private;
+	uint32_t xstats_num, i, j, off;
+
+	if (name == NULL)
+		return -EINVAL;
+
+	xstats_num = NTB_XSTATS_NUM * (hw->queue_pairs + 1);
+	xstats_names = rte_zmalloc("ntb_stats_name",
+				   sizeof(struct rte_rawdev_xstats_name) *
+				   xstats_num, 0);
+	ntb_xstats_get_names(dev, xstats_names, xstats_num);
+
+	/* Calculate total stats of all queues. */
+	for (i = 0; i < NTB_XSTATS_NUM; i++) {
+		hw->ntb_xstats[i] = 0;
+		for (j = 0; j < hw->queue_pairs; j++) {
+			off = NTB_XSTATS_NUM * (j + 1) + i;
+			hw->ntb_xstats[i] += ntb_stats_update(
+				hw->ntb_xstats_off[off], hw->ntb_xstats[off]);
+		}
+	}
+
+	for (i = 0; i < xstats_num; i++) {
+		if (!strncmp(name, xstats_names[i].name,
+		    RTE_RAW_DEV_XSTATS_NAME_SIZE)) {
+			*id = i;
+			rte_free(xstats_names);
+			if (i < NTB_XSTATS_NUM)
+				return hw->ntb_xstats[i];
+			else
+				return ntb_stats_update(hw->ntb_xstats_off[i],
+							hw->ntb_xstats[i]);
+		}
+	}
+
+	NTB_LOG(ERR, "Cannot find the xstats name.");
+	rte_free(xstats_names);
+	return -EINVAL;
 }
 
 static int
-ntb_xstats_reset(struct rte_rawdev *dev __rte_unused,
-		 const uint32_t ids[] __rte_unused,
-		 uint32_t nb_ids __rte_unused)
+ntb_xstats_reset(struct rte_rawdev *dev,
+		 const uint32_t ids[],
+		 uint32_t nb_ids)
 {
-	return 0;
-}
+	struct ntb_hw *hw = dev->dev_private;
+	uint32_t i, j, off, xstats_num;
+
+	xstats_num = NTB_XSTATS_NUM * (hw->queue_pairs + 1);
+	for (i = 0; i < nb_ids && ids[i] < xstats_num; i++) {
+		if (ids[i] < NTB_XSTATS_NUM) {
+			for (j = 0; j < hw->queue_pairs; j++) {
+				off = NTB_XSTATS_NUM * (j + 1) + ids[i];
+				hw->ntb_xstats_off[off] = hw->ntb_xstats[off];
+			}
+		} else {
+			hw->ntb_xstats_off[ids[i]] = hw->ntb_xstats[ids[i]];
+		}
+	}
 
+	return i;
+}
 
 static const struct rte_rawdev_ops ntb_ops = {
 	.dev_info_get         = ntb_dev_info_get,
diff --git a/drivers/raw/ntb/ntb.h b/drivers/raw/ntb/ntb.h
index 69f200b99..3cc160680 100644
--- a/drivers/raw/ntb/ntb.h
+++ b/drivers/raw/ntb/ntb.h
@@ -27,6 +27,15 @@ extern int ntb_logtype;
 
 #define NTB_DFLT_TX_FREE_THRESH     256
 
+enum ntb_xstats_idx {
+	NTB_TX_PKTS_ID = 0,
+	NTB_TX_BYTES_ID,
+	NTB_TX_ERRS_ID,
+	NTB_RX_PKTS_ID,
+	NTB_RX_BYTES_ID,
+	NTB_RX_MISS_ID,
+};
+
 enum ntb_topo {
 	NTB_TOPO_NONE = 0,
 	NTB_TOPO_B2B_USD,
@@ -216,6 +225,9 @@ struct ntb_hw {
 
 	uint8_t peer_used_mws;
 
+	uint64_t *ntb_xstats;
+	uint64_t *ntb_xstats_off;
+
 	/* Reserve several spad for app to use. */
 	int spad_user_list[NTB_SPAD_USER_MAX_NUM];
 };
-- 
2.17.1


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [dpdk-dev] [PATCH v5 3/4] raw/ntb: add enqueue and dequeue functions
  2019-09-24  8:43       ` [dpdk-dev] [PATCH v5 0/4] enable FIFO " Xiaoyun Li
  2019-09-24  8:43         ` [dpdk-dev] [PATCH v5 1/4] raw/ntb: setup ntb queue Xiaoyun Li
  2019-09-24  8:43         ` [dpdk-dev] [PATCH v5 2/4] raw/ntb: add xstats support Xiaoyun Li
@ 2019-09-24  8:43         ` Xiaoyun Li
  2019-09-24  8:43         ` [dpdk-dev] [PATCH v5 4/4] examples/ntb: support more functions for NTB Xiaoyun Li
  2019-09-26  3:20         ` [dpdk-dev] [PATCH v6 0/4] enable FIFO " Xiaoyun Li
  4 siblings, 0 replies; 42+ messages in thread
From: Xiaoyun Li @ 2019-09-24  8:43 UTC (permalink / raw)
  To: jingjing.wu, keith.wiles, omkar.maslekar, cunming.liang; +Cc: dev, Xiaoyun Li

Introduce enqueue and dequeue functions to support packet-based
processing. Also enable write combining for the ntb driver, since it
improves performance significantly.

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
---
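Note on the ring arithmetic: both enqueue and dequeue stage entries in a
local array and then copy them into a power-of-two ring in at most two
pieces, one up to the end of the ring and one from its start. The sketch
below shows that pattern in isolation under simplified assumptions
(plain uint32_t entries, no remote memory, no barriers). The helper
names ring_pending and ring_copy_in are hypothetical and not part of the
driver.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Entries produced but not yet consumed; ring_size must be a power of two. */
static inline uint16_t
ring_pending(uint16_t prod, uint16_t cons, uint16_t ring_size)
{
	assert(ring_size != 0 && (ring_size & (ring_size - 1)) == 0);
	return (uint16_t)(prod - cons) & (ring_size - 1);
}

/* Copy n staged entries into the ring at slot `start`, wrapping once. */
static inline void
ring_copy_in(uint32_t *ring, uint16_t ring_size, uint16_t start,
	     const uint32_t *src, uint16_t n)
{
	uint16_t nb1 = n, nb2 = 0;

	assert(start < ring_size && n <= ring_size);
	if (n > ring_size - start) {
		nb1 = ring_size - start; /* part that fits before the end */
		nb2 = n - nb1;           /* remainder, placed at slot 0 */
	}
	memcpy(ring + start, src, sizeof(*src) * nb1);
	memcpy(ring, src + nb1, sizeof(*src) * nb2);
}
```

In a real producer the index that publishes the new entries has to
become visible to the peer only after the copied entries are visible.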
 doc/guides/rawdevs/ntb.rst     |  28 ++++
 drivers/raw/ntb/ntb.c          | 242 ++++++++++++++++++++++++++++++---
 drivers/raw/ntb/ntb.h          |   2 +
 drivers/raw/ntb/ntb_hw_intel.c |  22 +++
 4 files changed, 275 insertions(+), 19 deletions(-)
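
Note on the address translation: the new ioremap op maps an address in
the peer's address space to the local view of the matching memory
window. A minimal sketch of the idea follows, assuming a hypothetical
struct mw_window and helper mw_translate (neither exists in the driver)
and treating each window as the half-open range [base, base + size).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one memory window, for illustration only. */
struct mw_window {
	uint64_t peer_base;  /* start of the window in the peer's address space */
	uint64_t size;       /* window length in bytes */
	uint8_t *local_base; /* where the same window is mapped locally */
};

/* Return the local address for peer_addr, or NULL if no window covers it. */
static inline void *
mw_translate(const struct mw_window *mw, size_t nb_mw, uint64_t peer_addr)
{
	size_t i;

	assert(mw != NULL);
	for (i = 0; i < nb_mw; i++) {
		if (peer_addr >= mw[i].peer_base &&
		    peer_addr - mw[i].peer_base < mw[i].size)
			return mw[i].local_base +
			       (peer_addr - mw[i].peer_base);
	}
	return NULL;
}
```

Comparing the offset against the size, rather than the address against
base + size, avoids overflow when a window ends near the top of the
address space.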

diff --git a/doc/guides/rawdevs/ntb.rst b/doc/guides/rawdevs/ntb.rst
index 99e7db441..afd5769fc 100644
--- a/doc/guides/rawdevs/ntb.rst
+++ b/doc/guides/rawdevs/ntb.rst
@@ -45,6 +45,24 @@ to use, i.e. igb_uio, vfio. The ``dpdk-devbind.py`` script can be used to
 show devices status and to bind them to a suitable kernel driver. They will
 appear under the category of "Misc (rawdev) devices".
 
+Prerequisites
+-------------
+The NTB PMD needs the kernel PCI driver to support write combining (WC) for
+good performance; throughput can differ by more than 10 times.
+There are two ways to enable WC.
+- If the igb_uio driver is used, insert it with the ``wc_active=1`` flag.
+     insmod igb_uio.ko wc_active=1
+- Enable WC manually for BAR 2 and BAR 4 (mapped memory) of the NTB device.
+     Get the BAR base addresses with ``lspci -vvv -s ae:00.0 | grep Region``.
+        Region 0: Memory at 39bfe0000000 (64-bit, prefetchable) [size=64K]
+        Region 2: Memory at 39bfa0000000 (64-bit, prefetchable) [size=512M]
+        Region 4: Memory at 39bfc0000000 (64-bit, prefetchable) [size=512M]
+     Use the following commands to enable WC.
+     echo "base=0x39bfa0000000 size=0x400000 type=write-combining" >> /proc/mtrr
+     echo "base=0x39bfc0000000 size=0x400000 type=write-combining" >> /proc/mtrr
+     To disable WC for these regions, use the following command.
+     echo "disable=1" >> /proc/mtrr
+
 Ring Layout
 -----------
 
@@ -83,6 +101,16 @@ like the following:
       +------------------------+   +------------------------+
                     <---------traffic---------
 
+- Enqueue and Dequeue
+  Based on this ring layout, enqueue reads rx_tail to learn how many free
+  buffers are available, then writes used_ring and tx_tail to tell the peer
+  which buffers are filled with data.
+  Dequeue reads tx_tail to learn how many packets have arrived, then
+  writes desc_ring and rx_tail to tell the peer about the newly allocated
+  buffers.
+  In this way, only remote writes happen and remote reads are avoided,
+  which gives better performance.
+
 Limitation
 ----------
 
diff --git a/drivers/raw/ntb/ntb.c b/drivers/raw/ntb/ntb.c
index 1367b2f59..d0f4f1af0 100644
--- a/drivers/raw/ntb/ntb.c
+++ b/drivers/raw/ntb/ntb.c
@@ -562,26 +562,140 @@ ntb_queue_init(struct rte_rawdev *dev, uint16_t qp_id)
 	return 0;
 }
 
+static inline void
+ntb_enqueue_cleanup(struct ntb_tx_queue *txq)
+{
+	struct ntb_tx_entry *sw_ring = txq->sw_ring;
+	uint16_t tx_free = txq->last_avail;
+	uint16_t nb_to_clean, i;
+
+	/* avail_cnt + 1 represents where to rx next in the peer. */
+	nb_to_clean = (*txq->avail_cnt - txq->last_avail + 1 +
+			txq->nb_tx_desc) & (txq->nb_tx_desc - 1);
+	nb_to_clean = RTE_MIN(nb_to_clean, txq->tx_free_thresh);
+	for (i = 0; i < nb_to_clean; i++) {
+		if (sw_ring[tx_free].mbuf)
+			rte_pktmbuf_free_seg(sw_ring[tx_free].mbuf);
+		tx_free = (tx_free + 1) & (txq->nb_tx_desc - 1);
+	}
+
+	txq->nb_tx_free += nb_to_clean;
+	txq->last_avail = tx_free;
+}
+
 static int
 ntb_enqueue_bufs(struct rte_rawdev *dev,
 		 struct rte_rawdev_buf **buffers,
 		 unsigned int count,
 		 rte_rawdev_obj_t context)
 {
-	/* Not FIFO right now. Just for testing memory write. */
 	struct ntb_hw *hw = dev->dev_private;
-	unsigned int i;
-	void *bar_addr;
-	size_t size;
+	struct ntb_tx_queue *txq = hw->tx_queues[(size_t)context];
+	struct ntb_tx_entry *sw_ring = txq->sw_ring;
+	struct rte_mbuf *txm;
+	struct ntb_used tx_used[NTB_MAX_DESC_SIZE];
+	volatile struct ntb_desc *tx_item;
+	uint16_t tx_last, nb_segs, off, last_used, avail_cnt;
+	uint16_t nb_mbufs = 0;
+	uint16_t nb_tx = 0;
+	uint64_t bytes = 0;
+	void *buf_addr;
+	int i;
 
-	if (hw->ntb_ops->get_peer_mw_addr == NULL)
-		return -ENOTSUP;
-	bar_addr = (*hw->ntb_ops->get_peer_mw_addr)(dev, 0);
-	size = (size_t)context;
+	if (unlikely(hw->ntb_ops->ioremap == NULL)) {
+		NTB_LOG(ERR, "Ioremap not supported.");
+		return nb_tx;
+	}
 
-	for (i = 0; i < count; i++)
-		rte_memcpy(bar_addr, buffers[i]->buf_addr, size);
-	return 0;
+	if (unlikely(dev->started == 0 || hw->peer_dev_up == 0)) {
+		NTB_LOG(DEBUG, "Link is not up.");
+		return nb_tx;
+	}
+
+	if (txq->nb_tx_free < txq->tx_free_thresh)
+		ntb_enqueue_cleanup(txq);
+
+	off = NTB_XSTATS_NUM * ((size_t)context + 1);
+	last_used = txq->last_used;
+	avail_cnt = *txq->avail_cnt;/* Where to alloc next. */
+	for (nb_tx = 0; nb_tx < count; nb_tx++) {
+		txm = (struct rte_mbuf *)(buffers[nb_tx]->buf_addr);
+		if (txm == NULL || txq->nb_tx_free < txm->nb_segs)
+			break;
+
+		tx_last = (txq->last_used + txm->nb_segs - 1) &
+			  (txq->nb_tx_desc - 1);
+		nb_segs = txm->nb_segs;
+		for (i = 0; i < nb_segs; i++) {
+			/* Not enough ring space for tx. */
+			if (txq->last_used == avail_cnt)
+				goto end_of_tx;
+			sw_ring[txq->last_used].mbuf = txm;
+			tx_item = txq->tx_desc_ring + txq->last_used;
+
+			if (!tx_item->len) {
+				(hw->ntb_xstats[NTB_TX_ERRS_ID + off])++;
+				goto end_of_tx;
+			}
+			if (txm->data_len > tx_item->len) {
+				NTB_LOG(ERR, "Data length exceeds buf length."
+					" Only %u data would be transmitted.",
+					tx_item->len);
+				txm->data_len = tx_item->len;
+			}
+
+			/* translate remote virtual addr to bar virtual addr */
+			buf_addr = (*hw->ntb_ops->ioremap)(dev, tx_item->addr);
+			if (buf_addr == NULL) {
+				(hw->ntb_xstats[NTB_TX_ERRS_ID + off])++;
+				NTB_LOG(ERR, "Null remap addr.");
+				goto end_of_tx;
+			}
+			rte_memcpy(buf_addr, rte_pktmbuf_mtod(txm, void *),
+				   txm->data_len);
+
+			tx_used[nb_mbufs].len = txm->data_len;
+			tx_used[nb_mbufs++].flags = (txq->last_used ==
+						    tx_last) ?
+						    NTB_FLAG_EOP : 0;
+
+			/* update stats */
+			bytes += txm->data_len;
+
+			txm = txm->next;
+
+			sw_ring[txq->last_used].next_id = (txq->last_used + 1) &
+						  (txq->nb_tx_desc - 1);
+			sw_ring[txq->last_used].last_id = tx_last;
+			txq->last_used = (txq->last_used + 1) &
+					 (txq->nb_tx_desc - 1);
+		}
+		txq->nb_tx_free -= nb_segs;
+	}
+
+end_of_tx:
+	if (nb_tx) {
+		uint16_t nb1, nb2;
+		if (nb_mbufs > txq->nb_tx_desc - last_used) {
+			nb1 = txq->nb_tx_desc - last_used;
+			nb2 = nb_mbufs - txq->nb_tx_desc + last_used;
+		} else {
+			nb1 = nb_mbufs;
+			nb2 = 0;
+		}
+		rte_memcpy(txq->tx_used_ring + last_used, tx_used,
+			   sizeof(struct ntb_used) * nb1);
+		rte_memcpy(txq->tx_used_ring, tx_used + nb1,
+			   sizeof(struct ntb_used) * nb2);
+		*txq->used_cnt = txq->last_used;
+		rte_wmb();
+
+		/* update queue stats */
+		hw->ntb_xstats[NTB_TX_BYTES_ID + off] += bytes;
+		hw->ntb_xstats[NTB_TX_PKTS_ID + off] += nb_tx;
+	}
+
+	return nb_tx;
 }
 
 static int
@@ -590,16 +704,106 @@ ntb_dequeue_bufs(struct rte_rawdev *dev,
 		 unsigned int count,
 		 rte_rawdev_obj_t context)
 {
-	/* Not FIFO. Just for testing memory read. */
 	struct ntb_hw *hw = dev->dev_private;
-	unsigned int i;
-	size_t size;
+	struct ntb_rx_queue *rxq = hw->rx_queues[(size_t)context];
+	struct ntb_rx_entry *sw_ring = rxq->sw_ring;
+	struct ntb_desc rx_desc[NTB_MAX_DESC_SIZE];
+	struct rte_mbuf *first, *rxm_t;
+	struct rte_mbuf *prev = NULL;
+	volatile struct ntb_used *rx_item;
+	uint16_t nb_mbufs = 0;
+	uint16_t nb_rx = 0;
+	uint64_t bytes = 0;
+	uint16_t off, last_avail, used_cnt, used_nb;
+	int i;
+
+	if (unlikely(dev->started == 0 || hw->peer_dev_up == 0)) {
+		NTB_LOG(DEBUG, "Link is not up");
+		return nb_rx;
+	}
+
+	used_cnt = *rxq->used_cnt;
+
+	if (rxq->last_used == used_cnt)
+		return nb_rx;
+
+	last_avail = rxq->last_avail;
+	used_nb = (used_cnt - rxq->last_used) & (rxq->nb_rx_desc - 1);
+	count = RTE_MIN(count, used_nb);
+	for (nb_rx = 0; nb_rx < count; nb_rx++) {
+		i = 0;
+		while (true) {
+			rx_item = rxq->rx_used_ring + rxq->last_used;
+			rxm_t = sw_ring[rxq->last_used].mbuf;
+			rxm_t->data_len = rx_item->len;
+			rxm_t->data_off = RTE_PKTMBUF_HEADROOM;
+			rxm_t->port = rxq->port_id;
+
+			if (!i) {
+				rxm_t->nb_segs = 1;
+				first = rxm_t;
+				first->pkt_len = 0;
+				buffers[nb_rx]->buf_addr = rxm_t;
+			} else {
+				prev->next = rxm_t;
+				first->nb_segs++;
+			}
 
-	size = (size_t)context;
+			prev = rxm_t;
+			first->pkt_len += prev->data_len;
+			rxq->last_used = (rxq->last_used + 1) &
+					 (rxq->nb_rx_desc - 1);
 
-	for (i = 0; i < count; i++)
-		rte_memcpy(buffers[i]->buf_addr, hw->mz[i]->addr, size);
-	return 0;
+			/* alloc new mbuf */
+			rxm_t = rte_mbuf_raw_alloc(rxq->mpool);
+			if (unlikely(rxm_t == NULL)) {
+				NTB_LOG(ERR, "recv alloc mbuf failed.");
+				goto end_of_rx;
+			}
+			rxm_t->port = rxq->port_id;
+			sw_ring[rxq->last_avail].mbuf = rxm_t;
+			i++;
+
+			/* fill new desc */
+			rx_desc[nb_mbufs].addr =
+					rte_pktmbuf_mtod(rxm_t, size_t);
+			rx_desc[nb_mbufs++].len = rxm_t->buf_len -
+						  RTE_PKTMBUF_HEADROOM;
+			rxq->last_avail = (rxq->last_avail + 1) &
+					  (rxq->nb_rx_desc - 1);
+
+			if (rx_item->flags & NTB_FLAG_EOP)
+				break;
+		}
+		/* update stats */
+		bytes += first->pkt_len;
+	}
+
+end_of_rx:
+	if (nb_rx) {
+		uint16_t nb1, nb2;
+		if (nb_mbufs > rxq->nb_rx_desc - last_avail) {
+			nb1 = rxq->nb_rx_desc - last_avail;
+			nb2 = nb_mbufs - rxq->nb_rx_desc + last_avail;
+		} else {
+			nb1 = nb_mbufs;
+			nb2 = 0;
+		}
+		rte_memcpy(rxq->rx_desc_ring + last_avail, rx_desc,
+			   sizeof(struct ntb_desc) * nb1);
+		rte_memcpy(rxq->rx_desc_ring, rx_desc + nb1,
+			   sizeof(struct ntb_desc) * nb2);
+		*rxq->avail_cnt = rxq->last_avail;
+		rte_wmb();
+
+		/* update queue stats */
+		off = NTB_XSTATS_NUM * ((size_t)context + 1);
+		hw->ntb_xstats[NTB_RX_BYTES_ID + off] += bytes;
+		hw->ntb_xstats[NTB_RX_PKTS_ID + off] += nb_rx;
+		hw->ntb_xstats[NTB_RX_MISS_ID + off] += (count - nb_rx);
+	}
+
+	return nb_rx;
 }
 
 static void
@@ -1296,7 +1500,7 @@ ntb_remove(struct rte_pci_device *pci_dev)
 
 static struct rte_pci_driver rte_ntb_pmd = {
 	.id_table = pci_id_ntb_map,
-	.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_WC_ACTIVATE,
 	.probe = ntb_probe,
 	.remove = ntb_remove,
 };
diff --git a/drivers/raw/ntb/ntb.h b/drivers/raw/ntb/ntb.h
index 3cc160680..a561c42d1 100644
--- a/drivers/raw/ntb/ntb.h
+++ b/drivers/raw/ntb/ntb.h
@@ -87,6 +87,7 @@ enum ntb_spad_idx {
  * @ntb_dev_init: Init ntb dev.
  * @get_peer_mw_addr: To get the addr of peer mw[mw_idx].
  * @mw_set_trans: Set translation of internal memory that remote can access.
+ * @ioremap: Translate the remote host address to bar address.
  * @get_link_status: get link status, link speed and link width.
  * @set_link: Set local side up/down.
  * @spad_read: Read local/peer spad register val.
@@ -103,6 +104,7 @@ struct ntb_dev_ops {
 	void *(*get_peer_mw_addr)(const struct rte_rawdev *dev, int mw_idx);
 	int (*mw_set_trans)(const struct rte_rawdev *dev, int mw_idx,
 			    uint64_t addr, uint64_t size);
+	void *(*ioremap)(const struct rte_rawdev *dev, uint64_t addr);
 	int (*get_link_status)(const struct rte_rawdev *dev);
 	int (*set_link)(const struct rte_rawdev *dev, bool up);
 	uint32_t (*spad_read)(const struct rte_rawdev *dev, int spad,
diff --git a/drivers/raw/ntb/ntb_hw_intel.c b/drivers/raw/ntb/ntb_hw_intel.c
index 0e73f1609..e7f8667cd 100644
--- a/drivers/raw/ntb/ntb_hw_intel.c
+++ b/drivers/raw/ntb/ntb_hw_intel.c
@@ -162,6 +162,27 @@ intel_ntb_mw_set_trans(const struct rte_rawdev *dev, int mw_idx,
 	return 0;
 }
 
+static void *
+intel_ntb_ioremap(const struct rte_rawdev *dev, uint64_t addr)
+{
+	struct ntb_hw *hw = dev->dev_private;
+	void *mapped = NULL;
+	void *base;
+	int i;
+
+	for (i = 0; i < hw->peer_used_mws; i++) {
+		if (addr >= hw->peer_mw_base[i] &&
+		    addr < hw->peer_mw_base[i] + hw->mw_size[i]) {
+			base = intel_ntb_get_peer_mw_addr(dev, i);
+			mapped = (void *)(size_t)(addr - hw->peer_mw_base[i] +
+				 (size_t)base);
+			break;
+		}
+	}
+
+	return mapped;
+}
+
 static int
 intel_ntb_get_link_status(const struct rte_rawdev *dev)
 {
@@ -357,6 +378,7 @@ const struct ntb_dev_ops intel_ntb_ops = {
 	.ntb_dev_init       = intel_ntb_dev_init,
 	.get_peer_mw_addr   = intel_ntb_get_peer_mw_addr,
 	.mw_set_trans       = intel_ntb_mw_set_trans,
+	.ioremap            = intel_ntb_ioremap,
 	.get_link_status    = intel_ntb_get_link_status,
 	.set_link           = intel_ntb_set_link,
 	.spad_read          = intel_ntb_spad_read,
-- 
2.17.1


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [dpdk-dev] [PATCH v5 4/4] examples/ntb: support more functions for NTB
  2019-09-24  8:43       ` [dpdk-dev] [PATCH v5 0/4] enable FIFO " Xiaoyun Li
                           ` (2 preceding siblings ...)
  2019-09-24  8:43         ` [dpdk-dev] [PATCH v5 3/4] raw/ntb: add enqueue and dequeue functions Xiaoyun Li
@ 2019-09-24  8:43         ` Xiaoyun Li
  2019-09-26  3:20         ` [dpdk-dev] [PATCH v6 0/4] enable FIFO " Xiaoyun Li
  4 siblings, 0 replies; 42+ messages in thread
From: Xiaoyun Li @ 2019-09-24  8:43 UTC (permalink / raw)
  To: jingjing.wu, keith.wiles, omkar.maslekar, cunming.liang; +Cc: dev, Xiaoyun Li

Support transmitting files between two systems.
Support iofwd between one ethdev and the NTB device.
Support rxonly and txonly for the NTB device.
Support setting the forwarding mode to file-trans, txonly,
rxonly or iofwd.
Support showing and clearing port stats and throughput.

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
---
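Note on the file-trans arithmetic: the send path splits a file into
packets of at most one mbuf payload each, then groups the packets into
bursts, rounding up at both steps. A minimal standalone sketch of that
calculation follows. The names send_plan, div_round_up and
plan_file_send are hypothetical and used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: how many packets and bursts a file needs. */
struct send_plan {
	uint64_t nb_pkts;
	uint64_t nb_bursts;
};

/* Integer division rounding up; b must be non-zero. */
static inline uint64_t
div_round_up(uint64_t a, uint64_t b)
{
	assert(b != 0);
	return (a + b - 1) / b;
}

/* Packets needed for file_size bytes, and bursts needed for those packets. */
static inline struct send_plan
plan_file_send(uint64_t file_size, uint16_t payload_size, uint16_t burst)
{
	struct send_plan plan;

	plan.nb_pkts = div_round_up(file_size, payload_size);
	plan.nb_bursts = div_round_up(plan.nb_pkts, burst);
	return plan;
}
```

For example, a 100000-byte file with a 2048-byte payload and a burst of
32 needs 49 packets in 2 bursts; the last burst carries the remaining 17
packets.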
 doc/guides/sample_app_ug/ntb.rst |   59 +-
 examples/ntb/meson.build         |    3 +
 examples/ntb/ntb_fwd.c           | 1298 +++++++++++++++++++++++++++---
 3 files changed, 1232 insertions(+), 128 deletions(-)

diff --git a/doc/guides/sample_app_ug/ntb.rst b/doc/guides/sample_app_ug/ntb.rst
index 079242175..df16af86c 100644
--- a/doc/guides/sample_app_ug/ntb.rst
+++ b/doc/guides/sample_app_ug/ntb.rst
@@ -5,8 +5,17 @@ NTB Sample Application
 ======================
 
 The ntb sample application shows how to use ntb rawdev driver.
-This sample provides interactive mode to transmit file between
-two hosts.
+This sample provides an interactive mode for packet-based processing
+between two systems.
+
+This sample supports 4 packet forwarding modes.
+
+* ``file-trans``: transmit files between two systems. The sample polls
+  to receive files from the peer and saves each file as
+  ``ntb_recv_file[N]``, where [N] is the number of the received file.
+* ``rxonly``: NTB receives packets but doesn't transmit them.
+* ``txonly``: NTB generates and transmits packets without receiving any.
+* ``iofwd``: forward packets between the NTB device and an ethdev.
 
 Compiling the Application
 -------------------------
@@ -29,6 +38,40 @@ Refer to the *DPDK Getting Started Guide* for general information on
 running applications and the Environment Abstraction Layer (EAL)
 options.
 
+Command-line Options
+--------------------
+
+The application supports the following command-line options.
+
+* ``--buf-size=N``
+
+  Set the data size of the mbufs used to N bytes, where N < 65536.
+  The default value is 2048.
+
+* ``--fwd-mode=mode``
+
+  Set the packet forwarding mode as ``file-trans``, ``txonly``,
+  ``rxonly`` or ``iofwd``.
+
+* ``--nb-desc=N``
+
+  Set the number of descriptors per queue (the queue size) to N,
+  where 64 <= N <= 1024. The default value is 1024.
+
+* ``--txfreet=N``
+
+  Set the transmit free threshold of TX rings to N, where 0 <= N <=
+  the value of ``--nb-desc``. The default value is 256.
+
+* ``--burst=N``
+
+  Set the number of packets per burst to N, where 1 <= N <= 32.
+  The default value is 32.
+
+* ``--qp=N``
+
+  Set the number of queues to N, where N > 0. The default value is 1.
+
 Using the application
 ---------------------
 
@@ -41,7 +84,11 @@ The application is console-driven using the cmdline DPDK interface:
 From this interface the available commands and descriptions of what
 they do as as follows:
 
-* ``send [filepath]``: Send file to the peer host.
-* ``receive [filepath]``: Receive file to [filepath]. Need the peer
-  to send file successfully first.
-* ``quit``: Exit program
+* ``send [filepath]``: Send file to the peer host. The application must
+  be in file-trans forwarding mode first.
+* ``start``: Start transmission.
+* ``stop``: Stop transmission.
+* ``show/clear port stats``: Show/Clear port stats and throughput.
+* ``set fwd file-trans/rxonly/txonly/iofwd``: Set packet forwarding
+  mode.
+* ``quit``: Exit program.
diff --git a/examples/ntb/meson.build b/examples/ntb/meson.build
index 9a6288f4f..f5435fe12 100644
--- a/examples/ntb/meson.build
+++ b/examples/ntb/meson.build
@@ -14,3 +14,6 @@ cflags += ['-D_FILE_OFFSET_BITS=64']
 sources = files(
 	'ntb_fwd.c'
 )
+if dpdk_conf.has('RTE_LIBRTE_PMD_NTB_RAWDEV')
+	deps += 'rawdev_ntb'
+endif
diff --git a/examples/ntb/ntb_fwd.c b/examples/ntb/ntb_fwd.c
index f8c970cdb..b1ea71c8f 100644
--- a/examples/ntb/ntb_fwd.c
+++ b/examples/ntb/ntb_fwd.c
@@ -14,21 +14,103 @@
 #include <cmdline.h>
 #include <rte_common.h>
 #include <rte_rawdev.h>
+#include <rte_ethdev.h>
+#include <rte_malloc.h>
 #include <rte_lcore.h>
+#include <rte_cycles.h>
+#include <rte_pmd_ntb.h>
 
-#define NTB_DRV_NAME_LEN	7
-static uint64_t max_file_size = 0x400000;
+/* Per-port statistics struct */
+struct ntb_port_statistics {
+	uint64_t tx;
+	uint64_t rx;
+} __rte_cache_aligned;
+/* Port 0: NTB dev, Port 1: ethdev when iofwd. */
+struct ntb_port_statistics ntb_port_stats[2];
+
+struct ntb_fwd_stream {
+	uint16_t tx_port;
+	uint16_t rx_port;
+	uint16_t qp_id;
+	uint8_t tx_ntb;  /* If ntb device is tx port. */
+};
+
+struct ntb_fwd_lcore_conf {
+	uint16_t stream_id;
+	uint16_t nb_stream;
+	uint8_t stopped;
+};
+
+enum ntb_fwd_mode {
+	FILE_TRANS = 0,
+	RXONLY,
+	TXONLY,
+	IOFWD,
+	MAX_FWD_MODE,
+};
+static const char *const fwd_mode_s[] = {
+	"file-trans",
+	"rxonly",
+	"txonly",
+	"iofwd",
+	NULL,
+};
+static enum ntb_fwd_mode fwd_mode = MAX_FWD_MODE;
+
+static struct ntb_fwd_lcore_conf fwd_lcore_conf[RTE_MAX_LCORE];
+static struct ntb_fwd_stream *fwd_streams;
+
+static struct rte_mempool *mbuf_pool;
+
+#define NTB_DRV_NAME_LEN 7
+#define MEMPOOL_CACHE_SIZE 256
+
+static uint8_t in_test;
 static uint8_t interactive = 1;
+static uint16_t eth_port_id = RTE_MAX_ETHPORTS;
 static uint16_t dev_id;
 
+/* Number of queues, default set as 1 */
+static uint16_t num_queues = 1;
+static uint16_t ntb_buf_size = RTE_MBUF_DEFAULT_BUF_SIZE;
+
+/* Configurable number of descriptors */
+#define NTB_DEFAULT_NUM_DESCS 1024
+static uint16_t nb_desc = NTB_DEFAULT_NUM_DESCS;
+
+static uint16_t tx_free_thresh;
+
+#define NTB_MAX_PKT_BURST 32
+#define NTB_DFLT_PKT_BURST 32
+static uint16_t pkt_burst = NTB_DFLT_PKT_BURST;
+
+#define BURST_TX_RETRIES 64
+
+static struct rte_eth_conf eth_port_conf = {
+	.rxmode = {
+		.mq_mode = ETH_MQ_RX_RSS,
+		.split_hdr_size = 0,
+	},
+	.rx_adv_conf = {
+		.rss_conf = {
+			.rss_key = NULL,
+			.rss_hf = ETH_RSS_IP,
+		},
+	},
+	.txmode = {
+		.mq_mode = ETH_MQ_TX_NONE,
+	},
+};
+
 /* *** Help command with introduction. *** */
 struct cmd_help_result {
 	cmdline_fixed_string_t help;
 };
 
-static void cmd_help_parsed(__attribute__((unused)) void *parsed_result,
-			    struct cmdline *cl,
-			    __attribute__((unused)) void *data)
+static void
+cmd_help_parsed(__attribute__((unused)) void *parsed_result,
+		struct cmdline *cl,
+		__attribute__((unused)) void *data)
 {
 	cmdline_printf(
 		cl,
@@ -37,13 +119,17 @@ static void cmd_help_parsed(__attribute__((unused)) void *parsed_result,
 		"Control:\n"
 		"    quit                                      :"
 		" Quit the application.\n"
-		"\nFile transmit:\n"
+		"\nTransmission:\n"
 		"    send [path]                               :"
-		" Send [path] file. (No more than %"PRIu64")\n"
-		"    recv [path]                            :"
-		" Receive file to [path]. Make sure sending is done"
-		" on the other side.\n",
-		max_file_size
+		" Send [path] file. Only takes effect in file-trans mode.\n"
+		"    start                                     :"
+		" Start transmissions.\n"
+		"    stop                                      :"
+		" Stop transmissions.\n"
+		"    clear/show port stats                     :"
+		" Clear/show port stats.\n"
+		"    set fwd file-trans/rxonly/txonly/iofwd    :"
+		" Set packet forwarding mode.\n"
 	);
 
 }
@@ -66,13 +152,37 @@ struct cmd_quit_result {
 	cmdline_fixed_string_t quit;
 };
 
-static void cmd_quit_parsed(__attribute__((unused)) void *parsed_result,
-			    struct cmdline *cl,
-			    __attribute__((unused)) void *data)
+static void
+cmd_quit_parsed(__attribute__((unused)) void *parsed_result,
+		struct cmdline *cl,
+		__attribute__((unused)) void *data)
 {
+	struct ntb_fwd_lcore_conf *conf;
+	unsigned int lcore_id;
+
+	/* Stop transmission first. */
+	RTE_LCORE_FOREACH_SLAVE(lcore_id) {
+		conf = &fwd_lcore_conf[lcore_id];
+
+		if (!conf->nb_stream)
+			continue;
+
+		if (conf->stopped)
+			continue;
+
+		conf->stopped = 1;
+	}
+	printf("\nWaiting for lcores to finish...\n");
+	rte_eal_mp_wait_lcore();
+	in_test = 0;
+
 	/* Stop traffic and Close port. */
 	rte_rawdev_stop(dev_id);
 	rte_rawdev_close(dev_id);
+	if (eth_port_id < RTE_MAX_ETHPORTS && fwd_mode == IOFWD) {
+		rte_eth_dev_stop(eth_port_id);
+		rte_eth_dev_close(eth_port_id);
+	}
 
 	cmdline_quit(cl);
 }
@@ -102,21 +212,19 @@ cmd_sendfile_parsed(void *parsed_result,
 		    __attribute__((unused)) void *data)
 {
 	struct cmd_sendfile_result *res = parsed_result;
-	struct rte_rawdev_buf *pkts_send[1];
-	uint64_t rsize, size, link;
-	uint8_t *buff;
+	struct rte_rawdev_buf *pkts_send[NTB_MAX_PKT_BURST];
+	struct rte_mbuf *mbuf_send[NTB_MAX_PKT_BURST];
+	uint64_t size, count, i, nb_burst;
+	uint16_t nb_tx, buf_size;
+	unsigned int nb_pkt;
+	size_t queue_id = 0;
+	uint16_t retry = 0;
 	uint32_t val;
 	FILE *file;
 
-	if (!rte_rawdevs[dev_id].started) {
-		printf("Device needs to be up first. Try later.\n");
-		return;
-	}
-
-	rte_rawdev_get_attr(dev_id, "link_status", &link);
-	if (!link) {
-		printf("Link is not up, cannot send file.\n");
-		return;
+	if (num_queues != 1) {
+		printf("File transmission only supports 1 queue.\n");
+		num_queues = 1;
 	}
 
 	file = fopen(res->filepath, "r");
@@ -127,30 +235,13 @@ cmd_sendfile_parsed(void *parsed_result,
 
 	if (fseek(file, 0, SEEK_END) < 0) {
 		printf("Fail to get file size.\n");
+		fclose(file);
 		return;
 	}
 	size = ftell(file);
 	if (fseek(file, 0, SEEK_SET) < 0) {
 		printf("Fail to get file size.\n");
-		return;
-	}
-
-	/**
-	 * No FIFO now. Only test memory. Limit sending file
-	 * size <= max_file_size.
-	 */
-	if (size > max_file_size) {
-		printf("Warning: The file is too large. Only send first"
-		       " %"PRIu64" bits.\n", max_file_size);
-		size = max_file_size;
-	}
-
-	buff = (uint8_t *)malloc(size);
-	rsize = fread(buff, size, 1, file);
-	if (rsize != 1) {
-		printf("Fail to read file.\n");
 		fclose(file);
-		free(buff);
 		return;
 	}
 
@@ -159,22 +250,63 @@ cmd_sendfile_parsed(void *parsed_result,
 	rte_rawdev_set_attr(dev_id, "spad_user_0", val);
 	val = size;
 	rte_rawdev_set_attr(dev_id, "spad_user_1", val);
+	printf("Sending file, size is %"PRIu64"\n", size);
+
+	for (i = 0; i < NTB_MAX_PKT_BURST; i++)
+		pkts_send[i] = (struct rte_rawdev_buf *)
+				malloc(sizeof(struct rte_rawdev_buf));
+
+	buf_size = ntb_buf_size - RTE_PKTMBUF_HEADROOM;
+	count = (size + buf_size - 1) / buf_size;
+	nb_burst = (count + pkt_burst - 1) / pkt_burst;
 
-	pkts_send[0] = (struct rte_rawdev_buf *)malloc
-			(sizeof(struct rte_rawdev_buf));
-	pkts_send[0]->buf_addr = buff;
+	for (i = 0; i < nb_burst; i++) {
+		val = RTE_MIN(count, pkt_burst);
+		if (rte_mempool_get_bulk(mbuf_pool, (void **)mbuf_send,
+					val) == 0) {
+			for (nb_pkt = 0; nb_pkt < val; nb_pkt++) {
+				mbuf_send[nb_pkt]->port = dev_id;
+				mbuf_send[nb_pkt]->data_len =
+				fread(rte_pktmbuf_mtod(mbuf_send[nb_pkt],
+					void *), 1, buf_size, file);
+				mbuf_send[nb_pkt]->pkt_len =
+					mbuf_send[nb_pkt]->data_len;
+				pkts_send[nb_pkt]->buf_addr = mbuf_send[nb_pkt];
+			}
+		} else {
+			for (nb_pkt = 0; nb_pkt < val; nb_pkt++) {
+				mbuf_send[nb_pkt] =
+					rte_mbuf_raw_alloc(mbuf_pool);
+				if (mbuf_send[nb_pkt] == NULL)
+					break;
+				mbuf_send[nb_pkt]->port = dev_id;
+				mbuf_send[nb_pkt]->data_len =
+				fread(rte_pktmbuf_mtod(mbuf_send[nb_pkt],
+					void *), 1, buf_size, file);
+				mbuf_send[nb_pkt]->pkt_len =
+					mbuf_send[nb_pkt]->data_len;
+				pkts_send[nb_pkt]->buf_addr = mbuf_send[nb_pkt];
+			}
+		}
 
-	if (rte_rawdev_enqueue_buffers(dev_id, pkts_send, 1,
-				       (void *)(size_t)size)) {
-		printf("Fail to enqueue.\n");
-		goto clean;
+		nb_tx = rte_rawdev_enqueue_buffers(dev_id, pkts_send, nb_pkt,
+						   (void *)queue_id);
+		while (nb_tx != nb_pkt && retry++ < BURST_TX_RETRIES) {
+			rte_delay_us(1);
+			nb_tx += rte_rawdev_enqueue_buffers(dev_id,
+				&pkts_send[nb_tx], nb_pkt - nb_tx,
+				(void *)queue_id);
+		}
+		count -= nb_pkt;
 	}
+	/* Clear register after file sending done. */
+	rte_rawdev_set_attr(dev_id, "spad_user_0", 0);
+	rte_rawdev_set_attr(dev_id, "spad_user_1", 0);
 	printf("Done sending file.\n");
 
-clean:
+	for (i = 0; i < NTB_MAX_PKT_BURST; i++)
+		free(pkts_send[i]);
 	fclose(file);
-	free(buff);
-	free(pkts_send[0]);
 }
 
 cmdline_parse_token_string_t cmd_send_file_send =
@@ -195,79 +327,680 @@ cmdline_parse_inst_t cmd_send_file = {
 	},
 };
 
-/* *** RECEIVE FILE PARAMETERS *** */
-struct cmd_recvfile_result {
-	cmdline_fixed_string_t recv_string;
-	char filepath[];
-};
+#define RECV_FILE_LEN 30
+static int
+start_polling_recv_file(void *param)
+{
+	struct rte_rawdev_buf *pkts_recv[NTB_MAX_PKT_BURST];
+	struct ntb_fwd_lcore_conf *conf = param;
+	struct rte_mbuf *mbuf;
+	char filepath[RECV_FILE_LEN];
+	uint64_t val, size, file_len;
+	uint16_t nb_rx, i, file_no;
+	size_t queue_id = 0;
+	FILE *file;
+
+	for (i = 0; i < NTB_MAX_PKT_BURST; i++)
+		pkts_recv[i] = (struct rte_rawdev_buf *)
+				malloc(sizeof(struct rte_rawdev_buf));
+
+	file_no = 0;
+	while (!conf->stopped) {
+		snprintf(filepath, RECV_FILE_LEN, "ntb_recv_file%d", file_no);
+		file = fopen(filepath, "w");
+		if (file == NULL) {
+			printf("Fail to open the file.\n");
+			return -EINVAL;
+		}
+
+		rte_rawdev_get_attr(dev_id, "spad_user_0", &val);
+		size = val << 32;
+		rte_rawdev_get_attr(dev_id, "spad_user_1", &val);
+		size |= val;
+
+		if (!size) {
+			fclose(file);
+			continue;
+		}
+
+		file_len = 0;
+		nb_rx = NTB_MAX_PKT_BURST;
+		while (file_len < size && !conf->stopped) {
+			nb_rx = rte_rawdev_dequeue_buffers(dev_id, pkts_recv,
+						pkt_burst, (void *)queue_id);
+			ntb_port_stats[0].rx += nb_rx;
+			for (i = 0; i < nb_rx; i++) {
+				mbuf = pkts_recv[i]->buf_addr;
+				fwrite(rte_pktmbuf_mtod(mbuf, void *), 1,
+					mbuf->data_len, file);
+				file_len += mbuf->data_len;
+				rte_pktmbuf_free(mbuf);
+				pkts_recv[i]->buf_addr = NULL;
+			}
+		}
+
+		printf("Received file (size: %" PRIu64 ") from peer to %s.\n",
+			size, filepath);
+		fclose(file);
+		file_no++;
+	}
+
+	for (i = 0; i < NTB_MAX_PKT_BURST; i++)
+		free(pkts_recv[i]);
+	return 0;
+}
+
+static int
+start_iofwd_per_lcore(void *param)
+{
+	struct rte_rawdev_buf *ntb_buf[NTB_MAX_PKT_BURST];
+	struct rte_mbuf *pkts_burst[NTB_MAX_PKT_BURST];
+	struct ntb_fwd_lcore_conf *conf = param;
+	struct ntb_fwd_stream fs;
+	uint16_t nb_rx, nb_tx;
+	int i, j;
+
+	for (i = 0; i < NTB_MAX_PKT_BURST; i++)
+		ntb_buf[i] = (struct rte_rawdev_buf *)
+			     malloc(sizeof(struct rte_rawdev_buf));
+
+	while (!conf->stopped) {
+		for (i = 0; i < conf->nb_stream; i++) {
+			fs = fwd_streams[conf->stream_id + i];
+			if (fs.tx_ntb) {
+				nb_rx = rte_eth_rx_burst(fs.rx_port,
+						fs.qp_id, pkts_burst,
+						pkt_burst);
+				if (unlikely(nb_rx == 0))
+					continue;
+				for (j = 0; j < nb_rx; j++)
+					ntb_buf[j]->buf_addr = pkts_burst[j];
+				nb_tx =
+				rte_rawdev_enqueue_buffers(fs.tx_port,
+						ntb_buf, nb_rx,
+						(void *)(size_t)fs.qp_id);
+				ntb_port_stats[0].tx += nb_tx;
+				ntb_port_stats[1].rx += nb_rx;
+			} else {
+				nb_rx =
+				rte_rawdev_dequeue_buffers(fs.rx_port,
+						ntb_buf, pkt_burst,
+						(void *)(size_t)fs.qp_id);
+				if (unlikely(nb_rx == 0))
+					continue;
+				for (j = 0; j < nb_rx; j++)
+					pkts_burst[j] = ntb_buf[j]->buf_addr;
+				nb_tx = rte_eth_tx_burst(fs.tx_port,
+					fs.qp_id, pkts_burst, nb_rx);
+				ntb_port_stats[1].tx += nb_tx;
+				ntb_port_stats[0].rx += nb_rx;
+			}
+			if (unlikely(nb_tx < nb_rx)) {
+				do {
+					rte_pktmbuf_free(pkts_burst[nb_tx]);
+				} while (++nb_tx < nb_rx);
+			}
+		}
+	}
+
+	for (i = 0; i < NTB_MAX_PKT_BURST; i++)
+		free(ntb_buf[i]);
+
+	return 0;
+}
+
+static int
+start_rxonly_per_lcore(void *param)
+{
+	struct rte_rawdev_buf *ntb_buf[NTB_MAX_PKT_BURST];
+	struct ntb_fwd_lcore_conf *conf = param;
+	struct ntb_fwd_stream fs;
+	uint16_t nb_rx;
+	int i, j;
+
+	for (i = 0; i < NTB_MAX_PKT_BURST; i++)
+		ntb_buf[i] = (struct rte_rawdev_buf *)
+			     malloc(sizeof(struct rte_rawdev_buf));
+
+	while (!conf->stopped) {
+		for (i = 0; i < conf->nb_stream; i++) {
+			fs = fwd_streams[conf->stream_id + i];
+			nb_rx = rte_rawdev_dequeue_buffers(fs.rx_port,
+				ntb_buf, pkt_burst, (void *)(size_t)fs.qp_id);
+			if (unlikely(nb_rx == 0))
+				continue;
+			ntb_port_stats[0].rx += nb_rx;
+
+			for (j = 0; j < nb_rx; j++)
+				rte_pktmbuf_free(ntb_buf[j]->buf_addr);
+		}
+	}
+
+	for (i = 0; i < NTB_MAX_PKT_BURST; i++)
+		free(ntb_buf[i]);
+
+	return 0;
+}
+
+
+static int
+start_txonly_per_lcore(void *param)
+{
+	struct rte_rawdev_buf *ntb_buf[NTB_MAX_PKT_BURST];
+	struct rte_mbuf *pkts_burst[NTB_MAX_PKT_BURST];
+	struct ntb_fwd_lcore_conf *conf = param;
+	struct ntb_fwd_stream fs;
+	uint16_t nb_pkt, nb_tx;
+	int i;
+
+	for (i = 0; i < NTB_MAX_PKT_BURST; i++)
+		ntb_buf[i] = (struct rte_rawdev_buf *)
+			     malloc(sizeof(struct rte_rawdev_buf));
+
+	while (!conf->stopped) {
+		for (i = 0; i < conf->nb_stream; i++) {
+			fs = fwd_streams[conf->stream_id + i];
+			if (rte_mempool_get_bulk(mbuf_pool, (void **)pkts_burst,
+				  pkt_burst) == 0) {
+				for (nb_pkt = 0; nb_pkt < pkt_burst; nb_pkt++) {
+					pkts_burst[nb_pkt]->port = dev_id;
+					pkts_burst[nb_pkt]->data_len =
+						pkts_burst[nb_pkt]->buf_len -
+						RTE_PKTMBUF_HEADROOM;
+					pkts_burst[nb_pkt]->pkt_len =
+						pkts_burst[nb_pkt]->data_len;
+					ntb_buf[nb_pkt]->buf_addr =
+						pkts_burst[nb_pkt];
+				}
+			} else {
+				for (nb_pkt = 0; nb_pkt < pkt_burst; nb_pkt++) {
+					pkts_burst[nb_pkt] =
+						rte_pktmbuf_alloc(mbuf_pool);
+					if (pkts_burst[nb_pkt] == NULL)
+						break;
+					pkts_burst[nb_pkt]->port = dev_id;
+					pkts_burst[nb_pkt]->data_len =
+						pkts_burst[nb_pkt]->buf_len -
+						RTE_PKTMBUF_HEADROOM;
+					pkts_burst[nb_pkt]->pkt_len =
+						pkts_burst[nb_pkt]->data_len;
+					ntb_buf[nb_pkt]->buf_addr =
+						pkts_burst[nb_pkt];
+				}
+			}
+			nb_tx = rte_rawdev_enqueue_buffers(fs.tx_port,
+				ntb_buf, nb_pkt, (void *)(size_t)fs.qp_id);
+			ntb_port_stats[0].tx += nb_tx;
+			if (unlikely(nb_tx < nb_pkt)) {
+				do {
+					rte_pktmbuf_free(
+						ntb_buf[nb_tx]->buf_addr);
+				} while (++nb_tx < nb_pkt);
+			}
+		}
+	}
+
+	for (i = 0; i < NTB_MAX_PKT_BURST; i++)
+		free(ntb_buf[i]);
+
+	return 0;
+}
+
+static int
+ntb_fwd_config_setup(void)
+{
+	uint16_t i;
+
+	/* Make sure iofwd has valid ethdev. */
+	if (fwd_mode == IOFWD && eth_port_id >= RTE_MAX_ETHPORTS) {
+		printf("No ethdev, cannot be in iofwd mode.\n");
+		return -EINVAL;
+	}
+
+	if (fwd_mode == IOFWD) {
+		fwd_streams = rte_zmalloc("ntb_fwd: fwd_streams",
+			sizeof(struct ntb_fwd_stream) * num_queues * 2,
+			RTE_CACHE_LINE_SIZE);
+		for (i = 0; i < num_queues; i++) {
+			fwd_streams[i * 2].qp_id = i;
+			fwd_streams[i * 2].tx_port = dev_id;
+			fwd_streams[i * 2].rx_port = eth_port_id;
+			fwd_streams[i * 2].tx_ntb = 1;
+
+			fwd_streams[i * 2 + 1].qp_id = i;
+			fwd_streams[i * 2 + 1].tx_port = eth_port_id;
+			fwd_streams[i * 2 + 1].rx_port = dev_id;
+			fwd_streams[i * 2 + 1].tx_ntb = 0;
+		}
+		return 0;
+	}
+
+	if (fwd_mode == RXONLY || fwd_mode == FILE_TRANS) {
+		/* File-trans supports only 1 queue to keep data in order. */
+		if (fwd_mode == FILE_TRANS)
+			num_queues = 1;
+
+		fwd_streams = rte_zmalloc("ntb_fwd: fwd_streams",
+				sizeof(struct ntb_fwd_stream) * num_queues,
+				RTE_CACHE_LINE_SIZE);
+		for (i = 0; i < num_queues; i++) {
+			fwd_streams[i].qp_id = i;
+			fwd_streams[i].tx_port = RTE_MAX_ETHPORTS;
+			fwd_streams[i].rx_port = dev_id;
+			fwd_streams[i].tx_ntb = 0;
+		}
+		return 0;
+	}
+
+	if (fwd_mode == TXONLY) {
+		fwd_streams = rte_zmalloc("ntb_fwd: fwd_streams",
+				sizeof(struct ntb_fwd_stream) * num_queues,
+				RTE_CACHE_LINE_SIZE);
+		for (i = 0; i < num_queues; i++) {
+			fwd_streams[i].qp_id = i;
+			fwd_streams[i].tx_port = dev_id;
+			fwd_streams[i].rx_port = RTE_MAX_ETHPORTS;
+			fwd_streams[i].tx_ntb = 1;
+		}
+	}
+	return 0;
+}
 
 static void
-cmd_recvfile_parsed(void *parsed_result,
-		    __attribute__((unused)) struct cmdline *cl,
-		    __attribute__((unused)) void *data)
+assign_stream_to_lcores(void)
 {
-	struct cmd_sendfile_result *res = parsed_result;
-	struct rte_rawdev_buf *pkts_recv[1];
-	uint8_t *buff;
-	uint64_t val;
-	size_t size;
-	FILE *file;
+	struct ntb_fwd_lcore_conf *conf;
+	struct ntb_fwd_stream *fs;
+	uint16_t nb_streams, sm_per_lcore, sm_id, i;
+	uint8_t lcore_id, lcore_num, nb_extra;
 
-	if (!rte_rawdevs[dev_id].started) {
-		printf("Device needs to be up first. Try later.\n");
-		return;
+	lcore_num = rte_lcore_count();
+	/* Exclude master core */
+	lcore_num--;
+
+	nb_streams = (fwd_mode == IOFWD) ? num_queues * 2 : num_queues;
+
+	sm_per_lcore = nb_streams / lcore_num;
+	nb_extra = nb_streams % lcore_num;
+	sm_id = 0;
+	i = 0;
+
+	RTE_LCORE_FOREACH_SLAVE(lcore_id) {
+		conf = &fwd_lcore_conf[lcore_id];
+
+		if (i < nb_extra) {
+			conf->nb_stream = sm_per_lcore + 1;
+			conf->stream_id = sm_id;
+			sm_id = sm_id + sm_per_lcore + 1;
+		} else {
+			conf->nb_stream = sm_per_lcore;
+			conf->stream_id = sm_id;
+			sm_id = sm_id + sm_per_lcore;
+		}
+
+		i++;
+		if (sm_id >= nb_streams)
+			break;
+	}
+
+	/* Print packet forwarding config. */
+	RTE_LCORE_FOREACH_SLAVE(lcore_id) {
+		conf = &fwd_lcore_conf[lcore_id];
+
+		if (!conf->nb_stream)
+			continue;
+
+		printf("Streams on Lcore %u :\n", lcore_id);
+		for (i = 0; i < conf->nb_stream; i++) {
+			fs = &fwd_streams[conf->stream_id + i];
+			if (fwd_mode == IOFWD)
+				printf(" + Stream %u : %s%u RX -> %s%u TX,"
+					" Q=%u\n", conf->stream_id + i,
+					fs->tx_ntb ? "Eth" : "NTB", fs->rx_port,
+					fs->tx_ntb ? "NTB" : "Eth", fs->tx_port,
+					fs->qp_id);
+			if (fwd_mode == FILE_TRANS || fwd_mode == RXONLY)
+				printf(" + Stream %u : NTB%u RX only\n",
+					conf->stream_id + i, fs->rx_port);
+			if (fwd_mode == TXONLY)
+				printf(" + Stream %u : NTB%u TX only\n",
+					conf->stream_id + i, fs->tx_port);
+		}
 	}
+}
 
-	rte_rawdev_get_attr(dev_id, "link_status", &val);
-	if (!val) {
-		printf("Link is not up, cannot receive file.\n");
+static void
+start_pkt_fwd(void)
+{
+	struct ntb_fwd_lcore_conf *conf;
+	struct rte_eth_link eth_link;
+	uint8_t lcore_id;
+	int ret, i;
+
+	ret = ntb_fwd_config_setup();
+	if (ret < 0) {
+		printf("Cannot start traffic. Please reset fwd mode.\n");
 		return;
 	}
 
-	file = fopen(res->filepath, "w");
-	if (file == NULL) {
-		printf("Fail to open the file.\n");
+	/* If using iofwd, check ethdev link status first. */
+	if (fwd_mode == IOFWD) {
+		printf("Checking eth link status...\n");
+		/* Wait for eth link up at most 100 times. */
+		for (i = 0; i < 100; i++) {
+			rte_eth_link_get(eth_port_id, &eth_link);
+			if (eth_link.link_status) {
+				printf("Eth%u Link Up. Speed %u Mbps - %s\n",
+					eth_port_id, eth_link.link_speed,
+					(eth_link.link_duplex ==
+					 ETH_LINK_FULL_DUPLEX) ?
+					("full-duplex") : ("half-duplex"));
+				break;
+			}
+		}
+		if (!eth_link.link_status) {
+			printf("Eth%u link down. Cannot start traffic.\n",
+				eth_port_id);
+			return;
+		}
+	}
+
+	assign_stream_to_lcores();
+	in_test = 1;
+
+	RTE_LCORE_FOREACH_SLAVE(lcore_id) {
+		conf = &fwd_lcore_conf[lcore_id];
+
+		if (!conf->nb_stream)
+			continue;
+
+		conf->stopped = 0;
+		if (fwd_mode == FILE_TRANS)
+			rte_eal_remote_launch(start_polling_recv_file,
+					      conf, lcore_id);
+		else if (fwd_mode == IOFWD)
+			rte_eal_remote_launch(start_iofwd_per_lcore,
+					      conf, lcore_id);
+		else if (fwd_mode == RXONLY)
+			rte_eal_remote_launch(start_rxonly_per_lcore,
+					      conf, lcore_id);
+		else if (fwd_mode == TXONLY)
+			rte_eal_remote_launch(start_txonly_per_lcore,
+					      conf, lcore_id);
+	}
+}
+
+/* *** START FWD PARAMETERS *** */
+struct cmd_start_result {
+	cmdline_fixed_string_t start;
+};
+
+static void
+cmd_start_parsed(__attribute__((unused)) void *parsed_result,
+			    __attribute__((unused)) struct cmdline *cl,
+			    __attribute__((unused)) void *data)
+{
+	start_pkt_fwd();
+}
+
+cmdline_parse_token_string_t cmd_start_start =
+		TOKEN_STRING_INITIALIZER(struct cmd_start_result, start, "start");
+
+cmdline_parse_inst_t cmd_start = {
+	.f = cmd_start_parsed,
+	.data = NULL,
+	.help_str = "start pkt fwd between ntb and ethdev",
+	.tokens = {
+		(void *)&cmd_start_start,
+		NULL,
+	},
+};
+
+/* *** STOP *** */
+struct cmd_stop_result {
+	cmdline_fixed_string_t stop;
+};
+
+static void
+cmd_stop_parsed(__attribute__((unused)) void *parsed_result,
+		__attribute__((unused)) struct cmdline *cl,
+		__attribute__((unused)) void *data)
+{
+	struct ntb_fwd_lcore_conf *conf;
+	uint8_t lcore_id;
+
+	RTE_LCORE_FOREACH_SLAVE(lcore_id) {
+		conf = &fwd_lcore_conf[lcore_id];
+
+		if (!conf->nb_stream)
+			continue;
+
+		if (conf->stopped)
+			continue;
+
+		conf->stopped = 1;
+	}
+	printf("\nWaiting for lcores to finish...\n");
+	rte_eal_mp_wait_lcore();
+	in_test = 0;
+	printf("\nDone.\n");
+}
+
+cmdline_parse_token_string_t cmd_stop_stop =
+		TOKEN_STRING_INITIALIZER(struct cmd_stop_result, stop, "stop");
+
+cmdline_parse_inst_t cmd_stop = {
+	.f = cmd_stop_parsed,
+	.data = NULL,
+	.help_str = "stop: Stop packet forwarding",
+	.tokens = {
+		(void *)&cmd_stop_stop,
+		NULL,
+	},
+};
+
+static void
+ntb_stats_clear(void)
+{
+	int nb_ids, i;
+	uint32_t *ids;
+
+	/* Clear NTB dev stats */
+	nb_ids = rte_rawdev_xstats_names_get(dev_id, NULL, 0);
+	if (nb_ids < 0) {
+		printf("Error: Cannot get count of xstats\n");
 		return;
 	}
+	ids = malloc(sizeof(uint32_t) * nb_ids);
+	for (i = 0; i < nb_ids; i++)
+		ids[i] = i;
+	rte_rawdev_xstats_reset(dev_id, ids, nb_ids);
+	printf("\n  statistics for NTB port %d cleared\n", dev_id);
+
+	/* Clear ethdev stats, if any */
+	if (fwd_mode == IOFWD && eth_port_id != RTE_MAX_ETHPORTS) {
+		rte_eth_stats_reset(eth_port_id);
+		printf("\n  statistics for ETH port %d cleared\n", eth_port_id);
+	}
+}
+
+static inline void
+ntb_calculate_throughput(uint16_t port) {
+	uint64_t diff_pkts_rx, diff_pkts_tx, diff_cycles;
+	uint64_t mpps_rx, mpps_tx;
+	static uint64_t prev_pkts_rx[2];
+	static uint64_t prev_pkts_tx[2];
+	static uint64_t prev_cycles[2];
+
+	diff_cycles = prev_cycles[port];
+	prev_cycles[port] = rte_rdtsc();
+	if (diff_cycles > 0)
+		diff_cycles = prev_cycles[port] - diff_cycles;
+	diff_pkts_rx = (ntb_port_stats[port].rx > prev_pkts_rx[port]) ?
+		(ntb_port_stats[port].rx - prev_pkts_rx[port]) : 0;
+	diff_pkts_tx = (ntb_port_stats[port].tx > prev_pkts_tx[port]) ?
+		(ntb_port_stats[port].tx - prev_pkts_tx[port]) : 0;
+	prev_pkts_rx[port] = ntb_port_stats[port].rx;
+	prev_pkts_tx[port] = ntb_port_stats[port].tx;
+	mpps_rx = diff_cycles > 0 ?
+		diff_pkts_rx * rte_get_tsc_hz() / diff_cycles : 0;
+	mpps_tx = diff_cycles > 0 ?
+		diff_pkts_tx * rte_get_tsc_hz() / diff_cycles : 0;
+	printf("  Throughput (since last show)\n");
+	printf("  Rx-pps: %12"PRIu64"\n  Tx-pps: %12"PRIu64"\n",
+			mpps_rx, mpps_tx);
+
+}
+
+static void
+ntb_stats_display(void)
+{
+	struct rte_rawdev_xstats_name *xstats_names;
+	struct rte_eth_stats stats;
+	uint64_t *values;
+	uint32_t *ids;
+	int nb_ids, i;
 
-	rte_rawdev_get_attr(dev_id, "spad_user_0", &val);
-	size = val << 32;
-	rte_rawdev_get_attr(dev_id, "spad_user_1", &val);
-	size |= val;
+	printf("###### statistics for NTB port %d #######\n", dev_id);
 
-	buff = (uint8_t *)malloc(size);
-	pkts_recv[0] = (struct rte_rawdev_buf *)malloc
-			(sizeof(struct rte_rawdev_buf));
-	pkts_recv[0]->buf_addr = buff;
+	/* Get NTB dev stats and stats names */
+	nb_ids = rte_rawdev_xstats_names_get(dev_id, NULL, 0);
+	if (nb_ids < 0) {
+		printf("Error: Cannot get count of xstats\n");
+		return;
+	}
+	xstats_names = malloc(sizeof(struct rte_rawdev_xstats_name) * nb_ids);
+	if (xstats_names == NULL) {
+		printf("Cannot allocate memory for xstats lookup\n");
+		return;
+	}
+	if (nb_ids != rte_rawdev_xstats_names_get(
+			dev_id, xstats_names, nb_ids)) {
+		printf("Error: Cannot get xstats lookup\n");
+		free(xstats_names);
+		return;
+	}
+	ids = malloc(sizeof(uint32_t) * nb_ids);
+	for (i = 0; i < nb_ids; i++)
+		ids[i] = i;
+	values = malloc(sizeof(uint64_t) * nb_ids);
+	if (nb_ids != rte_rawdev_xstats_get(dev_id, ids, values, nb_ids)) {
+		printf("Error: Unable to get xstats\n");
+		free(xstats_names);
+		free(values);
+		free(ids);
+		return;
+	}
+
+	/* Display NTB dev stats */
+	for (i = 0; i < nb_ids; i++)
+		printf("  %s: %"PRIu64"\n", xstats_names[i].name, values[i]);
+	ntb_calculate_throughput(0);
 
-	if (rte_rawdev_dequeue_buffers(dev_id, pkts_recv, 1, (void *)size)) {
-		printf("Fail to dequeue.\n");
-		goto clean;
+	/* Get ethdev stats, if any */
+	if (fwd_mode == IOFWD && eth_port_id != RTE_MAX_ETHPORTS) {
+		printf("###### statistics for ETH port %d ######\n",
+			eth_port_id);
+		rte_eth_stats_get(eth_port_id, &stats);
+		printf("  RX-packets: %"PRIu64"\n", stats.ipackets);
+		printf("  RX-bytes: %"PRIu64"\n", stats.ibytes);
+		printf("  RX-errors: %"PRIu64"\n", stats.ierrors);
+		printf("  RX-missed: %"PRIu64"\n", stats.imissed);
+		printf("  TX-packets: %"PRIu64"\n", stats.opackets);
+		printf("  TX-bytes: %"PRIu64"\n", stats.obytes);
+		printf("  TX-errors: %"PRIu64"\n", stats.oerrors);
+		ntb_calculate_throughput(1);
 	}
 
-	fwrite(buff, size, 1, file);
-	printf("Done receiving to file.\n");
+	free(xstats_names);
+	free(values);
+	free(ids);
+}
 
-clean:
-	fclose(file);
-	free(buff);
-	free(pkts_recv[0]);
+/* *** SHOW/CLEAR PORT STATS *** */
+struct cmd_stats_result {
+	cmdline_fixed_string_t show;
+	cmdline_fixed_string_t port;
+	cmdline_fixed_string_t stats;
+};
+
+static void
+cmd_stats_parsed(void *parsed_result,
+		 __attribute__((unused)) struct cmdline *cl,
+		 __attribute__((unused)) void *data)
+{
+	struct cmd_stats_result *res = parsed_result;
+	if (!strcmp(res->show, "clear"))
+		ntb_stats_clear();
+	else
+		ntb_stats_display();
 }
 
-cmdline_parse_token_string_t cmd_recv_file_recv =
-	TOKEN_STRING_INITIALIZER(struct cmd_recvfile_result, recv_string,
-				 "recv");
-cmdline_parse_token_string_t cmd_recv_file_filepath =
-	TOKEN_STRING_INITIALIZER(struct cmd_recvfile_result, filepath, NULL);
+cmdline_parse_token_string_t cmd_stats_show =
+	TOKEN_STRING_INITIALIZER(struct cmd_stats_result, show, "show#clear");
+cmdline_parse_token_string_t cmd_stats_port =
+	TOKEN_STRING_INITIALIZER(struct cmd_stats_result, port, "port");
+cmdline_parse_token_string_t cmd_stats_stats =
+	TOKEN_STRING_INITIALIZER(struct cmd_stats_result, stats, "stats");
 
 
-cmdline_parse_inst_t cmd_recv_file = {
-	.f = cmd_recvfile_parsed,
+cmdline_parse_inst_t cmd_stats = {
+	.f = cmd_stats_parsed,
 	.data = NULL,
-	.help_str = "recv <file_path>",
+	.help_str = "show|clear port stats",
 	.tokens = {
-		(void *)&cmd_recv_file_recv,
-		(void *)&cmd_recv_file_filepath,
+		(void *)&cmd_stats_show,
+		(void *)&cmd_stats_port,
+		(void *)&cmd_stats_stats,
+		NULL,
+	},
+};
+
+/* *** SET FORWARDING MODE *** */
+struct cmd_set_fwd_mode_result {
+	cmdline_fixed_string_t set;
+	cmdline_fixed_string_t fwd;
+	cmdline_fixed_string_t mode;
+};
+
+static void
+cmd_set_fwd_mode_parsed(void *parsed_result,
+			__attribute__((unused)) struct cmdline *cl,
+			__attribute__((unused)) void *data)
+{
+	struct cmd_set_fwd_mode_result *res = parsed_result;
+	int i;
+
+	if (in_test) {
+		printf("Please stop traffic first.\n");
+		return;
+	}
+
+	for (i = 0; i < MAX_FWD_MODE; i++) {
+		if (!strcmp(res->mode, fwd_mode_s[i])) {
+			fwd_mode = i;
+			return;
+		}
+	}
+	printf("Invalid %s packet forwarding mode.\n", res->mode);
+}
+
+cmdline_parse_token_string_t cmd_setfwd_set =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_fwd_mode_result, set, "set");
+cmdline_parse_token_string_t cmd_setfwd_fwd =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_fwd_mode_result, fwd, "fwd");
+cmdline_parse_token_string_t cmd_setfwd_mode =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_fwd_mode_result, mode,
+				"file-trans#iofwd#txonly#rxonly");
+
+cmdline_parse_inst_t cmd_set_fwd_mode = {
+	.f = cmd_set_fwd_mode_parsed,
+	.data = NULL,
+	.help_str = "set forwarding mode as file-trans|rxonly|txonly|iofwd",
+	.tokens = {
+		(void *)&cmd_setfwd_set,
+		(void *)&cmd_setfwd_fwd,
+		(void *)&cmd_setfwd_mode,
 		NULL,
 	},
 };
@@ -276,7 +1009,10 @@ cmdline_parse_inst_t cmd_recv_file = {
 cmdline_parse_ctx_t main_ctx[] = {
 	(cmdline_parse_inst_t *)&cmd_help,
 	(cmdline_parse_inst_t *)&cmd_send_file,
-	(cmdline_parse_inst_t *)&cmd_recv_file,
+	(cmdline_parse_inst_t *)&cmd_start,
+	(cmdline_parse_inst_t *)&cmd_stop,
+	(cmdline_parse_inst_t *)&cmd_stats,
+	(cmdline_parse_inst_t *)&cmd_set_fwd_mode,
 	(cmdline_parse_inst_t *)&cmd_quit,
 	NULL,
 };
@@ -305,45 +1041,257 @@ signal_handler(int signum)
 	}
 }
 
+#define OPT_BUF_SIZE         "buf-size"
+#define OPT_FWD_MODE         "fwd-mode"
+#define OPT_NB_DESC          "nb-desc"
+#define OPT_TXFREET          "txfreet"
+#define OPT_BURST            "burst"
+#define OPT_QP               "qp"
+
+enum {
+	/* long options mapped to a short option */
+	OPT_NO_ZERO_COPY_NUM = 1,
+	OPT_BUF_SIZE_NUM,
+	OPT_FWD_MODE_NUM,
+	OPT_NB_DESC_NUM,
+	OPT_TXFREET_NUM,
+	OPT_BURST_NUM,
+	OPT_QP_NUM,
+};
+
+static const char short_options[] =
+	"i" /* interactive mode */
+	;
+
+static const struct option lgopts[] = {
+	{OPT_BUF_SIZE,     1, NULL, OPT_BUF_SIZE_NUM     },
+	{OPT_FWD_MODE,     1, NULL, OPT_FWD_MODE_NUM     },
+	{OPT_NB_DESC,      1, NULL, OPT_NB_DESC_NUM      },
+	{OPT_TXFREET,      1, NULL, OPT_TXFREET_NUM      },
+	{OPT_BURST,        1, NULL, OPT_BURST_NUM        },
+	{OPT_QP,           1, NULL, OPT_QP_NUM           },
+	{0,                0, NULL, 0                    }
+};
+
 static void
 ntb_usage(const char *prgname)
 {
 	printf("%s [EAL options] -- [options]\n"
-	       "-i : run in interactive mode (default value is 1)\n",
-	       prgname);
+	       "-i: run in interactive mode.\n"
+	       "--qp=N: set number of queues as N (N > 0, default: 1).\n"
+	       "--fwd-mode=N: set fwd mode (N: file-trans | rxonly | "
+	       "txonly | iofwd, default: file-trans)\n"
+	       "--buf-size=N: set mbuf dataroom size as N (0 < N <= 65535,"
+	       " default: 2048).\n"
+	       "--nb-desc=N: set number of descriptors as N (%u <= N <= %u,"
+	       " default: 1024).\n"
+	       "--txfreet=N: set tx free thresh for NTB driver as N. (N >= 0)\n"
+	       "--burst=N: set pkt burst as N (0 < N <= %u, default: 32).\n",
+	       prgname, NTB_MIN_DESC_SIZE, NTB_MAX_DESC_SIZE,
+	       NTB_MAX_PKT_BURST);
 }
 
-static int
-parse_args(int argc, char **argv)
+static void
+ntb_parse_args(int argc, char **argv)
 {
 	char *prgname = argv[0], **argvopt = argv;
-	int opt, ret;
+	int opt, opt_idx, n, i;
 
-	/* Only support interactive mode to send/recv file first. */
-	while ((opt = getopt(argc, argvopt, "i")) != EOF) {
+	while ((opt = getopt_long(argc, argvopt, short_options,
+				lgopts, &opt_idx)) != EOF) {
 		switch (opt) {
 		case 'i':
-			printf("Interactive-mode selected\n");
+			printf("Interactive-mode selected.\n");
 			interactive = 1;
 			break;
+		case OPT_QP_NUM:
+			n = atoi(optarg);
+			if (n > 0)
+				num_queues = n;
+			else
+				rte_exit(EXIT_FAILURE, "qp must be > 0.\n");
+			break;
+		case OPT_BUF_SIZE_NUM:
+			n = atoi(optarg);
+			if (n > RTE_PKTMBUF_HEADROOM && n <= 0xFFFF)
+				ntb_buf_size = n;
+			else
+				rte_exit(EXIT_FAILURE, "buf-size must be > "
+					"%u and < 65536.\n",
+					RTE_PKTMBUF_HEADROOM);
+			break;
+		case OPT_FWD_MODE_NUM:
+			for (i = 0; i < MAX_FWD_MODE; i++) {
+				if (!strcmp(optarg, fwd_mode_s[i])) {
+					fwd_mode = i;
+					break;
+				}
+			}
+			if (i == MAX_FWD_MODE)
+				rte_exit(EXIT_FAILURE, "Unsupported mode. "
+				"(Should be: file-trans | rxonly | txonly "
+				"| iofwd)\n");
+			break;
+		case OPT_NB_DESC_NUM:
+			n = atoi(optarg);
+			if (n >= NTB_MIN_DESC_SIZE && n <= NTB_MAX_DESC_SIZE)
+				nb_desc = n;
+			else
+				rte_exit(EXIT_FAILURE, "nb-desc must be within"
+					" [%u, %u].\n", NTB_MIN_DESC_SIZE,
+					NTB_MAX_DESC_SIZE);
+			break;
+		case OPT_TXFREET_NUM:
+			n = atoi(optarg);
+			if (n >= 0)
+				tx_free_thresh = n;
+			else
+				rte_exit(EXIT_FAILURE, "txfreet must be"
+					" >= 0\n");
+			break;
+		case OPT_BURST_NUM:
+			n = atoi(optarg);
+			if (n > 0 && n <= NTB_MAX_PKT_BURST)
+				pkt_burst = n;
+			else
+				rte_exit(EXIT_FAILURE, "burst must be within "
+					"(0, %u].\n", NTB_MAX_PKT_BURST);
+			break;
 
 		default:
 			ntb_usage(prgname);
-			return -1;
+			rte_exit(EXIT_FAILURE,
+				 "Command line is incomplete or incorrect.\n");
+			break;
 		}
 	}
+}
 
-	if (optind >= 0)
-		argv[optind-1] = prgname;
+static void
+ntb_mempool_mz_free(__rte_unused struct rte_mempool_memhdr *memhdr,
+		void *opaque)
+{
+	const struct rte_memzone *mz = opaque;
+	rte_memzone_free(mz);
+}
 
-	ret = optind-1;
-	optind = 1; /* reset getopt lib */
-	return ret;
+static struct rte_mempool *
+ntb_mbuf_pool_create(uint16_t mbuf_seg_size, uint32_t nb_mbuf,
+		     struct ntb_dev_info ntb_info,
+		     struct ntb_dev_config *ntb_conf,
+		     unsigned int socket_id)
+{
+	size_t mz_len, total_elt_sz, max_mz_len, left_sz;
+	struct rte_pktmbuf_pool_private mbp_priv;
+	char pool_name[RTE_MEMPOOL_NAMESIZE];
+	char mz_name[RTE_MEMZONE_NAMESIZE];
+	const struct rte_memzone *mz;
+	struct rte_mempool *mp;
+	uint64_t align;
+	uint32_t mz_id;
+	int ret;
+
+	snprintf(pool_name, sizeof(pool_name), "ntb_mbuf_pool_%u", socket_id);
+	mp = rte_mempool_create_empty(pool_name, nb_mbuf,
+				      (mbuf_seg_size + sizeof(struct rte_mbuf)),
+				      MEMPOOL_CACHE_SIZE,
+				      sizeof(struct rte_pktmbuf_pool_private),
+				      socket_id, 0);
+	if (mp == NULL)
+		return NULL;
+
+	mbp_priv.mbuf_data_room_size = mbuf_seg_size;
+	mbp_priv.mbuf_priv_size = 0;
+	rte_pktmbuf_pool_init(mp, &mbp_priv);
+
+	ntb_conf->mz_list = rte_zmalloc("ntb_memzone_list",
+				sizeof(struct rte_memzone *) *
+				ntb_info.mw_cnt, 0);
+	if (ntb_conf->mz_list == NULL)
+		goto fail;
+
+	/* Put ntb header on mw0. */
+	if (ntb_info.mw_size[0] < ntb_info.ntb_hdr_size) {
+		printf("mw0 (size: %" PRIu64 ") is not enough for ntb hdr"
+		       " (size: %u)\n", ntb_info.mw_size[0],
+		       ntb_info.ntb_hdr_size);
+		goto fail;
+	}
+
+	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
+	left_sz = total_elt_sz * nb_mbuf;
+	for (mz_id = 0; mz_id < ntb_info.mw_cnt; mz_id++) {
+		/* If populated mbuf is enough, no need to reserve extra mz. */
+		if (!left_sz)
+			break;
+		snprintf(mz_name, sizeof(mz_name), "ntb_mw_%d", mz_id);
+		align = ntb_info.mw_size_align ? ntb_info.mw_size[mz_id] :
+			RTE_CACHE_LINE_SIZE;
+		/* Reserve ntb header space on memzone 0. */
+		max_mz_len = mz_id ? ntb_info.mw_size[mz_id] :
+			     ntb_info.mw_size[mz_id] - ntb_info.ntb_hdr_size;
+		mz_len = left_sz <= max_mz_len ? left_sz :
+			(max_mz_len / total_elt_sz * total_elt_sz);
+		if (!mz_len)
+			continue;
+		mz = rte_memzone_reserve_aligned(mz_name, mz_len, socket_id,
+					RTE_MEMZONE_IOVA_CONTIG, align);
+		if (mz == NULL) {
+			printf("Cannot allocate %" PRIu64 " aligned memzone"
+				" %u\n", align, mz_id);
+			goto fail;
+		}
+		left_sz -= mz_len;
+
+		/* Reserve ntb header space on memzone 0. */
+		if (mz_id)
+			ret = rte_mempool_populate_iova(mp, mz->addr, mz->iova,
+					mz->len, ntb_mempool_mz_free,
+					(void *)(uintptr_t)mz);
+		else
+			ret = rte_mempool_populate_iova(mp,
+					(void *)((size_t)mz->addr +
+					ntb_info.ntb_hdr_size),
+					mz->iova + ntb_info.ntb_hdr_size,
+					mz->len - ntb_info.ntb_hdr_size,
+					ntb_mempool_mz_free,
+					(void *)(uintptr_t)mz);
+		if (ret < 0) {
+			rte_memzone_free(mz);
+			rte_mempool_free(mp);
+			return NULL;
+		}
+
+		ntb_conf->mz_list[mz_id] = mz;
+	}
+	if (left_sz) {
+		printf("mw space is not enough for mempool.\n");
+		goto fail;
+	}
+
+	ntb_conf->mz_num = mz_id;
+	rte_mempool_obj_iter(mp, rte_pktmbuf_init, NULL);
+
+	return mp;
+fail:
+	rte_mempool_free(mp);
+	return NULL;
 }
 
 int
 main(int argc, char **argv)
 {
+	struct rte_eth_conf eth_pconf = eth_port_conf;
+	struct rte_rawdev_info ntb_rawdev_conf;
+	struct rte_rawdev_info ntb_rawdev_info;
+	struct rte_eth_dev_info ethdev_info;
+	struct rte_eth_rxconf eth_rx_conf;
+	struct rte_eth_txconf eth_tx_conf;
+	struct ntb_queue_conf ntb_q_conf;
+	struct ntb_dev_config ntb_conf;
+	struct ntb_dev_info ntb_info;
+	uint64_t ntb_link_status;
+	uint32_t nb_mbuf;
 	int ret, i;
 
 	signal(SIGINT, signal_handler);
@@ -353,6 +1301,9 @@ main(int argc, char **argv)
 	if (ret < 0)
 		rte_exit(EXIT_FAILURE, "Error with EAL initialization.\n");
 
+	if (rte_lcore_count() < 2)
+		rte_exit(EXIT_FAILURE, "Need at least 2 cores\n");
+
 	/* Find 1st ntb rawdev. */
 	for (i = 0; i < RTE_RAWDEV_MAX_DEVS; i++)
 		if (rte_rawdevs[i].driver_name &&
@@ -368,15 +1319,118 @@ main(int argc, char **argv)
 	argc -= ret;
 	argv += ret;
 
-	ret = parse_args(argc, argv);
+	ntb_parse_args(argc, argv);
+
+	rte_rawdev_set_attr(dev_id, NTB_QUEUE_SZ_NAME, nb_desc);
+	printf("Set queue size as %u.\n", nb_desc);
+	rte_rawdev_set_attr(dev_id, NTB_QUEUE_NUM_NAME, num_queues);
+	printf("Set queue number as %u.\n", num_queues);
+	ntb_rawdev_info.dev_private = (rte_rawdev_obj_t)(&ntb_info);
+	rte_rawdev_info_get(dev_id, &ntb_rawdev_info);
+
+	nb_mbuf = nb_desc * num_queues * 2 * 2 + rte_lcore_count() *
+		  MEMPOOL_CACHE_SIZE;
+	mbuf_pool = ntb_mbuf_pool_create(ntb_buf_size, nb_mbuf, ntb_info,
+					 &ntb_conf, rte_socket_id());
+	if (mbuf_pool == NULL)
+		rte_exit(EXIT_FAILURE, "Cannot create mbuf pool.\n");
+
+	ntb_conf.num_queues = num_queues;
+	ntb_conf.queue_size = nb_desc;
+	ntb_rawdev_conf.dev_private = (rte_rawdev_obj_t)(&ntb_conf);
+	ret = rte_rawdev_configure(dev_id, &ntb_rawdev_conf);
+	if (ret)
+		rte_exit(EXIT_FAILURE, "Can't config ntb dev: err=%d, "
+			"port=%u\n", ret, dev_id);
+
+	ntb_q_conf.tx_free_thresh = tx_free_thresh;
+	ntb_q_conf.nb_desc = nb_desc;
+	ntb_q_conf.rx_mp = mbuf_pool;
+	for (i = 0; i < num_queues; i++) {
+		/* Setup rawdev queue */
+		ret = rte_rawdev_queue_setup(dev_id, i, &ntb_q_conf);
+		if (ret < 0)
+			rte_exit(EXIT_FAILURE,
+				"Failed to setup ntb queue %u.\n", i);
+	}
+
+	/* Wait up to 100s for the peer device to come up. */
+	printf("Checking ntb link status...\n");
+	for (i = 0; i < 1000; i++) {
+		rte_rawdev_get_attr(dev_id, NTB_LINK_STATUS_NAME,
+				    &ntb_link_status);
+		if (ntb_link_status) {
+			printf("Peer dev ready, ntb link up.\n");
+			break;
+		}
+		rte_delay_ms(100);
+	}
+	rte_rawdev_get_attr(dev_id, NTB_LINK_STATUS_NAME, &ntb_link_status);
+	if (ntb_link_status == 0)
+		printf("Expire 100s. Link is not up. Please restart app.\n");
+
+	ret = rte_rawdev_start(dev_id);
 	if (ret < 0)
-		rte_exit(EXIT_FAILURE, "Invalid arguments\n");
+		rte_exit(EXIT_FAILURE, "rte_rawdev_start: err=%d, port=%u\n",
+			ret, dev_id);
+
+	/* Find 1st ethdev */
+	eth_port_id = rte_eth_find_next(0);
 
-	rte_rawdev_start(dev_id);
+	if (eth_port_id < RTE_MAX_ETHPORTS) {
+		rte_eth_dev_info_get(eth_port_id, &ethdev_info);
+		eth_pconf.rx_adv_conf.rss_conf.rss_hf &=
+				ethdev_info.flow_type_rss_offloads;
+		ret = rte_eth_dev_configure(eth_port_id, num_queues,
+					    num_queues, &eth_pconf);
+		if (ret)
+			rte_exit(EXIT_FAILURE, "Can't config ethdev: err=%d, "
+				"port=%u\n", ret, eth_port_id);
+		eth_rx_conf = ethdev_info.default_rxconf;
+		eth_rx_conf.offloads = eth_pconf.rxmode.offloads;
+		eth_tx_conf = ethdev_info.default_txconf;
+		eth_tx_conf.offloads = eth_pconf.txmode.offloads;
+
+		/* Setup ethdev queue if ethdev exists */
+		for (i = 0; i < num_queues; i++) {
+			ret = rte_eth_rx_queue_setup(eth_port_id, i, nb_desc,
+					rte_eth_dev_socket_id(eth_port_id),
+					&eth_rx_conf, mbuf_pool);
+			if (ret < 0)
+				rte_exit(EXIT_FAILURE,
+					"Failed to setup eth rxq %u.\n", i);
+			ret = rte_eth_tx_queue_setup(eth_port_id, i, nb_desc,
+					rte_eth_dev_socket_id(eth_port_id),
+					&eth_tx_conf);
+			if (ret < 0)
+				rte_exit(EXIT_FAILURE,
+					"Failed to setup eth txq %u.\n", i);
+		}
+
+		ret = rte_eth_dev_start(eth_port_id);
+		if (ret < 0)
+			rte_exit(EXIT_FAILURE, "rte_eth_dev_start: err=%d, "
+				"port=%u\n", ret, eth_port_id);
+	}
+
+	/* initialize port stats */
+	memset(&ntb_port_stats, 0, sizeof(ntb_port_stats));
+
+	/* Set default fwd mode if user doesn't set it. */
+	if (fwd_mode == MAX_FWD_MODE && eth_port_id < RTE_MAX_ETHPORTS) {
+		printf("Set default fwd mode as iofwd.\n");
+		fwd_mode = IOFWD;
+	}
+	if (fwd_mode == MAX_FWD_MODE) {
+		printf("Set default fwd mode as file-trans.\n");
+		fwd_mode = FILE_TRANS;
+	}
 
 	if (interactive) {
 		sleep(1);
 		prompt();
+	} else {
+		start_pkt_fwd();
 	}
 
 	return 0;
-- 
2.17.1
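
As a rough illustration of the link wait in main() above (up to 1000 polls,
100 ms apart, so about 100 s in total), the following standalone sketch
factors the pattern into a helper. The names wait_for_link, link_status_fn,
delay_ms_fn and stub_link_status are hypothetical and not part of the
example application; the real code reads NTB_LINK_STATUS_NAME through
rte_rawdev_get_attr() and sleeps with rte_delay_ms().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback type: returns non-zero once the link is up. */
typedef uint64_t (*link_status_fn)(void *ctx);
/* Hypothetical delay hook so the sketch does not depend on DPDK. */
typedef void (*delay_ms_fn)(unsigned int ms);

/*
 * Poll the link up to max_polls times, sleeping interval_ms between
 * attempts. Returns the number of polls made when the link came up,
 * or -1 if it never did.
 */
static int
wait_for_link(link_status_fn get_status, void *ctx, delay_ms_fn delay,
	      unsigned int max_polls, unsigned int interval_ms)
{
	unsigned int i;

	assert(get_status != NULL);

	for (i = 0; i < max_polls; i++) {
		if (get_status(ctx) != 0)
			return (int)(i + 1);
		if (delay != NULL)
			delay(interval_ms);
	}
	return -1;
}

/* Stub for testing: reports link up once the countdown reaches zero. */
static uint64_t
stub_link_status(void *ctx)
{
	unsigned int *remaining = ctx;

	if (*remaining == 0)
		return 1;
	(*remaining)--;
	return 0;
}
```

Returning the poll count instead of printing lets the caller decide whether
a timeout is fatal; the example above only prints a message and carries on
to rte_rawdev_start().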


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [dpdk-dev] [PATCH v6 0/4] enable FIFO for NTB
  2019-09-24  8:43       ` [dpdk-dev] [PATCH v5 0/4] enable FIFO " Xiaoyun Li
                           ` (3 preceding siblings ...)
  2019-09-24  8:43         ` [dpdk-dev] [PATCH v5 4/4] examples/ntb: support more functions for NTB Xiaoyun Li
@ 2019-09-26  3:20         ` " Xiaoyun Li
  2019-09-26  3:20           ` [dpdk-dev] [PATCH v6 1/4] raw/ntb: setup ntb queue Xiaoyun Li
                             ` (4 more replies)
  4 siblings, 5 replies; 42+ messages in thread
From: Xiaoyun Li @ 2019-09-26  3:20 UTC (permalink / raw)
  To: jingjing.wu, keith.wiles, omkar.maslekar, cunming.liang; +Cc: dev, Xiaoyun Li

Enable FIFO for the NTB rawdev driver to support packet-based
processing. An example application is also provided that supports
txonly, rxonly and iofwd between an NTB device and an ethdev, as
well as file transmission.

Acked-by: Omkar Maslekar <omkar.maslekar@intel.com>

---
v6:
 * Fixed xstats reset typo.
 * Fixed free functions when error happens.
 * Added missing info in doc.

v5:
 * Added missing free function when error happens.
 * Reworked xstats reset and get to avoid a race between reset and
   en/dequeue.
 * Added missing info in doc.

v4:
 * Fixed compile issues with 32-bit machine.
 * Fixed total xstats issue.

v3:
 * Replaced strncpy with memcpy to avoid gcc-9 compile issue.

v2:
 * Fixed compile issues on 32-bit machines and a missing include file.
 * Fixed a typo.
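
For readers skimming the modes listed in the description, here is a minimal
sketch of how the example appears to pick a default when the user sets
none, based on the v5 version of patch 4/4 earlier in this thread: iofwd
if an ethdev was found, file-trans otherwise. The enum values and the
helper resolve_fwd_mode are assumptions made for illustration; only the
names IOFWD, FILE_TRANS and MAX_FWD_MODE come from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the example's mode enum; the values are assumed. */
enum fwd_mode {
	FILE_TRANS = 0,
	RXONLY,
	TXONLY,
	IOFWD,
	MAX_FWD_MODE, /* sentinel: the user did not pick a mode */
};

/* Keep an explicit choice; otherwise fall back depending on ethdev presence. */
static enum fwd_mode
resolve_fwd_mode(enum fwd_mode requested, bool have_ethdev)
{
	assert(requested <= MAX_FWD_MODE);

	if (requested != MAX_FWD_MODE)
		return requested;
	return have_ethdev ? IOFWD : FILE_TRANS;
}
```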

Xiaoyun Li (4):
  raw/ntb: setup ntb queue
  raw/ntb: add xstats support
  raw/ntb: add enqueue and dequeue functions
  examples/ntb: support more functions for NTB

 doc/guides/rawdevs/ntb.rst             |   93 +-
 doc/guides/rel_notes/release_19_11.rst |    4 +
 doc/guides/sample_app_ug/ntb.rst       |   59 +-
 drivers/raw/ntb/Makefile               |    3 +
 drivers/raw/ntb/meson.build            |    1 +
 drivers/raw/ntb/ntb.c                  | 1130 ++++++++++++++++-----
 drivers/raw/ntb/ntb.h                  |  163 ++-
 drivers/raw/ntb/ntb_hw_intel.c         |   48 +-
 drivers/raw/ntb/rte_pmd_ntb.h          |   43 +
 examples/ntb/meson.build               |    3 +
 examples/ntb/ntb_fwd.c                 | 1298 +++++++++++++++++++++---
 11 files changed, 2428 insertions(+), 417 deletions(-)
 create mode 100644 drivers/raw/ntb/rte_pmd_ntb.h
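
The example in examples/ntb/ntb_fwd.c sizes its mbuf pool as
nb_desc * num_queues * 2 * 2 + rte_lcore_count() * MEMPOOL_CACHE_SIZE
(taken from the v5 version of patch 4/4 earlier in this thread). The
arithmetic can be sketched as below. The helper name is hypothetical, the
cache size is passed in because its value is not shown here, and the
source does not spell out what the two factors of 2 stand for.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper reproducing the pool sizing expression. */
static uint32_t
ntb_example_nb_mbuf(uint32_t nb_desc, uint32_t num_queues,
		    uint32_t lcore_count, uint32_t cache_size)
{
	assert(nb_desc > 0 && num_queues > 0);

	return nb_desc * num_queues * 2 * 2 + lcore_count * cache_size;
}
```

With 1024 descriptors, one queue pair, two lcores and an assumed cache size
of 256, this gives 4096 + 512 = 4608 mbufs.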

-- 
2.17.1


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [dpdk-dev] [PATCH v6 1/4] raw/ntb: setup ntb queue
  2019-09-26  3:20         ` [dpdk-dev] [PATCH v6 0/4] enable FIFO " Xiaoyun Li
@ 2019-09-26  3:20           ` Xiaoyun Li
  2019-09-26  3:20           ` [dpdk-dev] [PATCH v6 2/4] raw/ntb: add xstats support Xiaoyun Li
                             ` (3 subsequent siblings)
  4 siblings, 0 replies; 42+ messages in thread
From: Xiaoyun Li @ 2019-09-26  3:20 UTC (permalink / raw)
  To: jingjing.wu, keith.wiles, omkar.maslekar, cunming.liang; +Cc: dev, Xiaoyun Li

Set up and initialize the ntb txq and rxq, and negotiate queue
information with the peer. If the queue size or the number of queues
differs between the two sides, return an error.

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
---
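The handshake in this patch passes 64-bit values (memory window sizes and
base addresses) through 32-bit scratchpad registers as high/low pairs. A
minimal sketch of that split and rejoin follows. The struct sketch_spad,
the register count and the helper names are hypothetical stand-ins; the
driver itself goes through hw->ntb_ops->spad_write and spad_read with
indices such as SPAD_MW0_SZ_H + 2 * i.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the 32-bit scratchpad registers. */
#define SKETCH_SPAD_CNT 16

struct sketch_spad {
	uint32_t reg[SKETCH_SPAD_CNT];
};

/* Store a 64-bit value as a high word at idx_h and a low word at idx_l. */
static void
spad_put64(struct sketch_spad *sp, unsigned int idx_h, unsigned int idx_l,
	   uint64_t val)
{
	assert(idx_h < SKETCH_SPAD_CNT && idx_l < SKETCH_SPAD_CNT);

	sp->reg[idx_h] = (uint32_t)(val >> 32);
	sp->reg[idx_l] = (uint32_t)val;
}

/* Rebuild the 64-bit value; widen the high word before shifting. */
static uint64_t
spad_get64(const struct sketch_spad *sp, unsigned int idx_h,
	   unsigned int idx_l)
{
	assert(idx_h < SKETCH_SPAD_CNT && idx_l < SKETCH_SPAD_CNT);

	return ((uint64_t)sp->reg[idx_h] << 32) | sp->reg[idx_l];
}
```

Casting the high word to uint64_t before the shift matters: shifting a
32-bit value left by 32 is undefined in C.
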
 doc/guides/rawdevs/ntb.rst             |  39 +-
 doc/guides/rel_notes/release_19_11.rst |   4 +
 drivers/raw/ntb/Makefile               |   3 +
 drivers/raw/ntb/meson.build            |   1 +
 drivers/raw/ntb/ntb.c                  | 722 ++++++++++++++++++-------
 drivers/raw/ntb/ntb.h                  | 151 ++++--
 drivers/raw/ntb/ntb_hw_intel.c         |  26 +-
 drivers/raw/ntb/rte_pmd_ntb.h          |  43 ++
 8 files changed, 734 insertions(+), 255 deletions(-)
 create mode 100644 drivers/raw/ntb/rte_pmd_ntb.h
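
The ring format documented in the ntb.rst hunk of this patch (a 128-bit
descriptor and a 32-bit used entry) and the per-queue header size computed
in ntb_dev_info_get() can be sketched as plain C. All sketch_* names, the
64-byte cache line and the two-counter header are assumptions made for
illustration; the driver's own struct ntb_header may be laid out
differently, for example with cache-aligned fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CACHE_LINE 64 /* assumed cache line size */

/* 128-bit descriptor: 64-bit buffer address, 16-bit length, 48 reserved bits. */
struct sketch_desc {
	uint64_t addr;
	uint16_t len;
	uint16_t rsv[3];
};

/* 32-bit used entry: 16-bit packet length, 16-bit flags. */
struct sketch_used {
	uint16_t len;
	uint16_t flags;
};

/* Simplified per-queue header holding only the two counters. */
struct sketch_header {
	uint16_t avail_cnt;
	uint16_t used_cnt;
};

/* Round v up to the next multiple of align. */
static size_t
align_up(size_t v, size_t align)
{
	assert(align != 0);

	return (v + align - 1) / align * align;
}

/* Header plus both rings for one queue, padded to a cache line. */
static size_t
hdr_size_per_queue(size_t queue_size)
{
	return align_up(sizeof(struct sketch_header) +
			queue_size * sizeof(struct sketch_desc) +
			queue_size * sizeof(struct sketch_used),
			SKETCH_CACHE_LINE);
}
```

Under these assumptions a 1024-entry queue needs 4 + 16384 + 4096 = 20484
bytes, which rounds up to 20544.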

diff --git a/doc/guides/rawdevs/ntb.rst b/doc/guides/rawdevs/ntb.rst
index 0a61ec03d..99e7db441 100644
--- a/doc/guides/rawdevs/ntb.rst
+++ b/doc/guides/rawdevs/ntb.rst
@@ -45,8 +45,45 @@ to use, i.e. igb_uio, vfio. The ``dpdk-devbind.py`` script can be used to
 show devices status and to bind them to a suitable kernel driver. They will
 appear under the category of "Misc (rawdev) devices".
 
+Ring Layout
+-----------
+
+Since the remote system's memory is read and written over the PCI bus,
+a remote read is much more expensive than a remote write. The enqueue
+and dequeue operations on the ntb ring therefore avoid remote reads.
+The ring layout for ntb is as follows:
+- Ring Format:
+  desc_ring:
+      0               16                                              64
+      +---------------------------------------------------------------+
+      |                        buffer address                         |
+      +---------------+-----------------------------------------------+
+      | buffer length |                      resv                     |
+      +---------------+-----------------------------------------------+
+  used_ring:
+      0               16              32
+      +---------------+---------------+
+      | packet length |     flags     |
+      +---------------+---------------+
+- Ring Layout
+      +------------------------+   +------------------------+
+      | used_ring              |   | desc_ring              |
+      | +---+                  |   | +---+                  |
+      | |   |                  |   | |   |                  |
+      | +---+      +--------+  |   | +---+                  |
+      | |   | ---> | buffer | <+---+-|   |                  |
+      | +---+      +--------+  |   | +---+                  |
+      | |   |                  |   | |   |                  |
+      | +---+                  |   | +---+                  |
+      |  ...                   |   |  ...                   |
+      |                        |   |                        |
+      |            +---------+ |   |            +---------+ |
+      |            | tx_tail | |   |            | rx_tail | |
+      | System A   +---------+ |   | System B   +---------+ |
+      +------------------------+   +------------------------+
+                    <---------traffic---------
+
 Limitation
 ----------
 
-- The FIFO hasn't been introduced and will come in 19.11 release.
 - This PMD only supports Intel Skylake platform.
diff --git a/doc/guides/rel_notes/release_19_11.rst b/doc/guides/rel_notes/release_19_11.rst
index 27cfbd9e3..efd0fb825 100644
--- a/doc/guides/rel_notes/release_19_11.rst
+++ b/doc/guides/rel_notes/release_19_11.rst
@@ -56,6 +56,10 @@ New Features
      Also, make sure to start the actual text at the margin.
      =========================================================
 
+   * **Introduced FIFO for NTB PMD.**
+
+     Introduced FIFO for NTB (Non-transparent Bridge) PMD to support
+     packet based processing.
 
 Removed Items
 -------------
diff --git a/drivers/raw/ntb/Makefile b/drivers/raw/ntb/Makefile
index 6fe2aaf40..814cd05ca 100644
--- a/drivers/raw/ntb/Makefile
+++ b/drivers/raw/ntb/Makefile
@@ -25,4 +25,7 @@ LIBABIVER := 1
 SRCS-$(CONFIG_RTE_LIBRTE_PMD_NTB_RAWDEV) += ntb.c
 SRCS-$(CONFIG_RTE_LIBRTE_PMD_NTB_RAWDEV) += ntb_hw_intel.c
 
+# install this header file
+SYMLINK-$(CONFIG_RTE_LIBRTE_PMD_NTB_RAWDEV)-include := rte_pmd_ntb.h
+
 include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/raw/ntb/meson.build b/drivers/raw/ntb/meson.build
index 7f39437f8..7a7d26126 100644
--- a/drivers/raw/ntb/meson.build
+++ b/drivers/raw/ntb/meson.build
@@ -5,4 +5,5 @@ deps += ['rawdev', 'mbuf', 'mempool',
 	 'pci', 'bus_pci']
 sources = files('ntb.c',
                 'ntb_hw_intel.c')
+install_headers('rte_pmd_ntb.h')
 allow_experimental_apis = true
diff --git a/drivers/raw/ntb/ntb.c b/drivers/raw/ntb/ntb.c
index bfecce1e4..0e62ad433 100644
--- a/drivers/raw/ntb/ntb.c
+++ b/drivers/raw/ntb/ntb.c
@@ -12,6 +12,7 @@
 #include <rte_eal.h>
 #include <rte_log.h>
 #include <rte_pci.h>
+#include <rte_mbuf.h>
 #include <rte_bus_pci.h>
 #include <rte_memzone.h>
 #include <rte_memcpy.h>
@@ -19,6 +20,7 @@
 #include <rte_rawdev_pmd.h>
 
 #include "ntb_hw_intel.h"
+#include "rte_pmd_ntb.h"
 #include "ntb.h"
 
 int ntb_logtype;
@@ -28,48 +30,7 @@ static const struct rte_pci_id pci_id_ntb_map[] = {
 	{ .vendor_id = 0, /* sentinel */ },
 };
 
-static int
-ntb_set_mw(struct rte_rawdev *dev, int mw_idx, uint64_t mw_size)
-{
-	struct ntb_hw *hw = dev->dev_private;
-	char mw_name[RTE_MEMZONE_NAMESIZE];
-	const struct rte_memzone *mz;
-	int ret = 0;
-
-	if (hw->ntb_ops->mw_set_trans == NULL) {
-		NTB_LOG(ERR, "Not supported to set mw.");
-		return -ENOTSUP;
-	}
-
-	snprintf(mw_name, sizeof(mw_name), "ntb_%d_mw_%d",
-		 dev->dev_id, mw_idx);
-
-	mz = rte_memzone_lookup(mw_name);
-	if (mz)
-		return 0;
-
-	/**
-	 * Hardware requires that mapped memory base address should be
-	 * aligned with EMBARSZ and needs continuous memzone.
-	 */
-	mz = rte_memzone_reserve_aligned(mw_name, mw_size, dev->socket_id,
-				RTE_MEMZONE_IOVA_CONTIG, hw->mw_size[mw_idx]);
-	if (!mz) {
-		NTB_LOG(ERR, "Cannot allocate aligned memzone.");
-		return -EIO;
-	}
-	hw->mz[mw_idx] = mz;
-
-	ret = (*hw->ntb_ops->mw_set_trans)(dev, mw_idx, mz->iova, mw_size);
-	if (ret) {
-		NTB_LOG(ERR, "Cannot set mw translation.");
-		return ret;
-	}
-
-	return ret;
-}
-
-static void
+static inline void
 ntb_link_cleanup(struct rte_rawdev *dev)
 {
 	struct ntb_hw *hw = dev->dev_private;
@@ -89,20 +50,94 @@ ntb_link_cleanup(struct rte_rawdev *dev)
 	}
 
 	/* Clear mw so that peer cannot access local memory.*/
-	for (i = 0; i < hw->mw_cnt; i++) {
+	for (i = 0; i < hw->used_mw_num; i++) {
 		status = (*hw->ntb_ops->mw_set_trans)(dev, i, 0, 0);
 		if (status)
 			NTB_LOG(ERR, "Failed to clean mw.");
 	}
 }
 
+static inline int
+ntb_handshake_work(const struct rte_rawdev *dev)
+{
+	struct ntb_hw *hw = dev->dev_private;
+	uint32_t val;
+	int ret, i;
+
+	if (hw->ntb_ops->spad_write == NULL ||
+	    hw->ntb_ops->mw_set_trans == NULL) {
+		NTB_LOG(ERR, "Scratchpad/MW setting is not supported.");
+		return -ENOTSUP;
+	}
+
+	/* Tell peer the mw info of local side. */
+	ret = (*hw->ntb_ops->spad_write)(dev, SPAD_NUM_MWS, 1, hw->mw_cnt);
+	if (ret < 0)
+		return ret;
+	for (i = 0; i < hw->mw_cnt; i++) {
+		NTB_LOG(INFO, "Local %u mw size: 0x%"PRIx64"", i,
+				hw->mw_size[i]);
+		val = hw->mw_size[i] >> 32;
+		ret = (*hw->ntb_ops->spad_write)(dev, SPAD_MW0_SZ_H + 2 * i,
+						 1, val);
+		if (ret < 0)
+			return ret;
+		val = hw->mw_size[i];
+		ret = (*hw->ntb_ops->spad_write)(dev, SPAD_MW0_SZ_L + 2 * i,
+						 1, val);
+		if (ret < 0)
+			return ret;
+	}
+
+	/* Tell peer about the queue info and map memory to the peer. */
+	ret = (*hw->ntb_ops->spad_write)(dev, SPAD_Q_SZ, 1, hw->queue_size);
+	if (ret < 0)
+		return ret;
+	ret = (*hw->ntb_ops->spad_write)(dev, SPAD_NUM_QPS, 1,
+					 hw->queue_pairs);
+	if (ret < 0)
+		return ret;
+	ret = (*hw->ntb_ops->spad_write)(dev, SPAD_USED_MWS, 1,
+					 hw->used_mw_num);
+	if (ret < 0)
+		return ret;
+	for (i = 0; i < hw->used_mw_num; i++) {
+		val = (uint64_t)(size_t)(hw->mz[i]->addr) >> 32;
+		ret = (*hw->ntb_ops->spad_write)(dev, SPAD_MW0_BA_H + 2 * i,
+						 1, val);
+		if (ret < 0)
+			return ret;
+		val = (uint64_t)(size_t)(hw->mz[i]->addr);
+		ret = (*hw->ntb_ops->spad_write)(dev, SPAD_MW0_BA_L + 2 * i,
+						 1, val);
+		if (ret < 0)
+			return ret;
+	}
+
+	for (i = 0; i < hw->used_mw_num; i++) {
+		ret = (*hw->ntb_ops->mw_set_trans)(dev, i, hw->mz[i]->iova,
+						   hw->mz[i]->len);
+		if (ret < 0)
+			return ret;
+	}
+
+	/* Ring doorbell 0 to tell peer the device is ready. */
+	ret = (*hw->ntb_ops->peer_db_set)(dev, 0);
+	if (ret < 0)
+		return ret;
+
+	return 0;
+}
+
 static void
 ntb_dev_intr_handler(void *param)
 {
 	struct rte_rawdev *dev = (struct rte_rawdev *)param;
 	struct ntb_hw *hw = dev->dev_private;
-	uint32_t mw_size_h, mw_size_l;
+	uint32_t val_h, val_l;
+	uint64_t peer_mw_size;
 	uint64_t db_bits = 0;
+	uint8_t peer_mw_cnt;
 	int i = 0;
 
 	if (hw->ntb_ops->db_read == NULL ||
@@ -118,7 +153,7 @@ ntb_dev_intr_handler(void *param)
 
 	/* Doorbell 0 is for peer device ready. */
 	if (db_bits & 1) {
-		NTB_LOG(DEBUG, "DB0: Peer device is up.");
+		NTB_LOG(INFO, "DB0: Peer device is up.");
 		/* Clear received doorbell. */
 		(*hw->ntb_ops->db_clear)(dev, 1);
 
@@ -129,47 +164,44 @@ ntb_dev_intr_handler(void *param)
 		if (hw->peer_dev_up)
 			return;
 
-		if (hw->ntb_ops->spad_read == NULL ||
-		    hw->ntb_ops->spad_write == NULL) {
-			NTB_LOG(ERR, "Scratchpad is not supported.");
+		if (hw->ntb_ops->spad_read == NULL) {
+			NTB_LOG(ERR, "Scratchpad read is not supported.");
+			return;
+		}
+
+		/* Check if mw setting on the peer is the same as local. */
+		peer_mw_cnt = (*hw->ntb_ops->spad_read)(dev, SPAD_NUM_MWS, 0);
+		if (peer_mw_cnt != hw->mw_cnt) {
+			NTB_LOG(ERR, "Both mw cnt must be the same.");
 			return;
 		}
 
-		hw->peer_mw_cnt = (*hw->ntb_ops->spad_read)
-				  (dev, SPAD_NUM_MWS, 0);
-		hw->peer_mw_size = rte_zmalloc("uint64_t",
-				   hw->peer_mw_cnt * sizeof(uint64_t), 0);
 		for (i = 0; i < hw->mw_cnt; i++) {
-			mw_size_h = (*hw->ntb_ops->spad_read)
-				    (dev, SPAD_MW0_SZ_H + 2 * i, 0);
-			mw_size_l = (*hw->ntb_ops->spad_read)
-				    (dev, SPAD_MW0_SZ_L + 2 * i, 0);
-			hw->peer_mw_size[i] = ((uint64_t)mw_size_h << 32) |
-					      mw_size_l;
+			val_h = (*hw->ntb_ops->spad_read)
+				(dev, SPAD_MW0_SZ_H + 2 * i, 0);
+			val_l = (*hw->ntb_ops->spad_read)
+				(dev, SPAD_MW0_SZ_L + 2 * i, 0);
+			peer_mw_size = ((uint64_t)val_h << 32) | val_l;
 			NTB_LOG(DEBUG, "Peer %u mw size: 0x%"PRIx64"", i,
-					hw->peer_mw_size[i]);
+					peer_mw_size);
+			if (peer_mw_size != hw->mw_size[i]) {
+				NTB_LOG(ERR, "Mw config must be the same.");
+				return;
+			}
 		}
 
 		hw->peer_dev_up = 1;
 
 		/**
-		 * Handshake with peer. Spad_write only works when both
-		 * devices are up. So write spad again when db is received.
-		 * And set db again for the later device who may miss
+		 * Handshake with peer. Spad_write & mw_set_trans only work
+		 * when both devices are up. So write spad again when db is
+		 * received. And set db again for the later device who may miss
 		 * the 1st db.
 		 */
-		for (i = 0; i < hw->mw_cnt; i++) {
-			(*hw->ntb_ops->spad_write)(dev, SPAD_NUM_MWS,
-						   1, hw->mw_cnt);
-			mw_size_h = hw->mw_size[i] >> 32;
-			(*hw->ntb_ops->spad_write)(dev, SPAD_MW0_SZ_H + 2 * i,
-						   1, mw_size_h);
-
-			mw_size_l = hw->mw_size[i];
-			(*hw->ntb_ops->spad_write)(dev, SPAD_MW0_SZ_L + 2 * i,
-						   1, mw_size_l);
+		if (ntb_handshake_work(dev) < 0) {
+			NTB_LOG(ERR, "Handshake work failed.");
+			return;
 		}
-		(*hw->ntb_ops->peer_db_set)(dev, 0);
 
 		/* To get the link info. */
 		if (hw->ntb_ops->get_link_status == NULL) {
@@ -183,7 +215,7 @@ ntb_dev_intr_handler(void *param)
 	}
 
 	if (db_bits & (1 << 1)) {
-		NTB_LOG(DEBUG, "DB1: Peer device is down.");
+		NTB_LOG(INFO, "DB1: Peer device is down.");
 		/* Clear received doorbell. */
 		(*hw->ntb_ops->db_clear)(dev, 2);
 
@@ -197,7 +229,7 @@ ntb_dev_intr_handler(void *param)
 	}
 
 	if (db_bits & (1 << 2)) {
-		NTB_LOG(DEBUG, "DB2: Peer device agrees dev to be down.");
+		NTB_LOG(INFO, "DB2: Peer device agrees dev to be down.");
 		/* Clear received doorbell. */
 		(*hw->ntb_ops->db_clear)(dev, (1 << 2));
 		hw->peer_dev_up = 0;
@@ -206,24 +238,228 @@ ntb_dev_intr_handler(void *param)
 }
 
 static void
-ntb_queue_conf_get(struct rte_rawdev *dev __rte_unused,
-		   uint16_t queue_id __rte_unused,
-		   rte_rawdev_obj_t queue_conf __rte_unused)
+ntb_queue_conf_get(struct rte_rawdev *dev,
+		   uint16_t queue_id,
+		   rte_rawdev_obj_t queue_conf)
+{
+	struct ntb_queue_conf *q_conf = queue_conf;
+	struct ntb_hw *hw = dev->dev_private;
+
+	q_conf->tx_free_thresh = hw->tx_queues[queue_id]->tx_free_thresh;
+	q_conf->nb_desc = hw->rx_queues[queue_id]->nb_rx_desc;
+	q_conf->rx_mp = hw->rx_queues[queue_id]->mpool;
+}
+
+static void
+ntb_rxq_release_mbufs(struct ntb_rx_queue *q)
 {
+	int i;
+
+	if (!q || !q->sw_ring) {
+		NTB_LOG(ERR, "Pointer to rxq or sw_ring is NULL");
+		return;
+	}
+
+	for (i = 0; i < q->nb_rx_desc; i++) {
+		if (q->sw_ring[i].mbuf) {
+			rte_pktmbuf_free_seg(q->sw_ring[i].mbuf);
+			q->sw_ring[i].mbuf = NULL;
+		}
+	}
+}
+
+static void
+ntb_rxq_release(struct ntb_rx_queue *rxq)
+{
+	if (!rxq) {
+		NTB_LOG(ERR, "Pointer to rxq is NULL");
+		return;
+	}
+
+	ntb_rxq_release_mbufs(rxq);
+
+	rte_free(rxq->sw_ring);
+	rte_free(rxq);
 }
 
 static int
-ntb_queue_setup(struct rte_rawdev *dev __rte_unused,
-		uint16_t queue_id __rte_unused,
-		rte_rawdev_obj_t queue_conf __rte_unused)
+ntb_rxq_setup(struct rte_rawdev *dev,
+	      uint16_t qp_id,
+	      rte_rawdev_obj_t queue_conf)
 {
+	struct ntb_queue_conf *rxq_conf = queue_conf;
+	struct ntb_hw *hw = dev->dev_private;
+	struct ntb_rx_queue *rxq;
+
+	/* Allocate the rx queue data structure */
+	rxq = rte_zmalloc_socket("ntb rx queue",
+				 sizeof(struct ntb_rx_queue),
+				 RTE_CACHE_LINE_SIZE,
+				 dev->socket_id);
+	if (!rxq) {
+		NTB_LOG(ERR, "Failed to allocate memory for "
+			    "rx queue data structure.");
+		return -ENOMEM;
+	}
+
+	if (rxq_conf->rx_mp == NULL) {
+		NTB_LOG(ERR, "Invalid null mempool pointer.");
+		return -EINVAL;
+	}
+	rxq->nb_rx_desc = rxq_conf->nb_desc;
+	rxq->mpool = rxq_conf->rx_mp;
+	rxq->port_id = dev->dev_id;
+	rxq->queue_id = qp_id;
+	rxq->hw = hw;
+
+	/* Allocate the software ring. */
+	rxq->sw_ring =
+		rte_zmalloc_socket("ntb rx sw ring",
+				   sizeof(struct ntb_rx_entry) *
+				   rxq->nb_rx_desc,
+				   RTE_CACHE_LINE_SIZE,
+				   dev->socket_id);
+	if (!rxq->sw_ring) {
+		ntb_rxq_release(rxq);
+		rxq = NULL;
+		NTB_LOG(ERR, "Failed to allocate memory for SW ring");
+		return -ENOMEM;
+	}
+
+	hw->rx_queues[qp_id] = rxq;
+
 	return 0;
 }
 
+static void
+ntb_txq_release_mbufs(struct ntb_tx_queue *q)
+{
+	int i;
+
+	if (!q || !q->sw_ring) {
+		NTB_LOG(ERR, "Pointer to txq or sw_ring is NULL");
+		return;
+	}
+
+	for (i = 0; i < q->nb_tx_desc; i++) {
+		if (q->sw_ring[i].mbuf) {
+			rte_pktmbuf_free_seg(q->sw_ring[i].mbuf);
+			q->sw_ring[i].mbuf = NULL;
+		}
+	}
+}
+
+static void
+ntb_txq_release(struct ntb_tx_queue *txq)
+{
+	if (!txq) {
+		NTB_LOG(ERR, "Pointer to txq is NULL");
+		return;
+	}
+
+	ntb_txq_release_mbufs(txq);
+
+	rte_free(txq->sw_ring);
+	rte_free(txq);
+}
+
 static int
-ntb_queue_release(struct rte_rawdev *dev __rte_unused,
-		  uint16_t queue_id __rte_unused)
+ntb_txq_setup(struct rte_rawdev *dev,
+	      uint16_t qp_id,
+	      rte_rawdev_obj_t queue_conf)
 {
+	struct ntb_queue_conf *txq_conf = queue_conf;
+	struct ntb_hw *hw = dev->dev_private;
+	struct ntb_tx_queue *txq;
+	uint16_t i, prev;
+
+	/* Allocate the TX queue data structure. */
+	txq = rte_zmalloc_socket("ntb tx queue",
+				  sizeof(struct ntb_tx_queue),
+				  RTE_CACHE_LINE_SIZE,
+				  dev->socket_id);
+	if (!txq) {
+		NTB_LOG(ERR, "Failed to allocate memory for "
+			    "tx queue structure");
+		return -ENOMEM;
+	}
+
+	txq->nb_tx_desc = txq_conf->nb_desc;
+	txq->port_id = dev->dev_id;
+	txq->queue_id = qp_id;
+	txq->hw = hw;
+
+	/* Allocate software ring */
+	txq->sw_ring =
+		rte_zmalloc_socket("ntb tx sw ring",
+				   sizeof(struct ntb_tx_entry) *
+				   txq->nb_tx_desc,
+				   RTE_CACHE_LINE_SIZE,
+				   dev->socket_id);
+	if (!txq->sw_ring) {
+		ntb_txq_release(txq);
+		txq = NULL;
+		NTB_LOG(ERR, "Failed to allocate memory for SW TX ring");
+		return -ENOMEM;
+	}
+
+	prev = txq->nb_tx_desc - 1;
+	for (i = 0; i < txq->nb_tx_desc; i++) {
+		txq->sw_ring[i].mbuf = NULL;
+		txq->sw_ring[i].last_id = i;
+		txq->sw_ring[prev].next_id = i;
+		prev = i;
+	}
+
+	txq->tx_free_thresh = txq_conf->tx_free_thresh ?
+			      txq_conf->tx_free_thresh :
+			      NTB_DFLT_TX_FREE_THRESH;
+	if (txq->tx_free_thresh >= txq->nb_tx_desc - 3) {
+		NTB_LOG(ERR, "tx_free_thresh must be less than nb_desc - 3. "
+			"(tx_free_thresh=%u qp_id=%u)", txq->tx_free_thresh,
+			qp_id);
+		return -EINVAL;
+	}
+
+	hw->tx_queues[qp_id] = txq;
+
+	return 0;
+}
+
+
+static int
+ntb_queue_setup(struct rte_rawdev *dev,
+		uint16_t queue_id,
+		rte_rawdev_obj_t queue_conf)
+{
+	struct ntb_hw *hw = dev->dev_private;
+	int ret;
+
+	if (queue_id >= hw->queue_pairs)
+		return -EINVAL;
+
+	ret = ntb_txq_setup(dev, queue_id, queue_conf);
+	if (ret < 0)
+		return ret;
+
+	ret = ntb_rxq_setup(dev, queue_id, queue_conf);
+
+	return ret;
+}
+
+static int
+ntb_queue_release(struct rte_rawdev *dev, uint16_t queue_id)
+{
+	struct ntb_hw *hw = dev->dev_private;
+
+	if (queue_id >= hw->queue_pairs)
+		return -EINVAL;
+
+	ntb_txq_release(hw->tx_queues[queue_id]);
+	hw->tx_queues[queue_id] = NULL;
+	ntb_rxq_release(hw->rx_queues[queue_id]);
+	hw->rx_queues[queue_id] = NULL;
+
 	return 0;
 }
 
@@ -234,6 +470,77 @@ ntb_queue_count(struct rte_rawdev *dev)
 	return hw->queue_pairs;
 }
 
+static int
+ntb_queue_init(struct rte_rawdev *dev, uint16_t qp_id)
+{
+	struct ntb_hw *hw = dev->dev_private;
+	struct ntb_rx_queue *rxq = hw->rx_queues[qp_id];
+	struct ntb_tx_queue *txq = hw->tx_queues[qp_id];
+	volatile struct ntb_header *local_hdr;
+	struct ntb_header *remote_hdr;
+	uint16_t q_size = hw->queue_size;
+	uint32_t hdr_offset;
+	void *bar_addr;
+	uint16_t i;
+
+	if (hw->ntb_ops->get_peer_mw_addr == NULL) {
+		NTB_LOG(ERR, "Getting peer mw addr is not supported.");
+		return -EINVAL;
+	}
+
+	/* Put queue info into the start of shared memory. */
+	hdr_offset = hw->hdr_size_per_queue * qp_id;
+	local_hdr = (volatile struct ntb_header *)
+		    ((size_t)hw->mz[0]->addr + hdr_offset);
+	bar_addr = (*hw->ntb_ops->get_peer_mw_addr)(dev, 0);
+	if (bar_addr == NULL)
+		return -EINVAL;
+	remote_hdr = (struct ntb_header *)
+		     ((size_t)bar_addr + hdr_offset);
+
+	/* rxq init. */
+	rxq->rx_desc_ring = (struct ntb_desc *)
+			    (&remote_hdr->desc_ring);
+	rxq->rx_used_ring = (volatile struct ntb_used *)
+			    (&local_hdr->desc_ring[q_size]);
+	rxq->avail_cnt = &remote_hdr->avail_cnt;
+	rxq->used_cnt = &local_hdr->used_cnt;
+
+	for (i = 0; i < rxq->nb_rx_desc - 1; i++) {
+		struct rte_mbuf *mbuf = rte_mbuf_raw_alloc(rxq->mpool);
+		if (unlikely(!mbuf)) {
+			NTB_LOG(ERR, "Failed to allocate mbuf for RX");
+			return -ENOMEM;
+		}
+		mbuf->port = dev->dev_id;
+
+		rxq->sw_ring[i].mbuf = mbuf;
+
+		rxq->rx_desc_ring[i].addr = rte_pktmbuf_mtod(mbuf, size_t);
+		rxq->rx_desc_ring[i].len = mbuf->buf_len - RTE_PKTMBUF_HEADROOM;
+	}
+	rte_wmb();
+	*rxq->avail_cnt = rxq->nb_rx_desc - 1;
+	rxq->last_avail = rxq->nb_rx_desc - 1;
+	rxq->last_used = 0;
+
+	/* txq init */
+	txq->tx_desc_ring = (volatile struct ntb_desc *)
+			    (&local_hdr->desc_ring);
+	txq->tx_used_ring = (struct ntb_used *)
+			    (&remote_hdr->desc_ring[q_size]);
+	txq->avail_cnt = &local_hdr->avail_cnt;
+	txq->used_cnt = &remote_hdr->used_cnt;
+
+	rte_wmb();
+	*txq->used_cnt = 0;
+	txq->last_used = 0;
+	txq->last_avail = 0;
+	txq->nb_tx_free = txq->nb_tx_desc - 1;
+
+	return 0;
+}
+
 static int
 ntb_enqueue_bufs(struct rte_rawdev *dev,
 		 struct rte_rawdev_buf **buffers,
@@ -278,58 +585,56 @@ static void
 ntb_dev_info_get(struct rte_rawdev *dev, rte_rawdev_obj_t dev_info)
 {
 	struct ntb_hw *hw = dev->dev_private;
-	struct ntb_attr *ntb_attrs = dev_info;
-
-	strncpy(ntb_attrs[NTB_TOPO_ID].name, NTB_TOPO_NAME, NTB_ATTR_NAME_LEN);
-	switch (hw->topo) {
-	case NTB_TOPO_B2B_DSD:
-		strncpy(ntb_attrs[NTB_TOPO_ID].value, "B2B DSD",
-			NTB_ATTR_VAL_LEN);
-		break;
-	case NTB_TOPO_B2B_USD:
-		strncpy(ntb_attrs[NTB_TOPO_ID].value, "B2B USD",
-			NTB_ATTR_VAL_LEN);
-		break;
-	default:
-		strncpy(ntb_attrs[NTB_TOPO_ID].value, "Unsupported",
-			NTB_ATTR_VAL_LEN);
-	}
-
-	strncpy(ntb_attrs[NTB_LINK_STATUS_ID].name, NTB_LINK_STATUS_NAME,
-		NTB_ATTR_NAME_LEN);
-	snprintf(ntb_attrs[NTB_LINK_STATUS_ID].value, NTB_ATTR_VAL_LEN,
-		 "%d", hw->link_status);
+	struct ntb_dev_info *info = dev_info;
 
-	strncpy(ntb_attrs[NTB_SPEED_ID].name, NTB_SPEED_NAME,
-		NTB_ATTR_NAME_LEN);
-	snprintf(ntb_attrs[NTB_SPEED_ID].value, NTB_ATTR_VAL_LEN,
-		 "%d", hw->link_speed);
+	info->mw_cnt = hw->mw_cnt;
+	info->mw_size = hw->mw_size;
 
-	strncpy(ntb_attrs[NTB_WIDTH_ID].name, NTB_WIDTH_NAME,
-		NTB_ATTR_NAME_LEN);
-	snprintf(ntb_attrs[NTB_WIDTH_ID].value, NTB_ATTR_VAL_LEN,
-		 "%d", hw->link_width);
-
-	strncpy(ntb_attrs[NTB_MW_CNT_ID].name, NTB_MW_CNT_NAME,
-		NTB_ATTR_NAME_LEN);
-	snprintf(ntb_attrs[NTB_MW_CNT_ID].value, NTB_ATTR_VAL_LEN,
-		 "%d", hw->mw_cnt);
+	/**
+	 * Intel hardware requires the mapped memory base address to be
+	 * aligned to EMBARSZ and needs a contiguous memzone.
+	 */
+	info->mw_size_align = (uint8_t)(hw->pci_dev->id.vendor_id ==
+					NTB_INTEL_VENDOR_ID);
 
-	strncpy(ntb_attrs[NTB_DB_CNT_ID].name, NTB_DB_CNT_NAME,
-		NTB_ATTR_NAME_LEN);
-	snprintf(ntb_attrs[NTB_DB_CNT_ID].value, NTB_ATTR_VAL_LEN,
-		 "%d", hw->db_cnt);
+	if (!hw->queue_size || !hw->queue_pairs) {
+		NTB_LOG(ERR, "No queue size and queue num assigned.");
+		return;
+	}
 
-	strncpy(ntb_attrs[NTB_SPAD_CNT_ID].name, NTB_SPAD_CNT_NAME,
-		NTB_ATTR_NAME_LEN);
-	snprintf(ntb_attrs[NTB_SPAD_CNT_ID].value, NTB_ATTR_VAL_LEN,
-		 "%d", hw->spad_cnt);
+	hw->hdr_size_per_queue = RTE_ALIGN(sizeof(struct ntb_header) +
+				hw->queue_size * sizeof(struct ntb_desc) +
+				hw->queue_size * sizeof(struct ntb_used),
+				RTE_CACHE_LINE_SIZE);
+	info->ntb_hdr_size = hw->hdr_size_per_queue * hw->queue_pairs;
 }
 
 static int
-ntb_dev_configure(const struct rte_rawdev *dev __rte_unused,
-		  rte_rawdev_obj_t config __rte_unused)
+ntb_dev_configure(const struct rte_rawdev *dev, rte_rawdev_obj_t config)
 {
+	struct ntb_dev_config *conf = config;
+	struct ntb_hw *hw = dev->dev_private;
+	int ret;
+
+	hw->queue_pairs	= conf->num_queues;
+	hw->queue_size = conf->queue_size;
+	hw->used_mw_num = conf->mz_num;
+	hw->mz = conf->mz_list;
+	hw->rx_queues = rte_zmalloc("ntb_rx_queues",
+			sizeof(struct ntb_rx_queue *) * hw->queue_pairs, 0);
+	hw->tx_queues = rte_zmalloc("ntb_tx_queues",
+			sizeof(struct ntb_tx_queue *) * hw->queue_pairs, 0);
+
+	/* Start handshake with the peer. */
+	ret = ntb_handshake_work(dev);
+	if (ret < 0) {
+		rte_free(hw->rx_queues);
+		rte_free(hw->tx_queues);
+		hw->rx_queues = NULL;
+		hw->tx_queues = NULL;
+		return ret;
+	}
+
 	return 0;
 }
 
@@ -337,24 +642,69 @@ static int
 ntb_dev_start(struct rte_rawdev *dev)
 {
 	struct ntb_hw *hw = dev->dev_private;
-	int ret, i;
+	uint32_t peer_base_l, peer_val;
+	uint64_t peer_base_h;
+	uint32_t i;
+	int ret;
 
-	/* TODO: init queues and start queues. */
+	if (!hw->link_status || !hw->peer_dev_up)
+		return -EINVAL;
 
-	/* Map memory of bar_size to remote. */
-	hw->mz = rte_zmalloc("struct rte_memzone *",
-			     hw->mw_cnt * sizeof(struct rte_memzone *), 0);
-	for (i = 0; i < hw->mw_cnt; i++) {
-		ret = ntb_set_mw(dev, i, hw->mw_size[i]);
+	for (i = 0; i < hw->queue_pairs; i++) {
+		ret = ntb_queue_init(dev, i);
 		if (ret) {
-			NTB_LOG(ERR, "Fail to set mw.");
-			return ret;
+			NTB_LOG(ERR, "Failed to init queue.");
+			goto err_q_init;
 		}
 	}
 
+	hw->peer_mw_base = rte_zmalloc("ntb_peer_mw_base", hw->mw_cnt *
+					sizeof(uint64_t), 0);
+
+	if (hw->ntb_ops->spad_read == NULL) {
+		ret = -ENOTSUP;
+		goto err_up;
+	}
+
+	peer_val = (*hw->ntb_ops->spad_read)(dev, SPAD_Q_SZ, 0);
+	if (peer_val != hw->queue_size) {
+		NTB_LOG(ERR, "Inconsistent queue size! (local: %u peer: %u)",
+			hw->queue_size, peer_val);
+		ret = -EINVAL;
+		goto err_up;
+	}
+
+	peer_val = (*hw->ntb_ops->spad_read)(dev, SPAD_NUM_QPS, 0);
+	if (peer_val != hw->queue_pairs) {
+		NTB_LOG(ERR, "Inconsistent number of queues! (local: %u peer:"
+			" %u)", hw->queue_pairs, peer_val);
+		ret = -EINVAL;
+		goto err_up;
+	}
+
+	hw->peer_used_mws = (*hw->ntb_ops->spad_read)(dev, SPAD_USED_MWS, 0);
+
+	for (i = 0; i < hw->peer_used_mws; i++) {
+		peer_base_h = (*hw->ntb_ops->spad_read)(dev,
+				SPAD_MW0_BA_H + 2 * i, 0);
+		peer_base_l = (*hw->ntb_ops->spad_read)(dev,
+				SPAD_MW0_BA_L + 2 * i, 0);
+		hw->peer_mw_base[i] = (peer_base_h << 32) + peer_base_l;
+	}
+
 	dev->started = 1;
 
 	return 0;
+
+err_up:
+	rte_free(hw->peer_mw_base);
+err_q_init:
+	for (i = 0; i < hw->queue_pairs; i++) {
+		ntb_rxq_release_mbufs(hw->rx_queues[i]);
+		ntb_txq_release_mbufs(hw->tx_queues[i]);
+	}
+
+	return ret;
 }
 
 static void
@@ -362,9 +712,7 @@ ntb_dev_stop(struct rte_rawdev *dev)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	uint32_t time_out;
-	int status;
-
-	/* TODO: stop rx/tx queues. */
+	int status, i;
 
 	if (!hw->peer_dev_up)
 		goto clean;
@@ -405,6 +753,11 @@ ntb_dev_stop(struct rte_rawdev *dev)
 	if (status)
 		NTB_LOG(ERR, "Failed to clear doorbells.");
 
+	for (i = 0; i < hw->queue_pairs; i++) {
+		ntb_rxq_release_mbufs(hw->rx_queues[i]);
+		ntb_txq_release_mbufs(hw->tx_queues[i]);
+	}
+
 	dev->started = 0;
 }
 
@@ -413,12 +766,15 @@ ntb_dev_close(struct rte_rawdev *dev)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	struct rte_intr_handle *intr_handle;
-	int ret = 0;
+	int i;
 
 	if (dev->started)
 		ntb_dev_stop(dev);
 
-	/* TODO: free queues. */
+	/* free queues */
+	for (i = 0; i < hw->queue_pairs; i++)
+		ntb_queue_release(dev, i);
+	hw->queue_pairs = 0;
 
 	intr_handle = &hw->pci_dev->intr_handle;
 	/* Clean datapath event and vec mapping */
@@ -434,7 +790,7 @@ ntb_dev_close(struct rte_rawdev *dev)
 	rte_intr_callback_unregister(intr_handle,
 				     ntb_dev_intr_handler, dev);
 
-	return ret;
+	return 0;
 }
 
 static int
@@ -445,7 +801,7 @@ ntb_dev_reset(struct rte_rawdev *rawdev __rte_unused)
 
 static int
 ntb_attr_set(struct rte_rawdev *dev, const char *attr_name,
-				 uint64_t attr_value)
+	     uint64_t attr_value)
 {
 	struct ntb_hw *hw;
 	int index;
@@ -463,7 +819,21 @@ ntb_attr_set(struct rte_rawdev *dev, const char *attr_name,
 		index = atoi(&attr_name[NTB_SPAD_USER_LEN]);
 		(*hw->ntb_ops->spad_write)(dev, hw->spad_user_list[index],
 					   1, attr_value);
-		NTB_LOG(INFO, "Set attribute (%s) Value (%" PRIu64 ")",
+		NTB_LOG(DEBUG, "Set attribute (%s) Value (%" PRIu64 ")",
+			attr_name, attr_value);
+		return 0;
+	}
+
+	if (!strncmp(attr_name, NTB_QUEUE_SZ_NAME, NTB_ATTR_NAME_LEN)) {
+		hw->queue_size = attr_value;
+		NTB_LOG(DEBUG, "Set attribute (%s) Value (%" PRIu64 ")",
+			attr_name, attr_value);
+		return 0;
+	}
+
+	if (!strncmp(attr_name, NTB_QUEUE_NUM_NAME, NTB_ATTR_NAME_LEN)) {
+		hw->queue_pairs = attr_value;
+		NTB_LOG(DEBUG, "Set attribute (%s) Value (%" PRIu64 ")",
 			attr_name, attr_value);
 		return 0;
 	}
@@ -475,7 +845,7 @@ ntb_attr_set(struct rte_rawdev *dev, const char *attr_name,
 
 static int
 ntb_attr_get(struct rte_rawdev *dev, const char *attr_name,
-				 uint64_t *attr_value)
+	     uint64_t *attr_value)
 {
 	struct ntb_hw *hw;
 	int index;
@@ -489,49 +859,50 @@ ntb_attr_get(struct rte_rawdev *dev, const char *attr_name,
 
 	if (!strncmp(attr_name, NTB_TOPO_NAME, NTB_ATTR_NAME_LEN)) {
 		*attr_value = hw->topo;
-		NTB_LOG(INFO, "Attribute (%s) Value (%" PRIu64 ")",
+		NTB_LOG(DEBUG, "Attribute (%s) Value (%" PRIu64 ")",
 			attr_name, *attr_value);
 		return 0;
 	}
 
 	if (!strncmp(attr_name, NTB_LINK_STATUS_NAME, NTB_ATTR_NAME_LEN)) {
-		*attr_value = hw->link_status;
-		NTB_LOG(INFO, "Attribute (%s) Value (%" PRIu64 ")",
+		/* hw->link_status only indicates hw link status. */
+		*attr_value = hw->link_status && hw->peer_dev_up;
+		NTB_LOG(DEBUG, "Attribute (%s) Value (%" PRIu64 ")",
 			attr_name, *attr_value);
 		return 0;
 	}
 
 	if (!strncmp(attr_name, NTB_SPEED_NAME, NTB_ATTR_NAME_LEN)) {
 		*attr_value = hw->link_speed;
-		NTB_LOG(INFO, "Attribute (%s) Value (%" PRIu64 ")",
+		NTB_LOG(DEBUG, "Attribute (%s) Value (%" PRIu64 ")",
 			attr_name, *attr_value);
 		return 0;
 	}
 
 	if (!strncmp(attr_name, NTB_WIDTH_NAME, NTB_ATTR_NAME_LEN)) {
 		*attr_value = hw->link_width;
-		NTB_LOG(INFO, "Attribute (%s) Value (%" PRIu64 ")",
+		NTB_LOG(DEBUG, "Attribute (%s) Value (%" PRIu64 ")",
 			attr_name, *attr_value);
 		return 0;
 	}
 
 	if (!strncmp(attr_name, NTB_MW_CNT_NAME, NTB_ATTR_NAME_LEN)) {
 		*attr_value = hw->mw_cnt;
-		NTB_LOG(INFO, "Attribute (%s) Value (%" PRIu64 ")",
+		NTB_LOG(DEBUG, "Attribute (%s) Value (%" PRIu64 ")",
 			attr_name, *attr_value);
 		return 0;
 	}
 
 	if (!strncmp(attr_name, NTB_DB_CNT_NAME, NTB_ATTR_NAME_LEN)) {
 		*attr_value = hw->db_cnt;
-		NTB_LOG(INFO, "Attribute (%s) Value (%" PRIu64 ")",
+		NTB_LOG(DEBUG, "Attribute (%s) Value (%" PRIu64 ")",
 			attr_name, *attr_value);
 		return 0;
 	}
 
 	if (!strncmp(attr_name, NTB_SPAD_CNT_NAME, NTB_ATTR_NAME_LEN)) {
 		*attr_value = hw->spad_cnt;
-		NTB_LOG(INFO, "Attribute (%s) Value (%" PRIu64 ")",
+		NTB_LOG(DEBUG, "Attribute (%s) Value (%" PRIu64 ")",
 			attr_name, *attr_value);
 		return 0;
 	}
@@ -542,7 +913,7 @@ ntb_attr_get(struct rte_rawdev *dev, const char *attr_name,
 		index = atoi(&attr_name[NTB_SPAD_USER_LEN]);
 		*attr_value = (*hw->ntb_ops->spad_read)(dev,
 				hw->spad_user_list[index], 0);
-		NTB_LOG(INFO, "Attribute (%s) Value (%" PRIu64 ")",
+		NTB_LOG(DEBUG, "Attribute (%s) Value (%" PRIu64 ")",
 			attr_name, *attr_value);
 		return 0;
 	}
@@ -585,6 +956,7 @@ ntb_xstats_reset(struct rte_rawdev *dev __rte_unused,
 	return 0;
 }
 
+
 static const struct rte_rawdev_ops ntb_ops = {
 	.dev_info_get         = ntb_dev_info_get,
 	.dev_configure        = ntb_dev_configure,
@@ -615,7 +987,6 @@ ntb_init_hw(struct rte_rawdev *dev, struct rte_pci_device *pci_dev)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	struct rte_intr_handle *intr_handle;
-	uint32_t val;
 	int ret, i;
 
 	hw->pci_dev = pci_dev;
@@ -688,45 +1059,6 @@ ntb_init_hw(struct rte_rawdev *dev, struct rte_pci_device *pci_dev)
 	/* enable uio intr after callback register */
 	rte_intr_enable(intr_handle);
 
-	if (hw->ntb_ops->spad_write == NULL) {
-		NTB_LOG(ERR, "Scratchpad is not supported.");
-		return -ENOTSUP;
-	}
-	/* Tell peer the mw_cnt of local side. */
-	ret = (*hw->ntb_ops->spad_write)(dev, SPAD_NUM_MWS, 1, hw->mw_cnt);
-	if (ret) {
-		NTB_LOG(ERR, "Failed to tell peer mw count.");
-		return ret;
-	}
-
-	/* Tell peer each mw size on local side. */
-	for (i = 0; i < hw->mw_cnt; i++) {
-		NTB_LOG(DEBUG, "Local %u mw size: 0x%"PRIx64"", i,
-				hw->mw_size[i]);
-		val = hw->mw_size[i] >> 32;
-		ret = (*hw->ntb_ops->spad_write)
-				(dev, SPAD_MW0_SZ_H + 2 * i, 1, val);
-		if (ret) {
-			NTB_LOG(ERR, "Failed to tell peer mw size.");
-			return ret;
-		}
-
-		val = hw->mw_size[i];
-		ret = (*hw->ntb_ops->spad_write)
-				(dev, SPAD_MW0_SZ_L + 2 * i, 1, val);
-		if (ret) {
-			NTB_LOG(ERR, "Failed to tell peer mw size.");
-			return ret;
-		}
-	}
-
-	/* Ring doorbell 0 to tell peer the device is ready. */
-	ret = (*hw->ntb_ops->peer_db_set)(dev, 0);
-	if (ret) {
-		NTB_LOG(ERR, "Failed to tell peer device is probed.");
-		return ret;
-	}
-
 	return ret;
 }
 
@@ -839,5 +1171,5 @@ RTE_INIT(ntb_init_log)
 {
 	ntb_logtype = rte_log_register("pmd.raw.ntb");
 	if (ntb_logtype >= 0)
-		rte_log_set_level(ntb_logtype, RTE_LOG_DEBUG);
+		rte_log_set_level(ntb_logtype, RTE_LOG_INFO);
 }
diff --git a/drivers/raw/ntb/ntb.h b/drivers/raw/ntb/ntb.h
index d355231b0..69f200b99 100644
--- a/drivers/raw/ntb/ntb.h
+++ b/drivers/raw/ntb/ntb.h
@@ -2,8 +2,8 @@
  * Copyright(c) 2019 Intel Corporation.
  */
 
-#ifndef _NTB_RAWDEV_H_
-#define _NTB_RAWDEV_H_
+#ifndef _NTB_H_
+#define _NTB_H_
 
 #include <stdbool.h>
 
@@ -19,38 +19,13 @@ extern int ntb_logtype;
 /* Device IDs */
 #define NTB_INTEL_DEV_ID_B2B_SKX    0x201C
 
-#define NTB_TOPO_NAME               "topo"
-#define NTB_LINK_STATUS_NAME        "link_status"
-#define NTB_SPEED_NAME              "speed"
-#define NTB_WIDTH_NAME              "width"
-#define NTB_MW_CNT_NAME             "mw_count"
-#define NTB_DB_CNT_NAME             "db_count"
-#define NTB_SPAD_CNT_NAME           "spad_count"
 /* Reserved to app to use. */
 #define NTB_SPAD_USER               "spad_user_"
 #define NTB_SPAD_USER_LEN           (sizeof(NTB_SPAD_USER) - 1)
-#define NTB_SPAD_USER_MAX_NUM       10
+#define NTB_SPAD_USER_MAX_NUM       4
 #define NTB_ATTR_NAME_LEN           30
-#define NTB_ATTR_VAL_LEN            30
-#define NTB_ATTR_MAX                20
-
-/* NTB Attributes */
-struct ntb_attr {
-	/**< Name of the attribute */
-	char name[NTB_ATTR_NAME_LEN];
-	/**< Value or reference of value of attribute */
-	char value[NTB_ATTR_NAME_LEN];
-};
 
-enum ntb_attr_idx {
-	NTB_TOPO_ID = 0,
-	NTB_LINK_STATUS_ID,
-	NTB_SPEED_ID,
-	NTB_WIDTH_ID,
-	NTB_MW_CNT_ID,
-	NTB_DB_CNT_ID,
-	NTB_SPAD_CNT_ID,
-};
+#define NTB_DFLT_TX_FREE_THRESH     256
 
 enum ntb_topo {
 	NTB_TOPO_NONE = 0,
@@ -87,10 +62,15 @@ enum ntb_spad_idx {
 	SPAD_NUM_MWS = 1,
 	SPAD_NUM_QPS,
 	SPAD_Q_SZ,
+	SPAD_USED_MWS,
 	SPAD_MW0_SZ_H,
 	SPAD_MW0_SZ_L,
 	SPAD_MW1_SZ_H,
 	SPAD_MW1_SZ_L,
+	SPAD_MW0_BA_H,
+	SPAD_MW0_BA_L,
+	SPAD_MW1_BA_H,
+	SPAD_MW1_BA_L,
 };
 
 /**
@@ -110,26 +90,97 @@ enum ntb_spad_idx {
  * @vector_bind: Bind vector source [intr] to msix vector [msix].
  */
 struct ntb_dev_ops {
-	int (*ntb_dev_init)(struct rte_rawdev *dev);
-	void *(*get_peer_mw_addr)(struct rte_rawdev *dev, int mw_idx);
-	int (*mw_set_trans)(struct rte_rawdev *dev, int mw_idx,
+	int (*ntb_dev_init)(const struct rte_rawdev *dev);
+	void *(*get_peer_mw_addr)(const struct rte_rawdev *dev, int mw_idx);
+	int (*mw_set_trans)(const struct rte_rawdev *dev, int mw_idx,
 			    uint64_t addr, uint64_t size);
-	int (*get_link_status)(struct rte_rawdev *dev);
-	int (*set_link)(struct rte_rawdev *dev, bool up);
-	uint32_t (*spad_read)(struct rte_rawdev *dev, int spad, bool peer);
-	int (*spad_write)(struct rte_rawdev *dev, int spad,
+	int (*get_link_status)(const struct rte_rawdev *dev);
+	int (*set_link)(const struct rte_rawdev *dev, bool up);
+	uint32_t (*spad_read)(const struct rte_rawdev *dev, int spad,
+			      bool peer);
+	int (*spad_write)(const struct rte_rawdev *dev, int spad,
 			  bool peer, uint32_t spad_v);
-	uint64_t (*db_read)(struct rte_rawdev *dev);
-	int (*db_clear)(struct rte_rawdev *dev, uint64_t db_bits);
-	int (*db_set_mask)(struct rte_rawdev *dev, uint64_t db_mask);
-	int (*peer_db_set)(struct rte_rawdev *dev, uint8_t db_bit);
-	int (*vector_bind)(struct rte_rawdev *dev, uint8_t intr, uint8_t msix);
+	uint64_t (*db_read)(const struct rte_rawdev *dev);
+	int (*db_clear)(const struct rte_rawdev *dev, uint64_t db_bits);
+	int (*db_set_mask)(const struct rte_rawdev *dev, uint64_t db_mask);
+	int (*peer_db_set)(const struct rte_rawdev *dev, uint8_t db_bit);
+	int (*vector_bind)(const struct rte_rawdev *dev, uint8_t intr,
+			   uint8_t msix);
+};
+
+struct ntb_desc {
+	uint64_t addr; /* buffer addr */
+	uint16_t len;  /* buffer length */
+	uint16_t rsv1;
+	uint32_t rsv2;
+};
+
+#define NTB_FLAG_EOP    1 /* end of packet */
+struct ntb_used {
+	uint16_t len;     /* buffer length */
+	uint16_t flags;   /* flags */
+};
+
+struct ntb_rx_entry {
+	struct rte_mbuf *mbuf;
+};
+
+struct ntb_rx_queue {
+	struct ntb_desc *rx_desc_ring;
+	volatile struct ntb_used *rx_used_ring;
+	uint16_t *avail_cnt;
+	volatile uint16_t *used_cnt;
+	uint16_t last_avail;
+	uint16_t last_used;
+	uint16_t nb_rx_desc;
+
+	uint16_t rx_free_thresh;
+
+	struct rte_mempool *mpool; /* mempool for mbuf allocation */
+	struct ntb_rx_entry *sw_ring;
+
+	uint16_t queue_id;         /* DPDK queue index. */
+	uint16_t port_id;          /* Device port identifier. */
+
+	struct ntb_hw *hw;
+};
+
+struct ntb_tx_entry {
+	struct rte_mbuf *mbuf;
+	uint16_t next_id;
+	uint16_t last_id;
+};
+
+struct ntb_tx_queue {
+	volatile struct ntb_desc *tx_desc_ring;
+	struct ntb_used *tx_used_ring;
+	volatile uint16_t *avail_cnt;
+	uint16_t *used_cnt;
+	uint16_t last_avail;          /* Next entry to be freed. */
+	uint16_t last_used;           /* Next entry to be sent. */
+	uint16_t nb_tx_desc;
+
+	/* Total number of TX descriptors ready to be allocated. */
+	uint16_t nb_tx_free;
+	uint16_t tx_free_thresh;
+
+	struct ntb_tx_entry *sw_ring;
+
+	uint16_t queue_id;            /* DPDK queue index. */
+	uint16_t port_id;             /* Device port identifier. */
+
+	struct ntb_hw *hw;
+};
+
+struct ntb_header {
+	uint16_t avail_cnt __rte_cache_aligned;
+	uint16_t used_cnt __rte_cache_aligned;
+	struct ntb_desc desc_ring[] __rte_cache_aligned;
 };
 
 /* ntb private data. */
 struct ntb_hw {
 	uint8_t mw_cnt;
-	uint8_t peer_mw_cnt;
 	uint8_t db_cnt;
 	uint8_t spad_cnt;
 
@@ -147,18 +198,26 @@ struct ntb_hw {
 	struct rte_pci_device *pci_dev;
 	char *hw_addr;
 
-	uint64_t *mw_size;
-	uint64_t *peer_mw_size;
 	uint8_t peer_dev_up;
+	uint64_t *mw_size;
+	/* remote mem base addr */
+	uint64_t *peer_mw_base;
 
 	uint16_t queue_pairs;
 	uint16_t queue_size;
+	uint32_t hdr_size_per_queue;
+
+	struct ntb_rx_queue **rx_queues;
+	struct ntb_tx_queue **tx_queues;
 
-	/**< mem zone to populate RX ring. */
+	/* memzone to populate RX ring. */
 	const struct rte_memzone **mz;
+	uint8_t used_mw_num;
+
+	uint8_t peer_used_mws;
 
 	/* Reserve several spad for app to use. */
 	int spad_user_list[NTB_SPAD_USER_MAX_NUM];
 };
 
-#endif /* _NTB_RAWDEV_H_ */
+#endif /* _NTB_H_ */
diff --git a/drivers/raw/ntb/ntb_hw_intel.c b/drivers/raw/ntb/ntb_hw_intel.c
index 21eaa8511..0e73f1609 100644
--- a/drivers/raw/ntb/ntb_hw_intel.c
+++ b/drivers/raw/ntb/ntb_hw_intel.c
@@ -26,7 +26,7 @@ static enum xeon_ntb_bar intel_ntb_bar[] = {
 };
 
 static int
-intel_ntb_dev_init(struct rte_rawdev *dev)
+intel_ntb_dev_init(const struct rte_rawdev *dev)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	uint8_t reg_val, bar;
@@ -77,7 +77,7 @@ intel_ntb_dev_init(struct rte_rawdev *dev)
 	hw->db_cnt = XEON_DB_COUNT;
 	hw->spad_cnt = XEON_SPAD_COUNT;
 
-	hw->mw_size = rte_zmalloc("uint64_t",
+	hw->mw_size = rte_zmalloc("ntb_mw_size",
 				  hw->mw_cnt * sizeof(uint64_t), 0);
 	for (i = 0; i < hw->mw_cnt; i++) {
 		bar = intel_ntb_bar[i];
@@ -94,7 +94,7 @@ intel_ntb_dev_init(struct rte_rawdev *dev)
 }
 
 static void *
-intel_ntb_get_peer_mw_addr(struct rte_rawdev *dev, int mw_idx)
+intel_ntb_get_peer_mw_addr(const struct rte_rawdev *dev, int mw_idx)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	uint8_t bar;
@@ -116,7 +116,7 @@ intel_ntb_get_peer_mw_addr(struct rte_rawdev *dev, int mw_idx)
 }
 
 static int
-intel_ntb_mw_set_trans(struct rte_rawdev *dev, int mw_idx,
+intel_ntb_mw_set_trans(const struct rte_rawdev *dev, int mw_idx,
 		       uint64_t addr, uint64_t size)
 {
 	struct ntb_hw *hw = dev->dev_private;
@@ -163,7 +163,7 @@ intel_ntb_mw_set_trans(struct rte_rawdev *dev, int mw_idx,
 }
 
 static int
-intel_ntb_get_link_status(struct rte_rawdev *dev)
+intel_ntb_get_link_status(const struct rte_rawdev *dev)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	uint16_t reg_val;
@@ -195,7 +195,7 @@ intel_ntb_get_link_status(struct rte_rawdev *dev)
 }
 
 static int
-intel_ntb_set_link(struct rte_rawdev *dev, bool up)
+intel_ntb_set_link(const struct rte_rawdev *dev, bool up)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	uint32_t ntb_ctrl, reg_off;
@@ -221,7 +221,7 @@ intel_ntb_set_link(struct rte_rawdev *dev, bool up)
 }
 
 static uint32_t
-intel_ntb_spad_read(struct rte_rawdev *dev, int spad, bool peer)
+intel_ntb_spad_read(const struct rte_rawdev *dev, int spad, bool peer)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	uint32_t spad_v, reg_off;
@@ -241,7 +241,7 @@ intel_ntb_spad_read(struct rte_rawdev *dev, int spad, bool peer)
 }
 
 static int
-intel_ntb_spad_write(struct rte_rawdev *dev, int spad,
+intel_ntb_spad_write(const struct rte_rawdev *dev, int spad,
 		     bool peer, uint32_t spad_v)
 {
 	struct ntb_hw *hw = dev->dev_private;
@@ -263,7 +263,7 @@ intel_ntb_spad_write(struct rte_rawdev *dev, int spad,
 }
 
 static uint64_t
-intel_ntb_db_read(struct rte_rawdev *dev)
+intel_ntb_db_read(const struct rte_rawdev *dev)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	uint64_t db_off, db_bits;
@@ -278,7 +278,7 @@ intel_ntb_db_read(struct rte_rawdev *dev)
 }
 
 static int
-intel_ntb_db_clear(struct rte_rawdev *dev, uint64_t db_bits)
+intel_ntb_db_clear(const struct rte_rawdev *dev, uint64_t db_bits)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	uint64_t db_off;
@@ -293,7 +293,7 @@ intel_ntb_db_clear(struct rte_rawdev *dev, uint64_t db_bits)
 }
 
 static int
-intel_ntb_db_set_mask(struct rte_rawdev *dev, uint64_t db_mask)
+intel_ntb_db_set_mask(const struct rte_rawdev *dev, uint64_t db_mask)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	uint64_t db_m_off;
@@ -312,7 +312,7 @@ intel_ntb_db_set_mask(struct rte_rawdev *dev, uint64_t db_mask)
 }
 
 static int
-intel_ntb_peer_db_set(struct rte_rawdev *dev, uint8_t db_idx)
+intel_ntb_peer_db_set(const struct rte_rawdev *dev, uint8_t db_idx)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	uint32_t db_off;
@@ -332,7 +332,7 @@ intel_ntb_peer_db_set(struct rte_rawdev *dev, uint8_t db_idx)
 }
 
 static int
-intel_ntb_vector_bind(struct rte_rawdev *dev, uint8_t intr, uint8_t msix)
+intel_ntb_vector_bind(const struct rte_rawdev *dev, uint8_t intr, uint8_t msix)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	uint8_t reg_off;
diff --git a/drivers/raw/ntb/rte_pmd_ntb.h b/drivers/raw/ntb/rte_pmd_ntb.h
new file mode 100644
index 000000000..6591ce793
--- /dev/null
+++ b/drivers/raw/ntb/rte_pmd_ntb.h
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Intel Corporation.
+ */
+
+#ifndef _RTE_PMD_NTB_H_
+#define _RTE_PMD_NTB_H_
+
+/* App needs to set/get these attrs */
+#define NTB_QUEUE_SZ_NAME           "queue_size"
+#define NTB_QUEUE_NUM_NAME          "queue_num"
+#define NTB_TOPO_NAME               "topo"
+#define NTB_LINK_STATUS_NAME        "link_status"
+#define NTB_SPEED_NAME              "speed"
+#define NTB_WIDTH_NAME              "width"
+#define NTB_MW_CNT_NAME             "mw_count"
+#define NTB_DB_CNT_NAME             "db_count"
+#define NTB_SPAD_CNT_NAME           "spad_count"
+
+#define NTB_MAX_DESC_SIZE           1024
+#define NTB_MIN_DESC_SIZE           64
+
+struct ntb_dev_info {
+	uint32_t ntb_hdr_size;
+	/** Whether the memzone needs to be aligned to the mw size. */
+	uint8_t mw_size_align;
+	uint8_t mw_cnt;
+	uint64_t *mw_size;
+};
+
+struct ntb_dev_config {
+	uint16_t num_queues;
+	uint16_t queue_size;
+	uint8_t mz_num;
+	const struct rte_memzone **mz_list;
+};
+
+struct ntb_queue_conf {
+	uint16_t nb_desc;
+	uint16_t tx_free_thresh;
+	struct rte_mempool *rx_mp;
+};
+
+#endif /* _RTE_PMD_NTB_H_ */
-- 
2.17.1


* [dpdk-dev] [PATCH v6 2/4] raw/ntb: add xstats support
  2019-09-26  3:20         ` [dpdk-dev] [PATCH v6 0/4] enable FIFO " Xiaoyun Li
  2019-09-26  3:20           ` [dpdk-dev] [PATCH v6 1/4] raw/ntb: setup ntb queue Xiaoyun Li
@ 2019-09-26  3:20           ` Xiaoyun Li
  2019-09-26  3:20           ` [dpdk-dev] [PATCH v6 3/4] raw/ntb: add enqueue and dequeue functions Xiaoyun Li
                             ` (2 subsequent siblings)
  4 siblings, 0 replies; 42+ messages in thread
From: Xiaoyun Li @ 2019-09-26  3:20 UTC (permalink / raw)
  To: jingjing.wu, keith.wiles, omkar.maslekar, cunming.liang; +Cc: dev, Xiaoyun Li

Add xstats support for ntb rawdev.
Support tx-packets, tx-bytes, tx-errors and
rx-packets, rx-bytes, rx-missed.

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
---
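For readers of the diff below, here is a standalone sketch of the counter
layout and the reset scheme, as I read them from this patch: the first
NTB_XSTATS_NUM slots hold the totals, the counters of queue q follow at
NTB_XSTATS_NUM * (q + 1), and a reset stores the current raw value as an
offset instead of zeroing the counter. The names XSTATS_NUM, xstat_index
and xstat_delta are illustrative only and are not part of the driver. The
single unsigned subtraction should be equivalent to the two-branch form
in ntb_stats_update(), since uint64_t arithmetic wraps modulo 2^64.

```c
#include <stdint.h>

/* Assumed: matches the six entries in ntb_xstats_names. */
#define XSTATS_NUM 6

/* Slot of statistic `stat` for queue `qp`; slots [0, XSTATS_NUM) hold totals. */
static inline uint32_t
xstat_index(uint32_t qp, uint32_t stat)
{
	return XSTATS_NUM * (qp + 1) + stat;
}

/* Value accumulated since the last reset, given the saved offset. */
static inline uint64_t
xstat_delta(uint64_t offset, uint64_t raw)
{
	/* Unsigned subtraction wraps, so a wrapped raw counter still works. */
	return raw - offset;
}
```

With this layout, resetting a total amounts to saving the offset of the
same statistic in every queue, which is what ntb_xstats_reset() does.
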
 drivers/raw/ntb/ntb.c | 170 +++++++++++++++++++++++++++++++++++++-----
 drivers/raw/ntb/ntb.h |  12 +++
 2 files changed, 164 insertions(+), 18 deletions(-)

diff --git a/drivers/raw/ntb/ntb.c b/drivers/raw/ntb/ntb.c
index 0e62ad433..a30245c64 100644
--- a/drivers/raw/ntb/ntb.c
+++ b/drivers/raw/ntb/ntb.c
@@ -30,6 +30,17 @@ static const struct rte_pci_id pci_id_ntb_map[] = {
 	{ .vendor_id = 0, /* sentinel */ },
 };
 
+/* Align with enum ntb_xstats_idx */
+static struct rte_rawdev_xstats_name ntb_xstats_names[] = {
+	{"Tx-packets"},
+	{"Tx-bytes"},
+	{"Tx-errors"},
+	{"Rx-packets"},
+	{"Rx-bytes"},
+	{"Rx-missed"},
+};
+#define NTB_XSTATS_NUM RTE_DIM(ntb_xstats_names)
+
 static inline void
 ntb_link_cleanup(struct rte_rawdev *dev)
 {
@@ -538,6 +549,12 @@ ntb_queue_init(struct rte_rawdev *dev, uint16_t qp_id)
 	txq->last_avail = 0;
 	txq->nb_tx_free = txq->nb_tx_desc - 1;
 
+	/* Set per queue stats. */
+	for (i = 0; i < NTB_XSTATS_NUM; i++) {
+		hw->ntb_xstats[i + NTB_XSTATS_NUM * (qp_id + 1)] = 0;
+		hw->ntb_xstats_off[i + NTB_XSTATS_NUM * (qp_id + 1)] = 0;
+	}
+
 	return 0;
 }
 
@@ -614,6 +631,7 @@ ntb_dev_configure(const struct rte_rawdev *dev, rte_rawdev_obj_t config)
 {
 	struct ntb_dev_config *conf = config;
 	struct ntb_hw *hw = dev->dev_private;
+	uint32_t xstats_num;
 	int ret;
 
 	hw->queue_pairs	= conf->num_queues;
@@ -624,6 +642,12 @@ ntb_dev_configure(const struct rte_rawdev *dev, rte_rawdev_obj_t config)
 			sizeof(struct ntb_rx_queue *) * hw->queue_pairs, 0);
 	hw->tx_queues = rte_zmalloc("ntb_tx_queues",
 			sizeof(struct ntb_tx_queue *) * hw->queue_pairs, 0);
+	/* First total stats, then per queue stats. */
+	xstats_num = (hw->queue_pairs + 1) * NTB_XSTATS_NUM;
+	hw->ntb_xstats = rte_zmalloc("ntb_xstats", xstats_num *
+				     sizeof(uint64_t), 0);
+	hw->ntb_xstats_off = rte_zmalloc("ntb_xstats_off", xstats_num *
+					 sizeof(uint64_t), 0);
 
 	/* Start handshake with the peer. */
 	ret = ntb_handshake_work(dev);
@@ -650,6 +674,12 @@ ntb_dev_start(struct rte_rawdev *dev)
 	if (!hw->link_status || !hw->peer_dev_up)
 		return -EINVAL;
 
+	/* Set total stats. */
+	for (i = 0; i < NTB_XSTATS_NUM; i++) {
+		hw->ntb_xstats[i] = 0;
+		hw->ntb_xstats_off[i] = 0;
+	}
+
 	for (i = 0; i < hw->queue_pairs; i++) {
 		ret = ntb_queue_init(dev, i);
 		if (ret) {
@@ -923,39 +953,143 @@ ntb_attr_get(struct rte_rawdev *dev, const char *attr_name,
 	return -EINVAL;
 }
 
+static inline uint64_t
+ntb_stats_update(uint64_t offset, uint64_t stat)
+{
+	if (stat >= offset)
+		return (stat - offset);
+	else
+		return (uint64_t)(((uint64_t)-1) - offset + stat + 1);
+}
+
 static int
-ntb_xstats_get(const struct rte_rawdev *dev __rte_unused,
-	       const unsigned int ids[] __rte_unused,
-	       uint64_t values[] __rte_unused,
-	       unsigned int n __rte_unused)
+ntb_xstats_get(const struct rte_rawdev *dev,
+	       const unsigned int ids[],
+	       uint64_t values[],
+	       unsigned int n)
 {
-	return 0;
+	struct ntb_hw *hw = dev->dev_private;
+	uint32_t i, j, off, xstats_num;
+
+	/* Calculate total stats of all queues. */
+	for (i = 0; i < NTB_XSTATS_NUM; i++) {
+		hw->ntb_xstats[i] = 0;
+		for (j = 0; j < hw->queue_pairs; j++) {
+			off = NTB_XSTATS_NUM * (j + 1) + i;
+			hw->ntb_xstats[i] +=
+			ntb_stats_update(hw->ntb_xstats_off[off],
+					 hw->ntb_xstats[off]);
+		}
+	}
+
+	xstats_num = NTB_XSTATS_NUM * (hw->queue_pairs + 1);
+	for (i = 0; i < n && ids[i] < xstats_num; i++) {
+		if (ids[i] < NTB_XSTATS_NUM)
+			values[i] = hw->ntb_xstats[ids[i]];
+		else
+			values[i] =
+			ntb_stats_update(hw->ntb_xstats_off[ids[i]],
+					 hw->ntb_xstats[ids[i]]);
+	}
+
+	return i;
 }
 
 static int
-ntb_xstats_get_names(const struct rte_rawdev *dev __rte_unused,
-		     struct rte_rawdev_xstats_name *xstats_names __rte_unused,
-		     unsigned int size __rte_unused)
+ntb_xstats_get_names(const struct rte_rawdev *dev,
+		     struct rte_rawdev_xstats_name *xstats_names,
+		     unsigned int size)
 {
-	return 0;
+	struct ntb_hw *hw = dev->dev_private;
+	uint32_t xstats_num, i, j, off;
+
+	xstats_num = NTB_XSTATS_NUM * (hw->queue_pairs + 1);
+	if (xstats_names == NULL || size < xstats_num)
+		return xstats_num;
+
+	/* Total stats names */
+	memcpy(xstats_names, ntb_xstats_names, sizeof(ntb_xstats_names));
+
+	/* Queue stats names */
+	for (i = 0; i < hw->queue_pairs; i++) {
+		for (j = 0; j < NTB_XSTATS_NUM; j++) {
+			off = j + (i + 1) * NTB_XSTATS_NUM;
+			snprintf(xstats_names[off].name,
+				sizeof(xstats_names[0].name),
+				"%s_q%u", ntb_xstats_names[j].name, i);
+		}
+	}
+
+	return xstats_num;
 }
 
 static uint64_t
-ntb_xstats_get_by_name(const struct rte_rawdev *dev __rte_unused,
-		       const char *name __rte_unused,
-		       unsigned int *id __rte_unused)
+ntb_xstats_get_by_name(const struct rte_rawdev *dev,
+		       const char *name, unsigned int *id)
 {
-	return 0;
+	struct rte_rawdev_xstats_name *xstats_names;
+	struct ntb_hw *hw = dev->dev_private;
+	uint32_t xstats_num, i, j, off;
+
+	if (name == NULL)
+		return -EINVAL;
+
+	xstats_num = NTB_XSTATS_NUM * (hw->queue_pairs + 1);
+	xstats_names = rte_zmalloc("ntb_stats_name",
+				   sizeof(struct rte_rawdev_xstats_name) *
+				   xstats_num, 0);
+	ntb_xstats_get_names(dev, xstats_names, xstats_num);
+
+	/* Calculate total stats of all queues. */
+	for (i = 0; i < NTB_XSTATS_NUM; i++) {
+		for (j = 0; j < hw->queue_pairs; j++) {
+			off = NTB_XSTATS_NUM * (j + 1) + i;
+			hw->ntb_xstats[i] +=
+			ntb_stats_update(hw->ntb_xstats_off[off],
+					 hw->ntb_xstats[off]);
+		}
+	}
+
+	for (i = 0; i < xstats_num; i++) {
+		if (!strncmp(name, xstats_names[i].name,
+		    RTE_RAW_DEV_XSTATS_NAME_SIZE)) {
+			*id = i;
+			rte_free(xstats_names);
+			if (i < NTB_XSTATS_NUM)
+				return hw->ntb_xstats[i];
+			else
+				return ntb_stats_update(hw->ntb_xstats_off[i],
+							hw->ntb_xstats[i]);
+		}
+	}
+
+	NTB_LOG(ERR, "Cannot find the xstats name.");
+	rte_free(xstats_names);
+	return -EINVAL;
 }
 
 static int
-ntb_xstats_reset(struct rte_rawdev *dev __rte_unused,
-		 const uint32_t ids[] __rte_unused,
-		 uint32_t nb_ids __rte_unused)
+ntb_xstats_reset(struct rte_rawdev *dev,
+		 const uint32_t ids[],
+		 uint32_t nb_ids)
 {
-	return 0;
-}
+	struct ntb_hw *hw = dev->dev_private;
+	uint32_t i, j, off, xstats_num;
+
+	xstats_num = NTB_XSTATS_NUM * (hw->queue_pairs + 1);
+	for (i = 0; i < nb_ids && ids[i] < xstats_num; i++) {
+		if (ids[i] < NTB_XSTATS_NUM) {
+			for (j = 0; j < hw->queue_pairs; j++) {
+				off = NTB_XSTATS_NUM * (j + 1) + ids[i];
+				hw->ntb_xstats_off[off] = hw->ntb_xstats[off];
+			}
+		} else {
+			hw->ntb_xstats_off[ids[i]] = hw->ntb_xstats[ids[i]];
+		}
+	}
 
+	return i;
+}
 
 static const struct rte_rawdev_ops ntb_ops = {
 	.dev_info_get         = ntb_dev_info_get,
diff --git a/drivers/raw/ntb/ntb.h b/drivers/raw/ntb/ntb.h
index 69f200b99..3cc160680 100644
--- a/drivers/raw/ntb/ntb.h
+++ b/drivers/raw/ntb/ntb.h
@@ -27,6 +27,15 @@ extern int ntb_logtype;
 
 #define NTB_DFLT_TX_FREE_THRESH     256
 
+enum ntb_xstats_idx {
+	NTB_TX_PKTS_ID = 0,
+	NTB_TX_BYTES_ID,
+	NTB_TX_ERRS_ID,
+	NTB_RX_PKTS_ID,
+	NTB_RX_BYTES_ID,
+	NTB_RX_MISS_ID,
+};
+
 enum ntb_topo {
 	NTB_TOPO_NONE = 0,
 	NTB_TOPO_B2B_USD,
@@ -216,6 +225,9 @@ struct ntb_hw {
 
 	uint8_t peer_used_mws;
 
+	uint64_t *ntb_xstats;
+	uint64_t *ntb_xstats_off;
+
 	/* Reserve several spad for app to use. */
 	int spad_user_list[NTB_SPAD_USER_MAX_NUM];
 };
-- 
2.17.1


* [dpdk-dev] [PATCH v6 3/4] raw/ntb: add enqueue and dequeue functions
  2019-09-26  3:20         ` [dpdk-dev] [PATCH v6 0/4] enable FIFO " Xiaoyun Li
  2019-09-26  3:20           ` [dpdk-dev] [PATCH v6 1/4] raw/ntb: setup ntb queue Xiaoyun Li
  2019-09-26  3:20           ` [dpdk-dev] [PATCH v6 2/4] raw/ntb: add xstats support Xiaoyun Li
@ 2019-09-26  3:20           ` Xiaoyun Li
  2019-09-26  3:20           ` [dpdk-dev] [PATCH v6 4/4] examples/ntb: support more functions for NTB Xiaoyun Li
  2019-09-26  4:04           ` [dpdk-dev] [PATCH v6 0/4] enable FIFO " Wu, Jingjing
  4 siblings, 0 replies; 42+ messages in thread
From: Xiaoyun Li @ 2019-09-26  3:20 UTC (permalink / raw)
  To: jingjing.wu, keith.wiles, omkar.maslekar, cunming.liang; +Cc: dev, Xiaoyun Li

Introduce enqueue and dequeue functions to support packet-based
processing. Also enable write combining for the ntb driver, since it
improves performance significantly.

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
---
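The ring index arithmetic in the enqueue path can be sketched in
isolation as below. This is a minimal illustration, assuming the queue
size is a power of two (the "& (nb_tx_desc - 1)" masking in the diff
relies on that). The helper names ring_distance and ring_split are
hypothetical and do not exist in the driver.

```c
#include <stdint.h>

/* Entries from `tail` up to `head` on a ring of `size` slots (power of two). */
static inline uint16_t
ring_distance(uint16_t head, uint16_t tail, uint16_t size)
{
	return (uint16_t)((head - tail + size) & (size - 1));
}

/*
 * Split a batch of `n` entries starting at slot `start` into the part that
 * fits before the end of the ring (*nb1) and the part that wraps to slot 0
 * (*nb2), so the batch can be copied with at most two contiguous copies.
 */
static inline void
ring_split(uint16_t start, uint16_t n, uint16_t size,
	   uint16_t *nb1, uint16_t *nb2)
{
	uint16_t room = size - start;

	if (n > room) {
		*nb1 = room;
		*nb2 = n - room;
	} else {
		*nb1 = n;
		*nb2 = 0;
	}
}
```

For example, a batch of 10 used entries starting at slot 1020 of a
1024-entry ring is written as 4 entries at the end of the ring followed
by 6 entries from slot 0, which mirrors the nb1/nb2 copies at the end of
ntb_enqueue_bufs().
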
 doc/guides/rawdevs/ntb.rst     |  54 ++++++++
 drivers/raw/ntb/ntb.c          | 242 ++++++++++++++++++++++++++++++---
 drivers/raw/ntb/ntb.h          |   2 +
 drivers/raw/ntb/ntb_hw_intel.c |  22 +++
 4 files changed, 301 insertions(+), 19 deletions(-)

diff --git a/doc/guides/rawdevs/ntb.rst b/doc/guides/rawdevs/ntb.rst
index 99e7db441..12f931c97 100644
--- a/doc/guides/rawdevs/ntb.rst
+++ b/doc/guides/rawdevs/ntb.rst
@@ -45,6 +45,50 @@ to use, i.e. igb_uio, vfio. The ``dpdk-devbind.py`` script can be used to
 show devices status and to bind them to a suitable kernel driver. They will
 appear under the category of "Misc (rawdev) devices".
 
+Prerequisites
+-------------
+The NTB PMD needs the kernel PCI driver to support write combining (WC) to
+get better performance; the difference is more than 10 times. WC can be
+enabled in two ways. With the igb_uio driver, insert it with ``wc_active=1``:
+
+.. code-block:: console
+
+  insmod igb_uio.ko wc_active=1
+
+Otherwise, enable WC manually for BAR 2 and BAR 4 (mapped memory) of the NTB
+device. The reference is https://www.kernel.org/doc/html/latest/x86/mtrr.html
+Get the BAR base addresses using ``lspci -vvv -s ae:00.0 | grep Region``.
+
+.. code-block:: console
+
+  # lspci -vvv -s ae:00.0 | grep Region
+  Region 0: Memory at 39bfe0000000 (64-bit, prefetchable) [size=64K]
+  Region 2: Memory at 39bfa0000000 (64-bit, prefetchable) [size=512M]
+  Region 4: Memory at 39bfc0000000 (64-bit, prefetchable) [size=512M]
+
+Use the following commands to enable WC.
+
+.. code-block:: console
+
+  echo "base=0x39bfa0000000 size=0x20000000 type=write-combining" >> /proc/mtrr
+  echo "base=0x39bfc0000000 size=0x20000000 type=write-combining" >> /proc/mtrr
+
+The resulting MTRR entries:
+
+.. code-block:: console
+
+  # cat /proc/mtrr
+  reg00: base=0x000000000 (    0MB), size= 2048MB, count=1: write-back
+  reg01: base=0x07f000000 ( 2032MB), size=   16MB, count=1: uncachable
+  reg02: base=0x39bfa0000000 (60553728MB), size=  512MB, count=1: write-combining
+  reg03: base=0x39bfc0000000 (60554240MB), size=  512MB, count=1: write-combining
+
+To disable WC for these regions, use the following.
+
+.. code-block:: console
+     echo "disable=2" >> /proc/mtrr
+     echo "disable=3" >> /proc/mtrr
+
 Ring Layout
 -----------
 
@@ -83,6 +127,16 @@ like the following:
       +------------------------+   +------------------------+
                     <---------traffic---------
 
+- Enqueue and Dequeue
+  Based on this ring layout, enqueue reads rx_tail to learn how many free
+  buffers are available, then writes used_ring and tx_tail to tell the peer
+  which buffers are filled with data.
+  Dequeue reads tx_tail to learn how many packets have arrived, then
+  writes desc_ring and rx_tail to tell the peer about the newly allocated
+  buffers.
+  This way, only remote writes happen and remote reads are avoided,
+  which gives better performance.
+
 Limitation
 ----------
 
diff --git a/drivers/raw/ntb/ntb.c b/drivers/raw/ntb/ntb.c
index a30245c64..ad7f6abfd 100644
--- a/drivers/raw/ntb/ntb.c
+++ b/drivers/raw/ntb/ntb.c
@@ -558,26 +558,140 @@ ntb_queue_init(struct rte_rawdev *dev, uint16_t qp_id)
 	return 0;
 }
 
+static inline void
+ntb_enqueue_cleanup(struct ntb_tx_queue *txq)
+{
+	struct ntb_tx_entry *sw_ring = txq->sw_ring;
+	uint16_t tx_free = txq->last_avail;
+	uint16_t nb_to_clean, i;
+
+	/* avail_cnt + 1 represents where to rx next in the peer. */
+	nb_to_clean = (*txq->avail_cnt - txq->last_avail + 1 +
+			txq->nb_tx_desc) & (txq->nb_tx_desc - 1);
+	nb_to_clean = RTE_MIN(nb_to_clean, txq->tx_free_thresh);
+	for (i = 0; i < nb_to_clean; i++) {
+		if (sw_ring[tx_free].mbuf)
+			rte_pktmbuf_free_seg(sw_ring[tx_free].mbuf);
+		tx_free = (tx_free + 1) & (txq->nb_tx_desc - 1);
+	}
+
+	txq->nb_tx_free += nb_to_clean;
+	txq->last_avail = tx_free;
+}
+
 static int
 ntb_enqueue_bufs(struct rte_rawdev *dev,
 		 struct rte_rawdev_buf **buffers,
 		 unsigned int count,
 		 rte_rawdev_obj_t context)
 {
-	/* Not FIFO right now. Just for testing memory write. */
 	struct ntb_hw *hw = dev->dev_private;
-	unsigned int i;
-	void *bar_addr;
-	size_t size;
+	struct ntb_tx_queue *txq = hw->tx_queues[(size_t)context];
+	struct ntb_tx_entry *sw_ring = txq->sw_ring;
+	struct rte_mbuf *txm;
+	struct ntb_used tx_used[NTB_MAX_DESC_SIZE];
+	volatile struct ntb_desc *tx_item;
+	uint16_t tx_last, nb_segs, off, last_used, avail_cnt;
+	uint16_t nb_mbufs = 0;
+	uint16_t nb_tx = 0;
+	uint64_t bytes = 0;
+	void *buf_addr;
+	int i;
 
-	if (hw->ntb_ops->get_peer_mw_addr == NULL)
-		return -ENOTSUP;
-	bar_addr = (*hw->ntb_ops->get_peer_mw_addr)(dev, 0);
-	size = (size_t)context;
+	if (unlikely(hw->ntb_ops->ioremap == NULL)) {
+		NTB_LOG(ERR, "Ioremap not supported.");
+		return nb_tx;
+	}
 
-	for (i = 0; i < count; i++)
-		rte_memcpy(bar_addr, buffers[i]->buf_addr, size);
-	return 0;
+	if (unlikely(dev->started == 0 || hw->peer_dev_up == 0)) {
+		NTB_LOG(DEBUG, "Link is not up.");
+		return nb_tx;
+	}
+
+	if (txq->nb_tx_free < txq->tx_free_thresh)
+		ntb_enqueue_cleanup(txq);
+
+	off = NTB_XSTATS_NUM * ((size_t)context + 1);
+	last_used = txq->last_used;
+	avail_cnt = *txq->avail_cnt;/* Where to alloc next. */
+	for (nb_tx = 0; nb_tx < count; nb_tx++) {
+		txm = (struct rte_mbuf *)(buffers[nb_tx]->buf_addr);
+		if (txm == NULL || txq->nb_tx_free < txm->nb_segs)
+			break;
+
+		tx_last = (txq->last_used + txm->nb_segs - 1) &
+			  (txq->nb_tx_desc - 1);
+		nb_segs = txm->nb_segs;
+		for (i = 0; i < nb_segs; i++) {
+			/* Not enough ring space for tx. */
+			if (txq->last_used == avail_cnt)
+				goto end_of_tx;
+			sw_ring[txq->last_used].mbuf = txm;
+			tx_item = txq->tx_desc_ring + txq->last_used;
+
+			if (!tx_item->len) {
+				(hw->ntb_xstats[NTB_TX_ERRS_ID + off])++;
+				goto end_of_tx;
+			}
+			if (txm->data_len > tx_item->len) {
+				NTB_LOG(ERR, "Data length exceeds buf length."
+					" Only %u data would be transmitted.",
+					tx_item->len);
+				txm->data_len = tx_item->len;
+			}
+
+			/* translate remote virtual addr to bar virtual addr */
+			buf_addr = (*hw->ntb_ops->ioremap)(dev, tx_item->addr);
+			if (buf_addr == NULL) {
+				(hw->ntb_xstats[NTB_TX_ERRS_ID + off])++;
+				NTB_LOG(ERR, "Null remap addr.");
+				goto end_of_tx;
+			}
+			rte_memcpy(buf_addr, rte_pktmbuf_mtod(txm, void *),
+				   txm->data_len);
+
+			tx_used[nb_mbufs].len = txm->data_len;
+			tx_used[nb_mbufs++].flags = (txq->last_used ==
+						    tx_last) ?
+						    NTB_FLAG_EOP : 0;
+
+			/* update stats */
+			bytes += txm->data_len;
+
+			txm = txm->next;
+
+			sw_ring[txq->last_used].next_id = (txq->last_used + 1) &
+						  (txq->nb_tx_desc - 1);
+			sw_ring[txq->last_used].last_id = tx_last;
+			txq->last_used = (txq->last_used + 1) &
+					 (txq->nb_tx_desc - 1);
+		}
+		txq->nb_tx_free -= nb_segs;
+	}
+
+end_of_tx:
+	if (nb_tx) {
+		uint16_t nb1, nb2;
+		if (nb_mbufs > txq->nb_tx_desc - last_used) {
+			nb1 = txq->nb_tx_desc - last_used;
+			nb2 = nb_mbufs - txq->nb_tx_desc + last_used;
+		} else {
+			nb1 = nb_mbufs;
+			nb2 = 0;
+		}
+		rte_memcpy(txq->tx_used_ring + last_used, tx_used,
+			   sizeof(struct ntb_used) * nb1);
+		rte_memcpy(txq->tx_used_ring, tx_used + nb1,
+			   sizeof(struct ntb_used) * nb2);
+		rte_wmb();
+		*txq->used_cnt = txq->last_used;
+
+		/* update queue stats */
+		hw->ntb_xstats[NTB_TX_BYTES_ID + off] += bytes;
+		hw->ntb_xstats[NTB_TX_PKTS_ID + off] += nb_tx;
+	}
+
+	return nb_tx;
 }
 
 static int
@@ -586,16 +700,106 @@ ntb_dequeue_bufs(struct rte_rawdev *dev,
 		 unsigned int count,
 		 rte_rawdev_obj_t context)
 {
-	/* Not FIFO. Just for testing memory read. */
 	struct ntb_hw *hw = dev->dev_private;
-	unsigned int i;
-	size_t size;
+	struct ntb_rx_queue *rxq = hw->rx_queues[(size_t)context];
+	struct ntb_rx_entry *sw_ring = rxq->sw_ring;
+	struct ntb_desc rx_desc[NTB_MAX_DESC_SIZE];
+	struct rte_mbuf *first, *rxm_t;
+	struct rte_mbuf *prev = NULL;
+	volatile struct ntb_used *rx_item;
+	uint16_t nb_mbufs = 0;
+	uint16_t nb_rx = 0;
+	uint64_t bytes = 0;
+	uint16_t off, last_avail, used_cnt, used_nb;
+	int i;
+
+	if (unlikely(dev->started == 0 || hw->peer_dev_up == 0)) {
+		NTB_LOG(DEBUG, "Link is not up.");
+		return nb_rx;
+	}
+
+	used_cnt = *rxq->used_cnt;
+
+	if (rxq->last_used == used_cnt)
+		return nb_rx;
+
+	last_avail = rxq->last_avail;
+	used_nb = (used_cnt - rxq->last_used) & (rxq->nb_rx_desc - 1);
+	count = RTE_MIN(count, used_nb);
+	for (nb_rx = 0; nb_rx < count; nb_rx++) {
+		i = 0;
+		while (true) {
+			rx_item = rxq->rx_used_ring + rxq->last_used;
+			rxm_t = sw_ring[rxq->last_used].mbuf;
+			rxm_t->data_len = rx_item->len;
+			rxm_t->data_off = RTE_PKTMBUF_HEADROOM;
+			rxm_t->port = rxq->port_id;
+
+			if (!i) {
+				rxm_t->nb_segs = 1;
+				first = rxm_t;
+				first->pkt_len = 0;
+				buffers[nb_rx]->buf_addr = rxm_t;
+			} else {
+				prev->next = rxm_t;
+				first->nb_segs++;
+			}
 
-	size = (size_t)context;
+			prev = rxm_t;
+			first->pkt_len += prev->data_len;
+			rxq->last_used = (rxq->last_used + 1) &
+					 (rxq->nb_rx_desc - 1);
 
-	for (i = 0; i < count; i++)
-		rte_memcpy(buffers[i]->buf_addr, hw->mz[i]->addr, size);
-	return 0;
+			/* alloc new mbuf */
+			rxm_t = rte_mbuf_raw_alloc(rxq->mpool);
+			if (unlikely(rxm_t == NULL)) {
+				NTB_LOG(ERR, "recv alloc mbuf failed.");
+				goto end_of_rx;
+			}
+			rxm_t->port = rxq->port_id;
+			sw_ring[rxq->last_avail].mbuf = rxm_t;
+			i++;
+
+			/* fill new desc */
+			rx_desc[nb_mbufs].addr =
+					rte_pktmbuf_mtod(rxm_t, size_t);
+			rx_desc[nb_mbufs++].len = rxm_t->buf_len -
+						  RTE_PKTMBUF_HEADROOM;
+			rxq->last_avail = (rxq->last_avail + 1) &
+					  (rxq->nb_rx_desc - 1);
+
+			if (rx_item->flags & NTB_FLAG_EOP)
+				break;
+		}
+		/* update stats */
+		bytes += first->pkt_len;
+	}
+
+end_of_rx:
+	if (nb_rx) {
+		uint16_t nb1, nb2;
+		if (nb_mbufs > rxq->nb_rx_desc - last_avail) {
+			nb1 = rxq->nb_rx_desc - last_avail;
+			nb2 = nb_mbufs - rxq->nb_rx_desc + last_avail;
+		} else {
+			nb1 = nb_mbufs;
+			nb2 = 0;
+		}
+		rte_memcpy(rxq->rx_desc_ring + last_avail, rx_desc,
+			   sizeof(struct ntb_desc) * nb1);
+		rte_memcpy(rxq->rx_desc_ring, rx_desc + nb1,
+			   sizeof(struct ntb_desc) * nb2);
+		rte_wmb();
+		*rxq->avail_cnt = rxq->last_avail;
+
+		/* update queue stats */
+		off = NTB_XSTATS_NUM * ((size_t)context + 1);
+		hw->ntb_xstats[NTB_RX_BYTES_ID + off] += bytes;
+		hw->ntb_xstats[NTB_RX_PKTS_ID + off] += nb_rx;
+		hw->ntb_xstats[NTB_RX_MISS_ID + off] += (count - nb_rx);
+	}
+
+	return nb_rx;
 }
 
 static void
@@ -1292,7 +1496,7 @@ ntb_remove(struct rte_pci_device *pci_dev)
 
 static struct rte_pci_driver rte_ntb_pmd = {
 	.id_table = pci_id_ntb_map,
-	.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_WC_ACTIVATE,
 	.probe = ntb_probe,
 	.remove = ntb_remove,
 };
diff --git a/drivers/raw/ntb/ntb.h b/drivers/raw/ntb/ntb.h
index 3cc160680..a561c42d1 100644
--- a/drivers/raw/ntb/ntb.h
+++ b/drivers/raw/ntb/ntb.h
@@ -87,6 +87,7 @@ enum ntb_spad_idx {
  * @ntb_dev_init: Init ntb dev.
  * @get_peer_mw_addr: To get the addr of peer mw[mw_idx].
  * @mw_set_trans: Set translation of internal memory that remote can access.
+ * @ioremap: Translate the remote host address to bar address.
  * @get_link_status: get link status, link speed and link width.
  * @set_link: Set local side up/down.
  * @spad_read: Read local/peer spad register val.
@@ -103,6 +104,7 @@ struct ntb_dev_ops {
 	void *(*get_peer_mw_addr)(const struct rte_rawdev *dev, int mw_idx);
 	int (*mw_set_trans)(const struct rte_rawdev *dev, int mw_idx,
 			    uint64_t addr, uint64_t size);
+	void *(*ioremap)(const struct rte_rawdev *dev, uint64_t addr);
 	int (*get_link_status)(const struct rte_rawdev *dev);
 	int (*set_link)(const struct rte_rawdev *dev, bool up);
 	uint32_t (*spad_read)(const struct rte_rawdev *dev, int spad,
diff --git a/drivers/raw/ntb/ntb_hw_intel.c b/drivers/raw/ntb/ntb_hw_intel.c
index 0e73f1609..e7f8667cd 100644
--- a/drivers/raw/ntb/ntb_hw_intel.c
+++ b/drivers/raw/ntb/ntb_hw_intel.c
@@ -162,6 +162,27 @@ intel_ntb_mw_set_trans(const struct rte_rawdev *dev, int mw_idx,
 	return 0;
 }
 
+static void *
+intel_ntb_ioremap(const struct rte_rawdev *dev, uint64_t addr)
+{
+	struct ntb_hw *hw = dev->dev_private;
+	void *mapped = NULL;
+	void *base;
+	int i;
+
+	for (i = 0; i < hw->peer_used_mws; i++) {
+		if (addr >= hw->peer_mw_base[i] &&
+		    addr < hw->peer_mw_base[i] + hw->mw_size[i]) {
+			base = intel_ntb_get_peer_mw_addr(dev, i);
+			mapped = (void *)(size_t)(addr - hw->peer_mw_base[i] +
+				 (size_t)base);
+			break;
+		}
+	}
+
+	return mapped;
+}
+
 static int
 intel_ntb_get_link_status(const struct rte_rawdev *dev)
 {
@@ -357,6 +378,7 @@ const struct ntb_dev_ops intel_ntb_ops = {
 	.ntb_dev_init       = intel_ntb_dev_init,
 	.get_peer_mw_addr   = intel_ntb_get_peer_mw_addr,
 	.mw_set_trans       = intel_ntb_mw_set_trans,
+	.ioremap            = intel_ntb_ioremap,
 	.get_link_status    = intel_ntb_get_link_status,
 	.set_link           = intel_ntb_set_link,
 	.spad_read          = intel_ntb_spad_read,
-- 
2.17.1
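
The end_of_tx and end_of_rx paths in this patch copy a locally staged array into a ring in at most two pieces (nb1 entries up to the end of the ring, nb2 entries from slot 0). The following is a minimal standalone sketch of that split, not the driver code: struct used_entry and ring_copy_wrapped() are hypothetical names, the field widths are assumptions, and plain memcpy() stands in for rte_memcpy(). In the driver, a write barrier between these copies and the store to the shared counter is what makes the entries visible to the peer before the counter moves.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct ntb_used; field widths are assumptions. */
struct used_entry {
	uint16_t len;
	uint16_t flags;
};

/*
 * Copy n staged entries into a ring of ring_size slots starting at
 * index start, wrapping to slot 0 when the end is reached.
 * Returns the index following the last written slot.
 */
static uint16_t
ring_copy_wrapped(struct used_entry *ring, uint16_t ring_size,
		  uint16_t start, const struct used_entry *staged,
		  uint16_t n)
{
	uint16_t nb1, nb2;

	/* The index mask below only works for power-of-two sizes. */
	assert(ring_size != 0 && (ring_size & (ring_size - 1)) == 0);
	assert(start < ring_size && n <= ring_size);

	if (n > ring_size - start) {
		nb1 = ring_size - start;	/* up to the end of the ring */
		nb2 = n - nb1;			/* remainder from slot 0 */
	} else {
		nb1 = n;
		nb2 = 0;
	}
	memcpy(ring + start, staged, sizeof(*staged) * nb1);
	memcpy(ring, staged + nb1, sizeof(*staged) * nb2);

	return (start + n) & (ring_size - 1);
}
```

With an 8-slot ring, start = 6 and n = 4, the first two entries land in slots 6 and 7, the last two in slots 0 and 1, and the returned next index is 2.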



* [dpdk-dev] [PATCH v6 4/4] examples/ntb: support more functions for NTB
  2019-09-26  3:20         ` [dpdk-dev] [PATCH v6 0/4] enable FIFO " Xiaoyun Li
                             ` (2 preceding siblings ...)
  2019-09-26  3:20           ` [dpdk-dev] [PATCH v6 3/4] raw/ntb: add enqueue and dequeue functions Xiaoyun Li
@ 2019-09-26  3:20           ` Xiaoyun Li
  2019-09-26  4:04           ` [dpdk-dev] [PATCH v6 0/4] enable FIFO " Wu, Jingjing
  4 siblings, 0 replies; 42+ messages in thread
From: Xiaoyun Li @ 2019-09-26  3:20 UTC (permalink / raw)
  To: jingjing.wu, keith.wiles, omkar.maslekar, cunming.liang; +Cc: dev, Xiaoyun Li

Support transmitting files between two systems.
Support iofwd between one ethdev and the NTB device.
Support rxonly and txonly for the NTB device.
Support setting the forwarding mode to file-trans, txonly,
rxonly or iofwd.
Support showing/clearing port stats and throughput.

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
---
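For readers of assign_stream_to_lcores(): the streams are handed out to worker lcores in contiguous slices, and the first (nb_streams % lcore_num) workers take one extra stream. Below is a minimal standalone sketch of that arithmetic. struct worker_slice and split_streams() are hypothetical names used only for illustration; they mirror the stream_id/nb_stream fields of struct ntb_fwd_lcore_conf but are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-worker slice; mirrors stream_id/nb_stream in the patch. */
struct worker_slice {
	uint16_t first;	/* index of the first stream owned by the worker */
	uint16_t count;	/* number of consecutive streams owned */
};

/*
 * Split nb_streams across nb_workers so that the first
 * (nb_streams % nb_workers) workers get one extra stream.
 * Workers beyond the last stream are left with count == 0.
 */
static void
split_streams(uint16_t nb_streams, uint16_t nb_workers,
	      struct worker_slice *out)
{
	uint16_t base, extra, next = 0, i;

	assert(nb_workers > 0);	/* at least one worker lcore is required */
	base = nb_streams / nb_workers;
	extra = nb_streams % nb_workers;

	for (i = 0; i < nb_workers; i++) {
		out[i].first = next;
		out[i].count = base + (i < extra ? 1 : 0);
		next += out[i].count;
	}
}
```

For example, 5 streams on 3 workers gives slices (0, 2), (2, 2) and (4, 1). Since the sample divides by rte_lcore_count() - 1, it appears to need at least one worker lcore in addition to the master lcore.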
 doc/guides/sample_app_ug/ntb.rst |   59 +-
 examples/ntb/meson.build         |    3 +
 examples/ntb/ntb_fwd.c           | 1298 +++++++++++++++++++++++++++---
 3 files changed, 1232 insertions(+), 128 deletions(-)
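
For reference, the send command sizes its work with two round-up divisions: the file is cut into mbuf-sized chunks, and the chunks are grouped into bursts. The sketch below reproduces only that arithmetic. The helper names are hypothetical, and the 2048-byte payload in the example assumes the default buffer size minus the default headroom.

```c
#include <assert.h>
#include <stdint.h>

/* Round-up integer division; den must be non-zero. */
static uint64_t
div_round_up(uint64_t num, uint64_t den)
{
	assert(den != 0);
	return (num + den - 1) / den;
}

/* Number of mbufs needed to carry file_size bytes. */
static uint64_t
file_mbuf_count(uint64_t file_size, uint16_t payload_per_mbuf)
{
	return div_round_up(file_size, payload_per_mbuf);
}

/* Number of enqueue bursts needed for nb_mbufs mbufs. */
static uint64_t
file_burst_count(uint64_t nb_mbufs, uint16_t burst)
{
	return div_round_up(nb_mbufs, burst);
}
```

Under those assumptions, a 1 MiB file with a burst size of 32 needs 512 mbufs sent in 16 bursts, and one extra byte adds one more mbuf and one more burst.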

diff --git a/doc/guides/sample_app_ug/ntb.rst b/doc/guides/sample_app_ug/ntb.rst
index 079242175..df16af86c 100644
--- a/doc/guides/sample_app_ug/ntb.rst
+++ b/doc/guides/sample_app_ug/ntb.rst
@@ -5,8 +5,17 @@ NTB Sample Application
 ======================
 
 The ntb sample application shows how to use ntb rawdev driver.
-This sample provides interactive mode to transmit file between
-two hosts.
+This sample provides interactive mode to do packet based processing
+between two systems.
+
+This sample supports four packet forwarding modes.
+
+* ``file-trans``: transmit files between two systems. The sample polls
+  for files from the peer and saves each one as ``ntb_recv_file[N]``,
+  where [N] is the index of the received file.
+* ``rxonly``: NTB receives packets but doesn't transmit them.
+* ``txonly``: NTB generates and transmits packets without receiving any.
+* ``iofwd``: forward packets between the NTB device and an ethdev.
 
 Compiling the Application
 -------------------------
@@ -29,6 +38,40 @@ Refer to the *DPDK Getting Started Guide* for general information on
 running applications and the Environment Abstraction Layer (EAL)
 options.
 
+Command-line Options
+--------------------
+
+The application supports the following command-line options.
+
+* ``--buf-size=N``
+
+  Set the data size of the mbufs used to N bytes, where N < 65536.
+  The default value is 2048.
+
+* ``--fwd-mode=mode``
+
+  Set the packet forwarding mode as ``file-trans``, ``txonly``,
+  ``rxonly`` or ``iofwd``.
+
+* ``--nb-desc=N``
+
+  Set the number of descriptors per queue (the queue size) to N,
+  where 64 <= N <= 1024. The default value is 1024.
+
+* ``--txfreet=N``
+
+  Set the transmit free threshold of TX rings to N, where 0 <= N <=
+  the value of ``--nb-desc``. The default value is 256.
+
+* ``--burst=N``
+
+  Set the number of packets per burst to N, where 1 <= N <= 32.
+  The default value is 32.
+
+* ``--qp=N``
+
+  Set the number of queues to N, where N > 0. The default value is 1.
+
 Using the application
 ---------------------
 
@@ -41,7 +84,11 @@ The application is console-driven using the cmdline DPDK interface:
 From this interface the available commands and descriptions of what
 they do as as follows:
 
-* ``send [filepath]``: Send file to the peer host.
-* ``receive [filepath]``: Receive file to [filepath]. Need the peer
-  to send file successfully first.
-* ``quit``: Exit program
+* ``send [filepath]``: Send a file to the peer host. Requires
+  file-trans forwarding mode.
+* ``start``: Start transmission.
+* ``stop``: Stop transmission.
+* ``show/clear port stats``: Show/Clear port stats and throughput.
+* ``set fwd file-trans/rxonly/txonly/iofwd``: Set packet forwarding
+  mode.
+* ``quit``: Exit program.
diff --git a/examples/ntb/meson.build b/examples/ntb/meson.build
index 9a6288f4f..f5435fe12 100644
--- a/examples/ntb/meson.build
+++ b/examples/ntb/meson.build
@@ -14,3 +14,6 @@ cflags += ['-D_FILE_OFFSET_BITS=64']
 sources = files(
 	'ntb_fwd.c'
 )
+if dpdk_conf.has('RTE_LIBRTE_PMD_NTB_RAWDEV')
+	deps += 'rawdev_ntb'
+endif
diff --git a/examples/ntb/ntb_fwd.c b/examples/ntb/ntb_fwd.c
index f8c970cdb..b1ea71c8f 100644
--- a/examples/ntb/ntb_fwd.c
+++ b/examples/ntb/ntb_fwd.c
@@ -14,21 +14,103 @@
 #include <cmdline.h>
 #include <rte_common.h>
 #include <rte_rawdev.h>
+#include <rte_ethdev.h>
+#include <rte_malloc.h>
 #include <rte_lcore.h>
+#include <rte_cycles.h>
+#include <rte_pmd_ntb.h>
 
-#define NTB_DRV_NAME_LEN	7
-static uint64_t max_file_size = 0x400000;
+/* Per-port statistics struct */
+struct ntb_port_statistics {
+	uint64_t tx;
+	uint64_t rx;
+} __rte_cache_aligned;
+/* Port 0: NTB dev, Port 1: ethdev when iofwd. */
+struct ntb_port_statistics ntb_port_stats[2];
+
+struct ntb_fwd_stream {
+	uint16_t tx_port;
+	uint16_t rx_port;
+	uint16_t qp_id;
+	uint8_t tx_ntb;  /* If ntb device is tx port. */
+};
+
+struct ntb_fwd_lcore_conf {
+	uint16_t stream_id;
+	uint16_t nb_stream;
+	uint8_t stopped;
+};
+
+enum ntb_fwd_mode {
+	FILE_TRANS = 0,
+	RXONLY,
+	TXONLY,
+	IOFWD,
+	MAX_FWD_MODE,
+};
+static const char *const fwd_mode_s[] = {
+	"file-trans",
+	"rxonly",
+	"txonly",
+	"iofwd",
+	NULL,
+};
+static enum ntb_fwd_mode fwd_mode = MAX_FWD_MODE;
+
+static struct ntb_fwd_lcore_conf fwd_lcore_conf[RTE_MAX_LCORE];
+static struct ntb_fwd_stream *fwd_streams;
+
+static struct rte_mempool *mbuf_pool;
+
+#define NTB_DRV_NAME_LEN 7
+#define MEMPOOL_CACHE_SIZE 256
+
+static uint8_t in_test;
 static uint8_t interactive = 1;
+static uint16_t eth_port_id = RTE_MAX_ETHPORTS;
 static uint16_t dev_id;
 
+/* Number of queues, default set as 1 */
+static uint16_t num_queues = 1;
+static uint16_t ntb_buf_size = RTE_MBUF_DEFAULT_BUF_SIZE;
+
+/* Configurable number of descriptors */
+#define NTB_DEFAULT_NUM_DESCS 1024
+static uint16_t nb_desc = NTB_DEFAULT_NUM_DESCS;
+
+static uint16_t tx_free_thresh;
+
+#define NTB_MAX_PKT_BURST 32
+#define NTB_DFLT_PKT_BURST 32
+static uint16_t pkt_burst = NTB_DFLT_PKT_BURST;
+
+#define BURST_TX_RETRIES 64
+
+static struct rte_eth_conf eth_port_conf = {
+	.rxmode = {
+		.mq_mode = ETH_MQ_RX_RSS,
+		.split_hdr_size = 0,
+	},
+	.rx_adv_conf = {
+		.rss_conf = {
+			.rss_key = NULL,
+			.rss_hf = ETH_RSS_IP,
+		},
+	},
+	.txmode = {
+		.mq_mode = ETH_MQ_TX_NONE,
+	},
+};
+
 /* *** Help command with introduction. *** */
 struct cmd_help_result {
 	cmdline_fixed_string_t help;
 };
 
-static void cmd_help_parsed(__attribute__((unused)) void *parsed_result,
-			    struct cmdline *cl,
-			    __attribute__((unused)) void *data)
+static void
+cmd_help_parsed(__attribute__((unused)) void *parsed_result,
+		struct cmdline *cl,
+		__attribute__((unused)) void *data)
 {
 	cmdline_printf(
 		cl,
@@ -37,13 +119,17 @@ static void cmd_help_parsed(__attribute__((unused)) void *parsed_result,
 		"Control:\n"
 		"    quit                                      :"
 		" Quit the application.\n"
-		"\nFile transmit:\n"
+		"\nTransmission:\n"
 		"    send [path]                               :"
-		" Send [path] file. (No more than %"PRIu64")\n"
-		"    recv [path]                            :"
-		" Receive file to [path]. Make sure sending is done"
-		" on the other side.\n",
-		max_file_size
+		" Send [path] file. Only takes effect in file-trans mode.\n"
+		"    start                                     :"
+		" Start transmissions.\n"
+		"    stop                                      :"
+		" Stop transmissions.\n"
+		"    clear/show port stats                     :"
+		" Clear/show port stats.\n"
+		"    set fwd file-trans/rxonly/txonly/iofwd    :"
+		" Set packet forwarding mode.\n"
 	);
 
 }
@@ -66,13 +152,37 @@ struct cmd_quit_result {
 	cmdline_fixed_string_t quit;
 };
 
-static void cmd_quit_parsed(__attribute__((unused)) void *parsed_result,
-			    struct cmdline *cl,
-			    __attribute__((unused)) void *data)
+static void
+cmd_quit_parsed(__attribute__((unused)) void *parsed_result,
+		struct cmdline *cl,
+		__attribute__((unused)) void *data)
 {
+	struct ntb_fwd_lcore_conf *conf;
+	uint8_t lcore_id;
+
+	/* Stop transmission first. */
+	RTE_LCORE_FOREACH_SLAVE(lcore_id) {
+		conf = &fwd_lcore_conf[lcore_id];
+
+		if (!conf->nb_stream)
+			continue;
+
+		if (conf->stopped)
+			continue;
+
+		conf->stopped = 1;
+	}
+	printf("\nWaiting for lcores to finish...\n");
+	rte_eal_mp_wait_lcore();
+	in_test = 0;
+
 	/* Stop traffic and Close port. */
 	rte_rawdev_stop(dev_id);
 	rte_rawdev_close(dev_id);
+	if (eth_port_id < RTE_MAX_ETHPORTS && fwd_mode == IOFWD) {
+		rte_eth_dev_stop(eth_port_id);
+		rte_eth_dev_close(eth_port_id);
+	}
 
 	cmdline_quit(cl);
 }
@@ -102,21 +212,19 @@ cmd_sendfile_parsed(void *parsed_result,
 		    __attribute__((unused)) void *data)
 {
 	struct cmd_sendfile_result *res = parsed_result;
-	struct rte_rawdev_buf *pkts_send[1];
-	uint64_t rsize, size, link;
-	uint8_t *buff;
+	struct rte_rawdev_buf *pkts_send[NTB_MAX_PKT_BURST];
+	struct rte_mbuf *mbuf_send[NTB_MAX_PKT_BURST];
+	uint64_t size, count, i, nb_burst;
+	uint16_t nb_tx, buf_size;
+	unsigned int nb_pkt;
+	size_t queue_id = 0;
+	uint16_t retry = 0;
 	uint32_t val;
 	FILE *file;
 
-	if (!rte_rawdevs[dev_id].started) {
-		printf("Device needs to be up first. Try later.\n");
-		return;
-	}
-
-	rte_rawdev_get_attr(dev_id, "link_status", &link);
-	if (!link) {
-		printf("Link is not up, cannot send file.\n");
-		return;
+	if (num_queues != 1) {
+		printf("File transmission only supports 1 queue.\n");
+		num_queues = 1;
 	}
 
 	file = fopen(res->filepath, "r");
@@ -127,30 +235,13 @@ cmd_sendfile_parsed(void *parsed_result,
 
 	if (fseek(file, 0, SEEK_END) < 0) {
 		printf("Fail to get file size.\n");
+		fclose(file);
 		return;
 	}
 	size = ftell(file);
 	if (fseek(file, 0, SEEK_SET) < 0) {
 		printf("Fail to get file size.\n");
-		return;
-	}
-
-	/**
-	 * No FIFO now. Only test memory. Limit sending file
-	 * size <= max_file_size.
-	 */
-	if (size > max_file_size) {
-		printf("Warning: The file is too large. Only send first"
-		       " %"PRIu64" bits.\n", max_file_size);
-		size = max_file_size;
-	}
-
-	buff = (uint8_t *)malloc(size);
-	rsize = fread(buff, size, 1, file);
-	if (rsize != 1) {
-		printf("Fail to read file.\n");
 		fclose(file);
-		free(buff);
 		return;
 	}
 
@@ -159,22 +250,63 @@ cmd_sendfile_parsed(void *parsed_result,
 	rte_rawdev_set_attr(dev_id, "spad_user_0", val);
 	val = size;
 	rte_rawdev_set_attr(dev_id, "spad_user_1", val);
+	printf("Sending file, size is %"PRIu64"\n", size);
+
+	for (i = 0; i < NTB_MAX_PKT_BURST; i++)
+		pkts_send[i] = (struct rte_rawdev_buf *)
+				malloc(sizeof(struct rte_rawdev_buf));
+
+	buf_size = ntb_buf_size - RTE_PKTMBUF_HEADROOM;
+	count = (size + buf_size - 1) / buf_size;
+	nb_burst = (count + pkt_burst - 1) / pkt_burst;
 
-	pkts_send[0] = (struct rte_rawdev_buf *)malloc
-			(sizeof(struct rte_rawdev_buf));
-	pkts_send[0]->buf_addr = buff;
+	for (i = 0; i < nb_burst; i++) {
+		val = RTE_MIN(count, pkt_burst);
+		if (rte_mempool_get_bulk(mbuf_pool, (void **)mbuf_send,
+					val) == 0) {
+			for (nb_pkt = 0; nb_pkt < val; nb_pkt++) {
+				mbuf_send[nb_pkt]->port = dev_id;
+				mbuf_send[nb_pkt]->data_len =
+				fread(rte_pktmbuf_mtod(mbuf_send[nb_pkt],
+					void *), 1, buf_size, file);
+				mbuf_send[nb_pkt]->pkt_len =
+					mbuf_send[nb_pkt]->data_len;
+				pkts_send[nb_pkt]->buf_addr = mbuf_send[nb_pkt];
+			}
+		} else {
+			for (nb_pkt = 0; nb_pkt < val; nb_pkt++) {
+				mbuf_send[nb_pkt] =
+					rte_mbuf_raw_alloc(mbuf_pool);
+				if (mbuf_send[nb_pkt] == NULL)
+					break;
+				mbuf_send[nb_pkt]->port = dev_id;
+				mbuf_send[nb_pkt]->data_len =
+				fread(rte_pktmbuf_mtod(mbuf_send[nb_pkt],
+					void *), 1, buf_size, file);
+				mbuf_send[nb_pkt]->pkt_len =
+					mbuf_send[nb_pkt]->data_len;
+				pkts_send[nb_pkt]->buf_addr = mbuf_send[nb_pkt];
+			}
+		}
 
-	if (rte_rawdev_enqueue_buffers(dev_id, pkts_send, 1,
-				       (void *)(size_t)size)) {
-		printf("Fail to enqueue.\n");
-		goto clean;
+		nb_tx = rte_rawdev_enqueue_buffers(dev_id, pkts_send, nb_pkt,
+						   (void *)queue_id);
+		while (nb_tx != nb_pkt && retry++ < BURST_TX_RETRIES) {
+			rte_delay_us(1);
+			nb_tx += rte_rawdev_enqueue_buffers(dev_id,
+				&pkts_send[nb_tx], nb_pkt - nb_tx,
+				(void *)queue_id);
+		}
+		count -= nb_pkt;
 	}
+	/* Clear register after file sending done. */
+	rte_rawdev_set_attr(dev_id, "spad_user_0", 0);
+	rte_rawdev_set_attr(dev_id, "spad_user_1", 0);
 	printf("Done sending file.\n");
 
-clean:
+	for (i = 0; i < NTB_MAX_PKT_BURST; i++)
+		free(pkts_send[i]);
 	fclose(file);
-	free(buff);
-	free(pkts_send[0]);
 }
 
 cmdline_parse_token_string_t cmd_send_file_send =
@@ -195,79 +327,680 @@ cmdline_parse_inst_t cmd_send_file = {
 	},
 };
 
-/* *** RECEIVE FILE PARAMETERS *** */
-struct cmd_recvfile_result {
-	cmdline_fixed_string_t recv_string;
-	char filepath[];
-};
+#define RECV_FILE_LEN 30
+static int
+start_polling_recv_file(void *param)
+{
+	struct rte_rawdev_buf *pkts_recv[NTB_MAX_PKT_BURST];
+	struct ntb_fwd_lcore_conf *conf = param;
+	struct rte_mbuf *mbuf;
+	char filepath[RECV_FILE_LEN];
+	uint64_t val, size, file_len;
+	uint16_t nb_rx, i, file_no;
+	size_t queue_id = 0;
+	FILE *file;
+
+	for (i = 0; i < NTB_MAX_PKT_BURST; i++)
+		pkts_recv[i] = (struct rte_rawdev_buf *)
+				malloc(sizeof(struct rte_rawdev_buf));
+
+	file_no = 0;
+	while (!conf->stopped) {
+		snprintf(filepath, RECV_FILE_LEN, "ntb_recv_file%d", file_no);
+		file = fopen(filepath, "w");
+		if (file == NULL) {
+			printf("Fail to open the file.\n");
+			return -EINVAL;
+		}
+
+		rte_rawdev_get_attr(dev_id, "spad_user_0", &val);
+		size = val << 32;
+		rte_rawdev_get_attr(dev_id, "spad_user_1", &val);
+		size |= val;
+
+		if (!size) {
+			fclose(file);
+			continue;
+		}
+
+		file_len = 0;
+		nb_rx = NTB_MAX_PKT_BURST;
+		while (file_len < size && !conf->stopped) {
+			nb_rx = rte_rawdev_dequeue_buffers(dev_id, pkts_recv,
+						pkt_burst, (void *)queue_id);
+			ntb_port_stats[0].rx += nb_rx;
+			for (i = 0; i < nb_rx; i++) {
+				mbuf = pkts_recv[i]->buf_addr;
+				fwrite(rte_pktmbuf_mtod(mbuf, void *), 1,
+					mbuf->data_len, file);
+				file_len += mbuf->data_len;
+				rte_pktmbuf_free(mbuf);
+				pkts_recv[i]->buf_addr = NULL;
+			}
+		}
+
+		printf("Received file (size: %" PRIu64 ") from peer to %s.\n",
+			size, filepath);
+		fclose(file);
+		file_no++;
+	}
+
+	for (i = 0; i < NTB_MAX_PKT_BURST; i++)
+		free(pkts_recv[i]);
+	return 0;
+}
+
+static int
+start_iofwd_per_lcore(void *param)
+{
+	struct rte_rawdev_buf *ntb_buf[NTB_MAX_PKT_BURST];
+	struct rte_mbuf *pkts_burst[NTB_MAX_PKT_BURST];
+	struct ntb_fwd_lcore_conf *conf = param;
+	struct ntb_fwd_stream fs;
+	uint16_t nb_rx, nb_tx;
+	int i, j;
+
+	for (i = 0; i < NTB_MAX_PKT_BURST; i++)
+		ntb_buf[i] = (struct rte_rawdev_buf *)
+			     malloc(sizeof(struct rte_rawdev_buf));
+
+	while (!conf->stopped) {
+		for (i = 0; i < conf->nb_stream; i++) {
+			fs = fwd_streams[conf->stream_id + i];
+			if (fs.tx_ntb) {
+				nb_rx = rte_eth_rx_burst(fs.rx_port,
+						fs.qp_id, pkts_burst,
+						pkt_burst);
+				if (unlikely(nb_rx == 0))
+					continue;
+				for (j = 0; j < nb_rx; j++)
+					ntb_buf[j]->buf_addr = pkts_burst[j];
+				nb_tx =
+				rte_rawdev_enqueue_buffers(fs.tx_port,
+						ntb_buf, nb_rx,
+						(void *)(size_t)fs.qp_id);
+				ntb_port_stats[0].tx += nb_tx;
+				ntb_port_stats[1].rx += nb_rx;
+			} else {
+				nb_rx =
+				rte_rawdev_dequeue_buffers(fs.rx_port,
+						ntb_buf, pkt_burst,
+						(void *)(size_t)fs.qp_id);
+				if (unlikely(nb_rx == 0))
+					continue;
+				for (j = 0; j < nb_rx; j++)
+					pkts_burst[j] = ntb_buf[j]->buf_addr;
+				nb_tx = rte_eth_tx_burst(fs.tx_port,
+					fs.qp_id, pkts_burst, nb_rx);
+				ntb_port_stats[1].tx += nb_tx;
+				ntb_port_stats[0].rx += nb_rx;
+			}
+			if (unlikely(nb_tx < nb_rx)) {
+				do {
+					rte_pktmbuf_free(pkts_burst[nb_tx]);
+				} while (++nb_tx < nb_rx);
+			}
+		}
+	}
+
+	for (i = 0; i < NTB_MAX_PKT_BURST; i++)
+		free(ntb_buf[i]);
+
+	return 0;
+}
+
+static int
+start_rxonly_per_lcore(void *param)
+{
+	struct rte_rawdev_buf *ntb_buf[NTB_MAX_PKT_BURST];
+	struct ntb_fwd_lcore_conf *conf = param;
+	struct ntb_fwd_stream fs;
+	uint16_t nb_rx;
+	int i, j;
+
+	for (i = 0; i < NTB_MAX_PKT_BURST; i++)
+		ntb_buf[i] = (struct rte_rawdev_buf *)
+			     malloc(sizeof(struct rte_rawdev_buf));
+
+	while (!conf->stopped) {
+		for (i = 0; i < conf->nb_stream; i++) {
+			fs = fwd_streams[conf->stream_id + i];
+			nb_rx = rte_rawdev_dequeue_buffers(fs.rx_port,
+				ntb_buf, pkt_burst, (void *)(size_t)fs.qp_id);
+			if (unlikely(nb_rx == 0))
+				continue;
+			ntb_port_stats[0].rx += nb_rx;
+
+			for (j = 0; j < nb_rx; j++)
+				rte_pktmbuf_free(ntb_buf[j]->buf_addr);
+		}
+	}
+
+	for (i = 0; i < NTB_MAX_PKT_BURST; i++)
+		free(ntb_buf[i]);
+
+	return 0;
+}
+
+
+static int
+start_txonly_per_lcore(void *param)
+{
+	struct rte_rawdev_buf *ntb_buf[NTB_MAX_PKT_BURST];
+	struct rte_mbuf *pkts_burst[NTB_MAX_PKT_BURST];
+	struct ntb_fwd_lcore_conf *conf = param;
+	struct ntb_fwd_stream fs;
+	uint16_t nb_pkt, nb_tx;
+	int i;
+
+	for (i = 0; i < NTB_MAX_PKT_BURST; i++)
+		ntb_buf[i] = (struct rte_rawdev_buf *)
+			     malloc(sizeof(struct rte_rawdev_buf));
+
+	while (!conf->stopped) {
+		for (i = 0; i < conf->nb_stream; i++) {
+			fs = fwd_streams[conf->stream_id + i];
+			if (rte_mempool_get_bulk(mbuf_pool, (void **)pkts_burst,
+				  pkt_burst) == 0) {
+				for (nb_pkt = 0; nb_pkt < pkt_burst; nb_pkt++) {
+					pkts_burst[nb_pkt]->port = dev_id;
+					pkts_burst[nb_pkt]->data_len =
+						pkts_burst[nb_pkt]->buf_len -
+						RTE_PKTMBUF_HEADROOM;
+					pkts_burst[nb_pkt]->pkt_len =
+						pkts_burst[nb_pkt]->data_len;
+					ntb_buf[nb_pkt]->buf_addr =
+						pkts_burst[nb_pkt];
+				}
+			} else {
+				for (nb_pkt = 0; nb_pkt < pkt_burst; nb_pkt++) {
+					pkts_burst[nb_pkt] =
+						rte_pktmbuf_alloc(mbuf_pool);
+					if (pkts_burst[nb_pkt] == NULL)
+						break;
+					pkts_burst[nb_pkt]->port = dev_id;
+					pkts_burst[nb_pkt]->data_len =
+						pkts_burst[nb_pkt]->buf_len -
+						RTE_PKTMBUF_HEADROOM;
+					pkts_burst[nb_pkt]->pkt_len =
+						pkts_burst[nb_pkt]->data_len;
+					ntb_buf[nb_pkt]->buf_addr =
+						pkts_burst[nb_pkt];
+				}
+			}
+			nb_tx = rte_rawdev_enqueue_buffers(fs.tx_port,
+				ntb_buf, nb_pkt, (void *)(size_t)fs.qp_id);
+			ntb_port_stats[0].tx += nb_tx;
+			if (unlikely(nb_tx < nb_pkt)) {
+				do {
+					rte_pktmbuf_free(
+						ntb_buf[nb_tx]->buf_addr);
+				} while (++nb_tx < nb_pkt);
+			}
+		}
+	}
+
+	for (i = 0; i < NTB_MAX_PKT_BURST; i++)
+		free(ntb_buf[i]);
+
+	return 0;
+}
+
+static int
+ntb_fwd_config_setup(void)
+{
+	uint16_t i;
+
+	/* Make sure iofwd has valid ethdev. */
+	if (fwd_mode == IOFWD && eth_port_id >= RTE_MAX_ETHPORTS) {
+		printf("No ethdev, cannot be in iofwd mode.\n");
+		return -EINVAL;
+	}
+
+	if (fwd_mode == IOFWD) {
+		fwd_streams = rte_zmalloc("ntb_fwd: fwd_streams",
+			sizeof(struct ntb_fwd_stream) * num_queues * 2,
+			RTE_CACHE_LINE_SIZE);
+		for (i = 0; i < num_queues; i++) {
+			fwd_streams[i * 2].qp_id = i;
+			fwd_streams[i * 2].tx_port = dev_id;
+			fwd_streams[i * 2].rx_port = eth_port_id;
+			fwd_streams[i * 2].tx_ntb = 1;
+
+			fwd_streams[i * 2 + 1].qp_id = i;
+			fwd_streams[i * 2 + 1].tx_port = eth_port_id;
+			fwd_streams[i * 2 + 1].rx_port = dev_id;
+			fwd_streams[i * 2 + 1].tx_ntb = 0;
+		}
+		return 0;
+	}
+
+	if (fwd_mode == RXONLY || fwd_mode == FILE_TRANS) {
+		/* file-trans supports only 1 queue to keep data in order. */
+		if (fwd_mode == FILE_TRANS)
+			num_queues = 1;
+
+		fwd_streams = rte_zmalloc("ntb_fwd: fwd_streams",
+				sizeof(struct ntb_fwd_stream) * num_queues,
+				RTE_CACHE_LINE_SIZE);
+		for (i = 0; i < num_queues; i++) {
+			fwd_streams[i].qp_id = i;
+			fwd_streams[i].tx_port = RTE_MAX_ETHPORTS;
+			fwd_streams[i].rx_port = dev_id;
+			fwd_streams[i].tx_ntb = 0;
+		}
+		return 0;
+	}
+
+	if (fwd_mode == TXONLY) {
+		fwd_streams = rte_zmalloc("ntb_fwd: fwd_streams",
+				sizeof(struct ntb_fwd_stream) * num_queues,
+				RTE_CACHE_LINE_SIZE);
+		for (i = 0; i < num_queues; i++) {
+			fwd_streams[i].qp_id = i;
+			fwd_streams[i].tx_port = dev_id;
+			fwd_streams[i].rx_port = RTE_MAX_ETHPORTS;
+			fwd_streams[i].tx_ntb = 1;
+		}
+	}
+	return 0;
+}
 
 static void
-cmd_recvfile_parsed(void *parsed_result,
-		    __attribute__((unused)) struct cmdline *cl,
-		    __attribute__((unused)) void *data)
+assign_stream_to_lcores(void)
 {
-	struct cmd_sendfile_result *res = parsed_result;
-	struct rte_rawdev_buf *pkts_recv[1];
-	uint8_t *buff;
-	uint64_t val;
-	size_t size;
-	FILE *file;
+	struct ntb_fwd_lcore_conf *conf;
+	struct ntb_fwd_stream *fs;
+	uint16_t nb_streams, sm_per_lcore, sm_id, i;
+	uint8_t lcore_id, lcore_num, nb_extra;
 
-	if (!rte_rawdevs[dev_id].started) {
-		printf("Device needs to be up first. Try later.\n");
-		return;
+	lcore_num = rte_lcore_count();
+	/* Exclude master core */
+	lcore_num--;
+
+	nb_streams = (fwd_mode == IOFWD) ? num_queues * 2 : num_queues;
+
+	sm_per_lcore = nb_streams / lcore_num;
+	nb_extra = nb_streams % lcore_num;
+	sm_id = 0;
+	i = 0;
+
+	RTE_LCORE_FOREACH_SLAVE(lcore_id) {
+		conf = &fwd_lcore_conf[lcore_id];
+
+		if (i < nb_extra) {
+			conf->nb_stream = sm_per_lcore + 1;
+			conf->stream_id = sm_id;
+			sm_id = sm_id + sm_per_lcore + 1;
+		} else {
+			conf->nb_stream = sm_per_lcore;
+			conf->stream_id = sm_id;
+			sm_id = sm_id + sm_per_lcore;
+		}
+
+		i++;
+		if (sm_id >= nb_streams)
+			break;
+	}
+
+	/* Print packet forwarding config. */
+	RTE_LCORE_FOREACH_SLAVE(lcore_id) {
+		conf = &fwd_lcore_conf[lcore_id];
+
+		if (!conf->nb_stream)
+			continue;
+
+		printf("Streams on Lcore %u :\n", lcore_id);
+		for (i = 0; i < conf->nb_stream; i++) {
+			fs = &fwd_streams[conf->stream_id + i];
+			if (fwd_mode == IOFWD)
+				printf(" + Stream %u : %s%u RX -> %s%u TX,"
+					" Q=%u\n", conf->stream_id + i,
+					fs->tx_ntb ? "Eth" : "NTB", fs->rx_port,
+					fs->tx_ntb ? "NTB" : "Eth", fs->tx_port,
+					fs->qp_id);
+			if (fwd_mode == FILE_TRANS || fwd_mode == RXONLY)
+				printf(" + Stream %u : %s%u RX only\n",
+					conf->stream_id + i, "NTB", fs->rx_port);
+			if (fwd_mode == TXONLY)
+				printf(" + Stream %u : %s%u TX only\n",
+					conf->stream_id + i, "NTB", fs->tx_port);
+		}
 	}
+}
 
-	rte_rawdev_get_attr(dev_id, "link_status", &val);
-	if (!val) {
-		printf("Link is not up, cannot receive file.\n");
+static void
+start_pkt_fwd(void)
+{
+	struct ntb_fwd_lcore_conf *conf;
+	struct rte_eth_link eth_link;
+	uint8_t lcore_id;
+	int ret, i;
+
+	ret = ntb_fwd_config_setup();
+	if (ret < 0) {
+		printf("Cannot start traffic. Please reset fwd mode.\n");
 		return;
 	}
 
-	file = fopen(res->filepath, "w");
-	if (file == NULL) {
-		printf("Fail to open the file.\n");
+	/* If using iofwd, check the ethdev link status first. */
+	if (fwd_mode == IOFWD) {
+		printf("Checking eth link status...\n");
+		/* Wait for eth link up at most 100 times. */
+		for (i = 0; i < 100; i++) {
+			rte_eth_link_get(eth_port_id, &eth_link);
+			if (eth_link.link_status) {
+				printf("Eth%u Link Up. Speed %u Mbps - %s\n",
+					eth_port_id, eth_link.link_speed,
+					(eth_link.link_duplex ==
+					 ETH_LINK_FULL_DUPLEX) ?
+					("full-duplex") : ("half-duplex"));
+				break;
+			}
+		}
+		if (!eth_link.link_status) {
+			printf("Eth%u link down. Cannot start traffic.\n",
+				eth_port_id);
+			return;
+		}
+	}
+
+	assign_stream_to_lcores();
+	in_test = 1;
+
+	RTE_LCORE_FOREACH_SLAVE(lcore_id) {
+		conf = &fwd_lcore_conf[lcore_id];
+
+		if (!conf->nb_stream)
+			continue;
+
+		conf->stopped = 0;
+		if (fwd_mode == FILE_TRANS)
+			rte_eal_remote_launch(start_polling_recv_file,
+					      conf, lcore_id);
+		else if (fwd_mode == IOFWD)
+			rte_eal_remote_launch(start_iofwd_per_lcore,
+					      conf, lcore_id);
+		else if (fwd_mode == RXONLY)
+			rte_eal_remote_launch(start_rxonly_per_lcore,
+					      conf, lcore_id);
+		else if (fwd_mode == TXONLY)
+			rte_eal_remote_launch(start_txonly_per_lcore,
+					      conf, lcore_id);
+	}
+}
+
+/* *** START FWD PARAMETERS *** */
+struct cmd_start_result {
+	cmdline_fixed_string_t start;
+};
+
+static void
+cmd_start_parsed(__attribute__((unused)) void *parsed_result,
+		 __attribute__((unused)) struct cmdline *cl,
+		 __attribute__((unused)) void *data)
+{
+	start_pkt_fwd();
+}
+
+cmdline_parse_token_string_t cmd_start_start =
+		TOKEN_STRING_INITIALIZER(struct cmd_start_result, start, "start");
+
+cmdline_parse_inst_t cmd_start = {
+	.f = cmd_start_parsed,
+	.data = NULL,
+	.help_str = "start: Start packet forwarding",
+	.tokens = {
+		(void *)&cmd_start_start,
+		NULL,
+	},
+};
+
+/* *** STOP *** */
+struct cmd_stop_result {
+	cmdline_fixed_string_t stop;
+};
+
+static void
+cmd_stop_parsed(__attribute__((unused)) void *parsed_result,
+		__attribute__((unused)) struct cmdline *cl,
+		__attribute__((unused)) void *data)
+{
+	struct ntb_fwd_lcore_conf *conf;
+	uint8_t lcore_id;
+
+	RTE_LCORE_FOREACH_SLAVE(lcore_id) {
+		conf = &fwd_lcore_conf[lcore_id];
+
+		if (!conf->nb_stream)
+			continue;
+
+		if (conf->stopped)
+			continue;
+
+		conf->stopped = 1;
+	}
+	printf("\nWaiting for lcores to finish...\n");
+	rte_eal_mp_wait_lcore();
+	in_test = 0;
+	printf("\nDone.\n");
+}
+
+cmdline_parse_token_string_t cmd_stop_stop =
+		TOKEN_STRING_INITIALIZER(struct cmd_stop_result, stop, "stop");
+
+cmdline_parse_inst_t cmd_stop = {
+	.f = cmd_stop_parsed,
+	.data = NULL,
+	.help_str = "stop: Stop packet forwarding",
+	.tokens = {
+		(void *)&cmd_stop_stop,
+		NULL,
+	},
+};
+
+static void
+ntb_stats_clear(void)
+{
+	int nb_ids, i;
+	uint32_t *ids;
+
+	/* Clear NTB dev stats */
+	nb_ids = rte_rawdev_xstats_names_get(dev_id, NULL, 0);
+	if (nb_ids < 0) {
+		printf("Error: Cannot get count of xstats\n");
 		return;
 	}
+	ids = malloc(sizeof(uint32_t) * nb_ids);
+	for (i = 0; i < nb_ids; i++)
+		ids[i] = i;
+	rte_rawdev_xstats_reset(dev_id, ids, nb_ids);
+	printf("\n  statistics for NTB port %d cleared\n", dev_id);
+	free(ids);
+	/* Clear ethdev stats, if any */
+	if (fwd_mode == IOFWD && eth_port_id != RTE_MAX_ETHPORTS) {
+		rte_eth_stats_reset(eth_port_id);
+		printf("\n  statistics for ETH port %d cleared\n", eth_port_id);
+	}
+}
+
+static inline void
+ntb_calculate_throughput(uint16_t port) {
+	uint64_t diff_pkts_rx, diff_pkts_tx, diff_cycles;
+	uint64_t pps_rx, pps_tx;
+	static uint64_t prev_pkts_rx[2];
+	static uint64_t prev_pkts_tx[2];
+	static uint64_t prev_cycles[2];
+
+	diff_cycles = prev_cycles[port];
+	prev_cycles[port] = rte_rdtsc();
+	if (diff_cycles > 0)
+		diff_cycles = prev_cycles[port] - diff_cycles;
+	diff_pkts_rx = (ntb_port_stats[port].rx > prev_pkts_rx[port]) ?
+		(ntb_port_stats[port].rx - prev_pkts_rx[port]) : 0;
+	diff_pkts_tx = (ntb_port_stats[port].tx > prev_pkts_tx[port]) ?
+		(ntb_port_stats[port].tx - prev_pkts_tx[port]) : 0;
+	prev_pkts_rx[port] = ntb_port_stats[port].rx;
+	prev_pkts_tx[port] = ntb_port_stats[port].tx;
+	pps_rx = diff_cycles > 0 ?
+		diff_pkts_rx * rte_get_tsc_hz() / diff_cycles : 0;
+	pps_tx = diff_cycles > 0 ?
+		diff_pkts_tx * rte_get_tsc_hz() / diff_cycles : 0;
+	printf("  Throughput (since last show)\n");
+	printf("  Rx-pps: %12"PRIu64"\n  Tx-pps: %12"PRIu64"\n",
+			pps_rx, pps_tx);
+
+}
+
+static void
+ntb_stats_display(void)
+{
+	struct rte_rawdev_xstats_name *xstats_names;
+	struct rte_eth_stats stats;
+	uint64_t *values;
+	uint32_t *ids;
+	int nb_ids, i;
 
-	rte_rawdev_get_attr(dev_id, "spad_user_0", &val);
-	size = val << 32;
-	rte_rawdev_get_attr(dev_id, "spad_user_1", &val);
-	size |= val;
+	printf("###### statistics for NTB port %d #######\n", dev_id);
 
-	buff = (uint8_t *)malloc(size);
-	pkts_recv[0] = (struct rte_rawdev_buf *)malloc
-			(sizeof(struct rte_rawdev_buf));
-	pkts_recv[0]->buf_addr = buff;
+	/* Get NTB dev stats and stats names */
+	nb_ids = rte_rawdev_xstats_names_get(dev_id, NULL, 0);
+	if (nb_ids < 0) {
+		printf("Error: Cannot get count of xstats\n");
+		return;
+	}
+	xstats_names = malloc(sizeof(struct rte_rawdev_xstats_name) * nb_ids);
+	if (xstats_names == NULL) {
+		printf("Cannot allocate memory for xstats lookup\n");
+		return;
+	}
+	if (nb_ids != rte_rawdev_xstats_names_get(
+			dev_id, xstats_names, nb_ids)) {
+		printf("Error: Cannot get xstats lookup\n");
+		free(xstats_names);
+		return;
+	}
+	ids = malloc(sizeof(uint32_t) * nb_ids);
+	for (i = 0; i < nb_ids; i++)
+		ids[i] = i;
+	values = malloc(sizeof(uint64_t) * nb_ids);
+	if (nb_ids != rte_rawdev_xstats_get(dev_id, ids, values, nb_ids)) {
+		printf("Error: Unable to get xstats\n");
+		free(xstats_names);
+		free(values);
+		free(ids);
+		return;
+	}
+
+	/* Display NTB dev stats */
+	for (i = 0; i < nb_ids; i++)
+		printf("  %s: %"PRIu64"\n", xstats_names[i].name, values[i]);
+	ntb_calculate_throughput(0);
 
-	if (rte_rawdev_dequeue_buffers(dev_id, pkts_recv, 1, (void *)size)) {
-		printf("Fail to dequeue.\n");
-		goto clean;
+	/* Get ethdev stats, if any */
+	if (fwd_mode == IOFWD && eth_port_id != RTE_MAX_ETHPORTS) {
+		printf("###### statistics for ETH port %d ######\n",
+			eth_port_id);
+		rte_eth_stats_get(eth_port_id, &stats);
+		printf("  RX-packets: %"PRIu64"\n", stats.ipackets);
+		printf("  RX-bytes: %"PRIu64"\n", stats.ibytes);
+		printf("  RX-errors: %"PRIu64"\n", stats.ierrors);
+		printf("  RX-missed: %"PRIu64"\n", stats.imissed);
+		printf("  TX-packets: %"PRIu64"\n", stats.opackets);
+		printf("  TX-bytes: %"PRIu64"\n", stats.obytes);
+		printf("  TX-errors: %"PRIu64"\n", stats.oerrors);
+		ntb_calculate_throughput(1);
 	}
 
-	fwrite(buff, size, 1, file);
-	printf("Done receiving to file.\n");
+	free(xstats_names);
+	free(values);
+	free(ids);
+}
 
-clean:
-	fclose(file);
-	free(buff);
-	free(pkts_recv[0]);
+/* *** SHOW/CLEAR PORT STATS *** */
+struct cmd_stats_result {
+	cmdline_fixed_string_t show;
+	cmdline_fixed_string_t port;
+	cmdline_fixed_string_t stats;
+};
+
+static void
+cmd_stats_parsed(void *parsed_result,
+		 __attribute__((unused)) struct cmdline *cl,
+		 __attribute__((unused)) void *data)
+{
+	struct cmd_stats_result *res = parsed_result;
+	if (!strcmp(res->show, "clear"))
+		ntb_stats_clear();
+	else
+		ntb_stats_display();
 }
 
-cmdline_parse_token_string_t cmd_recv_file_recv =
-	TOKEN_STRING_INITIALIZER(struct cmd_recvfile_result, recv_string,
-				 "recv");
-cmdline_parse_token_string_t cmd_recv_file_filepath =
-	TOKEN_STRING_INITIALIZER(struct cmd_recvfile_result, filepath, NULL);
+cmdline_parse_token_string_t cmd_stats_show =
+	TOKEN_STRING_INITIALIZER(struct cmd_stats_result, show, "show#clear");
+cmdline_parse_token_string_t cmd_stats_port =
+	TOKEN_STRING_INITIALIZER(struct cmd_stats_result, port, "port");
+cmdline_parse_token_string_t cmd_stats_stats =
+	TOKEN_STRING_INITIALIZER(struct cmd_stats_result, stats, "stats");
 
 
-cmdline_parse_inst_t cmd_recv_file = {
-	.f = cmd_recvfile_parsed,
+cmdline_parse_inst_t cmd_stats = {
+	.f = cmd_stats_parsed,
 	.data = NULL,
-	.help_str = "recv <file_path>",
+	.help_str = "show|clear port stats",
 	.tokens = {
-		(void *)&cmd_recv_file_recv,
-		(void *)&cmd_recv_file_filepath,
+		(void *)&cmd_stats_show,
+		(void *)&cmd_stats_port,
+		(void *)&cmd_stats_stats,
+		NULL,
+	},
+};
+
+/* *** SET FORWARDING MODE *** */
+struct cmd_set_fwd_mode_result {
+	cmdline_fixed_string_t set;
+	cmdline_fixed_string_t fwd;
+	cmdline_fixed_string_t mode;
+};
+
+static void
+cmd_set_fwd_mode_parsed(void *parsed_result,
+			__attribute__((unused)) struct cmdline *cl,
+			__attribute__((unused)) void *data)
+{
+	struct cmd_set_fwd_mode_result *res = parsed_result;
+	int i;
+
+	if (in_test) {
+		printf("Please stop traffic first.\n");
+		return;
+	}
+
+	for (i = 0; i < MAX_FWD_MODE; i++) {
+		if (!strcmp(res->mode, fwd_mode_s[i])) {
+			fwd_mode = i;
+			return;
+		}
+	}
+	printf("Invalid %s packet forwarding mode.\n", res->mode);
+}
+
+cmdline_parse_token_string_t cmd_setfwd_set =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_fwd_mode_result, set, "set");
+cmdline_parse_token_string_t cmd_setfwd_fwd =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_fwd_mode_result, fwd, "fwd");
+cmdline_parse_token_string_t cmd_setfwd_mode =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_fwd_mode_result, mode,
+				"file-trans#iofwd#txonly#rxonly");
+
+cmdline_parse_inst_t cmd_set_fwd_mode = {
+	.f = cmd_set_fwd_mode_parsed,
+	.data = NULL,
+	.help_str = "set forwarding mode as file-trans|rxonly|txonly|iofwd",
+	.tokens = {
+		(void *)&cmd_setfwd_set,
+		(void *)&cmd_setfwd_fwd,
+		(void *)&cmd_setfwd_mode,
 		NULL,
 	},
 };
@@ -276,7 +1009,10 @@ cmdline_parse_inst_t cmd_recv_file = {
 cmdline_parse_ctx_t main_ctx[] = {
 	(cmdline_parse_inst_t *)&cmd_help,
 	(cmdline_parse_inst_t *)&cmd_send_file,
-	(cmdline_parse_inst_t *)&cmd_recv_file,
+	(cmdline_parse_inst_t *)&cmd_start,
+	(cmdline_parse_inst_t *)&cmd_stop,
+	(cmdline_parse_inst_t *)&cmd_stats,
+	(cmdline_parse_inst_t *)&cmd_set_fwd_mode,
 	(cmdline_parse_inst_t *)&cmd_quit,
 	NULL,
 };
@@ -305,45 +1041,257 @@ signal_handler(int signum)
 	}
 }
 
+#define OPT_BUF_SIZE         "buf-size"
+#define OPT_FWD_MODE         "fwd-mode"
+#define OPT_NB_DESC          "nb-desc"
+#define OPT_TXFREET          "txfreet"
+#define OPT_BURST            "burst"
+#define OPT_QP               "qp"
+
+enum {
+	/* long options mapped to a short option */
+	OPT_NO_ZERO_COPY_NUM = 1,
+	OPT_BUF_SIZE_NUM,
+	OPT_FWD_MODE_NUM,
+	OPT_NB_DESC_NUM,
+	OPT_TXFREET_NUM,
+	OPT_BURST_NUM,
+	OPT_QP_NUM,
+};
+
+static const char short_options[] =
+	"i" /* interactive mode */
+	;
+
+static const struct option lgopts[] = {
+	{OPT_BUF_SIZE,     1, NULL, OPT_BUF_SIZE_NUM     },
+	{OPT_FWD_MODE,     1, NULL, OPT_FWD_MODE_NUM     },
+	{OPT_NB_DESC,      1, NULL, OPT_NB_DESC_NUM      },
+	{OPT_TXFREET,      1, NULL, OPT_TXFREET_NUM      },
+	{OPT_BURST,        1, NULL, OPT_BURST_NUM        },
+	{OPT_QP,           1, NULL, OPT_QP_NUM           },
+	{0,                0, NULL, 0                    }
+};
+
 static void
 ntb_usage(const char *prgname)
 {
 	printf("%s [EAL options] -- [options]\n"
-	       "-i : run in interactive mode (default value is 1)\n",
-	       prgname);
+	       "-i: run in interactive mode.\n"
+	       "--qp=N: set number of queues as N (N > 0, default: 1).\n"
+	       "--fwd-mode=N: set fwd mode (N: file-trans | rxonly | "
+	       "txonly | iofwd, default: file-trans).\n"
+	       "--buf-size=N: set mbuf dataroom size as N (N > mbuf headroom,"
+	       " N <= 65535, default: 2048).\n"
+	       "--nb-desc=N: set number of descriptors as N (%u <= N <= %u,"
+	       " default: 1024).\n"
+	       "--txfreet=N: set tx free thresh for NTB driver as N (N >= 0).\n"
+	       "--burst=N: set pkt burst as N (0 < N <= %u, default: 32).\n",
+	       prgname, NTB_MIN_DESC_SIZE, NTB_MAX_DESC_SIZE,
+	       NTB_MAX_PKT_BURST);
 }
 
-static int
-parse_args(int argc, char **argv)
+static void
+ntb_parse_args(int argc, char **argv)
 {
 	char *prgname = argv[0], **argvopt = argv;
-	int opt, ret;
+	int opt, opt_idx, n, i;
 
-	/* Only support interactive mode to send/recv file first. */
-	while ((opt = getopt(argc, argvopt, "i")) != EOF) {
+	while ((opt = getopt_long(argc, argvopt, short_options,
+				lgopts, &opt_idx)) != EOF) {
 		switch (opt) {
 		case 'i':
-			printf("Interactive-mode selected\n");
+			printf("Interactive-mode selected.\n");
 			interactive = 1;
 			break;
+		case OPT_QP_NUM:
+			n = atoi(optarg);
+			if (n > 0)
+				num_queues = n;
+			else
+				rte_exit(EXIT_FAILURE, "qp must be > 0.\n");
+			break;
+		case OPT_BUF_SIZE_NUM:
+			n = atoi(optarg);
+			if (n > RTE_PKTMBUF_HEADROOM && n <= 0xFFFF)
+				ntb_buf_size = n;
+			else
+				rte_exit(EXIT_FAILURE, "buf-size must be > "
+					"%u and < 65536.\n",
+					RTE_PKTMBUF_HEADROOM);
+			break;
+		case OPT_FWD_MODE_NUM:
+			for (i = 0; i < MAX_FWD_MODE; i++) {
+				if (!strcmp(optarg, fwd_mode_s[i])) {
+					fwd_mode = i;
+					break;
+				}
+			}
+			if (i == MAX_FWD_MODE)
+				rte_exit(EXIT_FAILURE, "Unsupported mode. "
+				"(Should be: file-trans | rxonly | txonly "
+				"| iofwd)\n");
+			break;
+		case OPT_NB_DESC_NUM:
+			n = atoi(optarg);
+			if (n >= NTB_MIN_DESC_SIZE && n <= NTB_MAX_DESC_SIZE)
+				nb_desc = n;
+			else
+				rte_exit(EXIT_FAILURE, "nb-desc must be within"
+					" [%u, %u].\n", NTB_MIN_DESC_SIZE,
+					NTB_MAX_DESC_SIZE);
+			break;
+		case OPT_TXFREET_NUM:
+			n = atoi(optarg);
+			if (n >= 0)
+				tx_free_thresh = n;
+			else
+				rte_exit(EXIT_FAILURE, "txfreet must be"
+					" >= 0\n");
+			break;
+		case OPT_BURST_NUM:
+			n = atoi(optarg);
+			if (n > 0 && n <= NTB_MAX_PKT_BURST)
+				pkt_burst = n;
+			else
+				rte_exit(EXIT_FAILURE, "burst must be within "
+					"(0, %u].\n", NTB_MAX_PKT_BURST);
+			break;
 
 		default:
 			ntb_usage(prgname);
-			return -1;
+			rte_exit(EXIT_FAILURE,
+				 "Command line is incomplete or incorrect.\n");
+			break;
 		}
 	}
+}
 
-	if (optind >= 0)
-		argv[optind-1] = prgname;
+static void
+ntb_mempool_mz_free(__rte_unused struct rte_mempool_memhdr *memhdr,
+		void *opaque)
+{
+	const struct rte_memzone *mz = opaque;
+	rte_memzone_free(mz);
+}
 
-	ret = optind-1;
-	optind = 1; /* reset getopt lib */
-	return ret;
+static struct rte_mempool *
+ntb_mbuf_pool_create(uint16_t mbuf_seg_size, uint32_t nb_mbuf,
+		     struct ntb_dev_info ntb_info,
+		     struct ntb_dev_config *ntb_conf,
+		     unsigned int socket_id)
+{
+	size_t mz_len, total_elt_sz, max_mz_len, left_sz;
+	struct rte_pktmbuf_pool_private mbp_priv;
+	char pool_name[RTE_MEMPOOL_NAMESIZE];
+	char mz_name[RTE_MEMZONE_NAMESIZE];
+	const struct rte_memzone *mz;
+	struct rte_mempool *mp;
+	uint64_t align;
+	uint32_t mz_id;
+	int ret;
+
+	snprintf(pool_name, sizeof(pool_name), "ntb_mbuf_pool_%u", socket_id);
+	mp = rte_mempool_create_empty(pool_name, nb_mbuf,
+				      (mbuf_seg_size + sizeof(struct rte_mbuf)),
+				      MEMPOOL_CACHE_SIZE,
+				      sizeof(struct rte_pktmbuf_pool_private),
+				      socket_id, 0);
+	if (mp == NULL)
+		return NULL;
+
+	mbp_priv.mbuf_data_room_size = mbuf_seg_size;
+	mbp_priv.mbuf_priv_size = 0;
+	rte_pktmbuf_pool_init(mp, &mbp_priv);
+
+	ntb_conf->mz_list = rte_zmalloc("ntb_memzone_list",
+				sizeof(struct rte_memzone *) *
+				ntb_info.mw_cnt, 0);
+	if (ntb_conf->mz_list == NULL)
+		goto fail;
+
+	/* Put ntb header on mw0. */
+	if (ntb_info.mw_size[0] < ntb_info.ntb_hdr_size) {
+		printf("mw0 (size: %" PRIu64 ") is not enough for ntb hdr"
+		       " (size: %u)\n", ntb_info.mw_size[0],
+		       ntb_info.ntb_hdr_size);
+		goto fail;
+	}
+
+	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
+	left_sz = total_elt_sz * nb_mbuf;
+	for (mz_id = 0; mz_id < ntb_info.mw_cnt; mz_id++) {
+		/* If populated mbuf is enough, no need to reserve extra mz. */
+		if (!left_sz)
+			break;
+		snprintf(mz_name, sizeof(mz_name), "ntb_mw_%d", mz_id);
+		align = ntb_info.mw_size_align ? ntb_info.mw_size[mz_id] :
+			RTE_CACHE_LINE_SIZE;
+		/* Reserve ntb header space on memzone 0. */
+		max_mz_len = mz_id ? ntb_info.mw_size[mz_id] :
+			     ntb_info.mw_size[mz_id] - ntb_info.ntb_hdr_size;
+		mz_len = left_sz <= max_mz_len ? left_sz :
+			(max_mz_len / total_elt_sz * total_elt_sz);
+		if (!mz_len)
+			continue;
+		mz = rte_memzone_reserve_aligned(mz_name, mz_len, socket_id,
+					RTE_MEMZONE_IOVA_CONTIG, align);
+		if (mz == NULL) {
+			printf("Cannot allocate %" PRIu64 " aligned memzone"
+				" %u\n", align, mz_id);
+			goto fail;
+		}
+		left_sz -= mz_len;
+
+		/* Reserve ntb header space on memzone 0. */
+		if (mz_id)
+			ret = rte_mempool_populate_iova(mp, mz->addr, mz->iova,
+					mz->len, ntb_mempool_mz_free,
+					(void *)(uintptr_t)mz);
+		else
+			ret = rte_mempool_populate_iova(mp,
+					(void *)((size_t)mz->addr +
+					ntb_info.ntb_hdr_size),
+					mz->iova + ntb_info.ntb_hdr_size,
+					mz->len - ntb_info.ntb_hdr_size,
+					ntb_mempool_mz_free,
+					(void *)(uintptr_t)mz);
+		if (ret < 0) {
+			rte_memzone_free(mz);
+			rte_mempool_free(mp);
+			return NULL;
+		}
+
+		ntb_conf->mz_list[mz_id] = mz;
+	}
+	if (left_sz) {
+		printf("mw space is not enough for mempool.\n");
+		goto fail;
+	}
+
+	ntb_conf->mz_num = mz_id;
+	rte_mempool_obj_iter(mp, rte_pktmbuf_init, NULL);
+
+	return mp;
+fail:
+	rte_mempool_free(mp);
+	return NULL;
 }
 
 int
 main(int argc, char **argv)
 {
+	struct rte_eth_conf eth_pconf = eth_port_conf;
+	struct rte_rawdev_info ntb_rawdev_conf;
+	struct rte_rawdev_info ntb_rawdev_info;
+	struct rte_eth_dev_info ethdev_info;
+	struct rte_eth_rxconf eth_rx_conf;
+	struct rte_eth_txconf eth_tx_conf;
+	struct ntb_queue_conf ntb_q_conf;
+	struct ntb_dev_config ntb_conf;
+	struct ntb_dev_info ntb_info;
+	uint64_t ntb_link_status;
+	uint32_t nb_mbuf;
 	int ret, i;
 
 	signal(SIGINT, signal_handler);
@@ -353,6 +1301,9 @@ main(int argc, char **argv)
 	if (ret < 0)
 		rte_exit(EXIT_FAILURE, "Error with EAL initialization.\n");
 
+	if (rte_lcore_count() < 2)
+		rte_exit(EXIT_FAILURE, "Need at least 2 cores\n");
+
 	/* Find 1st ntb rawdev. */
 	for (i = 0; i < RTE_RAWDEV_MAX_DEVS; i++)
 		if (rte_rawdevs[i].driver_name &&
@@ -368,15 +1319,118 @@ main(int argc, char **argv)
 	argc -= ret;
 	argv += ret;
 
-	ret = parse_args(argc, argv);
+	ntb_parse_args(argc, argv);
+
+	rte_rawdev_set_attr(dev_id, NTB_QUEUE_SZ_NAME, nb_desc);
+	printf("Set queue size as %u.\n", nb_desc);
+	rte_rawdev_set_attr(dev_id, NTB_QUEUE_NUM_NAME, num_queues);
+	printf("Set queue number as %u.\n", num_queues);
+	ntb_rawdev_info.dev_private = (rte_rawdev_obj_t)(&ntb_info);
+	rte_rawdev_info_get(dev_id, &ntb_rawdev_info);
+
+	nb_mbuf = nb_desc * num_queues * 2 * 2 + rte_lcore_count() *
+		  MEMPOOL_CACHE_SIZE;
+	mbuf_pool = ntb_mbuf_pool_create(ntb_buf_size, nb_mbuf, ntb_info,
+					 &ntb_conf, rte_socket_id());
+	if (mbuf_pool == NULL)
+		rte_exit(EXIT_FAILURE, "Cannot create mbuf pool.\n");
+
+	ntb_conf.num_queues = num_queues;
+	ntb_conf.queue_size = nb_desc;
+	ntb_rawdev_conf.dev_private = (rte_rawdev_obj_t)(&ntb_conf);
+	ret = rte_rawdev_configure(dev_id, &ntb_rawdev_conf);
+	if (ret)
+		rte_exit(EXIT_FAILURE, "Can't config ntb dev: err=%d, "
+			"port=%u\n", ret, dev_id);
+
+	ntb_q_conf.tx_free_thresh = tx_free_thresh;
+	ntb_q_conf.nb_desc = nb_desc;
+	ntb_q_conf.rx_mp = mbuf_pool;
+	for (i = 0; i < num_queues; i++) {
+		/* Setup rawdev queue */
+		ret = rte_rawdev_queue_setup(dev_id, i, &ntb_q_conf);
+		if (ret < 0)
+			rte_exit(EXIT_FAILURE,
+				"Failed to setup ntb queue %u.\n", i);
+	}
+
+	/* Wait at most 100s for the peer dev to come up. */
+	printf("Checking ntb link status...\n");
+	for (i = 0; i < 1000; i++) {
+		rte_rawdev_get_attr(dev_id, NTB_LINK_STATUS_NAME,
+				    &ntb_link_status);
+		if (ntb_link_status) {
+			printf("Peer dev ready, ntb link up.\n");
+			break;
+		}
+		rte_delay_ms(100);
+	}
+	rte_rawdev_get_attr(dev_id, NTB_LINK_STATUS_NAME, &ntb_link_status);
+	if (ntb_link_status == 0)
+		printf("Timed out after 100s. Link is not up. Please restart app.\n");
+
+	ret = rte_rawdev_start(dev_id);
 	if (ret < 0)
-		rte_exit(EXIT_FAILURE, "Invalid arguments\n");
+		rte_exit(EXIT_FAILURE, "rte_rawdev_start: err=%d, port=%u\n",
+			ret, dev_id);
+
+	/* Find 1st ethdev */
+	eth_port_id = rte_eth_find_next(0);
 
-	rte_rawdev_start(dev_id);
+	if (eth_port_id < RTE_MAX_ETHPORTS) {
+		rte_eth_dev_info_get(eth_port_id, &ethdev_info);
+		eth_pconf.rx_adv_conf.rss_conf.rss_hf &=
+				ethdev_info.flow_type_rss_offloads;
+		ret = rte_eth_dev_configure(eth_port_id, num_queues,
+					    num_queues, &eth_pconf);
+		if (ret)
+			rte_exit(EXIT_FAILURE, "Can't config ethdev: err=%d, "
+				"port=%u\n", ret, eth_port_id);
+		eth_rx_conf = ethdev_info.default_rxconf;
+		eth_rx_conf.offloads = eth_pconf.rxmode.offloads;
+		eth_tx_conf = ethdev_info.default_txconf;
+		eth_tx_conf.offloads = eth_pconf.txmode.offloads;
+
+		/* Setup ethdev queue if ethdev exists */
+		for (i = 0; i < num_queues; i++) {
+			ret = rte_eth_rx_queue_setup(eth_port_id, i, nb_desc,
+					rte_eth_dev_socket_id(eth_port_id),
+					&eth_rx_conf, mbuf_pool);
+			if (ret < 0)
+				rte_exit(EXIT_FAILURE,
+					"Failed to setup eth rxq %u.\n", i);
+			ret = rte_eth_tx_queue_setup(eth_port_id, i, nb_desc,
+					rte_eth_dev_socket_id(eth_port_id),
+					&eth_tx_conf);
+			if (ret < 0)
+				rte_exit(EXIT_FAILURE,
+					"Failed to setup eth txq %u.\n", i);
+		}
+
+		ret = rte_eth_dev_start(eth_port_id);
+		if (ret < 0)
+			rte_exit(EXIT_FAILURE, "rte_eth_dev_start: err=%d, "
+				"port=%u\n", ret, eth_port_id);
+	}
+
+	/* initialize port stats */
+	memset(&ntb_port_stats, 0, sizeof(ntb_port_stats));
+
+	/* Set default fwd mode if user doesn't set it. */
+	if (fwd_mode == MAX_FWD_MODE && eth_port_id < RTE_MAX_ETHPORTS) {
+		printf("Set default fwd mode as iofwd.\n");
+		fwd_mode = IOFWD;
+	}
+	if (fwd_mode == MAX_FWD_MODE) {
+		printf("Set default fwd mode as file-trans.\n");
+		fwd_mode = FILE_TRANS;
+	}
 
 	if (interactive) {
 		sleep(1);
 		prompt();
+	} else {
+		start_pkt_fwd();
 	}
 
 	return 0;
-- 
2.17.1
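
For readers following ntb_calculate_throughput() in the patch above: the
rate it prints is a packet delta scaled by the TSC frequency and divided
by the cycle delta since the previous "show". The sketch below isolates
that arithmetic as plain C. The helper name pps_from_cycles() and its
parameters are hypothetical (not part of the patch), and tsc_hz stands in
for the value the patch reads with rte_get_tsc_hz().

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: packets per second from
 * a packet-count delta and a TSC-cycle delta.
 * Returns 0 when no cycles have elapsed (e.g. the first "show").
 */
static uint64_t
pps_from_cycles(uint64_t diff_pkts, uint64_t diff_cycles, uint64_t tsc_hz)
{
	if (diff_cycles == 0)
		return 0;
	/* pkts / (cycles / hz) == pkts * hz / cycles */
	return diff_pkts * tsc_hz / diff_cycles;
}
```

Note that diff_pkts * tsc_hz is computed in 64 bits, so a very large
packet delta combined with a multi-GHz TSC could in principle overflow;
this sketch keeps the same ordering as the patch and does not guard
against that.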

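A second note on ntb_mbuf_pool_create(): for each memory window it
reserves either everything that is still needed, or the largest whole
number of mempool elements that fits in the window. The sketch below
shows only that size computation. The helper name chunk_len() is
hypothetical; its parameters mirror left_sz, max_mz_len and total_elt_sz
from the patch, and total_elt_sz is assumed to be non-zero.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of the patch: bytes to reserve from
 * one memory window.
 *   left_sz      - bytes of mempool elements still to be placed
 *   max_mz_len   - usable bytes in this window
 *   total_elt_sz - size of one element (header + object + trailer),
 *                  assumed > 0
 * A result of 0 means the window cannot hold even one element.
 */
static size_t
chunk_len(size_t left_sz, size_t max_mz_len, size_t total_elt_sz)
{
	if (left_sz <= max_mz_len)
		return left_sz;
	/* Round down to a whole number of elements. */
	return max_mz_len / total_elt_sz * total_elt_sz;
}
```

With 100-byte elements and a 4096-byte window, a 10000-byte request is
trimmed to 4000 bytes, and the remaining 6000 bytes carry over to the
next window.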


* Re: [dpdk-dev] [PATCH v6 0/4] enable FIFO for NTB
  2019-09-26  3:20         ` [dpdk-dev] [PATCH v6 0/4] enable FIFO " Xiaoyun Li
                             ` (3 preceding siblings ...)
  2019-09-26  3:20           ` [dpdk-dev] [PATCH v6 4/4] examples/ntb: support more functions for NTB Xiaoyun Li
@ 2019-09-26  4:04           ` " Wu, Jingjing
  2019-10-21 13:43             ` Thomas Monjalon
  4 siblings, 1 reply; 42+ messages in thread
From: Wu, Jingjing @ 2019-09-26  4:04 UTC (permalink / raw)
  To: Li, Xiaoyun, Wiles, Keith, Maslekar, Omkar, Liang, Cunming; +Cc: dev



> -----Original Message-----
> From: Li, Xiaoyun
> Sent: Thursday, September 26, 2019 11:20 AM
> To: Wu, Jingjing <jingjing.wu@intel.com>; Wiles, Keith <keith.wiles@intel.com>; Maslekar,
> Omkar <omkar.maslekar@intel.com>; Liang, Cunming <cunming.liang@intel.com>
> Cc: dev@dpdk.org; Li, Xiaoyun <xiaoyun.li@intel.com>
> Subject: [PATCH v6 0/4] enable FIFO for NTB
> 
> Enable FIFO for NTB rawdev driver to support packet based
> processing. And an example is provided to support txonly,
> rxonly, iofwd between NTB device and ethdev, and file
> transmission.
> 
> Acked-by: Omkar Maslekar <omkar.maslekar@intel.com>
> 
Series Acked-by: Jingjing Wu <jingjing.wu@intel.com> 


Thanks
Jingjing


* Re: [dpdk-dev] [PATCH v6 0/4] enable FIFO for NTB
  2019-09-26  4:04           ` [dpdk-dev] [PATCH v6 0/4] enable FIFO " Wu, Jingjing
@ 2019-10-21 13:43             ` Thomas Monjalon
  2019-10-21 15:54               ` David Marchand
  0 siblings, 1 reply; 42+ messages in thread
From: Thomas Monjalon @ 2019-10-21 13:43 UTC (permalink / raw)
  To: Li, Xiaoyun
  Cc: dev, Wu, Jingjing, Wiles, Keith, Maslekar, Omkar, Liang, Cunming

26/09/2019 06:04, Wu, Jingjing:
> From: Li, Xiaoyun
> > Enable FIFO for NTB rawdev driver to support packet based
> > processing. And an example is provided to support txonly,
> > rxonly, iofwd between NTB device and ethdev, and file
> > transmission.
> > 
> > Acked-by: Omkar Maslekar <omkar.maslekar@intel.com>
> > 
> Series Acked-by: Jingjing Wu <jingjing.wu@intel.com> 

Applied, thanks




* Re: [dpdk-dev] [PATCH v6 0/4] enable FIFO for NTB
  2019-10-21 13:43             ` Thomas Monjalon
@ 2019-10-21 15:54               ` David Marchand
  2019-10-22  1:12                 ` Li, Xiaoyun
  0 siblings, 1 reply; 42+ messages in thread
From: David Marchand @ 2019-10-21 15:54 UTC (permalink / raw)
  To: Li, Xiaoyun
  Cc: dev, Wu, Jingjing, Wiles, Keith, Maslekar, Omkar, Liang, Cunming,
	Thomas Monjalon

On Mon, Oct 21, 2019 at 3:44 PM Thomas Monjalon <thomas@monjalon.net> wrote:
>
> 26/09/2019 06:04, Wu, Jingjing:
> > From: Li, Xiaoyun
> > > Enable FIFO for NTB rawdev driver to support packet based
> > > processing. And an example is provided to support txonly,
> > > rxonly, iofwd between NTB device and ethdev, and file
> > > transmission.
> > >
> > > Acked-by: Omkar Maslekar <omkar.maslekar@intel.com>
> > >
> > Series Acked-by: Jingjing Wu <jingjing.wu@intel.com>
>
> Applied, thanks
>

I pushed a fix on top of the last patch to fix compilation on master.
Please check the other variables:
https://git.dpdk.org/dpdk/commit/?id=fe56fe635b1d0933344d60e95d9359034dc52f75


Thanks.

-- 
David Marchand



* Re: [dpdk-dev] [PATCH v6 0/4] enable FIFO for NTB
  2019-10-21 15:54               ` David Marchand
@ 2019-10-22  1:12                 ` Li, Xiaoyun
  0 siblings, 0 replies; 42+ messages in thread
From: Li, Xiaoyun @ 2019-10-22  1:12 UTC (permalink / raw)
  To: David Marchand
  Cc: dev, Wu, Jingjing, Wiles, Keith, Maslekar, Omkar, Liang, Cunming,
	Thomas Monjalon

Thanks!

> -----Original Message-----
> From: David Marchand [mailto:david.marchand@redhat.com]
> Sent: Monday, October 21, 2019 23:55
> To: Li, Xiaoyun <xiaoyun.li@intel.com>
> Cc: dev <dev@dpdk.org>; Wu, Jingjing <jingjing.wu@intel.com>; Wiles, Keith
> <keith.wiles@intel.com>; Maslekar, Omkar <omkar.maslekar@intel.com>; Liang,
> Cunming <cunming.liang@intel.com>; Thomas Monjalon
> <thomas@monjalon.net>
> Subject: Re: [dpdk-dev] [PATCH v6 0/4] enable FIFO for NTB
> 
> On Mon, Oct 21, 2019 at 3:44 PM Thomas Monjalon <thomas@monjalon.net>
> wrote:
> >
> > 26/09/2019 06:04, Wu, Jingjing:
> > > From: Li, Xiaoyun
> > > > Enable FIFO for NTB rawdev driver to support packet based
> > > > processing. And an example is provided to support txonly, rxonly,
> > > > iofwd between NTB device and ethdev, and file transmission.
> > > >
> > > > Acked-by: Omkar Maslekar <omkar.maslekar@intel.com>
> > > >
> > > Series Acked-by: Jingjing Wu <jingjing.wu@intel.com>
> >
> > Applied, thanks
> >
> 
> Pushed a fix on the last patch to fix master compilation, please check other
> variables:
> https://git.dpdk.org/dpdk/commit/?id=fe56fe635b1d0933344d60e95d9359034dc52f75
> 
> 
> Thanks.
> 
> --
> David Marchand



Thread overview: 42+ messages
2019-09-05  5:39 [dpdk-dev] [PATCH 0/4] enable FIFO for NTB Xiaoyun Li
2019-09-05  5:39 ` [dpdk-dev] [PATCH 1/4] raw/ntb: setup ntb queue Xiaoyun Li
2019-09-05  5:39 ` [dpdk-dev] [PATCH 2/4] raw/ntb: add xstats support Xiaoyun Li
2019-09-05  5:39 ` [dpdk-dev] [PATCH 3/4] raw/ntb: add enqueue and dequeue functions Xiaoyun Li
2019-09-05  5:39 ` [dpdk-dev] [PATCH 4/4] examples/ntb: support more functions for NTB Xiaoyun Li
2019-09-05 18:34 ` [dpdk-dev] [PATCH 0/4] enable FIFO " Maslekar, Omkar
2019-09-06  3:02 ` [dpdk-dev] [PATCH v2 " Xiaoyun Li
2019-09-06  3:02   ` [dpdk-dev] [PATCH v2 1/4] raw/ntb: setup ntb queue Xiaoyun Li
2019-09-06  3:02   ` [dpdk-dev] [PATCH v2 2/4] raw/ntb: add xstats support Xiaoyun Li
2019-09-06  3:02   ` [dpdk-dev] [PATCH v2 3/4] raw/ntb: add enqueue and dequeue functions Xiaoyun Li
2019-09-06  3:02   ` [dpdk-dev] [PATCH v2 4/4] examples/ntb: support more functions for NTB Xiaoyun Li
2019-09-06  7:53   ` [dpdk-dev] [PATCH v3 0/4] enable FIFO " Xiaoyun Li
2019-09-06  7:53     ` [dpdk-dev] [PATCH v3 1/4] raw/ntb: setup ntb queue Xiaoyun Li
2019-09-06  7:54     ` [dpdk-dev] [PATCH v3 2/4] raw/ntb: add xstats support Xiaoyun Li
2019-09-06  7:54     ` [dpdk-dev] [PATCH v3 3/4] raw/ntb: add enqueue and dequeue functions Xiaoyun Li
2019-09-06  7:54     ` [dpdk-dev] [PATCH v3 4/4] examples/ntb: support more functions for NTB Xiaoyun Li
2019-09-09  3:27     ` [dpdk-dev] [PATCH v4 0/4] enable FIFO " Xiaoyun Li
2019-09-09  3:27       ` [dpdk-dev] [PATCH v4 1/4] raw/ntb: setup ntb queue Xiaoyun Li
2019-09-23  2:50         ` Wu, Jingjing
2019-09-23  3:28           ` Li, Xiaoyun
2019-09-09  3:27       ` [dpdk-dev] [PATCH v4 2/4] raw/ntb: add xstats support Xiaoyun Li
2019-09-23  3:30         ` Wu, Jingjing
2019-09-09  3:27       ` [dpdk-dev] [PATCH v4 3/4] raw/ntb: add enqueue and dequeue functions Xiaoyun Li
2019-09-23  5:25         ` Wu, Jingjing
2019-09-09  3:27       ` [dpdk-dev] [PATCH v4 4/4] examples/ntb: support more functions for NTB Xiaoyun Li
2019-09-23  7:18         ` Wu, Jingjing
2019-09-23  7:26           ` Li, Xiaoyun
2019-09-24  8:24           ` Li, Xiaoyun
2019-09-24  8:43       ` [dpdk-dev] [PATCH v5 0/4] enable FIFO " Xiaoyun Li
2019-09-24  8:43         ` [dpdk-dev] [PATCH v5 1/4] raw/ntb: setup ntb queue Xiaoyun Li
2019-09-24  8:43         ` [dpdk-dev] [PATCH v5 2/4] raw/ntb: add xstats support Xiaoyun Li
2019-09-24  8:43         ` [dpdk-dev] [PATCH v5 3/4] raw/ntb: add enqueue and dequeue functions Xiaoyun Li
2019-09-24  8:43         ` [dpdk-dev] [PATCH v5 4/4] examples/ntb: support more functions for NTB Xiaoyun Li
2019-09-26  3:20         ` [dpdk-dev] [PATCH v6 0/4] enable FIFO " Xiaoyun Li
2019-09-26  3:20           ` [dpdk-dev] [PATCH v6 1/4] raw/ntb: setup ntb queue Xiaoyun Li
2019-09-26  3:20           ` [dpdk-dev] [PATCH v6 2/4] raw/ntb: add xstats support Xiaoyun Li
2019-09-26  3:20           ` [dpdk-dev] [PATCH v6 3/4] raw/ntb: add enqueue and dequeue functions Xiaoyun Li
2019-09-26  3:20           ` [dpdk-dev] [PATCH v6 4/4] examples/ntb: support more functions for NTB Xiaoyun Li
2019-09-26  4:04           ` [dpdk-dev] [PATCH v6 0/4] enable FIFO " Wu, Jingjing
2019-10-21 13:43             ` Thomas Monjalon
2019-10-21 15:54               ` David Marchand
2019-10-22  1:12                 ` Li, Xiaoyun
