DPDK patches and discussions
* [dpdk-dev] [PATCH v1 0/5] net/qede: fixes and enhancement
@ 2019-09-06  7:32 Shahed Shaikh
  2019-09-06  7:32 ` [dpdk-dev] [PATCH v1 1/5] net/qede: refactor Rx and Tx queue setup Shahed Shaikh
                   ` (4 more replies)
  0 siblings, 5 replies; 8+ messages in thread
From: Shahed Shaikh @ 2019-09-06  7:32 UTC (permalink / raw)
  To: dev; +Cc: rmody, jerinj, GR-Everest-DPDK-Dev

The first four patches fix the ovs-dpdk initialization issue seen with
100Gb NICs [1].
The fifth patch adds support for the drop action in rte_flow.

[1]
As per the HW design of 100Gb mode, the device internally uses two engines
(eng0 and eng1), and both engines need to be configured symmetrically.
Based on this requirement, the driver was designed to let the user
allocate only an even number of queues and to split those queues equally
across both engines.

This approach limits the number of queues that can be allocated -
i.e. the user can't configure an odd number of queues in 100Gb mode.
OVS configures the DPDK port with 1 rxq and 1 txq, which causes
initialization of the qede port to fail.

This patch series changes the queue allocation method for 100Gb devices,
removing the above limitation and allowing the user to configure an odd
number of queues.
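
To illustrate the new scheme, here is a simplified standalone sketch
(not the driver code; the queue counts and indices are examples only):

/* Illustrative only: on a 100Gb (CMT) adapter the PMD now doubles the
 * queue count internally and pairs hw queues across the two engines,
 * so an odd ethdev queue count (e.g. OVS's 1 rxq/1 txq) is accepted. */
#include <stdio.h>

int main(void)
{
	const int num_hwfns = 2;      /* eng0 + eng1 */
	const int nb_rx_queues = 1;   /* what OVS configures */

	for (int qid = 0; qid < nb_rx_queues; qid++)
		printf("ethdev rxq %d -> hw rxq %d (eng0) + hw rxq %d (eng1)\n",
		       qid, qid * 2, qid * 2 + 1);
	printf("hw rx queues created: %d\n", nb_rx_queues * num_hwfns);
	return 0;
}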

The fix is split into 4 logical patches -
 - The first patch refactors the Rx and Tx queue setup code to lay a
   foundation for the actual fix.
 - The second patch implements the new queue allocation approach that
   fixes the issue.
 - The third patch fixes the RSS configuration for the new approach.
 - The fourth patch fixes the statistics code impacted by the new approach.

Shahed Shaikh (5):
  net/qede: refactor Rx and Tx queue setup
  net/qede: fix ovs-dpdk failure when using odd number of queues on
    100Gb mode
  net/qede: fix RSS configuration as per new 100Gb queue allocation
    method
  net/qede: fix stats flow as per new 100Gb queue allocation method
  net/qede: implement rte_flow drop action

 doc/guides/nics/qede.rst              |  39 +++
 drivers/net/qede/base/ecore_dev_api.h |   1 +
 drivers/net/qede/base/ecore_l2.c      |  50 ++--
 drivers/net/qede/base/ecore_l2_api.h  |  39 ++-
 drivers/net/qede/qede_ethdev.c        | 331 +++++++++++------------
 drivers/net/qede/qede_ethdev.h        |   6 +-
 drivers/net/qede/qede_filter.c        |  27 +-
 drivers/net/qede/qede_rxtx.c          | 362 +++++++++++++++++++-------
 drivers/net/qede/qede_rxtx.h          |  26 +-
 9 files changed, 549 insertions(+), 332 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [dpdk-dev] [PATCH v1 1/5] net/qede: refactor Rx and Tx queue setup
  2019-09-06  7:32 [dpdk-dev] [PATCH v1 0/5] net/qede: fixes and enhancement Shahed Shaikh
@ 2019-09-06  7:32 ` Shahed Shaikh
  2019-09-12 12:34   ` Jerin Jacob
  2019-09-06  7:32 ` [dpdk-dev] [PATCH v1 2/5] net/qede: fix ovs-dpdk failure when using odd number of queues on 100Gb mode Shahed Shaikh
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 8+ messages in thread
From: Shahed Shaikh @ 2019-09-06  7:32 UTC (permalink / raw)
  To: dev; +Cc: rmody, jerinj, GR-Everest-DPDK-Dev, stable

This patch refactors the Rx and Tx queue setup flow as required to allow
an odd number of queues to be configured in the next patch.

This is the first patch of the series fixing an issue where qede port
initialization in ovs-dpdk fails with a 1 Rx/1 Tx queue configuration.
A detailed explanation is given in the next patch.

Fixes: 2af14ca79c0a ("net/qede: support 100G")
Cc: stable@dpdk.org

Signed-off-by: Shahed Shaikh <shshaikh@marvell.com>
---
 drivers/net/qede/qede_rxtx.c | 228 ++++++++++++++++++++++-------------
 1 file changed, 141 insertions(+), 87 deletions(-)

diff --git a/drivers/net/qede/qede_rxtx.c b/drivers/net/qede/qede_rxtx.c
index c38cbb905..cb8ac9bf6 100644
--- a/drivers/net/qede/qede_rxtx.c
+++ b/drivers/net/qede/qede_rxtx.c
@@ -124,36 +124,20 @@ qede_calc_rx_buf_size(struct rte_eth_dev *dev, uint16_t mbufsz,
 	return QEDE_FLOOR_TO_CACHE_LINE_SIZE(rx_buf_size);
 }
 
-int
-qede_rx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_idx,
-		    uint16_t nb_desc, unsigned int socket_id,
-		    __rte_unused const struct rte_eth_rxconf *rx_conf,
-		    struct rte_mempool *mp)
+static struct qede_rx_queue *
+qede_alloc_rx_queue_mem(struct rte_eth_dev *dev,
+			uint16_t queue_idx,
+			uint16_t nb_desc,
+			unsigned int socket_id,
+			struct rte_mempool *mp,
+			uint16_t bufsz)
 {
 	struct qede_dev *qdev = QEDE_INIT_QDEV(dev);
 	struct ecore_dev *edev = QEDE_INIT_EDEV(qdev);
-	struct rte_eth_rxmode *rxmode = &dev->data->dev_conf.rxmode;
 	struct qede_rx_queue *rxq;
-	uint16_t max_rx_pkt_len;
-	uint16_t bufsz;
 	size_t size;
 	int rc;
 
-	PMD_INIT_FUNC_TRACE(edev);
-
-	/* Note: Ring size/align is controlled by struct rte_eth_desc_lim */
-	if (!rte_is_power_of_2(nb_desc)) {
-		DP_ERR(edev, "Ring size %u is not power of 2\n",
-			  nb_desc);
-		return -EINVAL;
-	}
-
-	/* Free memory prior to re-allocation if needed... */
-	if (dev->data->rx_queues[queue_idx] != NULL) {
-		qede_rx_queue_release(dev->data->rx_queues[queue_idx]);
-		dev->data->rx_queues[queue_idx] = NULL;
-	}
-
 	/* First allocate the rx queue data structure */
 	rxq = rte_zmalloc_socket("qede_rx_queue", sizeof(struct qede_rx_queue),
 				 RTE_CACHE_LINE_SIZE, socket_id);
@@ -161,7 +145,7 @@ qede_rx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_idx,
 	if (!rxq) {
 		DP_ERR(edev, "Unable to allocate memory for rxq on socket %u",
 			  socket_id);
-		return -ENOMEM;
+		return NULL;
 	}
 
 	rxq->qdev = qdev;
@@ -170,27 +154,8 @@ qede_rx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_idx,
 	rxq->queue_id = queue_idx;
 	rxq->port_id = dev->data->port_id;
 
-	max_rx_pkt_len = (uint16_t)rxmode->max_rx_pkt_len;
-
-	/* Fix up RX buffer size */
-	bufsz = (uint16_t)rte_pktmbuf_data_room_size(mp) - RTE_PKTMBUF_HEADROOM;
-	/* cache align the mbuf size to simplfy rx_buf_size calculation */
-	bufsz = QEDE_FLOOR_TO_CACHE_LINE_SIZE(bufsz);
-	if ((rxmode->offloads & DEV_RX_OFFLOAD_SCATTER)	||
-	    (max_rx_pkt_len + QEDE_ETH_OVERHEAD) > bufsz) {
-		if (!dev->data->scattered_rx) {
-			DP_INFO(edev, "Forcing scatter-gather mode\n");
-			dev->data->scattered_rx = 1;
-		}
-	}
-
-	rc = qede_calc_rx_buf_size(dev, bufsz, max_rx_pkt_len);
-	if (rc < 0) {
-		rte_free(rxq);
-		return rc;
-	}
 
-	rxq->rx_buf_size = rc;
+	rxq->rx_buf_size = bufsz;
 
 	DP_INFO(edev, "mtu %u mbufsz %u bd_max_bytes %u scatter_mode %d\n",
 		qdev->mtu, bufsz, rxq->rx_buf_size, dev->data->scattered_rx);
@@ -203,7 +168,7 @@ qede_rx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_idx,
 		DP_ERR(edev, "Memory allocation fails for sw_rx_ring on"
 		       " socket %u\n", socket_id);
 		rte_free(rxq);
-		return -ENOMEM;
+		return NULL;
 	}
 
 	/* Allocate FW Rx ring  */
@@ -221,7 +186,7 @@ qede_rx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_idx,
 		       " on socket %u\n", socket_id);
 		rte_free(rxq->sw_rx_ring);
 		rte_free(rxq);
-		return -ENOMEM;
+		return NULL;
 	}
 
 	/* Allocate FW completion ring */
@@ -240,14 +205,71 @@ qede_rx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_idx,
 		qdev->ops->common->chain_free(edev, &rxq->rx_bd_ring);
 		rte_free(rxq->sw_rx_ring);
 		rte_free(rxq);
-		return -ENOMEM;
+		return NULL;
+	}
+
+	return rxq;
+}
+
+int
+qede_rx_queue_setup(struct rte_eth_dev *dev, uint16_t qid,
+		    uint16_t nb_desc, unsigned int socket_id,
+		    __rte_unused const struct rte_eth_rxconf *rx_conf,
+		    struct rte_mempool *mp)
+{
+	struct qede_dev *qdev = QEDE_INIT_QDEV(dev);
+	struct ecore_dev *edev = QEDE_INIT_EDEV(qdev);
+	struct rte_eth_rxmode *rxmode = &dev->data->dev_conf.rxmode;
+	struct qede_rx_queue *rxq;
+	uint16_t max_rx_pkt_len;
+	uint16_t bufsz;
+	int rc;
+
+	PMD_INIT_FUNC_TRACE(edev);
+
+	/* Note: Ring size/align is controlled by struct rte_eth_desc_lim */
+	if (!rte_is_power_of_2(nb_desc)) {
+		DP_ERR(edev, "Ring size %u is not power of 2\n",
+			  nb_desc);
+		return -EINVAL;
 	}
 
-	dev->data->rx_queues[queue_idx] = rxq;
-	qdev->fp_array[queue_idx].rxq = rxq;
+	/* Free memory prior to re-allocation if needed... */
+	if (dev->data->rx_queues[qid] != NULL) {
+		qede_rx_queue_release(dev->data->rx_queues[qid]);
+		dev->data->rx_queues[qid] = NULL;
+	}
+
+	max_rx_pkt_len = (uint16_t)rxmode->max_rx_pkt_len;
+
+	/* Fix up RX buffer size */
+	bufsz = (uint16_t)rte_pktmbuf_data_room_size(mp) - RTE_PKTMBUF_HEADROOM;
+	/* cache align the mbuf size to simplfy rx_buf_size calculation */
+	bufsz = QEDE_FLOOR_TO_CACHE_LINE_SIZE(bufsz);
+	if ((rxmode->offloads & DEV_RX_OFFLOAD_SCATTER)	||
+	    (max_rx_pkt_len + QEDE_ETH_OVERHEAD) > bufsz) {
+		if (!dev->data->scattered_rx) {
+			DP_INFO(edev, "Forcing scatter-gather mode\n");
+			dev->data->scattered_rx = 1;
+		}
+	}
+
+	rc = qede_calc_rx_buf_size(dev, bufsz, max_rx_pkt_len);
+	if (rc < 0)
+		return rc;
+
+	bufsz = rc;
+
+	rxq = qede_alloc_rx_queue_mem(dev, qid, nb_desc,
+				      socket_id, mp, bufsz);
+	if (!rxq)
+		return -ENOMEM;
+
+	dev->data->rx_queues[qid] = rxq;
+	qdev->fp_array[qid].rxq = rxq;
 
 	DP_INFO(edev, "rxq %d num_desc %u rx_buf_size=%u socket %u\n",
-		  queue_idx, nb_desc, rxq->rx_buf_size, socket_id);
+		  qid, nb_desc, rxq->rx_buf_size, socket_id);
 
 	return 0;
 }
@@ -278,6 +300,17 @@ static void qede_rx_queue_release_mbufs(struct qede_rx_queue *rxq)
 	}
 }
 
+static void _qede_rx_queue_release(struct qede_dev *qdev,
+				   struct ecore_dev *edev,
+				   struct qede_rx_queue *rxq)
+{
+	qede_rx_queue_release_mbufs(rxq);
+	qdev->ops->common->chain_free(edev, &rxq->rx_bd_ring);
+	qdev->ops->common->chain_free(edev, &rxq->rx_comp_ring);
+	rte_free(rxq->sw_rx_ring);
+	rte_free(rxq);
+}
+
 void qede_rx_queue_release(void *rx_queue)
 {
 	struct qede_rx_queue *rxq = rx_queue;
@@ -288,11 +321,7 @@ void qede_rx_queue_release(void *rx_queue)
 		qdev = rxq->qdev;
 		edev = QEDE_INIT_EDEV(qdev);
 		PMD_INIT_FUNC_TRACE(edev);
-		qede_rx_queue_release_mbufs(rxq);
-		qdev->ops->common->chain_free(edev, &rxq->rx_bd_ring);
-		qdev->ops->common->chain_free(edev, &rxq->rx_comp_ring);
-		rte_free(rxq->sw_rx_ring);
-		rte_free(rxq);
+		_qede_rx_queue_release(qdev, edev, rxq);
 	}
 }
 
@@ -306,8 +335,8 @@ static int qede_rx_queue_stop(struct rte_eth_dev *eth_dev, uint16_t rx_queue_id)
 	int hwfn_index;
 	int rc;
 
-	if (rx_queue_id < eth_dev->data->nb_rx_queues) {
-		rxq = eth_dev->data->rx_queues[rx_queue_id];
+	if (rx_queue_id < qdev->num_rx_queues) {
+		rxq = qdev->fp_array[rx_queue_id].rxq;
 		hwfn_index = rx_queue_id % edev->num_hwfns;
 		p_hwfn = &edev->hwfns[hwfn_index];
 		rc = ecore_eth_rx_queue_stop(p_hwfn, rxq->handle,
@@ -329,32 +358,18 @@ static int qede_rx_queue_stop(struct rte_eth_dev *eth_dev, uint16_t rx_queue_id)
 	return rc;
 }
 
-int
-qede_tx_queue_setup(struct rte_eth_dev *dev,
-		    uint16_t queue_idx,
-		    uint16_t nb_desc,
-		    unsigned int socket_id,
-		    const struct rte_eth_txconf *tx_conf)
+static struct qede_tx_queue *
+qede_alloc_tx_queue_mem(struct rte_eth_dev *dev,
+			uint16_t queue_idx,
+			uint16_t nb_desc,
+			unsigned int socket_id,
+			const struct rte_eth_txconf *tx_conf)
 {
 	struct qede_dev *qdev = dev->data->dev_private;
 	struct ecore_dev *edev = &qdev->edev;
 	struct qede_tx_queue *txq;
 	int rc;
 
-	PMD_INIT_FUNC_TRACE(edev);
-
-	if (!rte_is_power_of_2(nb_desc)) {
-		DP_ERR(edev, "Ring size %u is not power of 2\n",
-		       nb_desc);
-		return -EINVAL;
-	}
-
-	/* Free memory prior to re-allocation if needed... */
-	if (dev->data->tx_queues[queue_idx] != NULL) {
-		qede_tx_queue_release(dev->data->tx_queues[queue_idx]);
-		dev->data->tx_queues[queue_idx] = NULL;
-	}
-
 	txq = rte_zmalloc_socket("qede_tx_queue", sizeof(struct qede_tx_queue),
 				 RTE_CACHE_LINE_SIZE, socket_id);
 
@@ -362,7 +377,7 @@ qede_tx_queue_setup(struct rte_eth_dev *dev,
 		DP_ERR(edev,
 		       "Unable to allocate memory for txq on socket %u",
 		       socket_id);
-		return -ENOMEM;
+		return NULL;
 	}
 
 	txq->nb_tx_desc = nb_desc;
@@ -382,7 +397,7 @@ qede_tx_queue_setup(struct rte_eth_dev *dev,
 		       "Unable to allocate memory for txbd ring on socket %u",
 		       socket_id);
 		qede_tx_queue_release(txq);
-		return -ENOMEM;
+		return NULL;
 	}
 
 	/* Allocate software ring */
@@ -397,7 +412,7 @@ qede_tx_queue_setup(struct rte_eth_dev *dev,
 		       socket_id);
 		qdev->ops->common->chain_free(edev, &txq->tx_pbl);
 		qede_tx_queue_release(txq);
-		return -ENOMEM;
+		return NULL;
 	}
 
 	txq->queue_id = queue_idx;
@@ -408,12 +423,44 @@ qede_tx_queue_setup(struct rte_eth_dev *dev,
 	    tx_conf->tx_free_thresh ? tx_conf->tx_free_thresh :
 	    (txq->nb_tx_desc - QEDE_DEFAULT_TX_FREE_THRESH);
 
-	dev->data->tx_queues[queue_idx] = txq;
-	qdev->fp_array[queue_idx].txq = txq;
-
 	DP_INFO(edev,
 		  "txq %u num_desc %u tx_free_thresh %u socket %u\n",
 		  queue_idx, nb_desc, txq->tx_free_thresh, socket_id);
+	return txq;
+}
+
+int
+qede_tx_queue_setup(struct rte_eth_dev *dev,
+		    uint16_t queue_idx,
+		    uint16_t nb_desc,
+		    unsigned int socket_id,
+		    const struct rte_eth_txconf *tx_conf)
+{
+	struct qede_dev *qdev = dev->data->dev_private;
+	struct ecore_dev *edev = &qdev->edev;
+	struct qede_tx_queue *txq;
+
+	PMD_INIT_FUNC_TRACE(edev);
+
+	if (!rte_is_power_of_2(nb_desc)) {
+		DP_ERR(edev, "Ring size %u is not power of 2\n",
+		       nb_desc);
+		return -EINVAL;
+	}
+
+	/* Free memory prior to re-allocation if needed... */
+	if (dev->data->tx_queues[queue_idx] != NULL) {
+		qede_tx_queue_release(dev->data->tx_queues[queue_idx]);
+		dev->data->tx_queues[queue_idx] = NULL;
+	}
+
+	txq = qede_alloc_tx_queue_mem(dev, queue_idx, nb_desc,
+				      socket_id, tx_conf);
+	if (!txq)
+		return -ENOMEM;
+
+	dev->data->tx_queues[queue_idx] = txq;
+	qdev->fp_array[queue_idx].txq = txq;
 
 	return 0;
 }
@@ -443,6 +490,16 @@ static void qede_tx_queue_release_mbufs(struct qede_tx_queue *txq)
 	}
 }
 
+static void _qede_tx_queue_release(struct qede_dev *qdev,
+				   struct ecore_dev *edev,
+				   struct qede_tx_queue *txq)
+{
+	qede_tx_queue_release_mbufs(txq);
+	qdev->ops->common->chain_free(edev, &txq->tx_pbl);
+	rte_free(txq->sw_tx_ring);
+	rte_free(txq);
+}
+
 void qede_tx_queue_release(void *tx_queue)
 {
 	struct qede_tx_queue *txq = tx_queue;
@@ -453,10 +510,7 @@ void qede_tx_queue_release(void *tx_queue)
 		qdev = txq->qdev;
 		edev = QEDE_INIT_EDEV(qdev);
 		PMD_INIT_FUNC_TRACE(edev);
-		qede_tx_queue_release_mbufs(txq);
-		qdev->ops->common->chain_free(edev, &txq->tx_pbl);
-		rte_free(txq->sw_tx_ring);
-		rte_free(txq);
+		_qede_tx_queue_release(qdev, edev, txq);
 	}
 }
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [dpdk-dev] [PATCH v1 2/5] net/qede: fix ovs-dpdk failure when using odd number of queues on 100Gb mode
  2019-09-06  7:32 [dpdk-dev] [PATCH v1 0/5] net/qede: fixes and enhancement Shahed Shaikh
  2019-09-06  7:32 ` [dpdk-dev] [PATCH v1 1/5] net/qede: refactor Rx and Tx queue setup Shahed Shaikh
@ 2019-09-06  7:32 ` Shahed Shaikh
  2019-09-06  7:32 ` [dpdk-dev] [PATCH v1 3/5] net/qede: fix RSS configuration as per new 100Gb queue allocation method Shahed Shaikh
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: Shahed Shaikh @ 2019-09-06  7:32 UTC (permalink / raw)
  To: dev; +Cc: rmody, jerinj, GR-Everest-DPDK-Dev, stable

As per the HW design of 100Gb mode, the device internally uses two engines
(eng0 and eng1), and both engines need to be configured symmetrically.
Based on this requirement, the driver was designed to let the user
allocate only an even number of queues and to split those queues equally
across both engines.

This approach limits the number of queues that can be allocated -
i.e. the user can't configure an odd number of queues in 100Gb mode.
OVS configures the DPDK port with 1 rxq and 1 txq, which causes
initialization of the qede port to fail.

The issue is fixed by changing the queue allocation and hw engine
assignment scheme for 100Gb devices only, allowing the user to configure
an odd number of queues.

The new approach works as below -
- Create 'struct qede_fastpath_cmt' to hold the hw queue pair (one queue
  per engine) and expose it through rte_ethdev's Rx/Tx queue arrays.
- Thus ethdev sees only one queue for each underlying queue pair created
  across the hw engine pair.
- Install separate Rx/Tx data path handlers for 100Gb mode and regular
  mode.
- The 100Gb-mode Rx/Tx handlers split packet processing across both
  engines by dispatching to the per-engine hw queue structures held in
  the 'struct qede_fastpath_cmt' passed to the Rx/Tx callbacks (see the
  sketch after this list).
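
A minimal standalone sketch of the burst-split arithmetic used by the CMT
handlers (illustrative only; the real handlers call qede_recv_pkts() /
qede_xmit_pkts() once per engine, and the values below are made up):

#include <stdio.h>

/* Pretend per-engine receive: returns min(budget, packets available). */
static int fake_engine_recv(int budget, int available)
{
	return budget < available ? budget : available;
}

int main(void)
{
	int nb_pkts = 32;                                /* burst budget */
	int eng0 = fake_engine_recv(nb_pkts / 2, 10);    /* eng0 has only 10 */
	int eng1 = fake_engine_recv(nb_pkts - eng0, 25); /* unused budget goes to eng1 */

	printf("eng0=%d eng1=%d total=%d of %d\n",
	       eng0, eng1, eng0 + eng1, nb_pkts);
	return 0;
}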

Fixes: 2af14ca79c0a ("net/qede: support 100G")
Cc: stable@dpdk.org

Signed-off-by: Shahed Shaikh <shshaikh@marvell.com>
---
 drivers/net/qede/qede_ethdev.c | 112 ++++++++++++-----------
 drivers/net/qede/qede_ethdev.h |   5 +-
 drivers/net/qede/qede_filter.c |   5 +-
 drivers/net/qede/qede_rxtx.c   | 162 +++++++++++++++++++++++++++------
 drivers/net/qede/qede_rxtx.h   |  26 +++++-
 5 files changed, 219 insertions(+), 91 deletions(-)

diff --git a/drivers/net/qede/qede_ethdev.c b/drivers/net/qede/qede_ethdev.c
index 528b33e8c..308588cb8 100644
--- a/drivers/net/qede/qede_ethdev.c
+++ b/drivers/net/qede/qede_ethdev.c
@@ -304,6 +304,7 @@ static void qede_print_adapter_info(struct qede_dev *qdev)
 
 static void qede_reset_queue_stats(struct qede_dev *qdev, bool xstats)
 {
+	struct rte_eth_dev *dev = (struct rte_eth_dev *)qdev->ethdev;
 	struct ecore_dev *edev = QEDE_INIT_EDEV(qdev);
 	unsigned int i = 0, j = 0, qid;
 	unsigned int rxq_stat_cntrs, txq_stat_cntrs;
@@ -311,12 +312,12 @@ static void qede_reset_queue_stats(struct qede_dev *qdev, bool xstats)
 
 	DP_VERBOSE(edev, ECORE_MSG_DEBUG, "Clearing queue stats\n");
 
-	rxq_stat_cntrs = RTE_MIN(QEDE_RSS_COUNT(qdev),
+	rxq_stat_cntrs = RTE_MIN(QEDE_RSS_COUNT(dev),
 			       RTE_ETHDEV_QUEUE_STAT_CNTRS);
-	txq_stat_cntrs = RTE_MIN(QEDE_TSS_COUNT(qdev),
+	txq_stat_cntrs = RTE_MIN(QEDE_TSS_COUNT(dev),
 			       RTE_ETHDEV_QUEUE_STAT_CNTRS);
 
-	for_each_rss(qid) {
+	for (qid = 0; qid < qdev->num_rx_queues; qid++) {
 		OSAL_MEMSET(((char *)(qdev->fp_array[qid].rxq)) +
 			     offsetof(struct qede_rx_queue, rcv_pkts), 0,
 			    sizeof(uint64_t));
@@ -342,7 +343,7 @@ static void qede_reset_queue_stats(struct qede_dev *qdev, bool xstats)
 
 	i = 0;
 
-	for_each_tss(qid) {
+	for (qid = 0; qid < qdev->num_tx_queues; qid++) {
 		txq = qdev->fp_array[qid].txq;
 
 		OSAL_MEMSET((uint64_t *)(uintptr_t)
@@ -991,7 +992,7 @@ int qede_config_rss(struct rte_eth_dev *eth_dev)
 	for (i = 0; i < ECORE_RSS_IND_TABLE_SIZE; i++) {
 		id = i / RTE_RETA_GROUP_SIZE;
 		pos = i % RTE_RETA_GROUP_SIZE;
-		q = i % QEDE_RSS_COUNT(qdev);
+		q = i % QEDE_RSS_COUNT(eth_dev);
 		reta_conf[id].reta[pos] = q;
 	}
 	if (qede_rss_reta_update(eth_dev, &reta_conf[0],
@@ -1165,22 +1166,6 @@ static int qede_dev_configure(struct rte_eth_dev *eth_dev)
 
 	PMD_INIT_FUNC_TRACE(edev);
 
-	/* Check requirements for 100G mode */
-	if (ECORE_IS_CMT(edev)) {
-		if (eth_dev->data->nb_rx_queues < 2 ||
-		    eth_dev->data->nb_tx_queues < 2) {
-			DP_ERR(edev, "100G mode needs min. 2 RX/TX queues\n");
-			return -EINVAL;
-		}
-
-		if ((eth_dev->data->nb_rx_queues % 2 != 0) ||
-		    (eth_dev->data->nb_tx_queues % 2 != 0)) {
-			DP_ERR(edev,
-			       "100G mode needs even no. of RX/TX queues\n");
-			return -EINVAL;
-		}
-	}
-
 	/* We need to have min 1 RX queue.There is no min check in
 	 * rte_eth_dev_configure(), so we are checking it here.
 	 */
@@ -1207,8 +1192,9 @@ static int qede_dev_configure(struct rte_eth_dev *eth_dev)
 		return -ENOTSUP;
 
 	qede_dealloc_fp_resc(eth_dev);
-	qdev->num_tx_queues = eth_dev->data->nb_tx_queues;
-	qdev->num_rx_queues = eth_dev->data->nb_rx_queues;
+	qdev->num_tx_queues = eth_dev->data->nb_tx_queues * edev->num_hwfns;
+	qdev->num_rx_queues = eth_dev->data->nb_rx_queues * edev->num_hwfns;
+
 	if (qede_alloc_fp_resc(qdev))
 		return -ENOMEM;
 
@@ -1233,7 +1219,12 @@ static int qede_dev_configure(struct rte_eth_dev *eth_dev)
 		return ret;
 
 	DP_INFO(edev, "Device configured with RSS=%d TSS=%d\n",
-			QEDE_RSS_COUNT(qdev), QEDE_TSS_COUNT(qdev));
+			QEDE_RSS_COUNT(eth_dev), QEDE_TSS_COUNT(eth_dev));
+
+	if (ECORE_IS_CMT(edev))
+		DP_INFO(edev, "Actual HW queues for CMT mode - RX = %d TX = %d\n",
+			qdev->num_rx_queues, qdev->num_tx_queues);
+
 
 	return 0;
 }
@@ -1275,6 +1266,10 @@ qede_dev_info_get(struct rte_eth_dev *eth_dev,
 	else
 		dev_info->max_rx_queues = (uint16_t)RTE_MIN(
 			QEDE_MAX_RSS_CNT(qdev), ECORE_MAX_VF_CHAINS_PER_PF);
+	/* Since CMT mode internally doubles the number of queues */
+	if (ECORE_IS_CMT(edev))
+		dev_info->max_rx_queues  = dev_info->max_rx_queues / 2;
+
 	dev_info->max_tx_queues = dev_info->max_rx_queues;
 
 	dev_info->max_mac_addrs = qdev->dev_info.num_mac_filters;
@@ -1518,18 +1513,18 @@ qede_get_stats(struct rte_eth_dev *eth_dev, struct rte_eth_stats *eth_stats)
 	eth_stats->oerrors = stats.common.tx_err_drop_pkts;
 
 	/* Queue stats */
-	rxq_stat_cntrs = RTE_MIN(QEDE_RSS_COUNT(qdev),
+	rxq_stat_cntrs = RTE_MIN(QEDE_RSS_COUNT(eth_dev),
 			       RTE_ETHDEV_QUEUE_STAT_CNTRS);
-	txq_stat_cntrs = RTE_MIN(QEDE_TSS_COUNT(qdev),
+	txq_stat_cntrs = RTE_MIN(QEDE_TSS_COUNT(eth_dev),
 			       RTE_ETHDEV_QUEUE_STAT_CNTRS);
-	if ((rxq_stat_cntrs != (unsigned int)QEDE_RSS_COUNT(qdev)) ||
-	    (txq_stat_cntrs != (unsigned int)QEDE_TSS_COUNT(qdev)))
+	if (rxq_stat_cntrs != (unsigned int)QEDE_RSS_COUNT(eth_dev) ||
+	    txq_stat_cntrs != (unsigned int)QEDE_TSS_COUNT(eth_dev))
 		DP_VERBOSE(edev, ECORE_MSG_DEBUG,
 		       "Not all the queue stats will be displayed. Set"
 		       " RTE_ETHDEV_QUEUE_STAT_CNTRS config param"
 		       " appropriately and retry.\n");
 
-	for_each_rss(qid) {
+	for (qid = 0; qid < eth_dev->data->nb_rx_queues; qid++) {
 		eth_stats->q_ipackets[i] =
 			*(uint64_t *)(
 				((char *)(qdev->fp_array[qid].rxq)) +
@@ -1549,7 +1544,7 @@ qede_get_stats(struct rte_eth_dev *eth_dev, struct rte_eth_stats *eth_stats)
 			break;
 	}
 
-	for_each_tss(qid) {
+	for (qid = 0; qid < eth_dev->data->nb_tx_queues; qid++) {
 		txq = qdev->fp_array[qid].txq;
 		eth_stats->q_opackets[j] =
 			*((uint64_t *)(uintptr_t)
@@ -1566,18 +1561,18 @@ qede_get_stats(struct rte_eth_dev *eth_dev, struct rte_eth_stats *eth_stats)
 
 static unsigned
 qede_get_xstats_count(struct qede_dev *qdev) {
+	struct rte_eth_dev *dev = (struct rte_eth_dev *)qdev->ethdev;
+
 	if (ECORE_IS_BB(&qdev->edev))
 		return RTE_DIM(qede_xstats_strings) +
 		       RTE_DIM(qede_bb_xstats_strings) +
 		       (RTE_DIM(qede_rxq_xstats_strings) *
-			RTE_MIN(QEDE_RSS_COUNT(qdev),
-				RTE_ETHDEV_QUEUE_STAT_CNTRS));
+			QEDE_RSS_COUNT(dev) * qdev->edev.num_hwfns);
 	else
 		return RTE_DIM(qede_xstats_strings) +
 		       RTE_DIM(qede_ah_xstats_strings) +
 		       (RTE_DIM(qede_rxq_xstats_strings) *
-			RTE_MIN(QEDE_RSS_COUNT(qdev),
-				RTE_ETHDEV_QUEUE_STAT_CNTRS));
+			QEDE_RSS_COUNT(dev));
 }
 
 static int
@@ -1615,7 +1610,7 @@ qede_get_xstats_names(struct rte_eth_dev *dev,
 			}
 		}
 
-		rxq_stat_cntrs = RTE_MIN(QEDE_RSS_COUNT(qdev),
+		rxq_stat_cntrs = RTE_MIN(QEDE_RSS_COUNT(dev),
 					 RTE_ETHDEV_QUEUE_STAT_CNTRS);
 		for (qid = 0; qid < rxq_stat_cntrs; qid++) {
 			for (i = 0; i < RTE_DIM(qede_rxq_xstats_strings); i++) {
@@ -1673,17 +1668,15 @@ qede_get_xstats(struct rte_eth_dev *dev, struct rte_eth_xstat *xstats,
 		}
 	}
 
-	rxq_stat_cntrs = RTE_MIN(QEDE_RSS_COUNT(qdev),
+	rxq_stat_cntrs = RTE_MIN(QEDE_RSS_COUNT(dev),
 				 RTE_ETHDEV_QUEUE_STAT_CNTRS);
 	for (qid = 0; qid < rxq_stat_cntrs; qid++) {
-		for_each_rss(qid) {
-			for (i = 0; i < RTE_DIM(qede_rxq_xstats_strings); i++) {
-				xstats[stat_idx].value = *(uint64_t *)(
-					((char *)(qdev->fp_array[qid].rxq)) +
-					 qede_rxq_xstats_strings[i].offset);
-				xstats[stat_idx].id = stat_idx;
-				stat_idx++;
-			}
+		for (i = 0; i < RTE_DIM(qede_rxq_xstats_strings); i++) {
+			xstats[stat_idx].value = *(uint64_t *)
+				(((char *)(qdev->fp_array[qid].rxq)) +
+				 qede_rxq_xstats_strings[i].offset);
+			xstats[stat_idx].id = stat_idx;
+			stat_idx++;
 		}
 	}
 
@@ -1938,7 +1931,8 @@ qede_dev_supported_ptypes_get(struct rte_eth_dev *eth_dev)
 		RTE_PTYPE_UNKNOWN
 	};
 
-	if (eth_dev->rx_pkt_burst == qede_recv_pkts)
+	if (eth_dev->rx_pkt_burst == qede_recv_pkts ||
+	    eth_dev->rx_pkt_burst == qede_recv_pkts_cmt)
 		return ptypes;
 
 	return NULL;
@@ -2005,7 +1999,7 @@ int qede_rss_hash_update(struct rte_eth_dev *eth_dev,
 	vport_update_params.vport_id = 0;
 	/* pass the L2 handles instead of qids */
 	for (i = 0 ; i < ECORE_RSS_IND_TABLE_SIZE ; i++) {
-		idx = i % QEDE_RSS_COUNT(qdev);
+		idx = i % QEDE_RSS_COUNT(eth_dev);
 		rss_params.rss_ind_table[i] = qdev->fp_array[idx].rxq->handle;
 	}
 	vport_update_params.rss_params = &rss_params;
@@ -2257,7 +2251,7 @@ static int qede_set_mtu(struct rte_eth_dev *dev, uint16_t mtu)
 	qdev->mtu = mtu;
 
 	/* Fix up RX buf size for all queues of the port */
-	for_each_rss(i) {
+	for (i = 0; i < qdev->num_rx_queues; i++) {
 		fp = &qdev->fp_array[i];
 		if (fp->rxq != NULL) {
 			bufsz = (uint16_t)rte_pktmbuf_data_room_size(
@@ -2286,9 +2280,13 @@ static int qede_set_mtu(struct rte_eth_dev *dev, uint16_t mtu)
 	/* update max frame size */
 	dev->data->dev_conf.rxmode.max_rx_pkt_len = max_rx_pkt_len;
 	/* Reassign back */
-	dev->rx_pkt_burst = qede_recv_pkts;
-	dev->tx_pkt_burst = qede_xmit_pkts;
-
+	if (ECORE_IS_CMT(edev)) {
+		dev->rx_pkt_burst = qede_recv_pkts_cmt;
+		dev->tx_pkt_burst = qede_xmit_pkts_cmt;
+	} else {
+		dev->rx_pkt_burst = qede_recv_pkts;
+		dev->tx_pkt_burst = qede_xmit_pkts;
+	}
 	return 0;
 }
 
@@ -2429,10 +2427,6 @@ static int qede_common_dev_init(struct rte_eth_dev *eth_dev, bool is_vf)
 		 pci_addr.bus, pci_addr.devid, pci_addr.function,
 		 eth_dev->data->port_id);
 
-	eth_dev->rx_pkt_burst = qede_recv_pkts;
-	eth_dev->tx_pkt_burst = qede_xmit_pkts;
-	eth_dev->tx_pkt_prepare = qede_xmit_prep_pkts;
-
 	if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
 		DP_ERR(edev, "Skipping device init from secondary process\n");
 		return 0;
@@ -2490,6 +2484,16 @@ static int qede_common_dev_init(struct rte_eth_dev *eth_dev, bool is_vf)
 	strncpy((char *)params.name, QEDE_PMD_VER_PREFIX,
 		QEDE_PMD_DRV_VER_STR_SIZE);
 
+	if (ECORE_IS_CMT(edev)) {
+		eth_dev->rx_pkt_burst = qede_recv_pkts_cmt;
+		eth_dev->tx_pkt_burst = qede_xmit_pkts_cmt;
+	} else {
+		eth_dev->rx_pkt_burst = qede_recv_pkts;
+		eth_dev->tx_pkt_burst = qede_xmit_pkts;
+	}
+
+	eth_dev->tx_pkt_prepare = qede_xmit_prep_pkts;
+
 	/* For CMT mode device do periodic polling for slowpath events.
 	 * This is required since uio device uses only one MSI-x
 	 * interrupt vector but we need one for each engine.
diff --git a/drivers/net/qede/qede_ethdev.h b/drivers/net/qede/qede_ethdev.h
index d0e7c70be..5549d0bf3 100644
--- a/drivers/net/qede/qede_ethdev.h
+++ b/drivers/net/qede/qede_ethdev.h
@@ -66,8 +66,8 @@
 					(edev)->dev_info.num_tc)
 
 #define QEDE_QUEUE_CNT(qdev) ((qdev)->num_queues)
-#define QEDE_RSS_COUNT(qdev) ((qdev)->num_rx_queues)
-#define QEDE_TSS_COUNT(qdev) ((qdev)->num_tx_queues)
+#define QEDE_RSS_COUNT(dev) ((dev)->data->nb_rx_queues)
+#define QEDE_TSS_COUNT(dev) ((dev)->data->nb_tx_queues)
 
 #define QEDE_DUPLEX_FULL	1
 #define QEDE_DUPLEX_HALF	2
@@ -215,6 +215,7 @@ struct qede_dev {
 	struct qed_dev_eth_info dev_info;
 	struct ecore_sb_info *sb_array;
 	struct qede_fastpath *fp_array;
+	struct qede_fastpath_cmt *fp_array_cmt;
 	uint16_t mtu;
 	bool enable_tx_switching;
 	bool rss_enable;
diff --git a/drivers/net/qede/qede_filter.c b/drivers/net/qede/qede_filter.c
index b3f62e0dd..81509f04b 100644
--- a/drivers/net/qede/qede_filter.c
+++ b/drivers/net/qede/qede_filter.c
@@ -431,7 +431,7 @@ qede_fdir_filter_add(struct rte_eth_dev *eth_dev,
 		return -EINVAL;
 	}
 
-	if (fdir->action.rx_queue >= QEDE_RSS_COUNT(qdev)) {
+	if (fdir->action.rx_queue >= QEDE_RSS_COUNT(eth_dev)) {
 		DP_ERR(edev, "invalid queue number %u\n",
 		       fdir->action.rx_queue);
 		return -EINVAL;
@@ -1345,7 +1345,6 @@ qede_flow_parse_actions(struct rte_eth_dev *dev,
 			struct rte_flow_error *error,
 			struct rte_flow *flow)
 {
-	struct qede_dev *qdev = QEDE_INIT_QDEV(dev);
 	const struct rte_flow_action_queue *queue;
 
 	if (actions == NULL) {
@@ -1360,7 +1359,7 @@ qede_flow_parse_actions(struct rte_eth_dev *dev,
 		case RTE_FLOW_ACTION_TYPE_QUEUE:
 			queue = actions->conf;
 
-			if (queue->index >= QEDE_RSS_COUNT(qdev)) {
+			if (queue->index >= QEDE_RSS_COUNT(dev)) {
 				rte_flow_error_set(error, EINVAL,
 						   RTE_FLOW_ERROR_TYPE_ACTION,
 						   actions,
diff --git a/drivers/net/qede/qede_rxtx.c b/drivers/net/qede/qede_rxtx.c
index cb8ac9bf6..31276e8a6 100644
--- a/drivers/net/qede/qede_rxtx.c
+++ b/drivers/net/qede/qede_rxtx.c
@@ -260,13 +260,30 @@ qede_rx_queue_setup(struct rte_eth_dev *dev, uint16_t qid,
 
 	bufsz = rc;
 
-	rxq = qede_alloc_rx_queue_mem(dev, qid, nb_desc,
-				      socket_id, mp, bufsz);
-	if (!rxq)
-		return -ENOMEM;
+	if (ECORE_IS_CMT(edev)) {
+		rxq = qede_alloc_rx_queue_mem(dev, qid * 2, nb_desc,
+					      socket_id, mp, bufsz);
+		if (!rxq)
+			return -ENOMEM;
+
+		qdev->fp_array[qid * 2].rxq = rxq;
+		rxq = qede_alloc_rx_queue_mem(dev, qid * 2 + 1, nb_desc,
+					      socket_id, mp, bufsz);
+		if (!rxq)
+			return -ENOMEM;
+
+		qdev->fp_array[qid * 2 + 1].rxq = rxq;
+		/* provide per engine fp struct as rx queue */
+		dev->data->rx_queues[qid] = &qdev->fp_array_cmt[qid];
+	} else {
+		rxq = qede_alloc_rx_queue_mem(dev, qid, nb_desc,
+					      socket_id, mp, bufsz);
+		if (!rxq)
+			return -ENOMEM;
 
-	dev->data->rx_queues[qid] = rxq;
-	qdev->fp_array[qid].rxq = rxq;
+		dev->data->rx_queues[qid] = rxq;
+		qdev->fp_array[qid].rxq = rxq;
+	}
 
 	DP_INFO(edev, "rxq %d num_desc %u rx_buf_size=%u socket %u\n",
 		  qid, nb_desc, rxq->rx_buf_size, socket_id);
@@ -314,6 +331,7 @@ static void _qede_rx_queue_release(struct qede_dev *qdev,
 void qede_rx_queue_release(void *rx_queue)
 {
 	struct qede_rx_queue *rxq = rx_queue;
+	struct qede_fastpath_cmt *fp_cmt;
 	struct qede_dev *qdev;
 	struct ecore_dev *edev;
 
@@ -321,7 +339,13 @@ void qede_rx_queue_release(void *rx_queue)
 		qdev = rxq->qdev;
 		edev = QEDE_INIT_EDEV(qdev);
 		PMD_INIT_FUNC_TRACE(edev);
-		_qede_rx_queue_release(qdev, edev, rxq);
+		if (ECORE_IS_CMT(edev)) {
+			fp_cmt = rx_queue;
+			_qede_rx_queue_release(qdev, edev, fp_cmt->fp0->rxq);
+			_qede_rx_queue_release(qdev, edev, fp_cmt->fp1->rxq);
+		} else {
+			_qede_rx_queue_release(qdev, edev, rxq);
+		}
 	}
 }
 
@@ -454,13 +478,30 @@ qede_tx_queue_setup(struct rte_eth_dev *dev,
 		dev->data->tx_queues[queue_idx] = NULL;
 	}
 
-	txq = qede_alloc_tx_queue_mem(dev, queue_idx, nb_desc,
-				      socket_id, tx_conf);
-	if (!txq)
-		return -ENOMEM;
+	if (ECORE_IS_CMT(edev)) {
+		txq = qede_alloc_tx_queue_mem(dev, queue_idx * 2, nb_desc,
+					      socket_id, tx_conf);
+		if (!txq)
+			return -ENOMEM;
+
+		qdev->fp_array[queue_idx * 2].txq = txq;
+		txq = qede_alloc_tx_queue_mem(dev, (queue_idx * 2) + 1, nb_desc,
+					      socket_id, tx_conf);
+		if (!txq)
+			return -ENOMEM;
+
+		qdev->fp_array[(queue_idx * 2) + 1].txq = txq;
+		dev->data->tx_queues[queue_idx] =
+					&qdev->fp_array_cmt[queue_idx];
+	} else {
+		txq = qede_alloc_tx_queue_mem(dev, queue_idx, nb_desc,
+					      socket_id, tx_conf);
+		if (!txq)
+			return -ENOMEM;
 
-	dev->data->tx_queues[queue_idx] = txq;
-	qdev->fp_array[queue_idx].txq = txq;
+		dev->data->tx_queues[queue_idx] = txq;
+		qdev->fp_array[queue_idx].txq = txq;
+	}
 
 	return 0;
 }
@@ -503,6 +544,7 @@ static void _qede_tx_queue_release(struct qede_dev *qdev,
 void qede_tx_queue_release(void *tx_queue)
 {
 	struct qede_tx_queue *txq = tx_queue;
+	struct qede_fastpath_cmt *fp_cmt;
 	struct qede_dev *qdev;
 	struct ecore_dev *edev;
 
@@ -510,7 +552,14 @@ void qede_tx_queue_release(void *tx_queue)
 		qdev = txq->qdev;
 		edev = QEDE_INIT_EDEV(qdev);
 		PMD_INIT_FUNC_TRACE(edev);
-		_qede_tx_queue_release(qdev, edev, txq);
+
+		if (ECORE_IS_CMT(edev)) {
+			fp_cmt = tx_queue;
+			_qede_tx_queue_release(qdev, edev, fp_cmt->fp0->txq);
+			_qede_tx_queue_release(qdev, edev, fp_cmt->fp1->txq);
+		} else {
+			_qede_tx_queue_release(qdev, edev, txq);
+		}
 	}
 }
 
@@ -546,7 +595,7 @@ int qede_alloc_fp_resc(struct qede_dev *qdev)
 {
 	struct ecore_dev *edev = &qdev->edev;
 	struct qede_fastpath *fp;
-	uint32_t num_sbs;
+	uint32_t num_sbs, i;
 	uint16_t sb_idx;
 
 	if (IS_VF(edev))
@@ -571,6 +620,28 @@ int qede_alloc_fp_resc(struct qede_dev *qdev)
 	memset((void *)qdev->fp_array, 0, QEDE_RXTX_MAX(qdev) *
 			sizeof(*qdev->fp_array));
 
+	if (ECORE_IS_CMT(edev)) {
+		qdev->fp_array_cmt = rte_calloc("fp_cmt",
+						QEDE_RXTX_MAX(qdev) / 2,
+						sizeof(*qdev->fp_array_cmt),
+						RTE_CACHE_LINE_SIZE);
+
+		if (!qdev->fp_array_cmt) {
+			DP_ERR(edev, "fp array for CMT allocation failed\n");
+			return -ENOMEM;
+		}
+
+		memset((void *)qdev->fp_array_cmt, 0,
+		       (QEDE_RXTX_MAX(qdev) / 2) * sizeof(*qdev->fp_array_cmt));
+
+		/* Establish the mapping of fp_array with fp_array_cmt */
+		for (i = 0; i < QEDE_RXTX_MAX(qdev) / 2; i++) {
+			qdev->fp_array_cmt[i].qdev = qdev;
+			qdev->fp_array_cmt[i].fp0 = &qdev->fp_array[i * 2];
+			qdev->fp_array_cmt[i].fp1 = &qdev->fp_array[i * 2 + 1];
+		}
+	}
+
 	for (sb_idx = 0; sb_idx < QEDE_RXTX_MAX(qdev); sb_idx++) {
 		fp = &qdev->fp_array[sb_idx];
 		if (!fp)
@@ -635,6 +706,10 @@ void qede_dealloc_fp_resc(struct rte_eth_dev *eth_dev)
 	if (qdev->fp_array)
 		rte_free(qdev->fp_array);
 	qdev->fp_array = NULL;
+
+	if (qdev->fp_array_cmt)
+		rte_free(qdev->fp_array_cmt);
+	qdev->fp_array_cmt = NULL;
 }
 
 static inline void
@@ -686,9 +761,9 @@ qede_rx_queue_start(struct rte_eth_dev *eth_dev, uint16_t rx_queue_id)
 	int hwfn_index;
 	int rc;
 
-	if (rx_queue_id < eth_dev->data->nb_rx_queues) {
+	if (rx_queue_id < qdev->num_rx_queues) {
 		fp = &qdev->fp_array[rx_queue_id];
-		rxq = eth_dev->data->rx_queues[rx_queue_id];
+		rxq = fp->rxq;
 		/* Allocate buffers for the Rx ring */
 		for (j = 0; j < rxq->nb_rx_desc; j++) {
 			rc = qede_alloc_rx_buffer(rxq);
@@ -757,9 +832,9 @@ qede_tx_queue_start(struct rte_eth_dev *eth_dev, uint16_t tx_queue_id)
 	int hwfn_index;
 	int rc;
 
-	if (tx_queue_id < eth_dev->data->nb_tx_queues) {
-		txq = eth_dev->data->tx_queues[tx_queue_id];
+	if (tx_queue_id < qdev->num_tx_queues) {
 		fp = &qdev->fp_array[tx_queue_id];
+		txq = fp->txq;
 		memset(&params, 0, sizeof(params));
 		params.queue_id = tx_queue_id / edev->num_hwfns;
 		params.vport_id = 0;
@@ -900,8 +975,8 @@ static int qede_tx_queue_stop(struct rte_eth_dev *eth_dev, uint16_t tx_queue_id)
 	int hwfn_index;
 	int rc;
 
-	if (tx_queue_id < eth_dev->data->nb_tx_queues) {
-		txq = eth_dev->data->tx_queues[tx_queue_id];
+	if (tx_queue_id < qdev->num_tx_queues) {
+		txq = qdev->fp_array[tx_queue_id].txq;
 		/* Drain txq */
 		if (qede_drain_txq(qdev, txq, true))
 			return -1; /* For the lack of retcodes */
@@ -932,13 +1007,13 @@ int qede_start_queues(struct rte_eth_dev *eth_dev)
 	uint8_t id;
 	int rc = -1;
 
-	for_each_rss(id) {
+	for (id = 0; id < qdev->num_rx_queues; id++) {
 		rc = qede_rx_queue_start(eth_dev, id);
 		if (rc != ECORE_SUCCESS)
 			return -1;
 	}
 
-	for_each_tss(id) {
+	for (id = 0; id < qdev->num_tx_queues; id++) {
 		rc = qede_tx_queue_start(eth_dev, id);
 		if (rc != ECORE_SUCCESS)
 			return -1;
@@ -953,13 +1028,11 @@ void qede_stop_queues(struct rte_eth_dev *eth_dev)
 	uint8_t id;
 
 	/* Stopping RX/TX queues */
-	for_each_tss(id) {
+	for (id = 0; id < qdev->num_tx_queues; id++)
 		qede_tx_queue_stop(eth_dev, id);
-	}
 
-	for_each_rss(id) {
+	for (id = 0; id < qdev->num_rx_queues; id++)
 		qede_rx_queue_stop(eth_dev, id);
-	}
 }
 
 static inline bool qede_tunn_exist(uint16_t flag)
@@ -1741,6 +1814,23 @@ qede_recv_pkts(void *p_rxq, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 	return rx_pkt;
 }
 
+uint16_t
+qede_recv_pkts_cmt(void *p_fp_cmt, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
+{
+	struct qede_fastpath_cmt *fp_cmt = p_fp_cmt;
+	uint16_t eng0_pkts, eng1_pkts;
+
+	eng0_pkts = nb_pkts / 2;
+
+	eng0_pkts = qede_recv_pkts(fp_cmt->fp0->rxq, rx_pkts, eng0_pkts);
+
+	eng1_pkts = nb_pkts - eng0_pkts;
+
+	eng1_pkts = qede_recv_pkts(fp_cmt->fp1->rxq, rx_pkts + eng0_pkts,
+				   eng1_pkts);
+
+	return eng0_pkts + eng1_pkts;
+}
 
 /* Populate scatter gather buffer descriptor fields */
 static inline uint16_t
@@ -2263,6 +2353,24 @@ qede_xmit_pkts(void *p_txq, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 	return nb_pkt_sent;
 }
 
+uint16_t
+qede_xmit_pkts_cmt(void *p_fp_cmt, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	struct qede_fastpath_cmt *fp_cmt = p_fp_cmt;
+	uint16_t eng0_pkts, eng1_pkts;
+
+	eng0_pkts = nb_pkts / 2;
+
+	eng0_pkts = qede_xmit_pkts(fp_cmt->fp0->txq, tx_pkts, eng0_pkts);
+
+	eng1_pkts = nb_pkts - eng0_pkts;
+
+	eng1_pkts = qede_xmit_pkts(fp_cmt->fp1->txq, tx_pkts + eng0_pkts,
+				   eng1_pkts);
+
+	return eng0_pkts + eng1_pkts;
+}
+
 uint16_t
 qede_rxtx_pkts_dummy(__rte_unused void *p_rxq,
 		     __rte_unused struct rte_mbuf **pkts,
diff --git a/drivers/net/qede/qede_rxtx.h b/drivers/net/qede/qede_rxtx.h
index 41a5f0f5c..75cc930fd 100644
--- a/drivers/net/qede/qede_rxtx.h
+++ b/drivers/net/qede/qede_rxtx.h
@@ -81,10 +81,8 @@
 				 ETH_RSS_VXLAN			|\
 				 ETH_RSS_GENEVE)
 
-#define for_each_rss(i)		for (i = 0; i < qdev->num_rx_queues; i++)
-#define for_each_tss(i)		for (i = 0; i < qdev->num_tx_queues; i++)
 #define QEDE_RXTX_MAX(qdev) \
-	(RTE_MAX(QEDE_RSS_COUNT(qdev), QEDE_TSS_COUNT(qdev)))
+	(RTE_MAX(qdev->num_rx_queues, qdev->num_tx_queues))
 
 /* Macros for non-tunnel packet types lkup table */
 #define QEDE_PKT_TYPE_UNKNOWN				0x0
@@ -179,6 +177,8 @@ struct qede_agg_info {
  * Structure associated with each RX queue.
  */
 struct qede_rx_queue {
+	/* Always keep qdev as first member */
+	struct qede_dev *qdev;
 	struct rte_mempool *mb_pool;
 	struct ecore_chain rx_bd_ring;
 	struct ecore_chain rx_comp_ring;
@@ -199,7 +199,6 @@ struct qede_rx_queue {
 	uint64_t rx_hw_errors;
 	uint64_t rx_alloc_errors;
 	struct qede_agg_info tpa_info[ETH_TPA_MAX_AGGS_NUM];
-	struct qede_dev *qdev;
 	void *handle;
 };
 
@@ -217,6 +216,8 @@ union db_prod {
 };
 
 struct qede_tx_queue {
+	/* Always keep qdev as first member */
+	struct qede_dev *qdev;
 	struct ecore_chain tx_pbl;
 	struct qede_tx_entry *sw_tx_ring;
 	uint16_t nb_tx_desc;
@@ -231,7 +232,6 @@ struct qede_tx_queue {
 	uint16_t port_id;
 	uint64_t xmit_pkts;
 	bool is_legacy;
-	struct qede_dev *qdev;
 	void *handle;
 };
 
@@ -241,6 +241,18 @@ struct qede_fastpath {
 	struct qede_tx_queue *txq;
 };
 
+/* This structure holds the inforation of fast path queues
+ * belonging to individual engines in CMT mode.
+ */
+struct qede_fastpath_cmt {
+	/* Always keep this a first element */
+	struct qede_dev *qdev;
+	/* fastpath info of engine 0 */
+	struct qede_fastpath *fp0;
+	/* fastpath info of engine 1 */
+	struct qede_fastpath *fp1;
+};
+
 /*
  * RX/TX function prototypes
  */
@@ -261,12 +273,16 @@ void qede_tx_queue_release(void *tx_queue);
 
 uint16_t qede_xmit_pkts(void *p_txq, struct rte_mbuf **tx_pkts,
 			uint16_t nb_pkts);
+uint16_t qede_xmit_pkts_cmt(void *p_txq, struct rte_mbuf **tx_pkts,
+			    uint16_t nb_pkts);
 
 uint16_t qede_xmit_prep_pkts(void *p_txq, struct rte_mbuf **tx_pkts,
 			     uint16_t nb_pkts);
 
 uint16_t qede_recv_pkts(void *p_rxq, struct rte_mbuf **rx_pkts,
 			uint16_t nb_pkts);
+uint16_t qede_recv_pkts_cmt(void *p_rxq, struct rte_mbuf **rx_pkts,
+			    uint16_t nb_pkts);
 
 uint16_t qede_rxtx_pkts_dummy(void *p_rxq,
 			      struct rte_mbuf **pkts,
-- 
2.17.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [dpdk-dev] [PATCH v1 3/5] net/qede: fix RSS configuration as per new 100Gb queue allocation method
  2019-09-06  7:32 [dpdk-dev] [PATCH v1 0/5] net/qede: fixes and enhancement Shahed Shaikh
  2019-09-06  7:32 ` [dpdk-dev] [PATCH v1 1/5] net/qede: refactor Rx and Tx queue setup Shahed Shaikh
  2019-09-06  7:32 ` [dpdk-dev] [PATCH v1 2/5] net/qede: fix ovs-dpdk failure when using odd number of queues on 100Gb mode Shahed Shaikh
@ 2019-09-06  7:32 ` Shahed Shaikh
  2019-09-06  7:32 ` [dpdk-dev] [PATCH v1 4/5] net/qede: fix stats flow " Shahed Shaikh
  2019-09-06  7:32 ` [dpdk-dev] [PATCH v1 5/5] net/qede: implement rte_flow drop action Shahed Shaikh
  4 siblings, 0 replies; 8+ messages in thread
From: Shahed Shaikh @ 2019-09-06  7:32 UTC (permalink / raw)
  To: dev; +Cc: rmody, jerinj, GR-Everest-DPDK-Dev, stable

With the old design, RETA was configured in round-robin fashion since
queue allocation was distributed across both engines alternately.
Now, RETA needs to be configured symmetrically on both engines since
both engines have the same number of queues.
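
Roughly, the per-engine indirection table is now built as below (a
standalone illustration, not the driver code; the real table has 128
entries and stores L2 queue handles rather than indices):

#include <stdio.h>

int main(void)
{
	const int num_hwfns = 2;     /* 100Gb (CMT) device */
	const int nb_rx_queues = 3;  /* odd queue count, now allowed */
	const int reta_size = 8;     /* real table size (2^7 = 128) shortened here */

	for (int hwfn = 0; hwfn < num_hwfns; hwfn++) {
		printf("hwfn %d:", hwfn);
		for (int j = 0; j < reta_size; j++) {
			int entry = j % nb_rx_queues;          /* ethdev rxq */
			int fpidx = entry * num_hwfns + hwfn;  /* hw rxq on this engine */
			printf(" fp[%d]", fpidx);
		}
		printf("\n");
	}
	return 0;
}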

Fixes: 2af14ca79c0a ("net/qede: support 100G")
Cc: stable@dpdk.org

Signed-off-by: Shahed Shaikh <shshaikh@marvell.com>
---
 drivers/net/qede/qede_ethdev.c | 110 ++++++++-------------------------
 1 file changed, 27 insertions(+), 83 deletions(-)

diff --git a/drivers/net/qede/qede_ethdev.c b/drivers/net/qede/qede_ethdev.c
index 308588cb8..8b75ca3a7 100644
--- a/drivers/net/qede/qede_ethdev.c
+++ b/drivers/net/qede/qede_ethdev.c
@@ -1962,8 +1962,7 @@ int qede_rss_hash_update(struct rte_eth_dev *eth_dev,
 	uint32_t *key = (uint32_t *)rss_conf->rss_key;
 	uint64_t hf = rss_conf->rss_hf;
 	uint8_t len = rss_conf->rss_key_len;
-	uint8_t idx;
-	uint8_t i;
+	uint8_t idx, i, j, fpidx;
 	int rc;
 
 	memset(&vport_update_params, 0, sizeof(vport_update_params));
@@ -1997,14 +1996,18 @@ int qede_rss_hash_update(struct rte_eth_dev *eth_dev,
 	/* tbl_size has to be set with capabilities */
 	rss_params.rss_table_size_log = 7;
 	vport_update_params.vport_id = 0;
-	/* pass the L2 handles instead of qids */
-	for (i = 0 ; i < ECORE_RSS_IND_TABLE_SIZE ; i++) {
-		idx = i % QEDE_RSS_COUNT(eth_dev);
-		rss_params.rss_ind_table[i] = qdev->fp_array[idx].rxq->handle;
-	}
-	vport_update_params.rss_params = &rss_params;
 
 	for_each_hwfn(edev, i) {
+		/* pass the L2 handles instead of qids */
+		for (j = 0 ; j < ECORE_RSS_IND_TABLE_SIZE ; j++) {
+			idx = j % QEDE_RSS_COUNT(eth_dev);
+			fpidx = idx * edev->num_hwfns + i;
+			rss_params.rss_ind_table[j] =
+				qdev->fp_array[fpidx].rxq->handle;
+		}
+
+		vport_update_params.rss_params = &rss_params;
+
 		p_hwfn = &edev->hwfns[i];
 		vport_update_params.opaque_fid = p_hwfn->hw_info.opaque_fid;
 		rc = ecore_sp_vport_update(p_hwfn, &vport_update_params,
@@ -2056,61 +2059,6 @@ static int qede_rss_hash_conf_get(struct rte_eth_dev *eth_dev,
 	return 0;
 }
 
-static bool qede_update_rss_parm_cmt(struct ecore_dev *edev,
-				    struct ecore_rss_params *rss)
-{
-	int i, fn;
-	bool rss_mode = 1; /* enable */
-	struct ecore_queue_cid *cid;
-	struct ecore_rss_params *t_rss;
-
-	/* In regular scenario, we'd simply need to take input handlers.
-	 * But in CMT, we'd have to split the handlers according to the
-	 * engine they were configured on. We'd then have to understand
-	 * whether RSS is really required, since 2-queues on CMT doesn't
-	 * require RSS.
-	 */
-
-	/* CMT should be round-robin */
-	for (i = 0; i < ECORE_RSS_IND_TABLE_SIZE; i++) {
-		cid = rss->rss_ind_table[i];
-
-		if (cid->p_owner == ECORE_LEADING_HWFN(edev))
-			t_rss = &rss[0];
-		else
-			t_rss = &rss[1];
-
-		t_rss->rss_ind_table[i / edev->num_hwfns] = cid;
-	}
-
-	t_rss = &rss[1];
-	t_rss->update_rss_ind_table = 1;
-	t_rss->rss_table_size_log = 7;
-	t_rss->update_rss_config = 1;
-
-	/* Make sure RSS is actually required */
-	for_each_hwfn(edev, fn) {
-		for (i = 1; i < ECORE_RSS_IND_TABLE_SIZE / edev->num_hwfns;
-		     i++) {
-			if (rss[fn].rss_ind_table[i] !=
-			    rss[fn].rss_ind_table[0])
-				break;
-		}
-
-		if (i == ECORE_RSS_IND_TABLE_SIZE / edev->num_hwfns) {
-			DP_INFO(edev,
-				"CMT - 1 queue per-hwfn; Disabling RSS\n");
-			rss_mode = 0;
-			goto out;
-		}
-	}
-
-out:
-	t_rss->rss_enable = rss_mode;
-
-	return rss_mode;
-}
-
 int qede_rss_reta_update(struct rte_eth_dev *eth_dev,
 			 struct rte_eth_rss_reta_entry64 *reta_conf,
 			 uint16_t reta_size)
@@ -2119,8 +2067,8 @@ int qede_rss_reta_update(struct rte_eth_dev *eth_dev,
 	struct ecore_dev *edev = QEDE_INIT_EDEV(qdev);
 	struct ecore_sp_vport_update_params vport_update_params;
 	struct ecore_rss_params *params;
+	uint16_t i, j, idx, fid, shift;
 	struct ecore_hwfn *p_hwfn;
-	uint16_t i, idx, shift;
 	uint8_t entry;
 	int rc = 0;
 
@@ -2131,40 +2079,36 @@ int qede_rss_reta_update(struct rte_eth_dev *eth_dev,
 	}
 
 	memset(&vport_update_params, 0, sizeof(vport_update_params));
-	params = rte_zmalloc("qede_rss", sizeof(*params) * edev->num_hwfns,
-			     RTE_CACHE_LINE_SIZE);
+	params = rte_zmalloc("qede_rss", sizeof(*params), RTE_CACHE_LINE_SIZE);
 	if (params == NULL) {
 		DP_ERR(edev, "failed to allocate memory\n");
 		return -ENOMEM;
 	}
 
-	for (i = 0; i < reta_size; i++) {
-		idx = i / RTE_RETA_GROUP_SIZE;
-		shift = i % RTE_RETA_GROUP_SIZE;
-		if (reta_conf[idx].mask & (1ULL << shift)) {
-			entry = reta_conf[idx].reta[shift];
-			/* Pass rxq handles to ecore */
-			params->rss_ind_table[i] =
-					qdev->fp_array[entry].rxq->handle;
-			/* Update the local copy for RETA query command */
-			qdev->rss_ind_table[i] = entry;
-		}
-	}
-
 	params->update_rss_ind_table = 1;
 	params->rss_table_size_log = 7;
 	params->update_rss_config = 1;
 
-	/* Fix up RETA for CMT mode device */
-	if (ECORE_IS_CMT(edev))
-		qdev->rss_enable = qede_update_rss_parm_cmt(edev,
-							    params);
 	vport_update_params.vport_id = 0;
 	/* Use the current value of rss_enable */
 	params->rss_enable = qdev->rss_enable;
 	vport_update_params.rss_params = params;
 
 	for_each_hwfn(edev, i) {
+		for (j = 0; j < reta_size; j++) {
+			idx = j / RTE_RETA_GROUP_SIZE;
+			shift = j % RTE_RETA_GROUP_SIZE;
+			if (reta_conf[idx].mask & (1ULL << shift)) {
+				entry = reta_conf[idx].reta[shift];
+				fid = entry * edev->num_hwfns + i;
+				/* Pass rxq handles to ecore */
+				params->rss_ind_table[j] =
+						qdev->fp_array[fid].rxq->handle;
+				/* Update the local copy for RETA query cmd */
+				qdev->rss_ind_table[j] = entry;
+			}
+		}
+
 		p_hwfn = &edev->hwfns[i];
 		vport_update_params.opaque_fid = p_hwfn->hw_info.opaque_fid;
 		rc = ecore_sp_vport_update(p_hwfn, &vport_update_params,
-- 
2.17.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [dpdk-dev] [PATCH v1 4/5] net/qede: fix stats flow as per new 100Gb queue allocation method
  2019-09-06  7:32 [dpdk-dev] [PATCH v1 0/5] net/qede: fixes and enhancement Shahed Shaikh
                   ` (2 preceding siblings ...)
  2019-09-06  7:32 ` [dpdk-dev] [PATCH v1 3/5] net/qede: fix RSS configuration as per new 100Gb queue allocation method Shahed Shaikh
@ 2019-09-06  7:32 ` Shahed Shaikh
  2019-09-06  7:32 ` [dpdk-dev] [PATCH v1 5/5] net/qede: implement rte_flow drop action Shahed Shaikh
  4 siblings, 0 replies; 8+ messages in thread
From: Shahed Shaikh @ 2019-09-06  7:32 UTC (permalink / raw)
  To: dev; +Cc: rmody, jerinj, GR-Everest-DPDK-Dev, stable

As per the new method, the hw stats of queues from both engines need to
be considered. This patch fixes the stats collection flow accordingly.
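
In short, each ethdev queue's counter is now the sum, over both engines,
of the counters of its paired hw queues. A standalone sketch of that
aggregation (values are made up; the driver reads the counters from the
fastpath array via field offsets):

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(void)
{
	const int num_hwfns = 2;
	const int nb_rx_queues = 2;
	/* rcv_pkts of the 4 underlying hw rxqs: q0/eng0, q0/eng1, q1/eng0, q1/eng1 */
	uint64_t rcv_pkts[] = { 100, 120, 80, 90 };

	for (int qid = 0; qid < nb_rx_queues; qid++) {
		uint64_t ipackets = 0;
		for (int hwfn = 0; hwfn < num_hwfns; hwfn++)
			ipackets += rcv_pkts[qid * num_hwfns + hwfn];
		printf("ethdev rxq %d: q_ipackets = %" PRIu64 "\n", qid, ipackets);
	}
	return 0;
}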

Fixes: 2af14ca79c0a ("net/qede: support 100G")
Cc: stable@dpdk.org

Signed-off-by: Shahed Shaikh <shshaikh@marvell.com>
---
 drivers/net/qede/qede_ethdev.c | 135 +++++++++++++++++++--------------
 1 file changed, 76 insertions(+), 59 deletions(-)

diff --git a/drivers/net/qede/qede_ethdev.c b/drivers/net/qede/qede_ethdev.c
index 8b75ca3a7..98290fdc7 100644
--- a/drivers/net/qede/qede_ethdev.c
+++ b/drivers/net/qede/qede_ethdev.c
@@ -1477,7 +1477,7 @@ qede_get_stats(struct rte_eth_dev *eth_dev, struct rte_eth_stats *eth_stats)
 	struct qede_dev *qdev = eth_dev->data->dev_private;
 	struct ecore_dev *edev = &qdev->edev;
 	struct ecore_eth_stats stats;
-	unsigned int i = 0, j = 0, qid;
+	unsigned int i = 0, j = 0, qid, idx, hw_fn;
 	unsigned int rxq_stat_cntrs, txq_stat_cntrs;
 	struct qede_tx_queue *txq;
 
@@ -1525,32 +1525,47 @@ qede_get_stats(struct rte_eth_dev *eth_dev, struct rte_eth_stats *eth_stats)
 		       " appropriately and retry.\n");
 
 	for (qid = 0; qid < eth_dev->data->nb_rx_queues; qid++) {
-		eth_stats->q_ipackets[i] =
-			*(uint64_t *)(
-				((char *)(qdev->fp_array[qid].rxq)) +
-				offsetof(struct qede_rx_queue,
-				rcv_pkts));
-		eth_stats->q_errors[i] =
-			*(uint64_t *)(
-				((char *)(qdev->fp_array[qid].rxq)) +
-				offsetof(struct qede_rx_queue,
-				rx_hw_errors)) +
-			*(uint64_t *)(
-				((char *)(qdev->fp_array[qid].rxq)) +
-				offsetof(struct qede_rx_queue,
-				rx_alloc_errors));
+		eth_stats->q_ipackets[i] = 0;
+		eth_stats->q_errors[i] = 0;
+
+		for_each_hwfn(edev, hw_fn) {
+			idx = qid * edev->num_hwfns + hw_fn;
+
+			eth_stats->q_ipackets[i] +=
+				*(uint64_t *)
+					(((char *)(qdev->fp_array[idx].rxq)) +
+					 offsetof(struct qede_rx_queue,
+					 rcv_pkts));
+			eth_stats->q_errors[i] +=
+				*(uint64_t *)
+					(((char *)(qdev->fp_array[idx].rxq)) +
+					 offsetof(struct qede_rx_queue,
+					 rx_hw_errors)) +
+				*(uint64_t *)
+					(((char *)(qdev->fp_array[idx].rxq)) +
+					 offsetof(struct qede_rx_queue,
+					 rx_alloc_errors));
+		}
+
 		i++;
 		if (i == rxq_stat_cntrs)
 			break;
 	}
 
 	for (qid = 0; qid < eth_dev->data->nb_tx_queues; qid++) {
-		txq = qdev->fp_array[qid].txq;
-		eth_stats->q_opackets[j] =
-			*((uint64_t *)(uintptr_t)
-				(((uint64_t)(uintptr_t)(txq)) +
-				 offsetof(struct qede_tx_queue,
-					  xmit_pkts)));
+		eth_stats->q_opackets[j] = 0;
+
+		for_each_hwfn(edev, hw_fn) {
+			idx = qid * edev->num_hwfns + hw_fn;
+
+			txq = qdev->fp_array[idx].txq;
+			eth_stats->q_opackets[j] +=
+				*((uint64_t *)(uintptr_t)
+					(((uint64_t)(uintptr_t)(txq)) +
+					 offsetof(struct qede_tx_queue,
+						  xmit_pkts)));
+		}
+
 		j++;
 		if (j == txq_stat_cntrs)
 			break;
@@ -1583,42 +1598,43 @@ qede_get_xstats_names(struct rte_eth_dev *dev,
 	struct qede_dev *qdev = dev->data->dev_private;
 	struct ecore_dev *edev = &qdev->edev;
 	const unsigned int stat_cnt = qede_get_xstats_count(qdev);
-	unsigned int i, qid, stat_idx = 0;
-	unsigned int rxq_stat_cntrs;
+	unsigned int i, qid, hw_fn, stat_idx = 0;
 
-	if (xstats_names != NULL) {
-		for (i = 0; i < RTE_DIM(qede_xstats_strings); i++) {
+	if (xstats_names == NULL)
+		return stat_cnt;
+
+	for (i = 0; i < RTE_DIM(qede_xstats_strings); i++) {
+		strlcpy(xstats_names[stat_idx].name,
+			qede_xstats_strings[i].name,
+			sizeof(xstats_names[stat_idx].name));
+		stat_idx++;
+	}
+
+	if (ECORE_IS_BB(edev)) {
+		for (i = 0; i < RTE_DIM(qede_bb_xstats_strings); i++) {
 			strlcpy(xstats_names[stat_idx].name,
-				qede_xstats_strings[i].name,
+				qede_bb_xstats_strings[i].name,
 				sizeof(xstats_names[stat_idx].name));
 			stat_idx++;
 		}
-
-		if (ECORE_IS_BB(edev)) {
-			for (i = 0; i < RTE_DIM(qede_bb_xstats_strings); i++) {
-				strlcpy(xstats_names[stat_idx].name,
-					qede_bb_xstats_strings[i].name,
-					sizeof(xstats_names[stat_idx].name));
-				stat_idx++;
-			}
-		} else {
-			for (i = 0; i < RTE_DIM(qede_ah_xstats_strings); i++) {
-				strlcpy(xstats_names[stat_idx].name,
-					qede_ah_xstats_strings[i].name,
-					sizeof(xstats_names[stat_idx].name));
-				stat_idx++;
-			}
+	} else {
+		for (i = 0; i < RTE_DIM(qede_ah_xstats_strings); i++) {
+			strlcpy(xstats_names[stat_idx].name,
+				qede_ah_xstats_strings[i].name,
+				sizeof(xstats_names[stat_idx].name));
+			stat_idx++;
 		}
+	}
 
-		rxq_stat_cntrs = RTE_MIN(QEDE_RSS_COUNT(dev),
-					 RTE_ETHDEV_QUEUE_STAT_CNTRS);
-		for (qid = 0; qid < rxq_stat_cntrs; qid++) {
+	for (qid = 0; qid < QEDE_RSS_COUNT(dev); qid++) {
+		for_each_hwfn(edev, hw_fn) {
 			for (i = 0; i < RTE_DIM(qede_rxq_xstats_strings); i++) {
 				snprintf(xstats_names[stat_idx].name,
-					sizeof(xstats_names[stat_idx].name),
-					"%.4s%d%s",
-					qede_rxq_xstats_strings[i].name, qid,
-					qede_rxq_xstats_strings[i].name + 4);
+					 RTE_ETH_XSTATS_NAME_SIZE,
+					 "%.4s%d.%d%s",
+					 qede_rxq_xstats_strings[i].name,
+					 hw_fn, qid,
+					 qede_rxq_xstats_strings[i].name + 4);
 				stat_idx++;
 			}
 		}
@@ -1635,8 +1651,7 @@ qede_get_xstats(struct rte_eth_dev *dev, struct rte_eth_xstat *xstats,
 	struct ecore_dev *edev = &qdev->edev;
 	struct ecore_eth_stats stats;
 	const unsigned int num = qede_get_xstats_count(qdev);
-	unsigned int i, qid, stat_idx = 0;
-	unsigned int rxq_stat_cntrs;
+	unsigned int i, qid, hw_fn, fpidx, stat_idx = 0;
 
 	if (n < num)
 		return num;
@@ -1668,15 +1683,17 @@ qede_get_xstats(struct rte_eth_dev *dev, struct rte_eth_xstat *xstats,
 		}
 	}
 
-	rxq_stat_cntrs = RTE_MIN(QEDE_RSS_COUNT(dev),
-				 RTE_ETHDEV_QUEUE_STAT_CNTRS);
-	for (qid = 0; qid < rxq_stat_cntrs; qid++) {
-		for (i = 0; i < RTE_DIM(qede_rxq_xstats_strings); i++) {
-			xstats[stat_idx].value = *(uint64_t *)
-				(((char *)(qdev->fp_array[qid].rxq)) +
-				 qede_rxq_xstats_strings[i].offset);
-			xstats[stat_idx].id = stat_idx;
-			stat_idx++;
+	for (qid = 0; qid < dev->data->nb_rx_queues; qid++) {
+		for_each_hwfn(edev, hw_fn) {
+			for (i = 0; i < RTE_DIM(qede_rxq_xstats_strings); i++) {
+				fpidx = qid * edev->num_hwfns + hw_fn;
+				xstats[stat_idx].value = *(uint64_t *)
+					(((char *)(qdev->fp_array[fpidx].rxq)) +
+					 qede_rxq_xstats_strings[i].offset);
+				xstats[stat_idx].id = stat_idx;
+				stat_idx++;
+			}
+
 		}
 	}
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [dpdk-dev] [PATCH v1 5/5] net/qede: implement rte_flow drop action
  2019-09-06  7:32 [dpdk-dev] [PATCH v1 0/5] net/qede: fixes and enhancement Shahed Shaikh
                   ` (3 preceding siblings ...)
  2019-09-06  7:32 ` [dpdk-dev] [PATCH v1 4/5] net/qede: fix stats flow " Shahed Shaikh
@ 2019-09-06  7:32 ` Shahed Shaikh
  4 siblings, 0 replies; 8+ messages in thread
From: Shahed Shaikh @ 2019-09-06  7:32 UTC (permalink / raw)
  To: dev; +Cc: rmody, jerinj, GR-Everest-DPDK-Dev

Add support for configuring the drop action through the rte_flow
infrastructure, and add a counter, "rx_gft_filter_drop", for packets
dropped by this filter action.

Also, update the supported flow patterns and actions in the qede guide.
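
For reference, one way an application could install such a drop rule is
through the generic flow API, as in the hedged sketch below. This is
standard rte_flow usage, not code from the patch; the match values are
placeholders and the exact pattern items accepted by the PMD may differ.

#include <rte_flow.h>
#include <rte_ip.h>
#include <rte_byteorder.h>

/* Drop UDP-over-IPv4 traffic from 192.0.2.1 on 'port_id'.  Sketch only:
 * minimal error handling, illustrative match values. */
static struct rte_flow *
install_drop_rule(uint16_t port_id)
{
	struct rte_flow_attr attr = { .ingress = 1 };
	struct rte_flow_item_ipv4 ip_spec = {
		.hdr.src_addr = rte_cpu_to_be_32(RTE_IPV4(192, 0, 2, 1)),
	};
	struct rte_flow_item_ipv4 ip_mask = {
		.hdr.src_addr = RTE_BE32(0xffffffff),
	};
	struct rte_flow_item pattern[] = {
		{ .type = RTE_FLOW_ITEM_TYPE_IPV4,
		  .spec = &ip_spec, .mask = &ip_mask },
		{ .type = RTE_FLOW_ITEM_TYPE_UDP },
		{ .type = RTE_FLOW_ITEM_TYPE_END },
	};
	struct rte_flow_action actions[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_DROP },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};
	struct rte_flow_error err;

	if (rte_flow_validate(port_id, &attr, pattern, actions, &err))
		return NULL;
	return rte_flow_create(port_id, &attr, pattern, actions, &err);
}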

Signed-off-by: Shahed Shaikh <shshaikh@marvell.com>
---
 doc/guides/nics/qede.rst              | 39 +++++++++++++++++++++
 drivers/net/qede/base/ecore_dev_api.h |  1 +
 drivers/net/qede/base/ecore_l2.c      | 50 +++++++++++++++------------
 drivers/net/qede/base/ecore_l2_api.h  | 39 ++++++++++++++-------
 drivers/net/qede/qede_ethdev.c        |  2 ++
 drivers/net/qede/qede_ethdev.h        |  1 +
 drivers/net/qede/qede_filter.c        | 22 ++++++++----
 7 files changed, 114 insertions(+), 40 deletions(-)

diff --git a/doc/guides/nics/qede.rst b/doc/guides/nics/qede.rst
index 05a6aef57..471d98014 100644
--- a/doc/guides/nics/qede.rst
+++ b/doc/guides/nics/qede.rst
@@ -39,6 +39,7 @@ Supported Features
 - GENEVE Tunneling offload
 - VXLAN Tunneling offload
 - MPLSoUDP Tx Tunneling offload
+- Generic flow API
 
 Non-supported Features
 ----------------------
@@ -137,6 +138,44 @@ Driver compilation and testing
 Refer to the document :ref:`compiling and testing a PMD for a NIC <pmd_build_and_test>`
 for details.
 
+RTE Flow Support
+----------------
+
+QLogic FastLinQ QL4xxxx NICs has support for the following patterns and
+actions.
+
+Patterns:
+
+.. _table_qede_supported_flow_item_types:
+
+.. table:: Item types
+
+   +----+--------------------------------+
+   | #  | Pattern Type                   |
+   +====+================================+
+   | 1  | RTE_FLOW_ITEM_TYPE_IPV4        |
+   +----+--------------------------------+
+   | 2  | RTE_FLOW_ITEM_TYPE_IPV6        |
+   +----+--------------------------------+
+   | 3  | RTE_FLOW_ITEM_TYPE_UDP         |
+   +----+--------------------------------+
+   | 4  | RTE_FLOW_ITEM_TYPE_TCP         |
+   +----+--------------------------------+
+
+Actions:
+
+.. _table_qede_supported_ingress_action_types:
+
+.. table:: Ingress action types
+
+   +----+--------------------------------+
+   | #  | Action Type                    |
+   +====+================================+
+   | 1  | RTE_FLOW_ACTION_TYPE_QUEUE     |
+   +----+--------------------------------+
+   | 2  | RTE_FLOW_ACTION_TYPE_DROP      |
+   +----+--------------------------------+
+
 SR-IOV: Prerequisites and Sample Application Notes
 --------------------------------------------------
 
diff --git a/drivers/net/qede/base/ecore_dev_api.h b/drivers/net/qede/base/ecore_dev_api.h
index 730806321..a99888097 100644
--- a/drivers/net/qede/base/ecore_dev_api.h
+++ b/drivers/net/qede/base/ecore_dev_api.h
@@ -334,6 +334,7 @@ struct ecore_eth_stats_common {
 	u64 rx_bcast_pkts;
 	u64 mftag_filter_discards;
 	u64 mac_filter_discards;
+	u64 gft_filter_drop;
 	u64 tx_ucast_bytes;
 	u64 tx_mcast_bytes;
 	u64 tx_bcast_bytes;
diff --git a/drivers/net/qede/base/ecore_l2.c b/drivers/net/qede/base/ecore_l2.c
index 8b9817eb2..5dcdc84fc 100644
--- a/drivers/net/qede/base/ecore_l2.c
+++ b/drivers/net/qede/base/ecore_l2.c
@@ -1782,6 +1782,8 @@ static void __ecore_get_vport_tstats(struct ecore_hwfn *p_hwfn,
 		HILO_64_REGPAIR(tstats.mftag_filter_discard);
 	p_stats->common.mac_filter_discards +=
 		HILO_64_REGPAIR(tstats.eth_mac_filter_discard);
+	p_stats->common.gft_filter_drop +=
+		HILO_64_REGPAIR(tstats.eth_gft_drop_pkt);
 }
 
 static void __ecore_get_vport_ustats_addrlen(struct ecore_hwfn *p_hwfn,
@@ -2140,9 +2142,7 @@ void ecore_arfs_mode_configure(struct ecore_hwfn *p_hwfn,
 enum _ecore_status_t
 ecore_configure_rfs_ntuple_filter(struct ecore_hwfn *p_hwfn,
 				  struct ecore_spq_comp_cb *p_cb,
-				  dma_addr_t p_addr, u16 length,
-				  u16 qid, u8 vport_id,
-				  bool b_is_add)
+				  struct ecore_ntuple_filter_params *p_params)
 {
 	struct rx_update_gft_filter_data *p_ramrod = OSAL_NULL;
 	struct ecore_spq_entry *p_ent = OSAL_NULL;
@@ -2151,14 +2151,6 @@ ecore_configure_rfs_ntuple_filter(struct ecore_hwfn *p_hwfn,
 	u8 abs_vport_id = 0;
 	enum _ecore_status_t rc = ECORE_NOTIMPL;
 
-	rc = ecore_fw_vport(p_hwfn, vport_id, &abs_vport_id);
-	if (rc != ECORE_SUCCESS)
-		return rc;
-
-	rc = ecore_fw_l2_queue(p_hwfn, qid, &abs_rx_q_id);
-	if (rc != ECORE_SUCCESS)
-		return rc;
-
 	/* Get SPQ entry */
 	OSAL_MEMSET(&init_data, 0, sizeof(init_data));
 	init_data.cid = ecore_spq_get_cid(p_hwfn);
@@ -2180,27 +2172,41 @@ ecore_configure_rfs_ntuple_filter(struct ecore_hwfn *p_hwfn,
 
 	p_ramrod = &p_ent->ramrod.rx_update_gft;
 
-	DMA_REGPAIR_LE(p_ramrod->pkt_hdr_addr, p_addr);
-	p_ramrod->pkt_hdr_length = OSAL_CPU_TO_LE16(length);
+	DMA_REGPAIR_LE(p_ramrod->pkt_hdr_addr, p_params->addr);
+	p_ramrod->pkt_hdr_length = OSAL_CPU_TO_LE16(p_params->length);
 
-	p_ramrod->action_icid_valid = 0;
-	p_ramrod->action_icid = 0;
+	if (p_params->b_is_drop) {
+		p_ramrod->vport_id = OSAL_CPU_TO_LE16(ETH_GFT_TRASHCAN_VPORT);
+	} else {
+		rc = ecore_fw_vport(p_hwfn, p_params->vport_id,
+				    &abs_vport_id);
+		if (rc)
+			return rc;
+
+		if (p_params->qid != ECORE_RFS_NTUPLE_QID_RSS) {
+			rc = ecore_fw_l2_queue(p_hwfn, p_params->qid,
+					       &abs_rx_q_id);
+			if (rc)
+				return rc;
 
-	p_ramrod->rx_qid_valid = 1;
-	p_ramrod->rx_qid = OSAL_CPU_TO_LE16(abs_rx_q_id);
+			p_ramrod->rx_qid_valid = 1;
+			p_ramrod->rx_qid = OSAL_CPU_TO_LE16(abs_rx_q_id);
+		}
+
+		p_ramrod->vport_id = OSAL_CPU_TO_LE16((u16)abs_vport_id);
+	}
 
 	p_ramrod->flow_id_valid = 0;
 	p_ramrod->flow_id = 0;
 
-	p_ramrod->vport_id = OSAL_CPU_TO_LE16((u16)abs_vport_id);
-	p_ramrod->filter_action = b_is_add ? GFT_ADD_FILTER
-					   : GFT_DELETE_FILTER;
+	p_ramrod->filter_action = p_params->b_is_add ? GFT_ADD_FILTER
+						     : GFT_DELETE_FILTER;
 
 	DP_VERBOSE(p_hwfn, ECORE_MSG_SP,
 		   "V[%0x], Q[%04x] - %s filter from 0x%lx [length %04xb]\n",
 		   abs_vport_id, abs_rx_q_id,
-		   b_is_add ? "Adding" : "Removing",
-		   (unsigned long)p_addr, length);
+		   p_params->b_is_add ? "Adding" : "Removing",
+		   (unsigned long)p_params->addr, p_params->length);
 
 	return ecore_spq_post(p_hwfn, p_ent, OSAL_NULL);
 }
diff --git a/drivers/net/qede/base/ecore_l2_api.h b/drivers/net/qede/base/ecore_l2_api.h
index 004fb61ba..acde81fad 100644
--- a/drivers/net/qede/base/ecore_l2_api.h
+++ b/drivers/net/qede/base/ecore_l2_api.h
@@ -448,6 +448,31 @@ void ecore_arfs_mode_configure(struct ecore_hwfn *p_hwfn,
 			       struct ecore_ptt *p_ptt,
 			       struct ecore_arfs_config_params *p_cfg_params);
 
+struct ecore_ntuple_filter_params {
+	/* Physically mapped address containing header of buffer to be used
+	 * as filter.
+	 */
+	dma_addr_t addr;
+
+	/* Length of header in bytes */
+	u16 length;
+
+	/* Relative queue-id to receive classified packet */
+	#define ECORE_RFS_NTUPLE_QID_RSS ((u16)-1)
+	u16 qid;
+
+	/* Identifier can either be according to vport-id or vfid */
+	bool b_is_vf;
+	u8 vport_id;
+	u8 vf_id;
+
+	/* true if this filter is to be added. Else to be removed */
+	bool b_is_add;
+
+	/* If packet needs to be dropped */
+	bool b_is_drop;
+};
+
 /**
  * @brief - ecore_configure_rfs_ntuple_filter
  *
@@ -457,22 +482,12 @@ void ecore_arfs_mode_configure(struct ecore_hwfn *p_hwfn,
  * @params p_cb		Used for ECORE_SPQ_MODE_CB,where client would initialize
  *			it with cookie and callback function address, if not
  *			using this mode then client must pass NULL.
- * @params p_addr	p_addr is an actual packet header that needs to be
- *			filter. It has to mapped with IO to read prior to
- *			calling this, [contains 4 tuples- src ip, dest ip,
- *			src port, dest port].
- * @params length	length of p_addr header up to past the transport header.
- * @params qid		receive packet will be directed to this queue.
- * @params vport_id
- * @params b_is_add	flag to add or remove filter.
- *
+ * @params p_params
  */
 enum _ecore_status_t
 ecore_configure_rfs_ntuple_filter(struct ecore_hwfn *p_hwfn,
 				  struct ecore_spq_comp_cb *p_cb,
-				  dma_addr_t p_addr, u16 length,
-				  u16 qid, u8 vport_id,
-				  bool b_is_add);
+				  struct ecore_ntuple_filter_params *p_params);
 
 /**
  * @brief - ecore_update_eth_rss_ind_table_entry
diff --git a/drivers/net/qede/qede_ethdev.c b/drivers/net/qede/qede_ethdev.c
index 98290fdc7..8064735db 100644
--- a/drivers/net/qede/qede_ethdev.c
+++ b/drivers/net/qede/qede_ethdev.c
@@ -125,6 +125,8 @@ static const struct rte_qede_xstats_name_off qede_xstats_strings[] = {
 		offsetof(struct ecore_eth_stats_common, mftag_filter_discards)},
 	{"rx_mac_filter_discards",
 		offsetof(struct ecore_eth_stats_common, mac_filter_discards)},
+	{"rx_gft_filter_drop",
+		offsetof(struct ecore_eth_stats_common, gft_filter_drop)},
 	{"rx_hw_buffer_truncates",
 		offsetof(struct ecore_eth_stats_common, brb_truncates)},
 	{"rx_hw_buffer_discards",
diff --git a/drivers/net/qede/qede_ethdev.h b/drivers/net/qede/qede_ethdev.h
index 5549d0bf3..b5f93e9fa 100644
--- a/drivers/net/qede/qede_ethdev.h
+++ b/drivers/net/qede/qede_ethdev.h
@@ -179,6 +179,7 @@ struct qede_arfs_entry {
 	uint32_t soft_id; /* unused for now */
 	uint16_t pkt_len; /* actual packet length to match */
 	uint16_t rx_queue; /* queue to be steered to */
+	bool is_drop; /* drop action */
 	const struct rte_memzone *mz; /* mz used to hold L2 frame */
 	struct qede_arfs_tuple tuple;
 	SLIST_ENTRY(qede_arfs_entry) list;
diff --git a/drivers/net/qede/qede_filter.c b/drivers/net/qede/qede_filter.c
index 81509f04b..b7ad59ad6 100644
--- a/drivers/net/qede/qede_filter.c
+++ b/drivers/net/qede/qede_filter.c
@@ -272,6 +272,7 @@ qede_config_arfs_filter(struct rte_eth_dev *eth_dev,
 {
 	struct qede_dev *qdev = QEDE_INIT_QDEV(eth_dev);
 	struct ecore_dev *edev = QEDE_INIT_EDEV(qdev);
+	struct ecore_ntuple_filter_params params;
 	char mz_name[RTE_MEMZONE_NAMESIZE] = {0};
 	struct qede_arfs_entry *tmp = NULL;
 	const struct rte_memzone *mz;
@@ -344,12 +345,18 @@ qede_config_arfs_filter(struct rte_eth_dev *eth_dev,
 		ecore_arfs_mode_configure(p_hwfn, p_hwfn->p_arfs_ptt,
 					  &qdev->arfs_info.arfs);
 	}
+
+	memset(&params, 0, sizeof(params));
+	params.addr = (dma_addr_t)mz->iova;
+	params.length = pkt_len;
+	params.qid = arfs->rx_queue;
+	params.vport_id = 0;
+	params.b_is_add = add;
+	params.b_is_drop = arfs->is_drop;
+
 	/* configure filter with ECORE_SPQ_MODE_EBLOCK */
 	rc = ecore_configure_rfs_ntuple_filter(p_hwfn, NULL,
-					       (dma_addr_t)mz->iova,
-					       pkt_len,
-					       arfs->rx_queue,
-					       0, add);
+					       &params);
 	if (rc == ECORE_SUCCESS) {
 		if (add) {
 			arfs->pkt_len = pkt_len;
@@ -1371,12 +1378,15 @@ qede_flow_parse_actions(struct rte_eth_dev *dev,
 				flow->entry.rx_queue = queue->index;
 
 			break;
-
+		case RTE_FLOW_ACTION_TYPE_DROP:
+			if (flow)
+				flow->entry.is_drop = true;
+			break;
 		default:
 			rte_flow_error_set(error, ENOTSUP,
 					   RTE_FLOW_ERROR_TYPE_ACTION,
 					   actions,
-					   "Action is not supported - only ACTION_TYPE_QUEUE supported");
+					   "Action is not supported - only ACTION_TYPE_QUEUE and ACTION_TYPE_DROP supported");
 			return -rte_errno;
 		}
 	}
-- 
2.17.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [dpdk-dev] [PATCH v1 1/5] net/qede: refactor Rx and Tx queue setup
  2019-09-06  7:32 ` [dpdk-dev] [PATCH v1 1/5] net/qede: refactor Rx and Tx queue setup Shahed Shaikh
@ 2019-09-12 12:34   ` Jerin Jacob
  2019-09-12 14:48     ` Shahed Shaikh
  0 siblings, 1 reply; 8+ messages in thread
From: Jerin Jacob @ 2019-09-12 12:34 UTC (permalink / raw)
  To: Shahed Shaikh; +Cc: dev, Rasesh Mody, Jerin Jacob, GR-Everest-DPDK-Dev, stable

On Fri, Sep 6, 2019 at 1:10 PM Shahed Shaikh <shshaikh@marvell.com> wrote:
>
> This patch refactors Rx and Tx queue setup flow required to allow
> odd number of queues to be configured in next patch.
>
> This is the first patch of the series required to fix an issue
> where qede port initialization in ovs-dpdk fails due to 1 Rx/Tx queue
> configuration. Detailed explanation is given in next patch.
>
> Fixes: 2af14ca79c0a ("net/qede: support 100G")
> Cc: stable@dpdk.org
>
> Signed-off-by: Shahed Shaikh <shshaikh@marvell.com>
> ---

This series looks good to me.
Looks like there is a genuine compilation issue in the qede driver.
http://mails.dpdk.org/archives/test-report/2019-September/096256.html

Please confirm in either case.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [dpdk-dev] [PATCH v1 1/5] net/qede: refactor Rx and Tx queue setup
  2019-09-12 12:34   ` Jerin Jacob
@ 2019-09-12 14:48     ` Shahed Shaikh
  0 siblings, 0 replies; 8+ messages in thread
From: Shahed Shaikh @ 2019-09-12 14:48 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: dev, Rasesh Mody, Jerin Jacob Kollanukkaran, GR-Everest-DPDK-Dev, stable

> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Jerin Jacob
> Sent: Thursday, September 12, 2019 6:04 PM
> To: Shahed Shaikh <shshaikh@marvell.com>
> Cc: dev@dpdk.org; Rasesh Mody <rmody@marvell.com>; Jerin Jacob
> Kollanukkaran <jerinj@marvell.com>; GR-Everest-DPDK-Dev <GR-Everest-DPDK-
> Dev@marvell.com>; stable@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v1 1/5] net/qede: refactor Rx and Tx queue
> setup
> 
> On Fri, Sep 6, 2019 at 1:10 PM Shahed Shaikh <shshaikh@marvell.com> wrote:
> >
> > This patch refactors Rx and Tx queue setup flow required to allow odd
> > number of queues to be configured in next patch.
> >
> > This is the first patch of the series required to fix an issue where
> > qede port initialization in ovs-dpdk fails due to 1 Rx/Tx queue
> > configuration. Detailed explanation is given in next patch.
> >
> > Fixes: 2af14ca79c0a ("net/qede: support 100G")
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Shahed Shaikh <shshaikh@marvell.com>
> > ---
> 
> This series looks good to me.
> Looks like there is genuine compilation issue from qede driver.
> http://mails.dpdk.org/archives/test-report/2019-September/096256.html
> 
> Please confirm in either case.
Hi Jerin,
Sending v2 to fix the compilation failure reported for the clang compiler here -
http://mails.dpdk.org/archives/test-report/2019-September/096256.html

Thanks,
Shahed

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2019-09-12 14:48 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-06  7:32 [dpdk-dev] [PATCH v1 0/5] net/qede: fixes and enhancement Shahed Shaikh
2019-09-06  7:32 ` [dpdk-dev] [PATCH v1 1/5] net/qede: refactor Rx and Tx queue setup Shahed Shaikh
2019-09-12 12:34   ` Jerin Jacob
2019-09-12 14:48     ` Shahed Shaikh
2019-09-06  7:32 ` [dpdk-dev] [PATCH v1 2/5] net/qede: fix ovs-dpdk failure when using odd number of queues on 100Gb mode Shahed Shaikh
2019-09-06  7:32 ` [dpdk-dev] [PATCH v1 3/5] net/qede: fix RSS configuration as per new 100Gb queue allocation method Shahed Shaikh
2019-09-06  7:32 ` [dpdk-dev] [PATCH v1 4/5] net/qede: fix stats flow " Shahed Shaikh
2019-09-06  7:32 ` [dpdk-dev] [PATCH v1 5/5] net/qede: implement rte_flow drop action Shahed Shaikh
