DPDK patches and discussions
* [dpdk-dev] [PATCH 00/17] Enhance mlx5 with Mellanox OFED 3.1
@ 2015-10-05 17:54 Adrien Mazarguil
From: Adrien Mazarguil @ 2015-10-05 17:54 UTC (permalink / raw)
  To: dev

Mellanox OFED 3.1 [1] comes with improved APIs that Mellanox ConnectX-4
(mlx5) adapters can take advantage of, such as:

- Separate post and doorbell operations on all queues.
- Lightweight RX queues called Work Queues (WQs).
- Low-level RSS indirection table and hash key configuration.

This patchset enhances mlx5 with all of these for better performance and
flexibility. Documentation is updated accordingly.

[1] http://www.mellanox.com/page/products_dyn?product_family=26&mtag=linux_sw_drivers
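The first item, separate post and doorbell operations, lets a driver fill
several descriptors and notify the hardware only once per burst. The sketch
below models that idea with a hypothetical in-memory ring; `demo_ring`,
`demo_post()` and `demo_doorbell()` are illustrative names, not Verbs or mlx5
APIs, and a real driver would need a write barrier plus an MMIO doorbell write
where the comment indicates.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_RING_SIZE 8u /* hypothetical ring size, must be a power of two */

struct demo_ring {
	uint64_t desc[DEMO_RING_SIZE]; /* descriptor slots (buffer addresses) */
	unsigned int head;             /* software producer index */
	unsigned int hw_head;          /* producer index last published */
	unsigned int doorbells;        /* number of doorbell writes issued */
};

/* Post: write one descriptor; the hardware is not notified yet.
 * The caller is assumed not to overrun the ring. */
static void demo_post(struct demo_ring *r, uint64_t addr)
{
	r->desc[r->head & (DEMO_RING_SIZE - 1)] = addr;
	r->head++;
}

/* Doorbell: publish every descriptor posted since the last doorbell. */
static void demo_doorbell(struct demo_ring *r)
{
	if (r->hw_head == r->head)
		return; /* nothing new to publish */
	/* Real hardware: write barrier, then MMIO write of r->head. */
	r->hw_head = r->head;
	r->doorbells++;
}

/* Post a burst of n buffers, then ring the doorbell a single time. */
static void demo_post_burst(struct demo_ring *r, const uint64_t *addrs,
			    unsigned int n)
{
	unsigned int i;

	assert(n <= DEMO_RING_SIZE);
	for (i = 0; i != n; ++i)
		demo_post(r, addrs[i]);
	demo_doorbell(r);
}
```

With a combined post-and-notify call, the same four-buffer burst would cost
four doorbell writes instead of one.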

Adrien Mazarguil (8):
  mlx5: use fast Verbs interface for scattered RX operation
  mlx5: get rid of the WR structure in RX queue elements
  mlx5: refactor RX code for the new Verbs RSS API
  mlx5: restore allmulti and promisc modes after device restart
  app/testpmd: fix missing initialization in the RSS hash show command
  mlx5: use experimental flows in hash RX queues
  mlx5: enable multi packet send WR in TX CQ
  doc: update mlx5 documentation

Nelio Laranjeiro (5):
  mlx5: adapt indirection table size depending on RX queues number
  mlx5: add RSS hash update/get
  mlx5: use one RSS hash key per flow type
  app/testpmd: add missing type to RSS hash commands
  mlx5: remove normal MAC flows when enabling promiscuous mode

Olga Shern (3):
  mlx5: use separate indirection table for default hash RX queue
  mlx5: define specific flow steering rules for each hash RX QP
  mlx5: use alternate method to configure promiscuous mode

Yaacov Hazan (1):
  mlx5: fix compilation error with GCC < 4.6

 app/test-pmd/cmdline.c                      |  45 +-
 app/test-pmd/config.c                       |  69 +-
 app/test-pmd/testpmd.h                      |   6 +-
 doc/guides/nics/mlx5.rst                    |  26 +
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |   2 +-
 drivers/net/mlx5/Makefile                   |  10 +-
 drivers/net/mlx5/mlx5.c                     |  69 +-
 drivers/net/mlx5/mlx5.h                     |  56 +-
 drivers/net/mlx5/mlx5_defs.h                |   3 +
 drivers/net/mlx5/mlx5_ethdev.c              |  53 +-
 drivers/net/mlx5/mlx5_mac.c                 | 214 ++++---
 drivers/net/mlx5/mlx5_rss.c                 | 213 +++++++
 drivers/net/mlx5/mlx5_rxmode.c              | 327 +++++-----
 drivers/net/mlx5/mlx5_rxq.c                 | 938 ++++++++++++++++++----------
 drivers/net/mlx5/mlx5_rxtx.c                |  68 +-
 drivers/net/mlx5/mlx5_rxtx.h                |  88 ++-
 drivers/net/mlx5/mlx5_trigger.c             |  86 +--
 drivers/net/mlx5/mlx5_txq.c                 |   7 +
 drivers/net/mlx5/mlx5_utils.h               |   2 -
 drivers/net/mlx5/mlx5_vlan.c                |  44 +-
 20 files changed, 1451 insertions(+), 875 deletions(-)
 create mode 100644 drivers/net/mlx5/mlx5_rss.c

-- 
2.1.0


* [dpdk-dev] [PATCH 01/17] mlx5: use fast Verbs interface for scattered RX operation
@ 2015-10-05 17:54 Adrien Mazarguil
From: Adrien Mazarguil @ 2015-10-05 17:54 UTC (permalink / raw)
  To: dev

This commit updates mlx5_rx_burst_sp() to use the fast Verbs interface for
posting RX buffers, just like mlx5_rx_burst(). Doing so avoids a loop in
libmlx5 and an indirect function call through libibverbs.

Note: recv_sg_list() is not implemented in the QP burst API; this commit
only prepares the transition to the WQ-based API.
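The two posting styles can be contrasted with a small model. The removed code
chained completed WRs with a tail pointer (`*next = wr; next = &wr->next;`)
and handed the whole chain to ibv_post_recv(), which then has to walk it; the
new code calls recv_sg_list() through a function pointer once per element.
All `demo_*` types and functions below are hypothetical stand-ins for
struct ibv_sge, struct ibv_recv_wr and the QP burst interface, and they only
count work rather than model real libmlx5 costs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for struct ibv_sge and struct ibv_recv_wr. */
struct demo_sge { uint64_t addr; uint32_t length; uint32_t lkey; };
struct demo_wr { struct demo_wr *next; struct demo_sge *sg_list; int num_sge; };

/* Hypothetical QP that only counts what is posted to it. */
struct demo_qp { unsigned int calls; unsigned int posted_sges; };

/* Old style: a generic post routine that must walk the WR chain. */
static int demo_post_recv(struct demo_qp *qp, struct demo_wr *wr)
{
	qp->calls++;
	for (; wr != NULL; wr = wr->next)
		qp->posted_sges += (unsigned int)wr->num_sge;
	return 0;
}

/* Chain n WRs using the tail-pointer idiom removed by the patch. */
static struct demo_wr *demo_chain(struct demo_wr *wrs, unsigned int n)
{
	struct demo_wr head;
	struct demo_wr **next = &head.next;
	unsigned int i;

	for (i = 0; i != n; ++i) {
		*next = &wrs[i];
		next = &wrs[i].next;
	}
	*next = NULL; /* terminate the chain */
	return head.next;
}

/* New style: a function table obtained once at setup time, then
 * called directly for each element (mirrors if_qp->recv_sg_list()). */
struct demo_qp_burst_if {
	int (*recv_sg_list)(struct demo_qp *qp, struct demo_sge *sg,
			    uint32_t num);
};

static int demo_recv_sg_list(struct demo_qp *qp, struct demo_sge *sg,
			     uint32_t num)
{
	(void)sg; /* a real implementation would write descriptors here */
	qp->calls++;
	qp->posted_sges += num;
	return 0;
}

static const struct demo_qp_burst_if demo_if = {
	.recv_sg_list = demo_recv_sg_list,
};
```

In this model both styles post the same number of SGEs; the difference is
that the per-element call needs no chain construction or traversal.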

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 drivers/net/mlx5/mlx5_rxtx.c | 33 ++++++++++-----------------------
 1 file changed, 10 insertions(+), 23 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 668aff0..8db4f3f 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -600,9 +600,6 @@ mlx5_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 	struct rxq_elt_sp (*elts)[rxq->elts_n] = rxq->elts.sp;
 	const unsigned int elts_n = rxq->elts_n;
 	unsigned int elts_head = rxq->elts_head;
-	struct ibv_recv_wr head;
-	struct ibv_recv_wr **next = &head.next;
-	struct ibv_recv_wr *bad_wr;
 	unsigned int i;
 	unsigned int pkts_ret = 0;
 	int ret;
@@ -660,9 +657,6 @@ mlx5_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 				/* Increment dropped packets counter. */
 				++rxq->stats.idropped;
 #endif
-				/* Link completed WRs together for repost. */
-				*next = wr;
-				next = &wr->next;
 				goto repost;
 			}
 			ret = wc.byte_len;
@@ -671,9 +665,6 @@ mlx5_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 			break;
 		len = ret;
 		pkt_buf_len = len;
-		/* Link completed WRs together for repost. */
-		*next = wr;
-		next = &wr->next;
 		/*
 		 * Replace spent segments with new ones, concatenate and
 		 * return them as pkt_buf.
@@ -770,26 +761,22 @@ mlx5_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 		rxq->stats.ibytes += pkt_buf_len;
 #endif
 repost:
+		ret = rxq->if_qp->recv_sg_list(rxq->qp,
+					       elt->sges,
+					       RTE_DIM(elt->sges));
+		if (unlikely(ret)) {
+			/* Inability to repost WRs is fatal. */
+			DEBUG("%p: recv_sg_list(): failed (ret=%d)",
+			      (void *)rxq->priv,
+			      ret);
+			abort();
+		}
 		if (++elts_head >= elts_n)
 			elts_head = 0;
 		continue;
 	}
 	if (unlikely(i == 0))
 		return 0;
-	*next = NULL;
-	/* Repost WRs. */
-#ifdef DEBUG_RECV
-	DEBUG("%p: reposting %d WRs", (void *)rxq, i);
-#endif
-	ret = ibv_post_recv(rxq->qp, head.next, &bad_wr);
-	if (unlikely(ret)) {
-		/* Inability to repost WRs is fatal. */
-		DEBUG("%p: ibv_post_recv(): failed for WR %p: %s",
-		      (void *)rxq->priv,
-		      (void *)bad_wr,
-		      strerror(ret));
-		abort();
-	}
 	rxq->elts_head = elts_head;
 #ifdef MLX5_PMD_SOFT_COUNTERS
 	/* Increment packets counter. */
-- 
2.1.0


* [dpdk-dev] [PATCH 02/17] mlx5: get rid of the WR structure in RX queue elements
@ 2015-10-05 17:54 Adrien Mazarguil
From: Adrien Mazarguil @ 2015-10-05 17:54 UTC (permalink / raw)
  To: dev

Removing this structure significantly reduces the size of SG and non-SG RX
queue elements, which improves performance.

A nice side effect is that the full mbuf pointer is now stored in
struct rxq_elt instead of relying on the WR ID data offset hack.
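The WR ID hack packed a 32-bit element index and a 16-bit offset (between the
mbuf header and its data) into the 64-bit wr_id, then recovered the mbuf
pointer by subtracting that offset from the SGE address. The sketch below
restates that scheme outside the driver next to the direct pointer that
replaces it; the `demo_*` names are hypothetical, and the layout is copied
from the wr_id_t union removed below. It also shows why the old code needed
an EOVERFLOW path: the scheme fails whenever the offset does not fit in
16 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Same layout as the removed wr_id_t: index + offset in 64 bits. */
typedef union {
	struct {
		uint32_t id;
		uint16_t offset;
	} data;
	uint64_t raw;
} demo_wr_id_t;

/* Compile-time size check: negative array size if the union is not
 * exactly 64 bits (same trick as the removed wr_id_t_check()). */
typedef char demo_wr_id_size_check
	[(sizeof(demo_wr_id_t) == sizeof(uint64_t)) ? 1 : -1];

/* Old scheme: encode the element index and the header-to-data offset. */
static int demo_encode(uint64_t *wr_id, uint32_t idx,
		       uintptr_t mbuf, uintptr_t data)
{
	demo_wr_id_t w;
	uintptr_t off = data - mbuf;

	assert(data >= mbuf);
	if (off > UINT16_MAX)
		return -1; /* does not fit: the old code returned EOVERFLOW */
	w.raw = 0;
	w.data.id = idx;
	w.data.offset = (uint16_t)off;
	*wr_id = w.raw;
	return 0;
}

/* Old scheme: recover the mbuf address from the SGE data address. */
static uintptr_t demo_decode_mbuf(uint64_t wr_id, uintptr_t data)
{
	demo_wr_id_t w;

	w.raw = wr_id;
	return data - w.data.offset;
}

/* New scheme: keep the mbuf pointer right next to the SGE address. */
struct demo_elt {
	uint64_t sge_addr; /* stands in for struct ibv_sge */
	void *buf;         /* mbuf owning that SGE */
};
```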

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Olga Shern <olgas@mellanox.com>
Signed-off-by: Or Ami <ora@mellanox.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 drivers/net/mlx5/mlx5.h       |  18 -----
 drivers/net/mlx5/mlx5_rxq.c   | 162 ++++++++++++++++++++----------------------
 drivers/net/mlx5/mlx5_rxtx.c  |  33 +++------
 drivers/net/mlx5/mlx5_rxtx.h  |   4 +-
 drivers/net/mlx5/mlx5_utils.h |   2 -
 5 files changed, 87 insertions(+), 132 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 459dc3d..a818703 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -118,24 +118,6 @@ struct priv {
 	rte_spinlock_t lock; /* Lock for control functions. */
 };
 
-/* Work Request ID data type (64 bit). */
-typedef union {
-	struct {
-		uint32_t id;
-		uint16_t offset;
-	} data;
-	uint64_t raw;
-} wr_id_t;
-
-/* Compile-time check. */
-static inline void wr_id_t_check(void)
-{
-	wr_id_t check[1 + (2 * -!(sizeof(wr_id_t) == sizeof(uint64_t)))];
-
-	(void)check;
-	(void)wr_id_t_check;
-}
-
 /**
  * Lock private structure to protect it from concurrent access in the
  * control path.
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 8cfad17..c938d2d 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -97,16 +97,10 @@ rxq_alloc_elts_sp(struct rxq *rxq, unsigned int elts_n,
 	for (i = 0; (i != elts_n); ++i) {
 		unsigned int j;
 		struct rxq_elt_sp *elt = &(*elts)[i];
-		struct ibv_recv_wr *wr = &elt->wr;
 		struct ibv_sge (*sges)[RTE_DIM(elt->sges)] = &elt->sges;
 
 		/* These two arrays must have the same size. */
 		assert(RTE_DIM(elt->sges) == RTE_DIM(elt->bufs));
-		/* Configure WR. */
-		wr->wr_id = i;
-		wr->next = &(*elts)[(i + 1)].wr;
-		wr->sg_list = &(*sges)[0];
-		wr->num_sge = RTE_DIM(*sges);
 		/* For each SGE (segment). */
 		for (j = 0; (j != RTE_DIM(elt->bufs)); ++j) {
 			struct ibv_sge *sge = &(*sges)[j];
@@ -149,8 +143,6 @@ rxq_alloc_elts_sp(struct rxq *rxq, unsigned int elts_n,
 			assert(sge->length == rte_pktmbuf_tailroom(buf));
 		}
 	}
-	/* The last WR pointer must be NULL. */
-	(*elts)[(i - 1)].wr.next = NULL;
 	DEBUG("%p: allocated and configured %u WRs (%zu segments)",
 	      (void *)rxq, elts_n, (elts_n * RTE_DIM((*elts)[0].sges)));
 	rxq->elts_n = elts_n;
@@ -242,7 +234,6 @@ rxq_alloc_elts(struct rxq *rxq, unsigned int elts_n, struct rte_mbuf **pool)
 	/* For each WR (packet). */
 	for (i = 0; (i != elts_n); ++i) {
 		struct rxq_elt *elt = &(*elts)[i];
-		struct ibv_recv_wr *wr = &elt->wr;
 		struct ibv_sge *sge = &(*elts)[i].sge;
 		struct rte_mbuf *buf;
 
@@ -258,16 +249,7 @@ rxq_alloc_elts(struct rxq *rxq, unsigned int elts_n, struct rte_mbuf **pool)
 			ret = ENOMEM;
 			goto error;
 		}
-		/* Configure WR. Work request ID contains its own index in
-		 * the elts array and the offset between SGE buffer header and
-		 * its data. */
-		WR_ID(wr->wr_id).id = i;
-		WR_ID(wr->wr_id).offset =
-			(((uintptr_t)buf->buf_addr + RTE_PKTMBUF_HEADROOM) -
-			 (uintptr_t)buf);
-		wr->next = &(*elts)[(i + 1)].wr;
-		wr->sg_list = sge;
-		wr->num_sge = 1;
+		elt->buf = buf;
 		/* Headroom is reserved by rte_pktmbuf_alloc(). */
 		assert(DATA_OFF(buf) == RTE_PKTMBUF_HEADROOM);
 		/* Buffer is supposed to be empty. */
@@ -282,21 +264,7 @@ rxq_alloc_elts(struct rxq *rxq, unsigned int elts_n, struct rte_mbuf **pool)
 		sge->lkey = rxq->mr->lkey;
 		/* Redundant check for tailroom. */
 		assert(sge->length == rte_pktmbuf_tailroom(buf));
-		/* Make sure elts index and SGE mbuf pointer can be deduced
-		 * from WR ID. */
-		if ((WR_ID(wr->wr_id).id != i) ||
-		    ((void *)((uintptr_t)sge->addr -
-			WR_ID(wr->wr_id).offset) != buf)) {
-			ERROR("%p: cannot store index and offset in WR ID",
-			      (void *)rxq);
-			sge->addr = 0;
-			rte_pktmbuf_free(buf);
-			ret = EOVERFLOW;
-			goto error;
-		}
 	}
-	/* The last WR pointer must be NULL. */
-	(*elts)[(i - 1)].wr.next = NULL;
 	DEBUG("%p: allocated and configured %u single-segment WRs",
 	      (void *)rxq, elts_n);
 	rxq->elts_n = elts_n;
@@ -309,14 +277,10 @@ error:
 		assert(pool == NULL);
 		for (i = 0; (i != RTE_DIM(*elts)); ++i) {
 			struct rxq_elt *elt = &(*elts)[i];
-			struct rte_mbuf *buf;
+			struct rte_mbuf *buf = elt->buf;
 
-			if (elt->sge.addr == 0)
-				continue;
-			assert(WR_ID(elt->wr.wr_id).id == i);
-			buf = (void *)((uintptr_t)elt->sge.addr -
-				WR_ID(elt->wr.wr_id).offset);
-			rte_pktmbuf_free_seg(buf);
+			if (buf != NULL)
+				rte_pktmbuf_free_seg(buf);
 		}
 		rte_free(elts);
 	}
@@ -345,14 +309,10 @@ rxq_free_elts(struct rxq *rxq)
 		return;
 	for (i = 0; (i != RTE_DIM(*elts)); ++i) {
 		struct rxq_elt *elt = &(*elts)[i];
-		struct rte_mbuf *buf;
+		struct rte_mbuf *buf = elt->buf;
 
-		if (elt->sge.addr == 0)
-			continue;
-		assert(WR_ID(elt->wr.wr_id).id == i);
-		buf = (void *)((uintptr_t)elt->sge.addr -
-			WR_ID(elt->wr.wr_id).offset);
-		rte_pktmbuf_free_seg(buf);
+		if (buf != NULL)
+			rte_pktmbuf_free_seg(buf);
 	}
 	rte_free(elts);
 }
@@ -552,7 +512,6 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
 	struct rte_mbuf **pool;
 	unsigned int i, k;
 	struct ibv_exp_qp_attr mod;
-	struct ibv_recv_wr *bad_wr;
 	int err;
 	int parent = (rxq == &priv->rxq_parent);
 
@@ -673,11 +632,8 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
 
 		for (i = 0; (i != RTE_DIM(*elts)); ++i) {
 			struct rxq_elt *elt = &(*elts)[i];
-			struct rte_mbuf *buf = (void *)
-				((uintptr_t)elt->sge.addr -
-				 WR_ID(elt->wr.wr_id).offset);
+			struct rte_mbuf *buf = elt->buf;
 
-			assert(WR_ID(elt->wr.wr_id).id == i);
 			pool[k++] = buf;
 		}
 	}
@@ -701,17 +657,36 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
 	rxq->elts_n = 0;
 	rte_free(rxq->elts.sp);
 	rxq->elts.sp = NULL;
-	/* Post WRs. */
-	err = ibv_post_recv(tmpl.qp,
-			    (tmpl.sp ?
-			     &(*tmpl.elts.sp)[0].wr :
-			     &(*tmpl.elts.no_sp)[0].wr),
-			    &bad_wr);
+	/* Post SGEs. */
+	assert(tmpl.if_qp != NULL);
+	if (tmpl.sp) {
+		struct rxq_elt_sp (*elts)[tmpl.elts_n] = tmpl.elts.sp;
+
+		for (i = 0; (i != RTE_DIM(*elts)); ++i) {
+			err = tmpl.if_qp->recv_sg_list
+				(tmpl.qp,
+				 (*elts)[i].sges,
+				 RTE_DIM((*elts)[i].sges));
+			if (err)
+				break;
+		}
+	} else {
+		struct rxq_elt (*elts)[tmpl.elts_n] = tmpl.elts.no_sp;
+
+		for (i = 0; (i != RTE_DIM(*elts)); ++i) {
+			err = tmpl.if_qp->recv_burst(
+				tmpl.qp,
+				&(*elts)[i].sge,
+				1);
+			if (err)
+				break;
+		}
+	}
 	if (err) {
-		ERROR("%p: ibv_post_recv() failed for WR %p: %s",
-		      (void *)dev,
-		      (void *)bad_wr,
-		      strerror(err));
+		ERROR("%p: failed to post SGEs with error %d",
+		      (void *)dev, err);
+		/* Set err because it does not contain a valid errno value. */
+		err = EIO;
 		goto skip_rtr;
 	}
 	mod = (struct ibv_exp_qp_attr){
@@ -764,10 +739,10 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 		struct ibv_exp_res_domain_init_attr rd;
 	} attr;
 	enum ibv_exp_query_intf_status status;
-	struct ibv_recv_wr *bad_wr;
 	struct rte_mbuf *buf;
 	int ret = 0;
 	int parent = (rxq == &priv->rxq_parent);
+	unsigned int i;
 
 	(void)conf; /* Thresholds configuration (ignored). */
 	/*
@@ -903,28 +878,7 @@ skip_mr:
 		      (void *)dev, strerror(ret));
 		goto error;
 	}
-	ret = ibv_post_recv(tmpl.qp,
-			    (tmpl.sp ?
-			     &(*tmpl.elts.sp)[0].wr :
-			     &(*tmpl.elts.no_sp)[0].wr),
-			    &bad_wr);
-	if (ret) {
-		ERROR("%p: ibv_post_recv() failed for WR %p: %s",
-		      (void *)dev,
-		      (void *)bad_wr,
-		      strerror(ret));
-		goto error;
-	}
 skip_alloc:
-	mod = (struct ibv_exp_qp_attr){
-		.qp_state = IBV_QPS_RTR
-	};
-	ret = ibv_exp_modify_qp(tmpl.qp, &mod, IBV_EXP_QP_STATE);
-	if (ret) {
-		ERROR("%p: QP state to IBV_QPS_RTR failed: %s",
-		      (void *)dev, strerror(ret));
-		goto error;
-	}
 	/* Save port ID. */
 	tmpl.port_id = dev->data->port_id;
 	DEBUG("%p: RTE port ID: %u", (void *)rxq, tmpl.port_id);
@@ -950,6 +904,46 @@ skip_alloc:
 		      (void *)dev, status);
 		goto error;
 	}
+	/* Post SGEs. */
+	if (!parent && tmpl.sp) {
+		struct rxq_elt_sp (*elts)[tmpl.elts_n] = tmpl.elts.sp;
+
+		for (i = 0; (i != RTE_DIM(*elts)); ++i) {
+			ret = tmpl.if_qp->recv_sg_list
+				(tmpl.qp,
+				 (*elts)[i].sges,
+				 RTE_DIM((*elts)[i].sges));
+			if (ret)
+				break;
+		}
+	} else if (!parent) {
+		struct rxq_elt (*elts)[tmpl.elts_n] = tmpl.elts.no_sp;
+
+		for (i = 0; (i != RTE_DIM(*elts)); ++i) {
+			ret = tmpl.if_qp->recv_burst(
+				tmpl.qp,
+				&(*elts)[i].sge,
+				1);
+			if (ret)
+				break;
+		}
+	}
+	if (ret) {
+		ERROR("%p: failed to post SGEs with error %d",
+		      (void *)dev, ret);
+		/* Set ret because it does not contain a valid errno value. */
+		ret = EIO;
+		goto error;
+	}
+	mod = (struct ibv_exp_qp_attr){
+		.qp_state = IBV_QPS_RTR
+	};
+	ret = ibv_exp_modify_qp(tmpl.qp, &mod, IBV_EXP_QP_STATE);
+	if (ret) {
+		ERROR("%p: QP state to IBV_QPS_RTR failed: %s",
+		      (void *)dev, strerror(ret));
+		goto error;
+	}
 	/* Clean up rxq in case we're reinitializing it. */
 	DEBUG("%p: cleaning-up old rxq just in case", (void *)rxq);
 	rxq_cleanup(rxq);
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 8db4f3f..06712cb 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -610,8 +610,6 @@ mlx5_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 		return 0;
 	for (i = 0; (i != pkts_n); ++i) {
 		struct rxq_elt_sp *elt = &(*elts)[elts_head];
-		struct ibv_recv_wr *wr = &elt->wr;
-		uint64_t wr_id = wr->wr_id;
 		unsigned int len;
 		unsigned int pkt_buf_len;
 		struct rte_mbuf *pkt_buf = NULL; /* Buffer returned in pkts. */
@@ -621,12 +619,6 @@ mlx5_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 		uint32_t flags;
 
 		/* Sanity checks. */
-#ifdef NDEBUG
-		(void)wr_id;
-#endif
-		assert(wr_id < rxq->elts_n);
-		assert(wr->sg_list == elt->sges);
-		assert(wr->num_sge == RTE_DIM(elt->sges));
 		assert(elts_head < rxq->elts_n);
 		assert(rxq->elts_head < rxq->elts_n);
 		ret = rxq->if_cq->poll_length_flags(rxq->cq, NULL, NULL,
@@ -675,6 +667,7 @@ mlx5_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 			struct rte_mbuf *rep;
 			unsigned int seg_tailroom;
 
+			assert(seg != NULL);
 			/*
 			 * Fetch initial bytes of packet descriptor into a
 			 * cacheline while allocating rep.
@@ -686,9 +679,8 @@ mlx5_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 				 * Unable to allocate a replacement mbuf,
 				 * repost WR.
 				 */
-				DEBUG("rxq=%p, wr_id=%" PRIu64 ":"
-				      " can't allocate a new mbuf",
-				      (void *)rxq, wr_id);
+				DEBUG("rxq=%p: can't allocate a new mbuf",
+				      (void *)rxq);
 				if (pkt_buf != NULL) {
 					*pkt_buf_next = NULL;
 					rte_pktmbuf_free(pkt_buf);
@@ -818,18 +810,13 @@ mlx5_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 		return mlx5_rx_burst_sp(dpdk_rxq, pkts, pkts_n);
 	for (i = 0; (i != pkts_n); ++i) {
 		struct rxq_elt *elt = &(*elts)[elts_head];
-		struct ibv_recv_wr *wr = &elt->wr;
-		uint64_t wr_id = wr->wr_id;
 		unsigned int len;
-		struct rte_mbuf *seg = (void *)((uintptr_t)elt->sge.addr -
-			WR_ID(wr_id).offset);
+		struct rte_mbuf *seg = elt->buf;
 		struct rte_mbuf *rep;
 		uint32_t flags;
 
 		/* Sanity checks. */
-		assert(WR_ID(wr_id).id < rxq->elts_n);
-		assert(wr->sg_list == &elt->sge);
-		assert(wr->num_sge == 1);
+		assert(seg != NULL);
 		assert(elts_head < rxq->elts_n);
 		assert(rxq->elts_head < rxq->elts_n);
 		ret = rxq->if_cq->poll_length_flags(rxq->cq, NULL, NULL,
@@ -880,9 +867,8 @@ mlx5_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 			 * Unable to allocate a replacement mbuf,
 			 * repost WR.
 			 */
-			DEBUG("rxq=%p, wr_id=%" PRIu32 ":"
-			      " can't allocate a new mbuf",
-			      (void *)rxq, WR_ID(wr_id).id);
+			DEBUG("rxq=%p: can't allocate a new mbuf",
+			      (void *)rxq);
 			/* Increment out of memory counters. */
 			++rxq->stats.rx_nombuf;
 			++rxq->priv->dev->data->rx_mbuf_alloc_failed;
@@ -892,10 +878,7 @@ mlx5_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 		/* Reconfigure sge to use rep instead of seg. */
 		elt->sge.addr = (uintptr_t)rep->buf_addr + RTE_PKTMBUF_HEADROOM;
 		assert(elt->sge.lkey == rxq->mr->lkey);
-		WR_ID(wr->wr_id).offset =
-			(((uintptr_t)rep->buf_addr + RTE_PKTMBUF_HEADROOM) -
-			 (uintptr_t)rep);
-		assert(WR_ID(wr->wr_id).id == WR_ID(wr_id).id);
+		elt->buf = rep;
 
 		/* Add SGE to array for repost. */
 		sges[i] = elt->sge;
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 0eb1e98..aec67f6 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -81,16 +81,14 @@ struct mlx5_txq_stats {
 
 /* RX element (scattered packets). */
 struct rxq_elt_sp {
-	struct ibv_recv_wr wr; /* Work Request. */
 	struct ibv_sge sges[MLX5_PMD_SGE_WR_N]; /* Scatter/Gather Elements. */
 	struct rte_mbuf *bufs[MLX5_PMD_SGE_WR_N]; /* SGEs buffers. */
 };
 
 /* RX element. */
 struct rxq_elt {
-	struct ibv_recv_wr wr; /* Work Request. */
 	struct ibv_sge sge; /* Scatter/Gather Element. */
-	/* mbuf pointer is derived from WR_ID(wr.wr_id).offset. */
+	struct rte_mbuf *buf; /* SGE buffer. */
 };
 
 struct priv;
diff --git a/drivers/net/mlx5/mlx5_utils.h b/drivers/net/mlx5/mlx5_utils.h
index 8ff075b..f1fad18 100644
--- a/drivers/net/mlx5/mlx5_utils.h
+++ b/drivers/net/mlx5/mlx5_utils.h
@@ -161,6 +161,4 @@ pmd_drv_log_basename(const char *s)
 	\
 	snprintf(name, sizeof(name), __VA_ARGS__)
 
-#define WR_ID(o) (((wr_id_t *)&(o))->data)
-
 #endif /* RTE_PMD_MLX5_UTILS_H_ */
-- 
2.1.0


* [dpdk-dev] [PATCH 03/17] mlx5: refactor RX code for the new Verbs RSS API
@ 2015-10-05 17:54 Adrien Mazarguil
From: Adrien Mazarguil @ 2015-10-05 17:54 UTC (permalink / raw)
  To: dev

The new Verbs RSS API is lower-level than the previous one and much more
flexible, but it requires RX queues to use Work Queues (WQs) internally
instead of Queue Pairs (QPs). WQs are grouped in an indirection table that
is used by a new kind of hash RX QP.

Hash RX QPs and the indirection table together replace the parent RSS QP
while WQs are mostly similar to child QPs.
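As an illustration of how a hash RX QP and an indirection table cooperate,
the sketch below hashes an IPv4/TCP tuple and uses the low-order bits of the
result to pick a WQ index from the table. It assumes the Toeplitz hash (the
usual RSS hash function, not named in this patch), the well-known 40-byte
example key and test tuple from Microsoft's RSS verification documentation,
and a 128-entry table filled round-robin; `demo_toeplitz()`,
`demo_fill_ind_table()` and `demo_select_wq()` are hypothetical helpers, not
Verbs calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Well-known 40-byte example RSS key (assumption, see lead-in). */
static const uint8_t demo_rss_key[40] = {
	0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
	0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
	0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
	0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
	0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa,
};

/* Toeplitz hash: for every set input bit (MSB first), XOR in the
 * 32-bit key window aligned with that bit, then slide the window. */
static uint32_t demo_toeplitz(const uint8_t *key, size_t key_len,
			      const uint8_t *data, size_t len)
{
	uint32_t hash = 0;
	uint32_t window;
	size_t i;
	int b;

	assert(key_len >= len + 4); /* key must cover all input bits + 32 */
	window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
		 ((uint32_t)key[2] << 8) | key[3];
	for (i = 0; i != len; ++i) {
		uint8_t next = key[i + 4]; /* key bits shifted in next */

		for (b = 7; b >= 0; --b) {
			if (data[i] & (1u << b))
				hash ^= window;
			window = (window << 1) | ((next >> b) & 1u);
		}
	}
	return hash;
}

#define DEMO_IND_TABLE_SIZE 128u /* assumed power of two */

/* Fill the indirection table round-robin with WQ indices. */
static void demo_fill_ind_table(uint16_t *table, unsigned int wqs_n)
{
	unsigned int i;

	assert(wqs_n > 0);
	for (i = 0; i != DEMO_IND_TABLE_SIZE; ++i)
		table[i] = (uint16_t)(i % wqs_n);
}

/* Select a WQ from the low-order bits of the hash. */
static uint16_t demo_select_wq(const uint16_t *table, uint32_t hash)
{
	return table[hash & (DEMO_IND_TABLE_SIZE - 1)];
}
```

The tuple is fed as source address, destination address, source port,
destination port, all in network byte order. With four WQs, the test tuple's
hash (0x51ccc178) selects table entry 0x78 (120), which maps to WQ 0.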

The RSS hash key is not configurable yet.

Summary of changes:

- Individual DPDK RX queues no longer store flow properties; this
  information is now part of the hash RX queues.
- All functions that affected the parent queue when RSS was enabled (or the
  basic queues otherwise) now affect hash RX queues instead.
- Hash RX queues are also used when a single DPDK RX queue is configured (no
  RSS), which removes that special case.
- Hash RX queues and the indirection table are created/destroyed when the
  device is started/stopped, in addition to creating/destroying flows.
- Unlike QPs, WQs must be moved to the "ready" state before RX buffers are
  posted; otherwise the posted buffers are ignored.
- Resource domain information is added to WQs for better performance.
- CQs are no longer resized when switching between non-SG and SG modes, as
  resizing does not work correctly with WQs. The largest possible size is
  used instead, since the CQ size does not have to match the number of
  elements in the RX queue. The same applies to the maximum number of
  outstanding WRs in a WQ (max_recv_wr).
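The ordering constraint listed above (a WQ must reach the ready state before
its RX buffers are posted) can be modelled with a small state machine. This
is a hypothetical sketch: `demo_wq`, `demo_wq_post()`, `demo_wq_modify()` and
`demo_wq_setup()` are illustrative names, and silently dropping early posts
is a simplification of the behaviour described in this commit message, not a
description of how the Verbs library reports it.

```c
#include <assert.h>

enum demo_wq_state { DEMO_WQ_RESET, DEMO_WQ_RDY };

struct demo_wq {
	enum demo_wq_state state;
	unsigned int posted; /* buffers the hardware will actually use */
};

/* Buffers posted while the WQ is not ready are ignored. */
static void demo_wq_post(struct demo_wq *wq, unsigned int n)
{
	if (wq->state == DEMO_WQ_RDY)
		wq->posted += n;
}

static void demo_wq_modify(struct demo_wq *wq, enum demo_wq_state state)
{
	wq->state = state;
}

/* Correct setup order for a WQ: move it to ready first, then post. */
static void demo_wq_setup(struct demo_wq *wq, unsigned int n)
{
	assert(wq->state == DEMO_WQ_RESET);
	demo_wq_modify(wq, DEMO_WQ_RDY);
	demo_wq_post(wq, n);
}
```

Posting first and switching state afterwards, as was possible with QPs,
leaves the WQ ready but with no usable buffers in this model.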

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Olga Shern <olgas@mellanox.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Or Ami <ora@mellanox.com>
---
 drivers/net/mlx5/Makefile       |   4 -
 drivers/net/mlx5/mlx5.c         |  38 +--
 drivers/net/mlx5/mlx5.h         |  25 +-
 drivers/net/mlx5/mlx5_ethdev.c  |  53 +---
 drivers/net/mlx5/mlx5_mac.c     | 186 +++++++------
 drivers/net/mlx5/mlx5_rxmode.c  | 295 +++++++++++----------
 drivers/net/mlx5/mlx5_rxq.c     | 559 +++++++++++++++++++++-------------------
 drivers/net/mlx5/mlx5_rxtx.c    |   4 +-
 drivers/net/mlx5/mlx5_rxtx.h    |  23 +-
 drivers/net/mlx5/mlx5_trigger.c |  86 ++-----
 drivers/net/mlx5/mlx5_vlan.c    |  44 +---
 11 files changed, 641 insertions(+), 676 deletions(-)

diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index 8b1e32b..938f924 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -112,10 +112,6 @@ endif
 mlx5_autoconf.h: $(RTE_SDK)/scripts/auto-config-h.sh
 	$Q $(RM) -f -- '$@'
 	$Q sh -- '$<' '$@' \
-		RSS_SUPPORT \
-		infiniband/verbs.h \
-		enum IBV_EXP_DEVICE_UD_RSS $(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
 		HAVE_EXP_QUERY_DEVICE \
 		infiniband/verbs.h \
 		type 'struct ibv_exp_device_attr' $(AUTOCONF_OUTPUT)
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 47070f8..a316989 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -85,6 +85,13 @@ mlx5_dev_close(struct rte_eth_dev *dev)
 	DEBUG("%p: closing device \"%s\"",
 	      (void *)dev,
 	      ((priv->ctx != NULL) ? priv->ctx->device->name : ""));
+	/* In case mlx5_dev_stop() has not been called. */
+	if (priv->started) {
+		priv_allmulticast_disable(priv);
+		priv_promiscuous_disable(priv);
+		priv_mac_addrs_disable(priv);
+		priv_destroy_hash_rxqs(priv);
+	}
 	/* Prevent crashes when queues are still in use. */
 	dev->rx_pkt_burst = removed_rx_burst;
 	dev->tx_pkt_burst = removed_tx_burst;
@@ -116,8 +123,6 @@ mlx5_dev_close(struct rte_eth_dev *dev)
 		priv->txqs_n = 0;
 		priv->txqs = NULL;
 	}
-	if (priv->rss)
-		rxq_cleanup(&priv->rxq_parent);
 	if (priv->pd != NULL) {
 		assert(priv->ctx != NULL);
 		claim_zero(ibv_dealloc_pd(priv->pd));
@@ -297,9 +302,6 @@ mlx5_pci_devinit(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 
 #ifdef HAVE_EXP_QUERY_DEVICE
 		exp_device_attr.comp_mask = IBV_EXP_DEVICE_ATTR_EXP_CAP_FLAGS;
-#ifdef RSS_SUPPORT
-		exp_device_attr.comp_mask |= IBV_EXP_DEVICE_ATTR_RSS_TBL_SZ;
-#endif /* RSS_SUPPORT */
 #endif /* HAVE_EXP_QUERY_DEVICE */
 
 		DEBUG("using port %u (%08" PRIx32 ")", port, test);
@@ -349,32 +351,6 @@ mlx5_pci_devinit(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 			ERROR("ibv_exp_query_device() failed");
 			goto port_error;
 		}
-#ifdef RSS_SUPPORT
-		if ((exp_device_attr.exp_device_cap_flags &
-		     IBV_EXP_DEVICE_QPG) &&
-		    (exp_device_attr.exp_device_cap_flags &
-		     IBV_EXP_DEVICE_UD_RSS) &&
-		    (exp_device_attr.comp_mask &
-		     IBV_EXP_DEVICE_ATTR_RSS_TBL_SZ) &&
-		    (exp_device_attr.max_rss_tbl_sz > 0)) {
-			priv->hw_qpg = 1;
-			priv->hw_rss = 1;
-			priv->max_rss_tbl_sz = exp_device_attr.max_rss_tbl_sz;
-		} else {
-			priv->hw_qpg = 0;
-			priv->hw_rss = 0;
-			priv->max_rss_tbl_sz = 0;
-		}
-		priv->hw_tss = !!(exp_device_attr.exp_device_cap_flags &
-				  IBV_EXP_DEVICE_UD_TSS);
-		DEBUG("device flags: %s%s%s",
-		      (priv->hw_qpg ? "IBV_DEVICE_QPG " : ""),
-		      (priv->hw_tss ? "IBV_DEVICE_TSS " : ""),
-		      (priv->hw_rss ? "IBV_DEVICE_RSS " : ""));
-		if (priv->hw_rss)
-			DEBUG("maximum RSS indirection table size: %u",
-			      exp_device_attr.max_rss_tbl_sz);
-#endif /* RSS_SUPPORT */
 
 		priv->hw_csum =
 			((exp_device_attr.exp_device_cap_flags &
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index a818703..9720e96 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -101,20 +101,19 @@ struct priv {
 	unsigned int started:1; /* Device started, flows enabled. */
 	unsigned int promisc:1; /* Device in promiscuous mode. */
 	unsigned int allmulti:1; /* Device receives all multicast packets. */
-	unsigned int hw_qpg:1; /* QP groups are supported. */
-	unsigned int hw_tss:1; /* TSS is supported. */
-	unsigned int hw_rss:1; /* RSS is supported. */
 	unsigned int hw_csum:1; /* Checksum offload is supported. */
 	unsigned int hw_csum_l2tun:1; /* Same for L2 tunnels. */
-	unsigned int rss:1; /* RSS is enabled. */
 	unsigned int vf:1; /* This is a VF device. */
-	unsigned int max_rss_tbl_sz; /* Maximum number of RSS queues. */
 	/* RX/TX queues. */
-	struct rxq rxq_parent; /* Parent queue when RSS is enabled. */
 	unsigned int rxqs_n; /* RX queues array size. */
 	unsigned int txqs_n; /* TX queues array size. */
 	struct rxq *(*rxqs)[]; /* RX queues. */
 	struct txq *(*txqs)[]; /* TX queues. */
+	/* Indirection table referencing all RX WQs. */
+	struct ibv_exp_rwq_ind_table *ind_table;
+	/* Hash RX QPs feeding the indirection table. */
+	struct hash_rxq (*hash_rxqs)[];
+	unsigned int hash_rxqs_n; /* Hash RX QPs array size. */
 	rte_spinlock_t lock; /* Lock for control functions. */
 };
 
@@ -161,23 +160,25 @@ int mlx5_ibv_device_to_pci_addr(const struct ibv_device *,
 /* mlx5_mac.c */
 
 int priv_get_mac(struct priv *, uint8_t (*)[ETHER_ADDR_LEN]);
-void rxq_mac_addrs_del(struct rxq *);
+void hash_rxq_mac_addrs_del(struct hash_rxq *);
+void priv_mac_addrs_disable(struct priv *);
 void mlx5_mac_addr_remove(struct rte_eth_dev *, uint32_t);
-int rxq_mac_addrs_add(struct rxq *);
+int hash_rxq_mac_addrs_add(struct hash_rxq *);
 int priv_mac_addr_add(struct priv *, unsigned int,
 		      const uint8_t (*)[ETHER_ADDR_LEN]);
+int priv_mac_addrs_enable(struct priv *);
 void mlx5_mac_addr_add(struct rte_eth_dev *, struct ether_addr *, uint32_t,
 		       uint32_t);
 
 /* mlx5_rxmode.c */
 
-int rxq_promiscuous_enable(struct rxq *);
+int priv_promiscuous_enable(struct priv *);
 void mlx5_promiscuous_enable(struct rte_eth_dev *);
-void rxq_promiscuous_disable(struct rxq *);
+void priv_promiscuous_disable(struct priv *);
 void mlx5_promiscuous_disable(struct rte_eth_dev *);
-int rxq_allmulticast_enable(struct rxq *);
+int priv_allmulticast_enable(struct priv *);
 void mlx5_allmulticast_enable(struct rte_eth_dev *);
-void rxq_allmulticast_disable(struct rxq *);
+void priv_allmulticast_disable(struct priv *);
 void mlx5_allmulticast_disable(struct rte_eth_dev *);
 
 /* mlx5_stats.c */
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 181a877..fac685e 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -394,7 +394,6 @@ priv_set_flags(struct priv *priv, unsigned int keep, unsigned int flags)
  * Ethernet device configuration.
  *
  * Prepare the driver for a given number of TX and RX queues.
- * Allocate parent RSS queue when several RX queues are requested.
  *
  * @param dev
  *   Pointer to Ethernet device structure.
@@ -408,8 +407,6 @@ dev_configure(struct rte_eth_dev *dev)
 	struct priv *priv = dev->data->dev_private;
 	unsigned int rxqs_n = dev->data->nb_rx_queues;
 	unsigned int txqs_n = dev->data->nb_tx_queues;
-	unsigned int tmp;
-	int ret;
 
 	priv->rxqs = (void *)dev->data->rx_queues;
 	priv->txqs = (void *)dev->data->tx_queues;
@@ -422,47 +419,8 @@ dev_configure(struct rte_eth_dev *dev)
 		return 0;
 	INFO("%p: RX queues number update: %u -> %u",
 	     (void *)dev, priv->rxqs_n, rxqs_n);
-	/* If RSS is enabled, disable it first. */
-	if (priv->rss) {
-		unsigned int i;
-
-		/* Only if there are no remaining child RX queues. */
-		for (i = 0; (i != priv->rxqs_n); ++i)
-			if ((*priv->rxqs)[i] != NULL)
-				return EINVAL;
-		rxq_cleanup(&priv->rxq_parent);
-		priv->rss = 0;
-		priv->rxqs_n = 0;
-	}
-	if (rxqs_n <= 1) {
-		/* Nothing else to do. */
-		priv->rxqs_n = rxqs_n;
-		return 0;
-	}
-	/* Allocate a new RSS parent queue if supported by hardware. */
-	if (!priv->hw_rss) {
-		ERROR("%p: only a single RX queue can be configured when"
-		      " hardware doesn't support RSS",
-		      (void *)dev);
-		return EINVAL;
-	}
-	/* Fail if hardware doesn't support that many RSS queues. */
-	if (rxqs_n >= priv->max_rss_tbl_sz) {
-		ERROR("%p: only %u RX queues can be configured for RSS",
-		      (void *)dev, priv->max_rss_tbl_sz);
-		return EINVAL;
-	}
-	priv->rss = 1;
-	tmp = priv->rxqs_n;
 	priv->rxqs_n = rxqs_n;
-	ret = rxq_setup(dev, &priv->rxq_parent, 0, 0, NULL, NULL);
-	if (!ret)
-		return 0;
-	/* Failure, rollback. */
-	priv->rss = 0;
-	priv->rxqs_n = tmp;
-	assert(ret > 0);
-	return ret;
+	return 0;
 }
 
 /**
@@ -671,15 +629,6 @@ mlx5_dev_set_mtu(struct rte_eth_dev *dev, uint16_t mtu)
 				rx_func = mlx5_rx_burst_sp;
 			break;
 		}
-		/* Reenable non-RSS queue attributes. No need to check
-		 * for errors at this stage. */
-		if (!priv->rss) {
-			rxq_mac_addrs_add(rxq);
-			if (priv->promisc)
-				rxq_promiscuous_enable(rxq);
-			if (priv->allmulti)
-				rxq_allmulticast_enable(rxq);
-		}
 		/* Scattered burst function takes priority. */
 		if (rxq->sp)
 			rx_func = mlx5_rx_burst_sp;
diff --git a/drivers/net/mlx5/mlx5_mac.c b/drivers/net/mlx5/mlx5_mac.c
index f01faf0..971f2cd 100644
--- a/drivers/net/mlx5/mlx5_mac.c
+++ b/drivers/net/mlx5/mlx5_mac.c
@@ -93,83 +93,84 @@ priv_get_mac(struct priv *priv, uint8_t (*mac)[ETHER_ADDR_LEN])
 /**
  * Delete flow steering rule.
  *
- * @param rxq
- *   Pointer to RX queue structure.
+ * @param hash_rxq
+ *   Pointer to hash RX queue structure.
  * @param mac_index
  *   MAC address index.
  * @param vlan_index
  *   VLAN index.
  */
 static void
-rxq_del_flow(struct rxq *rxq, unsigned int mac_index, unsigned int vlan_index)
+hash_rxq_del_flow(struct hash_rxq *hash_rxq, unsigned int mac_index,
+		  unsigned int vlan_index)
 {
 #ifndef NDEBUG
-	struct priv *priv = rxq->priv;
+	struct priv *priv = hash_rxq->priv;
 	const uint8_t (*mac)[ETHER_ADDR_LEN] =
 		(const uint8_t (*)[ETHER_ADDR_LEN])
 		priv->mac[mac_index].addr_bytes;
 #endif
-	assert(rxq->mac_flow[mac_index][vlan_index] != NULL);
+	assert(hash_rxq->mac_flow[mac_index][vlan_index] != NULL);
 	DEBUG("%p: removing MAC address %02x:%02x:%02x:%02x:%02x:%02x index %u"
 	      " (VLAN ID %" PRIu16 ")",
-	      (void *)rxq,
+	      (void *)hash_rxq,
 	      (*mac)[0], (*mac)[1], (*mac)[2], (*mac)[3], (*mac)[4], (*mac)[5],
 	      mac_index, priv->vlan_filter[vlan_index].id);
-	claim_zero(ibv_destroy_flow(rxq->mac_flow[mac_index][vlan_index]));
-	rxq->mac_flow[mac_index][vlan_index] = NULL;
+	claim_zero(ibv_destroy_flow(hash_rxq->mac_flow
+				    [mac_index][vlan_index]));
+	hash_rxq->mac_flow[mac_index][vlan_index] = NULL;
 }
 
 /**
- * Unregister a MAC address from a RX queue.
+ * Unregister a MAC address from a hash RX queue.
  *
- * @param rxq
- *   Pointer to RX queue structure.
+ * @param hash_rxq
+ *   Pointer to hash RX queue structure.
  * @param mac_index
  *   MAC address index.
  */
 static void
-rxq_mac_addr_del(struct rxq *rxq, unsigned int mac_index)
+hash_rxq_mac_addr_del(struct hash_rxq *hash_rxq, unsigned int mac_index)
 {
-	struct priv *priv = rxq->priv;
+	struct priv *priv = hash_rxq->priv;
 	unsigned int i;
 	unsigned int vlans = 0;
 
 	assert(mac_index < RTE_DIM(priv->mac));
-	if (!BITFIELD_ISSET(rxq->mac_configured, mac_index))
+	if (!BITFIELD_ISSET(hash_rxq->mac_configured, mac_index))
 		return;
 	for (i = 0; (i != RTE_DIM(priv->vlan_filter)); ++i) {
 		if (!priv->vlan_filter[i].enabled)
 			continue;
-		rxq_del_flow(rxq, mac_index, i);
+		hash_rxq_del_flow(hash_rxq, mac_index, i);
 		vlans++;
 	}
 	if (!vlans) {
-		rxq_del_flow(rxq, mac_index, 0);
+		hash_rxq_del_flow(hash_rxq, mac_index, 0);
 	}
-	BITFIELD_RESET(rxq->mac_configured, mac_index);
+	BITFIELD_RESET(hash_rxq->mac_configured, mac_index);
 }
 
 /**
- * Unregister all MAC addresses from a RX queue.
+ * Unregister all MAC addresses from a hash RX queue.
  *
- * @param rxq
- *   Pointer to RX queue structure.
+ * @param hash_rxq
+ *   Pointer to hash RX queue structure.
  */
 void
-rxq_mac_addrs_del(struct rxq *rxq)
+hash_rxq_mac_addrs_del(struct hash_rxq *hash_rxq)
 {
-	struct priv *priv = rxq->priv;
+	struct priv *priv = hash_rxq->priv;
 	unsigned int i;
 
 	for (i = 0; (i != RTE_DIM(priv->mac)); ++i)
-		rxq_mac_addr_del(rxq, i);
+		hash_rxq_mac_addr_del(hash_rxq, i);
 }
 
 /**
  * Unregister a MAC address.
  *
- * In RSS mode, the MAC address is unregistered from the parent queue,
- * otherwise it is unregistered from each queue directly.
+ * This is done for each hash RX queue.
  *
  * @param priv
  *   Pointer to private structure.
@@ -184,17 +185,27 @@ priv_mac_addr_del(struct priv *priv, unsigned int mac_index)
 	assert(mac_index < RTE_DIM(priv->mac));
 	if (!BITFIELD_ISSET(priv->mac_configured, mac_index))
 		return;
-	if (priv->rss) {
-		rxq_mac_addr_del(&priv->rxq_parent, mac_index);
-		goto end;
-	}
-	for (i = 0; (i != priv->dev->data->nb_rx_queues); ++i)
-		rxq_mac_addr_del((*priv->rxqs)[i], mac_index);
-end:
+	for (i = 0; (i != priv->hash_rxqs_n); ++i)
+		hash_rxq_mac_addr_del(&(*priv->hash_rxqs)[i], mac_index);
 	BITFIELD_RESET(priv->mac_configured, mac_index);
 }
 
 /**
+ * Unregister all MAC addresses from all hash RX queues.
+ *
+ * @param priv
+ *   Pointer to private structure.
+ */
+void
+priv_mac_addrs_disable(struct priv *priv)
+{
+	unsigned int i;
+
+	for (i = 0; (i != priv->hash_rxqs_n); ++i)
+		hash_rxq_mac_addrs_del(&(*priv->hash_rxqs)[i]);
+}
+
+/**
  * DPDK callback to remove a MAC address.
  *
  * @param dev
@@ -221,8 +232,8 @@ end:
 /**
  * Add single flow steering rule.
  *
- * @param rxq
- *   Pointer to RX queue structure.
+ * @param hash_rxq
+ *   Pointer to hash RX queue structure.
  * @param mac_index
  *   MAC address index to register.
  * @param vlan_index
@@ -232,10 +243,11 @@ end:
  *   0 on success, errno value on failure.
  */
 static int
-rxq_add_flow(struct rxq *rxq, unsigned int mac_index, unsigned int vlan_index)
+hash_rxq_add_flow(struct hash_rxq *hash_rxq, unsigned int mac_index,
+		  unsigned int vlan_index)
 {
 	struct ibv_flow *flow;
-	struct priv *priv = rxq->priv;
+	struct priv *priv = hash_rxq->priv;
 	const uint8_t (*mac)[ETHER_ADDR_LEN] =
 			(const uint8_t (*)[ETHER_ADDR_LEN])
 			priv->mac[mac_index].addr_bytes;
@@ -280,18 +292,18 @@ rxq_add_flow(struct rxq *rxq, unsigned int mac_index, unsigned int vlan_index)
 	};
 	DEBUG("%p: adding MAC address %02x:%02x:%02x:%02x:%02x:%02x index %u"
 	      " (VLAN %s %" PRIu16 ")",
-	      (void *)rxq,
+	      (void *)hash_rxq,
 	      (*mac)[0], (*mac)[1], (*mac)[2], (*mac)[3], (*mac)[4], (*mac)[5],
 	      mac_index,
 	      ((vlan_index != -1u) ? "ID" : "index"),
 	      ((vlan_index != -1u) ? priv->vlan_filter[vlan_index].id : -1u));
 	/* Create related flow. */
 	errno = 0;
-	flow = ibv_create_flow(rxq->qp, attr);
+	flow = ibv_create_flow(hash_rxq->qp, attr);
 	if (flow == NULL) {
 		/* It's not clear whether errno is always set in this case. */
 		ERROR("%p: flow configuration failed, errno=%d: %s",
-		      (void *)rxq, errno,
+		      (void *)hash_rxq, errno,
 		      (errno ? strerror(errno) : "Unknown error"));
 		if (errno)
 			return errno;
@@ -299,16 +311,16 @@ rxq_add_flow(struct rxq *rxq, unsigned int mac_index, unsigned int vlan_index)
 	}
 	if (vlan_index == -1u)
 		vlan_index = 0;
-	assert(rxq->mac_flow[mac_index][vlan_index] == NULL);
-	rxq->mac_flow[mac_index][vlan_index] = flow;
+	assert(hash_rxq->mac_flow[mac_index][vlan_index] == NULL);
+	hash_rxq->mac_flow[mac_index][vlan_index] = flow;
 	return 0;
 }
 
 /**
- * Register a MAC address in a RX queue.
+ * Register a MAC address in a hash RX queue.
  *
- * @param rxq
- *   Pointer to RX queue structure.
+ * @param hash_rxq
+ *   Pointer to hash RX queue structure.
  * @param mac_index
  *   MAC address index to register.
  *
@@ -316,22 +328,22 @@ rxq_add_flow(struct rxq *rxq, unsigned int mac_index, unsigned int vlan_index)
  *   0 on success, errno value on failure.
  */
 static int
-rxq_mac_addr_add(struct rxq *rxq, unsigned int mac_index)
+hash_rxq_mac_addr_add(struct hash_rxq *hash_rxq, unsigned int mac_index)
 {
-	struct priv *priv = rxq->priv;
+	struct priv *priv = hash_rxq->priv;
 	unsigned int i;
 	unsigned int vlans = 0;
 	int ret;
 
 	assert(mac_index < RTE_DIM(priv->mac));
-	if (BITFIELD_ISSET(rxq->mac_configured, mac_index))
-		rxq_mac_addr_del(rxq, mac_index);
+	if (BITFIELD_ISSET(hash_rxq->mac_configured, mac_index))
+		hash_rxq_mac_addr_del(hash_rxq, mac_index);
 	/* Fill VLAN specifications. */
 	for (i = 0; (i != RTE_DIM(priv->vlan_filter)); ++i) {
 		if (!priv->vlan_filter[i].enabled)
 			continue;
 		/* Create related flow. */
-		ret = rxq_add_flow(rxq, mac_index, i);
+		ret = hash_rxq_add_flow(hash_rxq, mac_index, i);
 		if (!ret) {
 			vlans++;
 			continue;
@@ -339,45 +351,45 @@ rxq_mac_addr_add(struct rxq *rxq, unsigned int mac_index)
 		/* Failure, rollback. */
 		while (i != 0)
 			if (priv->vlan_filter[--i].enabled)
-				rxq_del_flow(rxq, mac_index, i);
+				hash_rxq_del_flow(hash_rxq, mac_index, i);
 		assert(ret > 0);
 		return ret;
 	}
 	/* In case there is no VLAN filter. */
 	if (!vlans) {
-		ret = rxq_add_flow(rxq, mac_index, -1);
+		ret = hash_rxq_add_flow(hash_rxq, mac_index, -1);
 		if (ret)
 			return ret;
 	}
-	BITFIELD_SET(rxq->mac_configured, mac_index);
+	BITFIELD_SET(hash_rxq->mac_configured, mac_index);
 	return 0;
 }
 
 /**
- * Register all MAC addresses in a RX queue.
+ * Register all MAC addresses in a hash RX queue.
  *
- * @param rxq
- *   Pointer to RX queue structure.
+ * @param hash_rxq
+ *   Pointer to hash RX queue structure.
  *
  * @return
  *   0 on success, errno value on failure.
  */
 int
-rxq_mac_addrs_add(struct rxq *rxq)
+hash_rxq_mac_addrs_add(struct hash_rxq *hash_rxq)
 {
-	struct priv *priv = rxq->priv;
+	struct priv *priv = hash_rxq->priv;
 	unsigned int i;
 	int ret;
 
 	for (i = 0; (i != RTE_DIM(priv->mac)); ++i) {
 		if (!BITFIELD_ISSET(priv->mac_configured, i))
 			continue;
-		ret = rxq_mac_addr_add(rxq, i);
+		ret = hash_rxq_mac_addr_add(hash_rxq, i);
 		if (!ret)
 			continue;
 		/* Failure, rollback. */
 		while (i != 0)
-			rxq_mac_addr_del(rxq, --i);
+			hash_rxq_mac_addr_del(hash_rxq, --i);
 		assert(ret > 0);
 		return ret;
 	}
@@ -387,8 +399,7 @@ rxq_mac_addrs_add(struct rxq *rxq)
 /**
  * Register a MAC address.
  *
- * In RSS mode, the MAC address is registered in the parent queue,
- * otherwise it is registered in each queue directly.
+ * This is done for each hash RX queue.
  *
  * @param priv
  *   Pointer to private structure.
@@ -431,32 +442,23 @@ priv_mac_addr_add(struct priv *priv, unsigned int mac_index,
 	/* If device isn't started, this is all we need to do. */
 	if (!priv->started) {
 #ifndef NDEBUG
-		/* Verify that all queues have this index disabled. */
-		for (i = 0; (i != priv->rxqs_n); ++i) {
-			if ((*priv->rxqs)[i] == NULL)
-				continue;
+		/* Verify that all hash RX queues have this index disabled. */
+		for (i = 0; (i != priv->hash_rxqs_n); ++i) {
 			assert(!BITFIELD_ISSET
-			       ((*priv->rxqs)[i]->mac_configured, mac_index));
+			       ((*priv->hash_rxqs)[i].mac_configured,
+				mac_index));
 		}
 #endif
 		goto end;
 	}
-	if (priv->rss) {
-		ret = rxq_mac_addr_add(&priv->rxq_parent, mac_index);
-		if (ret)
-			return ret;
-		goto end;
-	}
-	for (i = 0; (i != priv->rxqs_n); ++i) {
-		if ((*priv->rxqs)[i] == NULL)
-			continue;
-		ret = rxq_mac_addr_add((*priv->rxqs)[i], mac_index);
+	for (i = 0; (i != priv->hash_rxqs_n); ++i) {
+		ret = hash_rxq_mac_addr_add(&(*priv->hash_rxqs)[i], mac_index);
 		if (!ret)
 			continue;
 		/* Failure, rollback. */
 		while (i != 0)
-			if ((*priv->rxqs)[(--i)] != NULL)
-				rxq_mac_addr_del((*priv->rxqs)[i], mac_index);
+			hash_rxq_mac_addr_del(&(*priv->hash_rxqs)[--i],
+					      mac_index);
 		return ret;
 	}
 end:
@@ -465,6 +467,34 @@ end:
 }
 
 /**
+ * Register all MAC addresses in all hash RX queues.
+ *
+ * @param priv
+ *   Pointer to private structure.
+ *
+ * @return
+ *   0 on success, errno value on failure.
+ */
+int
+priv_mac_addrs_enable(struct priv *priv)
+{
+	unsigned int i;
+	int ret;
+
+	for (i = 0; (i != priv->hash_rxqs_n); ++i) {
+		ret = hash_rxq_mac_addrs_add(&(*priv->hash_rxqs)[i]);
+		if (!ret)
+			continue;
+		/* Failure, rollback. */
+		while (i != 0)
+			hash_rxq_mac_addrs_del(&(*priv->hash_rxqs)[--i]);
+		assert(ret > 0);
+		return ret;
+	}
+	return 0;
+}
+
+/**
  * DPDK callback to add a MAC address.
  *
  * @param dev
diff --git a/drivers/net/mlx5/mlx5_rxmode.c b/drivers/net/mlx5/mlx5_rxmode.c
index b4e5493..1f5cd40 100644
--- a/drivers/net/mlx5/mlx5_rxmode.c
+++ b/drivers/net/mlx5/mlx5_rxmode.c
@@ -58,111 +58,142 @@
 #include "mlx5_rxtx.h"
 #include "mlx5_utils.h"
 
+static void hash_rxq_promiscuous_disable(struct hash_rxq *);
+static void hash_rxq_allmulticast_disable(struct hash_rxq *);
+
 /**
- * Enable promiscuous mode in a RX queue.
+ * Enable promiscuous mode in a hash RX queue.
  *
- * @param rxq
- *   Pointer to RX queue structure.
+ * @param hash_rxq
+ *   Pointer to hash RX queue structure.
  *
  * @return
  *   0 on success, errno value on failure.
  */
-int
-rxq_promiscuous_enable(struct rxq *rxq)
+static int
+hash_rxq_promiscuous_enable(struct hash_rxq *hash_rxq)
 {
 	struct ibv_flow *flow;
 	struct ibv_flow_attr attr = {
 		.type = IBV_FLOW_ATTR_ALL_DEFAULT,
 		.num_of_specs = 0,
-		.port = rxq->priv->port,
+		.port = hash_rxq->priv->port,
 		.flags = 0
 	};
 
-	if (rxq->priv->vf)
+	if (hash_rxq->priv->vf)
 		return 0;
-	DEBUG("%p: enabling promiscuous mode", (void *)rxq);
-	if (rxq->promisc_flow != NULL)
+	DEBUG("%p: enabling promiscuous mode", (void *)hash_rxq);
+	if (hash_rxq->promisc_flow != NULL)
 		return EBUSY;
 	errno = 0;
-	flow = ibv_create_flow(rxq->qp, &attr);
+	flow = ibv_create_flow(hash_rxq->qp, &attr);
 	if (flow == NULL) {
 		/* It's not clear whether errno is always set in this case. */
 		ERROR("%p: flow configuration failed, errno=%d: %s",
-		      (void *)rxq, errno,
+		      (void *)hash_rxq, errno,
 		      (errno ? strerror(errno) : "Unknown error"));
 		if (errno)
 			return errno;
 		return EINVAL;
 	}
-	rxq->promisc_flow = flow;
-	DEBUG("%p: promiscuous mode enabled", (void *)rxq);
+	hash_rxq->promisc_flow = flow;
+	DEBUG("%p: promiscuous mode enabled", (void *)hash_rxq);
 	return 0;
 }
 
 /**
- * DPDK callback to enable promiscuous mode.
+ * Enable promiscuous mode in all hash RX queues.
  *
- * @param dev
- *   Pointer to Ethernet device structure.
+ * @param priv
+ *   Pointer to private structure.
+ *
+ * @return
+ *   0 on success, errno value on failure.
  */
-void
-mlx5_promiscuous_enable(struct rte_eth_dev *dev)
+int
+priv_promiscuous_enable(struct priv *priv)
 {
-	struct priv *priv = dev->data->dev_private;
 	unsigned int i;
-	int ret;
 
-	priv_lock(priv);
-	if (priv->promisc) {
-		priv_unlock(priv);
-		return;
-	}
+	if (priv->promisc)
+		return 0;
 	/* If device isn't started, this is all we need to do. */
 	if (!priv->started)
 		goto end;
-	if (priv->rss) {
-		ret = rxq_promiscuous_enable(&priv->rxq_parent);
-		if (ret) {
-			priv_unlock(priv);
-			return;
-		}
-		goto end;
-	}
-	for (i = 0; (i != priv->rxqs_n); ++i) {
-		if ((*priv->rxqs)[i] == NULL)
-			continue;
-		ret = rxq_promiscuous_enable((*priv->rxqs)[i]);
+	for (i = 0; (i != priv->hash_rxqs_n); ++i) {
+		struct hash_rxq *hash_rxq = &(*priv->hash_rxqs)[i];
+		int ret;
+
+		ret = hash_rxq_promiscuous_enable(hash_rxq);
 		if (!ret)
 			continue;
 		/* Failure, rollback. */
-		while (i != 0)
-			if ((*priv->rxqs)[--i] != NULL)
-				rxq_promiscuous_disable((*priv->rxqs)[i]);
-		priv_unlock(priv);
-		return;
+		while (i != 0) {
+			hash_rxq = &(*priv->hash_rxqs)[--i];
+			hash_rxq_promiscuous_disable(hash_rxq);
+		}
+		return ret;
 	}
 end:
 	priv->promisc = 1;
-	priv_unlock(priv);
+	return 0;
 }
 
 /**
- * Disable promiscuous mode in a RX queue.
+ * DPDK callback to enable promiscuous mode.
  *
- * @param rxq
- *   Pointer to RX queue structure.
+ * @param dev
+ *   Pointer to Ethernet device structure.
  */
 void
-rxq_promiscuous_disable(struct rxq *rxq)
+mlx5_promiscuous_enable(struct rte_eth_dev *dev)
 {
-	if (rxq->priv->vf)
+	struct priv *priv = dev->data->dev_private;
+	int ret;
+
+	priv_lock(priv);
+	ret = priv_promiscuous_enable(priv);
+	if (ret)
+		ERROR("cannot enable promiscuous mode: %s", strerror(ret));
+	priv_unlock(priv);
+}
+
+/**
+ * Disable promiscuous mode in a hash RX queue.
+ *
+ * @param hash_rxq
+ *   Pointer to hash RX queue structure.
+ */
+static void
+hash_rxq_promiscuous_disable(struct hash_rxq *hash_rxq)
+{
+	if (hash_rxq->priv->vf)
 		return;
-	DEBUG("%p: disabling promiscuous mode", (void *)rxq);
-	if (rxq->promisc_flow == NULL)
+	DEBUG("%p: disabling promiscuous mode", (void *)hash_rxq);
+	if (hash_rxq->promisc_flow == NULL)
 		return;
-	claim_zero(ibv_destroy_flow(rxq->promisc_flow));
-	rxq->promisc_flow = NULL;
-	DEBUG("%p: promiscuous mode disabled", (void *)rxq);
+	claim_zero(ibv_destroy_flow(hash_rxq->promisc_flow));
+	hash_rxq->promisc_flow = NULL;
+	DEBUG("%p: promiscuous mode disabled", (void *)hash_rxq);
+}
+
+/**
+ * Disable promiscuous mode in all hash RX queues.
+ *
+ * @param priv
+ *   Pointer to private structure.
+ */
+void
+priv_promiscuous_disable(struct priv *priv)
+{
+	unsigned int i;
+
+	if (!priv->promisc)
+		return;
+	for (i = 0; (i != priv->hash_rxqs_n); ++i)
+		hash_rxq_promiscuous_disable(&(*priv->hash_rxqs)[i]);
+	priv->promisc = 0;
 }
 
 /**
@@ -175,126 +206,141 @@ void
 mlx5_promiscuous_disable(struct rte_eth_dev *dev)
 {
 	struct priv *priv = dev->data->dev_private;
-	unsigned int i;
 
 	priv_lock(priv);
-	if (!priv->promisc) {
-		priv_unlock(priv);
-		return;
-	}
-	if (priv->rss) {
-		rxq_promiscuous_disable(&priv->rxq_parent);
-		goto end;
-	}
-	for (i = 0; (i != priv->rxqs_n); ++i)
-		if ((*priv->rxqs)[i] != NULL)
-			rxq_promiscuous_disable((*priv->rxqs)[i]);
-end:
-	priv->promisc = 0;
+	priv_promiscuous_disable(priv);
 	priv_unlock(priv);
 }
 
 /**
- * Enable allmulti mode in a RX queue.
+ * Enable allmulti mode in a hash RX queue.
  *
- * @param rxq
- *   Pointer to RX queue structure.
+ * @param hash_rxq
+ *   Pointer to hash RX queue structure.
  *
  * @return
  *   0 on success, errno value on failure.
  */
-int
-rxq_allmulticast_enable(struct rxq *rxq)
+static int
+hash_rxq_allmulticast_enable(struct hash_rxq *hash_rxq)
 {
 	struct ibv_flow *flow;
 	struct ibv_flow_attr attr = {
 		.type = IBV_FLOW_ATTR_MC_DEFAULT,
 		.num_of_specs = 0,
-		.port = rxq->priv->port,
+		.port = hash_rxq->priv->port,
 		.flags = 0
 	};
 
-	DEBUG("%p: enabling allmulticast mode", (void *)rxq);
-	if (rxq->allmulti_flow != NULL)
+	DEBUG("%p: enabling allmulticast mode", (void *)hash_rxq);
+	if (hash_rxq->allmulti_flow != NULL)
 		return EBUSY;
 	errno = 0;
-	flow = ibv_create_flow(rxq->qp, &attr);
+	flow = ibv_create_flow(hash_rxq->qp, &attr);
 	if (flow == NULL) {
 		/* It's not clear whether errno is always set in this case. */
 		ERROR("%p: flow configuration failed, errno=%d: %s",
-		      (void *)rxq, errno,
+		      (void *)hash_rxq, errno,
 		      (errno ? strerror(errno) : "Unknown error"));
 		if (errno)
 			return errno;
 		return EINVAL;
 	}
-	rxq->allmulti_flow = flow;
-	DEBUG("%p: allmulticast mode enabled", (void *)rxq);
+	hash_rxq->allmulti_flow = flow;
+	DEBUG("%p: allmulticast mode enabled", (void *)hash_rxq);
 	return 0;
 }
 
 /**
- * DPDK callback to enable allmulti mode.
+ * Enable allmulti mode in all hash RX queues.
  *
- * @param dev
- *   Pointer to Ethernet device structure.
+ * @param priv
+ *   Pointer to private structure.
+ *
+ * @return
+ *   0 on success, errno value on failure.
  */
-void
-mlx5_allmulticast_enable(struct rte_eth_dev *dev)
+int
+priv_allmulticast_enable(struct priv *priv)
 {
-	struct priv *priv = dev->data->dev_private;
 	unsigned int i;
-	int ret;
 
-	priv_lock(priv);
-	if (priv->allmulti) {
-		priv_unlock(priv);
-		return;
-	}
+	if (priv->allmulti)
+		return 0;
 	/* If device isn't started, this is all we need to do. */
 	if (!priv->started)
 		goto end;
-	if (priv->rss) {
-		ret = rxq_allmulticast_enable(&priv->rxq_parent);
-		if (ret) {
-			priv_unlock(priv);
-			return;
-		}
-		goto end;
-	}
-	for (i = 0; (i != priv->rxqs_n); ++i) {
-		if ((*priv->rxqs)[i] == NULL)
-			continue;
-		ret = rxq_allmulticast_enable((*priv->rxqs)[i]);
+	for (i = 0; (i != priv->hash_rxqs_n); ++i) {
+		struct hash_rxq *hash_rxq = &(*priv->hash_rxqs)[i];
+		int ret;
+
+		ret = hash_rxq_allmulticast_enable(hash_rxq);
 		if (!ret)
 			continue;
 		/* Failure, rollback. */
-		while (i != 0)
-			if ((*priv->rxqs)[--i] != NULL)
-				rxq_allmulticast_disable((*priv->rxqs)[i]);
-		priv_unlock(priv);
-		return;
+		while (i != 0) {
+			hash_rxq = &(*priv->hash_rxqs)[--i];
+			hash_rxq_allmulticast_disable(hash_rxq);
+		}
+		return ret;
 	}
 end:
 	priv->allmulti = 1;
+	return 0;
+}
+
+/**
+ * DPDK callback to enable allmulti mode.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ */
+void
+mlx5_allmulticast_enable(struct rte_eth_dev *dev)
+{
+	struct priv *priv = dev->data->dev_private;
+	int ret;
+
+	priv_lock(priv);
+	ret = priv_allmulticast_enable(priv);
+	if (ret)
+		ERROR("cannot enable allmulticast mode: %s", strerror(ret));
 	priv_unlock(priv);
 }
 
 /**
- * Disable allmulti mode in a RX queue.
+ * Disable allmulti mode in a hash RX queue.
+ *
+ * @param hash_rxq
+ *   Pointer to hash RX queue structure.
+ */
+static void
+hash_rxq_allmulticast_disable(struct hash_rxq *hash_rxq)
+{
+	DEBUG("%p: disabling allmulticast mode", (void *)hash_rxq);
+	if (hash_rxq->allmulti_flow == NULL)
+		return;
+	claim_zero(ibv_destroy_flow(hash_rxq->allmulti_flow));
+	hash_rxq->allmulti_flow = NULL;
+	DEBUG("%p: allmulticast mode disabled", (void *)hash_rxq);
+}
+
+/**
+ * Disable allmulti mode in all hash RX queues.
  *
- * @param rxq
- *   Pointer to RX queue structure.
+ * @param priv
+ *   Pointer to private structure.
  */
 void
-rxq_allmulticast_disable(struct rxq *rxq)
+priv_allmulticast_disable(struct priv *priv)
 {
-	DEBUG("%p: disabling allmulticast mode", (void *)rxq);
-	if (rxq->allmulti_flow == NULL)
+	unsigned int i;
+
+	if (!priv->allmulti)
 		return;
-	claim_zero(ibv_destroy_flow(rxq->allmulti_flow));
-	rxq->allmulti_flow = NULL;
-	DEBUG("%p: allmulticast mode disabled", (void *)rxq);
+	for (i = 0; (i != priv->hash_rxqs_n); ++i)
+		hash_rxq_allmulticast_disable(&(*priv->hash_rxqs)[i]);
+	priv->allmulti = 0;
 }
 
 /**
@@ -307,21 +353,8 @@ void
 mlx5_allmulticast_disable(struct rte_eth_dev *dev)
 {
 	struct priv *priv = dev->data->dev_private;
-	unsigned int i;
 
 	priv_lock(priv);
-	if (!priv->allmulti) {
-		priv_unlock(priv);
-		return;
-	}
-	if (priv->rss) {
-		rxq_allmulticast_disable(&priv->rxq_parent);
-		goto end;
-	}
-	for (i = 0; (i != priv->rxqs_n); ++i)
-		if ((*priv->rxqs)[i] != NULL)
-			rxq_allmulticast_disable((*priv->rxqs)[i]);
-end:
-	priv->allmulti = 0;
+	priv_allmulticast_disable(priv);
 	priv_unlock(priv);
 }
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index c938d2d..5392221 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -64,6 +64,224 @@
 #include "mlx5_utils.h"
 #include "mlx5_defs.h"
 
+/* Default RSS hash key also used for ConnectX-3. */
+static uint8_t hash_rxq_default_key[] = {
+	0x2c, 0xc6, 0x81, 0xd1,
+	0x5b, 0xdb, 0xf4, 0xf7,
+	0xfc, 0xa2, 0x83, 0x19,
+	0xdb, 0x1a, 0x3e, 0x94,
+	0x6b, 0x9e, 0x38, 0xd9,
+	0x2c, 0x9c, 0x03, 0xd1,
+	0xad, 0x99, 0x44, 0xa7,
+	0xd9, 0x56, 0x3d, 0x59,
+	0x06, 0x3c, 0x25, 0xf3,
+	0xfc, 0x1f, 0xdc, 0x2a,
+};
+
+/**
+ * Return logarithm of the nearest power of two above input value.
+ *
+ * @param v
+ *   Input value.
+ *
+ * @return
+ *   Logarithm of the nearest power of two above input value.
+ */
+static unsigned int
+log2above(unsigned int v)
+{
+	unsigned int l;
+	unsigned int r;
+
+	for (l = 0, r = 0; (v >> 1); ++l, v >>= 1)
+		r |= (v & 1);
+	return (l + r);
+}
+
+/**
+ * Initialize hash RX queues and indirection table.
+ *
+ * @param priv
+ *   Pointer to private structure.
+ *
+ * @return
+ *   0 on success, errno value on failure.
+ */
+int
+priv_create_hash_rxqs(struct priv *priv)
+{
+	static const uint64_t rss_hash_table[] = {
+		/* TCPv4. */
+		(IBV_EXP_RX_HASH_SRC_IPV4 | IBV_EXP_RX_HASH_DST_IPV4 |
+		 IBV_EXP_RX_HASH_SRC_PORT_TCP | IBV_EXP_RX_HASH_DST_PORT_TCP),
+		/* UDPv4. */
+		(IBV_EXP_RX_HASH_SRC_IPV4 | IBV_EXP_RX_HASH_DST_IPV4 |
+		 IBV_EXP_RX_HASH_SRC_PORT_UDP | IBV_EXP_RX_HASH_DST_PORT_UDP),
+		/* TCPv6. */
+		(IBV_EXP_RX_HASH_SRC_IPV6 | IBV_EXP_RX_HASH_DST_IPV6 |
+		 IBV_EXP_RX_HASH_SRC_PORT_TCP | IBV_EXP_RX_HASH_DST_PORT_TCP),
+		/* UDPv6. */
+		(IBV_EXP_RX_HASH_SRC_IPV6 | IBV_EXP_RX_HASH_DST_IPV6 |
+		 IBV_EXP_RX_HASH_SRC_PORT_UDP | IBV_EXP_RX_HASH_DST_PORT_UDP),
+		/* Other IPv4. */
+		(IBV_EXP_RX_HASH_SRC_IPV4 | IBV_EXP_RX_HASH_DST_IPV4),
+		/* Other IPv6. */
+		(IBV_EXP_RX_HASH_SRC_IPV6 | IBV_EXP_RX_HASH_DST_IPV6),
+		/* None, used for everything else. */
+		0,
+	};
+
+	DEBUG("allocating hash RX queues for %u WQs", priv->rxqs_n);
+	assert(priv->ind_table == NULL);
+	assert(priv->hash_rxqs == NULL);
+	assert(priv->hash_rxqs_n == 0);
+	assert(priv->pd != NULL);
+	assert(priv->ctx != NULL);
+	if (priv->rxqs_n == 0)
+		return EINVAL;
+	assert(priv->rxqs != NULL);
+
+	/* FIXME: large data structures are allocated on the stack. */
+	unsigned int wqs_n = (1 << log2above(priv->rxqs_n));
+	struct ibv_exp_wq *wqs[wqs_n];
+	struct ibv_exp_rwq_ind_table_init_attr ind_init_attr = {
+		.pd = priv->pd,
+		.log_ind_tbl_size = log2above(priv->rxqs_n),
+		.ind_tbl = wqs,
+		.comp_mask = 0,
+	};
+	struct ibv_exp_rwq_ind_table *ind_table = NULL;
+	/* If only one RX queue is configured, RSS is not needed and a single
+	 * empty hash entry is used (last rss_hash_table[] entry). */
+	unsigned int hash_rxqs_n =
+		((priv->rxqs_n == 1) ? 1 : RTE_DIM(rss_hash_table));
+	struct hash_rxq (*hash_rxqs)[hash_rxqs_n] = NULL;
+	unsigned int i;
+	unsigned int j;
+	int err = 0;
+
+	if (wqs_n < priv->rxqs_n) {
+		ERROR("cannot handle this many RX queues (%u)", priv->rxqs_n);
+		err = ERANGE;
+		goto error;
+	}
+	if (wqs_n != priv->rxqs_n)
+		WARN("%u RX queues are configured, consider rounding this"
+		     " number to the next power of two (%u) for optimal"
+		     " performance",
+		     priv->rxqs_n, wqs_n);
+	/* When the number of RX queues is not a power of two, the remaining
+	 * table entries are padded with reused WQs and hashes are not spread
+	 * uniformly. */
+	for (i = 0, j = 0; (i != wqs_n); ++i) {
+		wqs[i] = (*priv->rxqs)[j]->wq;
+		if (++j == priv->rxqs_n)
+			j = 0;
+	}
+	errno = 0;
+	ind_table = ibv_exp_create_rwq_ind_table(priv->ctx, &ind_init_attr);
+	if (ind_table == NULL) {
+		/* Not clear whether errno is set. */
+		err = (errno ? errno : EINVAL);
+		ERROR("RX indirection table creation failed with error %d: %s",
+		      err, strerror(err));
+		goto error;
+	}
+	/* Allocate array that holds hash RX queues and related data. */
+	hash_rxqs = rte_malloc(__func__, sizeof(*hash_rxqs), 0);
+	if (hash_rxqs == NULL) {
+		err = ENOMEM;
+		ERROR("cannot allocate hash RX queues container: %s",
+		      strerror(err));
+		goto error;
+	}
+	for (i = 0, j = (RTE_DIM(rss_hash_table) - hash_rxqs_n);
+	     (j != RTE_DIM(rss_hash_table));
+	     ++i, ++j) {
+		struct hash_rxq *hash_rxq = &(*hash_rxqs)[i];
+
+		struct ibv_exp_rx_hash_conf hash_conf = {
+			.rx_hash_function = IBV_EXP_RX_HASH_FUNC_TOEPLITZ,
+			.rx_hash_key_len = sizeof(hash_rxq_default_key),
+			.rx_hash_key = hash_rxq_default_key,
+			.rx_hash_fields_mask = rss_hash_table[j],
+			.rwq_ind_tbl = ind_table,
+		};
+		struct ibv_exp_qp_init_attr qp_init_attr = {
+			.max_inl_recv = 0, /* Currently not supported. */
+			.qp_type = IBV_QPT_RAW_PACKET,
+			.comp_mask = (IBV_EXP_QP_INIT_ATTR_PD |
+				      IBV_EXP_QP_INIT_ATTR_RX_HASH),
+			.pd = priv->pd,
+			.rx_hash_conf = &hash_conf,
+			.port_num = priv->port,
+		};
+
+		*hash_rxq = (struct hash_rxq){
+			.priv = priv,
+			.qp = ibv_exp_create_qp(priv->ctx, &qp_init_attr),
+		};
+		if (hash_rxq->qp == NULL) {
+			err = (errno ? errno : EINVAL);
+			ERROR("Hash RX QP creation failure: %s",
+			      strerror(err));
+			while (i) {
+				hash_rxq = &(*hash_rxqs)[--i];
+				claim_zero(ibv_destroy_qp(hash_rxq->qp));
+			}
+			goto error;
+		}
+	}
+	priv->ind_table = ind_table;
+	priv->hash_rxqs = hash_rxqs;
+	priv->hash_rxqs_n = hash_rxqs_n;
+	assert(err == 0);
+	return 0;
+error:
+	rte_free(hash_rxqs);
+	if (ind_table != NULL)
+		claim_zero(ibv_exp_destroy_rwq_ind_table(ind_table));
+	return err;
+}
+
+/**
+ * Clean up hash RX queues and indirection table.
+ *
+ * @param priv
+ *   Pointer to private structure.
+ */
+void
+priv_destroy_hash_rxqs(struct priv *priv)
+{
+	unsigned int i;
+
+	DEBUG("destroying %u hash RX queues", priv->hash_rxqs_n);
+	if (priv->hash_rxqs_n == 0) {
+		assert(priv->hash_rxqs == NULL);
+		assert(priv->ind_table == NULL);
+		return;
+	}
+	for (i = 0; (i != priv->hash_rxqs_n); ++i) {
+		struct hash_rxq *hash_rxq = &(*priv->hash_rxqs)[i];
+		unsigned int j, k;
+
+		assert(hash_rxq->priv == priv);
+		assert(hash_rxq->qp != NULL);
+		/* Also check that there are no remaining flows. */
+		assert(hash_rxq->allmulti_flow == NULL);
+		assert(hash_rxq->promisc_flow == NULL);
+		for (j = 0; (j != RTE_DIM(hash_rxq->mac_flow)); ++j)
+			for (k = 0; (k != RTE_DIM(hash_rxq->mac_flow[j])); ++k)
+				assert(hash_rxq->mac_flow[j][k] == NULL);
+		claim_zero(ibv_destroy_qp(hash_rxq->qp));
+	}
+	priv->hash_rxqs_n = 0;
+	rte_free(priv->hash_rxqs);
+	priv->hash_rxqs = NULL;
+	claim_zero(ibv_exp_destroy_rwq_ind_table(priv->ind_table));
+	priv->ind_table = NULL;
+}
+
 /**
  * Allocate RX queue elements with scattered packets support.
  *
@@ -335,15 +553,15 @@ rxq_cleanup(struct rxq *rxq)
 		rxq_free_elts_sp(rxq);
 	else
 		rxq_free_elts(rxq);
-	if (rxq->if_qp != NULL) {
+	if (rxq->if_wq != NULL) {
 		assert(rxq->priv != NULL);
 		assert(rxq->priv->ctx != NULL);
-		assert(rxq->qp != NULL);
+		assert(rxq->wq != NULL);
 		params = (struct ibv_exp_release_intf_params){
 			.comp_mask = 0,
 		};
 		claim_zero(ibv_exp_release_intf(rxq->priv->ctx,
-						rxq->if_qp,
+						rxq->if_wq,
 						&params));
 	}
 	if (rxq->if_cq != NULL) {
@@ -357,12 +575,8 @@ rxq_cleanup(struct rxq *rxq)
 						rxq->if_cq,
 						&params));
 	}
-	if (rxq->qp != NULL) {
-		rxq_promiscuous_disable(rxq);
-		rxq_allmulticast_disable(rxq);
-		rxq_mac_addrs_del(rxq);
-		claim_zero(ibv_destroy_qp(rxq->qp));
-	}
+	if (rxq->wq != NULL)
+		claim_zero(ibv_exp_destroy_wq(rxq->wq));
 	if (rxq->cq != NULL)
 		claim_zero(ibv_destroy_cq(rxq->cq));
 	if (rxq->rd != NULL) {
@@ -382,112 +596,6 @@ rxq_cleanup(struct rxq *rxq)
 }
 
 /**
- * Allocate a Queue Pair.
- * Optionally setup inline receive if supported.
- *
- * @param priv
- *   Pointer to private structure.
- * @param cq
- *   Completion queue to associate with QP.
- * @param desc
- *   Number of descriptors in QP (hint only).
- *
- * @return
- *   QP pointer or NULL in case of error.
- */
-static struct ibv_qp *
-rxq_setup_qp(struct priv *priv, struct ibv_cq *cq, uint16_t desc,
-	     struct ibv_exp_res_domain *rd)
-{
-	struct ibv_exp_qp_init_attr attr = {
-		/* CQ to be associated with the send queue. */
-		.send_cq = cq,
-		/* CQ to be associated with the receive queue. */
-		.recv_cq = cq,
-		.cap = {
-			/* Max number of outstanding WRs. */
-			.max_recv_wr = ((priv->device_attr.max_qp_wr < desc) ?
-					priv->device_attr.max_qp_wr :
-					desc),
-			/* Max number of scatter/gather elements in a WR. */
-			.max_recv_sge = ((priv->device_attr.max_sge <
-					  MLX5_PMD_SGE_WR_N) ?
-					 priv->device_attr.max_sge :
-					 MLX5_PMD_SGE_WR_N),
-		},
-		.qp_type = IBV_QPT_RAW_PACKET,
-		.comp_mask = (IBV_EXP_QP_INIT_ATTR_PD |
-			      IBV_EXP_QP_INIT_ATTR_RES_DOMAIN),
-		.pd = priv->pd,
-		.res_domain = rd,
-	};
-
-	return ibv_exp_create_qp(priv->ctx, &attr);
-}
-
-#ifdef RSS_SUPPORT
-
-/**
- * Allocate a RSS Queue Pair.
- * Optionally setup inline receive if supported.
- *
- * @param priv
- *   Pointer to private structure.
- * @param cq
- *   Completion queue to associate with QP.
- * @param desc
- *   Number of descriptors in QP (hint only).
- * @param parent
- *   If nonzero, create a parent QP, otherwise a child.
- *
- * @return
- *   QP pointer or NULL in case of error.
- */
-static struct ibv_qp *
-rxq_setup_qp_rss(struct priv *priv, struct ibv_cq *cq, uint16_t desc,
-		 int parent, struct ibv_exp_res_domain *rd)
-{
-	struct ibv_exp_qp_init_attr attr = {
-		/* CQ to be associated with the send queue. */
-		.send_cq = cq,
-		/* CQ to be associated with the receive queue. */
-		.recv_cq = cq,
-		.cap = {
-			/* Max number of outstanding WRs. */
-			.max_recv_wr = ((priv->device_attr.max_qp_wr < desc) ?
-					priv->device_attr.max_qp_wr :
-					desc),
-			/* Max number of scatter/gather elements in a WR. */
-			.max_recv_sge = ((priv->device_attr.max_sge <
-					  MLX5_PMD_SGE_WR_N) ?
-					 priv->device_attr.max_sge :
-					 MLX5_PMD_SGE_WR_N),
-		},
-		.qp_type = IBV_QPT_RAW_PACKET,
-		.comp_mask = (IBV_EXP_QP_INIT_ATTR_PD |
-			      IBV_EXP_QP_INIT_ATTR_RES_DOMAIN |
-			      IBV_EXP_QP_INIT_ATTR_QPG),
-		.pd = priv->pd,
-		.res_domain = rd,
-	};
-
-	if (parent) {
-		attr.qpg.qpg_type = IBV_EXP_QPG_PARENT;
-		/* TSS isn't necessary. */
-		attr.qpg.parent_attrib.tss_child_count = 0;
-		attr.qpg.parent_attrib.rss_child_count = priv->rxqs_n;
-		DEBUG("initializing parent RSS queue");
-	} else {
-		attr.qpg.qpg_type = IBV_EXP_QPG_CHILD_RX;
-		attr.qpg.qpg_parent = priv->rxq_parent.qp;
-		DEBUG("initializing child RSS queue");
-	}
-	return ibv_exp_create_qp(priv->ctx, &attr);
-}
-
-#endif /* RSS_SUPPORT */
-
-/**
  * Reconfigure a RX queue with new parameters.
  *
  * rxq_rehash() does not allocate mbufs, which, if not done from the right
@@ -511,15 +619,9 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
 	unsigned int desc_n;
 	struct rte_mbuf **pool;
 	unsigned int i, k;
-	struct ibv_exp_qp_attr mod;
+	struct ibv_exp_wq_attr mod;
 	int err;
-	int parent = (rxq == &priv->rxq_parent);
 
-	if (parent) {
-		ERROR("%p: cannot rehash parent queue %p",
-		      (void *)dev, (void *)rxq);
-		return EINVAL;
-	}
 	DEBUG("%p: rehashing queue %p", (void *)dev, (void *)rxq);
 	/* Number of descriptors and mbufs currently allocated. */
 	desc_n = (tmpl.elts_n * (tmpl.sp ? MLX5_PMD_SGE_WR_N : 1));
@@ -548,64 +650,17 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
 		DEBUG("%p: nothing to do", (void *)dev);
 		return 0;
 	}
-	/* Remove attached flows if RSS is disabled (no parent queue). */
-	if (!priv->rss) {
-		rxq_allmulticast_disable(&tmpl);
-		rxq_promiscuous_disable(&tmpl);
-		rxq_mac_addrs_del(&tmpl);
-		/* Update original queue in case of failure. */
-		rxq->allmulti_flow = tmpl.allmulti_flow;
-		rxq->promisc_flow = tmpl.promisc_flow;
-		memcpy(rxq->mac_configured, tmpl.mac_configured,
-		       sizeof(rxq->mac_configured));
-		memcpy(rxq->mac_flow, tmpl.mac_flow, sizeof(rxq->mac_flow));
-	}
 	/* From now on, any failure will render the queue unusable.
-	 * Reinitialize QP. */
-	mod = (struct ibv_exp_qp_attr){ .qp_state = IBV_QPS_RESET };
-	err = ibv_exp_modify_qp(tmpl.qp, &mod, IBV_EXP_QP_STATE);
-	if (err) {
-		ERROR("%p: cannot reset QP: %s", (void *)dev, strerror(err));
-		assert(err > 0);
-		return err;
-	}
-	err = ibv_resize_cq(tmpl.cq, desc_n);
-	if (err) {
-		ERROR("%p: cannot resize CQ: %s", (void *)dev, strerror(err));
-		assert(err > 0);
-		return err;
-	}
-	mod = (struct ibv_exp_qp_attr){
-		/* Move the QP to this state. */
-		.qp_state = IBV_QPS_INIT,
-		/* Primary port number. */
-		.port_num = priv->port
+	 * Reinitialize WQ. */
+	mod = (struct ibv_exp_wq_attr){
+		.attr_mask = IBV_EXP_WQ_ATTR_STATE,
+		.wq_state = IBV_EXP_WQS_RESET,
 	};
-	err = ibv_exp_modify_qp(tmpl.qp, &mod,
-				(IBV_EXP_QP_STATE |
-#ifdef RSS_SUPPORT
-				 (parent ? IBV_EXP_QP_GROUP_RSS : 0) |
-#endif /* RSS_SUPPORT */
-				 IBV_EXP_QP_PORT));
+	err = ibv_exp_modify_wq(tmpl.wq, &mod);
 	if (err) {
-		ERROR("%p: QP state to IBV_QPS_INIT failed: %s",
-		      (void *)dev, strerror(err));
+		ERROR("%p: cannot reset WQ: %s", (void *)dev, strerror(err));
 		assert(err > 0);
 		return err;
-	};
-	/* Reconfigure flows. Do not care for errors. */
-	if (!priv->rss) {
-		rxq_mac_addrs_add(&tmpl);
-		if (priv->promisc)
-			rxq_promiscuous_enable(&tmpl);
-		if (priv->allmulti)
-			rxq_allmulticast_enable(&tmpl);
-		/* Update original queue in case of failure. */
-		rxq->allmulti_flow = tmpl.allmulti_flow;
-		rxq->promisc_flow = tmpl.promisc_flow;
-		memcpy(rxq->mac_configured, tmpl.mac_configured,
-		       sizeof(rxq->mac_configured));
-		memcpy(rxq->mac_flow, tmpl.mac_flow, sizeof(rxq->mac_flow));
 	}
 	/* Allocate pool. */
 	pool = rte_malloc(__func__, (mbuf_n * sizeof(*pool)), 0);
@@ -657,14 +712,25 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
 	rxq->elts_n = 0;
 	rte_free(rxq->elts.sp);
 	rxq->elts.sp = NULL;
+	/* Change queue state to ready. */
+	mod = (struct ibv_exp_wq_attr){
+		.attr_mask = IBV_EXP_WQ_ATTR_STATE,
+		.wq_state = IBV_EXP_WQS_RDY,
+	};
+	err = ibv_exp_modify_wq(tmpl.wq, &mod);
+	if (err) {
+		ERROR("%p: WQ state to IBV_EXP_WQS_RDY failed: %s",
+		      (void *)dev, strerror(err));
+		goto error;
+	}
 	/* Post SGEs. */
-	assert(tmpl.if_qp != NULL);
+	assert(tmpl.if_wq != NULL);
 	if (tmpl.sp) {
 		struct rxq_elt_sp (*elts)[tmpl.elts_n] = tmpl.elts.sp;
 
 		for (i = 0; (i != RTE_DIM(*elts)); ++i) {
-			err = tmpl.if_qp->recv_sg_list
-				(tmpl.qp,
+			err = tmpl.if_wq->recv_sg_list
+				(tmpl.wq,
 				 (*elts)[i].sges,
 				 RTE_DIM((*elts)[i].sges));
 			if (err)
@@ -674,8 +740,8 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
 		struct rxq_elt (*elts)[tmpl.elts_n] = tmpl.elts.no_sp;
 
 		for (i = 0; (i != RTE_DIM(*elts)); ++i) {
-			err = tmpl.if_qp->recv_burst(
-				tmpl.qp,
+			err = tmpl.if_wq->recv_burst(
+				tmpl.wq,
 				&(*elts)[i].sge,
 				1);
 			if (err)
@@ -687,16 +753,9 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
 		      (void *)dev, err);
 		/* Set err because it does not contain a valid errno value. */
 		err = EIO;
-		goto skip_rtr;
+		goto error;
 	}
-	mod = (struct ibv_exp_qp_attr){
-		.qp_state = IBV_QPS_RTR
-	};
-	err = ibv_exp_modify_qp(tmpl.qp, &mod, IBV_EXP_QP_STATE);
-	if (err)
-		ERROR("%p: QP state to IBV_QPS_RTR failed: %s",
-		      (void *)dev, strerror(err));
-skip_rtr:
+error:
 	*rxq = tmpl;
 	assert(err >= 0);
 	return err;
@@ -732,30 +791,20 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 		.mp = mp,
 		.socket = socket
 	};
-	struct ibv_exp_qp_attr mod;
+	struct ibv_exp_wq_attr mod;
 	union {
 		struct ibv_exp_query_intf_params params;
 		struct ibv_exp_cq_init_attr cq;
 		struct ibv_exp_res_domain_init_attr rd;
+		struct ibv_exp_wq_init_attr wq;
 	} attr;
 	enum ibv_exp_query_intf_status status;
 	struct rte_mbuf *buf;
 	int ret = 0;
-	int parent = (rxq == &priv->rxq_parent);
 	unsigned int i;
+	unsigned int cq_size = desc;
 
 	(void)conf; /* Thresholds configuration (ignored). */
-	/*
-	 * If this is a parent queue, hardware must support RSS and
-	 * RSS must be enabled.
-	 */
-	assert((!parent) || ((priv->hw_rss) && (priv->rss)));
-	if (parent) {
-		/* Even if unused, ibv_create_cq() requires at least one
-		 * descriptor. */
-		desc = 1;
-		goto skip_mr;
-	}
 	if ((desc == 0) || (desc % MLX5_PMD_SGE_WR_N)) {
 		ERROR("%p: invalid number of RX descriptors (must be a"
 		      " multiple of %d)", (void *)dev, MLX5_PMD_SGE_WR_N);
@@ -798,7 +847,6 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 		      (void *)dev, strerror(ret));
 		goto error;
 	}
-skip_mr:
 	attr.rd = (struct ibv_exp_res_domain_init_attr){
 		.comp_mask = (IBV_EXP_RES_DOMAIN_THREAD_MODEL |
 			      IBV_EXP_RES_DOMAIN_MSG_MODEL),
@@ -816,7 +864,8 @@ skip_mr:
 		.comp_mask = IBV_EXP_CQ_INIT_ATTR_RES_DOMAIN,
 		.res_domain = tmpl.rd,
 	};
-	tmpl.cq = ibv_exp_create_cq(priv->ctx, desc, NULL, NULL, 0, &attr.cq);
+	tmpl.cq = ibv_exp_create_cq(priv->ctx, cq_size, NULL, NULL, 0,
+				    &attr.cq);
 	if (tmpl.cq == NULL) {
 		ret = ENOMEM;
 		ERROR("%p: CQ creation failure: %s",
@@ -827,48 +876,30 @@ skip_mr:
 	      priv->device_attr.max_qp_wr);
 	DEBUG("priv->device_attr.max_sge is %d",
 	      priv->device_attr.max_sge);
-#ifdef RSS_SUPPORT
-	if (priv->rss)
-		tmpl.qp = rxq_setup_qp_rss(priv, tmpl.cq, desc, parent,
-					   tmpl.rd);
-	else
-#endif /* RSS_SUPPORT */
-		tmpl.qp = rxq_setup_qp(priv, tmpl.cq, desc, tmpl.rd);
-	if (tmpl.qp == NULL) {
-		ret = (errno ? errno : EINVAL);
-		ERROR("%p: QP creation failure: %s",
-		      (void *)dev, strerror(ret));
-		goto error;
-	}
-	mod = (struct ibv_exp_qp_attr){
-		/* Move the QP to this state. */
-		.qp_state = IBV_QPS_INIT,
-		/* Primary port number. */
-		.port_num = priv->port
+	attr.wq = (struct ibv_exp_wq_init_attr){
+		.wq_context = NULL, /* Could be useful in the future. */
+		.wq_type = IBV_EXP_WQT_RQ,
+		/* Max number of outstanding WRs. */
+		.max_recv_wr = ((priv->device_attr.max_qp_wr < (int)cq_size) ?
+				priv->device_attr.max_qp_wr :
+				(int)cq_size),
+		/* Max number of scatter/gather elements in a WR. */
+		.max_recv_sge = ((priv->device_attr.max_sge <
+				  MLX5_PMD_SGE_WR_N) ?
+				 priv->device_attr.max_sge :
+				 MLX5_PMD_SGE_WR_N),
+		.pd = priv->pd,
+		.cq = tmpl.cq,
+		.comp_mask = IBV_EXP_CREATE_WQ_RES_DOMAIN,
+		.res_domain = tmpl.rd,
 	};
-	ret = ibv_exp_modify_qp(tmpl.qp, &mod,
-				(IBV_EXP_QP_STATE |
-#ifdef RSS_SUPPORT
-				 (parent ? IBV_EXP_QP_GROUP_RSS : 0) |
-#endif /* RSS_SUPPORT */
-				 IBV_EXP_QP_PORT));
-	if (ret) {
-		ERROR("%p: QP state to IBV_QPS_INIT failed: %s",
+	tmpl.wq = ibv_exp_create_wq(priv->ctx, &attr.wq);
+	if (tmpl.wq == NULL) {
+		ret = (errno ? errno : EINVAL);
+		ERROR("%p: WQ creation failure: %s",
 		      (void *)dev, strerror(ret));
 		goto error;
 	}
-	if ((parent) || (!priv->rss))  {
-		/* Configure MAC and broadcast addresses. */
-		ret = rxq_mac_addrs_add(&tmpl);
-		if (ret) {
-			ERROR("%p: QP flow attachment failed: %s",
-			      (void *)dev, strerror(ret));
-			goto error;
-		}
-	}
-	/* Allocate descriptors for RX queues, except for the RSS parent. */
-	if (parent)
-		goto skip_alloc;
 	if (tmpl.sp)
 		ret = rxq_alloc_elts_sp(&tmpl, desc, NULL);
 	else
@@ -878,7 +909,6 @@ skip_mr:
 		      (void *)dev, strerror(ret));
 		goto error;
 	}
-skip_alloc:
 	/* Save port ID. */
 	tmpl.port_id = dev->data->port_id;
 	DEBUG("%p: RTE port ID: %u", (void *)rxq, tmpl.port_id);
@@ -895,33 +925,44 @@ skip_alloc:
 	}
 	attr.params = (struct ibv_exp_query_intf_params){
 		.intf_scope = IBV_EXP_INTF_GLOBAL,
-		.intf = IBV_EXP_INTF_QP_BURST,
-		.obj = tmpl.qp,
+		.intf = IBV_EXP_INTF_WQ,
+		.obj = tmpl.wq,
 	};
-	tmpl.if_qp = ibv_exp_query_intf(priv->ctx, &attr.params, &status);
-	if (tmpl.if_qp == NULL) {
-		ERROR("%p: QP interface family query failed with status %d",
+	tmpl.if_wq = ibv_exp_query_intf(priv->ctx, &attr.params, &status);
+	if (tmpl.if_wq == NULL) {
+		ERROR("%p: WQ interface family query failed with status %d",
 		      (void *)dev, status);
 		goto error;
 	}
+	/* Change queue state to ready. */
+	mod = (struct ibv_exp_wq_attr){
+		.attr_mask = IBV_EXP_WQ_ATTR_STATE,
+		.wq_state = IBV_EXP_WQS_RDY,
+	};
+	ret = ibv_exp_modify_wq(tmpl.wq, &mod);
+	if (ret) {
+		ERROR("%p: WQ state to IBV_EXP_WQS_RDY failed: %s",
+		      (void *)dev, strerror(ret));
+		goto error;
+	}
 	/* Post SGEs. */
-	if (!parent && tmpl.sp) {
+	if (tmpl.sp) {
 		struct rxq_elt_sp (*elts)[tmpl.elts_n] = tmpl.elts.sp;
 
 		for (i = 0; (i != RTE_DIM(*elts)); ++i) {
-			ret = tmpl.if_qp->recv_sg_list
-				(tmpl.qp,
+			ret = tmpl.if_wq->recv_sg_list
+				(tmpl.wq,
 				 (*elts)[i].sges,
 				 RTE_DIM((*elts)[i].sges));
 			if (ret)
 				break;
 		}
-	} else if (!parent) {
+	} else {
 		struct rxq_elt (*elts)[tmpl.elts_n] = tmpl.elts.no_sp;
 
 		for (i = 0; (i != RTE_DIM(*elts)); ++i) {
-			ret = tmpl.if_qp->recv_burst(
-				tmpl.qp,
+			ret = tmpl.if_wq->recv_burst(
+				tmpl.wq,
 				&(*elts)[i].sge,
 				1);
 			if (ret)
@@ -935,15 +976,6 @@ skip_alloc:
 		ret = EIO;
 		goto error;
 	}
-	mod = (struct ibv_exp_qp_attr){
-		.qp_state = IBV_QPS_RTR
-	};
-	ret = ibv_exp_modify_qp(tmpl.qp, &mod, IBV_EXP_QP_STATE);
-	if (ret) {
-		ERROR("%p: QP state to IBV_QPS_RTR failed: %s",
-		      (void *)dev, strerror(ret));
-		goto error;
-	}
 	/* Clean up rxq in case we're reinitializing it. */
 	DEBUG("%p: cleaning-up old rxq just in case", (void *)rxq);
 	rxq_cleanup(rxq);
@@ -1047,7 +1079,6 @@ mlx5_rx_queue_release(void *dpdk_rxq)
 		return;
 	priv = rxq->priv;
 	priv_lock(priv);
-	assert(rxq != &priv->rxq_parent);
 	for (i = 0; (i != priv->rxqs_n); ++i)
 		if ((*priv->rxqs)[i] == rxq) {
 			DEBUG("%p: removing RX queue %p from list",
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 06712cb..6469a8d 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -753,7 +753,7 @@ mlx5_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 		rxq->stats.ibytes += pkt_buf_len;
 #endif
 repost:
-		ret = rxq->if_qp->recv_sg_list(rxq->qp,
+		ret = rxq->if_wq->recv_sg_list(rxq->wq,
 					       elt->sges,
 					       RTE_DIM(elt->sges));
 		if (unlikely(ret)) {
@@ -911,7 +911,7 @@ repost:
 #ifdef DEBUG_RECV
 	DEBUG("%p: reposting %u WRs", (void *)rxq, i);
 #endif
-	ret = rxq->if_qp->recv_burst(rxq->qp, sges, i);
+	ret = rxq->if_wq->recv_burst(rxq->wq, sges, i);
 	if (unlikely(ret)) {
 		/* Inability to repost WRs is fatal. */
 		DEBUG("%p: recv_burst(): failed (ret=%d)",
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index aec67f6..75f8297 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -99,16 +99,9 @@ struct rxq {
 	struct rte_mempool *mp; /* Memory Pool for allocations. */
 	struct ibv_mr *mr; /* Memory Region (for mp). */
 	struct ibv_cq *cq; /* Completion Queue. */
-	struct ibv_qp *qp; /* Queue Pair. */
-	struct ibv_exp_qp_burst_family *if_qp; /* QP burst interface. */
+	struct ibv_exp_wq *wq; /* Work Queue. */
+	struct ibv_exp_wq_family *if_wq; /* WQ burst interface. */
 	struct ibv_exp_cq_family *if_cq; /* CQ interface. */
-	/*
-	 * Each VLAN ID requires a separate flow steering rule.
-	 */
-	BITFIELD_DECLARE(mac_configured, uint32_t, MLX5_MAX_MAC_ADDRESSES);
-	struct ibv_flow *mac_flow[MLX5_MAX_MAC_ADDRESSES][MLX5_MAX_VLAN_IDS];
-	struct ibv_flow *promisc_flow; /* Promiscuous flow. */
-	struct ibv_flow *allmulti_flow; /* Multicast flow. */
 	unsigned int port_id; /* Port ID for incoming packets. */
 	unsigned int elts_n; /* (*elts)[] length. */
 	unsigned int elts_head; /* Current index in (*elts)[]. */
@@ -125,6 +118,16 @@ struct rxq {
 	struct ibv_exp_res_domain *rd; /* Resource Domain. */
 };
 
+struct hash_rxq {
+	struct priv *priv; /* Back pointer to private data. */
+	struct ibv_qp *qp; /* Hash RX QP. */
+	/* Each VLAN ID requires a separate flow steering rule. */
+	BITFIELD_DECLARE(mac_configured, uint32_t, MLX5_MAX_MAC_ADDRESSES);
+	struct ibv_flow *mac_flow[MLX5_MAX_MAC_ADDRESSES][MLX5_MAX_VLAN_IDS];
+	struct ibv_flow *promisc_flow; /* Promiscuous flow. */
+	struct ibv_flow *allmulti_flow; /* Multicast flow. */
+};
+
 /* TX element. */
 struct txq_elt {
 	struct rte_mbuf *buf;
@@ -169,6 +172,8 @@ struct txq {
 
 /* mlx5_rxq.c */
 
+int priv_create_hash_rxqs(struct priv *);
+void priv_destroy_hash_rxqs(struct priv *);
 void rxq_cleanup(struct rxq *);
 int rxq_rehash(struct rte_eth_dev *, struct rxq *);
 int rxq_setup(struct rte_eth_dev *, struct rxq *, uint16_t, unsigned int,
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index fbc977c..2876ea7 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -60,54 +60,35 @@ int
 mlx5_dev_start(struct rte_eth_dev *dev)
 {
 	struct priv *priv = dev->data->dev_private;
-	unsigned int i = 0;
-	unsigned int r;
-	struct rxq *rxq;
+	int err;
 
 	priv_lock(priv);
 	if (priv->started) {
 		priv_unlock(priv);
 		return 0;
 	}
-	DEBUG("%p: attaching configured flows to all RX queues", (void *)dev);
-	priv->started = 1;
-	if (priv->rss) {
-		rxq = &priv->rxq_parent;
-		r = 1;
-	} else {
-		rxq = (*priv->rxqs)[0];
-		r = priv->rxqs_n;
-	}
-	/* Iterate only once when RSS is enabled. */
-	do {
-		int ret;
-
-		/* Ignore nonexistent RX queues. */
-		if (rxq == NULL)
-			continue;
-		ret = rxq_mac_addrs_add(rxq);
-		if (!ret && priv->promisc)
-			ret = rxq_promiscuous_enable(rxq);
-		if (!ret && priv->allmulti)
-			ret = rxq_allmulticast_enable(rxq);
-		if (!ret)
-			continue;
-		WARN("%p: QP flow attachment failed: %s",
-		     (void *)dev, strerror(ret));
+	DEBUG("%p: allocating and configuring hash RX queues", (void *)dev);
+	err = priv_create_hash_rxqs(priv);
+	if (!err)
+		err = priv_mac_addrs_enable(priv);
+	if (!err && priv->promisc)
+		err = priv_promiscuous_enable(priv);
+	if (!err && priv->allmulti)
+		err = priv_allmulticast_enable(priv);
+	if (!err)
+		priv->started = 1;
+	else {
+		ERROR("%p: an error occurred while configuring hash RX queues:"
+		      " %s",
+		      (void *)priv, strerror(err));
 		/* Rollback. */
-		while (i != 0) {
-			rxq = (*priv->rxqs)[--i];
-			if (rxq != NULL) {
-				rxq_allmulticast_disable(rxq);
-				rxq_promiscuous_disable(rxq);
-				rxq_mac_addrs_del(rxq);
-			}
-		}
-		priv->started = 0;
-		return -ret;
-	} while ((--r) && ((rxq = (*priv->rxqs)[++i]), i));
+		priv_allmulticast_disable(priv);
+		priv_promiscuous_disable(priv);
+		priv_mac_addrs_disable(priv);
+		priv_destroy_hash_rxqs(priv);
+	}
 	priv_unlock(priv);
-	return 0;
+	return -err;
 }
 
 /**
@@ -122,32 +103,17 @@ void
 mlx5_dev_stop(struct rte_eth_dev *dev)
 {
 	struct priv *priv = dev->data->dev_private;
-	unsigned int i = 0;
-	unsigned int r;
-	struct rxq *rxq;
 
 	priv_lock(priv);
 	if (!priv->started) {
 		priv_unlock(priv);
 		return;
 	}
-	DEBUG("%p: detaching flows from all RX queues", (void *)dev);
+	DEBUG("%p: cleaning up and destroying hash RX queues", (void *)dev);
+	priv_allmulticast_disable(priv);
+	priv_promiscuous_disable(priv);
+	priv_mac_addrs_disable(priv);
+	priv_destroy_hash_rxqs(priv);
 	priv->started = 0;
-	if (priv->rss) {
-		rxq = &priv->rxq_parent;
-		r = 1;
-	} else {
-		rxq = (*priv->rxqs)[0];
-		r = priv->rxqs_n;
-	}
-	/* Iterate only once when RSS is enabled. */
-	do {
-		/* Ignore nonexistent RX queues. */
-		if (rxq == NULL)
-			continue;
-		rxq_allmulticast_disable(rxq);
-		rxq_promiscuous_disable(rxq);
-		rxq_mac_addrs_del(rxq);
-	} while ((--r) && ((rxq = (*priv->rxqs)[++i]), i));
 	priv_unlock(priv);
 }
diff --git a/drivers/net/mlx5/mlx5_vlan.c b/drivers/net/mlx5/mlx5_vlan.c
index 60fe06b..2105a81 100644
--- a/drivers/net/mlx5/mlx5_vlan.c
+++ b/drivers/net/mlx5/mlx5_vlan.c
@@ -94,47 +94,25 @@ vlan_filter_set(struct rte_eth_dev *dev, uint16_t vlan_id, int on)
 	if ((on) && (!priv->vlan_filter[j].enabled)) {
 		/*
 		 * Filter is disabled, enable it.
-		 * Rehashing flows in all RX queues is necessary.
+		 * Rehashing flows in all hash RX queues is necessary.
 		 */
-		if (priv->rss)
-			rxq_mac_addrs_del(&priv->rxq_parent);
-		else
-			for (i = 0; (i != priv->rxqs_n); ++i)
-				if ((*priv->rxqs)[i] != NULL)
-					rxq_mac_addrs_del((*priv->rxqs)[i]);
+		for (i = 0; (i != priv->hash_rxqs_n); ++i)
+			hash_rxq_mac_addrs_del(&(*priv->hash_rxqs)[i]);
 		priv->vlan_filter[j].enabled = 1;
-		if (priv->started) {
-			if (priv->rss)
-				rxq_mac_addrs_add(&priv->rxq_parent);
-			else
-				for (i = 0; (i != priv->rxqs_n); ++i) {
-					if ((*priv->rxqs)[i] == NULL)
-						continue;
-					rxq_mac_addrs_add((*priv->rxqs)[i]);
-				}
-		}
+		if (priv->started)
+			for (i = 0; (i != priv->hash_rxqs_n); ++i)
+				hash_rxq_mac_addrs_add(&(*priv->hash_rxqs)[i]);
 	} else if ((!on) && (priv->vlan_filter[j].enabled)) {
 		/*
 		 * Filter is enabled, disable it.
 		 * Rehashing flows in all RX queues is necessary.
 		 */
-		if (priv->rss)
-			rxq_mac_addrs_del(&priv->rxq_parent);
-		else
-			for (i = 0; (i != priv->rxqs_n); ++i)
-				if ((*priv->rxqs)[i] != NULL)
-					rxq_mac_addrs_del((*priv->rxqs)[i]);
+		for (i = 0; (i != priv->hash_rxqs_n); ++i)
+			hash_rxq_mac_addrs_del(&(*priv->hash_rxqs)[i]);
 		priv->vlan_filter[j].enabled = 0;
-		if (priv->started) {
-			if (priv->rss)
-				rxq_mac_addrs_add(&priv->rxq_parent);
-			else
-				for (i = 0; (i != priv->rxqs_n); ++i) {
-					if ((*priv->rxqs)[i] == NULL)
-						continue;
-					rxq_mac_addrs_add((*priv->rxqs)[i]);
-				}
-		}
+		if (priv->started)
+			for (i = 0; (i != priv->hash_rxqs_n); ++i)
+				hash_rxq_mac_addrs_add(&(*priv->hash_rxqs)[i]);
 	}
 	return 0;
 }
-- 
2.1.0
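In the RX queue code above, the QP state machine (RESET, INIT, RTR) is replaced by a two-state WQ. rxq_setup() creates the WQ, queries its interface family, moves it to IBV_EXP_WQS_RDY and only then posts SGEs. rxq_rehash() first returns it to IBV_EXP_WQS_RESET. The following is a minimal, self-contained model of that ordering, not driver code. It does not call libibverbs, and model_wq, model_wq_modify(), model_wq_post() and model_rehash() are hypothetical stand-ins for ibv_exp_modify_wq() and the recv_burst() callback. For illustration, it assumes that posting requires the ready state and that a reset discards posted buffers.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of a receive WQ; not the libibverbs object. */
enum model_wq_state { MODEL_WQS_RESET, MODEL_WQS_RDY };

struct model_wq {
	enum model_wq_state state;
	unsigned int posted;      /* Buffers currently posted. */
	unsigned int max_recv_wr; /* Capacity, like ibv_exp_wq_init_attr. */
};

/* Stand-in for a state change through ibv_exp_modify_wq(). */
static int
model_wq_modify(struct model_wq *wq, enum model_wq_state state)
{
	assert(wq != NULL);
	/* Assumption: a reset discards every posted buffer. */
	if (state == MODEL_WQS_RESET)
		wq->posted = 0;
	wq->state = state;
	return 0;
}

/* Stand-in for recv_burst(); assumes the WQ must be ready. */
static int
model_wq_post(struct model_wq *wq, unsigned int n)
{
	assert(wq != NULL);
	if (wq->state != MODEL_WQS_RDY)
		return EINVAL;
	if (n > wq->max_recv_wr - wq->posted)
		return ENOMEM;
	wq->posted += n;
	return 0;
}

/* Simplified rehash order: RESET, then RDY, then post SGEs. */
static int
model_rehash(struct model_wq *wq, unsigned int desc_n)
{
	int err;

	err = model_wq_modify(wq, MODEL_WQS_RESET);
	if (!err)
		err = model_wq_modify(wq, MODEL_WQS_RDY);
	if (!err)
		err = model_wq_post(wq, desc_n);
	return err;
}
```

In this model, posting to a freshly created (RESET) WQ fails with EINVAL. model_rehash() leaves the WQ ready with desc_n buffers posted.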


* [dpdk-dev] [PATCH 04/17] mlx5: restore allmulti and promisc modes after device restart
  2015-10-05 17:54 [dpdk-dev] [PATCH 00/17] Enhance mlx5 with Mellanox OFED 3.1 Adrien Mazarguil
                   ` (2 preceding siblings ...)
  2015-10-05 17:54 ` [dpdk-dev] [PATCH 03/17] mlx5: refactor RX code for the new Verbs RSS API Adrien Mazarguil
@ 2015-10-05 17:54 ` Adrien Mazarguil
  2015-10-05 17:54 ` [dpdk-dev] [PATCH 05/17] mlx5: use separate indirection table for default hash RX queue Adrien Mazarguil
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 39+ messages in thread
From: Adrien Mazarguil @ 2015-10-05 17:54 UTC
  To: dev

These modes are otherwise lost when the device is stopped and restarted.
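The fix separates what the application requested (promisc_req, allmulti_req) from what is currently applied (promisc, allmulti). mlx5_dev_start() can then re-apply a mode whose flows were destroyed by mlx5_dev_stop(). Below is a simplified model of the idea for the promiscuous case, assuming flows can only exist while the device is started. struct model_priv and the model_*() functions are hypothetical and do not reproduce the driver's functions.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for the relevant struct priv bits. */
struct model_priv {
	unsigned int started:1;
	unsigned int promisc_req:1; /* Requested by the application. */
	unsigned int promisc:1;     /* Flows currently installed. */
};

/* Install promiscuous flows; assumed possible only while started. */
static void
model_promisc_apply(struct model_priv *p)
{
	if (p->started)
		p->promisc = 1;
}

/* rte_eth_promiscuous_enable() path: record the request, then apply. */
static void
model_promisc_enable(struct model_priv *p)
{
	p->promisc_req = 1;
	model_promisc_apply(p);
}

/* Stopping destroys hash RX queues and the flows attached to them. */
static void
model_dev_stop(struct model_priv *p)
{
	p->promisc = 0;
	p->started = 0;
}

/* Starting re-applies modes from the request flag, not the applied one. */
static void
model_dev_start(struct model_priv *p)
{
	assert(!p->started);
	p->started = 1;
	if (p->promisc_req)
		model_promisc_apply(p);
}
```

If model_dev_start() tested p->promisc instead of p->promisc_req, a stop/start cycle would silently drop the mode. That is the behavior this patch addresses.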

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 drivers/net/mlx5/mlx5.h         |  2 ++
 drivers/net/mlx5/mlx5_rxmode.c  | 12 ++++--------
 drivers/net/mlx5/mlx5_trigger.c |  4 ++--
 3 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 9720e96..9dcfe89 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -99,7 +99,9 @@ struct priv {
 	uint16_t mtu; /* Configured MTU. */
 	uint8_t port; /* Physical port number. */
 	unsigned int started:1; /* Device started, flows enabled. */
+	unsigned int promisc_req:1; /* Promiscuous mode requested. */
 	unsigned int promisc:1; /* Device in promiscuous mode. */
+	unsigned int allmulti_req:1; /* All multicast mode requested. */
 	unsigned int allmulti:1; /* Device receives all multicast packets. */
 	unsigned int hw_csum:1; /* Checksum offload is supported. */
 	unsigned int hw_csum_l2tun:1; /* Same for L2 tunnels. */
diff --git a/drivers/net/mlx5/mlx5_rxmode.c b/drivers/net/mlx5/mlx5_rxmode.c
index 1f5cd40..7fe7f0e 100644
--- a/drivers/net/mlx5/mlx5_rxmode.c
+++ b/drivers/net/mlx5/mlx5_rxmode.c
@@ -118,9 +118,6 @@ priv_promiscuous_enable(struct priv *priv)
 
 	if (priv->promisc)
 		return 0;
-	/* If device isn't started, this is all we need to do. */
-	if (!priv->started)
-		goto end;
 	for (i = 0; (i != priv->hash_rxqs_n); ++i) {
 		struct hash_rxq *hash_rxq = &(*priv->hash_rxqs)[i];
 		int ret;
@@ -135,7 +132,6 @@ priv_promiscuous_enable(struct priv *priv)
 		}
 		return ret;
 	}
-end:
 	priv->promisc = 1;
 	return 0;
 }
@@ -153,6 +149,7 @@ mlx5_promiscuous_enable(struct rte_eth_dev *dev)
 	int ret;
 
 	priv_lock(priv);
+	priv->promisc_req = 1;
 	ret = priv_promiscuous_enable(priv);
 	if (ret)
 		ERROR("cannot enable promiscuous mode: %s", strerror(ret));
@@ -208,6 +205,7 @@ mlx5_promiscuous_disable(struct rte_eth_dev *dev)
 	struct priv *priv = dev->data->dev_private;
 
 	priv_lock(priv);
+	priv->promisc_req = 0;
 	priv_promiscuous_disable(priv);
 	priv_unlock(priv);
 }
@@ -267,9 +265,6 @@ priv_allmulticast_enable(struct priv *priv)
 
 	if (priv->allmulti)
 		return 0;
-	/* If device isn't started, this is all we need to do. */
-	if (!priv->started)
-		goto end;
 	for (i = 0; (i != priv->hash_rxqs_n); ++i) {
 		struct hash_rxq *hash_rxq = &(*priv->hash_rxqs)[i];
 		int ret;
@@ -284,7 +279,6 @@ priv_allmulticast_enable(struct priv *priv)
 		}
 		return ret;
 	}
-end:
 	priv->allmulti = 1;
 	return 0;
 }
@@ -302,6 +296,7 @@ mlx5_allmulticast_enable(struct rte_eth_dev *dev)
 	int ret;
 
 	priv_lock(priv);
+	priv->allmulti_req = 1;
 	ret = priv_allmulticast_enable(priv);
 	if (ret)
 		ERROR("cannot enable allmulticast mode: %s", strerror(ret));
@@ -355,6 +350,7 @@ mlx5_allmulticast_disable(struct rte_eth_dev *dev)
 	struct priv *priv = dev->data->dev_private;
 
 	priv_lock(priv);
+	priv->allmulti_req = 0;
 	priv_allmulticast_disable(priv);
 	priv_unlock(priv);
 }
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index 2876ea7..233c0d8 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -71,9 +71,9 @@ mlx5_dev_start(struct rte_eth_dev *dev)
 	err = priv_create_hash_rxqs(priv);
 	if (!err)
 		err = priv_mac_addrs_enable(priv);
-	if (!err && priv->promisc)
+	if (!err && priv->promisc_req)
 		err = priv_promiscuous_enable(priv);
-	if (!err && priv->allmulti)
+	if (!err && priv->allmulti_req)
 		err = priv_allmulticast_enable(priv);
 	if (!err)
 		priv->started = 1;
-- 
2.1.0


* [dpdk-dev] [PATCH 05/17] mlx5: use separate indirection table for default hash RX queue
  2015-10-05 17:54 [dpdk-dev] [PATCH 00/17] Enhance mlx5 with Mellanox OFED 3.1 Adrien Mazarguil
                   ` (3 preceding siblings ...)
  2015-10-05 17:54 ` [dpdk-dev] [PATCH 04/17] mlx5: restore allmulti and promisc modes after device restart Adrien Mazarguil
@ 2015-10-05 17:54 ` Adrien Mazarguil
  2015-10-05 17:54 ` [dpdk-dev] [PATCH 06/17] mlx5: adapt indirection table size depending on RX queues number Adrien Mazarguil
                   ` (13 subsequent siblings)
  18 siblings, 0 replies; 39+ messages in thread
From: Adrien Mazarguil @ 2015-10-05 17:54 UTC
  To: dev

From: Olga Shern <olgas@mellanox.com>

The default hash RX queue handles packets that are not matched by more
specific types and requires its own indirection table of size 1 to work
properly.
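For the size-1 default table, the sizing works out as follows. priv_create_hash_rxqs() rounds the WQ array up to a power of two (1 << log2above(rxqs_n)) and appears to fill it round-robin with the configured RX queues. It then clamps each table's size to its max_size before taking log2above() again for log_ind_tbl_size. A table with max_size 1 therefore gets log_ind_tbl_size 0 and references only the first WQ. The sketch below reproduces that arithmetic on plain integers. ceil_log2() is assumed to behave like the driver's log2above() (ceiling of log2), and ind_table_layout() is a hypothetical helper, not a driver function.

```c
#include <assert.h>

/* Ceiling of log2(v) for v >= 1; assumed to match log2above(). */
static unsigned int
ceil_log2(unsigned int v)
{
	unsigned int l;
	unsigned int r;

	assert(v != 0);
	/* l counts shifts; r records whether any shifted-out bit was set. */
	for (l = 0, r = 0; (v >> 1); ++l, v >>= 1)
		r |= (v & 1);
	return l + r;
}

/*
 * Fill wqs[] (2^ceil_log2(rxqs_n) entries) round-robin with RX queue
 * indices and return log_ind_tbl_size for a table capped at max_size.
 * The table then uses the first (1 << returned value) entries of wqs[].
 */
static unsigned int
ind_table_layout(unsigned int rxqs_n, unsigned int max_size,
		 unsigned int *wqs, unsigned int wqs_cap)
{
	unsigned int wqs_n = 1u << ceil_log2(rxqs_n);
	unsigned int size = (max_size < wqs_n) ? max_size : wqs_n;
	unsigned int i;
	unsigned int j;

	assert(wqs_n <= wqs_cap);
	for (i = 0, j = 0; i != wqs_n; ++i) {
		wqs[i] = j;
		if (++j == rxqs_n)
			j = 0;
	}
	return ceil_log2(size);
}
```

With three RX queues, the array holds four entries (0, 1, 2, 0). The unbounded table (max_size -1u) uses all four, while the default table uses only entry 0.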

This commit implements support for multiple indirection tables by grouping
their layout and properties in a static initialization table.
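The filtering step can be sketched as below, in the spirit of priv_make_ind_table_init() and hash_rxq_type_from_n(). All names are hypothetical mirrors of the patch's types, not driver code. Entries whose hash types are all unsupported are dropped, and the survivors are compacted, with each entry's bit count stored at its compacted index. HASH_RXQ_ETH is always kept so that unmatched frames are still received. The patch's hash_rxq_type_from_n() scans forward from bit n, which is correct for the masks in ind_table_init[] but not for arbitrary masks. This sketch counts set bits instead.

```c
#include <assert.h>

/* Hypothetical mirror of the patch's hash RX queue types. */
enum hash_type { HT_TCPV4, HT_UDPV4, HT_IPV4, HT_ETH, HT_N };

/* Nonzero if the type hashes on some fields (HT_ETH does not). */
static const unsigned int hash_fields_set[HT_N] = { 1, 1, 1, 0 };

struct table_init {
	unsigned int max_size;
	unsigned int hash_types;   /* Bitmask of enum hash_type. */
	unsigned int hash_types_n; /* Number of bits set in hash_types. */
};

static const struct table_init table_init[] = {
	{ -1u, 1u << HT_TCPV4 | 1u << HT_UDPV4 | 1u << HT_IPV4, 3 },
	{ 1, 1u << HT_ETH, 1 },
};

#define TABLE_INIT_N (sizeof(table_init) / sizeof(table_init[0]))

/* Keep only supported types and compact the surviving entries. */
static unsigned int
make_tables(unsigned int rxqs_n, struct table_init (*out)[TABLE_INIT_N])
{
	unsigned int sup = 1u << HT_ETH; /* Always needed as a fallback. */
	unsigned int i;
	unsigned int j;
	unsigned int h;

	/* Hashed types are only worth it with more than one RX queue. */
	if (rxqs_n > 1)
		for (h = 0; h != HT_N; ++h)
			if (hash_fields_set[h])
				sup |= 1u << h;
	for (i = 0, j = 0; i != TABLE_INIT_N; ++i) {
		unsigned int nb = 0;

		(*out)[j] = table_init[i];
		(*out)[j].hash_types &= sup;
		for (h = 0; h != HT_N; ++h)
			nb += ((*out)[j].hash_types >> h) & 1u;
		/* The count belongs to the compacted slot j, not i. */
		(*out)[j].hash_types_n = nb;
		if (nb)
			++j;
	}
	return j;
}

/* Return the type of the n'th (0-based) bit set in t->hash_types. */
static enum hash_type
type_from_n(const struct table_init *t, unsigned int n)
{
	unsigned int h;

	assert(n < t->hash_types_n);
	for (h = 0; h != HT_N; ++h)
		if (((t->hash_types >> h) & 1u) && (n-- == 0))
			return (enum hash_type)h;
	assert(0);
	return HT_N;
}
```

With a single RX queue, only the ETH table survives, and it moves to index 0. With several queues, both tables remain and the first one yields TCPv4, UDPv4 and IPv4 for n = 0, 1, 2.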

Signed-off-by: Olga Shern <olgas@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 drivers/net/mlx5/mlx5.h      |   5 +-
 drivers/net/mlx5/mlx5_rxq.c  | 282 +++++++++++++++++++++++++++++++++----------
 drivers/net/mlx5/mlx5_rxtx.h |  25 ++++
 3 files changed, 247 insertions(+), 65 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 9dcfe89..08900f5 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -111,8 +111,9 @@ struct priv {
 	unsigned int txqs_n; /* TX queues array size. */
 	struct rxq *(*rxqs)[]; /* RX queues. */
 	struct txq *(*txqs)[]; /* TX queues. */
-	/* Indirection table referencing all RX WQs. */
-	struct ibv_exp_rwq_ind_table *ind_table;
+	/* Indirection tables referencing all RX WQs. */
+	struct ibv_exp_rwq_ind_table *(*ind_tables)[];
+	unsigned int ind_tables_n; /* Number of indirection tables. */
 	/* Hash RX QPs feeding the indirection table. */
 	struct hash_rxq (*hash_rxqs)[];
 	unsigned int hash_rxqs_n; /* Hash RX QPs array size. */
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 5392221..b5084f8 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -64,6 +64,52 @@
 #include "mlx5_utils.h"
 #include "mlx5_defs.h"
 
+/* Initialization data for hash RX queues. */
+static const struct hash_rxq_init hash_rxq_init[] = {
+	[HASH_RXQ_TCPv4] = {
+		.hash_fields = (IBV_EXP_RX_HASH_SRC_IPV4 |
+				IBV_EXP_RX_HASH_DST_IPV4 |
+				IBV_EXP_RX_HASH_SRC_PORT_TCP |
+				IBV_EXP_RX_HASH_DST_PORT_TCP),
+	},
+	[HASH_RXQ_UDPv4] = {
+		.hash_fields = (IBV_EXP_RX_HASH_SRC_IPV4 |
+				IBV_EXP_RX_HASH_DST_IPV4 |
+				IBV_EXP_RX_HASH_SRC_PORT_UDP |
+				IBV_EXP_RX_HASH_DST_PORT_UDP),
+	},
+	[HASH_RXQ_IPv4] = {
+		.hash_fields = (IBV_EXP_RX_HASH_SRC_IPV4 |
+				IBV_EXP_RX_HASH_DST_IPV4),
+	},
+	[HASH_RXQ_ETH] = {
+		.hash_fields = 0,
+	},
+};
+
+/* Number of entries in hash_rxq_init[]. */
+static const unsigned int hash_rxq_init_n = RTE_DIM(hash_rxq_init);
+
+/* Initialization data for hash RX queue indirection tables. */
+static const struct ind_table_init ind_table_init[] = {
+	{
+		.max_size = -1u, /* Superseded by HW limitations. */
+		.hash_types =
+			1 << HASH_RXQ_TCPv4 |
+			1 << HASH_RXQ_UDPv4 |
+			1 << HASH_RXQ_IPv4 |
+			0,
+		.hash_types_n = 3,
+	},
+	{
+		.max_size = 1,
+		.hash_types = 1 << HASH_RXQ_ETH,
+		.hash_types_n = 1,
+	},
+};
+
+#define IND_TABLE_INIT_N RTE_DIM(ind_table_init)
+
 /* Default RSS hash key also used for ConnectX-3. */
 static uint8_t hash_rxq_default_key[] = {
 	0x2c, 0xc6, 0x81, 0xd1,
@@ -99,6 +145,74 @@ log2above(unsigned int v)
 }
 
 /**
+ * Return the type corresponding to the n'th bit set.
+ *
+ * @param table
+ *   The indirection table.
+ * @param n
+ *   The n'th bit set.
+ *
+ * @return
+ *   The corresponding hash_rxq_type.
+ */
+static enum hash_rxq_type
+hash_rxq_type_from_n(const struct ind_table_init *table, unsigned int n)
+{
+	assert(n < table->hash_types_n);
+	while (((table->hash_types >> n) & 0x1) == 0)
+		++n;
+	return n;
+}
+
+/**
+ * Filter out disabled hash RX queue types from ind_table_init[].
+ *
+ * @param priv
+ *   Pointer to private structure.
+ * @param[out] table
+ *   Output table.
+ *
+ * @return
+ *   Number of table entries.
+ */
+static unsigned int
+priv_make_ind_table_init(struct priv *priv,
+			 struct ind_table_init (*table)[IND_TABLE_INIT_N])
+{
+	unsigned int i;
+	unsigned int j;
+	unsigned int table_n = 0;
+	/* Mandatory to receive frames not handled by normal hash RX queues. */
+	unsigned int hash_types_sup = 1 << HASH_RXQ_ETH;
+
+	/* Process other protocols only if more than one queue. */
+	if (priv->rxqs_n > 1)
+		for (i = 0; (i != hash_rxq_init_n); ++i)
+			if (hash_rxq_init[i].hash_fields)
+				hash_types_sup |= (1 << i);
+
+	/* Filter out entries whose protocols are not in the set. */
+	for (i = 0, j = 0; (i != IND_TABLE_INIT_N); ++i) {
+		unsigned int nb;
+		unsigned int h;
+
+		/* j is increased only if the table has valid protocols. */
+		assert(j <= i);
+		(*table)[j] = ind_table_init[i];
+		(*table)[j].hash_types &= hash_types_sup;
+		for (h = 0, nb = 0; (h != hash_rxq_init_n); ++h)
+			if (((*table)[j].hash_types >> h) & 0x1)
+				++nb;
+		(*table)[j].hash_types_n = nb;
+		if (nb) {
+			++table_n;
+			++j;
+		}
+	}
+	return table_n;
+}
+
+/**
  * Initialize hash RX queues and indirection table.
  *
  * @param priv
@@ -110,29 +224,21 @@ log2above(unsigned int v)
 int
 priv_create_hash_rxqs(struct priv *priv)
 {
-	static const uint64_t rss_hash_table[] = {
-		/* TCPv4. */
-		(IBV_EXP_RX_HASH_SRC_IPV4 | IBV_EXP_RX_HASH_DST_IPV4 |
-		 IBV_EXP_RX_HASH_SRC_PORT_TCP | IBV_EXP_RX_HASH_DST_PORT_TCP),
-		/* UDPv4. */
-		(IBV_EXP_RX_HASH_SRC_IPV4 | IBV_EXP_RX_HASH_DST_IPV4 |
-		 IBV_EXP_RX_HASH_SRC_PORT_UDP | IBV_EXP_RX_HASH_DST_PORT_UDP),
-		/* TCPv6. */
-		(IBV_EXP_RX_HASH_SRC_IPV6 | IBV_EXP_RX_HASH_DST_IPV6 |
-		 IBV_EXP_RX_HASH_SRC_PORT_TCP | IBV_EXP_RX_HASH_DST_PORT_TCP),
-		/* UDPv6. */
-		(IBV_EXP_RX_HASH_SRC_IPV6 | IBV_EXP_RX_HASH_DST_IPV6 |
-		 IBV_EXP_RX_HASH_SRC_PORT_UDP | IBV_EXP_RX_HASH_DST_PORT_UDP),
-		/* Other IPv4. */
-		(IBV_EXP_RX_HASH_SRC_IPV4 | IBV_EXP_RX_HASH_DST_IPV4),
-		/* Other IPv6. */
-		(IBV_EXP_RX_HASH_SRC_IPV6 | IBV_EXP_RX_HASH_DST_IPV6),
-		/* None, used for everything else. */
-		0,
-	};
+	unsigned int wqs_n = (1 << log2above(priv->rxqs_n));
+	struct ibv_exp_wq *wqs[wqs_n];
+	struct ind_table_init ind_table_init[IND_TABLE_INIT_N];
+	unsigned int ind_tables_n =
+		priv_make_ind_table_init(priv, &ind_table_init);
+	unsigned int hash_rxqs_n = 0;
+	struct hash_rxq (*hash_rxqs)[] = NULL;
+	struct ibv_exp_rwq_ind_table *(*ind_tables)[] = NULL;
+	unsigned int i;
+	unsigned int j;
+	unsigned int k;
+	int err = 0;
 
-	DEBUG("allocating hash RX queues for %u WQs", priv->rxqs_n);
-	assert(priv->ind_table == NULL);
+	assert(priv->ind_tables == NULL);
+	assert(priv->ind_tables_n == 0);
 	assert(priv->hash_rxqs == NULL);
 	assert(priv->hash_rxqs_n == 0);
 	assert(priv->pd != NULL);
@@ -140,26 +246,11 @@ priv_create_hash_rxqs(struct priv *priv)
 	if (priv->rxqs_n == 0)
 		return EINVAL;
 	assert(priv->rxqs != NULL);
-
-	/* FIXME: large data structures are allocated on the stack. */
-	unsigned int wqs_n = (1 << log2above(priv->rxqs_n));
-	struct ibv_exp_wq *wqs[wqs_n];
-	struct ibv_exp_rwq_ind_table_init_attr ind_init_attr = {
-		.pd = priv->pd,
-		.log_ind_tbl_size = log2above(priv->rxqs_n),
-		.ind_tbl = wqs,
-		.comp_mask = 0,
-	};
-	struct ibv_exp_rwq_ind_table *ind_table = NULL;
-	/* If only one RX queue is configured, RSS is not needed and a single
-	 * empty hash entry is used (last rss_hash_table[] entry). */
-	unsigned int hash_rxqs_n =
-		((priv->rxqs_n == 1) ? 1 : RTE_DIM(rss_hash_table));
-	struct hash_rxq (*hash_rxqs)[hash_rxqs_n] = NULL;
-	unsigned int i;
-	unsigned int j;
-	int err = 0;
-
+	if (ind_tables_n == 0) {
+		ERROR("all hash RX queue types have been filtered out,"
+		      " indirection table cannot be created");
+		return EINVAL;
+	}
 	if (wqs_n < priv->rxqs_n) {
 		ERROR("cannot handle this many RX queues (%u)", priv->rxqs_n);
 		err = ERANGE;
@@ -178,9 +269,40 @@ priv_create_hash_rxqs(struct priv *priv)
 		if (++j == priv->rxqs_n)
 			j = 0;
 	}
-	errno = 0;
-	ind_table = ibv_exp_create_rwq_ind_table(priv->ctx, &ind_init_attr);
-	if (ind_table == NULL) {
+	/* Get number of hash RX queues to configure. */
+	for (i = 0, hash_rxqs_n = 0; (i != ind_tables_n); ++i)
+		hash_rxqs_n += ind_table_init[i].hash_types_n;
+	DEBUG("allocating %u hash RX queues for %u WQs, %u indirection tables",
+	      hash_rxqs_n, priv->rxqs_n, ind_tables_n);
+	/* Create indirection tables. */
+	ind_tables = rte_calloc(__func__, ind_tables_n,
+				sizeof((*ind_tables)[0]), 0);
+	if (ind_tables == NULL) {
+		err = ENOMEM;
+		ERROR("cannot allocate indirection tables container: %s",
+		      strerror(err));
+		goto error;
+	}
+	for (i = 0; (i != ind_tables_n); ++i) {
+		struct ibv_exp_rwq_ind_table_init_attr ind_init_attr = {
+			.pd = priv->pd,
+			.log_ind_tbl_size = 0, /* Set below. */
+			.ind_tbl = wqs,
+			.comp_mask = 0,
+		};
+		unsigned int ind_tbl_size = ind_table_init[i].max_size;
+		struct ibv_exp_rwq_ind_table *ind_table;
+
+		if (wqs_n < ind_tbl_size)
+			ind_tbl_size = wqs_n;
+		ind_init_attr.log_ind_tbl_size = log2above(ind_tbl_size);
+		errno = 0;
+		ind_table = ibv_exp_create_rwq_ind_table(priv->ctx,
+							 &ind_init_attr);
+		if (ind_table != NULL) {
+			(*ind_tables)[i] = ind_table;
+			continue;
+		}
 		/* Not clear whether errno is set. */
 		err = (errno ? errno : EINVAL);
 		ERROR("RX indirection table creation failed with error %d: %s",
@@ -188,24 +310,26 @@ priv_create_hash_rxqs(struct priv *priv)
 		goto error;
 	}
 	/* Allocate array that holds hash RX queues and related data. */
-	hash_rxqs = rte_malloc(__func__, sizeof(*hash_rxqs), 0);
+	hash_rxqs = rte_calloc(__func__, hash_rxqs_n,
+			       sizeof((*hash_rxqs)[0]), 0);
 	if (hash_rxqs == NULL) {
 		err = ENOMEM;
 		ERROR("cannot allocate hash RX queues container: %s",
 		      strerror(err));
 		goto error;
 	}
-	for (i = 0, j = (RTE_DIM(rss_hash_table) - hash_rxqs_n);
-	     (j != RTE_DIM(rss_hash_table));
-	     ++i, ++j) {
+	for (i = 0, j = 0, k = 0;
+	     ((i != hash_rxqs_n) && (j != ind_tables_n));
+	     ++i) {
 		struct hash_rxq *hash_rxq = &(*hash_rxqs)[i];
-
+		enum hash_rxq_type type =
+			hash_rxq_type_from_n(&ind_table_init[j], k);
 		struct ibv_exp_rx_hash_conf hash_conf = {
 			.rx_hash_function = IBV_EXP_RX_HASH_FUNC_TOEPLITZ,
 			.rx_hash_key_len = sizeof(hash_rxq_default_key),
 			.rx_hash_key = hash_rxq_default_key,
-			.rx_hash_fields_mask = rss_hash_table[j],
-			.rwq_ind_tbl = ind_table,
+			.rx_hash_fields_mask = hash_rxq_init[type].hash_fields,
+			.rwq_ind_tbl = (*ind_tables)[j],
 		};
 		struct ibv_exp_qp_init_attr qp_init_attr = {
 			.max_inl_recv = 0, /* Currently not supported. */
@@ -217,30 +341,54 @@ priv_create_hash_rxqs(struct priv *priv)
 			.port_num = priv->port,
 		};
 
+		DEBUG("using indirection table %u for hash RX queue %u",
+		      j, i);
 		*hash_rxq = (struct hash_rxq){
 			.priv = priv,
 			.qp = ibv_exp_create_qp(priv->ctx, &qp_init_attr),
+			.type = type,
 		};
 		if (hash_rxq->qp == NULL) {
 			err = (errno ? errno : EINVAL);
 			ERROR("Hash RX QP creation failure: %s",
 			      strerror(err));
-			while (i) {
-				hash_rxq = &(*hash_rxqs)[--i];
-				claim_zero(ibv_destroy_qp(hash_rxq->qp));
-			}
 			goto error;
 		}
+		if (++k < ind_table_init[j].hash_types_n)
+			continue;
+		/* Switch to the next indirection table and reset hash RX
+		 * queue type array index. */
+		++j;
+		k = 0;
 	}
-	priv->ind_table = ind_table;
+	priv->ind_tables = ind_tables;
+	priv->ind_tables_n = ind_tables_n;
 	priv->hash_rxqs = hash_rxqs;
 	priv->hash_rxqs_n = hash_rxqs_n;
 	assert(err == 0);
 	return 0;
 error:
-	rte_free(hash_rxqs);
-	if (ind_table != NULL)
-		claim_zero(ibv_exp_destroy_rwq_ind_table(ind_table));
+	if (hash_rxqs != NULL) {
+		for (i = 0; (i != hash_rxqs_n); ++i) {
+			struct ibv_qp *qp = (*hash_rxqs)[i].qp;
+
+			if (qp == NULL)
+				continue;
+			claim_zero(ibv_destroy_qp(qp));
+		}
+		rte_free(hash_rxqs);
+	}
+	if (ind_tables != NULL) {
+		for (j = 0; (j != ind_tables_n); ++j) {
+			struct ibv_exp_rwq_ind_table *ind_table =
+				(*ind_tables)[j];
+
+			if (ind_table == NULL)
+				continue;
+			claim_zero(ibv_exp_destroy_rwq_ind_table(ind_table));
+		}
+		rte_free(ind_tables);
+	}
 	return err;
 }
 
@@ -258,7 +406,7 @@ priv_destroy_hash_rxqs(struct priv *priv)
 	DEBUG("destroying %u hash RX queues", priv->hash_rxqs_n);
 	if (priv->hash_rxqs_n == 0) {
 		assert(priv->hash_rxqs == NULL);
-		assert(priv->ind_table == NULL);
+		assert(priv->ind_tables == NULL);
 		return;
 	}
 	for (i = 0; (i != priv->hash_rxqs_n); ++i) {
@@ -278,8 +426,16 @@ priv_destroy_hash_rxqs(struct priv *priv)
 	priv->hash_rxqs_n = 0;
 	rte_free(priv->hash_rxqs);
 	priv->hash_rxqs = NULL;
-	claim_zero(ibv_exp_destroy_rwq_ind_table(priv->ind_table));
-	priv->ind_table = NULL;
+	for (i = 0; (i != priv->ind_tables_n); ++i) {
+		struct ibv_exp_rwq_ind_table *ind_table =
+			(*priv->ind_tables)[i];
+
+		assert(ind_table != NULL);
+		claim_zero(ibv_exp_destroy_rwq_ind_table(ind_table));
+	}
+	priv->ind_tables_n = 0;
+	rte_free(priv->ind_tables);
+	priv->ind_tables = NULL;
 }
 
 /**
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 75f8297..d9fa13e 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -118,9 +118,34 @@ struct rxq {
 	struct ibv_exp_res_domain *rd; /* Resource Domain. */
 };
 
+/* Hash RX queue types. */
+enum hash_rxq_type {
+	HASH_RXQ_TCPv4,
+	HASH_RXQ_UDPv4,
+	HASH_RXQ_IPv4,
+	HASH_RXQ_TCPv6,
+	HASH_RXQ_UDPv6,
+	HASH_RXQ_IPv6,
+	HASH_RXQ_ETH,
+};
+
+/* Initialization data for hash RX queue. */
+struct hash_rxq_init {
+	uint64_t hash_fields; /* Fields that participate in the hash. */
+};
+
+/* Initialization data for indirection table. */
+struct ind_table_init {
+	unsigned int max_size; /* Maximum number of WQs. */
+	/* Hash RX queues using this table. */
+	unsigned int hash_types;
+	unsigned int hash_types_n;
+};
+
 struct hash_rxq {
 	struct priv *priv; /* Back pointer to private data. */
 	struct ibv_qp *qp; /* Hash RX QP. */
+	enum hash_rxq_type type; /* Hash RX queue type. */
 	/* Each VLAN ID requires a separate flow steering rule. */
 	BITFIELD_DECLARE(mac_configured, uint32_t, MLX5_MAX_MAC_ADDRESSES);
 	struct ibv_flow *mac_flow[MLX5_MAX_MAC_ADDRESSES][MLX5_MAX_VLAN_IDS];
-- 
2.1.0


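The index bookkeeping in priv_create_hash_rxqs() from the patch above works like this. The number of hash RX queues is the sum of hash_types_n over all indirection tables. The creation loop walks the tables in order (j), and within each table walks its types in order (k). The sketch below models only that index arithmetic. struct ind_table_sketch, count_hash_rxqs() and hash_rxq_position() are hypothetical helpers and do not exist in the driver.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for struct ind_table_init: only the
 * number of hash RX queue types served by each table matters here. */
struct ind_table_sketch {
	unsigned int hash_types_n;
};

/* Total number of hash RX queues: one per (table, type) pair. */
static unsigned int
count_hash_rxqs(const struct ind_table_sketch *tables, unsigned int tables_n)
{
	unsigned int i;
	unsigned int n = 0;

	for (i = 0; i != tables_n; ++i)
		n += tables[i].hash_types_n;
	return n;
}

/* Find the indirection table (j) and the type slot within it (k) that the
 * flat hash RX queue index "idx" maps to, walking tables in the same order
 * as the creation loop. Returns 0 on success, -1 if idx is out of range. */
static int
hash_rxq_position(const struct ind_table_sketch *tables, unsigned int tables_n,
		  unsigned int idx, unsigned int *j_out, unsigned int *k_out)
{
	unsigned int i = 0;
	unsigned int j;
	unsigned int k;

	for (j = 0; j != tables_n; ++j) {
		for (k = 0; k != tables[j].hash_types_n; ++k, ++i) {
			if (i == idx) {
				*j_out = j;
				*k_out = k;
				return 0;
			}
		}
	}
	return -1;
}
```

With two tables serving 3 and 1 types respectively, flat index 3 is the first (and only) queue of the second table.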
* [dpdk-dev] [PATCH 06/17] mlx5: adapt indirection table size depending on RX queues number
  2015-10-05 17:54 [dpdk-dev] [PATCH 00/17] Enhance mlx5 with Mellanox OFED 3.1 Adrien Mazarguil
                   ` (4 preceding siblings ...)
  2015-10-05 17:54 ` [dpdk-dev] [PATCH 05/17] mlx5: use separate indirection table for default hash RX queue Adrien Mazarguil
@ 2015-10-05 17:54 ` Adrien Mazarguil
  2015-10-05 17:54 ` [dpdk-dev] [PATCH 07/17] mlx5: define specific flow steering rules for each hash RX QP Adrien Mazarguil
                   ` (12 subsequent siblings)
  18 siblings, 0 replies; 39+ messages in thread
From: Adrien Mazarguil @ 2015-10-05 17:54 UTC (permalink / raw)
  To: dev

From: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>

When the number of requested RX queues is not a power of two, use the
maximum indirection table size instead; this helps improve RSS balancing.

A message informs users that balancing is not optimal in such cases.
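The sizing rule can be illustrated with a small, self-contained C sketch. log2above_sketch() is a hypothetical equivalent of the driver's log2above(), whose body is not shown here. entries_for_queue() assumes that table entries are filled round-robin (entry i points to RX queue i % rxqs_n), as the padding loop in priv_create_hash_rxqs() suggests.

```c
#include <assert.h>

/* Hypothetical equivalent of log2above(): smallest l such that
 * (1 << l) >= v. */
static unsigned int
log2above_sketch(unsigned int v)
{
	unsigned int l = 0;

	assert(v <= (1u << 31)); /* Avoid shifting past the type width. */
	while ((1u << l) < v)
		++l;
	return l;
}

/* Number of indirection table entries: a power-of-two queue count is used
 * as is, anything else is replaced by the device maximum; either way the
 * result is rounded up to a power of two. */
static unsigned int
wqs_n_sketch(unsigned int rxqs_n, unsigned int ind_table_max_size)
{
	unsigned int n = (rxqs_n & (rxqs_n - 1)) ? ind_table_max_size : rxqs_n;

	return 1u << log2above_sketch(n);
}

/* Count how many of the wqs_n table entries point to RX queue "q" when
 * entries are assigned round-robin. */
static unsigned int
entries_for_queue(unsigned int wqs_n, unsigned int rxqs_n, unsigned int q)
{
	unsigned int i;
	unsigned int count = 0;

	for (i = 0; i != wqs_n; ++i)
		if (i % rxqs_n == q)
			++count;
	return count;
}
```

With 3 RX queues, a 4-entry table gives queue 0 twice the share of queue 2 (2 entries against 1). A 128-entry table narrows that gap to 43 against 42, which is the balancing improvement described above.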

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx5/mlx5.c      | 10 +++++++++-
 drivers/net/mlx5/mlx5.h      |  1 +
 drivers/net/mlx5/mlx5_defs.h |  3 +++
 drivers/net/mlx5/mlx5_rxq.c  | 21 ++++++++++++++-------
 4 files changed, 27 insertions(+), 8 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index a316989..167e14b 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -301,7 +301,9 @@ mlx5_pci_devinit(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		struct ether_addr mac;
 
 #ifdef HAVE_EXP_QUERY_DEVICE
-		exp_device_attr.comp_mask = IBV_EXP_DEVICE_ATTR_EXP_CAP_FLAGS;
+		exp_device_attr.comp_mask =
+			IBV_EXP_DEVICE_ATTR_EXP_CAP_FLAGS |
+			IBV_EXP_DEVICE_ATTR_RX_HASH;
 #endif /* HAVE_EXP_QUERY_DEVICE */
 
 		DEBUG("using port %u (%08" PRIx32 ")", port, test);
@@ -365,6 +367,12 @@ mlx5_pci_devinit(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		DEBUG("L2 tunnel checksum offloads are %ssupported",
 		      (priv->hw_csum_l2tun ? "" : "not "));
 
+		priv->ind_table_max_size = exp_device_attr.rx_hash_caps.max_rwq_indirection_table_size;
+		DEBUG("maximum RX indirection table size is %u",
+		      priv->ind_table_max_size);
+
+#else /* HAVE_EXP_QUERY_DEVICE */
+		priv->ind_table_max_size = RSS_INDIRECTION_TABLE_SIZE;
 #endif /* HAVE_EXP_QUERY_DEVICE */
 
 		priv->vf = vf;
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 08900f5..b099dac 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -114,6 +114,7 @@ struct priv {
 	/* Indirection tables referencing all RX WQs. */
 	struct ibv_exp_rwq_ind_table *(*ind_tables)[];
 	unsigned int ind_tables_n; /* Number of indirection tables. */
+	unsigned int ind_table_max_size; /* Maximum indirection table size. */
 	/* Hash RX QPs feeding the indirection table. */
 	struct hash_rxq (*hash_rxqs)[];
 	unsigned int hash_rxqs_n; /* Hash RX QPs array size. */
diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h
index 79de609..e697764 100644
--- a/drivers/net/mlx5/mlx5_defs.h
+++ b/drivers/net/mlx5/mlx5_defs.h
@@ -53,6 +53,9 @@
 /* Request send completion once in every 64 sends, might be less. */
 #define MLX5_PMD_TX_PER_COMP_REQ 64
 
+/* RSS Indirection table size. */
+#define RSS_INDIRECTION_TABLE_SIZE 128
+
 /* Maximum number of Scatter/Gather Elements per Work Request. */
 #ifndef MLX5_PMD_SGE_WR_N
 #define MLX5_PMD_SGE_WR_N 4
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index b5084f8..606367c 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -224,7 +224,13 @@ priv_make_ind_table_init(struct priv *priv,
 int
 priv_create_hash_rxqs(struct priv *priv)
 {
-	unsigned int wqs_n = (1 << log2above(priv->rxqs_n));
+	/* If the requested number of WQs is not a power of two, use the
+	 * maximum indirection table size for better balancing.
+	 * The result is always rounded to the next power of two. */
+	unsigned int wqs_n =
+		(1 << log2above((priv->rxqs_n & (priv->rxqs_n - 1)) ?
+				priv->ind_table_max_size :
+				priv->rxqs_n));
 	struct ibv_exp_wq *wqs[wqs_n];
 	struct ind_table_init ind_table_init[IND_TABLE_INIT_N];
 	unsigned int ind_tables_n =
@@ -251,16 +257,17 @@ priv_create_hash_rxqs(struct priv *priv)
 		      " indirection table cannot be created");
 		return EINVAL;
 	}
-	if (wqs_n < priv->rxqs_n) {
+	if ((wqs_n < priv->rxqs_n) || (wqs_n > priv->ind_table_max_size)) {
 		ERROR("cannot handle this many RX queues (%u)", priv->rxqs_n);
 		err = ERANGE;
 		goto error;
 	}
-	if (wqs_n != priv->rxqs_n)
-		WARN("%u RX queues are configured, consider rounding this"
-		     " number to the next power of two (%u) for optimal"
-		     " performance",
-		     priv->rxqs_n, wqs_n);
+	if (wqs_n != priv->rxqs_n) {
+		INFO("%u RX queues are configured, consider rounding this"
+		     " number to the next power of two for better balancing",
+		     priv->rxqs_n);
+		DEBUG("indirection table extended to assume %u WQs", wqs_n);
+	}
 	/* When the number of RX queues is not a power of two, the remaining
 	 * table entries are padded with reused WQs and hashes are not spread
 	 * uniformly. */
-- 
2.1.0


* [dpdk-dev] [PATCH 07/17] mlx5: define specific flow steering rules for each hash RX QP
  2015-10-05 17:54 [dpdk-dev] [PATCH 00/17] Enhance mlx5 with Mellanox OFED 3.1 Adrien Mazarguil
                   ` (5 preceding siblings ...)
  2015-10-05 17:54 ` [dpdk-dev] [PATCH 06/17] mlx5: adapt indirection table size depending on RX queues number Adrien Mazarguil
@ 2015-10-05 17:54 ` Adrien Mazarguil
  2015-10-05 17:54 ` [dpdk-dev] [PATCH 08/17] mlx5: use alternate method to configure promiscuous mode Adrien Mazarguil
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 39+ messages in thread
From: Adrien Mazarguil @ 2015-10-05 17:54 UTC (permalink / raw)
  To: dev

From: Olga Shern <olgas@mellanox.com>

All hash RX QPs currently use the same flow steering rule (L2 MAC filtering)
regardless of their type (TCP, UDP, IPv4, IPv6), which prevents them from
being dispatched properly. This is fixed by adding flow information to the
hash RX queue initialization data and generating specific flow steering
rules for each of them.
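priv_populate_flow_attr(), added by this patch, builds each rule in two passes. The first pass walks the underlayer chain (for example TCP, then IPv4, then Ethernet) to compute the required size. When called with a NULL buffer, it stops there and only returns that size. The second pass copies each template backwards from the end of the buffer, so the Ethernet spec lands first, directly after the attribute header. The sketch below reproduces only that layout logic. It uses hypothetical types (spec_hdr, layer_init, flow_attr_sketch) and does not call libibverbs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for Verbs flow headers: every spec starts with
 * its type and total size. */
enum spec_type { SPEC_ETH = 1, SPEC_IPV4 = 2, SPEC_TCP = 3 };
struct spec_hdr { uint32_t type; uint16_t size; };
struct flow_attr_sketch { uint16_t num_of_specs; uint16_t priority; };

/* Template node: a spec, its priority and the layer below it. */
struct layer_init {
	struct spec_hdr spec;
	unsigned int priority;
	const struct layer_init *underlayer;
};

static const struct layer_init eth_init = {
	{ SPEC_ETH, sizeof(struct spec_hdr) }, 2, NULL };
static const struct layer_init ipv4_init = {
	{ SPEC_IPV4, sizeof(struct spec_hdr) }, 1, &eth_init };
static const struct layer_init tcpv4_init = {
	{ SPEC_TCP, sizeof(struct spec_hdr) }, 0, &ipv4_init };

/* Returns the required size; writes nothing if attr_size is too small. */
static size_t
populate_sketch(struct flow_attr_sketch *attr, size_t attr_size,
		const struct layer_init *init)
{
	const struct layer_init *l;
	size_t offset = sizeof(*attr);
	size_t total;

	/* First pass: header plus every spec in the chain. */
	for (l = init; l != NULL; l = l->underlayer)
		offset += l->spec.size;
	if (offset > attr_size)
		return offset;
	total = offset;
	attr->num_of_specs = 0;
	attr->priority = init->priority;
	/* Second pass: fill back to front so the lowest layer comes first. */
	for (l = init; l != NULL; l = l->underlayer) {
		offset -= l->spec.size;
		memcpy((uint8_t *)attr + offset, &l->spec, l->spec.size);
		++attr->num_of_specs;
	}
	return total;
}

/* Type of the n-th spec after the header (all specs share one size here). */
static uint32_t
spec_type_at(const struct flow_attr_sketch *attr, unsigned int n)
{
	struct spec_hdr hdr;

	memcpy(&hdr, (const uint8_t *)attr + sizeof(*attr) + n * sizeof(hdr),
	       sizeof(hdr));
	return hdr.type;
}
```

The FLOW_ATTR_SPEC_ETH() macro then turns the size returned by the first pass into an array length rounded up to whole struct flow_attr_spec_eth elements, so the stack buffer always has room for the trailing specs.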

Signed-off-by: Olga Shern <olgas@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 drivers/net/mlx5/mlx5_mac.c  | 22 +++++--------
 drivers/net/mlx5/mlx5_rxq.c  | 78 ++++++++++++++++++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_rxtx.h | 21 ++++++++++++
 3 files changed, 107 insertions(+), 14 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_mac.c b/drivers/net/mlx5/mlx5_mac.c
index 971f2cd..bf69095 100644
--- a/drivers/net/mlx5/mlx5_mac.c
+++ b/drivers/net/mlx5/mlx5_mac.c
@@ -251,14 +251,10 @@ hash_rxq_add_flow(struct hash_rxq *hash_rxq, unsigned int mac_index,
 	const uint8_t (*mac)[ETHER_ADDR_LEN] =
 			(const uint8_t (*)[ETHER_ADDR_LEN])
 			priv->mac[mac_index].addr_bytes;
-
-	/* Allocate flow specification on the stack. */
-	struct __attribute__((packed)) {
-		struct ibv_flow_attr attr;
-		struct ibv_flow_spec_eth spec;
-	} data;
-	struct ibv_flow_attr *attr = &data.attr;
-	struct ibv_flow_spec_eth *spec = &data.spec;
+	FLOW_ATTR_SPEC_ETH(data, priv_populate_flow_attr(priv, NULL, 0,
+							 hash_rxq->type));
+	struct ibv_flow_attr *attr = &data->attr;
+	struct ibv_flow_spec_eth *spec = &data->spec;
 
 	assert(mac_index < RTE_DIM(priv->mac));
 	assert((vlan_index < RTE_DIM(priv->vlan_filter)) || (vlan_index == -1u));
@@ -267,12 +263,10 @@ hash_rxq_add_flow(struct hash_rxq *hash_rxq, unsigned int mac_index,
 	 * This layout is expected by libibverbs.
 	 */
 	assert(((uint8_t *)attr + sizeof(*attr)) == (uint8_t *)spec);
-	*attr = (struct ibv_flow_attr){
-		.type = IBV_FLOW_ATTR_NORMAL,
-		.num_of_specs = 1,
-		.port = priv->port,
-		.flags = 0
-	};
+	priv_populate_flow_attr(priv, attr, sizeof(data), hash_rxq->type);
+	/* The first specification must be Ethernet. */
+	assert(spec->type == IBV_FLOW_SPEC_ETH);
+	assert(spec->size == sizeof(*spec));
 	*spec = (struct ibv_flow_spec_eth){
 		.type = IBV_FLOW_SPEC_ETH,
 		.size = sizeof(*spec),
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 606367c..7f25688 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -71,19 +71,43 @@ static const struct hash_rxq_init hash_rxq_init[] = {
 				IBV_EXP_RX_HASH_DST_IPV4 |
 				IBV_EXP_RX_HASH_SRC_PORT_TCP |
 				IBV_EXP_RX_HASH_DST_PORT_TCP),
+		.flow_priority = 0,
+		.flow_spec.tcp_udp = {
+			.type = IBV_FLOW_SPEC_TCP,
+			.size = sizeof(hash_rxq_init[0].flow_spec.tcp_udp),
+		},
+		.underlayer = &hash_rxq_init[HASH_RXQ_IPv4],
 	},
 	[HASH_RXQ_UDPv4] = {
 		.hash_fields = (IBV_EXP_RX_HASH_SRC_IPV4 |
 				IBV_EXP_RX_HASH_DST_IPV4 |
 				IBV_EXP_RX_HASH_SRC_PORT_UDP |
 				IBV_EXP_RX_HASH_DST_PORT_UDP),
+		.flow_priority = 0,
+		.flow_spec.tcp_udp = {
+			.type = IBV_FLOW_SPEC_UDP,
+			.size = sizeof(hash_rxq_init[0].flow_spec.tcp_udp),
+		},
+		.underlayer = &hash_rxq_init[HASH_RXQ_IPv4],
 	},
 	[HASH_RXQ_IPv4] = {
 		.hash_fields = (IBV_EXP_RX_HASH_SRC_IPV4 |
 				IBV_EXP_RX_HASH_DST_IPV4),
+		.flow_priority = 1,
+		.flow_spec.ipv4 = {
+			.type = IBV_FLOW_SPEC_IPV4,
+			.size = sizeof(hash_rxq_init[0].flow_spec.ipv4),
+		},
+		.underlayer = &hash_rxq_init[HASH_RXQ_ETH],
 	},
 	[HASH_RXQ_ETH] = {
 		.hash_fields = 0,
+		.flow_priority = 2,
+		.flow_spec.eth = {
+			.type = IBV_FLOW_SPEC_ETH,
+			.size = sizeof(hash_rxq_init[0].flow_spec.eth),
+		},
+		.underlayer = NULL,
 	},
 };
 
@@ -125,6 +149,60 @@ static uint8_t hash_rxq_default_key[] = {
 };
 
 /**
+ * Populate flow steering rule for a given hash RX queue type using
+ * information from hash_rxq_init[]. Nothing is written to flow_attr when
+ * flow_attr_size is not large enough, but the required size is still returned.
+ *
+ * @param[in] priv
+ *   Pointer to private structure.
+ * @param[out] flow_attr
+ *   Pointer to flow attribute structure to fill. The allocated area must
+ *   also have room for all the flow specifications that follow it.
+ * @param flow_attr_size
+ *   Entire size of flow_attr and trailing room for flow specifications.
+ * @param type
+ *   Requested hash RX queue type.
+ *
+ * @return
+ *   Total size of the flow attribute buffer. No errors are defined.
+ */
+size_t
+priv_populate_flow_attr(const struct priv *priv,
+			struct ibv_flow_attr *flow_attr,
+			size_t flow_attr_size,
+			enum hash_rxq_type type)
+{
+	size_t offset = sizeof(*flow_attr);
+	const struct hash_rxq_init *init = &hash_rxq_init[type];
+
+	assert((size_t)type < RTE_DIM(hash_rxq_init));
+	do {
+		offset += init->flow_spec.hdr.size;
+		init = init->underlayer;
+	} while (init != NULL);
+	if (offset > flow_attr_size)
+		return offset;
+	flow_attr_size = offset;
+	init = &hash_rxq_init[type];
+	*flow_attr = (struct ibv_flow_attr){
+		.type = IBV_FLOW_ATTR_NORMAL,
+		.priority = init->flow_priority,
+		.num_of_specs = 0,
+		.port = priv->port,
+		.flags = 0,
+	};
+	do {
+		offset -= init->flow_spec.hdr.size;
+		memcpy((void *)((uintptr_t)flow_attr + offset),
+		       &init->flow_spec,
+		       init->flow_spec.hdr.size);
+		++flow_attr->num_of_specs;
+		init = init->underlayer;
+	} while (init != NULL);
+	return flow_attr_size;
+}
+
+/**
  * Return nearest power of two above input value.
  *
  * @param v
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index d9fa13e..00b5d6c 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -34,6 +34,7 @@
 #ifndef RTE_PMD_MLX5_RXTX_H_
 #define RTE_PMD_MLX5_RXTX_H_
 
+#include <stddef.h>
 #include <stdint.h>
 
 /* Verbs header. */
@@ -129,9 +130,27 @@ enum hash_rxq_type {
 	HASH_RXQ_ETH,
 };
 
+/* Flow structure with Ethernet specification. It is packed to prevent padding
+ * between attr and spec as this layout is expected by libibverbs. */
+struct flow_attr_spec_eth {
+	struct ibv_flow_attr attr;
+	struct ibv_flow_spec_eth spec;
+} __attribute__((packed));
+
+/* Define a struct flow_attr_spec_eth object as an array of at least
+ * "size" bytes. Room after the first index is normally used to store
+ * extra flow specifications. */
+#define FLOW_ATTR_SPEC_ETH(name, size) \
+	struct flow_attr_spec_eth name \
+		[((size) / sizeof(struct flow_attr_spec_eth)) + \
+		 !!((size) % sizeof(struct flow_attr_spec_eth))]
+
 /* Initialization data for hash RX queue. */
 struct hash_rxq_init {
 	uint64_t hash_fields; /* Fields that participate in the hash. */
+	unsigned int flow_priority; /* Flow priority to use. */
+	struct ibv_flow_spec flow_spec; /* Flow specification template. */
+	const struct hash_rxq_init *underlayer; /* Pointer to underlayer. */
 };
 
 /* Initialization data for indirection table. */
@@ -197,6 +216,8 @@ struct txq {
 
 /* mlx5_rxq.c */
 
+size_t priv_populate_flow_attr(const struct priv *, struct ibv_flow_attr *,
+			       size_t, enum hash_rxq_type);
 int priv_create_hash_rxqs(struct priv *);
 void priv_destroy_hash_rxqs(struct priv *);
 void rxq_cleanup(struct rxq *);
-- 
2.1.0


* [dpdk-dev] [PATCH 08/17] mlx5: use alternate method to configure promiscuous mode
  2015-10-05 17:54 [dpdk-dev] [PATCH 00/17] Enhance mlx5 with Mellanox OFED 3.1 Adrien Mazarguil
                   ` (6 preceding siblings ...)
  2015-10-05 17:54 ` [dpdk-dev] [PATCH 07/17] mlx5: define specific flow steering rules for each hash RX QP Adrien Mazarguil
@ 2015-10-05 17:54 ` Adrien Mazarguil
  2015-10-05 17:54 ` [dpdk-dev] [PATCH 09/17] mlx5: add RSS hash update/get Adrien Mazarguil
                   ` (10 subsequent siblings)
  18 siblings, 0 replies; 39+ messages in thread
From: Adrien Mazarguil @ 2015-10-05 17:54 UTC (permalink / raw)
  To: dev

From: Olga Shern <olgas@mellanox.com>

Promiscuous mode was historically enabled by adding a specific flow with
type IBV_FLOW_ATTR_ALL_DEFAULT to each hash RX queue, but this method is
deprecated. It is now simply enabled by omitting destination MAC addresses
from basic flow specifications.
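The idea behind omitting destination MAC addresses can be modeled by masked matching, where a field only takes part in a comparison through its mask. This is assumed to match the usual value/mask semantics of Verbs flow specs. The Ethernet template in hash_rxq_init[] sets only type and size, so its value and mask fields are zero and it matches every destination. dst_mac_matches() below is a hypothetical model, not a libibverbs function.

```c
#include <assert.h>
#include <stdint.h>

#define MAC_LEN 6

/* Hypothetical model of a masked destination MAC filter: each byte matches
 * when (packet & mask) == (value & mask). An all-zero mask therefore
 * accepts any destination, which is the promiscuous behaviour. */
static int
dst_mac_matches(const uint8_t pkt[MAC_LEN], const uint8_t val[MAC_LEN],
		const uint8_t mask[MAC_LEN])
{
	unsigned int i;

	for (i = 0; i != MAC_LEN; ++i)
		if ((pkt[i] & mask[i]) != (val[i] & mask[i]))
			return 0;
	return 1;
}
```

A normal MAC flow uses a full 0xff mask on the configured address, while the promiscuous flow simply leaves the mask empty.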

Signed-off-by: Olga Shern <olgas@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 drivers/net/mlx5/mlx5_rxmode.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_rxmode.c b/drivers/net/mlx5/mlx5_rxmode.c
index 7fe7f0e..aab38ee 100644
--- a/drivers/net/mlx5/mlx5_rxmode.c
+++ b/drivers/net/mlx5/mlx5_rxmode.c
@@ -74,20 +74,21 @@ static int
 hash_rxq_promiscuous_enable(struct hash_rxq *hash_rxq)
 {
 	struct ibv_flow *flow;
-	struct ibv_flow_attr attr = {
-		.type = IBV_FLOW_ATTR_ALL_DEFAULT,
-		.num_of_specs = 0,
-		.port = hash_rxq->priv->port,
-		.flags = 0
-	};
+	struct priv *priv = hash_rxq->priv;
+	FLOW_ATTR_SPEC_ETH(data, priv_populate_flow_attr(priv, NULL, 0,
+							 hash_rxq->type));
+	struct ibv_flow_attr *attr = &data->attr;
 
 	if (hash_rxq->priv->vf)
 		return 0;
 	DEBUG("%p: enabling promiscuous mode", (void *)hash_rxq);
 	if (hash_rxq->promisc_flow != NULL)
 		return EBUSY;
+	/* Promiscuous flows only differ from normal flows by not filtering
+	 * on specific MAC addresses. */
+	priv_populate_flow_attr(priv, attr, sizeof(data), hash_rxq->type);
 	errno = 0;
-	flow = ibv_create_flow(hash_rxq->qp, &attr);
+	flow = ibv_create_flow(hash_rxq->qp, attr);
 	if (flow == NULL) {
 		/* It's not clear whether errno is always set in this case. */
 		ERROR("%p: flow configuration failed, errno=%d: %s",
-- 
2.1.0


* [dpdk-dev] [PATCH 09/17] mlx5: add RSS hash update/get
  2015-10-05 17:54 [dpdk-dev] [PATCH 00/17] Enhance mlx5 with Mellanox OFED 3.1 Adrien Mazarguil
                   ` (7 preceding siblings ...)
  2015-10-05 17:54 ` [dpdk-dev] [PATCH 08/17] mlx5: use alternate method to configure promiscuous mode Adrien Mazarguil
@ 2015-10-05 17:54 ` Adrien Mazarguil
  2015-10-05 17:54 ` [dpdk-dev] [PATCH 10/17] mlx5: use one RSS hash key per flow type Adrien Mazarguil
                   ` (9 subsequent siblings)
  18 siblings, 0 replies; 39+ messages in thread
From: Adrien Mazarguil @ 2015-10-05 17:54 UTC (permalink / raw)
  To: dev

From: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>

First implementation of rss_hash_update and rss_hash_conf_get. These
functions are still limited in functionality, but they can be used to
change the RSS hash key.  For now, the PMD does not handle a separate
indirection table for each kind of flow (IPv4, IPv6, etc.), so the same
RSS hash key is used for all protocols.  This is why rss_hash_conf_get
returns the RSS hash key for all protocols supported by DPDK, and why
rss_hash_update sets the key for all of them.
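rss_hash_rss_conf_new_key() keeps the rte_eth_rss_conf header and the key bytes in a single rte_realloc() block, with rss_key pointing just past the structure, so one rte_free() releases both. mlx5_rss_hash_conf_get() copies the key only when the caller's buffer is large enough, but always reports the stored length. The sketch below mimics both behaviours with plain realloc(). struct rss_conf_sketch and both helpers are hypothetical and are not the DPDK API.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct rte_eth_rss_conf (key fields only). */
struct rss_conf_sketch {
	uint8_t *rss_key;
	uint8_t rss_key_len;
};

/* Replace the stored key. The structure and the key share one allocation,
 * the key living right after the structure (conf + 1). "key" must not
 * point into *stored, since realloc() may move or free the old block.
 * Returns 0 on success, ENOMEM on failure (the old block is kept). */
static int
rss_conf_new_key(struct rss_conf_sketch **stored, const uint8_t *key,
		 unsigned int key_len)
{
	struct rss_conf_sketch *conf;

	assert(key_len <= UINT8_MAX);
	conf = realloc(*stored, sizeof(*conf) + key_len);
	if (conf == NULL)
		return ENOMEM;
	conf->rss_key = (uint8_t *)(conf + 1);
	conf->rss_key_len = key_len;
	memcpy(conf->rss_key, key, key_len);
	*stored = conf;
	return 0;
}

/* Copy the key only if the caller's buffer is large enough, but always
 * report the stored key length. */
static void
rss_conf_get(const struct rss_conf_sketch *stored, struct rss_conf_sketch *out)
{
	if (out->rss_key != NULL && out->rss_key_len >= stored->rss_key_len)
		memcpy(out->rss_key, stored->rss_key, stored->rss_key_len);
	out->rss_key_len = stored->rss_key_len;
}
```

A caller that passes a buffer which is too small gets the required length back, and can retry with a larger buffer.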

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx5/Makefile    |   1 +
 drivers/net/mlx5/mlx5.c      |  10 +++
 drivers/net/mlx5/mlx5.h      |   7 ++
 drivers/net/mlx5/mlx5_rss.c  | 168 +++++++++++++++++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_rxq.c  |  10 ++-
 drivers/net/mlx5/mlx5_rxtx.h |   3 +
 6 files changed, 196 insertions(+), 3 deletions(-)
 create mode 100644 drivers/net/mlx5/mlx5_rss.c

diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index 938f924..54f1e89 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -51,6 +51,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_mac.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_rxmode.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_vlan.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_stats.c
+SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_rss.c
 
 # Dependencies.
 DEPDIRS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += lib/librte_ether
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 167e14b..bc2a19b 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -129,6 +129,7 @@ mlx5_dev_close(struct rte_eth_dev *dev)
 		claim_zero(ibv_close_device(priv->ctx));
 	} else
 		assert(priv->ctx == NULL);
+	rte_free(priv->rss_conf);
 	priv_unlock(priv);
 	memset(priv, 0, sizeof(*priv));
 }
@@ -156,6 +157,8 @@ static const struct eth_dev_ops mlx5_dev_ops = {
 	.mac_addr_remove = mlx5_mac_addr_remove,
 	.mac_addr_add = mlx5_mac_addr_add,
 	.mtu_set = mlx5_dev_set_mtu,
+	.rss_hash_update = mlx5_rss_hash_update,
+	.rss_hash_conf_get = mlx5_rss_hash_conf_get,
 };
 
 static struct {
@@ -376,6 +379,12 @@ mlx5_pci_devinit(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 #endif /* HAVE_EXP_QUERY_DEVICE */
 
 		priv->vf = vf;
+		/* Register default RSS hash key. */
+		err = rss_hash_rss_conf_new_key(priv,
+						rss_hash_default_key,
+						rss_hash_default_key_len);
+		if (err)
+			goto port_error;
 		/* Configure the first MAC address by default. */
 		if (priv_get_mac(priv, &mac.addr_bytes)) {
 			ERROR("cannot get MAC address, is mlx5_en loaded?"
@@ -439,6 +448,7 @@ mlx5_pci_devinit(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		continue;
 
 port_error:
+		rte_free(priv->rss_conf);
 		rte_free(priv);
 		if (pd)
 			claim_zero(ibv_dealloc_pd(pd));
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index b099dac..fc042e8 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -118,6 +118,7 @@ struct priv {
 	/* Hash RX QPs feeding the indirection table. */
 	struct hash_rxq (*hash_rxqs)[];
 	unsigned int hash_rxqs_n; /* Hash RX QPs array size. */
+	struct rte_eth_rss_conf *rss_conf; /* RSS configuration. */
 	rte_spinlock_t lock; /* Lock for control functions. */
 };
 
@@ -174,6 +175,12 @@ int priv_mac_addrs_enable(struct priv *);
 void mlx5_mac_addr_add(struct rte_eth_dev *, struct ether_addr *, uint32_t,
 		       uint32_t);
 
+/* mlx5_rss.c */
+
+int rss_hash_rss_conf_new_key(struct priv *, const uint8_t *, unsigned int);
+int mlx5_rss_hash_update(struct rte_eth_dev *, struct rte_eth_rss_conf *);
+int mlx5_rss_hash_conf_get(struct rte_eth_dev *, struct rte_eth_rss_conf *);
+
 /* mlx5_rxmode.c */
 
 int priv_promiscuous_enable(struct priv *);
diff --git a/drivers/net/mlx5/mlx5_rss.c b/drivers/net/mlx5/mlx5_rss.c
new file mode 100644
index 0000000..2dc58e5
--- /dev/null
+++ b/drivers/net/mlx5/mlx5_rss.c
@@ -0,0 +1,168 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2015 6WIND S.A.
+ *   Copyright 2015 Mellanox.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stddef.h>
+#include <stdint.h>
+#include <errno.h>
+#include <string.h>
+#include <assert.h>
+
+/* Verbs header. */
+/* ISO C doesn't support unnamed structs/unions, disabling -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-pedantic"
+#endif
+#include <infiniband/verbs.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-pedantic"
+#endif
+
+/* DPDK headers don't like -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-pedantic"
+#endif
+#include <rte_malloc.h>
+#include <rte_ethdev.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-pedantic"
+#endif
+
+#include "mlx5.h"
+#include "mlx5_rxtx.h"
+
+/**
+ * Register an RSS key.
+ *
+ * @param priv
+ *   Pointer to private structure.
+ * @param key
+ *   Hash key to register.
+ * @param key_len
+ *   Hash key length in bytes.
+ *
+ * @return
+ *   0 on success, errno value on failure.
+ */
+int
+rss_hash_rss_conf_new_key(struct priv *priv, const uint8_t *key,
+			  unsigned int key_len)
+{
+	struct rte_eth_rss_conf *rss_conf;
+
+	rss_conf = rte_realloc(priv->rss_conf,
+			       (sizeof(*rss_conf) + key_len),
+			       0);
+	if (!rss_conf)
+		return ENOMEM;
+	rss_conf->rss_key = (void *)(rss_conf + 1);
+	rss_conf->rss_key_len = key_len;
+	memcpy(rss_conf->rss_key, key, key_len);
+	priv->rss_conf = rss_conf;
+	return 0;
+}
+
+/**
+ * DPDK callback to update the RSS hash configuration.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param[in] rss_conf
+ *   RSS configuration data.
+ *
+ * @return
+ *   0 on success, negative errno value on failure.
+ */
+int
+mlx5_rss_hash_update(struct rte_eth_dev *dev,
+		     struct rte_eth_rss_conf *rss_conf)
+{
+	struct priv *priv = dev->data->dev_private;
+	int err = 0;
+
+	priv_lock(priv);
+
+	assert(priv->rss_conf != NULL);
+
+	/* Apply configuration. */
+	if (rss_conf->rss_key)
+		err = rss_hash_rss_conf_new_key(priv,
+						rss_conf->rss_key,
+						rss_conf->rss_key_len);
+	else
+		err = rss_hash_rss_conf_new_key(priv,
+						rss_hash_default_key,
+						rss_hash_default_key_len);
+
+	/* Store the requested protocols in the port configuration.
+	 * This will enable/disable the hash RX queues associated with the
+	 * protocols enabled/disabled by this update. */
+	priv->dev->data->dev_conf.rx_adv_conf.rss_conf.rss_hf =
+		rss_conf->rss_hf;
+	priv_unlock(priv);
+	assert(err >= 0);
+	return -err;
+}
+
+/**
+ * DPDK callback to get the RSS hash configuration.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param[in, out] rss_conf
+ *   RSS configuration data.
+ *
+ * @return
+ *   0 on success, negative errno value on failure.
+ */
+int
+mlx5_rss_hash_conf_get(struct rte_eth_dev *dev,
+		       struct rte_eth_rss_conf *rss_conf)
+{
+	struct priv *priv = dev->data->dev_private;
+
+	priv_lock(priv);
+
+	assert(priv->rss_conf != NULL);
+
+	if (rss_conf->rss_key &&
+	    rss_conf->rss_key_len >= priv->rss_conf->rss_key_len)
+		memcpy(rss_conf->rss_key,
+		       priv->rss_conf->rss_key,
+		       priv->rss_conf->rss_key_len);
+	rss_conf->rss_key_len = priv->rss_conf->rss_key_len;
+	/* FIXME: rss_hf should be more specific. */
+	rss_conf->rss_hf = ETH_RSS_PROTO_MASK;
+
+	priv_unlock(priv);
+	return 0;
+}
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 7f25688..2bd1277 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -135,7 +135,7 @@ static const struct ind_table_init ind_table_init[] = {
 #define IND_TABLE_INIT_N RTE_DIM(ind_table_init)
 
 /* Default RSS hash key also used for ConnectX-3. */
-static uint8_t hash_rxq_default_key[] = {
+uint8_t rss_hash_default_key[] = {
 	0x2c, 0xc6, 0x81, 0xd1,
 	0x5b, 0xdb, 0xf4, 0xf7,
 	0xfc, 0xa2, 0x83, 0x19,
@@ -148,6 +148,9 @@ static uint8_t hash_rxq_default_key[] = {
 	0xfc, 0x1f, 0xdc, 0x2a,
 };
 
+/* Length of the default RSS hash key. */
+const size_t rss_hash_default_key_len = sizeof(rss_hash_default_key);
+
 /**
  * Populate flow steering rule for a given hash RX queue type using
  * information from hash_rxq_init[]. Nothing is written to flow_attr when
@@ -327,6 +330,7 @@ priv_create_hash_rxqs(struct priv *priv)
 	assert(priv->hash_rxqs_n == 0);
 	assert(priv->pd != NULL);
 	assert(priv->ctx != NULL);
+	assert(priv->rss_conf != NULL);
 	if (priv->rxqs_n == 0)
 		return EINVAL;
 	assert(priv->rxqs != NULL);
@@ -411,8 +415,8 @@ priv_create_hash_rxqs(struct priv *priv)
 			hash_rxq_type_from_n(&ind_table_init[j], k);
 		struct ibv_exp_rx_hash_conf hash_conf = {
 			.rx_hash_function = IBV_EXP_RX_HASH_FUNC_TOEPLITZ,
-			.rx_hash_key_len = sizeof(hash_rxq_default_key),
-			.rx_hash_key = hash_rxq_default_key,
+			.rx_hash_key_len = priv->rss_conf->rss_key_len,
+			.rx_hash_key = priv->rss_conf->rss_key,
 			.rx_hash_fields_mask = hash_rxq_init[type].hash_fields,
 			.rwq_ind_tbl = (*ind_tables)[j],
 		};
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 00b5d6c..94e5f04 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -216,6 +216,9 @@ struct txq {
 
 /* mlx5_rxq.c */
 
+extern uint8_t rss_hash_default_key[];
+extern const size_t rss_hash_default_key_len;
+
 size_t priv_populate_flow_attr(const struct priv *, struct ibv_flow_attr *,
 			       size_t, enum hash_rxq_type);
 int priv_create_hash_rxqs(struct priv *);
-- 
2.1.0

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [dpdk-dev] [PATCH 10/17] mlx5: use one RSS hash key per flow type
  2015-10-05 17:54 [dpdk-dev] [PATCH 00/17] Enhance mlx5 with Mellanox OFED 3.1 Adrien Mazarguil
                   ` (8 preceding siblings ...)
  2015-10-05 17:54 ` [dpdk-dev] [PATCH 09/17] mlx5: add RSS hash update/get Adrien Mazarguil
@ 2015-10-05 17:54 ` Adrien Mazarguil
  2015-10-05 17:54 ` [dpdk-dev] [PATCH 11/17] app/testpmd: add missing type to RSS hash commands Adrien Mazarguil
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 39+ messages in thread
From: Adrien Mazarguil @ 2015-10-05 17:54 UTC (permalink / raw)
  To: dev

From: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>

DPDK expects one RSS hash key per flow type (IPv4, IPv6, UDPv4, etc.).
To handle this, the PMD must keep a table of hash keys so that it can
reconfigure the queues at each start/stop call.
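A minimal, self-contained sketch of such a per-flow-type key table, assuming simplified stand-ins for the driver structures: `sk_type_hf[]` plays the role of `hash_rxq_init[].dpdk_rss_hf`, and the `SK_RSS_*` flags are hypothetical placeholders, not the real `ETH_RSS_*` values. As with `rte_realloc()` in `rss_hash_rss_conf_new_key()`, each key lives in a single allocation with the key bytes stored right after its header.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flow-type flags; values are illustrative only. */
#define SK_RSS_IPV4      (1ULL << 0)
#define SK_RSS_FRAG_IPV4 (1ULL << 1)
#define SK_RSS_IPV4_TCP  (1ULL << 2)
#define SK_RSS_IPV4_UDP  (1ULL << 3)

/* One entry per hash RX queue type, like hash_rxq_init[].dpdk_rss_hf. */
static const uint64_t sk_type_hf[] = {
	SK_RSS_IPV4_TCP,
	SK_RSS_IPV4_UDP,
	SK_RSS_IPV4 | SK_RSS_FRAG_IPV4,
	0, /* catch-all Ethernet type: never matched by any rss_hf */
};
#define SK_TYPES_N (sizeof(sk_type_hf) / sizeof(sk_type_hf[0]))

struct sk_key {
	uint64_t hf;     /* flow types this key applies to */
	size_t len;      /* key length in bytes */
	uint8_t data[];  /* key bytes stored right after the header */
};

/* Keys indexed by hash RX queue type; NULL means "use the default key". */
static struct sk_key *sk_keys[SK_TYPES_N];

/* Register a key for every type whose flags intersect rss_hf (0 or -1). */
static int sk_key_set(const uint8_t *key, size_t len, uint64_t rss_hf)
{
	assert(key != NULL || len == 0);
	for (size_t i = 0; i != SK_TYPES_N; ++i) {
		struct sk_key *k;

		if (!(sk_type_hf[i] & rss_hf))
			continue;
		/* On failure realloc() leaves the previous key untouched. */
		k = realloc(sk_keys[i], sizeof(*k) + len);
		if (k == NULL)
			return -1;
		k->hf = sk_type_hf[i];
		k->len = len;
		memcpy(k->data, key, len);
		sk_keys[i] = k;
	}
	return 0;
}

/* Return the key of the first type intersecting rss_hf, or NULL. */
static const struct sk_key *sk_key_get(uint64_t rss_hf)
{
	for (size_t i = 0; i != SK_TYPES_N; ++i)
		if (sk_type_hf[i] & rss_hf)
			return sk_keys[i];
	return NULL;
}
```

Like `rss_hash_get()`, the lookup stops at the first type whose flags intersect the request. A NULL result means no key was registered for that type, in which case the caller would fall back to the default key, as `priv_create_hash_rxqs()` does.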

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx5/mlx5.c      | 17 +++++++--
 drivers/net/mlx5/mlx5.h      |  6 ++--
 drivers/net/mlx5/mlx5_rss.c  | 85 +++++++++++++++++++++++++++++++++-----------
 drivers/net/mlx5/mlx5_rxq.c  | 24 +++++++++----
 drivers/net/mlx5/mlx5_rxtx.h |  4 +++
 5 files changed, 105 insertions(+), 31 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index bc2a19b..fb9d594 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -129,7 +129,11 @@ mlx5_dev_close(struct rte_eth_dev *dev)
 		claim_zero(ibv_close_device(priv->ctx));
 	} else
 		assert(priv->ctx == NULL);
-	rte_free(priv->rss_conf);
+	if (priv->rss_conf != NULL) {
+		for (i = 0; (i != hash_rxq_init_n); ++i)
+			rte_free((*priv->rss_conf)[i]);
+		rte_free(priv->rss_conf);
+	}
 	priv_unlock(priv);
 	memset(priv, 0, sizeof(*priv));
 }
@@ -379,10 +383,17 @@ mlx5_pci_devinit(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 #endif /* HAVE_EXP_QUERY_DEVICE */
 
 		priv->vf = vf;
-		/* Register default RSS hash key. */
+		/* Allocate and register default RSS hash keys. */
+		priv->rss_conf = rte_calloc(__func__, hash_rxq_init_n,
+					    sizeof((*priv->rss_conf)[0]), 0);
+		if (priv->rss_conf == NULL) {
+			err = ENOMEM;
+			goto port_error;
+		}
 		err = rss_hash_rss_conf_new_key(priv,
 						rss_hash_default_key,
-						rss_hash_default_key_len);
+						rss_hash_default_key_len,
+						ETH_RSS_PROTO_MASK);
 		if (err)
 			goto port_error;
 		/* Configure the first MAC address by default. */
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index fc042e8..de72f94 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -118,7 +118,8 @@ struct priv {
 	/* Hash RX QPs feeding the indirection table. */
 	struct hash_rxq (*hash_rxqs)[];
 	unsigned int hash_rxqs_n; /* Hash RX QPs array size. */
-	struct rte_eth_rss_conf *rss_conf; /* RSS configuration. */
+	/* RSS configuration array indexed by hash RX queue type. */
+	struct rte_eth_rss_conf *(*rss_conf)[];
 	rte_spinlock_t lock; /* Lock for control functions. */
 };
 
@@ -177,7 +178,8 @@ void mlx5_mac_addr_add(struct rte_eth_dev *, struct ether_addr *, uint32_t,
 
 /* mlx5_rss.c */
 
-int rss_hash_rss_conf_new_key(struct priv *, const uint8_t *, unsigned int);
+int rss_hash_rss_conf_new_key(struct priv *, const uint8_t *, unsigned int,
+			      uint64_t);
 int mlx5_rss_hash_update(struct rte_eth_dev *, struct rte_eth_rss_conf *);
 int mlx5_rss_hash_conf_get(struct rte_eth_dev *, struct rte_eth_rss_conf *);
 
diff --git a/drivers/net/mlx5/mlx5_rss.c b/drivers/net/mlx5/mlx5_rss.c
index 2dc58e5..bf19aca 100644
--- a/drivers/net/mlx5/mlx5_rss.c
+++ b/drivers/net/mlx5/mlx5_rss.c
@@ -61,6 +61,33 @@
 #include "mlx5_rxtx.h"
 
 /**
+ * Get an RSS configuration hash key.
+ *
+ * @param priv
+ *   Pointer to private structure.
+ * @param rss_hf
+ *   RSS hash functions for which the configuration must be retrieved.
+ *
+ * @return
+ *   Pointer to a RSS configuration structure or NULL if rss_hf cannot
+ *   be matched.
+ */
+static struct rte_eth_rss_conf *
+rss_hash_get(struct priv *priv, uint64_t rss_hf)
+{
+	unsigned int i;
+
+	for (i = 0; (i != hash_rxq_init_n); ++i) {
+		uint64_t dpdk_rss_hf = hash_rxq_init[i].dpdk_rss_hf;
+
+		if (!(dpdk_rss_hf & rss_hf))
+			continue;
+		return (*priv->rss_conf)[i];
+	}
+	return NULL;
+}
+
+/**
  * Register a RSS key.
  *
  * @param priv
@@ -69,25 +96,35 @@
  *   Hash key to register.
  * @param key_len
  *   Hash key length in bytes.
+ * @param rss_hf
+ *   RSS hash functions the provided key applies to.
  *
  * @return
  *   0 on success, errno value on failure.
  */
 int
 rss_hash_rss_conf_new_key(struct priv *priv, const uint8_t *key,
-			  unsigned int key_len)
+			  unsigned int key_len, uint64_t rss_hf)
 {
-	struct rte_eth_rss_conf *rss_conf;
-
-	rss_conf = rte_realloc(priv->rss_conf,
-			       (sizeof(*rss_conf) + key_len),
-			       0);
-	if (!rss_conf)
-		return ENOMEM;
-	rss_conf->rss_key = (void *)(rss_conf + 1);
-	rss_conf->rss_key_len = key_len;
-	memcpy(rss_conf->rss_key, key, key_len);
-	priv->rss_conf = rss_conf;
+	unsigned int i;
+
+	for (i = 0; (i != hash_rxq_init_n); ++i) {
+		struct rte_eth_rss_conf *rss_conf;
+		uint64_t dpdk_rss_hf = hash_rxq_init[i].dpdk_rss_hf;
+
+		if (!(dpdk_rss_hf & rss_hf))
+			continue;
+		rss_conf = rte_realloc((*priv->rss_conf)[i],
+				       (sizeof(*rss_conf) + key_len),
+				       0);
+		if (!rss_conf)
+			return ENOMEM;
+		rss_conf->rss_key = (void *)(rss_conf + 1);
+		rss_conf->rss_key_len = key_len;
+		rss_conf->rss_hf = dpdk_rss_hf;
+		memcpy(rss_conf->rss_key, key, key_len);
+		(*priv->rss_conf)[i] = rss_conf;
+	}
 	return 0;
 }
 
@@ -117,11 +154,13 @@ mlx5_rss_hash_update(struct rte_eth_dev *dev,
 	if (rss_conf->rss_key)
 		err = rss_hash_rss_conf_new_key(priv,
 						rss_conf->rss_key,
-						rss_conf->rss_key_len);
+						rss_conf->rss_key_len,
+						rss_conf->rss_hf);
 	else
 		err = rss_hash_rss_conf_new_key(priv,
 						rss_hash_default_key,
-						rss_hash_default_key_len);
+						rss_hash_default_key_len,
+						ETH_RSS_PROTO_MASK);
 
 	/* Store the configuration set into port configure.
 	 * This will enable/disable hash RX queues associated to the protocols
@@ -149,19 +188,25 @@ mlx5_rss_hash_conf_get(struct rte_eth_dev *dev,
 		       struct rte_eth_rss_conf *rss_conf)
 {
 	struct priv *priv = dev->data->dev_private;
+	struct rte_eth_rss_conf *priv_rss_conf;
 
 	priv_lock(priv);
 
 	assert(priv->rss_conf != NULL);
 
+	priv_rss_conf = rss_hash_get(priv, rss_conf->rss_hf);
+	if (!priv_rss_conf) {
+		rss_conf->rss_hf = 0;
+		priv_unlock(priv);
+		return -EINVAL;
+	}
 	if (rss_conf->rss_key &&
-	    rss_conf->rss_key_len >= priv->rss_conf->rss_key_len)
+	    rss_conf->rss_key_len >= priv_rss_conf->rss_key_len)
 		memcpy(rss_conf->rss_key,
-		       priv->rss_conf->rss_key,
-		       priv->rss_conf->rss_key_len);
-	rss_conf->rss_key_len = priv->rss_conf->rss_key_len;
-	/* FIXME: rss_hf should be more specific. */
-	rss_conf->rss_hf = ETH_RSS_PROTO_MASK;
+		       priv_rss_conf->rss_key,
+		       priv_rss_conf->rss_key_len);
+	rss_conf->rss_key_len = priv_rss_conf->rss_key_len;
+	rss_conf->rss_hf = priv_rss_conf->rss_hf;
 
 	priv_unlock(priv);
 	return 0;
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 2bd1277..283e56b 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -65,12 +65,13 @@
 #include "mlx5_defs.h"
 
 /* Initialization data for hash RX queues. */
-static const struct hash_rxq_init hash_rxq_init[] = {
+const struct hash_rxq_init hash_rxq_init[] = {
 	[HASH_RXQ_TCPv4] = {
 		.hash_fields = (IBV_EXP_RX_HASH_SRC_IPV4 |
 				IBV_EXP_RX_HASH_DST_IPV4 |
 				IBV_EXP_RX_HASH_SRC_PORT_TCP |
 				IBV_EXP_RX_HASH_DST_PORT_TCP),
+		.dpdk_rss_hf = ETH_RSS_NONFRAG_IPV4_TCP,
 		.flow_priority = 0,
 		.flow_spec.tcp_udp = {
 			.type = IBV_FLOW_SPEC_TCP,
@@ -83,6 +84,7 @@ static const struct hash_rxq_init hash_rxq_init[] = {
 				IBV_EXP_RX_HASH_DST_IPV4 |
 				IBV_EXP_RX_HASH_SRC_PORT_UDP |
 				IBV_EXP_RX_HASH_DST_PORT_UDP),
+		.dpdk_rss_hf = ETH_RSS_NONFRAG_IPV4_UDP,
 		.flow_priority = 0,
 		.flow_spec.tcp_udp = {
 			.type = IBV_FLOW_SPEC_UDP,
@@ -93,6 +95,8 @@ static const struct hash_rxq_init hash_rxq_init[] = {
 	[HASH_RXQ_IPv4] = {
 		.hash_fields = (IBV_EXP_RX_HASH_SRC_IPV4 |
 				IBV_EXP_RX_HASH_DST_IPV4),
+		.dpdk_rss_hf = (ETH_RSS_IPV4 |
+				ETH_RSS_FRAG_IPV4),
 		.flow_priority = 1,
 		.flow_spec.ipv4 = {
 			.type = IBV_FLOW_SPEC_IPV4,
@@ -102,6 +106,7 @@ static const struct hash_rxq_init hash_rxq_init[] = {
 	},
 	[HASH_RXQ_ETH] = {
 		.hash_fields = 0,
+		.dpdk_rss_hf = 0,
 		.flow_priority = 2,
 		.flow_spec.eth = {
 			.type = IBV_FLOW_SPEC_ETH,
@@ -112,7 +117,7 @@ static const struct hash_rxq_init hash_rxq_init[] = {
 };
 
 /* Number of entries in hash_rxq_init[]. */
-static const unsigned int hash_rxq_init_n = RTE_DIM(hash_rxq_init);
+const unsigned int hash_rxq_init_n = RTE_DIM(hash_rxq_init);
 
 /* Initialization data for hash RX queue indirection tables. */
 static const struct ind_table_init ind_table_init[] = {
@@ -260,16 +265,18 @@ static unsigned int
 priv_make_ind_table_init(struct priv *priv,
 			 struct ind_table_init (*table)[IND_TABLE_INIT_N])
 {
+	uint64_t rss_hf;
 	unsigned int i;
 	unsigned int j;
 	unsigned int table_n = 0;
 	/* Mandatory to receive frames not handled by normal hash RX queues. */
 	unsigned int hash_types_sup = 1 << HASH_RXQ_ETH;
 
+	rss_hf = priv->dev->data->dev_conf.rx_adv_conf.rss_conf.rss_hf;
 	/* Process other protocols only if more than one queue. */
 	if (priv->rxqs_n > 1)
 		for (i = 0; (i != hash_rxq_init_n); ++i)
-			if (hash_rxq_init[i].hash_fields)
+			if (rss_hf & hash_rxq_init[i].dpdk_rss_hf)
 				hash_types_sup |= (1 << i);
 
 	/* Filter out entries whose protocols are not in the set. */
@@ -330,7 +337,6 @@ priv_create_hash_rxqs(struct priv *priv)
 	assert(priv->hash_rxqs_n == 0);
 	assert(priv->pd != NULL);
 	assert(priv->ctx != NULL);
-	assert(priv->rss_conf != NULL);
 	if (priv->rxqs_n == 0)
 		return EINVAL;
 	assert(priv->rxqs != NULL);
@@ -413,10 +419,16 @@ priv_create_hash_rxqs(struct priv *priv)
 		struct hash_rxq *hash_rxq = &(*hash_rxqs)[i];
 		enum hash_rxq_type type =
 			hash_rxq_type_from_n(&ind_table_init[j], k);
+		struct rte_eth_rss_conf *priv_rss_conf =
+			(*priv->rss_conf)[type];
 		struct ibv_exp_rx_hash_conf hash_conf = {
 			.rx_hash_function = IBV_EXP_RX_HASH_FUNC_TOEPLITZ,
-			.rx_hash_key_len = priv->rss_conf->rss_key_len,
-			.rx_hash_key = priv->rss_conf->rss_key,
+			.rx_hash_key_len = (priv_rss_conf ?
+					    priv_rss_conf->rss_key_len :
+					    rss_hash_default_key_len),
+			.rx_hash_key = (priv_rss_conf ?
+					priv_rss_conf->rss_key :
+					rss_hash_default_key),
 			.rx_hash_fields_mask = hash_rxq_init[type].hash_fields,
 			.rwq_ind_tbl = (*ind_tables)[j],
 		};
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 94e5f04..0db4393 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -148,6 +148,7 @@ struct flow_attr_spec_eth {
 /* Initialization data for hash RX queue. */
 struct hash_rxq_init {
 	uint64_t hash_fields; /* Fields that participate in the hash. */
+	uint64_t dpdk_rss_hf; /* Matching DPDK RSS hash fields. */
 	unsigned int flow_priority; /* Flow priority to use. */
 	struct ibv_flow_spec flow_spec; /* Flow specification template. */
 	const struct hash_rxq_init *underlayer; /* Pointer to underlayer. */
@@ -216,6 +217,9 @@ struct txq {
 
 /* mlx5_rxq.c */
 
+extern const struct hash_rxq_init hash_rxq_init[];
+extern const unsigned int hash_rxq_init_n;
+
 extern uint8_t rss_hash_default_key[];
 extern const size_t rss_hash_default_key_len;
 
-- 
2.1.0


* [dpdk-dev] [PATCH 11/17] app/testpmd: add missing type to RSS hash commands
  2015-10-05 17:54 [dpdk-dev] [PATCH 00/17] Enhance mlx5 with Mellanox OFED 3.1 Adrien Mazarguil
                   ` (9 preceding siblings ...)
  2015-10-05 17:54 ` [dpdk-dev] [PATCH 10/17] mlx5: use one RSS hash key per flow type Adrien Mazarguil
@ 2015-10-05 17:54 ` Adrien Mazarguil
  2015-10-05 17:54 ` [dpdk-dev] [PATCH 12/17] app/testpmd: fix missing initialization in the RSS hash show command Adrien Mazarguil
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 39+ messages in thread
From: Adrien Mazarguil @ 2015-10-05 17:54 UTC (permalink / raw)
  To: dev

From: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>

DPDK uses a structure (struct rte_eth_rss_conf) to get or set a hash key.
Its rss_hf field selects which hash key rte_eth_dev_rss_hash_conf_get()
retrieves, and when updating a key it is used to verify that the key
exists before trying to update it.
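The string-to-flag translation that testpmd now performs amounts to a table lookup. The sketch below is an illustrative stand-in for `rss_type_table[]`: the `EX_RSS_*` values are arbitrary placeholders rather than the real `ETH_RSS_*` constants, and `rss_type_from_str()` is a hypothetical helper. An unknown string yields 0, which the mlx5 `mlx5_rss_hash_conf_get()` callback in this series rejects with -EINVAL.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Placeholder flag values for illustration only. */
#define EX_RSS_IPV4      (1ULL << 2)
#define EX_RSS_FRAG_IPV4 (1ULL << 3)
#define EX_RSS_IPV4_TCP  (1ULL << 4)
#define EX_RSS_IPV4_UDP  (1ULL << 5)

struct rss_type_info {
	const char *str;   /* token accepted on the command line */
	uint64_t rss_type; /* matching rss_hf flag */
};

static const struct rss_type_info ex_rss_type_table[] = {
	{ "ipv4", EX_RSS_IPV4 },
	{ "ipv4-frag", EX_RSS_FRAG_IPV4 },
	{ "ipv4-tcp", EX_RSS_IPV4_TCP },
	{ "ipv4-udp", EX_RSS_IPV4_UDP },
};

/* Map a command-line token to its rss_hf flag; 0 if unknown. */
static uint64_t rss_type_from_str(const char *s)
{
	size_t n = sizeof(ex_rss_type_table) / sizeof(ex_rss_type_table[0]);

	assert(s != NULL);
	for (size_t i = 0; i < n; i++)
		if (strcmp(s, ex_rss_type_table[i].str) == 0)
			return ex_rss_type_table[i].rss_type;
	return 0;
}
```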

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 app/test-pmd/cmdline.c                      | 45 +++++++++++++++++---
 app/test-pmd/config.c                       | 66 ++++++++++++++++++-----------
 app/test-pmd/testpmd.h                      |  6 ++-
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  2 +-
 4 files changed, 85 insertions(+), 34 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 0f8f48f..9e3b7f9 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -190,7 +190,9 @@ static void cmd_help_long_parsed(void *parsed_result,
 			" by masks on port X. size is used to indicate the"
 			" hardware supported reta size\n\n"
 
-			"show port rss-hash [key]\n"
+			"show port rss-hash ipv4|ipv4-frag|ipv4-tcp|ipv4-udp|"
+			"ipv4-sctp|ipv4-other|ipv6|ipv6-frag|ipv6-tcp|ipv6-udp|ipv6-sctp|"
+			"ipv6-other|l2-payload|ipv6-ex|ipv6-tcp-ex|ipv6-udp-ex [key]\n"
 			"    Display the RSS hash functions and RSS hash key"
 			" of port X\n\n"
 
@@ -1498,6 +1500,7 @@ struct cmd_config_rss_hash_key {
 	cmdline_fixed_string_t config;
 	uint8_t port_id;
 	cmdline_fixed_string_t rss_hash_key;
+	cmdline_fixed_string_t rss_type;
 	cmdline_fixed_string_t key;
 };
 
@@ -1555,7 +1558,8 @@ cmd_config_rss_hash_key_parsed(void *parsed_result,
 			return;
 		hash_key[i] = (uint8_t) ((xdgt0 * 16) + xdgt1);
 	}
-	port_rss_hash_key_update(res->port_id, hash_key);
+	port_rss_hash_key_update(res->port_id, res->rss_type, hash_key,
+				 RSS_HASH_KEY_LENGTH);
 }
 
 cmdline_parse_token_string_t cmd_config_rss_hash_key_port =
@@ -1568,18 +1572,29 @@ cmdline_parse_token_num_t cmd_config_rss_hash_key_port_id =
 cmdline_parse_token_string_t cmd_config_rss_hash_key_rss_hash_key =
 	TOKEN_STRING_INITIALIZER(struct cmd_config_rss_hash_key,
 				 rss_hash_key, "rss-hash-key");
+cmdline_parse_token_string_t cmd_config_rss_hash_key_rss_type =
+	TOKEN_STRING_INITIALIZER(struct cmd_config_rss_hash_key, rss_type,
+				 "ipv4#ipv4-frag#ipv4-tcp#ipv4-udp#ipv4-sctp#"
+				 "ipv4-other#ipv6#ipv6-frag#ipv6-tcp#ipv6-udp#"
+				 "ipv6-sctp#ipv6-other#l2-payload#ipv6-ex#"
+				 "ipv6-tcp-ex#ipv6-udp-ex");
 cmdline_parse_token_string_t cmd_config_rss_hash_key_value =
 	TOKEN_STRING_INITIALIZER(struct cmd_config_rss_hash_key, key, NULL);
 
 cmdline_parse_inst_t cmd_config_rss_hash_key = {
 	.f = cmd_config_rss_hash_key_parsed,
 	.data = NULL,
-	.help_str = "port config X rss-hash-key 80 hexa digits",
+	.help_str =
+		"port config X rss-hash-key ipv4|ipv4-frag|ipv4-tcp|ipv4-udp|"
+		"ipv4-sctp|ipv4-other|ipv6|ipv6-frag|ipv6-tcp|ipv6-udp|"
+		"ipv6-sctp|ipv6-other|l2-payload|"
+		"ipv6-ex|ipv6-tcp-ex|ipv6-udp-ex 80 hexa digits\n",
 	.tokens = {
 		(void *)&cmd_config_rss_hash_key_port,
 		(void *)&cmd_config_rss_hash_key_config,
 		(void *)&cmd_config_rss_hash_key_port_id,
 		(void *)&cmd_config_rss_hash_key_rss_hash_key,
+		(void *)&cmd_config_rss_hash_key_rss_type,
 		(void *)&cmd_config_rss_hash_key_value,
 		NULL,
 	},
@@ -1929,6 +1944,7 @@ struct cmd_showport_rss_hash {
 	cmdline_fixed_string_t port;
 	uint8_t port_id;
 	cmdline_fixed_string_t rss_hash;
+	cmdline_fixed_string_t rss_type;
 	cmdline_fixed_string_t key; /* optional argument */
 };
 
@@ -1938,7 +1954,8 @@ static void cmd_showport_rss_hash_parsed(void *parsed_result,
 {
 	struct cmd_showport_rss_hash *res = parsed_result;
 
-	port_rss_hash_conf_show(res->port_id, show_rss_key != NULL);
+	port_rss_hash_conf_show(res->port_id, res->rss_type,
+				show_rss_key != NULL);
 }
 
 cmdline_parse_token_string_t cmd_showport_rss_hash_show =
@@ -1950,18 +1967,29 @@ cmdline_parse_token_num_t cmd_showport_rss_hash_port_id =
 cmdline_parse_token_string_t cmd_showport_rss_hash_rss_hash =
 	TOKEN_STRING_INITIALIZER(struct cmd_showport_rss_hash, rss_hash,
 				 "rss-hash");
+cmdline_parse_token_string_t cmd_showport_rss_hash_rss_hash_info =
+	TOKEN_STRING_INITIALIZER(struct cmd_showport_rss_hash, rss_type,
+				 "ipv4#ipv4-frag#ipv4-tcp#ipv4-udp#ipv4-sctp#"
+				 "ipv4-other#ipv6#ipv6-frag#ipv6-tcp#ipv6-udp#"
+				 "ipv6-sctp#ipv6-other#l2-payload#ipv6-ex#"
+				 "ipv6-tcp-ex#ipv6-udp-ex");
 cmdline_parse_token_string_t cmd_showport_rss_hash_rss_key =
 	TOKEN_STRING_INITIALIZER(struct cmd_showport_rss_hash, key, "key");
 
 cmdline_parse_inst_t cmd_showport_rss_hash = {
 	.f = cmd_showport_rss_hash_parsed,
 	.data = NULL,
-	.help_str = "show port X rss-hash (X = port number)\n",
+	.help_str =
+		"show port X rss-hash ipv4|ipv4-frag|ipv4-tcp|ipv4-udp|"
+		"ipv4-sctp|ipv4-other|ipv6|ipv6-frag|ipv6-tcp|ipv6-udp|"
+		"ipv6-sctp|ipv6-other|l2-payload|"
+		"ipv6-ex|ipv6-tcp-ex|ipv6-udp-ex (X = port number)\n",
 	.tokens = {
 		(void *)&cmd_showport_rss_hash_show,
 		(void *)&cmd_showport_rss_hash_port,
 		(void *)&cmd_showport_rss_hash_port_id,
 		(void *)&cmd_showport_rss_hash_rss_hash,
+		(void *)&cmd_showport_rss_hash_rss_hash_info,
 		NULL,
 	},
 };
@@ -1969,12 +1997,17 @@ cmdline_parse_inst_t cmd_showport_rss_hash = {
 cmdline_parse_inst_t cmd_showport_rss_hash_key = {
 	.f = cmd_showport_rss_hash_parsed,
 	.data = (void *)1,
-	.help_str = "show port X rss-hash key (X = port number)\n",
+	.help_str =
+		"show port X rss-hash ipv4|ipv4-frag|ipv4-tcp|ipv4-udp|"
+		"ipv4-sctp|ipv4-other|ipv6|ipv6-frag|ipv6-tcp|ipv6-udp|"
+		"ipv6-sctp|ipv6-other|l2-payload|"
+		"ipv6-ex|ipv6-tcp-ex|ipv6-udp-ex key (X = port number)\n",
 	.tokens = {
 		(void *)&cmd_showport_rss_hash_show,
 		(void *)&cmd_showport_rss_hash_port,
 		(void *)&cmd_showport_rss_hash_port_id,
 		(void *)&cmd_showport_rss_hash_rss_hash,
+		(void *)&cmd_showport_rss_hash_rss_hash_info,
 		(void *)&cmd_showport_rss_hash_rss_key,
 		NULL,
 	},
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index cf2aa6e..f3b96a3 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -97,6 +97,30 @@
 
 static char *flowtype_to_str(uint16_t flow_type);
 
+struct rss_type_info {
+	char str[32];
+	uint64_t rss_type;
+};
+
+static const struct rss_type_info rss_type_table[] = {
+	{ "ipv4", ETH_RSS_IPV4 },
+	{ "ipv4-frag", ETH_RSS_FRAG_IPV4 },
+	{ "ipv4-tcp", ETH_RSS_NONFRAG_IPV4_TCP },
+	{ "ipv4-udp", ETH_RSS_NONFRAG_IPV4_UDP },
+	{ "ipv4-sctp", ETH_RSS_NONFRAG_IPV4_SCTP },
+	{ "ipv4-other", ETH_RSS_NONFRAG_IPV4_OTHER },
+	{ "ipv6", ETH_RSS_IPV6 },
+	{ "ipv6-frag", ETH_RSS_FRAG_IPV6 },
+	{ "ipv6-tcp", ETH_RSS_NONFRAG_IPV6_TCP },
+	{ "ipv6-udp", ETH_RSS_NONFRAG_IPV6_UDP },
+	{ "ipv6-sctp", ETH_RSS_NONFRAG_IPV6_SCTP },
+	{ "ipv6-other", ETH_RSS_NONFRAG_IPV6_OTHER },
+	{ "l2-payload", ETH_RSS_L2_PAYLOAD },
+	{ "ipv6-ex", ETH_RSS_IPV6_EX },
+	{ "ipv6-tcp-ex", ETH_RSS_IPV6_TCP_EX },
+	{ "ipv6-udp-ex", ETH_RSS_IPV6_UDP_EX },
+};
+
 static void
 print_ethaddr(const char *name, struct ether_addr *eth_addr)
 {
@@ -852,31 +876,8 @@ port_rss_reta_info(portid_t port_id,
  * key of the port.
  */
 void
-port_rss_hash_conf_show(portid_t port_id, int show_rss_key)
+port_rss_hash_conf_show(portid_t port_id, char rss_info[], int show_rss_key)
 {
-	struct rss_type_info {
-		char str[32];
-		uint64_t rss_type;
-	};
-	static const struct rss_type_info rss_type_table[] = {
-		{"ipv4", ETH_RSS_IPV4},
-		{"ipv4-frag", ETH_RSS_FRAG_IPV4},
-		{"ipv4-tcp", ETH_RSS_NONFRAG_IPV4_TCP},
-		{"ipv4-udp", ETH_RSS_NONFRAG_IPV4_UDP},
-		{"ipv4-sctp", ETH_RSS_NONFRAG_IPV4_SCTP},
-		{"ipv4-other", ETH_RSS_NONFRAG_IPV4_OTHER},
-		{"ipv6", ETH_RSS_IPV6},
-		{"ipv6-frag", ETH_RSS_FRAG_IPV6},
-		{"ipv6-tcp", ETH_RSS_NONFRAG_IPV6_TCP},
-		{"ipv6-udp", ETH_RSS_NONFRAG_IPV6_UDP},
-		{"ipv6-sctp", ETH_RSS_NONFRAG_IPV6_SCTP},
-		{"ipv6-other", ETH_RSS_NONFRAG_IPV6_OTHER},
-		{"l2-payload", ETH_RSS_L2_PAYLOAD},
-		{"ipv6-ex", ETH_RSS_IPV6_EX},
-		{"ipv6-tcp-ex", ETH_RSS_IPV6_TCP_EX},
-		{"ipv6-udp-ex", ETH_RSS_IPV6_UDP_EX},
-	};
-
 	struct rte_eth_rss_conf rss_conf;
 	uint8_t rss_key[10 * 4];
 	uint64_t rss_hf;
@@ -885,6 +886,13 @@ port_rss_hash_conf_show(portid_t port_id, int show_rss_key)
 
 	if (port_id_is_invalid(port_id, ENABLED_WARN))
 		return;
+
+	rss_conf.rss_hf = 0;
+	for (i = 0; i < RTE_DIM(rss_type_table); i++) {
+		if (!strcmp(rss_info, rss_type_table[i].str))
+			rss_conf.rss_hf = rss_type_table[i].rss_type;
+	}
+
 	/* Get RSS hash key if asked to display it */
 	rss_conf.rss_key = (show_rss_key) ? rss_key : NULL;
 	diag = rte_eth_dev_rss_hash_conf_get(port_id, &rss_conf);
@@ -922,12 +930,20 @@ port_rss_hash_conf_show(portid_t port_id, int show_rss_key)
 }
 
 void
-port_rss_hash_key_update(portid_t port_id, uint8_t *hash_key)
+port_rss_hash_key_update(portid_t port_id, char rss_type[], uint8_t *hash_key,
+			 uint hash_key_len)
 {
 	struct rte_eth_rss_conf rss_conf;
 	int diag;
+	unsigned int i;
 
 	rss_conf.rss_key = NULL;
+	rss_conf.rss_key_len = hash_key_len;
+	rss_conf.rss_hf = 0;
+	for (i = 0; i < RTE_DIM(rss_type_table); i++) {
+		if (!strcmp(rss_type_table[i].str, rss_type))
+			rss_conf.rss_hf = rss_type_table[i].rss_type;
+	}
 	diag = rte_eth_dev_rss_hash_conf_get(port_id, &rss_conf);
 	if (diag == 0) {
 		rss_conf.rss_key = hash_key;
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index d287274..31a7cb8 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -563,8 +563,10 @@ int set_queue_rate_limit(portid_t port_id, uint16_t queue_idx, uint16_t rate);
 int set_vf_rate_limit(portid_t port_id, uint16_t vf, uint16_t rate,
 				uint64_t q_msk);
 
-void port_rss_hash_conf_show(portid_t port_id, int show_rss_key);
-void port_rss_hash_key_update(portid_t port_id, uint8_t *hash_key);
+void port_rss_hash_conf_show(portid_t port_id, char rss_info[],
+			     int show_rss_key);
+void port_rss_hash_key_update(portid_t port_id, char rss_type[],
+			      uint8_t *hash_key, uint hash_key_len);
 void get_syn_filter(uint8_t port_id);
 void get_ethertype_filter(uint8_t port_id, uint16_t index);
 void get_2tuple_filter(uint8_t port_id, uint16_t index);
diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index aa77a91..a469b0b 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -179,7 +179,7 @@ show port rss-hash
 
 Display the RSS hash functions and RSS hash key of a port:
 
-show port (port_id) rss-hash [key]
+show port (port_id) rss-hash ipv4|ipv4-frag|ipv4-tcp|ipv4-udp|ipv4-sctp|ipv4-other|ipv6|ipv6-frag|ipv6-tcp|ipv6-udp|ipv6-sctp|ipv6-other|l2-payload|ipv6-ex|ipv6-tcp-ex|ipv6-udp-ex [key]
 
 clear port
 ~~~~~~~~~~
-- 
2.1.0


* [dpdk-dev] [PATCH 12/17] app/testpmd: fix missing initialization in the RSS hash show command
  2015-10-05 17:54 [dpdk-dev] [PATCH 00/17] Enhance mlx5 with Mellanox OFED 3.1 Adrien Mazarguil
                   ` (10 preceding siblings ...)
  2015-10-05 17:54 ` [dpdk-dev] [PATCH 11/17] app/testpmd: add missing type to RSS hash commands Adrien Mazarguil
@ 2015-10-05 17:54 ` Adrien Mazarguil
  2015-10-05 17:54 ` [dpdk-dev] [PATCH 13/17] mlx5: remove normal MAC flows when enabling promiscuous mode Adrien Mazarguil
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 39+ messages in thread
From: Adrien Mazarguil @ 2015-10-05 17:54 UTC (permalink / raw)
  To: dev

The "show port X rss-hash" command sometimes displays garbage instead of the
expected RSS hash key because the key buffer length (rss_key_len) is left
uninitialized. When the key is too large to fit in the buffer,
rte_eth_dev_rss_hash_conf_get() does not update it, so uninitialized
buffer contents are displayed.
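The caller/callee contract behind this fix can be sketched with simplified, hypothetical types (`struct ex_rss_conf`, `ex_conf_get()`, `ex_show_key()`) modeled on `mlx5_rss_hash_conf_get()`: the key is copied only when the caller's `rss_key_len` is large enough, and the actual key length is always reported back. The caller must therefore set `rss_key_len` to its buffer size, and zero-filling the buffer keeps stale stack bytes out of the output.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for struct rte_eth_rss_conf. */
struct ex_rss_conf {
	uint8_t *rss_key;
	uint8_t rss_key_len;
	uint64_t rss_hf;
};

/* Device key: first bytes of the default key, remaining bytes zero. */
static const uint8_t ex_dev_key[40] = { 0x2c, 0xc6, 0x81, 0xd1 };

/* Copy the key only if the buffer is large enough; always report length. */
static int ex_conf_get(struct ex_rss_conf *c)
{
	assert(c != NULL);
	if (c->rss_key != NULL && c->rss_key_len >= sizeof(ex_dev_key))
		memcpy(c->rss_key, ex_dev_key, sizeof(ex_dev_key));
	c->rss_key_len = sizeof(ex_dev_key);
	return 0;
}

/* Caller side with the fix applied; returns the reported key length. */
static int ex_show_key(uint8_t out[40])
{
	struct ex_rss_conf c;
	uint8_t rss_key[10 * 4] = { 0 }; /* never display stale stack bytes */

	c.rss_key = rss_key;
	c.rss_key_len = sizeof(rss_key); /* tell the callee the buffer size */
	c.rss_hf = 0;
	if (ex_conf_get(&c) != 0)
		return -1;
	memcpy(out, rss_key, sizeof(rss_key));
	return c.rss_key_len;
}
```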

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 app/test-pmd/config.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index f3b96a3..e2dc33e 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -879,7 +879,7 @@ void
 port_rss_hash_conf_show(portid_t port_id, char rss_info[], int show_rss_key)
 {
 	struct rte_eth_rss_conf rss_conf;
-	uint8_t rss_key[10 * 4];
+	uint8_t rss_key[10 * 4] = "";
 	uint64_t rss_hf;
 	uint8_t i;
 	int diag;
@@ -895,6 +895,7 @@ port_rss_hash_conf_show(portid_t port_id, char rss_info[], int show_rss_key)
 
 	/* Get RSS hash key if asked to display it */
 	rss_conf.rss_key = (show_rss_key) ? rss_key : NULL;
+	rss_conf.rss_key_len = sizeof(rss_key);
 	diag = rte_eth_dev_rss_hash_conf_get(port_id, &rss_conf);
 	if (diag != 0) {
 		switch (diag) {
-- 
2.1.0


* [dpdk-dev] [PATCH 13/17] mlx5: remove normal MAC flows when enabling promiscuous mode
  2015-10-05 17:54 [dpdk-dev] [PATCH 00/17] Enhance mlx5 with Mellanox OFED 3.1 Adrien Mazarguil
                   ` (11 preceding siblings ...)
  2015-10-05 17:54 ` [dpdk-dev] [PATCH 12/17] app/testpmd: fix missing initialization in the RSS hash show command Adrien Mazarguil
@ 2015-10-05 17:54 ` Adrien Mazarguil
  2015-10-05 17:54 ` [dpdk-dev] [PATCH 14/17] mlx5: use experimental flows in hash RX queues Adrien Mazarguil
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 39+ messages in thread
From: Adrien Mazarguil @ 2015-10-05 17:54 UTC (permalink / raw)
  To: dev

From: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>

Normal MAC flows are not necessary when promiscuous mode is enabled.
Removing them frees up hardware resources.
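The enable path with rollback can be sketched as follows, assuming hypothetical queue objects whose flow state is reduced to two booleans plus a test hook that forces a failure; none of these names come from the driver. As a design choice, this sketch also restores the MAC flows of the queue whose enable call failed; the rollback loop in the hunk below appears to cover only the queues before it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct ex_rxq {
	bool mac_flows;    /* normal per-MAC flows installed */
	bool promisc_flow; /* promiscuous flow installed */
	bool fail_promisc; /* test hook: make promiscuous enable fail */
};

static void ex_mac_addrs_del(struct ex_rxq *q) { q->mac_flows = false; }
static void ex_mac_addrs_add(struct ex_rxq *q) { q->mac_flows = true; }
static void ex_promisc_disable(struct ex_rxq *q) { q->promisc_flow = false; }

static int ex_promisc_enable(struct ex_rxq *q)
{
	if (q->fail_promisc)
		return ENOMEM; /* errno-style positive value, as in mlx5 */
	q->promisc_flow = true;
	return 0;
}

/* Enable promiscuous mode on all queues; undo everything on failure. */
static int ex_promisc_enable_all(struct ex_rxq *q, size_t n, bool started)
{
	assert(q != NULL || n == 0);
	for (size_t i = 0; i != n; ++i) {
		int ret;

		ex_mac_addrs_del(&q[i]); /* free hardware resources first */
		ret = ex_promisc_enable(&q[i]);
		if (!ret)
			continue;
		/* Failure: restore the failing queue, then roll back the rest. */
		if (started)
			ex_mac_addrs_add(&q[i]);
		while (i != 0) {
			--i;
			ex_promisc_disable(&q[i]);
			if (started)
				ex_mac_addrs_add(&q[i]);
		}
		return ret;
	}
	return 0;
}
```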

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx5/mlx5_rxmode.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_rxmode.c b/drivers/net/mlx5/mlx5_rxmode.c
index aab38ee..578f2fb 100644
--- a/drivers/net/mlx5/mlx5_rxmode.c
+++ b/drivers/net/mlx5/mlx5_rxmode.c
@@ -123,6 +123,8 @@ priv_promiscuous_enable(struct priv *priv)
 		struct hash_rxq *hash_rxq = &(*priv->hash_rxqs)[i];
 		int ret;
 
+		/* Remove normal MAC flows first. */
+		hash_rxq_mac_addrs_del(hash_rxq);
 		ret = hash_rxq_promiscuous_enable(hash_rxq);
 		if (!ret)
 			continue;
@@ -130,6 +132,9 @@ priv_promiscuous_enable(struct priv *priv)
 		while (i != 0) {
 			hash_rxq = &(*priv->hash_rxqs)[--i];
 			hash_rxq_promiscuous_disable(hash_rxq);
+			/* Restore MAC flows. */
+			if (priv->started)
+				hash_rxq_mac_addrs_add(hash_rxq);
 		}
 		return ret;
 	}
@@ -189,8 +194,14 @@ priv_promiscuous_disable(struct priv *priv)
 
 	if (!priv->promisc)
 		return;
-	for (i = 0; (i != priv->hash_rxqs_n); ++i)
-		hash_rxq_promiscuous_disable(&(*priv->hash_rxqs)[i]);
+	for (i = 0; (i != priv->hash_rxqs_n); ++i) {
+		struct hash_rxq *hash_rxq = &(*priv->hash_rxqs)[i];
+
+		hash_rxq_promiscuous_disable(hash_rxq);
+		/* Restore MAC flows. */
+		if (priv->started)
+			hash_rxq_mac_addrs_add(hash_rxq);
+	}
 	priv->promisc = 0;
 }
 
-- 
2.1.0


* [dpdk-dev] [PATCH 14/17] mlx5: use experimental flows in hash RX queues
  2015-10-05 17:54 [dpdk-dev] [PATCH 00/17] Enhance mlx5 with Mellanox OFED 3.1 Adrien Mazarguil
                   ` (12 preceding siblings ...)
  2015-10-05 17:54 ` [dpdk-dev] [PATCH 13/17] mlx5: remove normal MAC flows when enabling promiscuous mode Adrien Mazarguil
@ 2015-10-05 17:54 ` Adrien Mazarguil
  2015-10-05 17:54 ` [dpdk-dev] [PATCH 15/17] mlx5: enable multi packet send WR in TX CQ Adrien Mazarguil
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 39+ messages in thread
From: Adrien Mazarguil @ 2015-10-05 17:54 UTC (permalink / raw)
  To: dev

Switch hash RX queues from normal to experimental Verbs flows, because
normal flows cannot support IPv6 at the moment.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 drivers/net/mlx5/mlx5_mac.c    | 18 +++++++++---------
 drivers/net/mlx5/mlx5_rxmode.c | 18 +++++++++---------
 drivers/net/mlx5/mlx5_rxq.c    | 14 +++++++-------
 drivers/net/mlx5/mlx5_rxtx.h   | 14 +++++++-------
 4 files changed, 32 insertions(+), 32 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_mac.c b/drivers/net/mlx5/mlx5_mac.c
index bf69095..05e2484 100644
--- a/drivers/net/mlx5/mlx5_mac.c
+++ b/drivers/net/mlx5/mlx5_mac.c
@@ -116,8 +116,8 @@ hash_rxq_del_flow(struct hash_rxq *hash_rxq, unsigned int mac_index,
 	      (void *)hash_rxq,
 	      (*mac)[0], (*mac)[1], (*mac)[2], (*mac)[3], (*mac)[4], (*mac)[5],
 	      mac_index, priv->vlan_filter[vlan_index].id);
-	claim_zero(ibv_destroy_flow(hash_rxq->mac_flow
-				    [mac_index][vlan_index]));
+	claim_zero(ibv_exp_destroy_flow(hash_rxq->mac_flow
+					[mac_index][vlan_index]));
 	hash_rxq->mac_flow[mac_index][vlan_index] = NULL;
 }
 
@@ -246,15 +246,15 @@ static int
 hash_rxq_add_flow(struct hash_rxq *hash_rxq, unsigned int mac_index,
 		  unsigned int vlan_index)
 {
-	struct ibv_flow *flow;
+	struct ibv_exp_flow *flow;
 	struct priv *priv = hash_rxq->priv;
 	const uint8_t (*mac)[ETHER_ADDR_LEN] =
 			(const uint8_t (*)[ETHER_ADDR_LEN])
 			priv->mac[mac_index].addr_bytes;
 	FLOW_ATTR_SPEC_ETH(data, priv_populate_flow_attr(priv, NULL, 0,
 							 hash_rxq->type));
-	struct ibv_flow_attr *attr = &data->attr;
-	struct ibv_flow_spec_eth *spec = &data->spec;
+	struct ibv_exp_flow_attr *attr = &data->attr;
+	struct ibv_exp_flow_spec_eth *spec = &data->spec;
 
 	assert(mac_index < RTE_DIM(priv->mac));
 	assert((vlan_index < RTE_DIM(priv->vlan_filter)) || (vlan_index == -1u));
@@ -265,10 +265,10 @@ hash_rxq_add_flow(struct hash_rxq *hash_rxq, unsigned int mac_index,
 	assert(((uint8_t *)attr + sizeof(*attr)) == (uint8_t *)spec);
 	priv_populate_flow_attr(priv, attr, sizeof(data), hash_rxq->type);
 	/* The first specification must be Ethernet. */
-	assert(spec->type == IBV_FLOW_SPEC_ETH);
+	assert(spec->type == IBV_EXP_FLOW_SPEC_ETH);
 	assert(spec->size == sizeof(*spec));
-	*spec = (struct ibv_flow_spec_eth){
-		.type = IBV_FLOW_SPEC_ETH,
+	*spec = (struct ibv_exp_flow_spec_eth){
+		.type = IBV_EXP_FLOW_SPEC_ETH,
 		.size = sizeof(*spec),
 		.val = {
 			.dst_mac = {
@@ -293,7 +293,7 @@ hash_rxq_add_flow(struct hash_rxq *hash_rxq, unsigned int mac_index,
 	      ((vlan_index != -1u) ? priv->vlan_filter[vlan_index].id : -1u));
 	/* Create related flow. */
 	errno = 0;
-	flow = ibv_create_flow(hash_rxq->qp, attr);
+	flow = ibv_exp_create_flow(hash_rxq->qp, attr);
 	if (flow == NULL) {
 		/* It's not clear whether errno is always set in this case. */
 		ERROR("%p: flow configuration failed, errno=%d: %s",
diff --git a/drivers/net/mlx5/mlx5_rxmode.c b/drivers/net/mlx5/mlx5_rxmode.c
index 578f2fb..9b1551f 100644
--- a/drivers/net/mlx5/mlx5_rxmode.c
+++ b/drivers/net/mlx5/mlx5_rxmode.c
@@ -73,11 +73,11 @@ static void hash_rxq_allmulticast_disable(struct hash_rxq *);
 static int
 hash_rxq_promiscuous_enable(struct hash_rxq *hash_rxq)
 {
-	struct ibv_flow *flow;
+	struct ibv_exp_flow *flow;
 	struct priv *priv = hash_rxq->priv;
 	FLOW_ATTR_SPEC_ETH(data, priv_populate_flow_attr(priv, NULL, 0,
 							 hash_rxq->type));
-	struct ibv_flow_attr *attr = &data->attr;
+	struct ibv_exp_flow_attr *attr = &data->attr;
 
 	if (hash_rxq->priv->vf)
 		return 0;
@@ -88,7 +88,7 @@ hash_rxq_promiscuous_enable(struct hash_rxq *hash_rxq)
 	 * on specific MAC addresses. */
 	priv_populate_flow_attr(priv, attr, sizeof(data), hash_rxq->type);
 	errno = 0;
-	flow = ibv_create_flow(hash_rxq->qp, attr);
+	flow = ibv_exp_create_flow(hash_rxq->qp, attr);
 	if (flow == NULL) {
 		/* It's not clear whether errno is always set in this case. */
 		ERROR("%p: flow configuration failed, errno=%d: %s",
@@ -176,7 +176,7 @@ hash_rxq_promiscuous_disable(struct hash_rxq *hash_rxq)
 	DEBUG("%p: disabling promiscuous mode", (void *)hash_rxq);
 	if (hash_rxq->promisc_flow == NULL)
 		return;
-	claim_zero(ibv_destroy_flow(hash_rxq->promisc_flow));
+	claim_zero(ibv_exp_destroy_flow(hash_rxq->promisc_flow));
 	hash_rxq->promisc_flow = NULL;
 	DEBUG("%p: promiscuous mode disabled", (void *)hash_rxq);
 }
@@ -234,9 +234,9 @@ mlx5_promiscuous_disable(struct rte_eth_dev *dev)
 static int
 hash_rxq_allmulticast_enable(struct hash_rxq *hash_rxq)
 {
-	struct ibv_flow *flow;
-	struct ibv_flow_attr attr = {
-		.type = IBV_FLOW_ATTR_MC_DEFAULT,
+	struct ibv_exp_flow *flow;
+	struct ibv_exp_flow_attr attr = {
+		.type = IBV_EXP_FLOW_ATTR_MC_DEFAULT,
 		.num_of_specs = 0,
 		.port = hash_rxq->priv->port,
 		.flags = 0
@@ -246,7 +246,7 @@ hash_rxq_allmulticast_enable(struct hash_rxq *hash_rxq)
 	if (hash_rxq->allmulti_flow != NULL)
 		return EBUSY;
 	errno = 0;
-	flow = ibv_create_flow(hash_rxq->qp, &attr);
+	flow = ibv_exp_create_flow(hash_rxq->qp, &attr);
 	if (flow == NULL) {
 		/* It's not clear whether errno is always set in this case. */
 		ERROR("%p: flow configuration failed, errno=%d: %s",
@@ -327,7 +327,7 @@ hash_rxq_allmulticast_disable(struct hash_rxq *hash_rxq)
 	DEBUG("%p: disabling allmulticast mode", (void *)hash_rxq);
 	if (hash_rxq->allmulti_flow == NULL)
 		return;
-	claim_zero(ibv_destroy_flow(hash_rxq->allmulti_flow));
+	claim_zero(ibv_exp_destroy_flow(hash_rxq->allmulti_flow));
 	hash_rxq->allmulti_flow = NULL;
 	DEBUG("%p: allmulticast mode disabled", (void *)hash_rxq);
 }
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 283e56b..717824c 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -74,7 +74,7 @@ const struct hash_rxq_init hash_rxq_init[] = {
 		.dpdk_rss_hf = ETH_RSS_NONFRAG_IPV4_TCP,
 		.flow_priority = 0,
 		.flow_spec.tcp_udp = {
-			.type = IBV_FLOW_SPEC_TCP,
+			.type = IBV_EXP_FLOW_SPEC_TCP,
 			.size = sizeof(hash_rxq_init[0].flow_spec.tcp_udp),
 		},
 		.underlayer = &hash_rxq_init[HASH_RXQ_IPv4],
@@ -87,7 +87,7 @@ const struct hash_rxq_init hash_rxq_init[] = {
 		.dpdk_rss_hf = ETH_RSS_NONFRAG_IPV4_UDP,
 		.flow_priority = 0,
 		.flow_spec.tcp_udp = {
-			.type = IBV_FLOW_SPEC_UDP,
+			.type = IBV_EXP_FLOW_SPEC_UDP,
 			.size = sizeof(hash_rxq_init[0].flow_spec.tcp_udp),
 		},
 		.underlayer = &hash_rxq_init[HASH_RXQ_IPv4],
@@ -99,7 +99,7 @@ const struct hash_rxq_init hash_rxq_init[] = {
 				ETH_RSS_FRAG_IPV4),
 		.flow_priority = 1,
 		.flow_spec.ipv4 = {
-			.type = IBV_FLOW_SPEC_IPV4,
+			.type = IBV_EXP_FLOW_SPEC_IPV4,
 			.size = sizeof(hash_rxq_init[0].flow_spec.ipv4),
 		},
 		.underlayer = &hash_rxq_init[HASH_RXQ_ETH],
@@ -109,7 +109,7 @@ const struct hash_rxq_init hash_rxq_init[] = {
 		.dpdk_rss_hf = 0,
 		.flow_priority = 2,
 		.flow_spec.eth = {
-			.type = IBV_FLOW_SPEC_ETH,
+			.type = IBV_EXP_FLOW_SPEC_ETH,
 			.size = sizeof(hash_rxq_init[0].flow_spec.eth),
 		},
 		.underlayer = NULL,
@@ -176,7 +176,7 @@ const size_t rss_hash_default_key_len = sizeof(rss_hash_default_key);
  */
 size_t
 priv_populate_flow_attr(const struct priv *priv,
-			struct ibv_flow_attr *flow_attr,
+			struct ibv_exp_flow_attr *flow_attr,
 			size_t flow_attr_size,
 			enum hash_rxq_type type)
 {
@@ -192,8 +192,8 @@ priv_populate_flow_attr(const struct priv *priv,
 		return offset;
 	flow_attr_size = offset;
 	init = &hash_rxq_init[type];
-	*flow_attr = (struct ibv_flow_attr){
-		.type = IBV_FLOW_ATTR_NORMAL,
+	*flow_attr = (struct ibv_exp_flow_attr){
+		.type = IBV_EXP_FLOW_ATTR_NORMAL,
 		.priority = init->flow_priority,
 		.num_of_specs = 0,
 		.port = priv->port,
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 0db4393..4018ac1 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -133,8 +133,8 @@ enum hash_rxq_type {
 /* Flow structure with Ethernet specification. It is packed to prevent padding
  * between attr and spec as this layout is expected by libibverbs. */
 struct flow_attr_spec_eth {
-	struct ibv_flow_attr attr;
-	struct ibv_flow_spec_eth spec;
+	struct ibv_exp_flow_attr attr;
+	struct ibv_exp_flow_spec_eth spec;
 } __attribute__((packed));
 
 /* Define a struct flow_attr_spec_eth object as an array of at least
@@ -150,7 +150,7 @@ struct hash_rxq_init {
 	uint64_t hash_fields; /* Fields that participate in the hash. */
 	uint64_t dpdk_rss_hf; /* Matching DPDK RSS hash fields. */
 	unsigned int flow_priority; /* Flow priority to use. */
-	struct ibv_flow_spec flow_spec; /* Flow specification template. */
+	struct ibv_exp_flow_spec flow_spec; /* Flow specification template. */
 	const struct hash_rxq_init *underlayer; /* Pointer to underlayer. */
 };
 
@@ -168,9 +168,9 @@ struct hash_rxq {
 	enum hash_rxq_type type; /* Hash RX queue type. */
 	/* Each VLAN ID requires a separate flow steering rule. */
 	BITFIELD_DECLARE(mac_configured, uint32_t, MLX5_MAX_MAC_ADDRESSES);
-	struct ibv_flow *mac_flow[MLX5_MAX_MAC_ADDRESSES][MLX5_MAX_VLAN_IDS];
-	struct ibv_flow *promisc_flow; /* Promiscuous flow. */
-	struct ibv_flow *allmulti_flow; /* Multicast flow. */
+	struct ibv_exp_flow *mac_flow[MLX5_MAX_MAC_ADDRESSES][MLX5_MAX_VLAN_IDS];
+	struct ibv_exp_flow *promisc_flow; /* Promiscuous flow. */
+	struct ibv_exp_flow *allmulti_flow; /* Multicast flow. */
 };
 
 /* TX element. */
@@ -223,7 +223,7 @@ extern const unsigned int hash_rxq_init_n;
 extern uint8_t rss_hash_default_key[];
 extern const size_t rss_hash_default_key_len;
 
-size_t priv_populate_flow_attr(const struct priv *, struct ibv_flow_attr *,
+size_t priv_populate_flow_attr(const struct priv *, struct ibv_exp_flow_attr *,
 			       size_t, enum hash_rxq_type);
 int priv_create_hash_rxqs(struct priv *);
 void priv_destroy_hash_rxqs(struct priv *);
-- 
2.1.0

* [dpdk-dev] [PATCH 15/17] mlx5: enable multi packet send WR in TX CQ
  2015-10-05 17:54 [dpdk-dev] [PATCH 00/17] Enhance mlx5 with Mellanox OFED 3.1 Adrien Mazarguil
                   ` (13 preceding siblings ...)
  2015-10-05 17:54 ` [dpdk-dev] [PATCH 14/17] mlx5: use experimental flows in hash RX queues Adrien Mazarguil
@ 2015-10-05 17:54 ` Adrien Mazarguil
  2015-10-05 17:54 ` [dpdk-dev] [PATCH 16/17] mlx5: fix compilation error with GCC < 4.6 Adrien Mazarguil
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 39+ messages in thread
From: Adrien Mazarguil @ 2015-10-05 17:54 UTC (permalink / raw)
  To: dev

For adapters that support it, this flag improves performance outside of VF
context.
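
The flag selection added to txq_setup() reduces to the following sketch. DEMO_ENABLE_MULTI_PACKET_SEND_WR and HAVE_DEMO_MULTI_PACKET_SEND_WR are hypothetical stand-ins for IBV_EXP_QP_BURST_CREATE_ENABLE_MULTI_PACKET_SEND_WR and for the HAVE_EXP_QP_BURST_CREATE_ENABLE_MULTI_PACKET_SEND_WR macro that the Makefile's auto-config-h.sh probe defines when the enum is present.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the OFED flag and the autoconf macro. */
#define DEMO_ENABLE_MULTI_PACKET_SEND_WR (1u << 0)
#define HAVE_DEMO_MULTI_PACKET_SEND_WR 1

/* Returns the family_flags value to request for a TX QP burst family. */
uint32_t demo_tx_family_flags(int is_vf)
{
	uint32_t flags = 0;

#ifdef HAVE_DEMO_MULTI_PACKET_SEND_WR
	/* Multi-packet send WR is only requested outside of VF context. */
	if (!is_vf)
		flags |= DEMO_ENABLE_MULTI_PACKET_SEND_WR;
#else
	/* Older libraries: the flag does not exist, request nothing. */
	(void)is_vf;
#endif
	return flags;
}
```

When the probe does not find the enum (older OFED), family_flags is simply left at 0, so the driver still builds and runs without the optimization.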

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx5/Makefile   | 5 +++++
 drivers/net/mlx5/mlx5_txq.c | 7 +++++++
 2 files changed, 12 insertions(+)

diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index 54f1e89..6ced4ac 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -116,6 +116,11 @@ mlx5_autoconf.h: $(RTE_SDK)/scripts/auto-config-h.sh
 		HAVE_EXP_QUERY_DEVICE \
 		infiniband/verbs.h \
 		type 'struct ibv_exp_device_attr' $(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_EXP_QP_BURST_CREATE_ENABLE_MULTI_PACKET_SEND_WR \
+		infiniband/verbs.h \
+		enum IBV_EXP_QP_BURST_CREATE_ENABLE_MULTI_PACKET_SEND_WR \
+		$(AUTOCONF_OUTPUT)
 
 mlx5.o: mlx5_autoconf.h
 
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index a53b128..aa7581f 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -395,6 +395,13 @@ txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
 		.intf_scope = IBV_EXP_INTF_GLOBAL,
 		.intf = IBV_EXP_INTF_QP_BURST,
 		.obj = tmpl.qp,
+#ifdef HAVE_EXP_QP_BURST_CREATE_ENABLE_MULTI_PACKET_SEND_WR
+		/* Multi packet send WR can only be used outside of VF. */
+		.family_flags =
+			(!priv->vf ?
+			 IBV_EXP_QP_BURST_CREATE_ENABLE_MULTI_PACKET_SEND_WR :
+			 0),
+#endif
 	};
 	tmpl.if_qp = ibv_exp_query_intf(priv->ctx, &attr.params, &status);
 	if (tmpl.if_qp == NULL) {
-- 
2.1.0

* [dpdk-dev] [PATCH 16/17] mlx5: fix compilation error with GCC < 4.6
  2015-10-05 17:54 [dpdk-dev] [PATCH 00/17] Enhance mlx5 with Mellanox OFED 3.1 Adrien Mazarguil
                   ` (14 preceding siblings ...)
  2015-10-05 17:54 ` [dpdk-dev] [PATCH 15/17] mlx5: enable multi packet send WR in TX CQ Adrien Mazarguil
@ 2015-10-05 17:54 ` Adrien Mazarguil
  2015-10-05 17:54 ` [dpdk-dev] [PATCH 17/17] doc: update mlx5 documentation Adrien Mazarguil
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 39+ messages in thread
From: Adrien Mazarguil @ 2015-10-05 17:54 UTC (permalink / raw)
  To: dev; +Cc: Yaacov Hazan

From: Yaacov Hazan <yaacovh@mellanox.com>

Seen with GCC < 4.6:

 error: unknown field ‘tcp_udp’ specified in initializer
 error: extra brace group at end of initializer

Static initialization of anonymous structs/unions is a C11 feature
properly supported only since GCC 4.6.

Work around compilation errors with older versions by expanding
struct ibv_exp_flow_spec into struct hash_rxq_init.
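
The workaround can be illustrated with a simplified sketch: once the union is a named member, a nested designator such as `.flow_spec.tcp_udp` is plain C99, and the common `hdr` member still lets code read the type and size without knowing which specification was initialized. The demo_* types are hypothetical reductions of the ibv_exp_flow_spec_* structures; the real field lists differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified specification types. */
enum demo_spec_type { DEMO_SPEC_ETH = 1, DEMO_SPEC_IPV4, DEMO_SPEC_TCP };

struct demo_spec_tcp_udp {
	enum demo_spec_type type;
	uint16_t size;
	uint16_t dst_port;
	uint16_t src_port;
};

struct demo_spec_ipv4 {
	enum demo_spec_type type;
	uint16_t size;
	uint32_t src_ip;
	uint32_t dst_ip;
};

struct demo_init {
	unsigned int flow_priority;
	/* Named union: every member starts with the same type/size pair. */
	union {
		struct {
			enum demo_spec_type type;
			uint16_t size;
		} hdr;
		struct demo_spec_tcp_udp tcp_udp;
		struct demo_spec_ipv4 ipv4;
	} flow_spec;
};

/* Nested designators into a named union are valid C99. */
static const struct demo_init demo_init_table[] = {
	{
		.flow_priority = 0,
		.flow_spec.tcp_udp = {
			.type = DEMO_SPEC_TCP,
			.size = sizeof(struct demo_spec_tcp_udp),
		},
	},
	{
		.flow_priority = 1,
		.flow_spec.ipv4 = {
			.type = DEMO_SPEC_IPV4,
			.size = sizeof(struct demo_spec_ipv4),
		},
	},
};

/* Reads the common initial sequence through hdr. */
enum demo_spec_type demo_spec_type_of(unsigned int i)
{
	return demo_init_table[i].flow_spec.hdr.type;
}

uint16_t demo_spec_size_of(unsigned int i)
{
	return demo_init_table[i].flow_spec.hdr.size;
}
```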

Signed-off-by: Yaacov Hazan <yaacovh@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx5/mlx5_rxtx.h | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 4018ac1..422730d 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -150,7 +150,15 @@ struct hash_rxq_init {
 	uint64_t hash_fields; /* Fields that participate in the hash. */
 	uint64_t dpdk_rss_hf; /* Matching DPDK RSS hash fields. */
 	unsigned int flow_priority; /* Flow priority to use. */
-	struct ibv_exp_flow_spec flow_spec; /* Flow specification template. */
+	union {
+		struct {
+			enum ibv_exp_flow_spec_type type;
+			uint16_t size;
+		} hdr;
+		struct ibv_exp_flow_spec_tcp_udp tcp_udp;
+		struct ibv_exp_flow_spec_ipv4 ipv4;
+		struct ibv_exp_flow_spec_eth eth;
+	} flow_spec; /* Flow specification template. */
 	const struct hash_rxq_init *underlayer; /* Pointer to underlayer. */
 };
 
-- 
2.1.0

* [dpdk-dev] [PATCH 17/17] doc: update mlx5 documentation
  2015-10-05 17:54 [dpdk-dev] [PATCH 00/17] Enhance mlx5 with Mellanox OFED 3.1 Adrien Mazarguil
                   ` (15 preceding siblings ...)
  2015-10-05 17:54 ` [dpdk-dev] [PATCH 16/17] mlx5: fix compilation error with GCC < 4.6 Adrien Mazarguil
@ 2015-10-05 17:54 ` Adrien Mazarguil
  2015-10-06  8:54 ` [dpdk-dev] [PATCH 00/17] Enhance mlx5 with Mellanox OFED 3.1 Stephen Hemminger
  2015-10-30 18:55 ` [dpdk-dev] [PATCH v2 00/16] " Adrien Mazarguil
  18 siblings, 0 replies; 39+ messages in thread
From: Adrien Mazarguil @ 2015-10-05 17:54 UTC (permalink / raw)
  To: dev

Add new features related to Mellanox OFED 3.1 support.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 doc/guides/nics/mlx5.rst | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index fdb621c..2d68914 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -73,6 +73,25 @@ long as they share the same MAC address.
 Enabling librte_pmd_mlx5 causes DPDK applications to be linked against
 libibverbs.
 
+Features
+--------
+
+- Multiple TX and RX queues.
+- Support for scattered TX and RX frames.
+- IPv4, TCPv4 and UDPv4 RSS on any number of queues.
+- Several RSS hash keys, one for each flow type.
+- Support for multiple MAC addresses.
+- VLAN filtering.
+- Promiscuous mode.
+
+Limitations
+-----------
+
+- IPv6 and inner VXLAN RSS are not supported yet.
+- Port statistics through software counters only.
+- No allmulticast mode.
+- Hardware checksum offloads are not supported yet.
+
 Configuration
 -------------
 
@@ -171,6 +190,13 @@ DPDK and must be installed separately:
    Both libraries are BSD and GPL licensed. Linux kernel modules are GPL
    licensed.
 
+Currently supported by DPDK:
+
+- Mellanox OFED **3.1**.
+- Minimum firmware version:
+  - ConnectX-4: **12.12.0780**.
+  - ConnectX-4 Lx: **14.12.0780**.
+
 Getting Mellanox OFED
 ~~~~~~~~~~~~~~~~~~~~~
 
-- 
2.1.0

* Re: [dpdk-dev] [PATCH 00/17] Enhance mlx5 with Mellanox OFED 3.1
  2015-10-05 17:54 [dpdk-dev] [PATCH 00/17] Enhance mlx5 with Mellanox OFED 3.1 Adrien Mazarguil
                   ` (16 preceding siblings ...)
  2015-10-05 17:54 ` [dpdk-dev] [PATCH 17/17] doc: update mlx5 documentation Adrien Mazarguil
@ 2015-10-06  8:54 ` Stephen Hemminger
  2015-10-06  9:58   ` Vincent JARDIN
  2015-10-07 13:30   ` Joongi Kim
  2015-10-30 18:55 ` [dpdk-dev] [PATCH v2 00/16] " Adrien Mazarguil
  18 siblings, 2 replies; 39+ messages in thread
From: Stephen Hemminger @ 2015-10-06  8:54 UTC (permalink / raw)
  To: Adrien Mazarguil; +Cc: dev

On Mon,  5 Oct 2015 19:54:35 +0200
Adrien Mazarguil <adrien.mazarguil@6wind.com> wrote:

> Mellanox OFED 3.1 [1] comes with improved APIs that Mellanox ConnectX-4
> (mlx5) adapters can take advantage of, such as:
> 
> - Separate post and doorbell operations on all queues.
> - Lightweight RX queues called Work Queues (WQs).
> - Low-level RSS indirection table and hash key configuration.
> 
> This patchset enhances mlx5 with all of these for better performance and
> flexibility. Documentation is updated accordingly.

Has anybody explored doing a driver without the dependency on OFED?
It is certainly possible. The Linux kernel drivers don't depend on it.
And dropping OFED would certainly be faster.

* Re: [dpdk-dev] [PATCH 00/17] Enhance mlx5 with Mellanox OFED 3.1
  2015-10-06  8:54 ` [dpdk-dev] [PATCH 00/17] Enhance mlx5 with Mellanox OFED 3.1 Stephen Hemminger
@ 2015-10-06  9:58   ` Vincent JARDIN
  2015-10-07 13:30   ` Joongi Kim
  1 sibling, 0 replies; 39+ messages in thread
From: Vincent JARDIN @ 2015-10-06  9:58 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev

On 6 Oct. 2015 09:54, "Stephen Hemminger" <stephen@networkplumber.org>
wrote:
>
> On Mon,  5 Oct 2015 19:54:35 +0200
> Adrien Mazarguil <adrien.mazarguil@6wind.com> wrote:
>
> > Mellanox OFED 3.1 [1] comes with improved APIs that Mellanox ConnectX-4
> > (mlx5) adapters can take advantage of, such as:
> >
> > - Separate post and doorbell operations on all queues.
> > - Lightweight RX queues called Work Queues (WQs).
> > - Low-level RSS indirection table and hash key configuration.
> >
> > This patchset enhances mlx5 with all of these for better performance and
> > flexibility. Documentation is updated accordingly.
>
> Has anybody explored doing a driver without the dependency on OFED?
> It is certainly possible. The Linux kernel drivers don't depend on it.
> And dropping OFED would certainly be faster.

OFED is an established kernel API. I agree that the F from InfiniBand should be
deprecated, since it has a broader scope of use.

It avoids wasting effort duplicating the kernel's code.

It also provides security that UIO cannot provide.

Best regards,
  Vincent

* Re: [dpdk-dev] [PATCH 00/17] Enhance mlx5 with Mellanox OFED 3.1
  2015-10-06  8:54 ` [dpdk-dev] [PATCH 00/17] Enhance mlx5 with Mellanox OFED 3.1 Stephen Hemminger
  2015-10-06  9:58   ` Vincent JARDIN
@ 2015-10-07 13:30   ` Joongi Kim
  1 sibling, 0 replies; 39+ messages in thread
From: Joongi Kim @ 2015-10-07 13:30 UTC (permalink / raw)
  To: Stephen Hemminger, Adrien Mazarguil; +Cc: dev

My laboratory (an.kaist.ac.kr) had tried to build a native kernel driver
for mlx4 a few months ago, and sent the full patch to the patchwork system:
http://dpdk.org/dev/patchwork/patch/6128/
This driver supports only minimal RX/TX of packets, and many standard
features such as VLAN are missing.

The major technical difficulty was to make a memory region that is
persistent even when the user process terminates. We have sent another
patch for this: http://dpdk.org/dev/patchwork/patch/6127/

Nonetheless, we abandoned this approach, because the new mlx4 PMD based on
OFED 3.0 performed almost the same as or better than our native driver, with
an almost complete feature set. Still, I believe there is room to
improve/optimize the native driver, but we just do not have enough human
resources for that.
In the background, we tried to publish an academic paper about an automated
converter from Linux NIC drivers to DPDK poll-mode drivers, but
unfortunately this project is on hold now.

Regards,
Joongi

On Tue, Oct 6, 2015 at 5:54 PM, Stephen Hemminger <stephen@networkplumber.org>
wrote:

> On Mon,  5 Oct 2015 19:54:35 +0200
> Adrien Mazarguil <adrien.mazarguil@6wind.com> wrote:
>
> > Mellanox OFED 3.1 [1] comes with improved APIs that Mellanox ConnectX-4
> > (mlx5) adapters can take advantage of, such as:
> >
> > - Separate post and doorbell operations on all queues.
> > - Lightweight RX queues called Work Queues (WQs).
> > - Low-level RSS indirection table and hash key configuration.
> >
> > This patchset enhances mlx5 with all of these for better performance and
> > flexibility. Documentation is updated accordingly.
>
> Has anybody explored doing a driver without the dependency on OFED?
> It is certainly possible. The Linux kernel drivers don't depend on it.
> And dropping OFED would certainly be faster.
>

* [dpdk-dev] [PATCH v2 00/16] Enhance mlx5 with Mellanox OFED 3.1
  2015-10-05 17:54 [dpdk-dev] [PATCH 00/17] Enhance mlx5 with Mellanox OFED 3.1 Adrien Mazarguil
                   ` (17 preceding siblings ...)
  2015-10-06  8:54 ` [dpdk-dev] [PATCH 00/17] Enhance mlx5 with Mellanox OFED 3.1 Stephen Hemminger
@ 2015-10-30 18:55 ` Adrien Mazarguil
  2015-10-30 18:55   ` [dpdk-dev] [PATCH v2 01/16] mlx5: use fast Verbs interface for scattered RX operation Adrien Mazarguil
                     ` (16 more replies)
  18 siblings, 17 replies; 39+ messages in thread
From: Adrien Mazarguil @ 2015-10-30 18:55 UTC (permalink / raw)
  To: dev

Mellanox OFED 3.1 [1] comes with improved APIs that Mellanox ConnectX-4
(mlx5) adapters can take advantage of, such as:

- Separate post and doorbell operations on all queues.
- Lightweight RX queues called Work Queues (WQs).
- Low-level RSS indirection table and hash key configuration.

This patchset enhances mlx5 with all of these for better performance and
flexibility. Documentation is updated accordingly.

[1] http://www.mellanox.com/page/products_dyn?product_family=26&mtag=linux_sw_drivers

Changes in v2:
- Fixed compilation errors due to a missing libibverbs callback in the first
  two commits.
- Improved cleanup in device stop function.
- Added flow cleanup when enabling promiscuous mode to save HW resources.
- Removed VF check in promiscuous mode.
- Added RSS support for IPv6 flows.
- Renamed flow specification generator function.
- Modified allmulticast code to use enhanced flow specifications.
- Remaining changes are caused by rebase on v2 of the initial patchset
  ("Mellanox ConnectX-4 PMD (mlx5)").

Adrien Mazarguil (7):
  mlx5: use fast Verbs interface for scattered RX operation
  mlx5: get rid of the WR structure in RX queue elements
  mlx5: refactor RX code for the new Verbs RSS API
  app/testpmd: fix missing initialization in the RSS hash show command
  mlx5: add IPv6 RSS support using experimental flows
  mlx5: enable multi packet send WR in TX CQ
  doc: update mlx5 documentation

Nelio Laranjeiro (5):
  mlx5: adapt indirection table size depending on RX queues number
  mlx5: add RSS hash update/get
  mlx5: use one RSS hash key per flow type
  app/testpmd: add missing type to RSS hash commands
  mlx5: disable useless flows in promiscuous mode

Olga Shern (3):
  mlx5: use separate indirection table for default hash RX queue
  mlx5: define specific flow steering rules for each hash RX QP
  mlx5: use alternate method to configure promisc and allmulti modes

Yaacov Hazan (1):
  mlx5: fix compilation error with GCC < 4.6

 app/test-pmd/cmdline.c                      |   45 +-
 app/test-pmd/config.c                       |   69 +-
 app/test-pmd/testpmd.h                      |    6 +-
 doc/guides/nics/mlx5.rst                    |   26 +
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |    2 +-
 drivers/net/mlx5/Makefile                   |   14 +-
 drivers/net/mlx5/mlx5.c                     |   67 +-
 drivers/net/mlx5/mlx5.h                     |   54 +-
 drivers/net/mlx5/mlx5_defs.h                |    3 +
 drivers/net/mlx5/mlx5_ethdev.c              |   54 +-
 drivers/net/mlx5/mlx5_mac.c                 |  212 +++---
 drivers/net/mlx5/mlx5_rss.c                 |  213 ++++++
 drivers/net/mlx5/mlx5_rxmode.c              |  313 +++++----
 drivers/net/mlx5/mlx5_rxq.c                 | 1012 ++++++++++++++++++---------
 drivers/net/mlx5/mlx5_rxtx.c                |   68 +-
 drivers/net/mlx5/mlx5_rxtx.h                |   97 ++-
 drivers/net/mlx5/mlx5_trigger.c             |   87 +--
 drivers/net/mlx5/mlx5_txq.c                 |    7 +
 drivers/net/mlx5/mlx5_utils.h               |    2 -
 drivers/net/mlx5/mlx5_vlan.c                |   33 +-
 20 files changed, 1532 insertions(+), 852 deletions(-)
 create mode 100644 drivers/net/mlx5/mlx5_rss.c

-- 
2.1.0

* [dpdk-dev] [PATCH v2 01/16] mlx5: use fast Verbs interface for scattered RX operation
  2015-10-30 18:55 ` [dpdk-dev] [PATCH v2 00/16] " Adrien Mazarguil
@ 2015-10-30 18:55   ` Adrien Mazarguil
  2015-10-30 18:55   ` [dpdk-dev] [PATCH v2 02/16] mlx5: get rid of the WR structure in RX queue elements Adrien Mazarguil
                     ` (15 subsequent siblings)
  16 siblings, 0 replies; 39+ messages in thread
From: Adrien Mazarguil @ 2015-10-30 18:55 UTC (permalink / raw)
  To: dev

This commit updates mlx5_rx_burst_sp() to use the fast verbs interface for
posting RX buffers just like mlx5_rx_burst(). Doing so avoids a loop in
libmlx5 and an indirect function call through libibverbs.

Note: recv_sg_list() is not implemented in the QP burst API; this commit
only prepares the transition to the WQ-based API.
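
The compile-time fallback used at the repost label follows a common pattern: call the burst function when the autoconf probe found it, otherwise fail with ENOSYS. The sketch below uses hypothetical demo_* types and a HAVE_DEMO_RECV_SG_LIST macro standing in for HAVE_EXP_QP_BURST_RECV_SG_LIST; since the macro is left undefined here, the fallback branch is the one compiled.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct ibv_sge. */
struct demo_sge {
	uint64_t addr;
	uint32_t length;
	uint32_t lkey;
};

/* Hypothetical function table, similar in spirit to rxq->if_qp. */
struct demo_qp_burst_family {
	int (*recv_sg_list)(void *qp, struct demo_sge *sg_list, uint32_t num);
};

/* Repost one element's SGEs; returns 0 on success, -1 with errno set. */
int demo_repost(const struct demo_qp_burst_family *fam, void *qp,
		struct demo_sge *sges, uint32_t n)
{
#ifdef HAVE_DEMO_RECV_SG_LIST
	return fam->recv_sg_list(qp, sges, n);
#else
	/* Library lacks recv_sg_list(): report "not implemented". */
	(void)fam;
	(void)qp;
	(void)sges;
	(void)n;
	errno = ENOSYS;
	return -1;
#endif
}
```

In mlx5_rx_burst_sp() a non-zero return is treated as fatal and the driver calls abort(), so in a build without recv_sg_list() the scattered RX path would abort on its first repost.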

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 drivers/net/mlx5/Makefile    |  4 ++++
 drivers/net/mlx5/mlx5_rxtx.c | 40 +++++++++++++++++-----------------------
 2 files changed, 21 insertions(+), 23 deletions(-)

diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index 8b1e32b..2969045 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -119,6 +119,10 @@ mlx5_autoconf.h: $(RTE_SDK)/scripts/auto-config-h.sh
 		HAVE_EXP_QUERY_DEVICE \
 		infiniband/verbs.h \
 		type 'struct ibv_exp_device_attr' $(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_EXP_QP_BURST_RECV_SG_LIST \
+		infiniband/verbs.h \
+		field 'struct ibv_exp_qp_burst_family.recv_sg_list' $(AUTOCONF_OUTPUT)
 
 mlx5.o: mlx5_autoconf.h
 
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 623219d..8872f19 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -35,6 +35,7 @@
 #include <stdint.h>
 #include <string.h>
 #include <stdlib.h>
+#include <errno.h>
 
 /* Verbs header. */
 /* ISO C doesn't support unnamed structs/unions, disabling -pedantic. */
@@ -60,6 +61,7 @@
 #endif
 
 #include "mlx5.h"
+#include "mlx5_autoconf.h"
 #include "mlx5_utils.h"
 #include "mlx5_rxtx.h"
 #include "mlx5_defs.h"
@@ -600,9 +602,6 @@ mlx5_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 	struct rxq_elt_sp (*elts)[rxq->elts_n] = rxq->elts.sp;
 	const unsigned int elts_n = rxq->elts_n;
 	unsigned int elts_head = rxq->elts_head;
-	struct ibv_recv_wr head;
-	struct ibv_recv_wr **next = &head.next;
-	struct ibv_recv_wr *bad_wr;
 	unsigned int i;
 	unsigned int pkts_ret = 0;
 	int ret;
@@ -660,9 +659,6 @@ mlx5_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 				/* Increment dropped packets counter. */
 				++rxq->stats.idropped;
 #endif
-				/* Link completed WRs together for repost. */
-				*next = wr;
-				next = &wr->next;
 				goto repost;
 			}
 			ret = wc.byte_len;
@@ -671,9 +667,6 @@ mlx5_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 			break;
 		len = ret;
 		pkt_buf_len = len;
-		/* Link completed WRs together for repost. */
-		*next = wr;
-		next = &wr->next;
 		/*
 		 * Replace spent segments with new ones, concatenate and
 		 * return them as pkt_buf.
@@ -770,26 +763,27 @@ mlx5_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 		rxq->stats.ibytes += pkt_buf_len;
 #endif
 repost:
+#ifdef HAVE_EXP_QP_BURST_RECV_SG_LIST
+		ret = rxq->if_qp->recv_sg_list(rxq->qp,
+					       elt->sges,
+					       RTE_DIM(elt->sges));
+#else /* HAVE_EXP_QP_BURST_RECV_SG_LIST */
+		errno = ENOSYS;
+		ret = -1;
+#endif /* HAVE_EXP_QP_BURST_RECV_SG_LIST */
+		if (unlikely(ret)) {
+			/* Inability to repost WRs is fatal. */
+			DEBUG("%p: recv_sg_list(): failed (ret=%d)",
+			      (void *)rxq->priv,
+			      ret);
+			abort();
+		}
 		if (++elts_head >= elts_n)
 			elts_head = 0;
 		continue;
 	}
 	if (unlikely(i == 0))
 		return 0;
-	*next = NULL;
-	/* Repost WRs. */
-#ifdef DEBUG_RECV
-	DEBUG("%p: reposting %d WRs", (void *)rxq, i);
-#endif
-	ret = ibv_post_recv(rxq->qp, head.next, &bad_wr);
-	if (unlikely(ret)) {
-		/* Inability to repost WRs is fatal. */
-		DEBUG("%p: ibv_post_recv(): failed for WR %p: %s",
-		      (void *)rxq->priv,
-		      (void *)bad_wr,
-		      strerror(ret));
-		abort();
-	}
 	rxq->elts_head = elts_head;
 #ifdef MLX5_PMD_SOFT_COUNTERS
 	/* Increment packets counter. */
-- 
2.1.0

* [dpdk-dev] [PATCH v2 02/16] mlx5: get rid of the WR structure in RX queue elements
  2015-10-30 18:55 ` [dpdk-dev] [PATCH v2 00/16] " Adrien Mazarguil
  2015-10-30 18:55   ` [dpdk-dev] [PATCH v2 01/16] mlx5: use fast Verbs interface for scattered RX operation Adrien Mazarguil
@ 2015-10-30 18:55   ` Adrien Mazarguil
  2015-10-30 18:55   ` [dpdk-dev] [PATCH v2 03/16] mlx5: refactor RX code for the new Verbs RSS API Adrien Mazarguil
                     ` (14 subsequent siblings)
  16 siblings, 0 replies; 39+ messages in thread
From: Adrien Mazarguil @ 2015-10-30 18:55 UTC (permalink / raw)
  To: dev

Removing this structure reduces the size of SG and non-SG RX queue elements
significantly to improve performance.

A nice side effect is that the full mbuf pointer is now stored in
struct rxq_elt instead of relying on the WR ID data offset hack.
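
The removed hack packed an element index and the distance between the mbuf header and its data into the 64-bit wr_id, so the mbuf pointer had to be reconstructed from the SGE address. A minimal sketch of that round trip follows; wr_id_t is taken from the code this patch removes from mlx5.h, while struct demo_buf, demo_encode() and demo_decode() are hypothetical illustrations, not driver functions.

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit WR ID layout removed by this patch. */
typedef union {
	struct {
		uint32_t id;
		uint16_t offset;
	} data;
	uint64_t raw;
} wr_id_t;

/* Hypothetical buffer: a header followed by its data area. */
struct demo_buf {
	uint64_t hdr[4];
	uint8_t data[64];
};

/* Old approach: store the index and header-to-data offset in wr_id. */
uint64_t demo_encode(uint32_t index, const struct demo_buf *buf,
		     uintptr_t sge_addr)
{
	wr_id_t w = { .raw = 0 };

	w.data.id = index;
	/* Only 16 bits are available for the offset. */
	w.data.offset = (uint16_t)(sge_addr - (uintptr_t)buf);
	return w.raw;
}

/* Recover the index and the buffer pointer from wr_id and the SGE address. */
struct demo_buf *demo_decode(uint64_t raw, uintptr_t sge_addr,
			     uint32_t *index)
{
	wr_id_t w = { .raw = raw };

	*index = w.data.id;
	return (struct demo_buf *)(sge_addr - w.data.offset);
}

/* New approach: the element simply keeps the pointer. */
struct demo_rxq_elt {
	struct demo_buf *buf;
};
```

Storing elt->buf directly removes both the extra arithmetic and the need for the EOVERFLOW check that guarded against an index or offset not fitting in the WR ID.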

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Olga Shern <olgas@mellanox.com>
Signed-off-by: Or Ami <ora@mellanox.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 drivers/net/mlx5/mlx5.h       |  18 -----
 drivers/net/mlx5/mlx5_rxq.c   | 173 ++++++++++++++++++++++--------------------
 drivers/net/mlx5/mlx5_rxtx.c  |  33 ++------
 drivers/net/mlx5/mlx5_rxtx.h  |   4 +-
 drivers/net/mlx5/mlx5_utils.h |   2 -
 5 files changed, 98 insertions(+), 132 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 3a1e7a6..c8a517c 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -115,24 +115,6 @@ struct priv {
 	rte_spinlock_t lock; /* Lock for control functions. */
 };
 
-/* Work Request ID data type (64 bit). */
-typedef union {
-	struct {
-		uint32_t id;
-		uint16_t offset;
-	} data;
-	uint64_t raw;
-} wr_id_t;
-
-/* Compile-time check. */
-static inline void wr_id_t_check(void)
-{
-	wr_id_t check[1 + (2 * -!(sizeof(wr_id_t) == sizeof(uint64_t)))];
-
-	(void)check;
-	(void)wr_id_t_check;
-}
-
 /**
  * Lock private structure to protect it from concurrent access in the
  * control path.
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 5a55886..f2f773e 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -60,6 +60,7 @@
 #endif
 
 #include "mlx5.h"
+#include "mlx5_autoconf.h"
 #include "mlx5_rxtx.h"
 #include "mlx5_utils.h"
 #include "mlx5_defs.h"
@@ -97,16 +98,10 @@ rxq_alloc_elts_sp(struct rxq *rxq, unsigned int elts_n,
 	for (i = 0; (i != elts_n); ++i) {
 		unsigned int j;
 		struct rxq_elt_sp *elt = &(*elts)[i];
-		struct ibv_recv_wr *wr = &elt->wr;
 		struct ibv_sge (*sges)[RTE_DIM(elt->sges)] = &elt->sges;
 
 		/* These two arrays must have the same size. */
 		assert(RTE_DIM(elt->sges) == RTE_DIM(elt->bufs));
-		/* Configure WR. */
-		wr->wr_id = i;
-		wr->next = &(*elts)[(i + 1)].wr;
-		wr->sg_list = &(*sges)[0];
-		wr->num_sge = RTE_DIM(*sges);
 		/* For each SGE (segment). */
 		for (j = 0; (j != RTE_DIM(elt->bufs)); ++j) {
 			struct ibv_sge *sge = &(*sges)[j];
@@ -149,8 +144,6 @@ rxq_alloc_elts_sp(struct rxq *rxq, unsigned int elts_n,
 			assert(sge->length == rte_pktmbuf_tailroom(buf));
 		}
 	}
-	/* The last WR pointer must be NULL. */
-	(*elts)[(i - 1)].wr.next = NULL;
 	DEBUG("%p: allocated and configured %u WRs (%zu segments)",
 	      (void *)rxq, elts_n, (elts_n * RTE_DIM((*elts)[0].sges)));
 	rxq->elts_n = elts_n;
@@ -242,7 +235,6 @@ rxq_alloc_elts(struct rxq *rxq, unsigned int elts_n, struct rte_mbuf **pool)
 	/* For each WR (packet). */
 	for (i = 0; (i != elts_n); ++i) {
 		struct rxq_elt *elt = &(*elts)[i];
-		struct ibv_recv_wr *wr = &elt->wr;
 		struct ibv_sge *sge = &(*elts)[i].sge;
 		struct rte_mbuf *buf;
 
@@ -258,16 +250,7 @@ rxq_alloc_elts(struct rxq *rxq, unsigned int elts_n, struct rte_mbuf **pool)
 			ret = ENOMEM;
 			goto error;
 		}
-		/* Configure WR. Work request ID contains its own index in
-		 * the elts array and the offset between SGE buffer header and
-		 * its data. */
-		WR_ID(wr->wr_id).id = i;
-		WR_ID(wr->wr_id).offset =
-			(((uintptr_t)buf->buf_addr + RTE_PKTMBUF_HEADROOM) -
-			 (uintptr_t)buf);
-		wr->next = &(*elts)[(i + 1)].wr;
-		wr->sg_list = sge;
-		wr->num_sge = 1;
+		elt->buf = buf;
 		/* Headroom is reserved by rte_pktmbuf_alloc(). */
 		assert(DATA_OFF(buf) == RTE_PKTMBUF_HEADROOM);
 		/* Buffer is supposed to be empty. */
@@ -282,21 +265,7 @@ rxq_alloc_elts(struct rxq *rxq, unsigned int elts_n, struct rte_mbuf **pool)
 		sge->lkey = rxq->mr->lkey;
 		/* Redundant check for tailroom. */
 		assert(sge->length == rte_pktmbuf_tailroom(buf));
-		/* Make sure elts index and SGE mbuf pointer can be deduced
-		 * from WR ID. */
-		if ((WR_ID(wr->wr_id).id != i) ||
-		    ((void *)((uintptr_t)sge->addr -
-			WR_ID(wr->wr_id).offset) != buf)) {
-			ERROR("%p: cannot store index and offset in WR ID",
-			      (void *)rxq);
-			sge->addr = 0;
-			rte_pktmbuf_free(buf);
-			ret = EOVERFLOW;
-			goto error;
-		}
 	}
-	/* The last WR pointer must be NULL. */
-	(*elts)[(i - 1)].wr.next = NULL;
 	DEBUG("%p: allocated and configured %u single-segment WRs",
 	      (void *)rxq, elts_n);
 	rxq->elts_n = elts_n;
@@ -309,14 +278,10 @@ error:
 		assert(pool == NULL);
 		for (i = 0; (i != RTE_DIM(*elts)); ++i) {
 			struct rxq_elt *elt = &(*elts)[i];
-			struct rte_mbuf *buf;
+			struct rte_mbuf *buf = elt->buf;
 
-			if (elt->sge.addr == 0)
-				continue;
-			assert(WR_ID(elt->wr.wr_id).id == i);
-			buf = (void *)((uintptr_t)elt->sge.addr -
-				WR_ID(elt->wr.wr_id).offset);
-			rte_pktmbuf_free_seg(buf);
+			if (buf != NULL)
+				rte_pktmbuf_free_seg(buf);
 		}
 		rte_free(elts);
 	}
@@ -345,14 +310,10 @@ rxq_free_elts(struct rxq *rxq)
 		return;
 	for (i = 0; (i != RTE_DIM(*elts)); ++i) {
 		struct rxq_elt *elt = &(*elts)[i];
-		struct rte_mbuf *buf;
+		struct rte_mbuf *buf = elt->buf;
 
-		if (elt->sge.addr == 0)
-			continue;
-		assert(WR_ID(elt->wr.wr_id).id == i);
-		buf = (void *)((uintptr_t)elt->sge.addr -
-			WR_ID(elt->wr.wr_id).offset);
-		rte_pktmbuf_free_seg(buf);
+		if (buf != NULL)
+			rte_pktmbuf_free_seg(buf);
 	}
 	rte_free(elts);
 }
@@ -552,7 +513,6 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
 	struct rte_mbuf **pool;
 	unsigned int i, k;
 	struct ibv_exp_qp_attr mod;
-	struct ibv_recv_wr *bad_wr;
 	int err;
 	int parent = (rxq == &priv->rxq_parent);
 
@@ -670,11 +630,8 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
 
 		for (i = 0; (i != RTE_DIM(*elts)); ++i) {
 			struct rxq_elt *elt = &(*elts)[i];
-			struct rte_mbuf *buf = (void *)
-				((uintptr_t)elt->sge.addr -
-				 WR_ID(elt->wr.wr_id).offset);
+			struct rte_mbuf *buf = elt->buf;
 
-			assert(WR_ID(elt->wr.wr_id).id == i);
 			pool[k++] = buf;
 		}
 	}
@@ -698,17 +655,41 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
 	rxq->elts_n = 0;
 	rte_free(rxq->elts.sp);
 	rxq->elts.sp = NULL;
-	/* Post WRs. */
-	err = ibv_post_recv(tmpl.qp,
-			    (tmpl.sp ?
-			     &(*tmpl.elts.sp)[0].wr :
-			     &(*tmpl.elts.no_sp)[0].wr),
-			    &bad_wr);
+	/* Post SGEs. */
+	assert(tmpl.if_qp != NULL);
+	if (tmpl.sp) {
+		struct rxq_elt_sp (*elts)[tmpl.elts_n] = tmpl.elts.sp;
+
+		for (i = 0; (i != RTE_DIM(*elts)); ++i) {
+#ifdef HAVE_EXP_QP_BURST_RECV_SG_LIST
+			err = tmpl.if_qp->recv_sg_list
+				(tmpl.qp,
+				 (*elts)[i].sges,
+				 RTE_DIM((*elts)[i].sges));
+#else /* HAVE_EXP_QP_BURST_RECV_SG_LIST */
+			errno = ENOSYS;
+			err = -1;
+#endif /* HAVE_EXP_QP_BURST_RECV_SG_LIST */
+			if (err)
+				break;
+		}
+	} else {
+		struct rxq_elt (*elts)[tmpl.elts_n] = tmpl.elts.no_sp;
+
+		for (i = 0; (i != RTE_DIM(*elts)); ++i) {
+			err = tmpl.if_qp->recv_burst(
+				tmpl.qp,
+				&(*elts)[i].sge,
+				1);
+			if (err)
+				break;
+		}
+	}
 	if (err) {
-		ERROR("%p: ibv_post_recv() failed for WR %p: %s",
-		      (void *)dev,
-		      (void *)bad_wr,
-		      strerror(err));
+		ERROR("%p: failed to post SGEs with error %d",
+		      (void *)dev, err);
+		/* Set err because it does not contain a valid errno value. */
+		err = EIO;
 		goto skip_rtr;
 	}
 	mod = (struct ibv_exp_qp_attr){
@@ -761,10 +742,10 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 		struct ibv_exp_res_domain_init_attr rd;
 	} attr;
 	enum ibv_exp_query_intf_status status;
-	struct ibv_recv_wr *bad_wr;
 	struct rte_mbuf *buf;
 	int ret = 0;
 	int parent = (rxq == &priv->rxq_parent);
+	unsigned int i;
 
 	(void)conf; /* Thresholds configuration (ignored). */
 	/*
@@ -900,28 +881,7 @@ skip_mr:
 		      (void *)dev, strerror(ret));
 		goto error;
 	}
-	ret = ibv_post_recv(tmpl.qp,
-			    (tmpl.sp ?
-			     &(*tmpl.elts.sp)[0].wr :
-			     &(*tmpl.elts.no_sp)[0].wr),
-			    &bad_wr);
-	if (ret) {
-		ERROR("%p: ibv_post_recv() failed for WR %p: %s",
-		      (void *)dev,
-		      (void *)bad_wr,
-		      strerror(ret));
-		goto error;
-	}
 skip_alloc:
-	mod = (struct ibv_exp_qp_attr){
-		.qp_state = IBV_QPS_RTR
-	};
-	ret = ibv_exp_modify_qp(tmpl.qp, &mod, IBV_EXP_QP_STATE);
-	if (ret) {
-		ERROR("%p: QP state to IBV_QPS_RTR failed: %s",
-		      (void *)dev, strerror(ret));
-		goto error;
-	}
 	/* Save port ID. */
 	tmpl.port_id = dev->data->port_id;
 	DEBUG("%p: RTE port ID: %u", (void *)rxq, tmpl.port_id);
@@ -947,6 +907,51 @@ skip_alloc:
 		      (void *)dev, status);
 		goto error;
 	}
+	/* Post SGEs. */
+	if (!parent && tmpl.sp) {
+		struct rxq_elt_sp (*elts)[tmpl.elts_n] = tmpl.elts.sp;
+
+		for (i = 0; (i != RTE_DIM(*elts)); ++i) {
+#ifdef HAVE_EXP_QP_BURST_RECV_SG_LIST
+			ret = tmpl.if_qp->recv_sg_list
+				(tmpl.qp,
+				 (*elts)[i].sges,
+				 RTE_DIM((*elts)[i].sges));
+#else /* HAVE_EXP_QP_BURST_RECV_SG_LIST */
+			errno = ENOSYS;
+			ret = -1;
+#endif /* HAVE_EXP_QP_BURST_RECV_SG_LIST */
+			if (ret)
+				break;
+		}
+	} else if (!parent) {
+		struct rxq_elt (*elts)[tmpl.elts_n] = tmpl.elts.no_sp;
+
+		for (i = 0; (i != RTE_DIM(*elts)); ++i) {
+			ret = tmpl.if_qp->recv_burst(
+				tmpl.qp,
+				&(*elts)[i].sge,
+				1);
+			if (ret)
+				break;
+		}
+	}
+	if (ret) {
+		ERROR("%p: failed to post SGEs with error %d",
+		      (void *)dev, ret);
+		/* Set ret because it does not contain a valid errno value. */
+		ret = EIO;
+		goto error;
+	}
+	mod = (struct ibv_exp_qp_attr){
+		.qp_state = IBV_QPS_RTR
+	};
+	ret = ibv_exp_modify_qp(tmpl.qp, &mod, IBV_EXP_QP_STATE);
+	if (ret) {
+		ERROR("%p: QP state to IBV_QPS_RTR failed: %s",
+		      (void *)dev, strerror(ret));
+		goto error;
+	}
 	/* Clean up rxq in case we're reinitializing it. */
 	DEBUG("%p: cleaning-up old rxq just in case", (void *)rxq);
 	rxq_cleanup(rxq);
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 8872f19..f48fec1 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -612,8 +612,6 @@ mlx5_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 		return 0;
 	for (i = 0; (i != pkts_n); ++i) {
 		struct rxq_elt_sp *elt = &(*elts)[elts_head];
-		struct ibv_recv_wr *wr = &elt->wr;
-		uint64_t wr_id = wr->wr_id;
 		unsigned int len;
 		unsigned int pkt_buf_len;
 		struct rte_mbuf *pkt_buf = NULL; /* Buffer returned in pkts. */
@@ -623,12 +621,6 @@ mlx5_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 		uint32_t flags;
 
 		/* Sanity checks. */
-#ifdef NDEBUG
-		(void)wr_id;
-#endif
-		assert(wr_id < rxq->elts_n);
-		assert(wr->sg_list == elt->sges);
-		assert(wr->num_sge == RTE_DIM(elt->sges));
 		assert(elts_head < rxq->elts_n);
 		assert(rxq->elts_head < rxq->elts_n);
 		ret = rxq->if_cq->poll_length_flags(rxq->cq, NULL, NULL,
@@ -677,6 +669,7 @@ mlx5_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 			struct rte_mbuf *rep;
 			unsigned int seg_tailroom;
 
+			assert(seg != NULL);
 			/*
 			 * Fetch initial bytes of packet descriptor into a
 			 * cacheline while allocating rep.
@@ -688,9 +681,8 @@ mlx5_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 				 * Unable to allocate a replacement mbuf,
 				 * repost WR.
 				 */
-				DEBUG("rxq=%p, wr_id=%" PRIu64 ":"
-				      " can't allocate a new mbuf",
-				      (void *)rxq, wr_id);
+				DEBUG("rxq=%p: can't allocate a new mbuf",
+				      (void *)rxq);
 				if (pkt_buf != NULL) {
 					*pkt_buf_next = NULL;
 					rte_pktmbuf_free(pkt_buf);
@@ -825,18 +817,13 @@ mlx5_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 		return mlx5_rx_burst_sp(dpdk_rxq, pkts, pkts_n);
 	for (i = 0; (i != pkts_n); ++i) {
 		struct rxq_elt *elt = &(*elts)[elts_head];
-		struct ibv_recv_wr *wr = &elt->wr;
-		uint64_t wr_id = wr->wr_id;
 		unsigned int len;
-		struct rte_mbuf *seg = (void *)((uintptr_t)elt->sge.addr -
-			WR_ID(wr_id).offset);
+		struct rte_mbuf *seg = elt->buf;
 		struct rte_mbuf *rep;
 		uint32_t flags;
 
 		/* Sanity checks. */
-		assert(WR_ID(wr_id).id < rxq->elts_n);
-		assert(wr->sg_list == &elt->sge);
-		assert(wr->num_sge == 1);
+		assert(seg != NULL);
 		assert(elts_head < rxq->elts_n);
 		assert(rxq->elts_head < rxq->elts_n);
 		/*
@@ -888,9 +875,8 @@ mlx5_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 			 * Unable to allocate a replacement mbuf,
 			 * repost WR.
 			 */
-			DEBUG("rxq=%p, wr_id=%" PRIu32 ":"
-			      " can't allocate a new mbuf",
-			      (void *)rxq, WR_ID(wr_id).id);
+			DEBUG("rxq=%p: can't allocate a new mbuf",
+			      (void *)rxq);
 			/* Increment out of memory counters. */
 			++rxq->stats.rx_nombuf;
 			++rxq->priv->dev->data->rx_mbuf_alloc_failed;
@@ -900,10 +886,7 @@ mlx5_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 		/* Reconfigure sge to use rep instead of seg. */
 		elt->sge.addr = (uintptr_t)rep->buf_addr + RTE_PKTMBUF_HEADROOM;
 		assert(elt->sge.lkey == rxq->mr->lkey);
-		WR_ID(wr->wr_id).offset =
-			(((uintptr_t)rep->buf_addr + RTE_PKTMBUF_HEADROOM) -
-			 (uintptr_t)rep);
-		assert(WR_ID(wr->wr_id).id == WR_ID(wr_id).id);
+		elt->buf = rep;
 
 		/* Add SGE to array for repost. */
 		sges[i] = elt->sge;
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index d86d623..90c99dc 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -81,16 +81,14 @@ struct mlx5_txq_stats {
 
 /* RX element (scattered packets). */
 struct rxq_elt_sp {
-	struct ibv_recv_wr wr; /* Work Request. */
 	struct ibv_sge sges[MLX5_PMD_SGE_WR_N]; /* Scatter/Gather Elements. */
 	struct rte_mbuf *bufs[MLX5_PMD_SGE_WR_N]; /* SGEs buffers. */
 };
 
 /* RX element. */
 struct rxq_elt {
-	struct ibv_recv_wr wr; /* Work Request. */
 	struct ibv_sge sge; /* Scatter/Gather Element. */
-	/* mbuf pointer is derived from WR_ID(wr.wr_id).offset. */
+	struct rte_mbuf *buf; /* SGE buffer. */
 };
 
 struct priv;
diff --git a/drivers/net/mlx5/mlx5_utils.h b/drivers/net/mlx5/mlx5_utils.h
index 8ff075b..f1fad18 100644
--- a/drivers/net/mlx5/mlx5_utils.h
+++ b/drivers/net/mlx5/mlx5_utils.h
@@ -161,6 +161,4 @@ pmd_drv_log_basename(const char *s)
 	\
 	snprintf(name, sizeof(name), __VA_ARGS__)
 
-#define WR_ID(o) (((wr_id_t *)&(o))->data)
-
 #endif /* RTE_PMD_MLX5_UTILS_H_ */
-- 
2.1.0
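The patch above drops the packed 64-bit work request ID and stores a `buf` pointer in each RX element instead. Under the old scheme, the mbuf address was recovered by subtracting a 16-bit offset from the SGE address. The standalone C sketch below contrasts the two bookkeeping schemes. `struct fake_mbuf`, `old_elt`, `new_elt`, the helper functions and `FAKE_HEADROOM` are hypothetical simplifications, not driver or DPDK types. Only the `wr_id_t` layout mirrors the removed code, with C11 `_Static_assert` standing in for its negative-array-size compile-time check.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rte_mbuf: a header, then the buffer. */
struct fake_mbuf {
	uint64_t hdr;  /* placeholder for mbuf metadata */
	char buf[256]; /* buffer area, packet data starts after headroom */
};

/* Assumed headroom size for this sketch only. */
#define FAKE_HEADROOM 128

/* Old scheme: element index and SGE-to-mbuf offset packed in 64 bits. */
typedef union {
	struct {
		uint32_t id;
		uint16_t offset;
	} data;
	uint64_t raw;
} wr_id_t;

/* C11 form of the removed negative-array-size check. */
_Static_assert(sizeof(wr_id_t) == sizeof(uint64_t),
	       "wr_id_t must be 64 bits wide");

struct old_elt {
	uint64_t wr_id;     /* packed index and offset */
	uintptr_t sge_addr; /* address of packet data (SGE address) */
};

/* New scheme: the buffer pointer lives next to the SGE. */
struct new_elt {
	uintptr_t sge_addr;
	struct fake_mbuf *buf; /* NULL means the slot holds no buffer */
};

/* Old: pack index and offset; -1 if the offset does not fit in 16 bits. */
static int old_set(struct old_elt *elt, uint32_t idx, struct fake_mbuf *m)
{
	uintptr_t data = (uintptr_t)m->buf + FAKE_HEADROOM;
	uintptr_t off = data - (uintptr_t)m;
	wr_id_t id = { .raw = 0 };

	if (off > UINT16_MAX)
		return -1; /* the driver returned EOVERFLOW here */
	id.data.id = idx;
	id.data.offset = (uint16_t)off;
	elt->wr_id = id.raw;
	elt->sge_addr = data;
	return 0;
}

/* Old: read the element index back from the packed ID. */
static uint32_t old_index(const struct old_elt *elt)
{
	wr_id_t id;

	id.raw = elt->wr_id;
	return id.data.id;
}

/* Old: recover the mbuf by subtracting the stored offset. */
static struct fake_mbuf *old_get(const struct old_elt *elt)
{
	wr_id_t id;

	id.raw = elt->wr_id;
	return (struct fake_mbuf *)(elt->sge_addr - id.data.offset);
}

/* New: store the pointer directly, no arithmetic or overflow check. */
static void new_set(struct new_elt *elt, struct fake_mbuf *m)
{
	elt->sge_addr = (uintptr_t)m->buf + FAKE_HEADROOM;
	elt->buf = m;
}

static struct fake_mbuf *new_get(const struct new_elt *elt)
{
	return elt->buf;
}
```

The new scheme needs no pointer arithmetic and no overflow check. A `NULL` pointer marks an empty slot, which is exactly what the rewritten free paths test before calling `rte_pktmbuf_free_seg()`.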


* [dpdk-dev] [PATCH v2 03/16] mlx5: refactor RX code for the new Verbs RSS API
From: Adrien Mazarguil @ 2015-10-30 18:55 UTC (permalink / raw)
  To: dev; +Cc: Yaacov Hazan

The new Verbs RSS API is lower-level and much more flexible than the
previous one, but requires RX queues to use Work Queues (WQs) internally
instead of Queue Pairs (QPs). WQs are grouped in an indirection table used
by a new kind of hash RX QP.

Together, hash RX QPs and the indirection table replace the parent RSS QP,
while WQs are mostly similar to child QPs.

RSS hash key is not configurable yet.
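A minimal model of the dispatch described above may help. Each hash RX QP computes an RSS hash over packet header fields and uses the result to index a shared indirection table, whose entries reference RX WQs. This sketch rests on several assumptions. The table size, the round-robin fill and the use of the hash's low-order bits are illustrative choices, and `struct ind_table` and its helpers are hypothetical. The real objects, such as `struct ibv_exp_rwq_ind_table`, are created through the experimental Verbs API and are not modeled here.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed table size; a power of two so low hash bits select an entry. */
#define IND_TABLE_SIZE 8
_Static_assert((IND_TABLE_SIZE & (IND_TABLE_SIZE - 1)) == 0,
	       "table size must be a power of two");

/* Hypothetical indirection table: each entry holds an RX WQ index. */
struct ind_table {
	unsigned int wq[IND_TABLE_SIZE];
};

/* Spread n_wqs queues round-robin over all table entries. */
static void ind_table_init(struct ind_table *t, unsigned int n_wqs)
{
	unsigned int i;

	assert(n_wqs > 0);
	for (i = 0; i != IND_TABLE_SIZE; ++i)
		t->wq[i] = i % n_wqs;
}

/* A hash RX QP's hash value picks the entry, which names the target WQ. */
static unsigned int ind_table_lookup(const struct ind_table *t, uint32_t hash)
{
	return t->wq[hash & (IND_TABLE_SIZE - 1)];
}
```

With 3 queues and 8 entries, queues 0 and 1 each own 3 entries while queue 2 owns only 2. Traffic is therefore not spread evenly, so table size matters when the number of RX queues is not a power of two.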

Summary of changes:

- Individual DPDK RX queues no longer store flow properties; this
  information is now part of the hash RX queues.
- All functions that affected the parent queue when RSS was enabled (or
  the basic queues otherwise) now affect hash RX queues instead.
- Hash RX queues are also used when a single DPDK RX queue is configured (no
  RSS) to remove that special case.
- Hash RX queues and the indirection table are created/destroyed when the
  device is started/stopped, in addition to creating/destroying flows.
- Unlike QPs, WQs must be moved to the "ready" state before RX buffers
  are posted; otherwise, the buffers are ignored.
- Resource domain information is added to WQs for better performance.
- CQs are no longer resized when switching between non-SG and SG modes, as
  resizing does not work correctly with WQs. The largest possible size is
  used instead, since CQ size does not have to match the number of
  elements in the RX queue. This also applies to the maximum number of
  outstanding WRs in a WQ (max_recv_wr).
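The ordering constraint in the list above can be illustrated with a toy state model: WQs must reach the ready state before receive buffers are posted. In the previous patch, QP-based queues post SGEs first and only then move to `IBV_QPS_RTR`, so WQ-based code has to reverse that order. `struct wq_model`, `wq_post()` and `wq_set_ready()` are hypothetical and encode only the rule as stated in this commit message. They are not Verbs calls.

```c
#include <assert.h>

/* Hypothetical WQ states; not the experimental Verbs enumeration. */
enum wq_state { WQ_RESET, WQ_RDY };

struct wq_model {
	enum wq_state state;
	unsigned int posted; /* buffers the hardware would actually use */
};

/* Posting is honored only once the WQ is ready; otherwise it is dropped. */
static void wq_post(struct wq_model *wq, unsigned int n)
{
	if (wq->state == WQ_RDY)
		wq->posted += n;
}

static void wq_set_ready(struct wq_model *wq)
{
	wq->state = WQ_RDY;
}
```

Following the QP order (post, then transition) leaves the model with no usable buffers, while transitioning first keeps every posted buffer.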

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Olga Shern <olgas@mellanox.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Or Ami <ora@mellanox.com>
Signed-off-by: Yaacov Hazan <yaacovh@mellanox.com>
---
 drivers/net/mlx5/Makefile       |   8 -
 drivers/net/mlx5/mlx5.c         |  36 +--
 drivers/net/mlx5/mlx5.h         |  25 +-
 drivers/net/mlx5/mlx5_ethdev.c  |  54 +---
 drivers/net/mlx5/mlx5_mac.c     | 186 +++++++------
 drivers/net/mlx5/mlx5_rxmode.c  | 266 ++++++++++---------
 drivers/net/mlx5/mlx5_rxq.c     | 559 +++++++++++++++++++++-------------------
 drivers/net/mlx5/mlx5_rxtx.c    |  11 +-
 drivers/net/mlx5/mlx5_rxtx.h    |  19 +-
 drivers/net/mlx5/mlx5_trigger.c |  87 ++-----
 drivers/net/mlx5/mlx5_vlan.c    |  33 +--
 11 files changed, 606 insertions(+), 678 deletions(-)

diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index 2969045..938f924 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -112,17 +112,9 @@ endif
 mlx5_autoconf.h: $(RTE_SDK)/scripts/auto-config-h.sh
 	$Q $(RM) -f -- '$@'
 	$Q sh -- '$<' '$@' \
-		RSS_SUPPORT \
-		infiniband/verbs.h \
-		enum IBV_EXP_DEVICE_UD_RSS $(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
 		HAVE_EXP_QUERY_DEVICE \
 		infiniband/verbs.h \
 		type 'struct ibv_exp_device_attr' $(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_EXP_QP_BURST_RECV_SG_LIST \
-		infiniband/verbs.h \
-		field 'struct ibv_exp_qp_burst_family.recv_sg_list' $(AUTOCONF_OUTPUT)
 
 mlx5.o: mlx5_autoconf.h
 
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 8f75f76..e394d32 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -85,6 +85,11 @@ mlx5_dev_close(struct rte_eth_dev *dev)
 	DEBUG("%p: closing device \"%s\"",
 	      (void *)dev,
 	      ((priv->ctx != NULL) ? priv->ctx->device->name : ""));
+	/* In case mlx5_dev_stop() has not been called. */
+	priv_allmulticast_disable(priv);
+	priv_promiscuous_disable(priv);
+	priv_mac_addrs_disable(priv);
+	priv_destroy_hash_rxqs(priv);
 	/* Prevent crashes when queues are still in use. */
 	dev->rx_pkt_burst = removed_rx_burst;
 	dev->tx_pkt_burst = removed_tx_burst;
@@ -116,8 +121,6 @@ mlx5_dev_close(struct rte_eth_dev *dev)
 		priv->txqs_n = 0;
 		priv->txqs = NULL;
 	}
-	if (priv->rss)
-		rxq_cleanup(&priv->rxq_parent);
 	if (priv->pd != NULL) {
 		assert(priv->ctx != NULL);
 		claim_zero(ibv_dealloc_pd(priv->pd));
@@ -297,9 +300,6 @@ mlx5_pci_devinit(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 
 #ifdef HAVE_EXP_QUERY_DEVICE
 		exp_device_attr.comp_mask = IBV_EXP_DEVICE_ATTR_EXP_CAP_FLAGS;
-#ifdef RSS_SUPPORT
-		exp_device_attr.comp_mask |= IBV_EXP_DEVICE_ATTR_RSS_TBL_SZ;
-#endif /* RSS_SUPPORT */
 #endif /* HAVE_EXP_QUERY_DEVICE */
 
 		DEBUG("using port %u (%08" PRIx32 ")", port, test);
@@ -349,32 +349,6 @@ mlx5_pci_devinit(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 			ERROR("ibv_exp_query_device() failed");
 			goto port_error;
 		}
-#ifdef RSS_SUPPORT
-		if ((exp_device_attr.exp_device_cap_flags &
-		     IBV_EXP_DEVICE_QPG) &&
-		    (exp_device_attr.exp_device_cap_flags &
-		     IBV_EXP_DEVICE_UD_RSS) &&
-		    (exp_device_attr.comp_mask &
-		     IBV_EXP_DEVICE_ATTR_RSS_TBL_SZ) &&
-		    (exp_device_attr.max_rss_tbl_sz > 0)) {
-			priv->hw_qpg = 1;
-			priv->hw_rss = 1;
-			priv->max_rss_tbl_sz = exp_device_attr.max_rss_tbl_sz;
-		} else {
-			priv->hw_qpg = 0;
-			priv->hw_rss = 0;
-			priv->max_rss_tbl_sz = 0;
-		}
-		priv->hw_tss = !!(exp_device_attr.exp_device_cap_flags &
-				  IBV_EXP_DEVICE_UD_TSS);
-		DEBUG("device flags: %s%s%s",
-		      (priv->hw_qpg ? "IBV_DEVICE_QPG " : ""),
-		      (priv->hw_tss ? "IBV_DEVICE_TSS " : ""),
-		      (priv->hw_rss ? "IBV_DEVICE_RSS " : ""));
-		if (priv->hw_rss)
-			DEBUG("maximum RSS indirection table size: %u",
-			      exp_device_attr.max_rss_tbl_sz);
-#endif /* RSS_SUPPORT */
 
 		priv->hw_csum =
 			((exp_device_attr.exp_device_cap_flags &
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index c8a517c..4407b18 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -98,20 +98,19 @@ struct priv {
 	unsigned int started:1; /* Device started, flows enabled. */
 	unsigned int promisc_req:1; /* Promiscuous mode requested. */
 	unsigned int allmulti_req:1; /* All multicast mode requested. */
-	unsigned int hw_qpg:1; /* QP groups are supported. */
-	unsigned int hw_tss:1; /* TSS is supported. */
-	unsigned int hw_rss:1; /* RSS is supported. */
 	unsigned int hw_csum:1; /* Checksum offload is supported. */
 	unsigned int hw_csum_l2tun:1; /* Same for L2 tunnels. */
-	unsigned int rss:1; /* RSS is enabled. */
 	unsigned int vf:1; /* This is a VF device. */
-	unsigned int max_rss_tbl_sz; /* Maximum number of RSS queues. */
 	/* RX/TX queues. */
-	struct rxq rxq_parent; /* Parent queue when RSS is enabled. */
 	unsigned int rxqs_n; /* RX queues array size. */
 	unsigned int txqs_n; /* TX queues array size. */
 	struct rxq *(*rxqs)[]; /* RX queues. */
 	struct txq *(*txqs)[]; /* TX queues. */
+	/* Indirection table referencing all RX WQs. */
+	struct ibv_exp_rwq_ind_table *ind_table;
+	/* Hash RX QPs feeding the indirection table. */
+	struct hash_rxq (*hash_rxqs)[];
+	unsigned int hash_rxqs_n; /* Hash RX QPs array size. */
 	rte_spinlock_t lock; /* Lock for control functions. */
 };
 
@@ -158,23 +157,25 @@ int mlx5_ibv_device_to_pci_addr(const struct ibv_device *,
 /* mlx5_mac.c */
 
 int priv_get_mac(struct priv *, uint8_t (*)[ETHER_ADDR_LEN]);
-void rxq_mac_addrs_del(struct rxq *);
+void hash_rxq_mac_addrs_del(struct hash_rxq *);
+void priv_mac_addrs_disable(struct priv *);
 void mlx5_mac_addr_remove(struct rte_eth_dev *, uint32_t);
-int rxq_mac_addrs_add(struct rxq *);
+int hash_rxq_mac_addrs_add(struct hash_rxq *);
 int priv_mac_addr_add(struct priv *, unsigned int,
 		      const uint8_t (*)[ETHER_ADDR_LEN]);
+int priv_mac_addrs_enable(struct priv *);
 void mlx5_mac_addr_add(struct rte_eth_dev *, struct ether_addr *, uint32_t,
 		       uint32_t);
 
 /* mlx5_rxmode.c */
 
-int rxq_promiscuous_enable(struct rxq *);
+int priv_promiscuous_enable(struct priv *);
 void mlx5_promiscuous_enable(struct rte_eth_dev *);
-void rxq_promiscuous_disable(struct rxq *);
+void priv_promiscuous_disable(struct priv *);
 void mlx5_promiscuous_disable(struct rte_eth_dev *);
-int rxq_allmulticast_enable(struct rxq *);
+int priv_allmulticast_enable(struct priv *);
 void mlx5_allmulticast_enable(struct rte_eth_dev *);
-void rxq_allmulticast_disable(struct rxq *);
+void priv_allmulticast_disable(struct priv *);
 void mlx5_allmulticast_disable(struct rte_eth_dev *);
 
 /* mlx5_stats.c */
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 5df5fa1..fac685e 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -394,7 +394,6 @@ priv_set_flags(struct priv *priv, unsigned int keep, unsigned int flags)
  * Ethernet device configuration.
  *
  * Prepare the driver for a given number of TX and RX queues.
- * Allocate parent RSS queue when several RX queues are requested.
  *
  * @param dev
  *   Pointer to Ethernet device structure.
@@ -408,8 +407,6 @@ dev_configure(struct rte_eth_dev *dev)
 	struct priv *priv = dev->data->dev_private;
 	unsigned int rxqs_n = dev->data->nb_rx_queues;
 	unsigned int txqs_n = dev->data->nb_tx_queues;
-	unsigned int tmp;
-	int ret;
 
 	priv->rxqs = (void *)dev->data->rx_queues;
 	priv->txqs = (void *)dev->data->tx_queues;
@@ -422,47 +419,8 @@ dev_configure(struct rte_eth_dev *dev)
 		return 0;
 	INFO("%p: RX queues number update: %u -> %u",
 	     (void *)dev, priv->rxqs_n, rxqs_n);
-	/* If RSS is enabled, disable it first. */
-	if (priv->rss) {
-		unsigned int i;
-
-		/* Only if there are no remaining child RX queues. */
-		for (i = 0; (i != priv->rxqs_n); ++i)
-			if ((*priv->rxqs)[i] != NULL)
-				return EINVAL;
-		rxq_cleanup(&priv->rxq_parent);
-		priv->rss = 0;
-		priv->rxqs_n = 0;
-	}
-	if (rxqs_n <= 1) {
-		/* Nothing else to do. */
-		priv->rxqs_n = rxqs_n;
-		return 0;
-	}
-	/* Allocate a new RSS parent queue if supported by hardware. */
-	if (!priv->hw_rss) {
-		ERROR("%p: only a single RX queue can be configured when"
-		      " hardware doesn't support RSS",
-		      (void *)dev);
-		return EINVAL;
-	}
-	/* Fail if hardware doesn't support that many RSS queues. */
-	if (rxqs_n >= priv->max_rss_tbl_sz) {
-		ERROR("%p: only %u RX queues can be configured for RSS",
-		      (void *)dev, priv->max_rss_tbl_sz);
-		return EINVAL;
-	}
-	priv->rss = 1;
-	tmp = priv->rxqs_n;
 	priv->rxqs_n = rxqs_n;
-	ret = rxq_setup(dev, &priv->rxq_parent, 0, 0, NULL, NULL);
-	if (!ret)
-		return 0;
-	/* Failure, rollback. */
-	priv->rss = 0;
-	priv->rxqs_n = tmp;
-	assert(ret > 0);
-	return ret;
+	return 0;
 }
 
 /**
@@ -671,16 +629,6 @@ mlx5_dev_set_mtu(struct rte_eth_dev *dev, uint16_t mtu)
 				rx_func = mlx5_rx_burst_sp;
 			break;
 		}
-		/* Reenable non-RSS queue attributes. No need to check
-		 * for errors at this stage. */
-		if (!priv->rss) {
-			if (priv->started)
-				rxq_mac_addrs_add(rxq);
-			if (priv->started && priv->promisc_req)
-				rxq_promiscuous_enable(rxq);
-			if (priv->started && priv->allmulti_req)
-				rxq_allmulticast_enable(rxq);
-		}
 		/* Scattered burst function takes priority. */
 		if (rxq->sp)
 			rx_func = mlx5_rx_burst_sp;
diff --git a/drivers/net/mlx5/mlx5_mac.c b/drivers/net/mlx5/mlx5_mac.c
index 95afccf..b580494 100644
--- a/drivers/net/mlx5/mlx5_mac.c
+++ b/drivers/net/mlx5/mlx5_mac.c
@@ -93,75 +93,75 @@ priv_get_mac(struct priv *priv, uint8_t (*mac)[ETHER_ADDR_LEN])
 /**
  * Delete MAC flow steering rule.
  *
- * @param rxq
- *   Pointer to RX queue structure.
+ * @param hash_rxq
+ *   Pointer to hash RX queue structure.
  * @param mac_index
  *   MAC address index.
  * @param vlan_index
  *   VLAN index to use.
  */
 static void
-rxq_del_mac_flow(struct rxq *rxq, unsigned int mac_index,
-		 unsigned int vlan_index)
+hash_rxq_del_mac_flow(struct hash_rxq *hash_rxq, unsigned int mac_index,
+		      unsigned int vlan_index)
 {
 #ifndef NDEBUG
 	const uint8_t (*mac)[ETHER_ADDR_LEN] =
 		(const uint8_t (*)[ETHER_ADDR_LEN])
-		rxq->priv->mac[mac_index].addr_bytes;
+		hash_rxq->priv->mac[mac_index].addr_bytes;
 #endif
 
-	assert(mac_index < RTE_DIM(rxq->mac_flow));
-	assert(vlan_index < RTE_DIM(rxq->mac_flow[mac_index]));
-	if (rxq->mac_flow[mac_index][vlan_index] == NULL)
+	assert(mac_index < RTE_DIM(hash_rxq->mac_flow));
+	assert(vlan_index < RTE_DIM(hash_rxq->mac_flow[mac_index]));
+	if (hash_rxq->mac_flow[mac_index][vlan_index] == NULL)
 		return;
 	DEBUG("%p: removing MAC address %02x:%02x:%02x:%02x:%02x:%02x index %u"
 	      " VLAN index %u",
-	      (void *)rxq,
+	      (void *)hash_rxq,
 	      (*mac)[0], (*mac)[1], (*mac)[2], (*mac)[3], (*mac)[4], (*mac)[5],
 	      mac_index,
 	      vlan_index);
-	claim_zero(ibv_destroy_flow(rxq->mac_flow[mac_index][vlan_index]));
-	rxq->mac_flow[mac_index][vlan_index] = NULL;
+	claim_zero(ibv_destroy_flow(hash_rxq->mac_flow
+				    [mac_index][vlan_index]));
+	hash_rxq->mac_flow[mac_index][vlan_index] = NULL;
 }
 
 /**
- * Unregister a MAC address from a RX queue.
+ * Unregister a MAC address from a hash RX queue.
  *
- * @param rxq
- *   Pointer to RX queue structure.
+ * @param hash_rxq
+ *   Pointer to hash RX queue structure.
  * @param mac_index
  *   MAC address index.
  */
 static void
-rxq_mac_addr_del(struct rxq *rxq, unsigned int mac_index)
+hash_rxq_mac_addr_del(struct hash_rxq *hash_rxq, unsigned int mac_index)
 {
 	unsigned int i;
 
-	assert(mac_index < RTE_DIM(rxq->mac_flow));
-	for (i = 0; (i != RTE_DIM(rxq->mac_flow[mac_index])); ++i)
-		rxq_del_mac_flow(rxq, mac_index, i);
+	assert(mac_index < RTE_DIM(hash_rxq->mac_flow));
+	for (i = 0; (i != RTE_DIM(hash_rxq->mac_flow[mac_index])); ++i)
+		hash_rxq_del_mac_flow(hash_rxq, mac_index, i);
 }
 
 /**
- * Unregister all MAC addresses from a RX queue.
+ * Unregister all MAC addresses from a hash RX queue.
  *
- * @param rxq
- *   Pointer to RX queue structure.
+ * @param hash_rxq
+ *   Pointer to hash RX queue structure.
  */
 void
-rxq_mac_addrs_del(struct rxq *rxq)
+hash_rxq_mac_addrs_del(struct hash_rxq *hash_rxq)
 {
 	unsigned int i;
 
-	for (i = 0; (i != RTE_DIM(rxq->mac_flow)); ++i)
-		rxq_mac_addr_del(rxq, i);
+	for (i = 0; (i != RTE_DIM(hash_rxq->mac_flow)); ++i)
+		hash_rxq_mac_addr_del(hash_rxq, i);
 }
 
 /**
  * Unregister a MAC address.
  *
- * In RSS mode, the MAC address is unregistered from the parent queue,
- * otherwise it is unregistered from each queue directly.
+ * This is done for each hash RX queue.
  *
  * @param priv
  *   Pointer to private structure.
@@ -176,17 +176,27 @@ priv_mac_addr_del(struct priv *priv, unsigned int mac_index)
 	assert(mac_index < RTE_DIM(priv->mac));
 	if (!BITFIELD_ISSET(priv->mac_configured, mac_index))
 		return;
-	if (priv->rss) {
-		rxq_mac_addr_del(&priv->rxq_parent, mac_index);
-		goto end;
-	}
-	for (i = 0; (i != priv->dev->data->nb_rx_queues); ++i)
-		rxq_mac_addr_del((*priv->rxqs)[i], mac_index);
-end:
+	for (i = 0; (i != priv->hash_rxqs_n); ++i)
+		hash_rxq_mac_addr_del(&(*priv->hash_rxqs)[i], mac_index);
 	BITFIELD_RESET(priv->mac_configured, mac_index);
 }
 
 /**
+ * Unregister all MAC addresses from all hash RX queues.
+ *
+ * @param priv
+ *   Pointer to private structure.
+ */
+void
+priv_mac_addrs_disable(struct priv *priv)
+{
+	unsigned int i;
+
+	for (i = 0; (i != priv->hash_rxqs_n); ++i)
+		hash_rxq_mac_addrs_del(&(*priv->hash_rxqs)[i]);
+}
+
+/**
  * DPDK callback to remove a MAC address.
  *
  * @param dev
@@ -213,8 +223,8 @@ end:
 /**
  * Add MAC flow steering rule.
  *
- * @param rxq
- *   Pointer to RX queue structure.
+ * @param hash_rxq
+ *   Pointer to hash RX queue structure.
  * @param mac_index
  *   MAC address index to register.
  * @param vlan_index
@@ -224,11 +234,11 @@ end:
  *   0 on success, errno value on failure.
  */
 static int
-rxq_add_mac_flow(struct rxq *rxq, unsigned int mac_index,
-		 unsigned int vlan_index)
+hash_rxq_add_mac_flow(struct hash_rxq *hash_rxq, unsigned int mac_index,
+		      unsigned int vlan_index)
 {
 	struct ibv_flow *flow;
-	struct priv *priv = rxq->priv;
+	struct priv *priv = hash_rxq->priv;
 	const uint8_t (*mac)[ETHER_ADDR_LEN] =
 			(const uint8_t (*)[ETHER_ADDR_LEN])
 			priv->mac[mac_index].addr_bytes;
@@ -241,9 +251,9 @@ rxq_add_mac_flow(struct rxq *rxq, unsigned int mac_index,
 	unsigned int vlan_enabled = !!priv->vlan_filter_n;
 	unsigned int vlan_id = priv->vlan_filter[vlan_index];
 
-	assert(mac_index < RTE_DIM(rxq->mac_flow));
-	assert(vlan_index < RTE_DIM(rxq->mac_flow[mac_index]));
-	if (rxq->mac_flow[mac_index][vlan_index] != NULL)
+	assert(mac_index < RTE_DIM(hash_rxq->mac_flow));
+	assert(vlan_index < RTE_DIM(hash_rxq->mac_flow[mac_index]));
+	if (hash_rxq->mac_flow[mac_index][vlan_index] != NULL)
 		return 0;
 	/*
 	 * No padding must be inserted by the compiler between attr and spec.
@@ -273,7 +283,7 @@ rxq_add_mac_flow(struct rxq *rxq, unsigned int mac_index,
 	};
 	DEBUG("%p: adding MAC address %02x:%02x:%02x:%02x:%02x:%02x index %u"
 	      " VLAN index %u filtering %s, ID %u",
-	      (void *)rxq,
+	      (void *)hash_rxq,
 	      (*mac)[0], (*mac)[1], (*mac)[2], (*mac)[3], (*mac)[4], (*mac)[5],
 	      mac_index,
 	      vlan_index,
@@ -281,25 +291,25 @@ rxq_add_mac_flow(struct rxq *rxq, unsigned int mac_index,
 	      vlan_id);
 	/* Create related flow. */
 	errno = 0;
-	flow = ibv_create_flow(rxq->qp, attr);
+	flow = ibv_create_flow(hash_rxq->qp, attr);
 	if (flow == NULL) {
 		/* It's not clear whether errno is always set in this case. */
 		ERROR("%p: flow configuration failed, errno=%d: %s",
-		      (void *)rxq, errno,
+		      (void *)hash_rxq, errno,
 		      (errno ? strerror(errno) : "Unknown error"));
 		if (errno)
 			return errno;
 		return EINVAL;
 	}
-	rxq->mac_flow[mac_index][vlan_index] = flow;
+	hash_rxq->mac_flow[mac_index][vlan_index] = flow;
 	return 0;
 }
 
 /**
- * Register a MAC address in a RX queue.
+ * Register a MAC address in a hash RX queue.
  *
- * @param rxq
- *   Pointer to RX queue structure.
+ * @param hash_rxq
+ *   Pointer to hash RX queue structure.
  * @param mac_index
  *   MAC address index to register.
  *
@@ -307,22 +317,23 @@ rxq_add_mac_flow(struct rxq *rxq, unsigned int mac_index,
  *   0 on success, errno value on failure.
  */
 static int
-rxq_mac_addr_add(struct rxq *rxq, unsigned int mac_index)
+hash_rxq_mac_addr_add(struct hash_rxq *hash_rxq, unsigned int mac_index)
 {
-	struct priv *priv = rxq->priv;
+	struct priv *priv = hash_rxq->priv;
 	unsigned int i = 0;
 	int ret;
 
-	assert(mac_index < RTE_DIM(rxq->mac_flow));
-	assert(RTE_DIM(rxq->mac_flow[mac_index]) ==
+	assert(mac_index < RTE_DIM(hash_rxq->mac_flow));
+	assert(RTE_DIM(hash_rxq->mac_flow[mac_index]) ==
 	       RTE_DIM(priv->vlan_filter));
 	/* Add a MAC address for each VLAN filter, or at least once. */
 	do {
-		ret = rxq_add_mac_flow(rxq, mac_index, i);
+		ret = hash_rxq_add_mac_flow(hash_rxq, mac_index, i);
 		if (ret) {
 			/* Failure, rollback. */
 			while (i != 0)
-				rxq_del_mac_flow(rxq, mac_index, --i);
+				hash_rxq_del_mac_flow(hash_rxq, mac_index,
+						      --i);
 			return ret;
 		}
 	} while (++i < priv->vlan_filter_n);
@@ -330,31 +341,31 @@ rxq_mac_addr_add(struct rxq *rxq, unsigned int mac_index)
 }
 
 /**
- * Register all MAC addresses in a RX queue.
+ * Register all MAC addresses in a hash RX queue.
  *
- * @param rxq
- *   Pointer to RX queue structure.
+ * @param hash_rxq
+ *   Pointer to hash RX queue structure.
  *
  * @return
  *   0 on success, errno value on failure.
  */
 int
-rxq_mac_addrs_add(struct rxq *rxq)
+hash_rxq_mac_addrs_add(struct hash_rxq *hash_rxq)
 {
-	struct priv *priv = rxq->priv;
+	struct priv *priv = hash_rxq->priv;
 	unsigned int i;
 	int ret;
 
-	assert(RTE_DIM(priv->mac) == RTE_DIM(rxq->mac_flow));
+	assert(RTE_DIM(priv->mac) == RTE_DIM(hash_rxq->mac_flow));
 	for (i = 0; (i != RTE_DIM(priv->mac)); ++i) {
 		if (!BITFIELD_ISSET(priv->mac_configured, i))
 			continue;
-		ret = rxq_mac_addr_add(rxq, i);
+		ret = hash_rxq_mac_addr_add(hash_rxq, i);
 		if (!ret)
 			continue;
 		/* Failure, rollback. */
 		while (i != 0)
-			rxq_mac_addr_del(rxq, --i);
+			hash_rxq_mac_addr_del(hash_rxq, --i);
 		assert(ret > 0);
 		return ret;
 	}
@@ -364,8 +375,7 @@ rxq_mac_addrs_add(struct rxq *rxq)
 /**
  * Register a MAC address.
  *
- * In RSS mode, the MAC address is registered in the parent queue,
- * otherwise it is registered in each queue directly.
+ * This is done for each hash RX queue.
  *
  * @param priv
  *   Pointer to private structure.
@@ -405,33 +415,49 @@ priv_mac_addr_add(struct priv *priv, unsigned int mac_index,
 			(*mac)[3], (*mac)[4], (*mac)[5]
 		}
 	};
-	/* If device isn't started, this is all we need to do. */
-	if (!priv->started)
-		goto end;
-	if (priv->rss) {
-		ret = rxq_mac_addr_add(&priv->rxq_parent, mac_index);
-		if (ret)
-			return ret;
-		goto end;
-	}
-	for (i = 0; (i != priv->rxqs_n); ++i) {
-		if ((*priv->rxqs)[i] == NULL)
-			continue;
-		ret = rxq_mac_addr_add((*priv->rxqs)[i], mac_index);
+	for (i = 0; (i != priv->hash_rxqs_n); ++i) {
+		ret = hash_rxq_mac_addr_add(&(*priv->hash_rxqs)[i], mac_index);
 		if (!ret)
 			continue;
 		/* Failure, rollback. */
 		while (i != 0)
-			if ((*priv->rxqs)[--i] != NULL)
-				rxq_mac_addr_del((*priv->rxqs)[i], mac_index);
+			hash_rxq_mac_addr_del(&(*priv->hash_rxqs)[--i],
+					      mac_index);
 		return ret;
 	}
-end:
 	BITFIELD_SET(priv->mac_configured, mac_index);
 	return 0;
 }
 
 /**
+ * Register all MAC addresses in all hash RX queues.
+ *
+ * @param priv
+ *   Pointer to private structure.
+ *
+ * @return
+ *   0 on success, errno value on failure.
+ */
+int
+priv_mac_addrs_enable(struct priv *priv)
+{
+	unsigned int i;
+	int ret;
+
+	for (i = 0; (i != priv->hash_rxqs_n); ++i) {
+		ret = hash_rxq_mac_addrs_add(&(*priv->hash_rxqs)[i]);
+		if (!ret)
+			continue;
+		/* Failure, rollback. */
+		while (i != 0)
+			hash_rxq_mac_addrs_del(&(*priv->hash_rxqs)[--i]);
+		assert(ret > 0);
+		return ret;
+	}
+	return 0;
+}
+
+/**
  * DPDK callback to add a MAC address.
  *
  * @param dev
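The MAC registration helpers in mlx5_mac.c above (priv_mac_addr_add(), priv_mac_addrs_enable()) apply an operation to every hash RX queue. On the first failure, they undo it on the queues already processed, in reverse order, and return the errno value. Below is a minimal standalone sketch of that rollback pattern. `struct sketch_hash_rxq`, `sketch_attach()`, `sketch_detach()` and `sketch_attach_all()` are hypothetical stand-ins rather than driver code, and no Verbs calls are made.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for struct hash_rxq. */
struct sketch_hash_rxq {
	int attached; /* Nonzero once the (fake) flow is attached. */
	int fail;     /* When nonzero, attaching to this queue fails. */
};

/* Hypothetical per-queue operation, playing the role of
 * hash_rxq_mac_addrs_add(). Returns 0 or an errno value. */
static int sketch_attach(struct sketch_hash_rxq *q)
{
	if (q->fail)
		return EINVAL;
	q->attached = 1;
	return 0;
}

/* Hypothetical undo operation, playing the role of
 * hash_rxq_mac_addrs_del(). */
static void sketch_detach(struct sketch_hash_rxq *q)
{
	q->attached = 0;
}

/* Apply sketch_attach() to all n queues. On failure, detach the queues
 * already processed in reverse order and return the errno value. */
static int sketch_attach_all(struct sketch_hash_rxq *qs, unsigned int n)
{
	unsigned int i;

	for (i = 0; (i != n); ++i) {
		int ret = sketch_attach(&qs[i]);

		if (!ret)
			continue;
		/* Failure, rollback. */
		while (i != 0)
			sketch_detach(&qs[--i]);
		return ret;
	}
	return 0;
}
```

Because the rollback loop pre-decrements `i`, the queue that failed is never "undone". Only the queues that actually succeeded are undone, which matches the `while (i != 0) ...(--i)` loops in the patch.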
diff --git a/drivers/net/mlx5/mlx5_rxmode.c b/drivers/net/mlx5/mlx5_rxmode.c
index 7efa21b..2a74c64 100644
--- a/drivers/net/mlx5/mlx5_rxmode.c
+++ b/drivers/net/mlx5/mlx5_rxmode.c
@@ -58,44 +58,78 @@
 #include "mlx5_rxtx.h"
 #include "mlx5_utils.h"
 
+static void hash_rxq_promiscuous_disable(struct hash_rxq *);
+static void hash_rxq_allmulticast_disable(struct hash_rxq *);
+
 /**
- * Enable promiscuous mode in a RX queue.
+ * Enable promiscuous mode in a hash RX queue.
  *
- * @param rxq
- *   Pointer to RX queue structure.
+ * @param hash_rxq
+ *   Pointer to hash RX queue structure.
  *
  * @return
  *   0 on success, errno value on failure.
  */
-int
-rxq_promiscuous_enable(struct rxq *rxq)
+static int
+hash_rxq_promiscuous_enable(struct hash_rxq *hash_rxq)
 {
 	struct ibv_flow *flow;
 	struct ibv_flow_attr attr = {
 		.type = IBV_FLOW_ATTR_ALL_DEFAULT,
 		.num_of_specs = 0,
-		.port = rxq->priv->port,
+		.port = hash_rxq->priv->port,
 		.flags = 0
 	};
 
-	if (rxq->priv->vf)
+	if (hash_rxq->priv->vf)
 		return 0;
-	if (rxq->promisc_flow != NULL)
+	if (hash_rxq->promisc_flow != NULL)
 		return 0;
-	DEBUG("%p: enabling promiscuous mode", (void *)rxq);
+	DEBUG("%p: enabling promiscuous mode", (void *)hash_rxq);
 	errno = 0;
-	flow = ibv_create_flow(rxq->qp, &attr);
+	flow = ibv_create_flow(hash_rxq->qp, &attr);
 	if (flow == NULL) {
 		/* It's not clear whether errno is always set in this case. */
 		ERROR("%p: flow configuration failed, errno=%d: %s",
-		      (void *)rxq, errno,
+		      (void *)hash_rxq, errno,
 		      (errno ? strerror(errno) : "Unknown error"));
 		if (errno)
 			return errno;
 		return EINVAL;
 	}
-	rxq->promisc_flow = flow;
-	DEBUG("%p: promiscuous mode enabled", (void *)rxq);
+	hash_rxq->promisc_flow = flow;
+	DEBUG("%p: promiscuous mode enabled", (void *)hash_rxq);
+	return 0;
+}
+
+/**
+ * Enable promiscuous mode in all hash RX queues.
+ *
+ * @param priv
+ *   Private structure.
+ *
+ * @return
+ *   0 on success, errno value on failure.
+ */
+int
+priv_promiscuous_enable(struct priv *priv)
+{
+	unsigned int i;
+
+	for (i = 0; (i != priv->hash_rxqs_n); ++i) {
+		struct hash_rxq *hash_rxq = &(*priv->hash_rxqs)[i];
+		int ret;
+
+		ret = hash_rxq_promiscuous_enable(hash_rxq);
+		if (!ret)
+			continue;
+		/* Failure, rollback. */
+		while (i != 0) {
+			hash_rxq = &(*priv->hash_rxqs)[--i];
+			hash_rxq_promiscuous_disable(hash_rxq);
+		}
+		return ret;
+	}
 	return 0;
 }
 
@@ -109,56 +143,48 @@ void
 mlx5_promiscuous_enable(struct rte_eth_dev *dev)
 {
 	struct priv *priv = dev->data->dev_private;
-	unsigned int i;
 	int ret;
 
 	priv_lock(priv);
 	priv->promisc_req = 1;
-	/* If device isn't started, this is all we need to do. */
-	if (!priv->started)
-		goto end;
-	if (priv->rss) {
-		ret = rxq_promiscuous_enable(&priv->rxq_parent);
-		if (ret) {
-			priv_unlock(priv);
-			return;
-		}
-		goto end;
-	}
-	for (i = 0; (i != priv->rxqs_n); ++i) {
-		if ((*priv->rxqs)[i] == NULL)
-			continue;
-		ret = rxq_promiscuous_enable((*priv->rxqs)[i]);
-		if (!ret)
-			continue;
-		/* Failure, rollback. */
-		while (i != 0)
-			if ((*priv->rxqs)[--i] != NULL)
-				rxq_promiscuous_disable((*priv->rxqs)[i]);
-		priv_unlock(priv);
-		return;
-	}
-end:
+	ret = priv_promiscuous_enable(priv);
+	if (ret)
+		ERROR("cannot enable promiscuous mode: %s", strerror(ret));
 	priv_unlock(priv);
 }
 
 /**
- * Disable promiscuous mode in a RX queue.
+ * Disable promiscuous mode in a hash RX queue.
  *
- * @param rxq
- *   Pointer to RX queue structure.
+ * @param hash_rxq
+ *   Pointer to hash RX queue structure.
  */
-void
-rxq_promiscuous_disable(struct rxq *rxq)
+static void
+hash_rxq_promiscuous_disable(struct hash_rxq *hash_rxq)
 {
-	if (rxq->priv->vf)
+	if (hash_rxq->priv->vf)
 		return;
-	if (rxq->promisc_flow == NULL)
+	if (hash_rxq->promisc_flow == NULL)
 		return;
-	DEBUG("%p: disabling promiscuous mode", (void *)rxq);
-	claim_zero(ibv_destroy_flow(rxq->promisc_flow));
-	rxq->promisc_flow = NULL;
-	DEBUG("%p: promiscuous mode disabled", (void *)rxq);
+	DEBUG("%p: disabling promiscuous mode", (void *)hash_rxq);
+	claim_zero(ibv_destroy_flow(hash_rxq->promisc_flow));
+	hash_rxq->promisc_flow = NULL;
+	DEBUG("%p: promiscuous mode disabled", (void *)hash_rxq);
+}
+
+/**
+ * Disable promiscuous mode in all hash RX queues.
+ *
+ * @param priv
+ *   Private structure.
+ */
+void
+priv_promiscuous_disable(struct priv *priv)
+{
+	unsigned int i;
+
+	for (i = 0; (i != priv->hash_rxqs_n); ++i)
+		hash_rxq_promiscuous_disable(&(*priv->hash_rxqs)[i]);
 }
 
 /**
@@ -171,57 +197,81 @@ void
 mlx5_promiscuous_disable(struct rte_eth_dev *dev)
 {
 	struct priv *priv = dev->data->dev_private;
-	unsigned int i;
 
 	priv_lock(priv);
 	priv->promisc_req = 0;
-	if (priv->rss) {
-		rxq_promiscuous_disable(&priv->rxq_parent);
-		goto end;
-	}
-	for (i = 0; (i != priv->rxqs_n); ++i)
-		if ((*priv->rxqs)[i] != NULL)
-			rxq_promiscuous_disable((*priv->rxqs)[i]);
-end:
+	priv_promiscuous_disable(priv);
 	priv_unlock(priv);
 }
 
 /**
- * Enable allmulti mode in a RX queue.
+ * Enable allmulti mode in a hash RX queue.
  *
- * @param rxq
- *   Pointer to RX queue structure.
+ * @param hash_rxq
+ *   Pointer to hash RX queue structure.
  *
  * @return
  *   0 on success, errno value on failure.
  */
-int
-rxq_allmulticast_enable(struct rxq *rxq)
+static int
+hash_rxq_allmulticast_enable(struct hash_rxq *hash_rxq)
 {
 	struct ibv_flow *flow;
 	struct ibv_flow_attr attr = {
 		.type = IBV_FLOW_ATTR_MC_DEFAULT,
 		.num_of_specs = 0,
-		.port = rxq->priv->port,
+		.port = hash_rxq->priv->port,
 		.flags = 0
 	};
 
-	if (rxq->allmulti_flow != NULL)
+	if (hash_rxq->allmulti_flow != NULL)
 		return 0;
-	DEBUG("%p: enabling allmulticast mode", (void *)rxq);
+	DEBUG("%p: enabling allmulticast mode", (void *)hash_rxq);
 	errno = 0;
-	flow = ibv_create_flow(rxq->qp, &attr);
+	flow = ibv_create_flow(hash_rxq->qp, &attr);
 	if (flow == NULL) {
 		/* It's not clear whether errno is always set in this case. */
 		ERROR("%p: flow configuration failed, errno=%d: %s",
-		      (void *)rxq, errno,
+		      (void *)hash_rxq, errno,
 		      (errno ? strerror(errno) : "Unknown error"));
 		if (errno)
 			return errno;
 		return EINVAL;
 	}
-	rxq->allmulti_flow = flow;
-	DEBUG("%p: allmulticast mode enabled", (void *)rxq);
+	hash_rxq->allmulti_flow = flow;
+	DEBUG("%p: allmulticast mode enabled", (void *)hash_rxq);
+	return 0;
+}
+
+/**
+ * Enable allmulti mode in all hash RX queues.
+ * No hash RX queue type is currently exempted.
+ *
+ * @param priv
+ *   Private structure.
+ *
+ * @return
+ *   0 on success, errno value on failure.
+ */
+int
+priv_allmulticast_enable(struct priv *priv)
+{
+	unsigned int i;
+
+	for (i = 0; (i != priv->hash_rxqs_n); ++i) {
+		struct hash_rxq *hash_rxq = &(*priv->hash_rxqs)[i];
+		int ret;
+
+		ret = hash_rxq_allmulticast_enable(hash_rxq);
+		if (!ret)
+			continue;
+		/* Failure, rollback. */
+		while (i != 0) {
+			hash_rxq = &(*priv->hash_rxqs)[--i];
+			hash_rxq_allmulticast_disable(hash_rxq);
+		}
+		return ret;
+	}
 	return 0;
 }
 
@@ -235,54 +285,46 @@ void
 mlx5_allmulticast_enable(struct rte_eth_dev *dev)
 {
 	struct priv *priv = dev->data->dev_private;
-	unsigned int i;
 	int ret;
 
 	priv_lock(priv);
 	priv->allmulti_req = 1;
-	/* If device isn't started, this is all we need to do. */
-	if (!priv->started)
-		goto end;
-	if (priv->rss) {
-		ret = rxq_allmulticast_enable(&priv->rxq_parent);
-		if (ret) {
-			priv_unlock(priv);
-			return;
-		}
-		goto end;
-	}
-	for (i = 0; (i != priv->rxqs_n); ++i) {
-		if ((*priv->rxqs)[i] == NULL)
-			continue;
-		ret = rxq_allmulticast_enable((*priv->rxqs)[i]);
-		if (!ret)
-			continue;
-		/* Failure, rollback. */
-		while (i != 0)
-			if ((*priv->rxqs)[--i] != NULL)
-				rxq_allmulticast_disable((*priv->rxqs)[i]);
-		priv_unlock(priv);
-		return;
-	}
-end:
+	ret = priv_allmulticast_enable(priv);
+	if (ret)
+		ERROR("cannot enable allmulticast mode: %s", strerror(ret));
 	priv_unlock(priv);
 }
 
 /**
- * Disable allmulti mode in a RX queue.
+ * Disable allmulti mode in a hash RX queue.
  *
- * @param rxq
- *   Pointer to RX queue structure.
+ * @param hash_rxq
+ *   Pointer to hash RX queue structure.
  */
-void
-rxq_allmulticast_disable(struct rxq *rxq)
+static void
+hash_rxq_allmulticast_disable(struct hash_rxq *hash_rxq)
 {
-	if (rxq->allmulti_flow == NULL)
+	if (hash_rxq->allmulti_flow == NULL)
 		return;
-	DEBUG("%p: disabling allmulticast mode", (void *)rxq);
-	claim_zero(ibv_destroy_flow(rxq->allmulti_flow));
-	rxq->allmulti_flow = NULL;
-	DEBUG("%p: allmulticast mode disabled", (void *)rxq);
+	DEBUG("%p: disabling allmulticast mode", (void *)hash_rxq);
+	claim_zero(ibv_destroy_flow(hash_rxq->allmulti_flow));
+	hash_rxq->allmulti_flow = NULL;
+	DEBUG("%p: allmulticast mode disabled", (void *)hash_rxq);
+}
+
+/**
+ * Disable allmulti mode in all hash RX queues.
+ *
+ * @param priv
+ *   Private structure.
+ */
+void
+priv_allmulticast_disable(struct priv *priv)
+{
+	unsigned int i;
+
+	for (i = 0; (i != priv->hash_rxqs_n); ++i)
+		hash_rxq_allmulticast_disable(&(*priv->hash_rxqs)[i]);
 }
 
 /**
@@ -295,17 +337,9 @@ void
 mlx5_allmulticast_disable(struct rte_eth_dev *dev)
 {
 	struct priv *priv = dev->data->dev_private;
-	unsigned int i;
 
 	priv_lock(priv);
 	priv->allmulti_req = 0;
-	if (priv->rss) {
-		rxq_allmulticast_disable(&priv->rxq_parent);
-		goto end;
-	}
-	for (i = 0; (i != priv->rxqs_n); ++i)
-		if ((*priv->rxqs)[i] != NULL)
-			rxq_allmulticast_disable((*priv->rxqs)[i]);
-end:
+	priv_allmulticast_disable(priv);
 	priv_unlock(priv);
 }
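The per-queue promiscuous and allmulticast helpers in mlx5_rxmode.c above are idempotent. Enabling returns early when a flow handle is already stored in the queue, and disabling returns early when that handle is NULL. As a result, the DPDK callbacks can be invoked repeatedly without creating or destroying extra flows. The following is a minimal sketch of that handle-tracking logic. It assumes hypothetical counters (`sketch_flows_created`, `sketch_flows_destroyed`) in place of ibv_create_flow() and ibv_destroy_flow(), and a hypothetical `struct sketch_mode_rxq` in place of struct hash_rxq.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct hash_rxq; `flow` plays the role of
 * promisc_flow or allmulti_flow. */
struct sketch_mode_rxq {
	void *flow;
};

/* Hypothetical counters standing in for Verbs flow creation/destruction. */
static int sketch_flows_created;
static int sketch_flows_destroyed;
static int sketch_flow_token; /* Its address serves as a fake flow handle. */

/* Like hash_rxq_promiscuous_enable(): no-op if a flow already exists. */
static int sketch_mode_enable(struct sketch_mode_rxq *q)
{
	if (q->flow != NULL)
		return 0;
	q->flow = &sketch_flow_token; /* Pretend flow creation succeeded. */
	++sketch_flows_created;
	return 0;
}

/* Like hash_rxq_promiscuous_disable(): no-op if no flow is attached. */
static void sketch_mode_disable(struct sketch_mode_rxq *q)
{
	if (q->flow == NULL)
		return;
	++sketch_flows_destroyed; /* Pretend flow destruction succeeded. */
	q->flow = NULL;
}
```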
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index f2f773e..6d8f7d2 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -60,11 +60,220 @@
 #endif
 
 #include "mlx5.h"
-#include "mlx5_autoconf.h"
 #include "mlx5_rxtx.h"
 #include "mlx5_utils.h"
 #include "mlx5_defs.h"
 
+/* Default RSS hash key also used for ConnectX-3. */
+static uint8_t hash_rxq_default_key[] = {
+	0x2c, 0xc6, 0x81, 0xd1,
+	0x5b, 0xdb, 0xf4, 0xf7,
+	0xfc, 0xa2, 0x83, 0x19,
+	0xdb, 0x1a, 0x3e, 0x94,
+	0x6b, 0x9e, 0x38, 0xd9,
+	0x2c, 0x9c, 0x03, 0xd1,
+	0xad, 0x99, 0x44, 0xa7,
+	0xd9, 0x56, 0x3d, 0x59,
+	0x06, 0x3c, 0x25, 0xf3,
+	0xfc, 0x1f, 0xdc, 0x2a,
+};
+
+/**
+ * Return log2 of the nearest power of two at or above input value.
+ *
+ * @param v
+ *   Input value.
+ *
+ * @return
+ *   Log2 of the nearest power of two at or above input value.
+ */
+static unsigned int
+log2above(unsigned int v)
+{
+	unsigned int l;
+	unsigned int r;
+
+	for (l = 0, r = 0; (v >> 1); ++l, v >>= 1)
+		r |= (v & 1);
+	return (l + r);
+}
+
+/**
+ * Initialize hash RX queues and indirection table.
+ *
+ * @param priv
+ *   Pointer to private structure.
+ *
+ * @return
+ *   0 on success, errno value on failure.
+ */
+int
+priv_create_hash_rxqs(struct priv *priv)
+{
+	static const uint64_t rss_hash_table[] = {
+		/* TCPv4. */
+		(IBV_EXP_RX_HASH_SRC_IPV4 | IBV_EXP_RX_HASH_DST_IPV4 |
+		 IBV_EXP_RX_HASH_SRC_PORT_TCP | IBV_EXP_RX_HASH_DST_PORT_TCP),
+		/* UDPv4. */
+		(IBV_EXP_RX_HASH_SRC_IPV4 | IBV_EXP_RX_HASH_DST_IPV4 |
+		 IBV_EXP_RX_HASH_SRC_PORT_UDP | IBV_EXP_RX_HASH_DST_PORT_UDP),
+		/* Other IPv4. */
+		(IBV_EXP_RX_HASH_SRC_IPV4 | IBV_EXP_RX_HASH_DST_IPV4),
+		/* None, used for everything else. */
+		0,
+	};
+
+	DEBUG("allocating hash RX queues for %u WQs", priv->rxqs_n);
+	assert(priv->ind_table == NULL);
+	assert(priv->hash_rxqs == NULL);
+	assert(priv->hash_rxqs_n == 0);
+	assert(priv->pd != NULL);
+	assert(priv->ctx != NULL);
+	if (priv->rxqs_n == 0)
+		return EINVAL;
+	assert(priv->rxqs != NULL);
+
+	/* FIXME: large data structures are allocated on the stack. */
+	unsigned int wqs_n = (1 << log2above(priv->rxqs_n));
+	struct ibv_exp_wq *wqs[wqs_n];
+	struct ibv_exp_rwq_ind_table_init_attr ind_init_attr = {
+		.pd = priv->pd,
+		.log_ind_tbl_size = log2above(priv->rxqs_n),
+		.ind_tbl = wqs,
+		.comp_mask = 0,
+	};
+	struct ibv_exp_rwq_ind_table *ind_table = NULL;
+	/* If only one RX queue is configured, RSS is not needed and a single
+	 * empty hash entry is used (last rss_hash_table[] entry). */
+	unsigned int hash_rxqs_n =
+		((priv->rxqs_n == 1) ? 1 : RTE_DIM(rss_hash_table));
+	struct hash_rxq (*hash_rxqs)[hash_rxqs_n] = NULL;
+	unsigned int i;
+	unsigned int j;
+	int err = 0;
+
+	if (wqs_n < priv->rxqs_n) {
+		ERROR("cannot handle this many RX queues (%u)", priv->rxqs_n);
+		err = ERANGE;
+		goto error;
+	}
+	if (wqs_n != priv->rxqs_n)
+		WARN("%u RX queues are configured, consider rounding this"
+		     " number to the next power of two (%u) for optimal"
+		     " performance",
+		     priv->rxqs_n, wqs_n);
+	/* When the number of RX queues is not a power of two, the remaining
+	 * table entries are padded with reused WQs and hashes are not spread
+	 * uniformly. */
+	for (i = 0, j = 0; (i != wqs_n); ++i) {
+		wqs[i] = (*priv->rxqs)[j]->wq;
+		if (++j == priv->rxqs_n)
+			j = 0;
+	}
+	errno = 0;
+	ind_table = ibv_exp_create_rwq_ind_table(priv->ctx, &ind_init_attr);
+	if (ind_table == NULL) {
+		/* Not clear whether errno is set. */
+		err = (errno ? errno : EINVAL);
+		ERROR("RX indirection table creation failed with error %d: %s",
+		      err, strerror(err));
+		goto error;
+	}
+	/* Allocate array that holds hash RX queues and related data. */
+	hash_rxqs = rte_malloc(__func__, sizeof(*hash_rxqs), 0);
+	if (hash_rxqs == NULL) {
+		err = ENOMEM;
+		ERROR("cannot allocate hash RX queues container: %s",
+		      strerror(err));
+		goto error;
+	}
+	for (i = 0, j = (RTE_DIM(rss_hash_table) - hash_rxqs_n);
+	     (j != RTE_DIM(rss_hash_table));
+	     ++i, ++j) {
+		struct hash_rxq *hash_rxq = &(*hash_rxqs)[i];
+
+		struct ibv_exp_rx_hash_conf hash_conf = {
+			.rx_hash_function = IBV_EXP_RX_HASH_FUNC_TOEPLITZ,
+			.rx_hash_key_len = sizeof(hash_rxq_default_key),
+			.rx_hash_key = hash_rxq_default_key,
+			.rx_hash_fields_mask = rss_hash_table[j],
+			.rwq_ind_tbl = ind_table,
+		};
+		struct ibv_exp_qp_init_attr qp_init_attr = {
+			.max_inl_recv = 0, /* Currently not supported. */
+			.qp_type = IBV_QPT_RAW_PACKET,
+			.comp_mask = (IBV_EXP_QP_INIT_ATTR_PD |
+				      IBV_EXP_QP_INIT_ATTR_RX_HASH),
+			.pd = priv->pd,
+			.rx_hash_conf = &hash_conf,
+			.port_num = priv->port,
+		};
+
+		*hash_rxq = (struct hash_rxq){
+			.priv = priv,
+			.qp = ibv_exp_create_qp(priv->ctx, &qp_init_attr),
+		};
+		if (hash_rxq->qp == NULL) {
+			err = (errno ? errno : EINVAL);
+			ERROR("Hash RX QP creation failure: %s",
+			      strerror(err));
+			while (i) {
+				hash_rxq = &(*hash_rxqs)[--i];
+				claim_zero(ibv_destroy_qp(hash_rxq->qp));
+			}
+			goto error;
+		}
+	}
+	priv->ind_table = ind_table;
+	priv->hash_rxqs = hash_rxqs;
+	priv->hash_rxqs_n = hash_rxqs_n;
+	assert(err == 0);
+	return 0;
+error:
+	rte_free(hash_rxqs);
+	if (ind_table != NULL)
+		claim_zero(ibv_exp_destroy_rwq_ind_table(ind_table));
+	return err;
+}
+
+/**
+ * Clean up hash RX queues and indirection table.
+ *
+ * @param priv
+ *   Pointer to private structure.
+ */
+void
+priv_destroy_hash_rxqs(struct priv *priv)
+{
+	unsigned int i;
+
+	DEBUG("destroying %u hash RX queues", priv->hash_rxqs_n);
+	if (priv->hash_rxqs_n == 0) {
+		assert(priv->hash_rxqs == NULL);
+		assert(priv->ind_table == NULL);
+		return;
+	}
+	for (i = 0; (i != priv->hash_rxqs_n); ++i) {
+		struct hash_rxq *hash_rxq = &(*priv->hash_rxqs)[i];
+		unsigned int j, k;
+
+		assert(hash_rxq->priv == priv);
+		assert(hash_rxq->qp != NULL);
+		/* Also check that there are no remaining flows. */
+		assert(hash_rxq->allmulti_flow == NULL);
+		assert(hash_rxq->promisc_flow == NULL);
+		for (j = 0; (j != RTE_DIM(hash_rxq->mac_flow)); ++j)
+			for (k = 0; (k != RTE_DIM(hash_rxq->mac_flow[j])); ++k)
+				assert(hash_rxq->mac_flow[j][k] == NULL);
+		claim_zero(ibv_destroy_qp(hash_rxq->qp));
+	}
+	priv->hash_rxqs_n = 0;
+	rte_free(priv->hash_rxqs);
+	priv->hash_rxqs = NULL;
+	claim_zero(ibv_exp_destroy_rwq_ind_table(priv->ind_table));
+	priv->ind_table = NULL;
+}
+
 /**
  * Allocate RX queue elements with scattered packets support.
  *
@@ -336,15 +545,15 @@ rxq_cleanup(struct rxq *rxq)
 		rxq_free_elts_sp(rxq);
 	else
 		rxq_free_elts(rxq);
-	if (rxq->if_qp != NULL) {
+	if (rxq->if_wq != NULL) {
 		assert(rxq->priv != NULL);
 		assert(rxq->priv->ctx != NULL);
-		assert(rxq->qp != NULL);
+		assert(rxq->wq != NULL);
 		params = (struct ibv_exp_release_intf_params){
 			.comp_mask = 0,
 		};
 		claim_zero(ibv_exp_release_intf(rxq->priv->ctx,
-						rxq->if_qp,
+						rxq->if_wq,
 						&params));
 	}
 	if (rxq->if_cq != NULL) {
@@ -358,12 +567,8 @@ rxq_cleanup(struct rxq *rxq)
 						rxq->if_cq,
 						&params));
 	}
-	if (rxq->qp != NULL) {
-		rxq_promiscuous_disable(rxq);
-		rxq_allmulticast_disable(rxq);
-		rxq_mac_addrs_del(rxq);
-		claim_zero(ibv_destroy_qp(rxq->qp));
-	}
+	if (rxq->wq != NULL)
+		claim_zero(ibv_exp_destroy_wq(rxq->wq));
 	if (rxq->cq != NULL)
 		claim_zero(ibv_destroy_cq(rxq->cq));
 	if (rxq->rd != NULL) {
@@ -383,112 +588,6 @@ rxq_cleanup(struct rxq *rxq)
 }
 
 /**
- * Allocate a Queue Pair.
- * Optionally setup inline receive if supported.
- *
- * @param priv
- *   Pointer to private structure.
- * @param cq
- *   Completion queue to associate with QP.
- * @param desc
- *   Number of descriptors in QP (hint only).
- *
- * @return
- *   QP pointer or NULL in case of error.
- */
-static struct ibv_qp *
-rxq_setup_qp(struct priv *priv, struct ibv_cq *cq, uint16_t desc,
-	     struct ibv_exp_res_domain *rd)
-{
-	struct ibv_exp_qp_init_attr attr = {
-		/* CQ to be associated with the send queue. */
-		.send_cq = cq,
-		/* CQ to be associated with the receive queue. */
-		.recv_cq = cq,
-		.cap = {
-			/* Max number of outstanding WRs. */
-			.max_recv_wr = ((priv->device_attr.max_qp_wr < desc) ?
-					priv->device_attr.max_qp_wr :
-					desc),
-			/* Max number of scatter/gather elements in a WR. */
-			.max_recv_sge = ((priv->device_attr.max_sge <
-					  MLX5_PMD_SGE_WR_N) ?
-					 priv->device_attr.max_sge :
-					 MLX5_PMD_SGE_WR_N),
-		},
-		.qp_type = IBV_QPT_RAW_PACKET,
-		.comp_mask = (IBV_EXP_QP_INIT_ATTR_PD |
-			      IBV_EXP_QP_INIT_ATTR_RES_DOMAIN),
-		.pd = priv->pd,
-		.res_domain = rd,
-	};
-
-	return ibv_exp_create_qp(priv->ctx, &attr);
-}
-
-#ifdef RSS_SUPPORT
-
-/**
- * Allocate a RSS Queue Pair.
- * Optionally setup inline receive if supported.
- *
- * @param priv
- *   Pointer to private structure.
- * @param cq
- *   Completion queue to associate with QP.
- * @param desc
- *   Number of descriptors in QP (hint only).
- * @param parent
- *   If nonzero, create a parent QP, otherwise a child.
- *
- * @return
- *   QP pointer or NULL in case of error.
- */
-static struct ibv_qp *
-rxq_setup_qp_rss(struct priv *priv, struct ibv_cq *cq, uint16_t desc,
-		 int parent, struct ibv_exp_res_domain *rd)
-{
-	struct ibv_exp_qp_init_attr attr = {
-		/* CQ to be associated with the send queue. */
-		.send_cq = cq,
-		/* CQ to be associated with the receive queue. */
-		.recv_cq = cq,
-		.cap = {
-			/* Max number of outstanding WRs. */
-			.max_recv_wr = ((priv->device_attr.max_qp_wr < desc) ?
-					priv->device_attr.max_qp_wr :
-					desc),
-			/* Max number of scatter/gather elements in a WR. */
-			.max_recv_sge = ((priv->device_attr.max_sge <
-					  MLX5_PMD_SGE_WR_N) ?
-					 priv->device_attr.max_sge :
-					 MLX5_PMD_SGE_WR_N),
-		},
-		.qp_type = IBV_QPT_RAW_PACKET,
-		.comp_mask = (IBV_EXP_QP_INIT_ATTR_PD |
-			      IBV_EXP_QP_INIT_ATTR_RES_DOMAIN |
-			      IBV_EXP_QP_INIT_ATTR_QPG),
-		.pd = priv->pd,
-		.res_domain = rd,
-	};
-
-	if (parent) {
-		attr.qpg.qpg_type = IBV_EXP_QPG_PARENT;
-		/* TSS isn't necessary. */
-		attr.qpg.parent_attrib.tss_child_count = 0;
-		attr.qpg.parent_attrib.rss_child_count = priv->rxqs_n;
-		DEBUG("initializing parent RSS queue");
-	} else {
-		attr.qpg.qpg_type = IBV_EXP_QPG_CHILD_RX;
-		attr.qpg.qpg_parent = priv->rxq_parent.qp;
-		DEBUG("initializing child RSS queue");
-	}
-	return ibv_exp_create_qp(priv->ctx, &attr);
-}
-
-#endif /* RSS_SUPPORT */
-
-/**
  * Reconfigure a RX queue with new parameters.
  *
  * rxq_rehash() does not allocate mbufs, which, if not done from the right
@@ -512,15 +611,9 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
 	unsigned int desc_n;
 	struct rte_mbuf **pool;
 	unsigned int i, k;
-	struct ibv_exp_qp_attr mod;
+	struct ibv_exp_wq_attr mod;
 	int err;
-	int parent = (rxq == &priv->rxq_parent);
 
-	if (parent) {
-		ERROR("%p: cannot rehash parent queue %p",
-		      (void *)dev, (void *)rxq);
-		return EINVAL;
-	}
 	DEBUG("%p: rehashing queue %p", (void *)dev, (void *)rxq);
 	/* Number of descriptors and mbufs currently allocated. */
 	desc_n = (tmpl.elts_n * (tmpl.sp ? MLX5_PMD_SGE_WR_N : 1));
@@ -549,61 +642,17 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
 		DEBUG("%p: nothing to do", (void *)dev);
 		return 0;
 	}
-	/* Remove attached flows if RSS is disabled (no parent queue). */
-	if (!priv->rss) {
-		rxq_allmulticast_disable(&tmpl);
-		rxq_promiscuous_disable(&tmpl);
-		rxq_mac_addrs_del(&tmpl);
-		/* Update original queue in case of failure. */
-		rxq->allmulti_flow = tmpl.allmulti_flow;
-		rxq->promisc_flow = tmpl.promisc_flow;
-		memcpy(rxq->mac_flow, tmpl.mac_flow, sizeof(rxq->mac_flow));
-	}
 	/* From now on, any failure will render the queue unusable.
-	 * Reinitialize QP. */
-	mod = (struct ibv_exp_qp_attr){ .qp_state = IBV_QPS_RESET };
-	err = ibv_exp_modify_qp(tmpl.qp, &mod, IBV_EXP_QP_STATE);
-	if (err) {
-		ERROR("%p: cannot reset QP: %s", (void *)dev, strerror(err));
-		assert(err > 0);
-		return err;
-	}
-	err = ibv_resize_cq(tmpl.cq, desc_n);
-	if (err) {
-		ERROR("%p: cannot resize CQ: %s", (void *)dev, strerror(err));
-		assert(err > 0);
-		return err;
-	}
-	mod = (struct ibv_exp_qp_attr){
-		/* Move the QP to this state. */
-		.qp_state = IBV_QPS_INIT,
-		/* Primary port number. */
-		.port_num = priv->port
+	 * Reinitialize WQ. */
+	mod = (struct ibv_exp_wq_attr){
+		.attr_mask = IBV_EXP_WQ_ATTR_STATE,
+		.wq_state = IBV_EXP_WQS_RESET,
 	};
-	err = ibv_exp_modify_qp(tmpl.qp, &mod,
-				(IBV_EXP_QP_STATE |
-#ifdef RSS_SUPPORT
-				 (parent ? IBV_EXP_QP_GROUP_RSS : 0) |
-#endif /* RSS_SUPPORT */
-				 IBV_EXP_QP_PORT));
+	err = ibv_exp_modify_wq(tmpl.wq, &mod);
 	if (err) {
-		ERROR("%p: QP state to IBV_QPS_INIT failed: %s",
-		      (void *)dev, strerror(err));
+		ERROR("%p: cannot reset WQ: %s", (void *)dev, strerror(err));
 		assert(err > 0);
 		return err;
-	};
-	/* Reconfigure flows. Do not care for errors. */
-	if (!priv->rss) {
-		if (priv->started)
-			rxq_mac_addrs_add(&tmpl);
-		if (priv->started && priv->promisc_req)
-			rxq_promiscuous_enable(&tmpl);
-		if (priv->started && priv->allmulti_req)
-			rxq_allmulticast_enable(&tmpl);
-		/* Update original queue in case of failure. */
-		rxq->allmulti_flow = tmpl.allmulti_flow;
-		rxq->promisc_flow = tmpl.promisc_flow;
-		memcpy(rxq->mac_flow, tmpl.mac_flow, sizeof(rxq->mac_flow));
 	}
 	/* Allocate pool. */
 	pool = rte_malloc(__func__, (mbuf_n * sizeof(*pool)), 0);
@@ -655,21 +704,27 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
 	rxq->elts_n = 0;
 	rte_free(rxq->elts.sp);
 	rxq->elts.sp = NULL;
+	/* Change queue state to ready. */
+	mod = (struct ibv_exp_wq_attr){
+		.attr_mask = IBV_EXP_WQ_ATTR_STATE,
+		.wq_state = IBV_EXP_WQS_RDY,
+	};
+	err = ibv_exp_modify_wq(tmpl.wq, &mod);
+	if (err) {
+		ERROR("%p: WQ state to IBV_EXP_WQS_RDY failed: %s",
+		      (void *)dev, strerror(err));
+		goto error;
+	}
 	/* Post SGEs. */
-	assert(tmpl.if_qp != NULL);
+	assert(tmpl.if_wq != NULL);
 	if (tmpl.sp) {
 		struct rxq_elt_sp (*elts)[tmpl.elts_n] = tmpl.elts.sp;
 
 		for (i = 0; (i != RTE_DIM(*elts)); ++i) {
-#ifdef HAVE_EXP_QP_BURST_RECV_SG_LIST
-			err = tmpl.if_qp->recv_sg_list
-				(tmpl.qp,
+			err = tmpl.if_wq->recv_sg_list
+				(tmpl.wq,
 				 (*elts)[i].sges,
 				 RTE_DIM((*elts)[i].sges));
-#else /* HAVE_EXP_QP_BURST_RECV_SG_LIST */
-			errno = ENOSYS;
-			err = -1;
-#endif /* HAVE_EXP_QP_BURST_RECV_SG_LIST */
 			if (err)
 				break;
 		}
@@ -677,8 +732,8 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
 		struct rxq_elt (*elts)[tmpl.elts_n] = tmpl.elts.no_sp;
 
 		for (i = 0; (i != RTE_DIM(*elts)); ++i) {
-			err = tmpl.if_qp->recv_burst(
-				tmpl.qp,
+			err = tmpl.if_wq->recv_burst(
+				tmpl.wq,
 				&(*elts)[i].sge,
 				1);
 			if (err)
@@ -690,16 +745,9 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
 		      (void *)dev, err);
 		/* Set err because it does not contain a valid errno value. */
 		err = EIO;
-		goto skip_rtr;
+		goto error;
 	}
-	mod = (struct ibv_exp_qp_attr){
-		.qp_state = IBV_QPS_RTR
-	};
-	err = ibv_exp_modify_qp(tmpl.qp, &mod, IBV_EXP_QP_STATE);
-	if (err)
-		ERROR("%p: QP state to IBV_QPS_RTR failed: %s",
-		      (void *)dev, strerror(err));
-skip_rtr:
+error:
 	*rxq = tmpl;
 	assert(err >= 0);
 	return err;
@@ -735,30 +783,20 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 		.mp = mp,
 		.socket = socket
 	};
-	struct ibv_exp_qp_attr mod;
+	struct ibv_exp_wq_attr mod;
 	union {
 		struct ibv_exp_query_intf_params params;
 		struct ibv_exp_cq_init_attr cq;
 		struct ibv_exp_res_domain_init_attr rd;
+		struct ibv_exp_wq_init_attr wq;
 	} attr;
 	enum ibv_exp_query_intf_status status;
 	struct rte_mbuf *buf;
 	int ret = 0;
-	int parent = (rxq == &priv->rxq_parent);
 	unsigned int i;
+	unsigned int cq_size = desc;
 
 	(void)conf; /* Thresholds configuration (ignored). */
-	/*
-	 * If this is a parent queue, hardware must support RSS and
-	 * RSS must be enabled.
-	 */
-	assert((!parent) || ((priv->hw_rss) && (priv->rss)));
-	if (parent) {
-		/* Even if unused, ibv_create_cq() requires at least one
-		 * descriptor. */
-		desc = 1;
-		goto skip_mr;
-	}
 	if ((desc == 0) || (desc % MLX5_PMD_SGE_WR_N)) {
 		ERROR("%p: invalid number of RX descriptors (must be a"
 		      " multiple of %d)", (void *)dev, MLX5_PMD_SGE_WR_N);
@@ -801,7 +839,6 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 		      (void *)dev, strerror(ret));
 		goto error;
 	}
-skip_mr:
 	attr.rd = (struct ibv_exp_res_domain_init_attr){
 		.comp_mask = (IBV_EXP_RES_DOMAIN_THREAD_MODEL |
 			      IBV_EXP_RES_DOMAIN_MSG_MODEL),
@@ -819,7 +856,8 @@ skip_mr:
 		.comp_mask = IBV_EXP_CQ_INIT_ATTR_RES_DOMAIN,
 		.res_domain = tmpl.rd,
 	};
-	tmpl.cq = ibv_exp_create_cq(priv->ctx, desc, NULL, NULL, 0, &attr.cq);
+	tmpl.cq = ibv_exp_create_cq(priv->ctx, cq_size, NULL, NULL, 0,
+				    &attr.cq);
 	if (tmpl.cq == NULL) {
 		ret = ENOMEM;
 		ERROR("%p: CQ creation failure: %s",
@@ -830,48 +868,30 @@ skip_mr:
 	      priv->device_attr.max_qp_wr);
 	DEBUG("priv->device_attr.max_sge is %d",
 	      priv->device_attr.max_sge);
-#ifdef RSS_SUPPORT
-	if (priv->rss)
-		tmpl.qp = rxq_setup_qp_rss(priv, tmpl.cq, desc, parent,
-					   tmpl.rd);
-	else
-#endif /* RSS_SUPPORT */
-		tmpl.qp = rxq_setup_qp(priv, tmpl.cq, desc, tmpl.rd);
-	if (tmpl.qp == NULL) {
-		ret = (errno ? errno : EINVAL);
-		ERROR("%p: QP creation failure: %s",
-		      (void *)dev, strerror(ret));
-		goto error;
-	}
-	mod = (struct ibv_exp_qp_attr){
-		/* Move the QP to this state. */
-		.qp_state = IBV_QPS_INIT,
-		/* Primary port number. */
-		.port_num = priv->port
+	attr.wq = (struct ibv_exp_wq_init_attr){
+		.wq_context = NULL, /* Could be useful in the future. */
+		.wq_type = IBV_EXP_WQT_RQ,
+		/* Max number of outstanding WRs. */
+		.max_recv_wr = ((priv->device_attr.max_qp_wr < (int)cq_size) ?
+				priv->device_attr.max_qp_wr :
+				(int)cq_size),
+		/* Max number of scatter/gather elements in a WR. */
+		.max_recv_sge = ((priv->device_attr.max_sge <
+				  MLX5_PMD_SGE_WR_N) ?
+				 priv->device_attr.max_sge :
+				 MLX5_PMD_SGE_WR_N),
+		.pd = priv->pd,
+		.cq = tmpl.cq,
+		.comp_mask = IBV_EXP_CREATE_WQ_RES_DOMAIN,
+		.res_domain = tmpl.rd,
 	};
-	ret = ibv_exp_modify_qp(tmpl.qp, &mod,
-				(IBV_EXP_QP_STATE |
-#ifdef RSS_SUPPORT
-				 (parent ? IBV_EXP_QP_GROUP_RSS : 0) |
-#endif /* RSS_SUPPORT */
-				 IBV_EXP_QP_PORT));
-	if (ret) {
-		ERROR("%p: QP state to IBV_QPS_INIT failed: %s",
+	tmpl.wq = ibv_exp_create_wq(priv->ctx, &attr.wq);
+	if (tmpl.wq == NULL) {
+		ret = (errno ? errno : EINVAL);
+		ERROR("%p: WQ creation failure: %s",
 		      (void *)dev, strerror(ret));
 		goto error;
 	}
-	if ((parent) || (!priv->rss))  {
-		/* Configure MAC and broadcast addresses. */
-		ret = rxq_mac_addrs_add(&tmpl);
-		if (ret) {
-			ERROR("%p: QP flow attachment failed: %s",
-			      (void *)dev, strerror(ret));
-			goto error;
-		}
-	}
-	/* Allocate descriptors for RX queues, except for the RSS parent. */
-	if (parent)
-		goto skip_alloc;
 	if (tmpl.sp)
 		ret = rxq_alloc_elts_sp(&tmpl, desc, NULL);
 	else
@@ -881,7 +901,6 @@ skip_mr:
 		      (void *)dev, strerror(ret));
 		goto error;
 	}
-skip_alloc:
 	/* Save port ID. */
 	tmpl.port_id = dev->data->port_id;
 	DEBUG("%p: RTE port ID: %u", (void *)rxq, tmpl.port_id);
@@ -898,38 +917,44 @@ skip_alloc:
 	}
 	attr.params = (struct ibv_exp_query_intf_params){
 		.intf_scope = IBV_EXP_INTF_GLOBAL,
-		.intf = IBV_EXP_INTF_QP_BURST,
-		.obj = tmpl.qp,
+		.intf = IBV_EXP_INTF_WQ,
+		.obj = tmpl.wq,
 	};
-	tmpl.if_qp = ibv_exp_query_intf(priv->ctx, &attr.params, &status);
-	if (tmpl.if_qp == NULL) {
-		ERROR("%p: QP interface family query failed with status %d",
+	tmpl.if_wq = ibv_exp_query_intf(priv->ctx, &attr.params, &status);
+	if (tmpl.if_wq == NULL) {
+		ERROR("%p: WQ interface family query failed with status %d",
 		      (void *)dev, status);
 		goto error;
 	}
+	/* Change queue state to ready. */
+	mod = (struct ibv_exp_wq_attr){
+		.attr_mask = IBV_EXP_WQ_ATTR_STATE,
+		.wq_state = IBV_EXP_WQS_RDY,
+	};
+	ret = ibv_exp_modify_wq(tmpl.wq, &mod);
+	if (ret) {
+		ERROR("%p: WQ state to IBV_EXP_WQS_RDY failed: %s",
+		      (void *)dev, strerror(ret));
+		goto error;
+	}
 	/* Post SGEs. */
-	if (!parent && tmpl.sp) {
+	if (tmpl.sp) {
 		struct rxq_elt_sp (*elts)[tmpl.elts_n] = tmpl.elts.sp;
 
 		for (i = 0; (i != RTE_DIM(*elts)); ++i) {
-#ifdef HAVE_EXP_QP_BURST_RECV_SG_LIST
-			ret = tmpl.if_qp->recv_sg_list
-				(tmpl.qp,
+			ret = tmpl.if_wq->recv_sg_list
+				(tmpl.wq,
 				 (*elts)[i].sges,
 				 RTE_DIM((*elts)[i].sges));
-#else /* HAVE_EXP_QP_BURST_RECV_SG_LIST */
-			errno = ENOSYS;
-			ret = -1;
-#endif /* HAVE_EXP_QP_BURST_RECV_SG_LIST */
 			if (ret)
 				break;
 		}
-	} else if (!parent) {
+	} else {
 		struct rxq_elt (*elts)[tmpl.elts_n] = tmpl.elts.no_sp;
 
 		for (i = 0; (i != RTE_DIM(*elts)); ++i) {
-			ret = tmpl.if_qp->recv_burst(
-				tmpl.qp,
+			ret = tmpl.if_wq->recv_burst(
+				tmpl.wq,
 				&(*elts)[i].sge,
 				1);
 			if (ret)
@@ -943,15 +968,6 @@ skip_alloc:
 		ret = EIO;
 		goto error;
 	}
-	mod = (struct ibv_exp_qp_attr){
-		.qp_state = IBV_QPS_RTR
-	};
-	ret = ibv_exp_modify_qp(tmpl.qp, &mod, IBV_EXP_QP_STATE);
-	if (ret) {
-		ERROR("%p: QP state to IBV_QPS_RTR failed: %s",
-		      (void *)dev, strerror(ret));
-		goto error;
-	}
 	/* Clean up rxq in case we're reinitializing it. */
 	DEBUG("%p: cleaning-up old rxq just in case", (void *)rxq);
 	rxq_cleanup(rxq);
@@ -1055,7 +1071,6 @@ mlx5_rx_queue_release(void *dpdk_rxq)
 		return;
 	priv = rxq->priv;
 	priv_lock(priv);
-	assert(rxq != &priv->rxq_parent);
 	for (i = 0; (i != priv->rxqs_n); ++i)
 		if ((*priv->rxqs)[i] == rxq) {
 			DEBUG("%p: removing RX queue %p from list",
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index f48fec1..db2ac03 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -35,7 +35,6 @@
 #include <stdint.h>
 #include <string.h>
 #include <stdlib.h>
-#include <errno.h>
 
 /* Verbs header. */
 /* ISO C doesn't support unnamed structs/unions, disabling -pedantic. */
@@ -61,7 +60,6 @@
 #endif
 
 #include "mlx5.h"
-#include "mlx5_autoconf.h"
 #include "mlx5_utils.h"
 #include "mlx5_rxtx.h"
 #include "mlx5_defs.h"
@@ -755,14 +753,9 @@ mlx5_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 		rxq->stats.ibytes += pkt_buf_len;
 #endif
 repost:
-#ifdef HAVE_EXP_QP_BURST_RECV_SG_LIST
-		ret = rxq->if_qp->recv_sg_list(rxq->qp,
+		ret = rxq->if_wq->recv_sg_list(rxq->wq,
 					       elt->sges,
 					       RTE_DIM(elt->sges));
-#else /* HAVE_EXP_QP_BURST_RECV_SG_LIST */
-		errno = ENOSYS;
-		ret = -1;
-#endif /* HAVE_EXP_QP_BURST_RECV_SG_LIST */
 		if (unlikely(ret)) {
 			/* Inability to repost WRs is fatal. */
 			DEBUG("%p: recv_sg_list(): failed (ret=%d)",
@@ -919,7 +912,7 @@ repost:
 #ifdef DEBUG_RECV
 	DEBUG("%p: reposting %u WRs", (void *)rxq, i);
 #endif
-	ret = rxq->if_qp->recv_burst(rxq->qp, sges, i);
+	ret = rxq->if_wq->recv_burst(rxq->wq, sges, i);
 	if (unlikely(ret)) {
 		/* Inability to repost WRs is fatal. */
 		DEBUG("%p: recv_burst(): failed (ret=%d)",
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 90c99dc..df1d52b 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -99,13 +99,9 @@ struct rxq {
 	struct rte_mempool *mp; /* Memory Pool for allocations. */
 	struct ibv_mr *mr; /* Memory Region (for mp). */
 	struct ibv_cq *cq; /* Completion Queue. */
-	struct ibv_qp *qp; /* Queue Pair. */
-	struct ibv_exp_qp_burst_family *if_qp; /* QP burst interface. */
+	struct ibv_exp_wq *wq; /* Work Queue. */
+	struct ibv_exp_wq_family *if_wq; /* WQ burst interface. */
 	struct ibv_exp_cq_family *if_cq; /* CQ interface. */
-	/* MAC flow steering rules, one per VLAN ID. */
-	struct ibv_flow *mac_flow[MLX5_MAX_MAC_ADDRESSES][MLX5_MAX_VLAN_IDS];
-	struct ibv_flow *promisc_flow; /* Promiscuous flow. */
-	struct ibv_flow *allmulti_flow; /* Multicast flow. */
 	unsigned int port_id; /* Port ID for incoming packets. */
 	unsigned int elts_n; /* (*elts)[] length. */
 	unsigned int elts_head; /* Current index in (*elts)[]. */
@@ -122,6 +118,15 @@ struct rxq {
 	struct ibv_exp_res_domain *rd; /* Resource Domain. */
 };
 
+struct hash_rxq {
+	struct priv *priv; /* Back pointer to private data. */
+	struct ibv_qp *qp; /* Hash RX QP. */
+	/* MAC flow steering rules, one per VLAN ID. */
+	struct ibv_flow *mac_flow[MLX5_MAX_MAC_ADDRESSES][MLX5_MAX_VLAN_IDS];
+	struct ibv_flow *promisc_flow; /* Promiscuous flow. */
+	struct ibv_flow *allmulti_flow; /* Multicast flow. */
+};
+
 /* TX element. */
 struct txq_elt {
 	struct rte_mbuf *buf;
@@ -166,6 +171,8 @@ struct txq {
 
 /* mlx5_rxq.c */
 
+int priv_create_hash_rxqs(struct priv *);
+void priv_destroy_hash_rxqs(struct priv *);
 void rxq_cleanup(struct rxq *);
 int rxq_rehash(struct rte_eth_dev *, struct rxq *);
 int rxq_setup(struct rte_eth_dev *, struct rxq *, uint16_t, unsigned int,
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index dced025..233c0d8 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -60,55 +60,35 @@ int
 mlx5_dev_start(struct rte_eth_dev *dev)
 {
 	struct priv *priv = dev->data->dev_private;
-	unsigned int i = 0;
-	unsigned int r;
-	struct rxq *rxq;
+	int err;
 
 	priv_lock(priv);
 	if (priv->started) {
 		priv_unlock(priv);
 		return 0;
 	}
-	DEBUG("%p: attaching configured flows to all RX queues", (void *)dev);
-	priv->started = 1;
-	if (priv->rss) {
-		rxq = &priv->rxq_parent;
-		r = 1;
-	} else {
-		rxq = (*priv->rxqs)[0];
-		r = priv->rxqs_n;
-	}
-	/* Iterate only once when RSS is enabled. */
-	do {
-		int ret;
-
-		/* Ignore nonexistent RX queues. */
-		if (rxq == NULL)
-			continue;
-		ret = rxq_mac_addrs_add(rxq);
-		if (!ret && priv->promisc_req)
-			ret = rxq_promiscuous_enable(rxq);
-		if (!ret && priv->allmulti_req)
-			ret = rxq_allmulticast_enable(rxq);
-		if (!ret)
-			continue;
-		WARN("%p: QP flow attachment failed: %s",
-		     (void *)dev, strerror(ret));
+	DEBUG("%p: allocating and configuring hash RX queues", (void *)dev);
+	err = priv_create_hash_rxqs(priv);
+	if (!err)
+		err = priv_mac_addrs_enable(priv);
+	if (!err && priv->promisc_req)
+		err = priv_promiscuous_enable(priv);
+	if (!err && priv->allmulti_req)
+		err = priv_allmulticast_enable(priv);
+	if (!err)
+		priv->started = 1;
+	else {
+		ERROR("%p: an error occurred while configuring hash RX queues:"
+		      " %s",
+		      (void *)priv, strerror(err));
 		/* Rollback. */
-		while (i != 0) {
-			rxq = (*priv->rxqs)[--i];
-			if (rxq != NULL) {
-				rxq_allmulticast_disable(rxq);
-				rxq_promiscuous_disable(rxq);
-				rxq_mac_addrs_del(rxq);
-			}
-		}
-		priv->started = 0;
-		priv_unlock(priv);
-		return -ret;
-	} while ((--r) && ((rxq = (*priv->rxqs)[++i]), i));
+		priv_allmulticast_disable(priv);
+		priv_promiscuous_disable(priv);
+		priv_mac_addrs_disable(priv);
+		priv_destroy_hash_rxqs(priv);
+	}
 	priv_unlock(priv);
-	return 0;
+	return -err;
 }
 
 /**
@@ -123,32 +103,17 @@ void
 mlx5_dev_stop(struct rte_eth_dev *dev)
 {
 	struct priv *priv = dev->data->dev_private;
-	unsigned int i = 0;
-	unsigned int r;
-	struct rxq *rxq;
 
 	priv_lock(priv);
 	if (!priv->started) {
 		priv_unlock(priv);
 		return;
 	}
-	DEBUG("%p: detaching flows from all RX queues", (void *)dev);
+	DEBUG("%p: cleaning up and destroying hash RX queues", (void *)dev);
+	priv_allmulticast_disable(priv);
+	priv_promiscuous_disable(priv);
+	priv_mac_addrs_disable(priv);
+	priv_destroy_hash_rxqs(priv);
 	priv->started = 0;
-	if (priv->rss) {
-		rxq = &priv->rxq_parent;
-		r = 1;
-	} else {
-		rxq = (*priv->rxqs)[0];
-		r = priv->rxqs_n;
-	}
-	/* Iterate only once when RSS is enabled. */
-	do {
-		/* Ignore nonexistent RX queues. */
-		if (rxq == NULL)
-			continue;
-		rxq_allmulticast_disable(rxq);
-		rxq_promiscuous_disable(rxq);
-		rxq_mac_addrs_del(rxq);
-	} while ((--r) && ((rxq = (*priv->rxqs)[++i]), i));
 	priv_unlock(priv);
 }
diff --git a/drivers/net/mlx5/mlx5_vlan.c b/drivers/net/mlx5/mlx5_vlan.c
index ca80571..3a07ad1 100644
--- a/drivers/net/mlx5/mlx5_vlan.c
+++ b/drivers/net/mlx5/mlx5_vlan.c
@@ -67,8 +67,6 @@ vlan_filter_set(struct rte_eth_dev *dev, uint16_t vlan_id, int on)
 {
 	struct priv *priv = dev->data->dev_private;
 	unsigned int i;
-	unsigned int r;
-	struct rxq *rxq;
 
 	DEBUG("%p: %s VLAN filter ID %" PRIu16,
 	      (void *)dev, (on ? "enable" : "disable"), vlan_id);
@@ -99,34 +97,9 @@ vlan_filter_set(struct rte_eth_dev *dev, uint16_t vlan_id, int on)
 		priv->vlan_filter[priv->vlan_filter_n] = vlan_id;
 		++priv->vlan_filter_n;
 	}
-	if (!priv->started)
-		return 0;
-	/* Rehash MAC flows in all RX queues. */
-	if (priv->rss) {
-		rxq = &priv->rxq_parent;
-		r = 1;
-	} else {
-		rxq = (*priv->rxqs)[0];
-		r = priv->rxqs_n;
-	}
-	for (i = 0; (i < r); rxq = (*priv->rxqs)[++i]) {
-		int ret;
-
-		if (rxq == NULL)
-			continue;
-		rxq_mac_addrs_del(rxq);
-		ret = rxq_mac_addrs_add(rxq);
-		if (!ret)
-			continue;
-		/* Rollback. */
-		while (i != 0) {
-			rxq = (*priv->rxqs)[--i];
-			if (rxq != NULL)
-				rxq_mac_addrs_del(rxq);
-		}
-		return ret;
-	}
-	return 0;
+	/* Rehash MAC flows in all hash RX queues. */
+	priv_mac_addrs_disable(priv);
+	return priv_mac_addrs_enable(priv);
 }
 
 /**
-- 
2.1.0

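The reworked `mlx5_dev_start()` chains its configuration steps on a single `err` variable. On failure, it calls every disable/destroy helper in reverse order, regardless of which step failed. That only works if each undo helper is safe to call on state that was never set up. Below is a minimal model of the pattern. `struct demo_dev`, `demo_step()` and `demo_start()` are hypothetical, and flags stand in for the real queue and flow operations.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical device model: each flag stands for a configured resource. */
struct demo_dev {
	int qs;           /* Hash RX queues created. */
	int macs;         /* MAC flows enabled. */
	int promisc;      /* Promiscuous flow enabled. */
	int allmulti;     /* All-multicast flow enabled. */
	int promisc_req;  /* Promiscuous mode requested. */
	int allmulti_req; /* All-multicast mode requested. */
	int fail_at;      /* Step number (1-4) forced to fail, 0 for none. */
	int started;
};

/* Run step number idx: fail with ENOMEM if requested, else set *flag. */
static int
demo_step(const struct demo_dev *d, int idx, int *flag)
{
	if (d->fail_at == idx)
		return ENOMEM;
	*flag = 1;
	return 0;
}

static int
demo_start(struct demo_dev *d)
{
	int err;

	if (d->started)
		return 0;
	err = demo_step(d, 1, &d->qs);
	if (!err)
		err = demo_step(d, 2, &d->macs);
	if (!err && d->promisc_req)
		err = demo_step(d, 3, &d->promisc);
	if (!err && d->allmulti_req)
		err = demo_step(d, 4, &d->allmulti);
	if (!err) {
		d->started = 1;
	} else {
		/* Roll back in reverse order; clearing an unset flag is harmless. */
		d->allmulti = 0;
		d->promisc = 0;
		d->macs = 0;
		d->qs = 0;
	}
	return -err;
}
```

The negative return value follows the `return -err;` convention used by the patch.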

* [dpdk-dev] [PATCH v2 04/16] mlx5: use separate indirection table for default hash RX queue
From: Adrien Mazarguil @ 2015-10-30 18:55 UTC (permalink / raw)
  To: dev; +Cc: Yaacov Hazan

From: Olga Shern <olgas@mellanox.com>

The default hash RX queue handles packets that are not matched by more
specific types and requires its own indirection table of size 1 to work
properly.

This commit implements support for multiple indirection tables by grouping
their layout and properties in a static initialization table.
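
The grouping idea can be shown with a minimal, Verbs-free sketch. It uses simplified stand-ins for `enum hash_rxq_type` and `struct ind_table_init`. The names `demo_type`, `demo_table`, `demo_filter()` and `demo_type_from_n()` are hypothetical, not driver code. The sketch masks each table's hash types with the supported set and drops tables left empty. It then maps the n-th set bit of a table's mask back to a queue type, in the spirit of `priv_make_ind_table_init()` and `hash_rxq_type_from_n()`.

```c
#include <assert.h>

/* Hypothetical stand-ins for enum hash_rxq_type and struct ind_table_init. */
enum demo_type { T_TCPV4, T_UDPV4, T_IPV4, T_ETH, T_N };

struct demo_table {
	unsigned int max_size;     /* Maximum number of WQs (-1u: HW limit). */
	unsigned int hash_types;   /* Bitmask of demo_type values. */
	unsigned int hash_types_n; /* Number of bits set in hash_types. */
};

static const struct demo_table demo_init[] = {
	{ -1u, 1u << T_TCPV4 | 1u << T_UDPV4 | 1u << T_IPV4, 3 },
	{ 1, 1u << T_ETH, 1 },
};
#define DEMO_INIT_N (sizeof(demo_init) / sizeof(demo_init[0]))

/* Copy demo_init[] into out[], keeping only supported types and dropping
 * tables left empty. Returns the number of tables kept. */
static unsigned int
demo_filter(unsigned int rxqs_n, struct demo_table out[DEMO_INIT_N])
{
	/* The catch-all Ethernet type is always kept. */
	unsigned int sup = 1u << T_ETH;
	unsigned int i;
	unsigned int j = 0;

	/* Other protocols only matter with more than one RX queue. */
	if (rxqs_n > 1)
		sup |= 1u << T_TCPV4 | 1u << T_UDPV4 | 1u << T_IPV4;
	for (i = 0; i != DEMO_INIT_N; ++i) {
		unsigned int h;
		unsigned int nb = 0;

		out[j] = demo_init[i];
		out[j].hash_types &= sup;
		for (h = 0; h != T_N; ++h)
			nb += (out[j].hash_types >> h) & 1u;
		out[j].hash_types_n = nb; /* Update the entry just written. */
		if (nb)
			++j;
	}
	return j;
}

/* Return the type matching the n-th (0-based) bit set in t->hash_types. */
static enum demo_type
demo_type_from_n(const struct demo_table *t, unsigned int n)
{
	unsigned int h;

	assert(n < t->hash_types_n);
	for (h = 0; h != T_N; ++h)
		if (((t->hash_types >> h) & 1u) && n-- == 0)
			return (enum demo_type)h;
	assert(0);
	return T_ETH;
}
```

With a single RX queue, only the Ethernet table survives. With several queues, both tables are kept. The sketch stores the recomputed count in `out[j]`, the entry it has just written, whereas the patch assigns `(*table)[i].hash_types_n`. The two differ only after an earlier table has been dropped. Likewise, `hash_rxq_type_from_n()` in the patch scans upward from bit n instead of counting set bits, which gives the same result for the masks defined in `ind_table_init[]`.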

Signed-off-by: Olga Shern <olgas@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Yaacov Hazan <yaacovh@mellanox.com>
---
 drivers/net/mlx5/mlx5.h        |   5 +-
 drivers/net/mlx5/mlx5_rxmode.c |   3 +
 drivers/net/mlx5/mlx5_rxq.c    | 274 ++++++++++++++++++++++++++++++++---------
 drivers/net/mlx5/mlx5_rxtx.h   |  22 ++++
 4 files changed, 247 insertions(+), 57 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 4407b18..29fc1da 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -106,8 +106,9 @@ struct priv {
 	unsigned int txqs_n; /* TX queues array size. */
 	struct rxq *(*rxqs)[]; /* RX queues. */
 	struct txq *(*txqs)[]; /* TX queues. */
-	/* Indirection table referencing all RX WQs. */
-	struct ibv_exp_rwq_ind_table *ind_table;
+	/* Indirection tables referencing all RX WQs. */
+	struct ibv_exp_rwq_ind_table *(*ind_tables)[];
+	unsigned int ind_tables_n; /* Number of indirection tables. */
 	/* Hash RX QPs feeding the indirection table. */
 	struct hash_rxq (*hash_rxqs)[];
 	unsigned int hash_rxqs_n; /* Hash RX QPs array size. */
diff --git a/drivers/net/mlx5/mlx5_rxmode.c b/drivers/net/mlx5/mlx5_rxmode.c
index 2a74c64..79e31fb 100644
--- a/drivers/net/mlx5/mlx5_rxmode.c
+++ b/drivers/net/mlx5/mlx5_rxmode.c
@@ -262,6 +262,9 @@ priv_allmulticast_enable(struct priv *priv)
 		struct hash_rxq *hash_rxq = &(*priv->hash_rxqs)[i];
 		int ret;
 
+		/* allmulticast not relevant for TCP. */
+		if (hash_rxq->type == HASH_RXQ_TCPV4)
+			continue;
 		ret = hash_rxq_allmulticast_enable(hash_rxq);
 		if (!ret)
 			continue;
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 6d8f7d2..8ea1267 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -64,6 +64,52 @@
 #include "mlx5_utils.h"
 #include "mlx5_defs.h"
 
+/* Initialization data for hash RX queues. */
+static const struct hash_rxq_init hash_rxq_init[] = {
+	[HASH_RXQ_TCPV4] = {
+		.hash_fields = (IBV_EXP_RX_HASH_SRC_IPV4 |
+				IBV_EXP_RX_HASH_DST_IPV4 |
+				IBV_EXP_RX_HASH_SRC_PORT_TCP |
+				IBV_EXP_RX_HASH_DST_PORT_TCP),
+	},
+	[HASH_RXQ_UDPV4] = {
+		.hash_fields = (IBV_EXP_RX_HASH_SRC_IPV4 |
+				IBV_EXP_RX_HASH_DST_IPV4 |
+				IBV_EXP_RX_HASH_SRC_PORT_UDP |
+				IBV_EXP_RX_HASH_DST_PORT_UDP),
+	},
+	[HASH_RXQ_IPV4] = {
+		.hash_fields = (IBV_EXP_RX_HASH_SRC_IPV4 |
+				IBV_EXP_RX_HASH_DST_IPV4),
+	},
+	[HASH_RXQ_ETH] = {
+		.hash_fields = 0,
+	},
+};
+
+/* Number of entries in hash_rxq_init[]. */
+static const unsigned int hash_rxq_init_n = RTE_DIM(hash_rxq_init);
+
+/* Initialization data for hash RX queue indirection tables. */
+static const struct ind_table_init ind_table_init[] = {
+	{
+		.max_size = -1u, /* Superseded by HW limitations. */
+		.hash_types =
+			1 << HASH_RXQ_TCPV4 |
+			1 << HASH_RXQ_UDPV4 |
+			1 << HASH_RXQ_IPV4 |
+			0,
+		.hash_types_n = 3,
+	},
+	{
+		.max_size = 1,
+		.hash_types = 1 << HASH_RXQ_ETH,
+		.hash_types_n = 1,
+	},
+};
+
+#define IND_TABLE_INIT_N RTE_DIM(ind_table_init)
+
 /* Default RSS hash key also used for ConnectX-3. */
 static uint8_t hash_rxq_default_key[] = {
 	0x2c, 0xc6, 0x81, 0xd1,
@@ -99,6 +145,74 @@ log2above(unsigned int v)
 }
 
 /**
+ * Return the type corresponding to the n'th bit set.
+ *
+ * @param table
+ *   The indirection table.
+ * @param n
+ *   The n'th bit set.
+ *
+ * @return
+ *   The corresponding hash_rxq_type.
+ */
+static enum hash_rxq_type
+hash_rxq_type_from_n(const struct ind_table_init *table, unsigned int n)
+{
+	assert(n < table->hash_types_n);
+	while (((table->hash_types >> n) & 0x1) == 0)
+		++n;
+	return n;
+}
+
+/**
+ * Filter out disabled hash RX queue types from ind_table_init[].
+ *
+ * @param priv
+ *   Pointer to private structure.
+ * @param[out] table
+ *   Output table.
+ *
+ * @return
+ *   Number of table entries.
+ */
+static unsigned int
+priv_make_ind_table_init(struct priv *priv,
+			 struct ind_table_init (*table)[IND_TABLE_INIT_N])
+{
+	unsigned int i;
+	unsigned int j;
+	unsigned int table_n = 0;
+	/* Mandatory to receive frames not handled by normal hash RX queues. */
+	unsigned int hash_types_sup = 1 << HASH_RXQ_ETH;
+
+	/* Process other protocols only if more than one queue. */
+	if (priv->rxqs_n > 1)
+		for (i = 0; (i != hash_rxq_init_n); ++i)
+			if (hash_rxq_init[i].hash_fields)
+				hash_types_sup |= (1 << i);
+
+	/* Filter out entries whose protocols are not in the set. */
+	for (i = 0, j = 0; (i != IND_TABLE_INIT_N); ++i) {
+		unsigned int nb;
+		unsigned int h;
+
+		/* j is increased only if the table has valid protocols. */
+		assert(j <= i);
+		(*table)[j] = ind_table_init[i];
+		(*table)[j].hash_types &= hash_types_sup;
+		for (h = 0, nb = 0; (h != hash_rxq_init_n); ++h)
+			if (((*table)[j].hash_types >> h) & 0x1)
+				++nb;
+		(*table)[i].hash_types_n = nb;
+		if (nb) {
+			++table_n;
+			++j;
+		}
+	}
+	return table_n;
+}
+
+/**
  * Initialize hash RX queues and indirection table.
  *
  * @param priv
@@ -110,21 +224,21 @@ log2above(unsigned int v)
 int
 priv_create_hash_rxqs(struct priv *priv)
 {
-	static const uint64_t rss_hash_table[] = {
-		/* TCPv4. */
-		(IBV_EXP_RX_HASH_SRC_IPV4 | IBV_EXP_RX_HASH_DST_IPV4 |
-		 IBV_EXP_RX_HASH_SRC_PORT_TCP | IBV_EXP_RX_HASH_DST_PORT_TCP),
-		/* UDPv4. */
-		(IBV_EXP_RX_HASH_SRC_IPV4 | IBV_EXP_RX_HASH_DST_IPV4 |
-		 IBV_EXP_RX_HASH_SRC_PORT_UDP | IBV_EXP_RX_HASH_DST_PORT_UDP),
-		/* Other IPv4. */
-		(IBV_EXP_RX_HASH_SRC_IPV4 | IBV_EXP_RX_HASH_DST_IPV4),
-		/* None, used for everything else. */
-		0,
-	};
+	unsigned int wqs_n = (1 << log2above(priv->rxqs_n));
+	struct ibv_exp_wq *wqs[wqs_n];
+	struct ind_table_init ind_table_init[IND_TABLE_INIT_N];
+	unsigned int ind_tables_n =
+		priv_make_ind_table_init(priv, &ind_table_init);
+	unsigned int hash_rxqs_n = 0;
+	struct hash_rxq (*hash_rxqs)[] = NULL;
+	struct ibv_exp_rwq_ind_table *(*ind_tables)[] = NULL;
+	unsigned int i;
+	unsigned int j;
+	unsigned int k;
+	int err = 0;
 
-	DEBUG("allocating hash RX queues for %u WQs", priv->rxqs_n);
-	assert(priv->ind_table == NULL);
+	assert(priv->ind_tables == NULL);
+	assert(priv->ind_tables_n == 0);
 	assert(priv->hash_rxqs == NULL);
 	assert(priv->hash_rxqs_n == 0);
 	assert(priv->pd != NULL);
@@ -132,26 +246,11 @@ priv_create_hash_rxqs(struct priv *priv)
 	if (priv->rxqs_n == 0)
 		return EINVAL;
 	assert(priv->rxqs != NULL);
-
-	/* FIXME: large data structures are allocated on the stack. */
-	unsigned int wqs_n = (1 << log2above(priv->rxqs_n));
-	struct ibv_exp_wq *wqs[wqs_n];
-	struct ibv_exp_rwq_ind_table_init_attr ind_init_attr = {
-		.pd = priv->pd,
-		.log_ind_tbl_size = log2above(priv->rxqs_n),
-		.ind_tbl = wqs,
-		.comp_mask = 0,
-	};
-	struct ibv_exp_rwq_ind_table *ind_table = NULL;
-	/* If only one RX queue is configured, RSS is not needed and a single
-	 * empty hash entry is used (last rss_hash_table[] entry). */
-	unsigned int hash_rxqs_n =
-		((priv->rxqs_n == 1) ? 1 : RTE_DIM(rss_hash_table));
-	struct hash_rxq (*hash_rxqs)[hash_rxqs_n] = NULL;
-	unsigned int i;
-	unsigned int j;
-	int err = 0;
-
+	if (ind_tables_n == 0) {
+		ERROR("all hash RX queue types have been filtered out,"
+		      " indirection table cannot be created");
+		return EINVAL;
+	}
 	if (wqs_n < priv->rxqs_n) {
 		ERROR("cannot handle this many RX queues (%u)", priv->rxqs_n);
 		err = ERANGE;
@@ -170,9 +269,40 @@ priv_create_hash_rxqs(struct priv *priv)
 		if (++j == priv->rxqs_n)
 			j = 0;
 	}
-	errno = 0;
-	ind_table = ibv_exp_create_rwq_ind_table(priv->ctx, &ind_init_attr);
-	if (ind_table == NULL) {
+	/* Get number of hash RX queues to configure. */
+	for (i = 0, hash_rxqs_n = 0; (i != ind_tables_n); ++i)
+		hash_rxqs_n += ind_table_init[i].hash_types_n;
+	DEBUG("allocating %u hash RX queues for %u WQs, %u indirection tables",
+	      hash_rxqs_n, priv->rxqs_n, ind_tables_n);
+	/* Create indirection tables. */
+	ind_tables = rte_calloc(__func__, ind_tables_n,
+				sizeof((*ind_tables)[0]), 0);
+	if (ind_tables == NULL) {
+		err = ENOMEM;
+		ERROR("cannot allocate indirection tables container: %s",
+		      strerror(err));
+		goto error;
+	}
+	for (i = 0; (i != ind_tables_n); ++i) {
+		struct ibv_exp_rwq_ind_table_init_attr ind_init_attr = {
+			.pd = priv->pd,
+			.log_ind_tbl_size = 0, /* Set below. */
+			.ind_tbl = wqs,
+			.comp_mask = 0,
+		};
+		unsigned int ind_tbl_size = ind_table_init[i].max_size;
+		struct ibv_exp_rwq_ind_table *ind_table;
+
+		if (wqs_n < ind_tbl_size)
+			ind_tbl_size = wqs_n;
+		ind_init_attr.log_ind_tbl_size = log2above(ind_tbl_size);
+		errno = 0;
+		ind_table = ibv_exp_create_rwq_ind_table(priv->ctx,
+							 &ind_init_attr);
+		if (ind_table != NULL) {
+			(*ind_tables)[i] = ind_table;
+			continue;
+		}
 		/* Not clear whether errno is set. */
 		err = (errno ? errno : EINVAL);
 		ERROR("RX indirection table creation failed with error %d: %s",
@@ -180,24 +310,26 @@ priv_create_hash_rxqs(struct priv *priv)
 		goto error;
 	}
 	/* Allocate array that holds hash RX queues and related data. */
-	hash_rxqs = rte_malloc(__func__, sizeof(*hash_rxqs), 0);
+	hash_rxqs = rte_calloc(__func__, hash_rxqs_n,
+			       sizeof((*hash_rxqs)[0]), 0);
 	if (hash_rxqs == NULL) {
 		err = ENOMEM;
 		ERROR("cannot allocate hash RX queues container: %s",
 		      strerror(err));
 		goto error;
 	}
-	for (i = 0, j = (RTE_DIM(rss_hash_table) - hash_rxqs_n);
-	     (j != RTE_DIM(rss_hash_table));
-	     ++i, ++j) {
+	for (i = 0, j = 0, k = 0;
+	     ((i != hash_rxqs_n) && (j != ind_tables_n));
+	     ++i) {
 		struct hash_rxq *hash_rxq = &(*hash_rxqs)[i];
-
+		enum hash_rxq_type type =
+			hash_rxq_type_from_n(&ind_table_init[j], k);
 		struct ibv_exp_rx_hash_conf hash_conf = {
 			.rx_hash_function = IBV_EXP_RX_HASH_FUNC_TOEPLITZ,
 			.rx_hash_key_len = sizeof(hash_rxq_default_key),
 			.rx_hash_key = hash_rxq_default_key,
-			.rx_hash_fields_mask = rss_hash_table[j],
-			.rwq_ind_tbl = ind_table,
+			.rx_hash_fields_mask = hash_rxq_init[type].hash_fields,
+			.rwq_ind_tbl = (*ind_tables)[j],
 		};
 		struct ibv_exp_qp_init_attr qp_init_attr = {
 			.max_inl_recv = 0, /* Currently not supported. */
@@ -209,30 +341,54 @@ priv_create_hash_rxqs(struct priv *priv)
 			.port_num = priv->port,
 		};
 
+		DEBUG("using indirection table %u for hash RX queue %u",
+		      j, i);
 		*hash_rxq = (struct hash_rxq){
 			.priv = priv,
 			.qp = ibv_exp_create_qp(priv->ctx, &qp_init_attr),
+			.type = type,
 		};
 		if (hash_rxq->qp == NULL) {
 			err = (errno ? errno : EINVAL);
 			ERROR("Hash RX QP creation failure: %s",
 			      strerror(err));
-			while (i) {
-				hash_rxq = &(*hash_rxqs)[--i];
-				claim_zero(ibv_destroy_qp(hash_rxq->qp));
-			}
 			goto error;
 		}
+		if (++k < ind_table_init[j].hash_types_n)
+			continue;
+		/* Switch to the next indirection table and reset hash RX
+		 * queue type array index. */
+		++j;
+		k = 0;
 	}
-	priv->ind_table = ind_table;
+	priv->ind_tables = ind_tables;
+	priv->ind_tables_n = ind_tables_n;
 	priv->hash_rxqs = hash_rxqs;
 	priv->hash_rxqs_n = hash_rxqs_n;
 	assert(err == 0);
 	return 0;
 error:
-	rte_free(hash_rxqs);
-	if (ind_table != NULL)
-		claim_zero(ibv_exp_destroy_rwq_ind_table(ind_table));
+	if (hash_rxqs != NULL) {
+		for (i = 0; (i != hash_rxqs_n); ++i) {
+			struct ibv_qp *qp = (*hash_rxqs)[i].qp;
+
+			if (qp == NULL)
+				continue;
+			claim_zero(ibv_destroy_qp(qp));
+		}
+		rte_free(hash_rxqs);
+	}
+	if (ind_tables != NULL) {
+		for (j = 0; (j != ind_tables_n); ++j) {
+			struct ibv_exp_rwq_ind_table *ind_table =
+				(*ind_tables)[j];
+
+			if (ind_table == NULL)
+				continue;
+			claim_zero(ibv_exp_destroy_rwq_ind_table(ind_table));
+		}
+		rte_free(ind_tables);
+	}
 	return err;
 }
 
@@ -250,7 +406,7 @@ priv_destroy_hash_rxqs(struct priv *priv)
 	DEBUG("destroying %u hash RX queues", priv->hash_rxqs_n);
 	if (priv->hash_rxqs_n == 0) {
 		assert(priv->hash_rxqs == NULL);
-		assert(priv->ind_table == NULL);
+		assert(priv->ind_tables == NULL);
 		return;
 	}
 	for (i = 0; (i != priv->hash_rxqs_n); ++i) {
@@ -270,8 +426,16 @@ priv_destroy_hash_rxqs(struct priv *priv)
 	priv->hash_rxqs_n = 0;
 	rte_free(priv->hash_rxqs);
 	priv->hash_rxqs = NULL;
-	claim_zero(ibv_exp_destroy_rwq_ind_table(priv->ind_table));
-	priv->ind_table = NULL;
+	for (i = 0; (i != priv->ind_tables_n); ++i) {
+		struct ibv_exp_rwq_ind_table *ind_table =
+			(*priv->ind_tables)[i];
+
+		assert(ind_table != NULL);
+		claim_zero(ibv_exp_destroy_rwq_ind_table(ind_table));
+	}
+	priv->ind_tables_n = 0;
+	rte_free(priv->ind_tables);
+	priv->ind_tables = NULL;
 }
 
 /**
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index df1d52b..f89d3ec 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -118,9 +118,31 @@ struct rxq {
 	struct ibv_exp_res_domain *rd; /* Resource Domain. */
 };
 
+/* Hash RX queue types. */
+enum hash_rxq_type {
+	HASH_RXQ_TCPV4,
+	HASH_RXQ_UDPV4,
+	HASH_RXQ_IPV4,
+	HASH_RXQ_ETH,
+};
+
+/* Initialization data for hash RX queue. */
+struct hash_rxq_init {
+	uint64_t hash_fields; /* Fields that participate in the hash. */
+};
+
+/* Initialization data for indirection table. */
+struct ind_table_init {
+	unsigned int max_size; /* Maximum number of WQs. */
+	/* Hash RX queues using this table. */
+	unsigned int hash_types;
+	unsigned int hash_types_n;
+};
+
 struct hash_rxq {
 	struct priv *priv; /* Back pointer to private data. */
 	struct ibv_qp *qp; /* Hash RX QP. */
+	enum hash_rxq_type type; /* Hash RX queue type. */
 	/* MAC flow steering rules, one per VLAN ID. */
 	struct ibv_flow *mac_flow[MLX5_MAX_MAC_ADDRESSES][MLX5_MAX_VLAN_IDS];
 	struct ibv_flow *promisc_flow; /* Promiscuous flow. */
-- 
2.1.0


* [dpdk-dev] [PATCH v2 05/16] mlx5: adapt indirection table size depending on RX queues number
From: Adrien Mazarguil @ 2015-10-30 18:55 UTC (permalink / raw)
  To: dev

From: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>

Use the maximum indirection table size when the number of requested RX
queues is not a power of two; this helps improve RSS balancing.

A message informs users that balancing is not optimal in such cases.
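
The sizing rule can be checked with a short standalone sketch. `demo_log2above()`, `demo_wqs_n()` and `demo_fill()` are hypothetical helpers written for illustration. The first rounds up to a power of two, as the driver's `log2above()` is used to do. The second applies the power-of-two test `(n & (n - 1))` from the patch. The third fills the table round-robin, the way `priv_create_hash_rxqs()` wraps its queue index.

```c
#include <assert.h>

/* Smallest l such that (1u << l) >= v (0 for v <= 1). */
static unsigned int
demo_log2above(unsigned int v)
{
	unsigned int l;
	unsigned int r;

	/* l counts shifts down to 1; r records whether any set bit was dropped. */
	for (l = 0, r = 0; (v >> 1); ++l, v >>= 1)
		r |= (v & 1);
	return l + r;
}

/* Number of indirection table entries, following the patch's rule. */
static unsigned int
demo_wqs_n(unsigned int rxqs_n, unsigned int ind_table_max_size)
{
	unsigned int n = (rxqs_n & (rxqs_n - 1)) ? ind_table_max_size : rxqs_n;

	return 1u << demo_log2above(n);
}

/* Fill wqs_n entries round-robin over rxqs_n queues; count per queue. */
static void
demo_fill(unsigned int wqs_n, unsigned int rxqs_n, unsigned int counts[])
{
	unsigned int i;
	unsigned int j;

	for (i = 0; i != rxqs_n; ++i)
		counts[i] = 0;
	for (i = 0, j = 0; i != wqs_n; ++i) {
		++counts[j];
		if (++j == rxqs_n)
			j = 0;
	}
}
```

With three RX queues, a 4-entry table gives one queue half of the entries (2, 1, 1). A 128-entry table spreads them 43, 43, 42, so each queue receives close to a third of the hash space. This assumes the device reports a maximum indirection table size of at least 128. That is also the `RSS_INDIRECTION_TABLE_SIZE` fallback the patch uses when the device cannot be queried.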

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx5/mlx5.c      | 10 +++++++++-
 drivers/net/mlx5/mlx5.h      |  1 +
 drivers/net/mlx5/mlx5_defs.h |  3 +++
 drivers/net/mlx5/mlx5_rxq.c  | 21 ++++++++++++++-------
 4 files changed, 27 insertions(+), 8 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index e394d32..4413248 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -299,7 +299,9 @@ mlx5_pci_devinit(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		struct ether_addr mac;
 
 #ifdef HAVE_EXP_QUERY_DEVICE
-		exp_device_attr.comp_mask = IBV_EXP_DEVICE_ATTR_EXP_CAP_FLAGS;
+		exp_device_attr.comp_mask =
+			IBV_EXP_DEVICE_ATTR_EXP_CAP_FLAGS |
+			IBV_EXP_DEVICE_ATTR_RX_HASH;
 #endif /* HAVE_EXP_QUERY_DEVICE */
 
 		DEBUG("using port %u (%08" PRIx32 ")", port, test);
@@ -363,6 +365,12 @@ mlx5_pci_devinit(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		DEBUG("L2 tunnel checksum offloads are %ssupported",
 		      (priv->hw_csum_l2tun ? "" : "not "));
 
+		priv->ind_table_max_size = exp_device_attr.rx_hash_caps.max_rwq_indirection_table_size;
+		DEBUG("maximum RX indirection table size is %u",
+		      priv->ind_table_max_size);
+
+#else /* HAVE_EXP_QUERY_DEVICE */
+		priv->ind_table_max_size = RSS_INDIRECTION_TABLE_SIZE;
 #endif /* HAVE_EXP_QUERY_DEVICE */
 
 		priv->vf = vf;
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 29fc1da..5a41678 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -109,6 +109,7 @@ struct priv {
 	/* Indirection tables referencing all RX WQs. */
 	struct ibv_exp_rwq_ind_table *(*ind_tables)[];
 	unsigned int ind_tables_n; /* Number of indirection tables. */
+	unsigned int ind_table_max_size; /* Maximum indirection table size. */
 	/* Hash RX QPs feeding the indirection table. */
 	struct hash_rxq (*hash_rxqs)[];
 	unsigned int hash_rxqs_n; /* Hash RX QPs array size. */
diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h
index 369f8b6..3952c71 100644
--- a/drivers/net/mlx5/mlx5_defs.h
+++ b/drivers/net/mlx5/mlx5_defs.h
@@ -46,6 +46,9 @@
 /* Request send completion once in every 64 sends, might be less. */
 #define MLX5_PMD_TX_PER_COMP_REQ 64
 
+/* RSS Indirection table size. */
+#define RSS_INDIRECTION_TABLE_SIZE 128
+
 /* Maximum number of Scatter/Gather Elements per Work Request. */
 #ifndef MLX5_PMD_SGE_WR_N
 #define MLX5_PMD_SGE_WR_N 4
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 8ea1267..41f8811 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -224,7 +224,13 @@ priv_make_ind_table_init(struct priv *priv,
 int
 priv_create_hash_rxqs(struct priv *priv)
 {
-	unsigned int wqs_n = (1 << log2above(priv->rxqs_n));
+	/* If the requested number of WQs is not a power of two, use the
+	 * maximum indirection table size for better balancing.
+	 * The result is always rounded to the next power of two. */
+	unsigned int wqs_n =
+		(1 << log2above((priv->rxqs_n & (priv->rxqs_n - 1)) ?
+				priv->ind_table_max_size :
+				priv->rxqs_n));
 	struct ibv_exp_wq *wqs[wqs_n];
 	struct ind_table_init ind_table_init[IND_TABLE_INIT_N];
 	unsigned int ind_tables_n =
@@ -251,16 +257,17 @@ priv_create_hash_rxqs(struct priv *priv)
 		      " indirection table cannot be created");
 		return EINVAL;
 	}
-	if (wqs_n < priv->rxqs_n) {
+	if ((wqs_n < priv->rxqs_n) || (wqs_n > priv->ind_table_max_size)) {
 		ERROR("cannot handle this many RX queues (%u)", priv->rxqs_n);
 		err = ERANGE;
 		goto error;
 	}
-	if (wqs_n != priv->rxqs_n)
-		WARN("%u RX queues are configured, consider rounding this"
-		     " number to the next power of two (%u) for optimal"
-		     " performance",
-		     priv->rxqs_n, wqs_n);
+	if (wqs_n != priv->rxqs_n) {
+		INFO("%u RX queues are configured, consider rounding this"
+		     " number to the next power of two for better balancing",
+		     priv->rxqs_n);
+		DEBUG("indirection table extended to assume %u WQs", wqs_n);
+	}
 	/* When the number of RX queues is not a power of two, the remaining
 	 * table entries are padded with reused WQs and hashes are not spread
 	 * uniformly. */
-- 
2.1.0


* [dpdk-dev] [PATCH v2 06/16] mlx5: define specific flow steering rules for each hash RX QP
From: Adrien Mazarguil @ 2015-10-30 18:55 UTC (permalink / raw)
  To: dev

From: Olga Shern <olgas@mellanox.com>

All hash RX QPs currently use the same flow steering rule (L2 MAC filtering)
regardless of their type (TCP, UDP, IPv4, IPv6), which prevents them from
being dispatched properly. This is fixed by adding flow information to the
hash RX queue initialization data and generating specific flow steering
rules for each of them.
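
The `underlayer` links added to `hash_rxq_init[]` can be turned into a layered rule. Here is a minimal model that does not touch Verbs. The structure, enum names, byte sizes and `demo_flow_attr()` are hypothetical, and the sizes are arbitrary placeholders rather than the real `ibv_flow_spec_*` sizes. Walking `underlayer` from a TCPv4 entry yields TCP, IPv4 and Ethernet. The specifications are emitted lowest layer first, consistent with the patch's assertion that the first specification must be Ethernet. The size query is loosely modeled on the `hash_rxq_flow_attr(hash_rxq, NULL, 0)` call in the patch: a too-small buffer only returns the required count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flow specification kinds. */
enum demo_spec { SPEC_ETH, SPEC_IPV4, SPEC_TCP, SPEC_UDP };

/* Hypothetical hash RX queue types, in the same order as the patch. */
enum demo_hash { H_TCPV4, H_UDPV4, H_IPV4, H_ETH };

struct demo_init {
	enum demo_spec spec;                /* Specification for this layer. */
	unsigned int spec_size;             /* Placeholder size in bytes. */
	unsigned int flow_priority;         /* Values copied from the patch. */
	const struct demo_init *underlayer; /* Lower layer, NULL for L2. */
};

static const struct demo_init demo_inits[] = {
	[H_TCPV4] = { SPEC_TCP, 16, 0, &demo_inits[H_IPV4] },
	[H_UDPV4] = { SPEC_UDP, 16, 0, &demo_inits[H_IPV4] },
	[H_IPV4] = { SPEC_IPV4, 24, 1, &demo_inits[H_ETH] },
	[H_ETH] = { SPEC_ETH, 40, 2, NULL },
};

/*
 * Write the specifications for init into out[], lowest layer first, and
 * store their combined size in *size. Returns the number of
 * specifications; nothing is written to out[] if out_n is too small.
 */
static unsigned int
demo_flow_attr(const struct demo_init *init, enum demo_spec out[],
	       unsigned int out_n, unsigned int *size)
{
	const struct demo_init *cur;
	unsigned int n = 0;
	unsigned int i;

	*size = 0;
	for (cur = init; cur != NULL; cur = cur->underlayer) {
		++n;
		*size += cur->spec_size;
	}
	if (n > out_n)
		return n;
	/* The top layer goes last, so fill from the end backwards. */
	i = n;
	for (cur = init; cur != NULL; cur = cur->underlayer)
		out[--i] = cur->spec;
	return n;
}
```

In the patch, more specific types also carry lower `flow_priority` values (0 for TCP and UDP, 1 for IPv4, 2 for Ethernet). This is presumably so that the most specific rule wins when several rules match the same packet.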

Signed-off-by: Olga Shern <olgas@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 drivers/net/mlx5/mlx5_mac.c  | 19 ++++-------
 drivers/net/mlx5/mlx5_rxq.c  | 77 ++++++++++++++++++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_rxtx.h | 21 ++++++++++++
 3 files changed, 105 insertions(+), 12 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_mac.c b/drivers/net/mlx5/mlx5_mac.c
index b580494..d3ab5b9 100644
--- a/drivers/net/mlx5/mlx5_mac.c
+++ b/drivers/net/mlx5/mlx5_mac.c
@@ -242,12 +242,9 @@ hash_rxq_add_mac_flow(struct hash_rxq *hash_rxq, unsigned int mac_index,
 	const uint8_t (*mac)[ETHER_ADDR_LEN] =
 			(const uint8_t (*)[ETHER_ADDR_LEN])
 			priv->mac[mac_index].addr_bytes;
-	struct __attribute__((packed)) {
-		struct ibv_flow_attr attr;
-		struct ibv_flow_spec_eth spec;
-	} data;
-	struct ibv_flow_attr *attr = &data.attr;
-	struct ibv_flow_spec_eth *spec = &data.spec;
+	FLOW_ATTR_SPEC_ETH(data, hash_rxq_flow_attr(hash_rxq, NULL, 0));
+	struct ibv_flow_attr *attr = &data->attr;
+	struct ibv_flow_spec_eth *spec = &data->spec;
 	unsigned int vlan_enabled = !!priv->vlan_filter_n;
 	unsigned int vlan_id = priv->vlan_filter[vlan_index];
 
@@ -260,12 +257,10 @@ hash_rxq_add_mac_flow(struct hash_rxq *hash_rxq, unsigned int mac_index,
 	 * This layout is expected by libibverbs.
 	 */
 	assert(((uint8_t *)attr + sizeof(*attr)) == (uint8_t *)spec);
-	*attr = (struct ibv_flow_attr){
-		.type = IBV_FLOW_ATTR_NORMAL,
-		.num_of_specs = 1,
-		.port = priv->port,
-		.flags = 0
-	};
+	hash_rxq_flow_attr(hash_rxq, attr, sizeof(data));
+	/* The first specification must be Ethernet. */
+	assert(spec->type == IBV_FLOW_SPEC_ETH);
+	assert(spec->size == sizeof(*spec));
 	*spec = (struct ibv_flow_spec_eth){
 		.type = IBV_FLOW_SPEC_ETH,
 		.size = sizeof(*spec),
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 41f8811..1e15ff9 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -71,19 +71,43 @@ static const struct hash_rxq_init hash_rxq_init[] = {
 				IBV_EXP_RX_HASH_DST_IPV4 |
 				IBV_EXP_RX_HASH_SRC_PORT_TCP |
 				IBV_EXP_RX_HASH_DST_PORT_TCP),
+		.flow_priority = 0,
+		.flow_spec.tcp_udp = {
+			.type = IBV_FLOW_SPEC_TCP,
+			.size = sizeof(hash_rxq_init[0].flow_spec.tcp_udp),
+		},
+		.underlayer = &hash_rxq_init[HASH_RXQ_IPV4],
 	},
 	[HASH_RXQ_UDPV4] = {
 		.hash_fields = (IBV_EXP_RX_HASH_SRC_IPV4 |
 				IBV_EXP_RX_HASH_DST_IPV4 |
 				IBV_EXP_RX_HASH_SRC_PORT_UDP |
 				IBV_EXP_RX_HASH_DST_PORT_UDP),
+		.flow_priority = 0,
+		.flow_spec.tcp_udp = {
+			.type = IBV_FLOW_SPEC_UDP,
+			.size = sizeof(hash_rxq_init[0].flow_spec.tcp_udp),
+		},
+		.underlayer = &hash_rxq_init[HASH_RXQ_IPV4],
 	},
 	[HASH_RXQ_IPV4] = {
 		.hash_fields = (IBV_EXP_RX_HASH_SRC_IPV4 |
 				IBV_EXP_RX_HASH_DST_IPV4),
+		.flow_priority = 1,
+		.flow_spec.ipv4 = {
+			.type = IBV_FLOW_SPEC_IPV4,
+			.size = sizeof(hash_rxq_init[0].flow_spec.ipv4),
+		},
+		.underlayer = &hash_rxq_init[HASH_RXQ_ETH],
 	},
 	[HASH_RXQ_ETH] = {
 		.hash_fields = 0,
+		.flow_priority = 2,
+		.flow_spec.eth = {
+			.type = IBV_FLOW_SPEC_ETH,
+			.size = sizeof(hash_rxq_init[0].flow_spec.eth),
+		},
+		.underlayer = NULL,
 	},
 };
 
@@ -125,6 +149,59 @@ static uint8_t hash_rxq_default_key[] = {
 };
 
 /**
+ * Populate flow steering rule for a given hash RX queue type using
+ * information from hash_rxq_init[]. Nothing is written to flow_attr when
+ * flow_attr_size is not large enough, but the required size is still returned.
+ *
+ * @param[in] hash_rxq
+ *   Pointer to hash RX queue.
+ * @param[out] flow_attr
+ *   Pointer to flow attribute structure to fill. Note that the allocated
+ *   area must be larger and large enough to hold all flow specifications.
+ * @param flow_attr_size
+ *   Entire size of flow_attr and trailing room for flow specifications.
+ *
+ * @return
+ *   Total size of the flow attribute buffer. No errors are defined.
+ */
+size_t
+hash_rxq_flow_attr(const struct hash_rxq *hash_rxq,
+		   struct ibv_flow_attr *flow_attr,
+		   size_t flow_attr_size)
+{
+	size_t offset = sizeof(*flow_attr);
+	enum hash_rxq_type type = hash_rxq->type;
+	const struct hash_rxq_init *init = &hash_rxq_init[type];
+
+	assert(hash_rxq->priv != NULL);
+	assert((size_t)type < RTE_DIM(hash_rxq_init));
+	do {
+		offset += init->flow_spec.hdr.size;
+		init = init->underlayer;
+	} while (init != NULL);
+	if (offset > flow_attr_size)
+		return offset;
+	flow_attr_size = offset;
+	init = &hash_rxq_init[type];
+	*flow_attr = (struct ibv_flow_attr){
+		.type = IBV_FLOW_ATTR_NORMAL,
+		.priority = init->flow_priority,
+		.num_of_specs = 0,
+		.port = hash_rxq->priv->port,
+		.flags = 0,
+	};
+	do {
+		offset -= init->flow_spec.hdr.size;
+		memcpy((void *)((uintptr_t)flow_attr + offset),
+		       &init->flow_spec,
+		       init->flow_spec.hdr.size);
+		++flow_attr->num_of_specs;
+		init = init->underlayer;
+	} while (init != NULL);
+	return flow_attr_size;
+}
+
+/**
  * Return nearest power of two above input value.
  *
  * @param v
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index f89d3ec..c31fa8e 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -34,6 +34,7 @@
 #ifndef RTE_PMD_MLX5_RXTX_H_
 #define RTE_PMD_MLX5_RXTX_H_
 
+#include <stddef.h>
 #include <stdint.h>
 
 /* Verbs header. */
@@ -126,9 +127,27 @@ enum hash_rxq_type {
 	HASH_RXQ_ETH,
 };
 
+/* Flow structure with Ethernet specification. It is packed to prevent padding
+ * between attr and spec as this layout is expected by libibverbs. */
+struct flow_attr_spec_eth {
+	struct ibv_flow_attr attr;
+	struct ibv_flow_spec_eth spec;
+} __attribute__((packed));
+
+/* Define a struct flow_attr_spec_eth object as an array of at least
+ * "size" bytes. Room after the first index is normally used to store
+ * extra flow specifications. */
+#define FLOW_ATTR_SPEC_ETH(name, size) \
+	struct flow_attr_spec_eth name \
+		[((size) / sizeof(struct flow_attr_spec_eth)) + \
+		 !!((size) % sizeof(struct flow_attr_spec_eth))]
+
 /* Initialization data for hash RX queue. */
 struct hash_rxq_init {
 	uint64_t hash_fields; /* Fields that participate in the hash. */
+	unsigned int flow_priority; /* Flow priority to use. */
+	struct ibv_flow_spec flow_spec; /* Flow specification template. */
+	const struct hash_rxq_init *underlayer; /* Pointer to underlayer. */
 };
 
 /* Initialization data for indirection table. */
@@ -193,6 +212,8 @@ struct txq {
 
 /* mlx5_rxq.c */
 
+size_t hash_rxq_flow_attr(const struct hash_rxq *, struct ibv_flow_attr *,
+			  size_t);
 int priv_create_hash_rxqs(struct priv *);
 void priv_destroy_hash_rxqs(struct priv *);
 void rxq_cleanup(struct rxq *);
-- 
2.1.0

* [dpdk-dev] [PATCH v2 07/16] mlx5: use alternate method to configure promisc and allmulti modes
From: Adrien Mazarguil @ 2015-10-30 18:55 UTC
  To: dev; +Cc: Yaacov Hazan

From: Olga Shern <olgas@mellanox.com>

Promiscuous and allmulticast modes were historically enabled by adding
specific flows with types IBV_FLOW_ATTR_ALL_DEFAULT or
IBV_FLOW_ATTR_MC_DEFAULT to each hash RX queue, but this method is
deprecated.

- Promiscuous mode is now enabled by omitting destination MAC addresses from
  basic flow specifications.
- Allmulticast mode is now enabled by using flow specifications that match
  the multicast (I/G) bit, i.e. the least significant bit of the first
  octet of destination MAC addresses.
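
The value/mask matching behind both modes can be illustrated with a small,
self-contained sketch. It assumes Verbs-style semantics where only the bits
set in the mask are compared; dst_mac_matches() and the arrays below are
hypothetical names used for illustration only. Comparing only the least
significant bit of the first octet selects every multicast address,
broadcast (ff:ff:ff:ff:ff:ff) included, while an all-zero mask filters
nothing, which is how promiscuous mode is obtained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAC_LEN 6

/* A destination MAC matches when every bit selected by the mask equals the
 * corresponding bit of the value. An all-zero mask selects no bit, so it
 * matches every address. */
bool dst_mac_matches(const uint8_t mac[MAC_LEN],
		     const uint8_t val[MAC_LEN],
		     const uint8_t mask[MAC_LEN])
{
	assert(mac != NULL && val != NULL && mask != NULL);
	for (int i = 0; i < MAC_LEN; ++i)
		if ((mac[i] & mask[i]) != (val[i] & mask[i]))
			return false;
	return true;
}

/* Allmulticast: compare only the I/G bit of the first octet. */
static const uint8_t allmulti_val[MAC_LEN] = { 0x01, 0, 0, 0, 0, 0 };
static const uint8_t allmulti_mask[MAC_LEN] = { 0x01, 0, 0, 0, 0, 0 };

/* Promiscuous: no destination MAC filtering at all. */
static const uint8_t no_filter[MAC_LEN] = { 0 };
```

A unicast address such as 02:00:00:00:00:01 fails the allmulticast match
because its I/G bit is clear, while 01:00:5e:00:00:01 and the broadcast
address both pass it.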

Signed-off-by: Olga Shern <olgas@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Yaacov Hazan <yaacovh@mellanox.com>
---
 drivers/net/mlx5/mlx5_rxmode.c | 44 +++++++++++++++++++++++++-----------------
 1 file changed, 26 insertions(+), 18 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_rxmode.c b/drivers/net/mlx5/mlx5_rxmode.c
index 79e31fb..7794608 100644
--- a/drivers/net/mlx5/mlx5_rxmode.c
+++ b/drivers/net/mlx5/mlx5_rxmode.c
@@ -74,20 +74,17 @@ static int
 hash_rxq_promiscuous_enable(struct hash_rxq *hash_rxq)
 {
 	struct ibv_flow *flow;
-	struct ibv_flow_attr attr = {
-		.type = IBV_FLOW_ATTR_ALL_DEFAULT,
-		.num_of_specs = 0,
-		.port = hash_rxq->priv->port,
-		.flags = 0
-	};
+	FLOW_ATTR_SPEC_ETH(data, hash_rxq_flow_attr(hash_rxq, NULL, 0));
+	struct ibv_flow_attr *attr = &data->attr;
 
-	if (hash_rxq->priv->vf)
-		return 0;
 	if (hash_rxq->promisc_flow != NULL)
 		return 0;
 	DEBUG("%p: enabling promiscuous mode", (void *)hash_rxq);
+	/* Promiscuous flows only differ from normal flows by not filtering
+	 * on specific MAC addresses. */
+	hash_rxq_flow_attr(hash_rxq, attr, sizeof(data));
 	errno = 0;
-	flow = ibv_create_flow(hash_rxq->qp, &attr);
+	flow = ibv_create_flow(hash_rxq->qp, attr);
 	if (flow == NULL) {
 		/* It's not clear whether errno is always set in this case. */
 		ERROR("%p: flow configuration failed, errno=%d: %s",
@@ -162,8 +159,6 @@ mlx5_promiscuous_enable(struct rte_eth_dev *dev)
 static void
 hash_rxq_promiscuous_disable(struct hash_rxq *hash_rxq)
 {
-	if (hash_rxq->priv->vf)
-		return;
 	if (hash_rxq->promisc_flow == NULL)
 		return;
 	DEBUG("%p: disabling promiscuous mode", (void *)hash_rxq);
@@ -217,18 +212,31 @@ static int
 hash_rxq_allmulticast_enable(struct hash_rxq *hash_rxq)
 {
 	struct ibv_flow *flow;
-	struct ibv_flow_attr attr = {
-		.type = IBV_FLOW_ATTR_MC_DEFAULT,
-		.num_of_specs = 0,
-		.port = hash_rxq->priv->port,
-		.flags = 0
-	};
+	FLOW_ATTR_SPEC_ETH(data, hash_rxq_flow_attr(hash_rxq, NULL, 0));
+	struct ibv_flow_attr *attr = &data->attr;
+	struct ibv_flow_spec_eth *spec = &data->spec;
 
 	if (hash_rxq->allmulti_flow != NULL)
 		return 0;
 	DEBUG("%p: enabling allmulticast mode", (void *)hash_rxq);
+	/*
+	 * No padding must be inserted by the compiler between attr and spec.
+	 * This layout is expected by libibverbs.
+	 */
+	assert(((uint8_t *)attr + sizeof(*attr)) == (uint8_t *)spec);
+	hash_rxq_flow_attr(hash_rxq, attr, sizeof(data));
+	*spec = (struct ibv_flow_spec_eth){
+		.type = IBV_FLOW_SPEC_ETH,
+		.size = sizeof(*spec),
+		.val = {
+			.dst_mac = "\x01\x00\x00\x00\x00\x00",
+		},
+		.mask = {
+			.dst_mac = "\x01\x00\x00\x00\x00\x00",
+		},
+	};
 	errno = 0;
-	flow = ibv_create_flow(hash_rxq->qp, &attr);
+	flow = ibv_create_flow(hash_rxq->qp, attr);
 	if (flow == NULL) {
 		/* It's not clear whether errno is always set in this case. */
 		ERROR("%p: flow configuration failed, errno=%d: %s",
-- 
2.1.0

* [dpdk-dev] [PATCH v2 08/16] mlx5: add RSS hash update/get
From: Adrien Mazarguil @ 2015-10-30 18:55 UTC
  To: dev

From: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>

First implementation of rss_hash_update and rss_hash_conf_get.  These
functions still lack functionality but can already be used to change the
RSS hash key.  For now, the PMD does not handle an indirection table for
each kind of flow (IPv4, IPv6, etc.), so the same RSS hash key is used for
all protocols.  This is why rss_hash_conf_get returns the RSS hash key for
all protocols supported by DPDK and why the hash key is set for all of
them too.
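
rss_hash_rss_conf_new_key() stores the key in the same allocation as the
configuration structure, right after it. The following self-contained
sketch reproduces that layout with the standard realloc() instead of
rte_realloc(); struct rss_conf, rss_conf_set_key() and rss_conf_get_key()
are hypothetical stand-ins for illustration, not DPDK API. The getter
follows mlx5_rss_hash_conf_get(): the key is copied only when the caller's
buffer is large enough, but the stored length is always reported.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the key part of struct rte_eth_rss_conf. */
struct rss_conf {
	uint8_t *rss_key;
	uint8_t rss_key_len;
};

/* (Re)allocate *conf with room for key_len bytes right after the
 * structure and copy key there. Returns 0 on success or ENOMEM, in which
 * case the previous configuration is left untouched. */
int rss_conf_set_key(struct rss_conf **conf, const uint8_t *key,
		     uint8_t key_len)
{
	struct rss_conf *c = realloc(*conf, sizeof(*c) + key_len);

	if (c == NULL)
		return ENOMEM;
	c->rss_key = (uint8_t *)(c + 1);
	c->rss_key_len = key_len;
	memcpy(c->rss_key, key, key_len);
	*conf = c;
	return 0;
}

/* Copy the stored key into key only if *key_len is large enough, then
 * always report the stored length through *key_len. */
void rss_conf_get_key(const struct rss_conf *conf, uint8_t *key,
		      uint8_t *key_len)
{
	assert(conf != NULL && key_len != NULL);
	if (key != NULL && *key_len >= conf->rss_key_len)
		memcpy(key, conf->rss_key, conf->rss_key_len);
	*key_len = conf->rss_key_len;
}
```

Because the key lives inside the same block, a single free() (rte_free() in
the driver) releases both, which is what mlx5_dev_close() relies on.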

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx5/Makefile    |   1 +
 drivers/net/mlx5/mlx5.c      |  10 +++
 drivers/net/mlx5/mlx5.h      |   7 ++
 drivers/net/mlx5/mlx5_rss.c  | 168 +++++++++++++++++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_rxq.c  |  10 ++-
 drivers/net/mlx5/mlx5_rxtx.h |   3 +
 6 files changed, 196 insertions(+), 3 deletions(-)
 create mode 100644 drivers/net/mlx5/mlx5_rss.c

diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index 938f924..54f1e89 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -51,6 +51,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_mac.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_rxmode.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_vlan.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_stats.c
+SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_rss.c
 
 # Dependencies.
 DEPDIRS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += lib/librte_ether
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 4413248..5a3d198 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -127,6 +127,7 @@ mlx5_dev_close(struct rte_eth_dev *dev)
 		claim_zero(ibv_close_device(priv->ctx));
 	} else
 		assert(priv->ctx == NULL);
+	rte_free(priv->rss_conf);
 	priv_unlock(priv);
 	memset(priv, 0, sizeof(*priv));
 }
@@ -154,6 +155,8 @@ static const struct eth_dev_ops mlx5_dev_ops = {
 	.mac_addr_remove = mlx5_mac_addr_remove,
 	.mac_addr_add = mlx5_mac_addr_add,
 	.mtu_set = mlx5_dev_set_mtu,
+	.rss_hash_update = mlx5_rss_hash_update,
+	.rss_hash_conf_get = mlx5_rss_hash_conf_get,
 };
 
 static struct {
@@ -374,6 +377,12 @@ mlx5_pci_devinit(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 #endif /* HAVE_EXP_QUERY_DEVICE */
 
 		priv->vf = vf;
+		/* Register default RSS hash key. */
+		err = rss_hash_rss_conf_new_key(priv,
+						rss_hash_default_key,
+						rss_hash_default_key_len);
+		if (err)
+			goto port_error;
 		/* Configure the first MAC address by default. */
 		if (priv_get_mac(priv, &mac.addr_bytes)) {
 			ERROR("cannot get MAC address, is mlx5_en loaded?"
@@ -437,6 +446,7 @@ mlx5_pci_devinit(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		continue;
 
 port_error:
+		rte_free(priv->rss_conf);
 		rte_free(priv);
 		if (pd)
 			claim_zero(ibv_dealloc_pd(pd));
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 5a41678..70bacf5 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -113,6 +113,7 @@ struct priv {
 	/* Hash RX QPs feeding the indirection table. */
 	struct hash_rxq (*hash_rxqs)[];
 	unsigned int hash_rxqs_n; /* Hash RX QPs array size. */
+	struct rte_eth_rss_conf *rss_conf; /* RSS configuration. */
 	rte_spinlock_t lock; /* Lock for control functions. */
 };
 
@@ -169,6 +170,12 @@ int priv_mac_addrs_enable(struct priv *);
 void mlx5_mac_addr_add(struct rte_eth_dev *, struct ether_addr *, uint32_t,
 		       uint32_t);
 
+/* mlx5_rss.c */
+
+int rss_hash_rss_conf_new_key(struct priv *, const uint8_t *, unsigned int);
+int mlx5_rss_hash_update(struct rte_eth_dev *, struct rte_eth_rss_conf *);
+int mlx5_rss_hash_conf_get(struct rte_eth_dev *, struct rte_eth_rss_conf *);
+
 /* mlx5_rxmode.c */
 
 int priv_promiscuous_enable(struct priv *);
diff --git a/drivers/net/mlx5/mlx5_rss.c b/drivers/net/mlx5/mlx5_rss.c
new file mode 100644
index 0000000..2dc58e5
--- /dev/null
+++ b/drivers/net/mlx5/mlx5_rss.c
@@ -0,0 +1,168 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2015 6WIND S.A.
+ *   Copyright 2015 Mellanox.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stddef.h>
+#include <stdint.h>
+#include <errno.h>
+#include <string.h>
+#include <assert.h>
+
+/* Verbs header. */
+/* ISO C doesn't support unnamed structs/unions, disabling -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-pedantic"
+#endif
+#include <infiniband/verbs.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-pedantic"
+#endif
+
+/* DPDK headers don't like -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-pedantic"
+#endif
+#include <rte_malloc.h>
+#include <rte_ethdev.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-pedantic"
+#endif
+
+#include "mlx5.h"
+#include "mlx5_rxtx.h"
+
+/**
+ * Register a RSS key.
+ *
+ * @param priv
+ *   Pointer to private structure.
+ * @param key
+ *   Hash key to register.
+ * @param key_len
+ *   Hash key length in bytes.
+ *
+ * @return
+ *   0 on success, errno value on failure.
+ */
+int
+rss_hash_rss_conf_new_key(struct priv *priv, const uint8_t *key,
+			  unsigned int key_len)
+{
+	struct rte_eth_rss_conf *rss_conf;
+
+	rss_conf = rte_realloc(priv->rss_conf,
+			       (sizeof(*rss_conf) + key_len),
+			       0);
+	if (!rss_conf)
+		return ENOMEM;
+	rss_conf->rss_key = (void *)(rss_conf + 1);
+	rss_conf->rss_key_len = key_len;
+	memcpy(rss_conf->rss_key, key, key_len);
+	priv->rss_conf = rss_conf;
+	return 0;
+}
+
+/**
+ * DPDK callback to update the RSS hash configuration.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param[in] rss_conf
+ *   RSS configuration data.
+ *
+ * @return
+ *   0 on success, negative errno value on failure.
+ */
+int
+mlx5_rss_hash_update(struct rte_eth_dev *dev,
+		     struct rte_eth_rss_conf *rss_conf)
+{
+	struct priv *priv = dev->data->dev_private;
+	int err = 0;
+
+	priv_lock(priv);
+
+	assert(priv->rss_conf != NULL);
+
+	/* Apply configuration. */
+	if (rss_conf->rss_key)
+		err = rss_hash_rss_conf_new_key(priv,
+						rss_conf->rss_key,
+						rss_conf->rss_key_len);
+	else
+		err = rss_hash_rss_conf_new_key(priv,
+						rss_hash_default_key,
+						rss_hash_default_key_len);
+
+	/* Store the configuration set into port configure.
+	 * This will enable/disable hash RX queues associated to the protocols
+	 * enabled/disabled by this update. */
+	priv->dev->data->dev_conf.rx_adv_conf.rss_conf.rss_hf =
+		rss_conf->rss_hf;
+	priv_unlock(priv);
+	assert(err >= 0);
+	return -err;
+}
+
+/**
+ * DPDK callback to get the RSS hash configuration.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param[in, out] rss_conf
+ *   RSS configuration data.
+ *
+ * @return
+ *   0 on success, negative errno value on failure.
+ */
+int
+mlx5_rss_hash_conf_get(struct rte_eth_dev *dev,
+		       struct rte_eth_rss_conf *rss_conf)
+{
+	struct priv *priv = dev->data->dev_private;
+
+	priv_lock(priv);
+
+	assert(priv->rss_conf != NULL);
+
+	if (rss_conf->rss_key &&
+	    rss_conf->rss_key_len >= priv->rss_conf->rss_key_len)
+		memcpy(rss_conf->rss_key,
+		       priv->rss_conf->rss_key,
+		       priv->rss_conf->rss_key_len);
+	rss_conf->rss_key_len = priv->rss_conf->rss_key_len;
+	/* FIXME: rss_hf should be more specific. */
+	rss_conf->rss_hf = ETH_RSS_PROTO_MASK;
+
+	priv_unlock(priv);
+	return 0;
+}
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 1e15ff9..79c2346 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -135,7 +135,7 @@ static const struct ind_table_init ind_table_init[] = {
 #define IND_TABLE_INIT_N RTE_DIM(ind_table_init)
 
 /* Default RSS hash key also used for ConnectX-3. */
-static uint8_t hash_rxq_default_key[] = {
+uint8_t rss_hash_default_key[] = {
 	0x2c, 0xc6, 0x81, 0xd1,
 	0x5b, 0xdb, 0xf4, 0xf7,
 	0xfc, 0xa2, 0x83, 0x19,
@@ -148,6 +148,9 @@ static uint8_t hash_rxq_default_key[] = {
 	0xfc, 0x1f, 0xdc, 0x2a,
 };
 
+/* Length of the default RSS hash key. */
+const size_t rss_hash_default_key_len = sizeof(rss_hash_default_key);
+
 /**
  * Populate flow steering rule for a given hash RX queue type using
  * information from hash_rxq_init[]. Nothing is written to flow_attr when
@@ -326,6 +329,7 @@ priv_create_hash_rxqs(struct priv *priv)
 	assert(priv->hash_rxqs_n == 0);
 	assert(priv->pd != NULL);
 	assert(priv->ctx != NULL);
+	assert(priv->rss_conf != NULL);
 	if (priv->rxqs_n == 0)
 		return EINVAL;
 	assert(priv->rxqs != NULL);
@@ -410,8 +414,8 @@ priv_create_hash_rxqs(struct priv *priv)
 			hash_rxq_type_from_n(&ind_table_init[j], k);
 		struct ibv_exp_rx_hash_conf hash_conf = {
 			.rx_hash_function = IBV_EXP_RX_HASH_FUNC_TOEPLITZ,
-			.rx_hash_key_len = sizeof(hash_rxq_default_key),
-			.rx_hash_key = hash_rxq_default_key,
+			.rx_hash_key_len = priv->rss_conf->rss_key_len,
+			.rx_hash_key = priv->rss_conf->rss_key,
 			.rx_hash_fields_mask = hash_rxq_init[type].hash_fields,
 			.rwq_ind_tbl = (*ind_tables)[j],
 		};
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index c31fa8e..a1bf11f 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -212,6 +212,9 @@ struct txq {
 
 /* mlx5_rxq.c */
 
+extern uint8_t rss_hash_default_key[];
+extern const size_t rss_hash_default_key_len;
+
 size_t hash_rxq_flow_attr(const struct hash_rxq *, struct ibv_flow_attr *,
 			  size_t);
 int priv_create_hash_rxqs(struct priv *);
-- 
2.1.0

* [dpdk-dev] [PATCH v2 09/16] mlx5: use one RSS hash key per flow type
From: Adrien Mazarguil @ 2015-10-30 18:55 UTC
  To: dev

From: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>

DPDK expects one RSS hash key per flow type (IPv4, IPv6, UDPv4, etc.).  To
handle this, the PMD must keep a table of hash keys so that it can
reconfigure the queues at each start/stop call.
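
A per-type key table can be sketched as follows. This is a simplified model
under stated assumptions: the RSS_* bit values, the TYPE_* indices and the
key_table helpers are hypothetical and do not match the real ETH_RSS_*
constants. Setting a key updates every queue type whose bits intersect the
requested rss_hf, and a lookup returns the key of the first matching type,
mirroring rss_hash_rss_conf_new_key() and rss_hash_get() in this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flow-type bits; the real ETH_RSS_* values differ. */
#define RSS_IPV4  (UINT64_C(1) << 0)
#define RSS_TCPV4 (UINT64_C(1) << 1)
#define RSS_UDPV4 (UINT64_C(1) << 2)

enum { TYPE_TCPV4, TYPE_UDPV4, TYPE_IPV4, TYPE_N };

/* rss_hf bits handled by each queue type, like dpdk_rss_hf. */
static const uint64_t type_rss_hf[TYPE_N] = {
	[TYPE_TCPV4] = RSS_TCPV4,
	[TYPE_UDPV4] = RSS_UDPV4,
	[TYPE_IPV4] = RSS_IPV4,
};

/* One key per queue type. */
struct key_table {
	const uint8_t *key[TYPE_N];
};

/* Assign key to every type whose bits intersect rss_hf. */
void key_table_set(struct key_table *t, const uint8_t *key, uint64_t rss_hf)
{
	assert(t != NULL && key != NULL);
	for (unsigned int i = 0; i != TYPE_N; ++i)
		if (type_rss_hf[i] & rss_hf)
			t->key[i] = key;
}

/* Start with the same default key for every type. */
void key_table_init(struct key_table *t, const uint8_t *default_key)
{
	key_table_set(t, default_key, ~UINT64_C(0));
}

/* Return the key of the first type matching rss_hf, or NULL when rss_hf
 * selects no known type. */
const uint8_t *key_table_get(const struct key_table *t, uint64_t rss_hf)
{
	for (unsigned int i = 0; i != TYPE_N; ++i)
		if (type_rss_hf[i] & rss_hf)
			return t->key[i];
	return NULL;
}
```

In the driver, a lookup that matches no type makes mlx5_rss_hash_conf_get()
return -EINVAL, and a type with no stored key (such as HASH_RXQ_ETH, whose
dpdk_rss_hf is 0) falls back to rss_hash_default_key when queues are
created.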

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx5/mlx5.c      | 17 +++++++--
 drivers/net/mlx5/mlx5.h      |  6 ++--
 drivers/net/mlx5/mlx5_rss.c  | 85 +++++++++++++++++++++++++++++++++-----------
 drivers/net/mlx5/mlx5_rxq.c  | 24 +++++++++----
 drivers/net/mlx5/mlx5_rxtx.h |  4 +++
 5 files changed, 105 insertions(+), 31 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 5a3d198..97ce902 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -127,7 +127,11 @@ mlx5_dev_close(struct rte_eth_dev *dev)
 		claim_zero(ibv_close_device(priv->ctx));
 	} else
 		assert(priv->ctx == NULL);
-	rte_free(priv->rss_conf);
+	if (priv->rss_conf != NULL) {
+		for (i = 0; (i != hash_rxq_init_n); ++i)
+			rte_free((*priv->rss_conf)[i]);
+		rte_free(priv->rss_conf);
+	}
 	priv_unlock(priv);
 	memset(priv, 0, sizeof(*priv));
 }
@@ -377,10 +381,17 @@ mlx5_pci_devinit(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 #endif /* HAVE_EXP_QUERY_DEVICE */
 
 		priv->vf = vf;
-		/* Register default RSS hash key. */
+		/* Allocate and register default RSS hash keys. */
+		priv->rss_conf = rte_calloc(__func__, hash_rxq_init_n,
+					    sizeof((*priv->rss_conf)[0]), 0);
+		if (priv->rss_conf == NULL) {
+			err = ENOMEM;
+			goto port_error;
+		}
 		err = rss_hash_rss_conf_new_key(priv,
 						rss_hash_default_key,
-						rss_hash_default_key_len);
+						rss_hash_default_key_len,
+						ETH_RSS_PROTO_MASK);
 		if (err)
 			goto port_error;
 		/* Configure the first MAC address by default. */
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 70bacf5..03e33d6 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -113,7 +113,8 @@ struct priv {
 	/* Hash RX QPs feeding the indirection table. */
 	struct hash_rxq (*hash_rxqs)[];
 	unsigned int hash_rxqs_n; /* Hash RX QPs array size. */
-	struct rte_eth_rss_conf *rss_conf; /* RSS configuration. */
+	/* RSS configuration array indexed by hash RX queue type. */
+	struct rte_eth_rss_conf *(*rss_conf)[];
 	rte_spinlock_t lock; /* Lock for control functions. */
 };
 
@@ -172,7 +173,8 @@ void mlx5_mac_addr_add(struct rte_eth_dev *, struct ether_addr *, uint32_t,
 
 /* mlx5_rss.c */
 
-int rss_hash_rss_conf_new_key(struct priv *, const uint8_t *, unsigned int);
+int rss_hash_rss_conf_new_key(struct priv *, const uint8_t *, unsigned int,
+			      uint64_t);
 int mlx5_rss_hash_update(struct rte_eth_dev *, struct rte_eth_rss_conf *);
 int mlx5_rss_hash_conf_get(struct rte_eth_dev *, struct rte_eth_rss_conf *);
 
diff --git a/drivers/net/mlx5/mlx5_rss.c b/drivers/net/mlx5/mlx5_rss.c
index 2dc58e5..bf19aca 100644
--- a/drivers/net/mlx5/mlx5_rss.c
+++ b/drivers/net/mlx5/mlx5_rss.c
@@ -61,6 +61,33 @@
 #include "mlx5_rxtx.h"
 
 /**
+ * Get a RSS configuration hash key.
+ *
+ * @param priv
+ *   Pointer to private structure.
+ * @param rss_hf
+ *   RSS hash functions configuration must be retrieved for.
+ *
+ * @return
+ *   Pointer to a RSS configuration structure or NULL if rss_hf cannot
+ *   be matched.
+ */
+static struct rte_eth_rss_conf *
+rss_hash_get(struct priv *priv, uint64_t rss_hf)
+{
+	unsigned int i;
+
+	for (i = 0; (i != hash_rxq_init_n); ++i) {
+		uint64_t dpdk_rss_hf = hash_rxq_init[i].dpdk_rss_hf;
+
+		if (!(dpdk_rss_hf & rss_hf))
+			continue;
+		return (*priv->rss_conf)[i];
+	}
+	return NULL;
+}
+
+/**
  * Register a RSS key.
  *
  * @param priv
@@ -69,25 +96,35 @@
  *   Hash key to register.
  * @param key_len
  *   Hash key length in bytes.
+ * @param rss_hf
+ *   RSS hash functions the provided key applies to.
  *
  * @return
  *   0 on success, errno value on failure.
  */
 int
 rss_hash_rss_conf_new_key(struct priv *priv, const uint8_t *key,
-			  unsigned int key_len)
+			  unsigned int key_len, uint64_t rss_hf)
 {
-	struct rte_eth_rss_conf *rss_conf;
-
-	rss_conf = rte_realloc(priv->rss_conf,
-			       (sizeof(*rss_conf) + key_len),
-			       0);
-	if (!rss_conf)
-		return ENOMEM;
-	rss_conf->rss_key = (void *)(rss_conf + 1);
-	rss_conf->rss_key_len = key_len;
-	memcpy(rss_conf->rss_key, key, key_len);
-	priv->rss_conf = rss_conf;
+	unsigned int i;
+
+	for (i = 0; (i != hash_rxq_init_n); ++i) {
+		struct rte_eth_rss_conf *rss_conf;
+		uint64_t dpdk_rss_hf = hash_rxq_init[i].dpdk_rss_hf;
+
+		if (!(dpdk_rss_hf & rss_hf))
+			continue;
+		rss_conf = rte_realloc((*priv->rss_conf)[i],
+				       (sizeof(*rss_conf) + key_len),
+				       0);
+		if (!rss_conf)
+			return ENOMEM;
+		rss_conf->rss_key = (void *)(rss_conf + 1);
+		rss_conf->rss_key_len = key_len;
+		rss_conf->rss_hf = dpdk_rss_hf;
+		memcpy(rss_conf->rss_key, key, key_len);
+		(*priv->rss_conf)[i] = rss_conf;
+	}
 	return 0;
 }
 
@@ -117,11 +154,13 @@ mlx5_rss_hash_update(struct rte_eth_dev *dev,
 	if (rss_conf->rss_key)
 		err = rss_hash_rss_conf_new_key(priv,
 						rss_conf->rss_key,
-						rss_conf->rss_key_len);
+						rss_conf->rss_key_len,
+						rss_conf->rss_hf);
 	else
 		err = rss_hash_rss_conf_new_key(priv,
 						rss_hash_default_key,
-						rss_hash_default_key_len);
+						rss_hash_default_key_len,
+						ETH_RSS_PROTO_MASK);
 
 	/* Store the configuration set into port configure.
 	 * This will enable/disable hash RX queues associated to the protocols
@@ -149,19 +188,25 @@ mlx5_rss_hash_conf_get(struct rte_eth_dev *dev,
 		       struct rte_eth_rss_conf *rss_conf)
 {
 	struct priv *priv = dev->data->dev_private;
+	struct rte_eth_rss_conf *priv_rss_conf;
 
 	priv_lock(priv);
 
 	assert(priv->rss_conf != NULL);
 
+	priv_rss_conf = rss_hash_get(priv, rss_conf->rss_hf);
+	if (!priv_rss_conf) {
+		rss_conf->rss_hf = 0;
+		priv_unlock(priv);
+		return -EINVAL;
+	}
 	if (rss_conf->rss_key &&
-	    rss_conf->rss_key_len >= priv->rss_conf->rss_key_len)
+	    rss_conf->rss_key_len >= priv_rss_conf->rss_key_len)
 		memcpy(rss_conf->rss_key,
-		       priv->rss_conf->rss_key,
-		       priv->rss_conf->rss_key_len);
-	rss_conf->rss_key_len = priv->rss_conf->rss_key_len;
-	/* FIXME: rss_hf should be more specific. */
-	rss_conf->rss_hf = ETH_RSS_PROTO_MASK;
+		       priv_rss_conf->rss_key,
+		       priv_rss_conf->rss_key_len);
+	rss_conf->rss_key_len = priv_rss_conf->rss_key_len;
+	rss_conf->rss_hf = priv_rss_conf->rss_hf;
 
 	priv_unlock(priv);
 	return 0;
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 79c2346..d46fc13 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -65,12 +65,13 @@
 #include "mlx5_defs.h"
 
 /* Initialization data for hash RX queues. */
-static const struct hash_rxq_init hash_rxq_init[] = {
+const struct hash_rxq_init hash_rxq_init[] = {
 	[HASH_RXQ_TCPV4] = {
 		.hash_fields = (IBV_EXP_RX_HASH_SRC_IPV4 |
 				IBV_EXP_RX_HASH_DST_IPV4 |
 				IBV_EXP_RX_HASH_SRC_PORT_TCP |
 				IBV_EXP_RX_HASH_DST_PORT_TCP),
+		.dpdk_rss_hf = ETH_RSS_NONFRAG_IPV4_TCP,
 		.flow_priority = 0,
 		.flow_spec.tcp_udp = {
 			.type = IBV_FLOW_SPEC_TCP,
@@ -83,6 +84,7 @@ static const struct hash_rxq_init hash_rxq_init[] = {
 				IBV_EXP_RX_HASH_DST_IPV4 |
 				IBV_EXP_RX_HASH_SRC_PORT_UDP |
 				IBV_EXP_RX_HASH_DST_PORT_UDP),
+		.dpdk_rss_hf = ETH_RSS_NONFRAG_IPV4_UDP,
 		.flow_priority = 0,
 		.flow_spec.tcp_udp = {
 			.type = IBV_FLOW_SPEC_UDP,
@@ -93,6 +95,8 @@ static const struct hash_rxq_init hash_rxq_init[] = {
 	[HASH_RXQ_IPV4] = {
 		.hash_fields = (IBV_EXP_RX_HASH_SRC_IPV4 |
 				IBV_EXP_RX_HASH_DST_IPV4),
+		.dpdk_rss_hf = (ETH_RSS_IPV4 |
+				ETH_RSS_FRAG_IPV4),
 		.flow_priority = 1,
 		.flow_spec.ipv4 = {
 			.type = IBV_FLOW_SPEC_IPV4,
@@ -102,6 +106,7 @@ static const struct hash_rxq_init hash_rxq_init[] = {
 	},
 	[HASH_RXQ_ETH] = {
 		.hash_fields = 0,
+		.dpdk_rss_hf = 0,
 		.flow_priority = 2,
 		.flow_spec.eth = {
 			.type = IBV_FLOW_SPEC_ETH,
@@ -112,7 +117,7 @@ static const struct hash_rxq_init hash_rxq_init[] = {
 };
 
 /* Number of entries in hash_rxq_init[]. */
-static const unsigned int hash_rxq_init_n = RTE_DIM(hash_rxq_init);
+const unsigned int hash_rxq_init_n = RTE_DIM(hash_rxq_init);
 
 /* Initialization data for hash RX queue indirection tables. */
 static const struct ind_table_init ind_table_init[] = {
@@ -259,16 +264,18 @@ static unsigned int
 priv_make_ind_table_init(struct priv *priv,
 			 struct ind_table_init (*table)[IND_TABLE_INIT_N])
 {
+	uint64_t rss_hf;
 	unsigned int i;
 	unsigned int j;
 	unsigned int table_n = 0;
 	/* Mandatory to receive frames not handled by normal hash RX queues. */
 	unsigned int hash_types_sup = 1 << HASH_RXQ_ETH;
 
+	rss_hf = priv->dev->data->dev_conf.rx_adv_conf.rss_conf.rss_hf;
 	/* Process other protocols only if more than one queue. */
 	if (priv->rxqs_n > 1)
 		for (i = 0; (i != hash_rxq_init_n); ++i)
-			if (hash_rxq_init[i].hash_fields)
+			if (rss_hf & hash_rxq_init[i].dpdk_rss_hf)
 				hash_types_sup |= (1 << i);
 
 	/* Filter out entries whose protocols are not in the set. */
@@ -329,7 +336,6 @@ priv_create_hash_rxqs(struct priv *priv)
 	assert(priv->hash_rxqs_n == 0);
 	assert(priv->pd != NULL);
 	assert(priv->ctx != NULL);
-	assert(priv->rss_conf != NULL);
 	if (priv->rxqs_n == 0)
 		return EINVAL;
 	assert(priv->rxqs != NULL);
@@ -412,10 +418,16 @@ priv_create_hash_rxqs(struct priv *priv)
 		struct hash_rxq *hash_rxq = &(*hash_rxqs)[i];
 		enum hash_rxq_type type =
 			hash_rxq_type_from_n(&ind_table_init[j], k);
+		struct rte_eth_rss_conf *priv_rss_conf =
+			(*priv->rss_conf)[type];
 		struct ibv_exp_rx_hash_conf hash_conf = {
 			.rx_hash_function = IBV_EXP_RX_HASH_FUNC_TOEPLITZ,
-			.rx_hash_key_len = priv->rss_conf->rss_key_len,
-			.rx_hash_key = priv->rss_conf->rss_key,
+			.rx_hash_key_len = (priv_rss_conf ?
+					    priv_rss_conf->rss_key_len :
+					    rss_hash_default_key_len),
+			.rx_hash_key = (priv_rss_conf ?
+					priv_rss_conf->rss_key :
+					rss_hash_default_key),
 			.rx_hash_fields_mask = hash_rxq_init[type].hash_fields,
 			.rwq_ind_tbl = (*ind_tables)[j],
 		};
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index a1bf11f..651eb95 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -145,6 +145,7 @@ struct flow_attr_spec_eth {
 /* Initialization data for hash RX queue. */
 struct hash_rxq_init {
 	uint64_t hash_fields; /* Fields that participate in the hash. */
+	uint64_t dpdk_rss_hf; /* Matching DPDK RSS hash fields. */
 	unsigned int flow_priority; /* Flow priority to use. */
 	struct ibv_flow_spec flow_spec; /* Flow specification template. */
 	const struct hash_rxq_init *underlayer; /* Pointer to underlayer. */
@@ -212,6 +213,9 @@ struct txq {
 
 /* mlx5_rxq.c */
 
+extern const struct hash_rxq_init hash_rxq_init[];
+extern const unsigned int hash_rxq_init_n;
+
 extern uint8_t rss_hash_default_key[];
 extern const size_t rss_hash_default_key_len;
 
-- 
2.1.0
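The priv_make_ind_table_init() hunk above enables a hash RX queue type only when its dpdk_rss_hf bits intersect the rss_hf value from the device configuration. The Ethernet type is always kept. That selection logic can be sketched as follows. This is a minimal sketch: the EX_* names, their bit values and ex_hash_types_sup() are hypothetical placeholders, not DPDK's ETH_RSS_* constants or the driver's own code.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder rss_hf bits; DPDK's real ETH_RSS_* values differ. */
#define EX_HF_NONFRAG_IPV4_TCP (UINT64_C(1) << 0)
#define EX_HF_NONFRAG_IPV4_UDP (UINT64_C(1) << 1)
#define EX_HF_IPV4             (UINT64_C(1) << 2)

/* Hypothetical subset of enum hash_rxq_type. */
enum ex_hash_rxq_type { EX_TCPV4, EX_UDPV4, EX_IPV4, EX_ETH, EX_TYPES_N };

/* Plays the role of the dpdk_rss_hf field of struct hash_rxq_init. */
static const uint64_t ex_dpdk_rss_hf[EX_TYPES_N] = {
	[EX_TCPV4] = EX_HF_NONFRAG_IPV4_TCP,
	[EX_UDPV4] = EX_HF_NONFRAG_IPV4_UDP,
	[EX_IPV4] = EX_HF_IPV4,
	[EX_ETH] = 0, /* Never matched; always enabled below instead. */
};

/* Return a bitmask of enabled hash RX queue types. */
static unsigned int
ex_hash_types_sup(uint64_t rss_hf, unsigned int rxqs_n)
{
	/* Mandatory to receive frames not handled by the other types. */
	unsigned int sup = 1u << EX_ETH;
	unsigned int i;

	/* With a single RX queue there is nothing to spread, so only
	 * the Ethernet type is used. */
	if (rxqs_n > 1)
		for (i = 0; i != EX_TYPES_N; ++i)
			if (rss_hf & ex_dpdk_rss_hf[i])
				sup |= 1u << i;
	assert(sup & (1u << EX_ETH));
	return sup;
}
```

For example, requesting only IPv4 and IPv4/TCP hashing on a multi-queue port enables the TCPV4, IPV4 and ETH types, while UDPV4 stays disabled.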

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [dpdk-dev] [PATCH v2 10/16] app/testpmd: add missing type to RSS hash commands
  2015-10-30 18:55 ` [dpdk-dev] [PATCH v2 00/16] " Adrien Mazarguil
                     ` (8 preceding siblings ...)
  2015-10-30 18:55   ` [dpdk-dev] [PATCH v2 09/16] mlx5: use one RSS hash key per flow type Adrien Mazarguil
@ 2015-10-30 18:55   ` Adrien Mazarguil
  2015-10-30 18:55   ` [dpdk-dev] [PATCH v2 11/16] app/testpmd: fix missing initialization in the RSS hash show command Adrien Mazarguil
                     ` (6 subsequent siblings)
  16 siblings, 0 replies; 39+ messages in thread
From: Adrien Mazarguil @ 2015-10-30 18:55 UTC (permalink / raw)
  To: dev

From: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>

DPDK uses a structure (struct rte_eth_rss_conf) to get or set a hash key.
Its rss_hf field is used by rss_hash_conf_get to select the hash key to
retrieve, and by rss_hash_update to verify that the key exists before trying
to update it. testpmd therefore needs the RSS type as an additional argument
to these commands.
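The name-to-flag lookup that testpmd performs for the new rss_type argument can be sketched as follows. The table mirrors rss_type_table from the config.c hunk. However, the EX_RSS_* bit values are placeholders rather than DPDK's real ETH_RSS_* values, and ex_rss_type_to_hf() is a hypothetical helper. As in the patch, an unknown name yields 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Placeholder bits standing in for DPDK's ETH_RSS_* flags; the real
 * values come from rte_ethdev.h and differ. */
#define EX_RSS_IPV4             (UINT64_C(1) << 0)
#define EX_RSS_NONFRAG_IPV4_TCP (UINT64_C(1) << 1)
#define EX_RSS_NONFRAG_IPV4_UDP (UINT64_C(1) << 2)
#define EX_RSS_IPV6             (UINT64_C(1) << 3)

struct ex_rss_type_info {
	const char *str;   /* Token accepted on the testpmd command line. */
	uint64_t rss_type; /* Value to store in rss_conf.rss_hf. */
};

static const struct ex_rss_type_info ex_rss_type_table[] = {
	{ "ipv4", EX_RSS_IPV4 },
	{ "ipv4-tcp", EX_RSS_NONFRAG_IPV4_TCP },
	{ "ipv4-udp", EX_RSS_NONFRAG_IPV4_UDP },
	{ "ipv6", EX_RSS_IPV6 },
};

/* Return the rss_hf bit matching a type name, or 0 when the name is
 * unknown (the patch also leaves rss_hf at 0 in that case). */
static uint64_t
ex_rss_type_to_hf(const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < sizeof(ex_rss_type_table) / sizeof(ex_rss_type_table[0]); ++i)
		if (strcmp(name, ex_rss_type_table[i].str) == 0)
			return ex_rss_type_table[i].rss_type;
	return 0;
}
```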

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 app/test-pmd/cmdline.c                      | 45 +++++++++++++++++---
 app/test-pmd/config.c                       | 66 ++++++++++++++++++-----------
 app/test-pmd/testpmd.h                      |  6 ++-
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  2 +-
 4 files changed, 85 insertions(+), 34 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index b3c36f3..7a27862 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -190,7 +190,9 @@ static void cmd_help_long_parsed(void *parsed_result,
 			" by masks on port X. size is used to indicate the"
 			" hardware supported reta size\n\n"
 
-			"show port rss-hash [key]\n"
+			"show port rss-hash ipv4|ipv4-frag|ipv4-tcp|ipv4-udp|"
+			"ipv4-sctp|ipv4-other|ipv6|ipv6-frag|ipv6-tcp|ipv6-udp|ipv6-sctp|"
+			"ipv6-other|l2-payload|ipv6-ex|ipv6-tcp-ex|ipv6-udp-ex [key]\n"
 			"    Display the RSS hash functions and RSS hash key"
 			" of port X\n\n"
 
@@ -1498,6 +1500,7 @@ struct cmd_config_rss_hash_key {
 	cmdline_fixed_string_t config;
 	uint8_t port_id;
 	cmdline_fixed_string_t rss_hash_key;
+	cmdline_fixed_string_t rss_type;
 	cmdline_fixed_string_t key;
 };
 
@@ -1555,7 +1558,8 @@ cmd_config_rss_hash_key_parsed(void *parsed_result,
 			return;
 		hash_key[i] = (uint8_t) ((xdgt0 * 16) + xdgt1);
 	}
-	port_rss_hash_key_update(res->port_id, hash_key);
+	port_rss_hash_key_update(res->port_id, res->rss_type, hash_key,
+				 RSS_HASH_KEY_LENGTH);
 }
 
 cmdline_parse_token_string_t cmd_config_rss_hash_key_port =
@@ -1568,18 +1572,29 @@ cmdline_parse_token_num_t cmd_config_rss_hash_key_port_id =
 cmdline_parse_token_string_t cmd_config_rss_hash_key_rss_hash_key =
 	TOKEN_STRING_INITIALIZER(struct cmd_config_rss_hash_key,
 				 rss_hash_key, "rss-hash-key");
+cmdline_parse_token_string_t cmd_config_rss_hash_key_rss_type =
+	TOKEN_STRING_INITIALIZER(struct cmd_config_rss_hash_key, rss_type,
+				 "ipv4#ipv4-frag#ipv4-tcp#ipv4-udp#ipv4-sctp#"
+				 "ipv4-other#ipv6#ipv6-frag#ipv6-tcp#ipv6-udp#"
+				 "ipv6-sctp#ipv6-other#l2-payload#ipv6-ex#"
+				 "ipv6-tcp-ex#ipv6-udp-ex");
 cmdline_parse_token_string_t cmd_config_rss_hash_key_value =
 	TOKEN_STRING_INITIALIZER(struct cmd_config_rss_hash_key, key, NULL);
 
 cmdline_parse_inst_t cmd_config_rss_hash_key = {
 	.f = cmd_config_rss_hash_key_parsed,
 	.data = NULL,
-	.help_str = "port config X rss-hash-key 80 hexa digits",
+	.help_str =
+		"port config X rss-hash-key ipv4|ipv4-frag|ipv4-tcp|ipv4-udp|"
+		"ipv4-sctp|ipv4-other|ipv6|ipv6-frag|ipv6-tcp|ipv6-udp|"
+		"ipv6-sctp|ipv6-other|l2-payload|"
+		"ipv6-ex|ipv6-tcp-ex|ipv6-udp-ex 80 hexa digits\n",
 	.tokens = {
 		(void *)&cmd_config_rss_hash_key_port,
 		(void *)&cmd_config_rss_hash_key_config,
 		(void *)&cmd_config_rss_hash_key_port_id,
 		(void *)&cmd_config_rss_hash_key_rss_hash_key,
+		(void *)&cmd_config_rss_hash_key_rss_type,
 		(void *)&cmd_config_rss_hash_key_value,
 		NULL,
 	},
@@ -1929,6 +1944,7 @@ struct cmd_showport_rss_hash {
 	cmdline_fixed_string_t port;
 	uint8_t port_id;
 	cmdline_fixed_string_t rss_hash;
+	cmdline_fixed_string_t rss_type;
 	cmdline_fixed_string_t key; /* optional argument */
 };
 
@@ -1938,7 +1954,8 @@ static void cmd_showport_rss_hash_parsed(void *parsed_result,
 {
 	struct cmd_showport_rss_hash *res = parsed_result;
 
-	port_rss_hash_conf_show(res->port_id, show_rss_key != NULL);
+	port_rss_hash_conf_show(res->port_id, res->rss_type,
+				show_rss_key != NULL);
 }
 
 cmdline_parse_token_string_t cmd_showport_rss_hash_show =
@@ -1950,18 +1967,29 @@ cmdline_parse_token_num_t cmd_showport_rss_hash_port_id =
 cmdline_parse_token_string_t cmd_showport_rss_hash_rss_hash =
 	TOKEN_STRING_INITIALIZER(struct cmd_showport_rss_hash, rss_hash,
 				 "rss-hash");
+cmdline_parse_token_string_t cmd_showport_rss_hash_rss_hash_info =
+	TOKEN_STRING_INITIALIZER(struct cmd_showport_rss_hash, rss_type,
+				 "ipv4#ipv4-frag#ipv4-tcp#ipv4-udp#ipv4-sctp#"
+				 "ipv4-other#ipv6#ipv6-frag#ipv6-tcp#ipv6-udp#"
+				 "ipv6-sctp#ipv6-other#l2-payload#ipv6-ex#"
+				 "ipv6-tcp-ex#ipv6-udp-ex");
 cmdline_parse_token_string_t cmd_showport_rss_hash_rss_key =
 	TOKEN_STRING_INITIALIZER(struct cmd_showport_rss_hash, key, "key");
 
 cmdline_parse_inst_t cmd_showport_rss_hash = {
 	.f = cmd_showport_rss_hash_parsed,
 	.data = NULL,
-	.help_str = "show port X rss-hash (X = port number)\n",
+	.help_str =
+		"show port X rss-hash ipv4|ipv4-frag|ipv4-tcp|ipv4-udp|"
+		"ipv4-sctp|ipv4-other|ipv6|ipv6-frag|ipv6-tcp|ipv6-udp|"
+		"ipv6-sctp|ipv6-other|l2-payload|"
+		"ipv6-ex|ipv6-tcp-ex|ipv6-udp-ex (X = port number)\n",
 	.tokens = {
 		(void *)&cmd_showport_rss_hash_show,
 		(void *)&cmd_showport_rss_hash_port,
 		(void *)&cmd_showport_rss_hash_port_id,
 		(void *)&cmd_showport_rss_hash_rss_hash,
+		(void *)&cmd_showport_rss_hash_rss_hash_info,
 		NULL,
 	},
 };
@@ -1969,12 +1997,17 @@ cmdline_parse_inst_t cmd_showport_rss_hash = {
 cmdline_parse_inst_t cmd_showport_rss_hash_key = {
 	.f = cmd_showport_rss_hash_parsed,
 	.data = (void *)1,
-	.help_str = "show port X rss-hash key (X = port number)\n",
+	.help_str =
+		"show port X rss-hash ipv4|ipv4-frag|ipv4-tcp|ipv4-udp|"
+		"ipv4-sctp|ipv4-other|ipv6|ipv6-frag|ipv6-tcp|ipv6-udp|"
+		"ipv6-sctp|ipv6-other|l2-payload|"
+		"ipv6-ex|ipv6-tcp-ex|ipv6-udp-ex key (X = port number)\n",
 	.tokens = {
 		(void *)&cmd_showport_rss_hash_show,
 		(void *)&cmd_showport_rss_hash_port,
 		(void *)&cmd_showport_rss_hash_port_id,
 		(void *)&cmd_showport_rss_hash_rss_hash,
+		(void *)&cmd_showport_rss_hash_rss_hash_info,
 		(void *)&cmd_showport_rss_hash_rss_key,
 		NULL,
 	},
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 1ec6a77..8474706 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -97,6 +97,30 @@
 
 static char *flowtype_to_str(uint16_t flow_type);
 
+struct rss_type_info {
+	char str[32];
+	uint64_t rss_type;
+};
+
+static const struct rss_type_info rss_type_table[] = {
+	{ "ipv4", ETH_RSS_IPV4 },
+	{ "ipv4-frag", ETH_RSS_FRAG_IPV4 },
+	{ "ipv4-tcp", ETH_RSS_NONFRAG_IPV4_TCP },
+	{ "ipv4-udp", ETH_RSS_NONFRAG_IPV4_UDP },
+	{ "ipv4-sctp", ETH_RSS_NONFRAG_IPV4_SCTP },
+	{ "ipv4-other", ETH_RSS_NONFRAG_IPV4_OTHER },
+	{ "ipv6", ETH_RSS_IPV6 },
+	{ "ipv6-frag", ETH_RSS_FRAG_IPV6 },
+	{ "ipv6-tcp", ETH_RSS_NONFRAG_IPV6_TCP },
+	{ "ipv6-udp", ETH_RSS_NONFRAG_IPV6_UDP },
+	{ "ipv6-sctp", ETH_RSS_NONFRAG_IPV6_SCTP },
+	{ "ipv6-other", ETH_RSS_NONFRAG_IPV6_OTHER },
+	{ "l2-payload", ETH_RSS_L2_PAYLOAD },
+	{ "ipv6-ex", ETH_RSS_IPV6_EX },
+	{ "ipv6-tcp-ex", ETH_RSS_IPV6_TCP_EX },
+	{ "ipv6-udp-ex", ETH_RSS_IPV6_UDP_EX },
+};
+
 static void
 print_ethaddr(const char *name, struct ether_addr *eth_addr)
 {
@@ -852,31 +876,8 @@ port_rss_reta_info(portid_t port_id,
  * key of the port.
  */
 void
-port_rss_hash_conf_show(portid_t port_id, int show_rss_key)
+port_rss_hash_conf_show(portid_t port_id, char rss_info[], int show_rss_key)
 {
-	struct rss_type_info {
-		char str[32];
-		uint64_t rss_type;
-	};
-	static const struct rss_type_info rss_type_table[] = {
-		{"ipv4", ETH_RSS_IPV4},
-		{"ipv4-frag", ETH_RSS_FRAG_IPV4},
-		{"ipv4-tcp", ETH_RSS_NONFRAG_IPV4_TCP},
-		{"ipv4-udp", ETH_RSS_NONFRAG_IPV4_UDP},
-		{"ipv4-sctp", ETH_RSS_NONFRAG_IPV4_SCTP},
-		{"ipv4-other", ETH_RSS_NONFRAG_IPV4_OTHER},
-		{"ipv6", ETH_RSS_IPV6},
-		{"ipv6-frag", ETH_RSS_FRAG_IPV6},
-		{"ipv6-tcp", ETH_RSS_NONFRAG_IPV6_TCP},
-		{"ipv6-udp", ETH_RSS_NONFRAG_IPV6_UDP},
-		{"ipv6-sctp", ETH_RSS_NONFRAG_IPV6_SCTP},
-		{"ipv6-other", ETH_RSS_NONFRAG_IPV6_OTHER},
-		{"l2-payload", ETH_RSS_L2_PAYLOAD},
-		{"ipv6-ex", ETH_RSS_IPV6_EX},
-		{"ipv6-tcp-ex", ETH_RSS_IPV6_TCP_EX},
-		{"ipv6-udp-ex", ETH_RSS_IPV6_UDP_EX},
-	};
-
 	struct rte_eth_rss_conf rss_conf;
 	uint8_t rss_key[10 * 4];
 	uint64_t rss_hf;
@@ -885,6 +886,13 @@ port_rss_hash_conf_show(portid_t port_id, int show_rss_key)
 
 	if (port_id_is_invalid(port_id, ENABLED_WARN))
 		return;
+
+	rss_conf.rss_hf = 0;
+	for (i = 0; i < RTE_DIM(rss_type_table); i++) {
+		if (!strcmp(rss_info, rss_type_table[i].str))
+			rss_conf.rss_hf = rss_type_table[i].rss_type;
+	}
+
 	/* Get RSS hash key if asked to display it */
 	rss_conf.rss_key = (show_rss_key) ? rss_key : NULL;
 	diag = rte_eth_dev_rss_hash_conf_get(port_id, &rss_conf);
@@ -922,12 +930,20 @@ port_rss_hash_conf_show(portid_t port_id, int show_rss_key)
 }
 
 void
-port_rss_hash_key_update(portid_t port_id, uint8_t *hash_key)
+port_rss_hash_key_update(portid_t port_id, char rss_type[], uint8_t *hash_key,
+			 uint hash_key_len)
 {
 	struct rte_eth_rss_conf rss_conf;
 	int diag;
+	unsigned int i;
 
 	rss_conf.rss_key = NULL;
+	rss_conf.rss_key_len = hash_key_len;
+	rss_conf.rss_hf = 0;
+	for (i = 0; i < RTE_DIM(rss_type_table); i++) {
+		if (!strcmp(rss_type_table[i].str, rss_type))
+			rss_conf.rss_hf = rss_type_table[i].rss_type;
+	}
 	diag = rte_eth_dev_rss_hash_conf_get(port_id, &rss_conf);
 	if (diag == 0) {
 		rss_conf.rss_key = hash_key;
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index f925df7..513b1d8 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -562,8 +562,10 @@ int set_queue_rate_limit(portid_t port_id, uint16_t queue_idx, uint16_t rate);
 int set_vf_rate_limit(portid_t port_id, uint16_t vf, uint16_t rate,
 				uint64_t q_msk);
 
-void port_rss_hash_conf_show(portid_t port_id, int show_rss_key);
-void port_rss_hash_key_update(portid_t port_id, uint8_t *hash_key);
+void port_rss_hash_conf_show(portid_t port_id, char rss_info[],
+			     int show_rss_key);
+void port_rss_hash_key_update(portid_t port_id, char rss_type[],
+			      uint8_t *hash_key, uint hash_key_len);
 void get_syn_filter(uint8_t port_id);
 void get_ethertype_filter(uint8_t port_id, uint16_t index);
 void get_2tuple_filter(uint8_t port_id, uint16_t index);
diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index 71d831b..b74819b 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -190,7 +190,7 @@ show port rss-hash
 
 Display the RSS hash functions and RSS hash key of a port::
 
-   testpmd> show port (port_id) rss-hash [key]
+   testpmd> show port (port_id) rss-hash ipv4|ipv4-frag|ipv4-tcp|ipv4-udp|ipv4-sctp|ipv4-other|ipv6|ipv6-frag|ipv6-tcp|ipv6-udp|ipv6-sctp|ipv6-other|l2-payload|ipv6-ex|ipv6-tcp-ex|ipv6-udp-ex [key]
 
 clear port
 ~~~~~~~~~~
-- 
2.1.0


* [dpdk-dev] [PATCH v2 11/16] app/testpmd: fix missing initialization in the RSS hash show command
  2015-10-30 18:55 ` [dpdk-dev] [PATCH v2 00/16] " Adrien Mazarguil
                     ` (9 preceding siblings ...)
  2015-10-30 18:55   ` [dpdk-dev] [PATCH v2 10/16] app/testpmd: add missing type to RSS hash commands Adrien Mazarguil
@ 2015-10-30 18:55   ` Adrien Mazarguil
  2015-10-30 18:55   ` [dpdk-dev] [PATCH v2 12/16] mlx5: disable useless flows in promiscuous mode Adrien Mazarguil
                     ` (5 subsequent siblings)
  16 siblings, 0 replies; 39+ messages in thread
From: Adrien Mazarguil @ 2015-10-30 18:55 UTC (permalink / raw)
  To: dev

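The "show port X rss-hash" command sometimes displays garbage instead of the
expected RSS hash key because rss_conf.rss_key_len, which defines the maximum
key length, is left uninitialized. When the requested key is too large to fit
in the buffer, rte_eth_dev_rss_hash_conf_get() does not update it, so the
uninitialized buffer is displayed as is. Initialize both the buffer and its
length.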
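The failure and the fix can be modeled as follows. This small model assumes a driver that copies the key only when the caller's buffer is large enough, as the message describes. struct ex_rss_conf, ex_rss_hash_conf_get() and ex_fetch_key() are hypothetical stand-ins for struct rte_eth_rss_conf, the driver callback and port_rss_hash_conf_show(). Because the buffer is zero-initialized, a key that does not fit shows up as zeros rather than stack garbage.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct rte_eth_rss_conf. */
struct ex_rss_conf {
	uint8_t *rss_key;    /* Caller buffer, or NULL to skip the key. */
	uint8_t rss_key_len; /* Buffer size provided by the caller. */
	uint64_t rss_hf;
};

/* Hypothetical driver behavior: the key is copied only when it fits,
 * otherwise the caller's buffer is left untouched. */
static int
ex_rss_hash_conf_get(const uint8_t *dev_key, uint8_t dev_key_len,
		     struct ex_rss_conf *conf)
{
	assert(dev_key != NULL);
	if (conf->rss_key != NULL && conf->rss_key_len >= dev_key_len) {
		memcpy(conf->rss_key, dev_key, dev_key_len);
		conf->rss_key_len = dev_key_len;
	}
	return 0;
}

/* Same steps as the fixed port_rss_hash_conf_show(): zero the buffer
 * and advertise its size before asking for the key. */
static void
ex_fetch_key(const uint8_t *dev_key, uint8_t dev_key_len, uint8_t out[40])
{
	uint8_t rss_key[10 * 4] = "";
	struct ex_rss_conf rss_conf;

	rss_conf.rss_key = rss_key;
	rss_conf.rss_key_len = sizeof(rss_key);
	rss_conf.rss_hf = 0;
	(void)ex_rss_hash_conf_get(dev_key, dev_key_len, &rss_conf);
	memcpy(out, rss_key, sizeof(rss_key));
}
```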

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 app/test-pmd/config.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 8474706..d6f4e64 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -879,7 +879,7 @@ void
 port_rss_hash_conf_show(portid_t port_id, char rss_info[], int show_rss_key)
 {
 	struct rte_eth_rss_conf rss_conf;
-	uint8_t rss_key[10 * 4];
+	uint8_t rss_key[10 * 4] = "";
 	uint64_t rss_hf;
 	uint8_t i;
 	int diag;
@@ -895,6 +895,7 @@ port_rss_hash_conf_show(portid_t port_id, char rss_info[], int show_rss_key)
 
 	/* Get RSS hash key if asked to display it */
 	rss_conf.rss_key = (show_rss_key) ? rss_key : NULL;
+	rss_conf.rss_key_len = sizeof(rss_key);
 	diag = rte_eth_dev_rss_hash_conf_get(port_id, &rss_conf);
 	if (diag != 0) {
 		switch (diag) {
-- 
2.1.0


* [dpdk-dev] [PATCH v2 12/16] mlx5: disable useless flows in promiscuous mode
  2015-10-30 18:55 ` [dpdk-dev] [PATCH v2 00/16] " Adrien Mazarguil
                     ` (10 preceding siblings ...)
  2015-10-30 18:55   ` [dpdk-dev] [PATCH v2 11/16] app/testpmd: fix missing initialization in the RSS hash show command Adrien Mazarguil
@ 2015-10-30 18:55   ` Adrien Mazarguil
  2015-10-30 18:55   ` [dpdk-dev] [PATCH v2 13/16] mlx5: add IPv6 RSS support using experimental flows Adrien Mazarguil
                     ` (4 subsequent siblings)
  16 siblings, 0 replies; 39+ messages in thread
From: Adrien Mazarguil @ 2015-10-30 18:55 UTC (permalink / raw)
  To: dev; +Cc: Yaacov Hazan

From: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>

Only a single flow per hash RX queue is needed in promiscuous mode.
Disable the others (MAC address and allmulticast flows) to free up hardware
resources.
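The rule that priv_allow_flow_type() implements in the mlx5_rxq.c hunk of this patch reduces to the following sketch. Hypothetical ex_* names replace struct priv and enum hash_rxq_flow_type. Once promiscuous mode is requested, only the promiscuous flow is allowed.

```c
#include <assert.h>

/* Hypothetical mirrors of enum hash_rxq_flow_type and struct priv. */
enum ex_flow_type {
	EX_FLOW_TYPE_MAC,
	EX_FLOW_TYPE_PROMISC,
	EX_FLOW_TYPE_ALLMULTI,
};

struct ex_priv {
	unsigned int promisc_req;  /* Promiscuous mode requested. */
	unsigned int allmulti_req; /* Allmulticast mode requested. */
};

/* Return nonzero if flows of the given type may be created. */
static int
ex_allow_flow_type(const struct ex_priv *priv, enum ex_flow_type type)
{
	/* In promiscuous mode the single promiscuous flow is enough. */
	if (priv->promisc_req)
		return type == EX_FLOW_TYPE_PROMISC;
	switch (type) {
	case EX_FLOW_TYPE_PROMISC:
		return 0; /* promisc_req is known to be 0 here. */
	case EX_FLOW_TYPE_ALLMULTI:
		return priv->allmulti_req != 0;
	case EX_FLOW_TYPE_MAC:
		return 1;
	}
	return 0;
}
```

In the patch, the PROMISC case returns !!priv->promisc_req, which is also always 0 at that point. The sketch only makes that explicit.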

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Yaacov Hazan <yaacovh@mellanox.com>
---
 drivers/net/mlx5/mlx5_mac.c     |  5 +++++
 drivers/net/mlx5/mlx5_rxmode.c  | 10 ++++++++++
 drivers/net/mlx5/mlx5_rxq.c     | 29 +++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_rxtx.h    |  7 +++++++
 drivers/net/mlx5/mlx5_trigger.c |  6 +++---
 5 files changed, 54 insertions(+), 3 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_mac.c b/drivers/net/mlx5/mlx5_mac.c
index d3ab5b9..c7927c1 100644
--- a/drivers/net/mlx5/mlx5_mac.c
+++ b/drivers/net/mlx5/mlx5_mac.c
@@ -410,6 +410,8 @@ priv_mac_addr_add(struct priv *priv, unsigned int mac_index,
 			(*mac)[3], (*mac)[4], (*mac)[5]
 		}
 	};
+	if (!priv_allow_flow_type(priv, HASH_RXQ_FLOW_TYPE_MAC))
+		goto end;
 	for (i = 0; (i != priv->hash_rxqs_n); ++i) {
 		ret = hash_rxq_mac_addr_add(&(*priv->hash_rxqs)[i], mac_index);
 		if (!ret)
@@ -420,6 +422,7 @@ priv_mac_addr_add(struct priv *priv, unsigned int mac_index,
 					      mac_index);
 		return ret;
 	}
+end:
 	BITFIELD_SET(priv->mac_configured, mac_index);
 	return 0;
 }
@@ -439,6 +442,8 @@ priv_mac_addrs_enable(struct priv *priv)
 	unsigned int i;
 	int ret;
 
+	if (!priv_allow_flow_type(priv, HASH_RXQ_FLOW_TYPE_MAC))
+		return 0;
 	for (i = 0; (i != priv->hash_rxqs_n); ++i) {
 		ret = hash_rxq_mac_addrs_add(&(*priv->hash_rxqs)[i]);
 		if (!ret)
diff --git a/drivers/net/mlx5/mlx5_rxmode.c b/drivers/net/mlx5/mlx5_rxmode.c
index 7794608..f7de1b8 100644
--- a/drivers/net/mlx5/mlx5_rxmode.c
+++ b/drivers/net/mlx5/mlx5_rxmode.c
@@ -113,6 +113,8 @@ priv_promiscuous_enable(struct priv *priv)
 {
 	unsigned int i;
 
+	if (!priv_allow_flow_type(priv, HASH_RXQ_FLOW_TYPE_PROMISC))
+		return 0;
 	for (i = 0; (i != priv->hash_rxqs_n); ++i) {
 		struct hash_rxq *hash_rxq = &(*priv->hash_rxqs)[i];
 		int ret;
@@ -147,6 +149,10 @@ mlx5_promiscuous_enable(struct rte_eth_dev *dev)
 	ret = priv_promiscuous_enable(priv);
 	if (ret)
 		ERROR("cannot enable promiscuous mode: %s", strerror(ret));
+	else {
+		priv_mac_addrs_disable(priv);
+		priv_allmulticast_disable(priv);
+	}
 	priv_unlock(priv);
 }
 
@@ -196,6 +202,8 @@ mlx5_promiscuous_disable(struct rte_eth_dev *dev)
 	priv_lock(priv);
 	priv->promisc_req = 0;
 	priv_promiscuous_disable(priv);
+	priv_mac_addrs_enable(priv);
+	priv_allmulticast_enable(priv);
 	priv_unlock(priv);
 }
 
@@ -266,6 +274,8 @@ priv_allmulticast_enable(struct priv *priv)
 {
 	unsigned int i;
 
+	if (!priv_allow_flow_type(priv, HASH_RXQ_FLOW_TYPE_ALLMULTI))
+		return 0;
 	for (i = 0; (i != priv->hash_rxqs_n); ++i) {
 		struct hash_rxq *hash_rxq = &(*priv->hash_rxqs)[i];
 		int ret;
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index d46fc13..bdd5429 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -539,6 +539,35 @@ priv_destroy_hash_rxqs(struct priv *priv)
 }
 
 /**
+ * Check whether a given flow type is allowed.
+ *
+ * @param priv
+ *   Pointer to private structure.
+ * @param type
+ *   Flow type to check.
+ *
+ * @return
+ *   Nonzero if the given flow type is allowed.
+ */
+int
+priv_allow_flow_type(struct priv *priv, enum hash_rxq_flow_type type)
+{
+	/* Only FLOW_TYPE_PROMISC is allowed when promiscuous mode
+	 * has been requested. */
+	if (priv->promisc_req)
+		return (type == HASH_RXQ_FLOW_TYPE_PROMISC);
+	switch (type) {
+	case HASH_RXQ_FLOW_TYPE_PROMISC:
+		return !!priv->promisc_req;
+	case HASH_RXQ_FLOW_TYPE_ALLMULTI:
+		return !!priv->allmulti_req;
+	case HASH_RXQ_FLOW_TYPE_MAC:
+		return 1;
+	}
+	return 0;
+}
+
+/**
  * Allocate RX queue elements with scattered packets support.
  *
  * @param rxq
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 651eb95..dc23f4f 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -159,6 +159,12 @@ struct ind_table_init {
 	unsigned int hash_types_n;
 };
 
+enum hash_rxq_flow_type {
+	HASH_RXQ_FLOW_TYPE_MAC,
+	HASH_RXQ_FLOW_TYPE_PROMISC,
+	HASH_RXQ_FLOW_TYPE_ALLMULTI,
+};
+
 struct hash_rxq {
 	struct priv *priv; /* Back pointer to private data. */
 	struct ibv_qp *qp; /* Hash RX QP. */
@@ -223,6 +229,7 @@ size_t hash_rxq_flow_attr(const struct hash_rxq *, struct ibv_flow_attr *,
 			  size_t);
 int priv_create_hash_rxqs(struct priv *);
 void priv_destroy_hash_rxqs(struct priv *);
+int priv_allow_flow_type(struct priv *, enum hash_rxq_flow_type);
 void rxq_cleanup(struct rxq *);
 int rxq_rehash(struct rte_eth_dev *, struct rxq *);
 int rxq_setup(struct rte_eth_dev *, struct rxq *, uint16_t, unsigned int,
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index 233c0d8..68e00a0 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -70,10 +70,10 @@ mlx5_dev_start(struct rte_eth_dev *dev)
 	DEBUG("%p: allocating and configuring hash RX queues", (void *)dev);
 	err = priv_create_hash_rxqs(priv);
 	if (!err)
-		err = priv_mac_addrs_enable(priv);
-	if (!err && priv->promisc_req)
 		err = priv_promiscuous_enable(priv);
-	if (!err && priv->allmulti_req)
+	if (!err)
+		err = priv_mac_addrs_enable(priv);
+	if (!err)
 		err = priv_allmulticast_enable(priv);
 	if (!err)
 		priv->started = 1;
-- 
2.1.0


* [dpdk-dev] [PATCH v2 13/16] mlx5: add IPv6 RSS support using experimental flows
  2015-10-30 18:55 ` [dpdk-dev] [PATCH v2 00/16] " Adrien Mazarguil
                     ` (11 preceding siblings ...)
  2015-10-30 18:55   ` [dpdk-dev] [PATCH v2 12/16] mlx5: disable useless flows in promiscuous mode Adrien Mazarguil
@ 2015-10-30 18:55   ` Adrien Mazarguil
  2015-10-30 18:55   ` [dpdk-dev] [PATCH v2 14/16] mlx5: enable multi packet send WR in TX CQ Adrien Mazarguil
                     ` (3 subsequent siblings)
  16 siblings, 0 replies; 39+ messages in thread
From: Adrien Mazarguil @ 2015-10-30 18:55 UTC (permalink / raw)
  To: dev

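Normal Verbs flows do not currently support IPv6. Switch to experimental
flows (ibv_exp_create_flow() and the related ibv_exp_* types) and add IPv6
hash RX queue types when libibverbs provides struct ibv_exp_flow_spec_ipv6
(HAVE_FLOW_SPEC_IPV6).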
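The IPv6 entries are compiled in only when HAVE_FLOW_SPEC_IPV6 is defined. As a result, the hash_types mask and hash_types_n of the first ind_table_init[] entry must agree in both configurations. The sketch below derives the count from the mask instead of hard-coding 6 or 3 as the patch does. All EX_* names are hypothetical, and defining HAVE_FLOW_SPEC_IPV6 by hand stands in for the generated mlx5_autoconf.h.

```c
#include <assert.h>

/* Stands in for HAVE_FLOW_SPEC_IPV6 from mlx5_autoconf.h; remove it to
 * model a libibverbs without struct ibv_exp_flow_spec_ipv6. */
#define HAVE_FLOW_SPEC_IPV6 1

/* Hypothetical copy of enum hash_rxq_type. */
enum ex_hash_rxq_type {
	EX_HASH_RXQ_TCPV4,
	EX_HASH_RXQ_UDPV4,
	EX_HASH_RXQ_IPV4,
#ifdef HAVE_FLOW_SPEC_IPV6
	EX_HASH_RXQ_TCPV6,
	EX_HASH_RXQ_UDPV6,
	EX_HASH_RXQ_IPV6,
#endif /* HAVE_FLOW_SPEC_IPV6 */
	EX_HASH_RXQ_ETH,
};

/* Types sharing the main indirection table (Ethernet is not one). */
static const unsigned int ex_main_hash_types =
	1u << EX_HASH_RXQ_TCPV4 |
	1u << EX_HASH_RXQ_UDPV4 |
	1u << EX_HASH_RXQ_IPV4 |
#ifdef HAVE_FLOW_SPEC_IPV6
	1u << EX_HASH_RXQ_TCPV6 |
	1u << EX_HASH_RXQ_UDPV6 |
	1u << EX_HASH_RXQ_IPV6 |
#endif /* HAVE_FLOW_SPEC_IPV6 */
	0;

/* Number of bits set in mask, i.e. the value hash_types_n should hold. */
static unsigned int
ex_hash_types_count(unsigned int mask)
{
	unsigned int n = 0;

	for (; mask != 0; mask &= mask - 1) /* Clear the lowest set bit. */
		++n;
	return n;
}
```

Under the same assumptions, removing the #define yields a count of 3, which matches the #else branch in the patch.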

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 drivers/net/mlx5/Makefile      |  4 +++
 drivers/net/mlx5/mlx5_mac.c    | 18 ++++++------
 drivers/net/mlx5/mlx5_rxmode.c | 22 +++++++--------
 drivers/net/mlx5/mlx5_rxq.c    | 63 +++++++++++++++++++++++++++++++++++++-----
 drivers/net/mlx5/mlx5_rxtx.h   | 20 +++++++++-----
 5 files changed, 93 insertions(+), 34 deletions(-)

diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index 54f1e89..2f9f2b8 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -116,6 +116,10 @@ mlx5_autoconf.h: $(RTE_SDK)/scripts/auto-config-h.sh
 		HAVE_EXP_QUERY_DEVICE \
 		infiniband/verbs.h \
 		type 'struct ibv_exp_device_attr' $(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_FLOW_SPEC_IPV6 \
+		infiniband/verbs.h \
+		type 'struct ibv_exp_flow_spec_ipv6' $(AUTOCONF_OUTPUT)
 
 mlx5.o: mlx5_autoconf.h
 
diff --git a/drivers/net/mlx5/mlx5_mac.c b/drivers/net/mlx5/mlx5_mac.c
index c7927c1..e37ce06 100644
--- a/drivers/net/mlx5/mlx5_mac.c
+++ b/drivers/net/mlx5/mlx5_mac.c
@@ -120,8 +120,8 @@ hash_rxq_del_mac_flow(struct hash_rxq *hash_rxq, unsigned int mac_index,
 	      (*mac)[0], (*mac)[1], (*mac)[2], (*mac)[3], (*mac)[4], (*mac)[5],
 	      mac_index,
 	      vlan_index);
-	claim_zero(ibv_destroy_flow(hash_rxq->mac_flow
-				    [mac_index][vlan_index]));
+	claim_zero(ibv_exp_destroy_flow(hash_rxq->mac_flow
+					[mac_index][vlan_index]));
 	hash_rxq->mac_flow[mac_index][vlan_index] = NULL;
 }
 
@@ -237,14 +237,14 @@ static int
 hash_rxq_add_mac_flow(struct hash_rxq *hash_rxq, unsigned int mac_index,
 		      unsigned int vlan_index)
 {
-	struct ibv_flow *flow;
+	struct ibv_exp_flow *flow;
 	struct priv *priv = hash_rxq->priv;
 	const uint8_t (*mac)[ETHER_ADDR_LEN] =
 			(const uint8_t (*)[ETHER_ADDR_LEN])
 			priv->mac[mac_index].addr_bytes;
 	FLOW_ATTR_SPEC_ETH(data, hash_rxq_flow_attr(hash_rxq, NULL, 0));
-	struct ibv_flow_attr *attr = &data->attr;
-	struct ibv_flow_spec_eth *spec = &data->spec;
+	struct ibv_exp_flow_attr *attr = &data->attr;
+	struct ibv_exp_flow_spec_eth *spec = &data->spec;
 	unsigned int vlan_enabled = !!priv->vlan_filter_n;
 	unsigned int vlan_id = priv->vlan_filter[vlan_index];
 
@@ -259,10 +259,10 @@ hash_rxq_add_mac_flow(struct hash_rxq *hash_rxq, unsigned int mac_index,
 	assert(((uint8_t *)attr + sizeof(*attr)) == (uint8_t *)spec);
 	hash_rxq_flow_attr(hash_rxq, attr, sizeof(data));
 	/* The first specification must be Ethernet. */
-	assert(spec->type == IBV_FLOW_SPEC_ETH);
+	assert(spec->type == IBV_EXP_FLOW_SPEC_ETH);
 	assert(spec->size == sizeof(*spec));
-	*spec = (struct ibv_flow_spec_eth){
-		.type = IBV_FLOW_SPEC_ETH,
+	*spec = (struct ibv_exp_flow_spec_eth){
+		.type = IBV_EXP_FLOW_SPEC_ETH,
 		.size = sizeof(*spec),
 		.val = {
 			.dst_mac = {
@@ -286,7 +286,7 @@ hash_rxq_add_mac_flow(struct hash_rxq *hash_rxq, unsigned int mac_index,
 	      vlan_id);
 	/* Create related flow. */
 	errno = 0;
-	flow = ibv_create_flow(hash_rxq->qp, attr);
+	flow = ibv_exp_create_flow(hash_rxq->qp, attr);
 	if (flow == NULL) {
 		/* It's not clear whether errno is always set in this case. */
 		ERROR("%p: flow configuration failed, errno=%d: %s",
diff --git a/drivers/net/mlx5/mlx5_rxmode.c b/drivers/net/mlx5/mlx5_rxmode.c
index f7de1b8..096fd18 100644
--- a/drivers/net/mlx5/mlx5_rxmode.c
+++ b/drivers/net/mlx5/mlx5_rxmode.c
@@ -73,9 +73,9 @@ static void hash_rxq_allmulticast_disable(struct hash_rxq *);
 static int
 hash_rxq_promiscuous_enable(struct hash_rxq *hash_rxq)
 {
-	struct ibv_flow *flow;
+	struct ibv_exp_flow *flow;
 	FLOW_ATTR_SPEC_ETH(data, hash_rxq_flow_attr(hash_rxq, NULL, 0));
-	struct ibv_flow_attr *attr = &data->attr;
+	struct ibv_exp_flow_attr *attr = &data->attr;
 
 	if (hash_rxq->promisc_flow != NULL)
 		return 0;
@@ -84,7 +84,7 @@ hash_rxq_promiscuous_enable(struct hash_rxq *hash_rxq)
 	 * on specific MAC addresses. */
 	hash_rxq_flow_attr(hash_rxq, attr, sizeof(data));
 	errno = 0;
-	flow = ibv_create_flow(hash_rxq->qp, attr);
+	flow = ibv_exp_create_flow(hash_rxq->qp, attr);
 	if (flow == NULL) {
 		/* It's not clear whether errno is always set in this case. */
 		ERROR("%p: flow configuration failed, errno=%d: %s",
@@ -168,7 +168,7 @@ hash_rxq_promiscuous_disable(struct hash_rxq *hash_rxq)
 	if (hash_rxq->promisc_flow == NULL)
 		return;
 	DEBUG("%p: disabling promiscuous mode", (void *)hash_rxq);
-	claim_zero(ibv_destroy_flow(hash_rxq->promisc_flow));
+	claim_zero(ibv_exp_destroy_flow(hash_rxq->promisc_flow));
 	hash_rxq->promisc_flow = NULL;
 	DEBUG("%p: promiscuous mode disabled", (void *)hash_rxq);
 }
@@ -219,10 +219,10 @@ mlx5_promiscuous_disable(struct rte_eth_dev *dev)
 static int
 hash_rxq_allmulticast_enable(struct hash_rxq *hash_rxq)
 {
-	struct ibv_flow *flow;
+	struct ibv_exp_flow *flow;
 	FLOW_ATTR_SPEC_ETH(data, hash_rxq_flow_attr(hash_rxq, NULL, 0));
-	struct ibv_flow_attr *attr = &data->attr;
-	struct ibv_flow_spec_eth *spec = &data->spec;
+	struct ibv_exp_flow_attr *attr = &data->attr;
+	struct ibv_exp_flow_spec_eth *spec = &data->spec;
 
 	if (hash_rxq->allmulti_flow != NULL)
 		return 0;
@@ -233,8 +233,8 @@ hash_rxq_allmulticast_enable(struct hash_rxq *hash_rxq)
 	 */
 	assert(((uint8_t *)attr + sizeof(*attr)) == (uint8_t *)spec);
 	hash_rxq_flow_attr(hash_rxq, attr, sizeof(data));
-	*spec = (struct ibv_flow_spec_eth){
-		.type = IBV_FLOW_SPEC_ETH,
+	*spec = (struct ibv_exp_flow_spec_eth){
+		.type = IBV_EXP_FLOW_SPEC_ETH,
 		.size = sizeof(*spec),
 		.val = {
 			.dst_mac = "\x01\x00\x00\x00\x00\x00",
@@ -244,7 +244,7 @@ hash_rxq_allmulticast_enable(struct hash_rxq *hash_rxq)
 		},
 	};
 	errno = 0;
-	flow = ibv_create_flow(hash_rxq->qp, attr);
+	flow = ibv_exp_create_flow(hash_rxq->qp, attr);
 	if (flow == NULL) {
 		/* It's not clear whether errno is always set in this case. */
 		ERROR("%p: flow configuration failed, errno=%d: %s",
@@ -328,7 +328,7 @@ hash_rxq_allmulticast_disable(struct hash_rxq *hash_rxq)
 	if (hash_rxq->allmulti_flow == NULL)
 		return;
 	DEBUG("%p: disabling allmulticast mode", (void *)hash_rxq);
-	claim_zero(ibv_destroy_flow(hash_rxq->allmulti_flow));
+	claim_zero(ibv_exp_destroy_flow(hash_rxq->allmulti_flow));
 	hash_rxq->allmulti_flow = NULL;
 	DEBUG("%p: allmulticast mode disabled", (void *)hash_rxq);
 }
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index bdd5429..084bf41 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -74,7 +74,7 @@ const struct hash_rxq_init hash_rxq_init[] = {
 		.dpdk_rss_hf = ETH_RSS_NONFRAG_IPV4_TCP,
 		.flow_priority = 0,
 		.flow_spec.tcp_udp = {
-			.type = IBV_FLOW_SPEC_TCP,
+			.type = IBV_EXP_FLOW_SPEC_TCP,
 			.size = sizeof(hash_rxq_init[0].flow_spec.tcp_udp),
 		},
 		.underlayer = &hash_rxq_init[HASH_RXQ_IPV4],
@@ -87,7 +87,7 @@ const struct hash_rxq_init hash_rxq_init[] = {
 		.dpdk_rss_hf = ETH_RSS_NONFRAG_IPV4_UDP,
 		.flow_priority = 0,
 		.flow_spec.tcp_udp = {
-			.type = IBV_FLOW_SPEC_UDP,
+			.type = IBV_EXP_FLOW_SPEC_UDP,
 			.size = sizeof(hash_rxq_init[0].flow_spec.tcp_udp),
 		},
 		.underlayer = &hash_rxq_init[HASH_RXQ_IPV4],
@@ -99,17 +99,57 @@ const struct hash_rxq_init hash_rxq_init[] = {
 				ETH_RSS_FRAG_IPV4),
 		.flow_priority = 1,
 		.flow_spec.ipv4 = {
-			.type = IBV_FLOW_SPEC_IPV4,
+			.type = IBV_EXP_FLOW_SPEC_IPV4,
 			.size = sizeof(hash_rxq_init[0].flow_spec.ipv4),
 		},
 		.underlayer = &hash_rxq_init[HASH_RXQ_ETH],
 	},
+#ifdef HAVE_FLOW_SPEC_IPV6
+	[HASH_RXQ_TCPV6] = {
+		.hash_fields = (IBV_EXP_RX_HASH_SRC_IPV6 |
+				IBV_EXP_RX_HASH_DST_IPV6 |
+				IBV_EXP_RX_HASH_SRC_PORT_TCP |
+				IBV_EXP_RX_HASH_DST_PORT_TCP),
+		.dpdk_rss_hf = ETH_RSS_NONFRAG_IPV6_TCP,
+		.flow_priority = 0,
+		.flow_spec.tcp_udp = {
+			.type = IBV_EXP_FLOW_SPEC_TCP,
+			.size = sizeof(hash_rxq_init[0].flow_spec.tcp_udp),
+		},
+		.underlayer = &hash_rxq_init[HASH_RXQ_IPV6],
+	},
+	[HASH_RXQ_UDPV6] = {
+		.hash_fields = (IBV_EXP_RX_HASH_SRC_IPV6 |
+				IBV_EXP_RX_HASH_DST_IPV6 |
+				IBV_EXP_RX_HASH_SRC_PORT_UDP |
+				IBV_EXP_RX_HASH_DST_PORT_UDP),
+		.dpdk_rss_hf = ETH_RSS_NONFRAG_IPV6_UDP,
+		.flow_priority = 0,
+		.flow_spec.tcp_udp = {
+			.type = IBV_EXP_FLOW_SPEC_UDP,
+			.size = sizeof(hash_rxq_init[0].flow_spec.tcp_udp),
+		},
+		.underlayer = &hash_rxq_init[HASH_RXQ_IPV6],
+	},
+	[HASH_RXQ_IPV6] = {
+		.hash_fields = (IBV_EXP_RX_HASH_SRC_IPV6 |
+				IBV_EXP_RX_HASH_DST_IPV6),
+		.dpdk_rss_hf = (ETH_RSS_IPV6 |
+				ETH_RSS_FRAG_IPV6),
+		.flow_priority = 1,
+		.flow_spec.ipv6 = {
+			.type = IBV_EXP_FLOW_SPEC_IPV6,
+			.size = sizeof(hash_rxq_init[0].flow_spec.ipv6),
+		},
+		.underlayer = &hash_rxq_init[HASH_RXQ_ETH],
+	},
+#endif /* HAVE_FLOW_SPEC_IPV6 */
 	[HASH_RXQ_ETH] = {
 		.hash_fields = 0,
 		.dpdk_rss_hf = 0,
 		.flow_priority = 2,
 		.flow_spec.eth = {
-			.type = IBV_FLOW_SPEC_ETH,
+			.type = IBV_EXP_FLOW_SPEC_ETH,
 			.size = sizeof(hash_rxq_init[0].flow_spec.eth),
 		},
 		.underlayer = NULL,
@@ -127,8 +167,17 @@ static const struct ind_table_init ind_table_init[] = {
 			1 << HASH_RXQ_TCPV4 |
 			1 << HASH_RXQ_UDPV4 |
 			1 << HASH_RXQ_IPV4 |
+#ifdef HAVE_FLOW_SPEC_IPV6
+			1 << HASH_RXQ_TCPV6 |
+			1 << HASH_RXQ_UDPV6 |
+			1 << HASH_RXQ_IPV6 |
+#endif /* HAVE_FLOW_SPEC_IPV6 */
 			0,
+#ifdef HAVE_FLOW_SPEC_IPV6
+		.hash_types_n = 6,
+#else /* HAVE_FLOW_SPEC_IPV6 */
 		.hash_types_n = 3,
+#endif /* HAVE_FLOW_SPEC_IPV6 */
 	},
 	{
 		.max_size = 1,
@@ -174,7 +223,7 @@ const size_t rss_hash_default_key_len = sizeof(rss_hash_default_key);
  */
 size_t
 hash_rxq_flow_attr(const struct hash_rxq *hash_rxq,
-		   struct ibv_flow_attr *flow_attr,
+		   struct ibv_exp_flow_attr *flow_attr,
 		   size_t flow_attr_size)
 {
 	size_t offset = sizeof(*flow_attr);
@@ -191,8 +240,8 @@ hash_rxq_flow_attr(const struct hash_rxq *hash_rxq,
 		return offset;
 	flow_attr_size = offset;
 	init = &hash_rxq_init[type];
-	*flow_attr = (struct ibv_flow_attr){
-		.type = IBV_FLOW_ATTR_NORMAL,
+	*flow_attr = (struct ibv_exp_flow_attr){
+		.type = IBV_EXP_FLOW_ATTR_NORMAL,
 		.priority = init->flow_priority,
 		.num_of_specs = 0,
 		.port = hash_rxq->priv->port,
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index dc23f4f..25e256f 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -59,6 +59,7 @@
 
 #include "mlx5_utils.h"
 #include "mlx5.h"
+#include "mlx5_autoconf.h"
 #include "mlx5_defs.h"
 
 struct mlx5_rxq_stats {
@@ -124,14 +125,19 @@ enum hash_rxq_type {
 	HASH_RXQ_TCPV4,
 	HASH_RXQ_UDPV4,
 	HASH_RXQ_IPV4,
+#ifdef HAVE_FLOW_SPEC_IPV6
+	HASH_RXQ_TCPV6,
+	HASH_RXQ_UDPV6,
+	HASH_RXQ_IPV6,
+#endif /* HAVE_FLOW_SPEC_IPV6 */
 	HASH_RXQ_ETH,
 };
 
 /* Flow structure with Ethernet specification. It is packed to prevent padding
  * between attr and spec as this layout is expected by libibverbs. */
 struct flow_attr_spec_eth {
-	struct ibv_flow_attr attr;
-	struct ibv_flow_spec_eth spec;
+	struct ibv_exp_flow_attr attr;
+	struct ibv_exp_flow_spec_eth spec;
 } __attribute__((packed));
 
 /* Define a struct flow_attr_spec_eth object as an array of at least
@@ -147,7 +153,7 @@ struct hash_rxq_init {
 	uint64_t hash_fields; /* Fields that participate in the hash. */
 	uint64_t dpdk_rss_hf; /* Matching DPDK RSS hash fields. */
 	unsigned int flow_priority; /* Flow priority to use. */
-	struct ibv_flow_spec flow_spec; /* Flow specification template. */
+	struct ibv_exp_flow_spec flow_spec; /* Flow specification template. */
 	const struct hash_rxq_init *underlayer; /* Pointer to underlayer. */
 };
 
@@ -170,9 +176,9 @@ struct hash_rxq {
 	struct ibv_qp *qp; /* Hash RX QP. */
 	enum hash_rxq_type type; /* Hash RX queue type. */
 	/* MAC flow steering rules, one per VLAN ID. */
-	struct ibv_flow *mac_flow[MLX5_MAX_MAC_ADDRESSES][MLX5_MAX_VLAN_IDS];
-	struct ibv_flow *promisc_flow; /* Promiscuous flow. */
-	struct ibv_flow *allmulti_flow; /* Multicast flow. */
+	struct ibv_exp_flow *mac_flow[MLX5_MAX_MAC_ADDRESSES][MLX5_MAX_VLAN_IDS];
+	struct ibv_exp_flow *promisc_flow; /* Promiscuous flow. */
+	struct ibv_exp_flow *allmulti_flow; /* Multicast flow. */
 };
 
 /* TX element. */
@@ -225,7 +231,7 @@ extern const unsigned int hash_rxq_init_n;
 extern uint8_t rss_hash_default_key[];
 extern const size_t rss_hash_default_key_len;
 
-size_t hash_rxq_flow_attr(const struct hash_rxq *, struct ibv_flow_attr *,
+size_t hash_rxq_flow_attr(const struct hash_rxq *, struct ibv_exp_flow_attr *,
 			  size_t);
 int priv_create_hash_rxqs(struct priv *);
 void priv_destroy_hash_rxqs(struct priv *);
-- 
2.1.0
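In the first ind_table_init[] entry, the two `#ifdef HAVE_FLOW_SPEC_IPV6` blocks have to stay in sync. hash_types_n is hardcoded to 6 or 3, and it must equal the number of bits set in the hash types mask next to it. Below is a minimal sketch of how that invariant could be checked. It uses a stand-alone copy of enum hash_rxq_type, and `default_hash_types` and `hash_types_count()` are hypothetical names that are not part of the patch:

```c
#include <assert.h>

/* Assumption for illustration: pretend mlx5_autoconf.h detected
 * struct ibv_exp_flow_spec_ipv6. Delete this line to mimic a build
 * against headers without IPv6 flow specifications. */
#define HAVE_FLOW_SPEC_IPV6 1

/* Same enumerators, in the same order, as enum hash_rxq_type in mlx5_rxtx.h. */
enum hash_rxq_type {
	HASH_RXQ_TCPV4,
	HASH_RXQ_UDPV4,
	HASH_RXQ_IPV4,
#ifdef HAVE_FLOW_SPEC_IPV6
	HASH_RXQ_TCPV6,
	HASH_RXQ_UDPV6,
	HASH_RXQ_IPV6,
#endif /* HAVE_FLOW_SPEC_IPV6 */
	HASH_RXQ_ETH,
};

/* Same mask as the one in the first ind_table_init[] entry. */
static const unsigned int default_hash_types =
	1 << HASH_RXQ_TCPV4 |
	1 << HASH_RXQ_UDPV4 |
	1 << HASH_RXQ_IPV4 |
#ifdef HAVE_FLOW_SPEC_IPV6
	1 << HASH_RXQ_TCPV6 |
	1 << HASH_RXQ_UDPV6 |
	1 << HASH_RXQ_IPV6 |
#endif /* HAVE_FLOW_SPEC_IPV6 */
	0;

/* Hypothetical helper: count the hash RX queue types present in a mask
 * (portable population count, clearing the lowest set bit each round). */
static unsigned int
hash_types_count(unsigned int mask)
{
	unsigned int n = 0;

	while (mask != 0) {
		mask &= mask - 1;
		++n;
	}
	return n;
}
```

With IPv6 flow specifications available, the count is 6, which matches the hardcoded hash_types_n. GCC and Clang also provide __builtin_popcount(), which could replace the loop.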

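The comment above struct flow_attr_spec_eth explains why the struct is packed: libibverbs expects the specification to follow the attribute header directly, with no padding in between. hash_rxq_flow_attr() follows the same rule and starts its offset at sizeof(*flow_attr). The sketch below illustrates that layout rule with hypothetical stand-in types. mock_attr and mock_spec_eth do not reproduce the real libibverbs definitions. Like the driver, the sketch relies on the GCC/Clang packed attribute:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an attribute header (not the real
 * struct ibv_exp_flow_attr). Its size, 12 bytes, is not a multiple of 8. */
struct mock_attr {
	uint32_t type;
	uint16_t size;
	uint16_t priority;
	uint8_t num_of_specs;
};

/* Hypothetical stand-in for an Ethernet spec (not the real
 * struct ibv_exp_flow_spec_eth). */
struct mock_spec_eth {
	uint32_t type;
	uint16_t size;
	uint8_t dst_mac[6];
	uint64_t wide; /* Forces 8-byte alignment on LP64 ABIs such as x86-64. */
};

/* Same idea as flow_attr_spec_eth: packed, so spec starts exactly
 * sizeof(attr) bytes after the beginning of the object. */
struct packed_attr_spec {
	struct mock_attr attr;
	struct mock_spec_eth spec;
} __attribute__((packed));

/* Unpacked variant for comparison: the compiler may pad after attr
 * to satisfy the alignment of spec. */
struct unpacked_attr_spec {
	struct mock_attr attr;
	struct mock_spec_eth spec;
};

/* Where a consumer walking "attribute header, then specs" expects the
 * first spec, mirroring the sizeof(*flow_attr) starting offset. */
static const void *
first_spec(const void *attr)
{
	return (const char *)attr + sizeof(struct mock_attr);
}
```

On x86-64, the unpacked variant places spec 4 bytes beyond sizeof(struct mock_attr). The packed variant never adds such padding. The exact amount of padding in the unpacked case depends on the ABI.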

* [dpdk-dev] [PATCH v2 14/16] mlx5: enable multi packet send WR in TX CQ
  2015-10-30 18:55 ` [dpdk-dev] [PATCH v2 00/16] " Adrien Mazarguil
                     ` (12 preceding siblings ...)
  2015-10-30 18:55   ` [dpdk-dev] [PATCH v2 13/16] mlx5: add IPv6 RSS support using experimental flows Adrien Mazarguil
@ 2015-10-30 18:55   ` Adrien Mazarguil
  2015-10-30 18:55   ` [dpdk-dev] [PATCH v2 15/16] mlx5: fix compilation error with GCC < 4.6 Adrien Mazarguil
                     ` (2 subsequent siblings)
  16 siblings, 0 replies; 39+ messages in thread
From: Adrien Mazarguil @ 2015-10-30 18:55 UTC (permalink / raw)
  To: dev

For adapters that support it, the
IBV_EXP_QP_BURST_CREATE_ENABLE_MULTI_PACKET_SEND_WR flag improves TX
performance outside of VF context.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx5/Makefile   | 5 +++++
 drivers/net/mlx5/mlx5_txq.c | 7 +++++++
 2 files changed, 12 insertions(+)

diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index 2f9f2b8..ae568e6 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -120,6 +120,11 @@ mlx5_autoconf.h: $(RTE_SDK)/scripts/auto-config-h.sh
 		HAVE_FLOW_SPEC_IPV6 \
 		infiniband/verbs.h \
 		type 'struct ibv_exp_flow_spec_ipv6' $(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_EXP_QP_BURST_CREATE_ENABLE_MULTI_PACKET_SEND_WR \
+		infiniband/verbs.h \
+		enum IBV_EXP_QP_BURST_CREATE_ENABLE_MULTI_PACKET_SEND_WR \
+		$(AUTOCONF_OUTPUT)
 
 mlx5.o: mlx5_autoconf.h
 
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index a53b128..aa7581f 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -395,6 +395,13 @@ txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
 		.intf_scope = IBV_EXP_INTF_GLOBAL,
 		.intf = IBV_EXP_INTF_QP_BURST,
 		.obj = tmpl.qp,
+#ifdef HAVE_EXP_QP_BURST_CREATE_ENABLE_MULTI_PACKET_SEND_WR
+		/* Multi packet send WR can only be used outside of VF. */
+		.family_flags =
+			(!priv->vf ?
+			 IBV_EXP_QP_BURST_CREATE_ENABLE_MULTI_PACKET_SEND_WR :
+			 0),
+#endif
 	};
 	tmpl.if_qp = ibv_exp_query_intf(priv->ctx, &attr.params, &status);
 	if (tmpl.if_qp == NULL) {
-- 
2.1.0

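In the txq_setup() hunk, the multi-packet send flag is requested only if two conditions hold. At compile time, the autoconf check must have found the enum value in infiniband/verbs.h. At run time, the port must not be a VF. Below is a minimal sketch of that two-level gate. The MOCK_ flag and struct mock_priv are hypothetical stand-ins for the libibverbs constant and the PMD private structure, and txq_family_flags() is an illustrative helper that does not exist in the driver:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: pretend mlx5_autoconf.h detected the enum value. Delete
 * this line to mimic an older Mellanox OFED where it is missing. */
#define HAVE_EXP_QP_BURST_CREATE_ENABLE_MULTI_PACKET_SEND_WR 1

/* Hypothetical stand-in for IBV_EXP_QP_BURST_CREATE_ENABLE_MULTI_PACKET_SEND_WR. */
#define MOCK_ENABLE_MULTI_PACKET_SEND_WR (1u << 0)

/* Hypothetical stand-in for the PMD private structure. */
struct mock_priv {
	unsigned int vf:1; /* Nonzero when the port is a virtual function. */
};

/* Compute family_flags the way the txq_setup() hunk does: the flag is
 * only requested when the headers provide it and the port is not a VF. */
static uint32_t
txq_family_flags(const struct mock_priv *priv)
{
	uint32_t flags = 0;

#ifdef HAVE_EXP_QP_BURST_CREATE_ENABLE_MULTI_PACKET_SEND_WR
	if (!priv->vf)
		flags |= MOCK_ENABLE_MULTI_PACKET_SEND_WR;
#else
	(void)priv; /* Older headers: nothing to request. */
#endif
	return flags;
}
```

Guarding the field with the autoconf macro keeps the driver buildable against headers that lack the flag, and the runtime VF check keeps the flag off where the commit message says it must not be used.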

* [dpdk-dev] [PATCH v2 15/16] mlx5: fix compilation error with GCC < 4.6
  2015-10-30 18:55 ` [dpdk-dev] [PATCH v2 00/16] " Adrien Mazarguil
                     ` (13 preceding siblings ...)
  2015-10-30 18:55   ` [dpdk-dev] [PATCH v2 14/16] mlx5: enable multi packet send WR in TX CQ Adrien Mazarguil
@ 2015-10-30 18:55   ` Adrien Mazarguil
  2015-10-30 18:55   ` [dpdk-dev] [PATCH v2 16/16] doc: update mlx5 documentation Adrien Mazarguil
  2015-11-01 10:26   ` [dpdk-dev] [PATCH v2 00/16] Enhance mlx5 with Mellanox OFED 3.1 Thomas Monjalon
  16 siblings, 0 replies; 39+ messages in thread
From: Adrien Mazarguil @ 2015-10-30 18:55 UTC (permalink / raw)
  To: dev; +Cc: Yaacov Hazan

From: Yaacov Hazan <yaacovh@mellanox.com>

Seen with GCC < 4.6:

 error: unknown field ‘tcp_udp’ specified in initializer
 error: extra brace group at end of initializer

Static initialization of anonymous structs/unions is a C11 feature
properly supported only since GCC 4.6.

Work around compilation errors with older versions by expanding
struct ibv_exp_flow_spec into struct hash_rxq_init.

Signed-off-by: Yaacov Hazan <yaacovh@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx5/mlx5_rxtx.h | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 25e256f..15c4bc8 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -153,7 +153,18 @@ struct hash_rxq_init {
 	uint64_t hash_fields; /* Fields that participate in the hash. */
 	uint64_t dpdk_rss_hf; /* Matching DPDK RSS hash fields. */
 	unsigned int flow_priority; /* Flow priority to use. */
-	struct ibv_exp_flow_spec flow_spec; /* Flow specification template. */
+	union {
+		struct {
+			enum ibv_exp_flow_spec_type type;
+			uint16_t size;
+		} hdr;
+		struct ibv_exp_flow_spec_tcp_udp tcp_udp;
+		struct ibv_exp_flow_spec_ipv4 ipv4;
+#ifdef HAVE_FLOW_SPEC_IPV6
+		struct ibv_exp_flow_spec_ipv6 ipv6;
+#endif /* HAVE_FLOW_SPEC_IPV6 */
+		struct ibv_exp_flow_spec_eth eth;
+	} flow_spec; /* Flow specification template. */
 	const struct hash_rxq_init *underlayer; /* Pointer to underlayer. */
 };
 
-- 
2.1.0

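The workaround can be illustrated without libibverbs. In the sketch below, hypothetical mock_* types stand in for the libibverbs spec structures, and the enum values are arbitrary. The union is a named member, so `.flow_spec.eth = { ... }` is an ordinary C99 designated initializer that older GCC versions accept. The `hdr` member then reads the type and size fields that, as the patch assumes, every spec begins with:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical spec types; values are arbitrary. */
enum mock_spec_type {
	MOCK_SPEC_IPV4 = 1,
	MOCK_SPEC_ETH = 2,
};

/* Hypothetical specs, each starting with the same type/size pair. */
struct mock_spec_ipv4 {
	enum mock_spec_type type;
	uint16_t size;
	uint32_t src_ip;
	uint32_t dst_ip;
};

struct mock_spec_eth {
	enum mock_spec_type type;
	uint16_t size;
	uint8_t dst_mac[6];
};

/* Named union member, as in the patched struct hash_rxq_init. */
struct mock_hash_rxq_init {
	unsigned int flow_priority;
	union {
		struct {
			enum mock_spec_type type;
			uint16_t size;
		} hdr;
		struct mock_spec_ipv4 ipv4;
		struct mock_spec_eth eth;
	} flow_spec;
};

/* Nested designators through a named union: valid C99, no C11 needed. */
static const struct mock_hash_rxq_init init_table[] = {
	{
		.flow_priority = 1,
		.flow_spec.ipv4 = {
			.type = MOCK_SPEC_IPV4,
			.size = sizeof(init_table[0].flow_spec.ipv4),
		},
	},
	{
		.flow_priority = 2,
		.flow_spec.eth = {
			.type = MOCK_SPEC_ETH,
			.size = sizeof(init_table[0].flow_spec.eth),
		},
	},
};

/* Read the shared header whichever member was initialized; allowed by
 * the common initial sequence rule for structs inside a union. */
static enum mock_spec_type
spec_type(const struct mock_hash_rxq_init *init)
{
	return init->flow_spec.hdr.type;
}
```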

* [dpdk-dev] [PATCH v2 16/16] doc: update mlx5 documentation
  2015-10-30 18:55 ` [dpdk-dev] [PATCH v2 00/16] " Adrien Mazarguil
                     ` (14 preceding siblings ...)
  2015-10-30 18:55   ` [dpdk-dev] [PATCH v2 15/16] mlx5: fix compilation error with GCC < 4.6 Adrien Mazarguil
@ 2015-10-30 18:55   ` Adrien Mazarguil
  2015-11-01 10:26   ` [dpdk-dev] [PATCH v2 00/16] Enhance mlx5 with Mellanox OFED 3.1 Thomas Monjalon
  16 siblings, 0 replies; 39+ messages in thread
From: Adrien Mazarguil @ 2015-10-30 18:55 UTC (permalink / raw)
  To: dev

Document the features, limitations and supported Mellanox OFED and
firmware versions related to Mellanox OFED 3.1 support.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 doc/guides/nics/mlx5.rst | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index fdb621c..2d68914 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -73,6 +73,25 @@ long as they share the same MAC address.
 Enabling librte_pmd_mlx5 causes DPDK applications to be linked against
 libibverbs.
 
+Features
+--------
+
+- Multiple TX and RX queues.
+- Support for scattered TX and RX frames.
+- IPv4, TCPv4 and UDPv4 RSS on any number of queues.
+- Several RSS hash keys, one for each flow type.
+- Support for multiple MAC addresses.
+- VLAN filtering.
+- Promiscuous mode.
+
+Limitations
+-----------
+
+- IPv6 and inner VXLAN RSS are not supported yet.
+- Port statistics through software counters only.
+- No allmulticast mode.
+- Hardware checksum offloads are not supported yet.
+
 Configuration
 -------------
 
@@ -171,6 +190,13 @@ DPDK and must be installed separately:
    Both libraries are BSD and GPL licensed. Linux kernel modules are GPL
    licensed.
 
+Currently supported by DPDK:
+
+- Mellanox OFED **3.1**.
+- Minimum firmware version:
+  - ConnectX-4: **12.12.0780**.
+  - ConnectX-4 Lx: **14.12.0780**.
+
 Getting Mellanox OFED
 ~~~~~~~~~~~~~~~~~~~~~
 
-- 
2.1.0

* Re: [dpdk-dev] [PATCH v2 00/16] Enhance mlx5 with Mellanox OFED 3.1
  2015-10-30 18:55 ` [dpdk-dev] [PATCH v2 00/16] " Adrien Mazarguil
                     ` (15 preceding siblings ...)
  2015-10-30 18:55   ` [dpdk-dev] [PATCH v2 16/16] doc: update mlx5 documentation Adrien Mazarguil
@ 2015-11-01 10:26   ` Thomas Monjalon
  16 siblings, 0 replies; 39+ messages in thread
From: Thomas Monjalon @ 2015-11-01 10:26 UTC (permalink / raw)
  To: Adrien Mazarguil; +Cc: dev, Yaacov Hazan

2015-10-30 19:55, Adrien Mazarguil:
> Mellanox OFED 3.1 [1] comes with improved APIs that Mellanox ConnectX-4
> (mlx5) adapters can take advantage of, such as:
> 
> - Separate post and doorbell operations on all queues.
> - Lightweight RX queues called Work Queues (WQs).
> - Low-level RSS indirection table and hash key configuration.
> 
> This patchset enhances mlx5 with all of these for better performance and
> flexibility. Documentation is updated accordingly.

Applied, thanks

Thread overview: 39+ messages
2015-10-05 17:54 [dpdk-dev] [PATCH 00/17] Enhance mlx5 with Mellanox OFED 3.1 Adrien Mazarguil
2015-10-05 17:54 ` [dpdk-dev] [PATCH 01/17] mlx5: use fast Verbs interface for scattered RX operation Adrien Mazarguil
2015-10-05 17:54 ` [dpdk-dev] [PATCH 02/17] mlx5: get rid of the WR structure in RX queue elements Adrien Mazarguil
2015-10-05 17:54 ` [dpdk-dev] [PATCH 03/17] mlx5: refactor RX code for the new Verbs RSS API Adrien Mazarguil
2015-10-05 17:54 ` [dpdk-dev] [PATCH 04/17] mlx5: restore allmulti and promisc modes after device restart Adrien Mazarguil
2015-10-05 17:54 ` [dpdk-dev] [PATCH 05/17] mlx5: use separate indirection table for default hash RX queue Adrien Mazarguil
2015-10-05 17:54 ` [dpdk-dev] [PATCH 06/17] mlx5: adapt indirection table size depending on RX queues number Adrien Mazarguil
2015-10-05 17:54 ` [dpdk-dev] [PATCH 07/17] mlx5: define specific flow steering rules for each hash RX QP Adrien Mazarguil
2015-10-05 17:54 ` [dpdk-dev] [PATCH 08/17] mlx5: use alternate method to configure promiscuous mode Adrien Mazarguil
2015-10-05 17:54 ` [dpdk-dev] [PATCH 09/17] mlx5: add RSS hash update/get Adrien Mazarguil
2015-10-05 17:54 ` [dpdk-dev] [PATCH 10/17] mlx5: use one RSS hash key per flow type Adrien Mazarguil
2015-10-05 17:54 ` [dpdk-dev] [PATCH 11/17] app/testpmd: add missing type to RSS hash commands Adrien Mazarguil
2015-10-05 17:54 ` [dpdk-dev] [PATCH 12/17] app/testpmd: fix missing initialization in the RSS hash show command Adrien Mazarguil
2015-10-05 17:54 ` [dpdk-dev] [PATCH 13/17] mlx5: remove normal MAC flows when enabling promiscuous mode Adrien Mazarguil
2015-10-05 17:54 ` [dpdk-dev] [PATCH 14/17] mlx5: use experimental flows in hash RX queues Adrien Mazarguil
2015-10-05 17:54 ` [dpdk-dev] [PATCH 15/17] mlx5: enable multi packet send WR in TX CQ Adrien Mazarguil
2015-10-05 17:54 ` [dpdk-dev] [PATCH 16/17] mlx5: fix compilation error with GCC < 4.6 Adrien Mazarguil
2015-10-05 17:54 ` [dpdk-dev] [PATCH 17/17] doc: update mlx5 documentation Adrien Mazarguil
2015-10-06  8:54 ` [dpdk-dev] [PATCH 00/17] Enhance mlx5 with Mellanox OFED 3.1 Stephen Hemminger
2015-10-06  9:58   ` Vincent JARDIN
2015-10-07 13:30   ` Joongi Kim
2015-10-30 18:55 ` [dpdk-dev] [PATCH v2 00/16] " Adrien Mazarguil
2015-10-30 18:55   ` [dpdk-dev] [PATCH v2 01/16] mlx5: use fast Verbs interface for scattered RX operation Adrien Mazarguil
2015-10-30 18:55   ` [dpdk-dev] [PATCH v2 02/16] mlx5: get rid of the WR structure in RX queue elements Adrien Mazarguil
2015-10-30 18:55   ` [dpdk-dev] [PATCH v2 03/16] mlx5: refactor RX code for the new Verbs RSS API Adrien Mazarguil
2015-10-30 18:55   ` [dpdk-dev] [PATCH v2 04/16] mlx5: use separate indirection table for default hash RX queue Adrien Mazarguil
2015-10-30 18:55   ` [dpdk-dev] [PATCH v2 05/16] mlx5: adapt indirection table size depending on RX queues number Adrien Mazarguil
2015-10-30 18:55   ` [dpdk-dev] [PATCH v2 06/16] mlx5: define specific flow steering rules for each hash RX QP Adrien Mazarguil
2015-10-30 18:55   ` [dpdk-dev] [PATCH v2 07/16] mlx5: use alternate method to configure promisc and allmulti modes Adrien Mazarguil
2015-10-30 18:55   ` [dpdk-dev] [PATCH v2 08/16] mlx5: add RSS hash update/get Adrien Mazarguil
2015-10-30 18:55   ` [dpdk-dev] [PATCH v2 09/16] mlx5: use one RSS hash key per flow type Adrien Mazarguil
2015-10-30 18:55   ` [dpdk-dev] [PATCH v2 10/16] app/testpmd: add missing type to RSS hash commands Adrien Mazarguil
2015-10-30 18:55   ` [dpdk-dev] [PATCH v2 11/16] app/testpmd: fix missing initialization in the RSS hash show command Adrien Mazarguil
2015-10-30 18:55   ` [dpdk-dev] [PATCH v2 12/16] mlx5: disable useless flows in promiscuous mode Adrien Mazarguil
2015-10-30 18:55   ` [dpdk-dev] [PATCH v2 13/16] mlx5: add IPv6 RSS support using experimental flows Adrien Mazarguil
2015-10-30 18:55   ` [dpdk-dev] [PATCH v2 14/16] mlx5: enable multi packet send WR in TX CQ Adrien Mazarguil
2015-10-30 18:55   ` [dpdk-dev] [PATCH v2 15/16] mlx5: fix compilation error with GCC < 4.6 Adrien Mazarguil
2015-10-30 18:55   ` [dpdk-dev] [PATCH v2 16/16] doc: update mlx5 documentation Adrien Mazarguil
2015-11-01 10:26   ` [dpdk-dev] [PATCH v2 00/16] Enhance mlx5 with Mellanox OFED 3.1 Thomas Monjalon