DPDK patches and discussions
* [dpdk-dev] [PATCH 0/4] net/mlx5: keep indirect actions across port restart
@ 2021-07-27  7:31 Dmitry Kozlyuk
  2021-07-27  7:31 ` [dpdk-dev] [PATCH 1/4] net/mlx5: discover max flow priority using DevX Dmitry Kozlyuk
                   ` (5 more replies)
  0 siblings, 6 replies; 16+ messages in thread
From: Dmitry Kozlyuk @ 2021-07-27  7:31 UTC (permalink / raw)
  To: dev; +Cc: David Marchand

It was unspecified what happens to indirect actions when a port
is stopped, possibly reconfigured, and started again. MLX5 PMD,
the first one to use indirect actions, intended to keep them across
such a sequence, but the implementation was buggy. Patches 1-3 fix
the PMD behavior; patch 4 adds a common specification with the rationale.

Dmitry Kozlyuk (4):
  net/mlx5: discover max flow priority using DevX
  net/mlx5: create drop queue using DevX
  net/mlx5: preserve indirect actions across port restart
  ethdev: document indirect flow action life cycle

 doc/guides/prog_guide/rte_flow.rst |  10 +
 drivers/net/mlx5/linux/mlx5_os.c   |   5 -
 drivers/net/mlx5/mlx5_devx.c       | 204 +++++++++++++++++---
 drivers/net/mlx5/mlx5_flow.c       | 292 ++++++++++++++++++++++++++---
 drivers/net/mlx5/mlx5_flow.h       |   6 +
 drivers/net/mlx5/mlx5_flow_dv.c    | 103 ++++++++++
 drivers/net/mlx5/mlx5_flow_verbs.c |  77 +-------
 drivers/net/mlx5/mlx5_rx.h         |   4 +
 drivers/net/mlx5/mlx5_rxq.c        |  99 ++++++++--
 drivers/net/mlx5/mlx5_trigger.c    |  10 +
 lib/ethdev/rte_flow.h              |   4 +
 11 files changed, 680 insertions(+), 134 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [dpdk-dev] [PATCH 1/4] net/mlx5: discover max flow priority using DevX
  2021-07-27  7:31 [dpdk-dev] [PATCH 0/4] net/mlx5: keep indirect actions across port restart Dmitry Kozlyuk
@ 2021-07-27  7:31 ` Dmitry Kozlyuk
  2021-07-27  7:31 ` [dpdk-dev] [PATCH 2/4] net/mlx5: create drop queue " Dmitry Kozlyuk
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Dmitry Kozlyuk @ 2021-07-27  7:31 UTC (permalink / raw)
  To: dev
  Cc: David Marchand, stable, Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko

The maximum available flow priority was discovered using the Verbs API
regardless of the selected flow engine. This required some Verbs objects
to be initialized even when the DevX engine was in use. Make priority
discovery an engine method and implement it for DevX using its own API.
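
For illustration, the probing idea behind the engine method can be
sketched in standalone C: try to create a catch-all flow at the highest
priority value of each candidate range; the largest candidate that
succeeds is the number of supported priorities. The helper below is a
hypothetical stand-in for engine-specific flow creation, not PMD code.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for the engine-specific attempt to create
 * a catch-all flow at the given priority (0-based).
 */
static bool
try_flow_at_priority(uint16_t priority)
{
	return priority < 16; /* pretend the HW exposes 16 priorities */
}

static int
discover_priorities(void)
{
	static const uint16_t vprio[] = { 8, 16 };
	int supported = -1;
	unsigned int i;

	for (i = 0; i < sizeof(vprio) / sizeof(vprio[0]); i++) {
		/* Probe the highest priority value of the candidate range. */
		if (!try_flow_at_priority(vprio[i] - 1))
			break;
		supported = vprio[i];
	}
	return supported; /* -1 if even 8 priorities are not supported */
}

int
main(void)
{
	printf("supported priorities: %d\n", discover_priorities());
	return 0;
}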

Cc: stable@dpdk.org

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 drivers/net/mlx5/linux/mlx5_os.c   |   1 -
 drivers/net/mlx5/mlx5_flow.c       |  98 +++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_flow.h       |   4 ++
 drivers/net/mlx5/mlx5_flow_dv.c    | 103 +++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_flow_verbs.c |  77 +++------------------
 5 files changed, 215 insertions(+), 68 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 4712bd6f9b..671dfface9 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1781,7 +1781,6 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 	priv->drop_queue.hrxq = mlx5_drop_action_create(eth_dev);
 	if (!priv->drop_queue.hrxq)
 		goto error;
-	/* Supported Verbs flow priority number detection. */
 	err = mlx5_flow_discover_priorities(eth_dev);
 	if (err < 0) {
 		err = -err;
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index a3fdce685e..e8d2678877 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -9362,3 +9362,101 @@ mlx5_dbg__print_pattern(const struct rte_flow_item *item)
 	}
 	printf("END\n");
 }
+
+/* Map of Verbs to Flow priority with 8 Verbs priorities. */
+static const uint32_t priority_map_3[][MLX5_PRIORITY_MAP_MAX] = {
+	{ 0, 1, 2 }, { 2, 3, 4 }, { 5, 6, 7 },
+};
+
+/* Map of Verbs to Flow priority with 16 Verbs priorities. */
+static const uint32_t priority_map_5[][MLX5_PRIORITY_MAP_MAX] = {
+	{ 0, 1, 2 }, { 3, 4, 5 }, { 6, 7, 8 },
+	{ 9, 10, 11 }, { 12, 13, 14 },
+};
+
+/**
+ * Discover the number of available flow priorities.
+ *
+ * @param dev
+ *   Ethernet device.
+ *
+ * @return
+ *   On success, number of available flow priorities.
+ *   On failure, a negative errno-style code and rte_errno is set.
+ */
+int
+mlx5_flow_discover_priorities(struct rte_eth_dev *dev)
+{
+	static const uint16_t vprio[] = {8, 16};
+	const struct mlx5_priv *priv = dev->data->dev_private;
+	const struct mlx5_flow_driver_ops *fops;
+	enum mlx5_flow_drv_type type;
+	int ret;
+
+	type = mlx5_flow_os_get_type();
+	if (type == MLX5_FLOW_TYPE_MAX) {
+		type = MLX5_FLOW_TYPE_VERBS;
+		if (priv->config.devx && priv->config.dv_flow_en)
+			type = MLX5_FLOW_TYPE_DV;
+	}
+	fops = flow_get_drv_ops(type);
+	if (fops->discover_priorities == NULL) {
+		DRV_LOG(ERR, "Priority discovery not supported");
+		rte_errno = ENOTSUP;
+		return -rte_errno;
+	}
+	ret = fops->discover_priorities(dev, vprio, RTE_DIM(vprio));
+	if (ret < 0)
+		return ret;
+	switch (ret) {
+	case 8:
+		ret = RTE_DIM(priority_map_3);
+		break;
+	case 16:
+		ret = RTE_DIM(priority_map_5);
+		break;
+	default:
+		rte_errno = ENOTSUP;
+		DRV_LOG(ERR,
+			"port %u maximum priority: %d expected 8/16",
+			dev->data->port_id, ret);
+		return -rte_errno;
+	}
+	DRV_LOG(INFO, "port %u supported flow priorities:"
+		" 0-%d for ingress or egress root table,"
+		" 0-%d for non-root table or transfer root table.",
+		dev->data->port_id, ret - 2,
+		MLX5_NON_ROOT_FLOW_MAX_PRIO - 1);
+	return ret;
+}
+
+/**
+ * Adjust flow priority based on the highest layer and the request priority.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[in] priority
+ *   The rule base priority.
+ * @param[in] subpriority
+ *   The priority based on the items.
+ *
+ * @return
+ *   The new priority.
+ */
+uint32_t
+mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
+			  uint32_t subpriority)
+{
+	uint32_t res = 0;
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	switch (priv->config.flow_prio) {
+	case RTE_DIM(priority_map_3):
+		res = priority_map_3[priority][subpriority];
+		break;
+	case RTE_DIM(priority_map_5):
+		res = priority_map_5[priority][subpriority];
+		break;
+	}
+	return  res;
+}
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 3724293d26..da39eeb596 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1229,6 +1229,9 @@ typedef int (*mlx5_flow_create_def_policy_t)
 			(struct rte_eth_dev *dev);
 typedef void (*mlx5_flow_destroy_def_policy_t)
 			(struct rte_eth_dev *dev);
+typedef int (*mlx5_flow_discover_priorities_t)
+			(struct rte_eth_dev *dev,
+			 const uint16_t *vprio, int vprio_n);
 
 struct mlx5_flow_driver_ops {
 	mlx5_flow_validate_t validate;
@@ -1263,6 +1266,7 @@ struct mlx5_flow_driver_ops {
 	mlx5_flow_action_update_t action_update;
 	mlx5_flow_action_query_t action_query;
 	mlx5_flow_sync_domain_t sync_domain;
+	mlx5_flow_discover_priorities_t discover_priorities;
 };
 
 /* mlx5_flow.c */
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 736227bc0c..b31445b51e 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -17828,6 +17828,108 @@ flow_dv_sync_domain(struct rte_eth_dev *dev, uint32_t domains, uint32_t flags)
 	return 0;
 }
 
+/**
+ * Discover the number of available flow priorities
+ * by trying to create a flow with the highest priority value
+ * for each possible number.
+ *
+ * @param[in] dev
+ *   Ethernet device.
+ * @param[in] vprio
+ *   List of possible number of available priorities.
+ * @param[in] vprio_n
+ *   Size of @p vprio array.
+ * @return
+ *   On success, number of available flow priorities.
+ *   On failure, a negative errno-style code and rte_errno is set.
+ */
+static int
+flow_dv_discover_priorities(struct rte_eth_dev *dev,
+			    const uint16_t *vprio, int vprio_n)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_indexed_pool *pool = priv->sh->ipool[MLX5_IPOOL_MLX5_FLOW];
+	struct rte_flow_item_eth eth;
+	struct rte_flow_item item = {
+		.type = RTE_FLOW_ITEM_TYPE_ETH,
+		.spec = &eth,
+		.mask = &eth,
+	};
+	struct mlx5_flow_dv_matcher matcher = {
+		.mask = {
+			.size = sizeof(matcher.mask.buf),
+		},
+	};
+	union mlx5_flow_tbl_key tbl_key;
+	struct mlx5_flow flow;
+	void *action;
+	struct rte_flow_error error;
+	uint8_t misc_mask;
+	int i, err, ret = -ENOTSUP;
+
+	/*
+	 * Prepare a flow with a catch-all pattern and a drop action.
+	 * Use drop queue, because shared drop action may be unavailable.
+	 */
+	action = priv->drop_queue.hrxq->action;
+	if (action == NULL) {
+		DRV_LOG(ERR, "Priority discovery requires a drop action");
+		rte_errno = ENOTSUP;
+		return -rte_errno;
+	}
+	memset(&flow, 0, sizeof(flow));
+	flow.handle = mlx5_ipool_zmalloc(pool, &flow.handle_idx);
+	if (flow.handle == NULL) {
+		DRV_LOG(ERR, "Cannot create flow handle");
+		rte_errno = ENOMEM;
+		return -rte_errno;
+	}
+	flow.ingress = true;
+	flow.dv.value.size = MLX5_ST_SZ_BYTES(fte_match_param);
+	flow.dv.actions[0] = action;
+	flow.dv.actions_n = 1;
+	memset(&eth, 0, sizeof(eth));
+	flow_dv_translate_item_eth(matcher.mask.buf, flow.dv.value.buf,
+				   &item, /* inner */ false, /* group */ 0);
+	matcher.crc = rte_raw_cksum(matcher.mask.buf, matcher.mask.size);
+	for (i = 0; i < vprio_n; i++) {
+		/* Configure the next proposed maximum priority. */
+		matcher.priority = vprio[i] - 1;
+		memset(&tbl_key, 0, sizeof(tbl_key));
+		err = flow_dv_matcher_register(dev, &matcher, &tbl_key, &flow,
+					       /* tunnel */ NULL,
+					       /* group */ 0,
+					       &error);
+		if (err != 0) {
+			/* This action is pure SW and must always succeed. */
+			DRV_LOG(ERR, "Cannot register matcher");
+			ret = -rte_errno;
+			break;
+		}
+		/* Try to apply the flow to HW. */
+		misc_mask = flow_dv_matcher_enable(flow.dv.value.buf);
+		__flow_dv_adjust_buf_size(&flow.dv.value.size, misc_mask);
+		err = mlx5_flow_os_create_flow
+				(flow.handle->dvh.matcher->matcher_object,
+				 (void *)&flow.dv.value, flow.dv.actions_n,
+				 flow.dv.actions, &flow.handle->drv_flow);
+		if (err == 0) {
+			claim_zero(mlx5_flow_os_destroy_flow
+						(flow.handle->drv_flow));
+			flow.handle->drv_flow = NULL;
+		}
+		claim_zero(flow_dv_matcher_release(dev, flow.handle));
+		if (err != 0)
+			break;
+		ret = vprio[i];
+	}
+	mlx5_ipool_free(pool, flow.handle_idx);
+	/* Set rte_errno if no expected priority value matched. */
+	if (ret < 0)
+		rte_errno = -ret;
+	return ret;
+}
+
 const struct mlx5_flow_driver_ops mlx5_flow_dv_drv_ops = {
 	.validate = flow_dv_validate,
 	.prepare = flow_dv_prepare,
@@ -17861,6 +17963,7 @@ const struct mlx5_flow_driver_ops mlx5_flow_dv_drv_ops = {
 	.action_update = flow_dv_action_update,
 	.action_query = flow_dv_action_query,
 	.sync_domain = flow_dv_sync_domain,
+	.discover_priorities = flow_dv_discover_priorities,
 };
 
 #endif /* HAVE_IBV_FLOW_DV_SUPPORT */
diff --git a/drivers/net/mlx5/mlx5_flow_verbs.c b/drivers/net/mlx5/mlx5_flow_verbs.c
index 7b3d0b320d..c87b257b37 100644
--- a/drivers/net/mlx5/mlx5_flow_verbs.c
+++ b/drivers/net/mlx5/mlx5_flow_verbs.c
@@ -28,17 +28,6 @@
 #define VERBS_SPEC_INNER(item_flags) \
 	(!!((item_flags) & MLX5_FLOW_LAYER_TUNNEL) ? IBV_FLOW_SPEC_INNER : 0)
 
-/* Map of Verbs to Flow priority with 8 Verbs priorities. */
-static const uint32_t priority_map_3[][MLX5_PRIORITY_MAP_MAX] = {
-	{ 0, 1, 2 }, { 2, 3, 4 }, { 5, 6, 7 },
-};
-
-/* Map of Verbs to Flow priority with 16 Verbs priorities. */
-static const uint32_t priority_map_5[][MLX5_PRIORITY_MAP_MAX] = {
-	{ 0, 1, 2 }, { 3, 4, 5 }, { 6, 7, 8 },
-	{ 9, 10, 11 }, { 12, 13, 14 },
-};
-
 /* Verbs specification header. */
 struct ibv_spec_header {
 	enum ibv_flow_spec_type type;
@@ -50,13 +39,17 @@ struct ibv_spec_header {
  *
  * @param[in] dev
  *   Pointer to the Ethernet device structure.
- *
+ * @param[in] vprio
+ *   Expected result variants.
+ * @param[in] vprio_n
+ *   Number of entries in @p vprio array.
  * @return
- *   number of supported flow priority on success, a negative errno
+ *   Number of supported flow priority on success, a negative errno
  *   value otherwise and rte_errno is set.
  */
-int
-mlx5_flow_discover_priorities(struct rte_eth_dev *dev)
+static int
+flow_verbs_discover_priorities(struct rte_eth_dev *dev,
+			       const uint16_t *vprio, int vprio_n)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct {
@@ -79,7 +72,6 @@ mlx5_flow_discover_priorities(struct rte_eth_dev *dev)
 	};
 	struct ibv_flow *flow;
 	struct mlx5_hrxq *drop = priv->drop_queue.hrxq;
-	uint16_t vprio[] = { 8, 16 };
 	int i;
 	int priority = 0;
 
@@ -87,7 +79,7 @@ mlx5_flow_discover_priorities(struct rte_eth_dev *dev)
 		rte_errno = ENOTSUP;
 		return -rte_errno;
 	}
-	for (i = 0; i != RTE_DIM(vprio); i++) {
+	for (i = 0; i != vprio_n; i++) {
 		flow_attr.attr.priority = vprio[i] - 1;
 		flow = mlx5_glue->create_flow(drop->qp, &flow_attr.attr);
 		if (!flow)
@@ -95,59 +87,9 @@ mlx5_flow_discover_priorities(struct rte_eth_dev *dev)
 		claim_zero(mlx5_glue->destroy_flow(flow));
 		priority = vprio[i];
 	}
-	switch (priority) {
-	case 8:
-		priority = RTE_DIM(priority_map_3);
-		break;
-	case 16:
-		priority = RTE_DIM(priority_map_5);
-		break;
-	default:
-		rte_errno = ENOTSUP;
-		DRV_LOG(ERR,
-			"port %u verbs maximum priority: %d expected 8/16",
-			dev->data->port_id, priority);
-		return -rte_errno;
-	}
-	DRV_LOG(INFO, "port %u supported flow priorities:"
-		" 0-%d for ingress or egress root table,"
-		" 0-%d for non-root table or transfer root table.",
-		dev->data->port_id, priority - 2,
-		MLX5_NON_ROOT_FLOW_MAX_PRIO - 1);
 	return priority;
 }
 
-/**
- * Adjust flow priority based on the highest layer and the request priority.
- *
- * @param[in] dev
- *   Pointer to the Ethernet device structure.
- * @param[in] priority
- *   The rule base priority.
- * @param[in] subpriority
- *   The priority based on the items.
- *
- * @return
- *   The new priority.
- */
-uint32_t
-mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
-				   uint32_t subpriority)
-{
-	uint32_t res = 0;
-	struct mlx5_priv *priv = dev->data->dev_private;
-
-	switch (priv->config.flow_prio) {
-	case RTE_DIM(priority_map_3):
-		res = priority_map_3[priority][subpriority];
-		break;
-	case RTE_DIM(priority_map_5):
-		res = priority_map_5[priority][subpriority];
-		break;
-	}
-	return  res;
-}
-
 /**
  * Get Verbs flow counter by index.
  *
@@ -2093,4 +2035,5 @@ const struct mlx5_flow_driver_ops mlx5_flow_verbs_drv_ops = {
 	.destroy = flow_verbs_destroy,
 	.query = flow_verbs_query,
 	.sync_domain = flow_verbs_sync_domain,
+	.discover_priorities = flow_verbs_discover_priorities,
 };
-- 
2.25.1


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [dpdk-dev] [PATCH 2/4] net/mlx5: create drop queue using DevX
  2021-07-27  7:31 [dpdk-dev] [PATCH 0/4] net/mlx5: keep indirect actions across port restart Dmitry Kozlyuk
  2021-07-27  7:31 ` [dpdk-dev] [PATCH 1/4] net/mlx5: discover max flow priority using DevX Dmitry Kozlyuk
@ 2021-07-27  7:31 ` Dmitry Kozlyuk
  2021-07-27  7:31 ` [dpdk-dev] [PATCH 3/4] net/mlx5: preserve indirect actions across port restart Dmitry Kozlyuk
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Dmitry Kozlyuk @ 2021-07-27  7:31 UTC (permalink / raw)
  To: dev
  Cc: David Marchand, stable, Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko

Drop queue creation and destruction were not implemented for the DevX
flow engine; the Verbs engine methods were used as a workaround.
Implement these methods for DevX so that a valid queue ID is available
regardless of the queue configuration done via the API.
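
For illustration, the resource chain built below (a drop Rx queue backed
by a CQ and RQ, an indirection table pointing every entry at it, and a
TIR on top acting as the drop action) follows this creation order and
error unwinding; helper names are illustrative stubs, not the PMD
functions:

#include <stdio.h>

/* Illustrative stubs; the real work is done through DevX objects. */
static int create_drop_rx_queue(void) { return 0; }       /* CQ + RQ */
static int create_indirection_table(void) { return 0; }   /* all entries -> drop RQ */
static int create_drop_tir(void) { return 0; }            /* hash Rx queue used as drop action */
static void destroy_indirection_table(void) { }
static void destroy_drop_rx_queue(void) { }

static int
drop_action_create(void)
{
	if (create_drop_rx_queue() != 0)
		return -1;
	if (create_indirection_table() != 0)
		goto err_ind_tbl;
	if (create_drop_tir() != 0)
		goto err_tir;
	return 0;
err_tir:
	destroy_indirection_table();
err_ind_tbl:
	destroy_drop_rx_queue();
	return -1;
}

int
main(void)
{
	printf("%s\n", drop_action_create() == 0 ? "drop action ready" : "failed");
	return 0;
}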

Cc: stable@dpdk.org

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 drivers/net/mlx5/linux/mlx5_os.c |   4 -
 drivers/net/mlx5/mlx5_devx.c     | 204 ++++++++++++++++++++++++++-----
 2 files changed, 176 insertions(+), 32 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 671dfface9..791024c2c6 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1756,10 +1756,6 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 	}
 	if (config->devx && config->dv_flow_en && config->dest_tir) {
 		priv->obj_ops = devx_obj_ops;
-		priv->obj_ops.drop_action_create =
-						ibv_obj_ops.drop_action_create;
-		priv->obj_ops.drop_action_destroy =
-						ibv_obj_ops.drop_action_destroy;
 #ifndef HAVE_MLX5DV_DEVX_UAR_OFFSET
 		priv->obj_ops.txq_obj_modify = ibv_obj_ops.txq_obj_modify;
 #else
diff --git a/drivers/net/mlx5/mlx5_devx.c b/drivers/net/mlx5/mlx5_devx.c
index a1db53577a..447d6bafb9 100644
--- a/drivers/net/mlx5/mlx5_devx.c
+++ b/drivers/net/mlx5/mlx5_devx.c
@@ -226,17 +226,17 @@ mlx5_rx_devx_get_event(struct mlx5_rxq_obj *rxq_obj)
  *
  * @param dev
  *   Pointer to Ethernet device.
- * @param idx
- *   Queue index in DPDK Rx queue array.
+ * @param rxq_data
+ *   RX queue data.
  *
  * @return
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-mlx5_rxq_create_devx_rq_resources(struct rte_eth_dev *dev, uint16_t idx)
+mlx5_rxq_create_devx_rq_resources(struct rte_eth_dev *dev,
+				  struct mlx5_rxq_data *rxq_data)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_rxq_data *rxq_data = (*priv->rxqs)[idx];
 	struct mlx5_rxq_ctrl *rxq_ctrl =
 		container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
 	struct mlx5_devx_create_rq_attr rq_attr = { 0 };
@@ -289,20 +289,20 @@ mlx5_rxq_create_devx_rq_resources(struct rte_eth_dev *dev, uint16_t idx)
  *
  * @param dev
  *   Pointer to Ethernet device.
- * @param idx
- *   Queue index in DPDK Rx queue array.
+ * @param rxq_data
+ *   RX queue data.
  *
  * @return
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-mlx5_rxq_create_devx_cq_resources(struct rte_eth_dev *dev, uint16_t idx)
+mlx5_rxq_create_devx_cq_resources(struct rte_eth_dev *dev,
+				  struct mlx5_rxq_data *rxq_data)
 {
 	struct mlx5_devx_cq *cq_obj = 0;
 	struct mlx5_devx_cq_attr cq_attr = { 0 };
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_dev_ctx_shared *sh = priv->sh;
-	struct mlx5_rxq_data *rxq_data = (*priv->rxqs)[idx];
 	struct mlx5_rxq_ctrl *rxq_ctrl =
 		container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
 	unsigned int cqe_n = mlx5_rxq_cqe_num(rxq_data);
@@ -497,13 +497,13 @@ mlx5_rxq_devx_obj_new(struct rte_eth_dev *dev, uint16_t idx)
 		tmpl->fd = mlx5_os_get_devx_channel_fd(tmpl->devx_channel);
 	}
 	/* Create CQ using DevX API. */
-	ret = mlx5_rxq_create_devx_cq_resources(dev, idx);
+	ret = mlx5_rxq_create_devx_cq_resources(dev, rxq_data);
 	if (ret) {
 		DRV_LOG(ERR, "Failed to create CQ.");
 		goto error;
 	}
 	/* Create RQ using DevX API. */
-	ret = mlx5_rxq_create_devx_rq_resources(dev, idx);
+	ret = mlx5_rxq_create_devx_rq_resources(dev, rxq_data);
 	if (ret) {
 		DRV_LOG(ERR, "Port %u Rx queue %u RQ creation failure.",
 			dev->data->port_id, idx);
@@ -536,6 +536,11 @@ mlx5_rxq_devx_obj_new(struct rte_eth_dev *dev, uint16_t idx)
  *   Pointer to Ethernet device.
  * @param log_n
  *   Log of number of queues in the array.
+ * @param queues
+ *   List of RX queue indices or NULL, in which case
+ *   the attribute will be filled by drop queue ID.
+ * @param queues_n
+ *   Size of @p queues array or 0 if it is NULL.
  * @param ind_tbl
  *   DevX indirection table object.
  *
@@ -563,6 +568,11 @@ mlx5_devx_ind_table_create_rqt_attr(struct rte_eth_dev *dev,
 	}
 	rqt_attr->rqt_max_size = priv->config.ind_table_max_size;
 	rqt_attr->rqt_actual_size = rqt_n;
+	if (queues == NULL) {
+		for (i = 0; i < rqt_n; i++)
+			rqt_attr->rq_list[i] = priv->drop_queue.rxq->rq->id;
+		return rqt_attr;
+	}
 	for (i = 0; i != queues_n; ++i) {
 		struct mlx5_rxq_data *rxq = (*priv->rxqs)[queues[i]];
 		struct mlx5_rxq_ctrl *rxq_ctrl =
@@ -670,7 +680,8 @@ mlx5_devx_ind_table_destroy(struct mlx5_ind_table_obj *ind_tbl)
  * @param[in] hash_fields
  *   Verbs protocol hash field to make the RSS on.
  * @param[in] ind_tbl
- *   Indirection table for TIR.
+ *   Indirection table for TIR. If table queues array is NULL,
+ *   a TIR for drop queue is assumed.
  * @param[in] tunnel
  *   Tunnel type.
  * @param[out] tir_attr
@@ -686,19 +697,27 @@ mlx5_devx_tir_attr_set(struct rte_eth_dev *dev, const uint8_t *rss_key,
 		       int tunnel, struct mlx5_devx_tir_attr *tir_attr)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_rxq_data *rxq_data = (*priv->rxqs)[ind_tbl->queues[0]];
-	struct mlx5_rxq_ctrl *rxq_ctrl =
-		container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
-	enum mlx5_rxq_type rxq_obj_type = rxq_ctrl->type;
+	enum mlx5_rxq_type rxq_obj_type;
 	bool lro = true;
 	uint32_t i;
 
-	/* Enable TIR LRO only if all the queues were configured for. */
-	for (i = 0; i < ind_tbl->queues_n; ++i) {
-		if (!(*priv->rxqs)[ind_tbl->queues[i]]->lro) {
-			lro = false;
-			break;
+	/* NULL queues designate drop queue. */
+	if (ind_tbl->queues != NULL) {
+		struct mlx5_rxq_data *rxq_data =
+					(*priv->rxqs)[ind_tbl->queues[0]];
+		struct mlx5_rxq_ctrl *rxq_ctrl =
+			container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
+		rxq_obj_type = rxq_ctrl->type;
+
+		/* Enable TIR LRO only if all the queues were configured for. */
+		for (i = 0; i < ind_tbl->queues_n; ++i) {
+			if (!(*priv->rxqs)[ind_tbl->queues[i]]->lro) {
+				lro = false;
+				break;
+			}
 		}
+	} else {
+		rxq_obj_type = priv->drop_queue.rxq->rxq_ctrl->type;
 	}
 	memset(tir_attr, 0, sizeof(*tir_attr));
 	tir_attr->disp_type = MLX5_TIRC_DISP_TYPE_INDIRECT;
@@ -857,7 +876,7 @@ mlx5_devx_hrxq_modify(struct rte_eth_dev *dev, struct mlx5_hrxq *hrxq,
 }
 
 /**
- * Create a DevX drop action for Rx Hash queue.
+ * Create a DevX drop Rx queue.
  *
  * @param dev
  *   Pointer to Ethernet device.
@@ -866,14 +885,99 @@ mlx5_devx_hrxq_modify(struct rte_eth_dev *dev, struct mlx5_hrxq *hrxq,
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-mlx5_devx_drop_action_create(struct rte_eth_dev *dev)
+mlx5_rxq_devx_obj_drop_create(struct rte_eth_dev *dev)
 {
-	(void)dev;
-	DRV_LOG(ERR, "DevX drop action is not supported yet.");
-	rte_errno = ENOTSUP;
+	struct mlx5_priv *priv = dev->data->dev_private;
+	int socket_id = dev->device->numa_node;
+	struct mlx5_rxq_ctrl *rxq_ctrl;
+	struct mlx5_rxq_data *rxq_data;
+	struct mlx5_rxq_obj *rxq = NULL;
+	int ret;
+
+	/*
+	 * Initialize dummy control structures.
+	 * They are required to hold pointers for cleanup
+	 * and are only accessible via drop queue DevX objects.
+	 */
+	rxq_ctrl = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*rxq_ctrl),
+			       0, socket_id);
+	if (rxq_ctrl == NULL) {
+		DRV_LOG(ERR, "Port %u could not allocate drop queue control",
+			dev->data->port_id);
+		rte_errno = ENOMEM;
+		goto error;
+	}
+	rxq = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*rxq), 0, socket_id);
+	if (rxq == NULL) {
+		DRV_LOG(ERR, "Port %u could not allocate drop queue object",
+			dev->data->port_id);
+		rte_errno = ENOMEM;
+		goto error;
+	}
+	rxq->rxq_ctrl = rxq_ctrl;
+	rxq_ctrl->type = MLX5_RXQ_TYPE_STANDARD;
+	rxq_ctrl->priv = priv;
+	rxq_ctrl->obj = rxq;
+	rxq_data = &rxq_ctrl->rxq;
+	/* Create CQ using DevX API. */
+	ret = mlx5_rxq_create_devx_cq_resources(dev, rxq_data);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Port %u drop queue CQ creation failed.",
+			dev->data->port_id);
+		goto error;
+	}
+	/* Create RQ using DevX API. */
+	ret = mlx5_rxq_create_devx_rq_resources(dev, rxq_data);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Port %u drop queue RQ creation failed.",
+			dev->data->port_id);
+		rte_errno = ENOMEM;
+		goto error;
+	}
+	/* Change queue state to ready. */
+	ret = mlx5_devx_modify_rq(rxq, MLX5_RXQ_MOD_RST2RDY);
+	if (ret != 0)
+		goto error;
+	/* Initialize drop queue. */
+	priv->drop_queue.rxq = rxq;
+	return 0;
+error:
+	ret = rte_errno; /* Save rte_errno before cleanup. */
+	if (rxq != NULL) {
+		if (rxq->rq_obj.rq != NULL)
+			mlx5_devx_rq_destroy(&rxq->rq_obj);
+		if (rxq->cq_obj.cq != NULL)
+			mlx5_devx_cq_destroy(&rxq->cq_obj);
+		if (rxq->devx_channel)
+			mlx5_os_devx_destroy_event_channel
+							(rxq->devx_channel);
+		mlx5_free(rxq);
+	}
+	if (rxq_ctrl != NULL)
+		mlx5_free(rxq_ctrl);
+	rte_errno = ret; /* Restore rte_errno. */
 	return -rte_errno;
 }
 
+/**
+ * Release drop Rx queue resources.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ */
+static void
+mlx5_rxq_devx_obj_drop_release(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_rxq_obj *rxq = priv->drop_queue.rxq;
+	struct mlx5_rxq_ctrl *rxq_ctrl = rxq->rxq_ctrl;
+
+	mlx5_rxq_devx_obj_release(rxq);
+	mlx5_free(rxq);
+	mlx5_free(rxq_ctrl);
+	priv->drop_queue.rxq = NULL;
+}
+
 /**
  * Release a drop hash Rx queue.
  *
@@ -883,9 +987,53 @@ mlx5_devx_drop_action_create(struct rte_eth_dev *dev)
 static void
 mlx5_devx_drop_action_destroy(struct rte_eth_dev *dev)
 {
-	(void)dev;
-	DRV_LOG(ERR, "DevX drop action is not supported yet.");
-	rte_errno = ENOTSUP;
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_hrxq *hrxq = priv->drop_queue.hrxq;
+
+	if (hrxq->tir != NULL)
+		mlx5_devx_tir_destroy(hrxq);
+	if (hrxq->ind_table->ind_table != NULL)
+		mlx5_devx_ind_table_destroy(hrxq->ind_table);
+	if (priv->drop_queue.rxq->rq != NULL)
+		mlx5_rxq_devx_obj_drop_release(dev);
+}
+
+/**
+ * Create a DevX drop action for Rx Hash queue.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_devx_drop_action_create(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_hrxq *hrxq = priv->drop_queue.hrxq;
+	int ret;
+
+	ret = mlx5_rxq_devx_obj_drop_create(dev);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Cannot create drop RX queue");
+		return ret;
+	}
+	/* hrxq->ind_table queues are NULL, drop RX queue ID will be used */
+	ret = mlx5_devx_ind_table_new(dev, 0, hrxq->ind_table);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Cannot create drop hash RX queue indirection table");
+		goto error;
+	}
+	ret = mlx5_devx_hrxq_new(dev, hrxq, /* tunnel */ false);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Cannot create drop hash RX queue");
+		goto error;
+	}
+	return 0;
+error:
+	mlx5_devx_drop_action_destroy(dev);
+	return ret;
 }
 
 /**
-- 
2.25.1


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [dpdk-dev] [PATCH 3/4] net/mlx5: preserve indirect actions across port restart
  2021-07-27  7:31 [dpdk-dev] [PATCH 0/4] net/mlx5: keep indirect actions across port restart Dmitry Kozlyuk
  2021-07-27  7:31 ` [dpdk-dev] [PATCH 1/4] net/mlx5: discover max flow priority using DevX Dmitry Kozlyuk
  2021-07-27  7:31 ` [dpdk-dev] [PATCH 2/4] net/mlx5: create drop queue " Dmitry Kozlyuk
@ 2021-07-27  7:31 ` Dmitry Kozlyuk
  2021-07-27  7:31 ` [dpdk-dev] [PATCH 4/4] ethdev: document indirect flow action life cycle Dmitry Kozlyuk
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Dmitry Kozlyuk @ 2021-07-27  7:31 UTC (permalink / raw)
  To: dev
  Cc: David Marchand, bingz, stable, Matan Azrad, Shahaf Shuler,
	Viacheslav Ovsiienko

MLX5 PMD uses reference counting to manage RX queue resources.
After port stop, shared RSS actions kept references to RX queues,
preventing resource release. As a result, the internal PMD mempool
for such queues was exhausted after a number of port restarts.
Diagnostic message from rte_eth_dev_start():

    Rx queue allocation failed: Cannot allocate memory

Dereference RX queues used by indirect actions on port stop (detach)
and restore the references on port start (attach) in order to allow RX
queue resource release, while keeping the indirect actions themselves
across the port restart. Replace the queue IDs in HW with the drop queue
ID on detach and restore the actual queue IDs on attach.
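
The detach/attach idea can be illustrated with a standalone sketch
(names and data layout are hypothetical, not PMD code): on detach the
table programmed in HW is switched to the drop queue so the real Rx
queues can be released; on attach the remembered queue list is validated
against the new configuration and programmed back.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define DROP_QUEUE UINT16_MAX

struct ind_table {
	uint16_t queues[4];    /* queues remembered for the RSS action */
	uint16_t queues_n;
	uint16_t hw_queues[4]; /* what is currently programmed in HW */
};

static void
detach(struct ind_table *t)
{
	for (uint16_t i = 0; i < t->queues_n; i++)
		t->hw_queues[i] = DROP_QUEUE; /* real queue refs can be released */
}

static bool
attach(struct ind_table *t, uint16_t configured_queues)
{
	for (uint16_t i = 0; i < t->queues_n; i++)
		if (t->queues[i] >= configured_queues)
			return false; /* queue went away: start must fail */
	for (uint16_t i = 0; i < t->queues_n; i++)
		t->hw_queues[i] = t->queues[i];
	return true;
}

int
main(void)
{
	struct ind_table t = { .queues = { 0, 1, 2, 3 }, .queues_n = 4 };

	detach(&t);          /* port stop */
	if (!attach(&t, 2))  /* port start after reconfiguring to 2 queues */
		printf("start fails: RSS action references missing queues\n");
	if (attach(&t, 4))   /* start with a compatible configuration */
		printf("indirect action restored\n");
	return 0;
}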

Fixes: 4b61b8774be9 ("ethdev: introduce indirect flow action")
Cc: bingz@nvidia.com
Cc: stable@dpdk.org

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 drivers/net/mlx5/mlx5_flow.c    | 194 ++++++++++++++++++++++++++++----
 drivers/net/mlx5/mlx5_flow.h    |   2 +
 drivers/net/mlx5/mlx5_rx.h      |   4 +
 drivers/net/mlx5/mlx5_rxq.c     |  99 ++++++++++++++--
 drivers/net/mlx5/mlx5_trigger.c |  10 ++
 5 files changed, 275 insertions(+), 34 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index e8d2678877..5343720ec9 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -1524,6 +1524,58 @@ mlx5_flow_validate_action_queue(const struct rte_flow_action *action,
 	return 0;
 }
 
+/**
+ * Validate queue numbers for device RSS.
+ *
+ * @param[in] dev
+ *   Configured device.
+ * @param[in] queues
+ *   Array of queue numbers.
+ * @param[in] queues_n
+ *   Size of the @p queues array.
+ * @param[out] error
+ *   On error, filled with a textual error description.
+ * @param[out] queue
+ *   On error, filled with an offending queue index in @p queues array.
+ *
+ * @return
+ *   0 on success, a negative errno code on error.
+ */
+static int
+mlx5_validate_rss_queues(const struct rte_eth_dev *dev,
+			 const uint16_t *queues, uint32_t queues_n,
+			 const char **error, uint32_t *queue_idx)
+{
+	const struct mlx5_priv *priv = dev->data->dev_private;
+	enum mlx5_rxq_type rxq_type = MLX5_RXQ_TYPE_UNDEFINED;
+	uint32_t i;
+
+	for (i = 0; i != queues_n; ++i) {
+		struct mlx5_rxq_ctrl *rxq_ctrl;
+
+		if (queues[i] >= priv->rxqs_n) {
+			*error = "queue index out of range";
+			*queue_idx = i;
+			return -EINVAL;
+		}
+		if (!(*priv->rxqs)[queues[i]]) {
+			*error =  "queue is not configured";
+			*queue_idx = i;
+			return -EINVAL;
+		}
+		rxq_ctrl = container_of((*priv->rxqs)[queues[i]],
+					struct mlx5_rxq_ctrl, rxq);
+		if (i == 0)
+			rxq_type = rxq_ctrl->type;
+		if (rxq_type != rxq_ctrl->type) {
+			*error = "combining hairpin and regular RSS queues is not supported";
+			*queue_idx = i;
+			return -ENOTSUP;
+		}
+	}
+	return 0;
+}
+
 /*
  * Validate the rss action.
  *
@@ -1544,8 +1596,9 @@ mlx5_validate_action_rss(struct rte_eth_dev *dev,
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	const struct rte_flow_action_rss *rss = action->conf;
-	enum mlx5_rxq_type rxq_type = MLX5_RXQ_TYPE_UNDEFINED;
-	unsigned int i;
+	int ret;
+	const char *message;
+	uint32_t queue_idx;
 
 	if (rss->func != RTE_ETH_HASH_FUNCTION_DEFAULT &&
 	    rss->func != RTE_ETH_HASH_FUNCTION_TOEPLITZ)
@@ -1609,27 +1662,12 @@ mlx5_validate_action_rss(struct rte_eth_dev *dev,
 		return rte_flow_error_set(error, EINVAL,
 					  RTE_FLOW_ERROR_TYPE_ACTION_CONF,
 					  NULL, "No queues configured");
-	for (i = 0; i != rss->queue_num; ++i) {
-		struct mlx5_rxq_ctrl *rxq_ctrl;
-
-		if (rss->queue[i] >= priv->rxqs_n)
-			return rte_flow_error_set
-				(error, EINVAL,
-				 RTE_FLOW_ERROR_TYPE_ACTION_CONF,
-				 &rss->queue[i], "queue index out of range");
-		if (!(*priv->rxqs)[rss->queue[i]])
-			return rte_flow_error_set
-				(error, EINVAL, RTE_FLOW_ERROR_TYPE_ACTION_CONF,
-				 &rss->queue[i], "queue is not configured");
-		rxq_ctrl = container_of((*priv->rxqs)[rss->queue[i]],
-					struct mlx5_rxq_ctrl, rxq);
-		if (i == 0)
-			rxq_type = rxq_ctrl->type;
-		if (rxq_type != rxq_ctrl->type)
-			return rte_flow_error_set
-				(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ACTION_CONF,
-				 &rss->queue[i],
-				 "combining hairpin and regular RSS queues is not supported");
+	ret = mlx5_validate_rss_queues(dev, rss->queue, rss->queue_num,
+				       &message, &queue_idx);
+	if (ret != 0) {
+		return rte_flow_error_set(error, -ret,
+					  RTE_FLOW_ERROR_TYPE_ACTION_CONF,
+					  &rss->queue[queue_idx], message);
 	}
 	return 0;
 }
@@ -8493,6 +8531,116 @@ mlx5_action_handle_flush(struct rte_eth_dev *dev)
 	return ret;
 }
 
+/**
+ * Validate existing indirect actions against current device configuration
+ * and attach them to device resources.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_action_handle_attach(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_indexed_pool *ipool =
+			priv->sh->ipool[MLX5_IPOOL_RSS_SHARED_ACTIONS];
+	struct mlx5_shared_action_rss *shared_rss, *shared_rss_last;
+	int ret = 0;
+	uint32_t idx;
+
+	ILIST_FOREACH(ipool, priv->rss_shared_actions, idx, shared_rss, next) {
+		struct mlx5_ind_table_obj *ind_tbl = shared_rss->ind_tbl;
+		const char *message;
+		uint32_t queue_idx;
+
+		ret = mlx5_validate_rss_queues(dev, ind_tbl->queues,
+					       ind_tbl->queues_n,
+					       &message, &queue_idx);
+		if (ret != 0) {
+			DRV_LOG(ERR, "Port %u cannot use queue %u in RSS: %s",
+				dev->data->port_id, ind_tbl->queues[queue_idx],
+				message);
+			break;
+		}
+	}
+	if (ret != 0)
+		return ret;
+	ILIST_FOREACH(ipool, priv->rss_shared_actions, idx, shared_rss, next) {
+		struct mlx5_ind_table_obj *ind_tbl = shared_rss->ind_tbl;
+
+		ret = mlx5_ind_table_obj_attach(dev, ind_tbl);
+		if (ret != 0) {
+			DRV_LOG(ERR, "Port %u could not attach "
+				"indirection table obj %p",
+				dev->data->port_id, (void *)ind_tbl);
+			goto error;
+		}
+	}
+	return 0;
+error:
+	shared_rss_last = shared_rss;
+	ILIST_FOREACH(ipool, priv->rss_shared_actions, idx, shared_rss, next) {
+		struct mlx5_ind_table_obj *ind_tbl = shared_rss->ind_tbl;
+
+		if (shared_rss == shared_rss_last)
+			break;
+		if (mlx5_ind_table_obj_detach(dev, ind_tbl) != 0)
+			DRV_LOG(CRIT, "Port %u could not detach "
+				"indirection table obj %p on rollback",
+				dev->data->port_id, (void *)ind_tbl);
+	}
+	return ret;
+}
+
+/**
+ * Detach indirect actions of the device from its resources.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_action_handle_detach(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_indexed_pool *ipool =
+			priv->sh->ipool[MLX5_IPOOL_RSS_SHARED_ACTIONS];
+	struct mlx5_shared_action_rss *shared_rss, *shared_rss_last;
+	int ret = 0;
+	uint32_t idx;
+
+	ILIST_FOREACH(ipool, priv->rss_shared_actions, idx, shared_rss, next) {
+		struct mlx5_ind_table_obj *ind_tbl = shared_rss->ind_tbl;
+
+		ret = mlx5_ind_table_obj_detach(dev, ind_tbl);
+		if (ret != 0) {
+			DRV_LOG(ERR, "Port %u could not detach "
+				"indirection table obj %p",
+				dev->data->port_id, (void *)ind_tbl);
+			goto error;
+		}
+	}
+	return 0;
+error:
+	shared_rss_last = shared_rss;
+	ILIST_FOREACH(ipool, priv->rss_shared_actions, idx, shared_rss, next) {
+		struct mlx5_ind_table_obj *ind_tbl = shared_rss->ind_tbl;
+
+		if (shared_rss == shared_rss_last)
+			break;
+		if (mlx5_ind_table_obj_attach(dev, ind_tbl) != 0)
+			DRV_LOG(CRIT, "Port %u could not attach "
+				"indirection table obj %p on rollback",
+				dev->data->port_id, (void *)ind_tbl);
+	}
+	return ret;
+}
+
 #ifndef HAVE_MLX5DV_DR
 #define MLX5_DOMAIN_SYNC_FLOW ((1 << 0) | (1 << 1))
 #else
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index da39eeb596..251d643f8c 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1575,6 +1575,8 @@ struct mlx5_flow_meter_sub_policy *mlx5_flow_meter_sub_policy_rss_prepare
 void mlx5_flow_destroy_sub_policy_with_rxq(struct rte_eth_dev *dev,
 		struct mlx5_flow_meter_policy *mtr_policy);
 int mlx5_flow_dv_discover_counter_offset_support(struct rte_eth_dev *dev);
+int mlx5_action_handle_attach(struct rte_eth_dev *dev);
+int mlx5_action_handle_detach(struct rte_eth_dev *dev);
 int mlx5_action_handle_flush(struct rte_eth_dev *dev);
 void mlx5_release_tunnel_hub(struct mlx5_dev_ctx_shared *sh, uint16_t port_id);
 int mlx5_alloc_tunnel_hub(struct mlx5_dev_ctx_shared *sh);
diff --git a/drivers/net/mlx5/mlx5_rx.h b/drivers/net/mlx5/mlx5_rx.h
index 3f2b99fb65..7319ad0264 100644
--- a/drivers/net/mlx5/mlx5_rx.h
+++ b/drivers/net/mlx5/mlx5_rx.h
@@ -222,6 +222,10 @@ int mlx5_ind_table_obj_modify(struct rte_eth_dev *dev,
 			      struct mlx5_ind_table_obj *ind_tbl,
 			      uint16_t *queues, const uint32_t queues_n,
 			      bool standalone);
+int mlx5_ind_table_obj_attach(struct rte_eth_dev *dev,
+			      struct mlx5_ind_table_obj *ind_tbl);
+int mlx5_ind_table_obj_detach(struct rte_eth_dev *dev,
+			      struct mlx5_ind_table_obj *ind_tbl);
 struct mlx5_list_entry *mlx5_hrxq_create_cb(void *tool_ctx, void *cb_ctx);
 int mlx5_hrxq_match_cb(void *tool_ctx, struct mlx5_list_entry *entry,
 		       void *cb_ctx);
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 49165f482e..1140f6067e 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -2024,6 +2024,26 @@ mlx5_ind_table_obj_new(struct rte_eth_dev *dev, const uint16_t *queues,
 	return ind_tbl;
 }
 
+static int
+mlx5_ind_table_obj_check_standalone(struct rte_eth_dev *dev __rte_unused,
+				    struct mlx5_ind_table_obj *ind_tbl)
+{
+	uint32_t refcnt;
+
+	refcnt = __atomic_load_n(&ind_tbl->refcnt, __ATOMIC_RELAXED);
+	if (refcnt <= 1)
+		return 0;
+	/*
+	 * Modification of indirection tables having more than 1
+	 * reference is unsupported.
+	 */
+	DRV_LOG(DEBUG,
+		"Port %u cannot modify indirection table %p (refcnt %u > 1).",
+		dev->data->port_id, (void *)ind_tbl, refcnt);
+	rte_errno = EINVAL;
+	return -rte_errno;
+}
+
 /**
  * Modify an indirection table.
  *
@@ -2056,18 +2076,8 @@ mlx5_ind_table_obj_modify(struct rte_eth_dev *dev,
 
 	MLX5_ASSERT(standalone);
 	RTE_SET_USED(standalone);
-	if (__atomic_load_n(&ind_tbl->refcnt, __ATOMIC_RELAXED) > 1) {
-		/*
-		 * Modification of indirection ntables having more than 1
-		 * reference unsupported. Intended for standalone indirection
-		 * tables only.
-		 */
-		DRV_LOG(DEBUG,
-			"Port %u cannot modify indirection table (refcnt> 1).",
-			dev->data->port_id);
-		rte_errno = EINVAL;
+	if (mlx5_ind_table_obj_check_standalone(dev, ind_tbl) < 0)
 		return -rte_errno;
-	}
 	for (i = 0; i != queues_n; ++i) {
 		if (!mlx5_rxq_get(dev, queues[i])) {
 			ret = -rte_errno;
@@ -2093,6 +2103,73 @@ mlx5_ind_table_obj_modify(struct rte_eth_dev *dev,
 	return ret;
 }
 
+/**
+ * Attach an indirection table to its queues.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param ind_table
+ *   Indirection table to attach.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_ind_table_obj_attach(struct rte_eth_dev *dev,
+			  struct mlx5_ind_table_obj *ind_tbl)
+{
+	unsigned int i;
+	int ret;
+
+	ret = mlx5_ind_table_obj_modify(dev, ind_tbl, ind_tbl->queues,
+					ind_tbl->queues_n, true);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Port %u could not modify indirect table obj %p",
+			dev->data->port_id, (void *)ind_tbl);
+		return ret;
+	}
+	for (i = 0; i < ind_tbl->queues_n; i++)
+		mlx5_rxq_get(dev, ind_tbl->queues[i]);
+	return 0;
+}
+
+/**
+ * Detach an indirection table from its queues.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param ind_table
+ *   Indirection table to detach.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_ind_table_obj_detach(struct rte_eth_dev *dev,
+			  struct mlx5_ind_table_obj *ind_tbl)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	const unsigned int n = rte_is_power_of_2(ind_tbl->queues_n) ?
+			       log2above(ind_tbl->queues_n) :
+			       log2above(priv->config.ind_table_max_size);
+	unsigned int i;
+	int ret;
+
+	ret = mlx5_ind_table_obj_check_standalone(dev, ind_tbl);
+	if (ret != 0)
+		return ret;
+	MLX5_ASSERT(priv->obj_ops.ind_table_modify);
+	ret = priv->obj_ops.ind_table_modify(dev, n, NULL, 0, ind_tbl);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Port %u could not modify indirect table obj %p",
+			dev->data->port_id, (void *)ind_tbl);
+		return ret;
+	}
+	for (i = 0; i < ind_tbl->queues_n; i++)
+		mlx5_rxq_release(dev, ind_tbl->queues[i]);
+	return ret;
+}
+
 int
 mlx5_hrxq_match_cb(void *tool_ctx __rte_unused, struct mlx5_list_entry *entry,
 		   void *cb_ctx)
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index a9d5d58fd9..6761a84a68 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -14,6 +14,7 @@
 #include <mlx5_malloc.h>
 
 #include "mlx5.h"
+#include "mlx5_flow.h"
 #include "mlx5_mr.h"
 #include "mlx5_rx.h"
 #include "mlx5_tx.h"
@@ -1115,6 +1116,14 @@ mlx5_dev_start(struct rte_eth_dev *dev)
 	mlx5_rxq_timestamp_set(dev);
 	/* Set a mask and offset of scheduling on timestamp into Tx queues. */
 	mlx5_txq_dynf_timestamp_set(dev);
+	/* Attach indirection table objects detached on port stop. */
+	ret = mlx5_action_handle_attach(dev);
+	if (ret) {
+		DRV_LOG(ERR,
+			"port %u failed to attach indirect actions: %s",
+			dev->data->port_id, rte_strerror(rte_errno));
+		goto error;
+	}
 	/*
 	 * In non-cached mode, it only needs to start the default mreg copy
 	 * action and no flow created by application exists anymore.
@@ -1187,6 +1196,7 @@ mlx5_dev_stop(struct rte_eth_dev *dev)
 	/* All RX queue flags will be cleared in the flush interface. */
 	mlx5_flow_list_flush(dev, MLX5_FLOW_TYPE_GEN, true);
 	mlx5_flow_meter_rxq_flush(dev);
+	mlx5_action_handle_detach(dev);
 	mlx5_rx_intr_vec_disable(dev);
 	priv->sh->port[priv->dev_port - 1].ih_port_id = RTE_MAX_ETHPORTS;
 	priv->sh->port[priv->dev_port - 1].devx_ih_port_id = RTE_MAX_ETHPORTS;
-- 
2.25.1


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [dpdk-dev] [PATCH 4/4] ethdev: document indirect flow action life cycle
  2021-07-27  7:31 [dpdk-dev] [PATCH 0/4] net/mlx5: keep indirect actions across port restart Dmitry Kozlyuk
                   ` (2 preceding siblings ...)
  2021-07-27  7:31 ` [dpdk-dev] [PATCH 3/4] net/mlx5: preserve indirect actions across port restart Dmitry Kozlyuk
@ 2021-07-27  7:31 ` Dmitry Kozlyuk
  2021-07-28  9:50   ` Ori Kam
  2021-07-28  8:05 ` [dpdk-dev] [PATCH 0/4] net/mlx5: keep indirect actions across port restart Andrew Rybchenko
  2021-07-29 14:00 ` [dpdk-dev] [PATCH v2 " Dmitry Kozlyuk
  5 siblings, 1 reply; 16+ messages in thread
From: Dmitry Kozlyuk @ 2021-07-27  7:31 UTC (permalink / raw)
  To: dev
  Cc: David Marchand, bingz, stable, Matan Azrad, Ori Kam,
	Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko

rte_flow_action_handle_create() did not specify what happens
to an indirect action when the device is stopped, possibly reconfigured,
and started again.

It is proposed that indirect actions persist across such a sequence.
This allows easier API usage and better HW resource utilization,
because the application does not need to flush and re-create indirect
actions, with the associated error handling and rollback.

If between stop and start the device is reconfigured in a way that is
incompatible with an existing indirect action, the PMD is required to
report an error at device start. This is mandatory, because the flow
API does not supply users with capabilities, so this is the only way
for a user to learn that the configuration is invalid. Errors are not
reported at the configuration stage to give the user a chance to remove
or change the offending actions. For example, if the number of queues
changes and an RSS indirect action specifies queues that went away,
the user must update the action before starting the device. The PMD is
not allowed to silently adjust indirect actions (in the same example,
to remove queues from the RSS), so that all configuration stays explicit.
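
From the application side, the intended sequence could look like the
sketch below. It is illustrative only: it assumes a port and queues set
up elsewhere, queue re-setup after reconfiguration is omitted, and the
error handling is reduced to the minimum.

#include <rte_errno.h>
#include <rte_ethdev.h>
#include <rte_flow.h>

static int
restart_keeping_indirect_action(uint16_t port_id,
				const struct rte_eth_conf *new_conf,
				uint16_t nb_rxq, uint16_t nb_txq,
				const struct rte_flow_action *rss_action)
{
	struct rte_flow_indir_action_conf indir_conf = { .ingress = 1 };
	struct rte_flow_error error;
	struct rte_flow_action_handle *handle;
	int ret;

	/* Created once; expected to survive stop/configure/start. */
	handle = rte_flow_action_handle_create(port_id, &indir_conf,
					       rss_action, &error);
	if (handle == NULL)
		return -rte_errno;
	ret = rte_eth_dev_stop(port_id);
	if (ret != 0)
		return ret;
	ret = rte_eth_dev_configure(port_id, nb_rxq, nb_txq, new_conf);
	if (ret != 0)
		return ret;
	ret = rte_eth_dev_start(port_id);
	if (ret != 0) {
		/*
		 * The new configuration is incompatible with the action,
		 * e.g. RSS queues were removed: update or destroy the
		 * action, then retry the start.
		 */
		rte_flow_action_handle_destroy(port_id, handle, &error);
		return ret;
	}
	return 0;
}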

Fixes: 4b61b8774be9 ("ethdev: introduce indirect flow action")
Cc: bingz@nvidia.com
Cc: stable@dpdk.org

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 doc/guides/prog_guide/rte_flow.rst | 10 ++++++++++
 lib/ethdev/rte_flow.h              |  4 ++++
 2 files changed, 14 insertions(+)

diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
index 2b42d5ec8c..06dd06d9a6 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -2785,6 +2785,16 @@ updated depend on the type of the ``action`` and different for every type.
 The indirect action specified data (e.g. counter) can be queried by
 ``rte_flow_action_handle_query()``.
 
+Indirect actions persist across device configure, stop, and start.
+If a new configuration is incompatible with an existing indirect action,
+the start operation will fail. "Incompatible" means that if this action
+was destroyed and created again, creation would fail.
+It is a programmer's responsibility to remove or update offending actions.
+
+PMD developers should use the same diagnostics for ``rte_eth_dev_start()``
+as for ``rte_flow_action_handle_create()``. PMD is not allowed to silently
+ignore or correct offending actions.
+
 .. _table_rte_flow_action_handle:
 
 .. table:: INDIRECT
diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
index 70f455d47d..f571a27fe7 100644
--- a/lib/ethdev/rte_flow.h
+++ b/lib/ethdev/rte_flow.h
@@ -3969,6 +3969,10 @@ struct rte_flow_indir_action_conf {
  * The created object handle has single state and configuration
  * across all the flow rules using it.
  *
+ * Indirect actions persist across device configure, stop, and start.
+ * If a new configuration is incompatible with an existing indirect
+ * action, rte_eth_dev_start() will fail.
+ *
  * @param[in] port_id
  *    The port identifier of the Ethernet device.
  * @param[in] conf
-- 
2.25.1


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dpdk-dev] [PATCH 0/4] net/mlx5: keep indirect actions across port restart
  2021-07-27  7:31 [dpdk-dev] [PATCH 0/4] net/mlx5: keep indirect actions across port restart Dmitry Kozlyuk
                   ` (3 preceding siblings ...)
  2021-07-27  7:31 ` [dpdk-dev] [PATCH 4/4] ethdev: document indirect flow action life cycle Dmitry Kozlyuk
@ 2021-07-28  8:05 ` Andrew Rybchenko
  2021-07-28 11:18   ` Dmitry Kozlyuk
  2021-07-29 14:00 ` [dpdk-dev] [PATCH v2 " Dmitry Kozlyuk
  5 siblings, 1 reply; 16+ messages in thread
From: Andrew Rybchenko @ 2021-07-28  8:05 UTC (permalink / raw)
  To: Dmitry Kozlyuk, dev; +Cc: David Marchand

On 7/27/21 10:31 AM, Dmitry Kozlyuk wrote:
> It was unspecified what happens to indirect actions when a port
> is stopped, possibly reconfigured, and started again. MLX5 PMD,
> the first one to use indirect actions, intended to keep them across
> such a sequence, but the implementation was buggy. Patches 1-3 fix
> the PMD behavior, patch 4 adds common specification with rationale.

I'm sorry, but it looks very inconsistent. If flow rules are not
preserved across restart, indirect actions should not be preserved
as well. We need very strong reasons to introduce the inconsistency.

If we finally accept it, I think it would be very useful to care
about PMDs which cannot preserve it in HW across restart from the
very beginning and save it in ethdev layer and restore on start
automatically (i.e. do not force all such PMDs to care about
the restore internally and basically duplicate the code).

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dpdk-dev] [PATCH 4/4] ethdev: document indirect flow action life cycle
  2021-07-27  7:31 ` [dpdk-dev] [PATCH 4/4] ethdev: document indirect flow action life cycle Dmitry Kozlyuk
@ 2021-07-28  9:50   ` Ori Kam
  0 siblings, 0 replies; 16+ messages in thread
From: Ori Kam @ 2021-07-28  9:50 UTC (permalink / raw)
  To: Dmitry Kozlyuk, dev
  Cc: David Marchand, Bing Zhao, stable, Matan Azrad,
	NBU-Contact-Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko

Hi Dmitry,

> -----Original Message-----
> From: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
> Sent: Tuesday, July 27, 2021 10:31 AM
>
> Subject: [PATCH 4/4] ethdev: document indirect flow action life cycle
> 
> rte_flow_action_handle_create() did not specify what happens with an
> indirect action when device is stopped, possibly reconfigured, and started
> again.
> 
> It is proposed that indirect actions persisted across such a sequence.
> This allows for easier API usage and better HW resources utilization by saving
> indirect actions flush and re-creation with associated error handling and
> rollback.
> 
> If between stop and start a device is reconfigured in a way that is
> incompatible with an existing indirect action, PMD is required to report an
> error at the device start. This is mandatory, because flow API does not supply
> users with capabilities, so this is the only way for a user to learn that
> configuration is invalid. Errors are not reported at configuration stage to give
> the user a chance to remove or change offending actions. For example, if
> number of queues changes and an RSS indirect action specifies queues that
> went away, user must update the action before starting the device. PMD is
> not allowed to silently adjust indirect actions (in the same example, to
> remove queues from the RSS), so that all configuration is explicit.
> 
I think some of the errors can be checked during configuration.
For example, when the app removes a queue, the PMD can check whether
there is any reference to this queue and fail at that point.

> Fixes: 4b61b8774be9 ("ethdev: introduce indirect flow action")
> Cc: bingz@nvidia.com
> Cc: stable@dpdk.org
> 
> Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
> Acked-by: Matan Azrad <matan@nvidia.com>
> ---
>  doc/guides/prog_guide/rte_flow.rst | 10 ++++++++++
>  lib/ethdev/rte_flow.h              |  4 ++++
>  2 files changed, 14 insertions(+)
> 
> diff --git a/doc/guides/prog_guide/rte_flow.rst
> b/doc/guides/prog_guide/rte_flow.rst
> index 2b42d5ec8c..06dd06d9a6 100644
> --- a/doc/guides/prog_guide/rte_flow.rst
> +++ b/doc/guides/prog_guide/rte_flow.rst
> @@ -2785,6 +2785,16 @@ updated depend on the type of the ``action`` and
> different for every type.
>  The indirect action specified data (e.g. counter) can be queried by
> ``rte_flow_action_handle_query()``.
> 
> +Indirect actions persist across device configure, stop, and start.
+1
> +If a new configuration is incompatible with an existing indirect
> +action, the start operation will fail. "Incompatible" means that if
> +this action was destroyed and created again, creation would fail.

I think this check should be done during the configuration.
It is the application's responsibility to use valid actions; it should be
stated that changing the configuration values may result in invalid actions.

> +It is a programmer's responsibility to remove or update offending actions.
> +
> +PMD developers should use the same diagnostics for
> +``rte_eth_dev_start()`` as for ``rte_flow_action_handle_create()``. PMD
> +is not allowed to silently ignore or correct offending actions.
> +
>  .. _table_rte_flow_action_handle:
> 
>  .. table:: INDIRECT
> diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h index
> 70f455d47d..f571a27fe7 100644
> --- a/lib/ethdev/rte_flow.h
> +++ b/lib/ethdev/rte_flow.h
> @@ -3969,6 +3969,10 @@ struct rte_flow_indir_action_conf {
>   * The created object handle has single state and configuration
>   * across all the flow rules using it.
>   *
> + * Indirect actions persist across device configure, stop, and start.
> + * If a new configuration is incompatible with an existing indirect
> + * action, rte_eth_dev_start() will fail.
> + *

I don't think that dev start should check the configuration; it may slow down startup.

>   * @param[in] port_id
>   *    The port identifier of the Ethernet device.
>   * @param[in] conf
> --
> 2.25.1

Thanks,
Ori


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dpdk-dev] [PATCH 0/4] net/mlx5: keep indirect actions across port restart
  2021-07-28  8:05 ` [dpdk-dev] [PATCH 0/4] net/mlx5: keep indirect actions across port restart Andrew Rybchenko
@ 2021-07-28 11:18   ` Dmitry Kozlyuk
  2021-07-28 12:07     ` Ori Kam
  2021-07-28 12:26     ` Andrew Rybchenko
  0 siblings, 2 replies; 16+ messages in thread
From: Dmitry Kozlyuk @ 2021-07-28 11:18 UTC (permalink / raw)
  To: Andrew Rybchenko, dev; +Cc: David Marchand

Hi Andrew,

> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> On 7/27/21 10:31 AM, Dmitry Kozlyuk wrote:
> > It was unspecified what happens to indirect actions when a port is
> > stopped, possibly reconfigured, and started again. MLX5 PMD, the first
> > one to use indirect actions, intended to keep them across such a
> > sequence, but the implementation was buggy. Patches 1-3 fix the PMD
> > behavior, patch 4 adds common specification with rationale.
> 
> I'm sorry, but it looks very inconsistent. If flow rules are not preserved across
> restart, indirect actions should not be preserved as well. We need very strong
> reasons to introduce the inconsistency.

Indirect actions really don't need to behave like flow rules. They are
just objects owned by the port, and they can exist while the port exists.
Consider a counter: stopping and starting the port doesn't logically
affect its state. MLX5 PMD destroys flow rules on port stop for internal
reasons and documents this behavior, but the ethdev API doesn't require
it either.

> If we finally accept it, I think it would be very useful to care about PMDs which
> cannot preserve it in HW across restart from the very beginning and save it in
> ethdev layer and restore on start automatically (i.e. do not force all such PMDs
> to care about the restore internally and basically duplicate the code).

Or keeping indirect actions can be an advertised PMD capability.
Given Ori's comments on patch 4, I think the common spec needs more work.
For this patchset that fixes MLX5, we can document the behavior for the
PMD and not require it from all drivers.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dpdk-dev] [PATCH 0/4] net/mlx5: keep indirect actions across port restart
  2021-07-28 11:18   ` Dmitry Kozlyuk
@ 2021-07-28 12:07     ` Ori Kam
  2021-07-28 12:26     ` Andrew Rybchenko
  1 sibling, 0 replies; 16+ messages in thread
From: Ori Kam @ 2021-07-28 12:07 UTC (permalink / raw)
  To: Dmitry Kozlyuk, Andrew Rybchenko, dev; +Cc: David Marchand

Hi Dmitry and Andrew,

> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Dmitry Kozlyuk
> Sent: Wednesday, July 28, 2021 2:19 PM
> 
> Hi Andrew,
> 
> > From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> On 7/27/21
> > 10:31 AM, Dmitry Kozlyuk wrote:
> > > It was unspecified what happens to indirect actions when a port is
> > > stopped, possibly reconfigured, and started again. MLX5 PMD, the
> > > first one to use indirect actions, intended to keep them across such
> > > a sequence, but the implementation was buggy. Patches 1-3 fix the
> > > PMD behavior, patch 4 adds common specification with rationale.
> >
> > I'm sorry, but it looks very inconsistent. If flow rules are not
> > preserved across restart, indirect actions should not be preserved as
> > well. We need very strong reasons to introduce the inconsistency.
> 
> Indirect actions really don't need to behave like flow rules. They are just
> objects owned by the port and they can exist while it exists. Consider a
> counter: stopping and starting the port doesn't logically affect its state. MLX5
> PMD destroys flow rules on port stop for internal reasons and documents
> this behavior, but ethdev API doesn't require it either.
> 
> > If we finally accept it, I think it would be very useful to care about
> > PMDs which cannot preserve it in HW across restart from the very
> > beginning and save it in ethdev layer and restore on start
> > automatically (i.e. do not force all such PMDs to care about the restore
> internally and basically duplicate the code).
> 
> Or keeping indirect actions can be an advertised PMD capability.
> Given Ori's comments to patch 4, I think the common spec needs more work.
> For this patchset that fixes MLX5 we can have the behavior documented for
> PMD and not require it from all the drivers.

This also affects whether flows can be stored or not (there was another
thread about it).
I think we should have a device capability that says whether flows are
preserved and whether they can be created before start. The same goes
for actions, but what if some actions can be preserved and some cannot?
For example, RSS cannot in some HW (or after a configuration change),
while others, such as a counter, can.
I don't want a capability for each action; I think this info is best
explained in each driver's documentation.
Maybe we can have general flags, one for flows and one for actions, and
each PMD will have detailed documentation.


Best,
Ori


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dpdk-dev] [PATCH 0/4] net/mlx5: keep indirect actions across port restart
  2021-07-28 11:18   ` Dmitry Kozlyuk
  2021-07-28 12:07     ` Ori Kam
@ 2021-07-28 12:26     ` Andrew Rybchenko
  2021-07-28 14:08       ` Dmitry Kozlyuk
  1 sibling, 1 reply; 16+ messages in thread
From: Andrew Rybchenko @ 2021-07-28 12:26 UTC (permalink / raw)
  To: Dmitry Kozlyuk, dev; +Cc: David Marchand

On 7/28/21 2:18 PM, Dmitry Kozlyuk wrote:
> Hi Andrew,
> 
>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>> On 7/27/21 10:31 AM, Dmitry Kozlyuk wrote:
>>> It was unspecified what happens to indirect actions when a port is
>>> stopped, possibly reconfigured, and started again. MLX5 PMD, the first
>>> one to use indirect actions, intended to keep them across such a
>>> sequence, but the implementation was buggy. Patches 1-3 fix the PMD
>>> behavior, patch 4 adds common specification with rationale.
>>
>> I'm sorry, but it looks very inconsistent. If flow rules are not preserved across
>> restart, indirect actions should not be preserved as well. We need very strong
>> reasons to introduce the inconsistency.
> 
> Indirect actions really don't need to behave like flow rules. They are just objects owned by the port and they can exist while it exists. Consider a counter: stopping and starting the port doesn't logically affect its state. MLX5 PMD destroys flow rules on port stop for internal reasons and documents this behavior, but ethdev API doesn't require it either.

It all sounds bad. All these gray areas just make it hard for DPDK
applications to switch from one HW to another.
Rules must not be motivated by PMD-internal reasons.
We should not adjust ethdev rules to fit some PMD behaviour.
ethdev rules should be motivated by common sense and convenience from
the application's point of view.

For example, it is strange to preserve an indirect RSS action with queues
specified across device reconfiguration when the queue count may change.
I'd say that reconfiguration must drop all indirect actions.
However, just stop/start could preserve both indirect actions and flow
rules since it could be more convenient from the application's point of view.
If the application really wants to remove all flow rules, it can call
rte_flow_flush().
The strong reason to flush indirect actions and flow rules across
restart is a possible failure to restore actions or rules on start.
However, maybe it is sufficient to document that start should really
fail if it can't restore everything, and the application should retry
after rte_flow_flush(), taking this into account.
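
For concreteness, a minimal sketch of that retry pattern on the application side
(standard ethdev/rte_flow calls, error handling trimmed; the helper name is made up):

#include <rte_ethdev.h>
#include <rte_flow.h>

static int
app_start_port(uint16_t port_id)
{
	struct rte_flow_error err;
	int ret;

	ret = rte_eth_dev_start(port_id);
	if (ret != 0) {
		/*
		 * Start failed, e.g. some rules or actions could not be
		 * restored: flush the flow rules (indirect actions would
		 * need rte_flow_action_handle_destroy() per handle),
		 * retry, then re-create whatever is still needed.
		 */
		rte_flow_flush(port_id, &err);
		ret = rte_eth_dev_start(port_id);
	}
	return ret;
}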

>> If we finally accept it, I think it would be very useful to care about PMDs which
>> cannot preserve it in HW across restart from the very beginning and save it in
>> ethdev layer and restore on start automatically (i.e. do not force all such PMDs
>> to care about the restore internally and basically duplicate the code).
> 
> Or keeping indirect actions can be an advertised PMD capability.
> Given Ori's comments to patch 4, I think the common spec needs more work.
> For this patchset that fixes MLX5 we can have the behavior documented for PMD and not require it from all the drivers.

Are you going to drop 4th patch?

In general documenting PMD behaviour specifics in its documentation is
a wrong direction since it does not help DPDK applications to be
portable across different HW.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dpdk-dev] [PATCH 0/4] net/mlx5: keep indirect actions across port restart
  2021-07-28 12:26     ` Andrew Rybchenko
@ 2021-07-28 14:08       ` Dmitry Kozlyuk
  2021-07-28 17:07         ` Ori Kam
  0 siblings, 1 reply; 16+ messages in thread
From: Dmitry Kozlyuk @ 2021-07-28 14:08 UTC (permalink / raw)
  To: Andrew Rybchenko, dev; +Cc: David Marchand, Ori Kam

> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Sent: 28 July 2021 15:27
> To: Dmitry Kozlyuk <dkozlyuk@nvidia.com>; dev@dpdk.org
> Cc: David Marchand <david.marchand@redhat.com>
> Subject: Re: [dpdk-dev] [PATCH 0/4] net/mlx5: keep indirect actions across port
> restart
> 
> External email: Use caution opening links or attachments
> 
> 
> On 7/28/21 2:18 PM, Dmitry Kozlyuk wrote:
> > Hi Andrew,
> >
> >> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> On 7/27/21
> >> 10:31 AM, Dmitry Kozlyuk wrote:
> >>> It was unspecified what happens to indirect actions when a port is
> >>> stopped, possibly reconfigured, and started again. MLX5 PMD, the
> >>> first one to use indirect actions, intended to keep them across such
> >>> a sequence, but the implementation was buggy. Patches 1-3 fix the
> >>> PMD behavior, patch 4 adds common specification with rationale.
> >>
> >> I'm sorry, but it looks very inconsistent. If flow rules are not
> >> preserved across restart, indirect actions should not be preserved as
> >> well. We need very strong reasons to introduce the inconsistency.
> >
> > Indirect actions really don't need to behave like flow rules. They are just
> objects owned by the port and they can exist while it exists. Consider a counter:
> stopping and starting the port doesn't logically affect its state. MLX5 PMD
> destroys flow rules on port stop for internal reasons and documents this
> behavior, but ethdev API doesn't require it either.
> 
> It all sounds bad. All these gray areas just make it hard for DPDK applications to
> switch from one HW to another.
> Any rules must not be motivated because of some PMD internal reasons.
> We should not adjust ethdev rules to fit some PMD behaviour.
> ethdev rules should be motivated by common sense and convenience from
> applications point of view.

That is what this patchset is trying to do.
The current specification is unclear: the application doesn't know
whether it should destroy and recreate indirect actions or not.
MLX5 PMD is only mentioned above because it's the only one implementing
the indirect action API, but it's not an attempt to tailor the API to it, quite the opposite.

> For example, it is strange to preserve indirect RSS action with queues specified
> across device reconfiguration when queues count may change.
> I'd say that reconfiguration must drop all indirect actions.

I don't like it because 1) it is implicit, 2) it may be unnecessary even for RSS, and RSS is only one example of an indirect action.

> However, just stop/start could preserve both indirect actions and flow rules since
> it could be more convenient from application point of view.

For many cases I agree, but not for all.
What if an application creates numerous flows from its data path?
They are transient by nature, but the PMD would have to save them all
at the cost of RAM and CPU, without benefit to anyone.
OTOH, the application always controls the indirect actions it creates,
because it is going to reuse or query them.
Therefore, it is both logical and convenient to preserve them.
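
For example, a counter created as an indirect action is typically held and queried
by the application across the port life cycle. A minimal sketch, assuming the PMD
supports an indirect COUNT action (helper names are made up):

#include <rte_ethdev.h>
#include <rte_flow.h>

static struct rte_flow_action_handle *
app_create_shared_counter(uint16_t port_id)
{
	const struct rte_flow_indir_action_conf conf = { .ingress = 1 };
	const struct rte_flow_action_count cnt = { 0 };
	const struct rte_flow_action action = {
		.type = RTE_FLOW_ACTION_TYPE_COUNT,
		.conf = &cnt,
	};
	struct rte_flow_error err;

	return rte_flow_action_handle_create(port_id, &conf, &action, &err);
}

static int
app_read_shared_counter(uint16_t port_id,
			const struct rte_flow_action_handle *handle,
			uint64_t *hits)
{
	struct rte_flow_query_count query = { 0 };
	struct rte_flow_error err;
	int ret;

	/* The handle is kept by the application across port stop/start,
	 * so the accumulated statistics are not lost. */
	ret = rte_flow_action_handle_query(port_id, handle, &query, &err);
	if (ret == 0)
		*hits = query.hits;
	return ret;
}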

> If application really wants to remove all flow rules, it can call rte_flow_flush().
> The strong reason to flush indirect actions and flow rules across restart is
> possible actions or rules restore failure on start.
> However, may be it is sufficient to document that start should really fail, if it
> can't restore everything and application should retry after rte_flow_flush()
> taking it into account.
> 
> >> If we finally accept it, I think it would be very useful to care
> >> about PMDs which cannot preserve it in HW across restart from the
> >> very beginning and save it in ethdev layer and restore on start
> >> automatically (i.e. do not force all such PMDs to care about the restore
> internally and basically duplicate the code).
> >
> > Or keeping indirect actions can be an advertised PMD capability.
> > Given Ori's comments to patch 4, I think the common spec needs more work.
> > For this patchset that fixes MLX5 we can have the behavior documented for
> PMD and not require it from all the drivers.
> 
> Are you going to drop 4th patch?

Yes.

> In general documenting PMD behaviour specifics in its documentation is a wrong
> direction since it does not help DPDK applications to be portable across different
> HW.

I agree. But currently there is a clear resource leak in the MLX5 PMD that can be solved either by destroying indirect actions on port stop or by keeping them (the latter is what the PMD maintainers prefer). The leak should be fixed, and what happens to indirect actions must be clearly documented. Ideally the fix should be aligned with a common ethdev API, but if you and Ori think its design is wrong, then at least the behavior can be described in the PMD docs and later fixed or promoted to the API.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dpdk-dev] [PATCH 0/4] net/mlx5: keep indirect actions across port restart
  2021-07-28 14:08       ` Dmitry Kozlyuk
@ 2021-07-28 17:07         ` Ori Kam
  0 siblings, 0 replies; 16+ messages in thread
From: Ori Kam @ 2021-07-28 17:07 UTC (permalink / raw)
  To: Dmitry Kozlyuk, Andrew Rybchenko, dev; +Cc: David Marchand

Hi,

> -----Original Message-----
> From: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
> Sent: Wednesday, July 28, 2021 5:08 PM
> 
> > -----Original Message-----
> > From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> > Sent: 28 July 2021 15:27
> > To: Dmitry Kozlyuk <dkozlyuk@nvidia.com>; dev@dpdk.org
> > Cc: David Marchand <david.marchand@redhat.com>
> > Subject: Re: [dpdk-dev] [PATCH 0/4] net/mlx5: keep indirect actions
> > across port restart
> >
> > External email: Use caution opening links or attachments
> >
> >
> > On 7/28/21 2:18 PM, Dmitry Kozlyuk wrote:
> > > Hi Andrew,
> > >
> > >> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> On
> 7/27/21
> > >> 10:31 AM, Dmitry Kozlyuk wrote:
> > >>> It was unspecified what happens to indirect actions when a port is
> > >>> stopped, possibly reconfigured, and started again. MLX5 PMD, the
> > >>> first one to use indirect actions, intended to keep them across
> > >>> such a sequence, but the implementation was buggy. Patches 1-3 fix
> > >>> the PMD behavior, patch 4 adds common specification with rationale.
> > >>
> > >> I'm sorry, but it looks very inconsistent. If flow rules are not
> > >> preserved across restart, indirect actions should not be preserved
> > >> as well. We need very strong reasons to introduce the inconsistency.
> > >
> > > Indirect actions really don't need to behave like flow rules. They
> > > are just
> > objects owned by the port and they can exist while it exists. Consider a
> counter:
> > stopping and starting the port doesn't logically affect its state.
> > MLX5 PMD destroys flow rules on port stop for internal reasons and
> > documents this behavior, but ethdev API doesn't require it either.
> >
> > It all sounds bad. All these gray areas just make it hard for DPDK
> > applications to switch from one HW to another.
> > Any rules must not be motivated because of some PMD internal reasons.
> > We should not adjust ethdev rules to fit some PMD behaviour.
> > ethdev rules should be motivated by common sense and convenience
> from
> > applications point of view.
> 
> That is what this patchset is trying to do.
> Current specification is unclear, application doesn't know if it should destroy
> and recreate indirect actions or not.
> MLX5 PMD is only mentioned above because it's the only one implementing
> indirect action API, but it's not an attempt to tailor API to it, quite the
> opposite.
> 
I agree gray areas are bad, but as more and more applications use more and more
flows, flow insertion becomes part of the datapath, and optimization of actions and
rules becomes even more critical. (To address this we are going to introduce an
additional API, which will enable async insertion and allocation of resources before
flow insertion.)
So, since each HW implements the actions and flows differently, forcing the exact same
behavior will result in performance degradation.


> > For example, it is strange to preserve indirect RSS action with queues
> > specified across device reconfiguration when queues count may change.
> > I'd say that reconfiguration must drop all indirect actions.
> 
> I don't like it because 1) it is implicit, 2) it may be unnecessary even for RSS,
> and it's only one example of an indirect action.
> 
> > However, just stop/start could preserve both indirect actions and flow
> > rules since it could be more convenient from application point of view.
> 
> For many cases I agree, but not for all.
> What if an application creates numerous flows from its data path?
> They are transient by nature, but PMD will have to save them all at the cost
> of RAM and CPU but without benefit to anyone.
> OTOH, application always controls indirect actions it creates, because it is
> going to reuse or query them.
> Therefore, it is both logical and convenient to preserve them.
> 
> > If application really wants to remove all flow rules, it can call
> rte_flow_flush().
> > The strong reason to flush indirect actions and flow rules across
> > restart is possible actions or rules restore failure on start.
> > However, may be it is sufficient to document that start should really
> > fail, if it can't restore everything and application should retry
> > after rte_flow_flush() taking it into account.
> >
> > >> If we finally accept it, I think it would be very useful to care
> > >> about PMDs which cannot preserve it in HW across restart from the
> > >> very beginning and save it in ethdev layer and restore on start
> > >> automatically (i.e. do not force all such PMDs to care about the
> > >> restore
> > internally and basically duplicate the code).
> > >
> > > Or keeping indirect actions can be an advertised PMD capability.
> > > Given Ori's comments to patch 4, I think the common spec needs more
> work.
> > > For this patchset that fixes MLX5 we can have the behavior
> > > documented for
> > PMD and not require it from all the drivers.
> >
> > Are you going to drop 4th patch?
> 
> Yes.
> 
> > In general documenting PMD behaviour specifics in its documentation is
> > a wrong direction since it does not help DPDK applications to be
> > portable across different HW.
> 
> I agree. But currently there is a clear resource leak in MLX5 PMD, that can be
> solved either by destroying indirect actions on port stop or by keeping them
> (this is what PMD maintainers prefer). The leak should be fixed and what
> happens to indirect actions must be clearly documented. Ideally the fix
> should be aligned with common ethdev API, but if you and Ori think its
> design is wrong, then at least behavior can be described in PMD docs and
> later fixed or promoted to API.

I think the application should be aware of the different possibilities between PMDs.
If possible, it is best that all PMDs act the same, but if there is a HW issue I think
different behavior is better than not supporting the feature at all.
In any case the doc should state the minimum requirement; if the HW can do better,
it can do so.

Best,
Ori

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [dpdk-dev] [PATCH v2 0/4] net/mlx5: keep indirect actions across port restart
  2021-07-27  7:31 [dpdk-dev] [PATCH 0/4] net/mlx5: keep indirect actions across port restart Dmitry Kozlyuk
                   ` (4 preceding siblings ...)
  2021-07-28  8:05 ` [dpdk-dev] [PATCH 0/4] net/mlx5: keep indirect actions across port restart Andrew Rybchenko
@ 2021-07-29 14:00 ` Dmitry Kozlyuk
  2021-07-29 14:00   ` [dpdk-dev] [PATCH v2 1/3] net/mlx5: discover max flow priority using DevX Dmitry Kozlyuk
                     ` (2 more replies)
  5 siblings, 3 replies; 16+ messages in thread
From: Dmitry Kozlyuk @ 2021-07-29 14:00 UTC (permalink / raw)
  To: dev; +Cc: David Marchand, Andrew Rybchenko, Ori Kam

It is unspecified what happens to indirect actions when a port
is stopped, possibly reconfigured, and started again. MLX5 PMD,
the first one to implement the indirect action API, intended to keep
them across such a sequence, but the implementation was buggy;
this patchset fixes it. There is no consensus yet on what behavior
should be specified for all drivers, so the current behavior is described
in the PMD docs.

Dmitry Kozlyuk (3):
  net/mlx5: discover max flow priority using DevX
  net/mlx5: create drop queue using DevX
  net/mlx5: preserve indirect actions across port restart

 doc/guides/nics/mlx5.rst           |   8 +
 drivers/net/mlx5/linux/mlx5_os.c   |   5 -
 drivers/net/mlx5/mlx5_devx.c       | 204 +++++++++++++++++---
 drivers/net/mlx5/mlx5_flow.c       | 292 ++++++++++++++++++++++++++---
 drivers/net/mlx5/mlx5_flow.h       |   6 +
 drivers/net/mlx5/mlx5_flow_dv.c    | 103 ++++++++++
 drivers/net/mlx5/mlx5_flow_verbs.c |  77 +-------
 drivers/net/mlx5/mlx5_rx.h         |   4 +
 drivers/net/mlx5/mlx5_rxq.c        |  99 ++++++++--
 drivers/net/mlx5/mlx5_trigger.c    |  10 +
 10 files changed, 674 insertions(+), 134 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [dpdk-dev] [PATCH v2 1/3] net/mlx5: discover max flow priority using DevX
  2021-07-29 14:00 ` [dpdk-dev] [PATCH v2 " Dmitry Kozlyuk
@ 2021-07-29 14:00   ` Dmitry Kozlyuk
  2021-07-29 14:00   ` [dpdk-dev] [PATCH v2 2/3] net/mlx5: create drop queue " Dmitry Kozlyuk
  2021-07-29 14:00   ` [dpdk-dev] [PATCH v2 3/3] net/mlx5: preserve indirect actions across port restart Dmitry Kozlyuk
  2 siblings, 0 replies; 16+ messages in thread
From: Dmitry Kozlyuk @ 2021-07-29 14:00 UTC (permalink / raw)
  To: dev
  Cc: David Marchand, Andrew Rybchenko, Ori Kam, stable, Matan Azrad,
	Shahaf Shuler, Viacheslav Ovsiienko

Maximum available flow priority was discovered using Verbs API
regardless of the selected flow engine. This required some Verbs
objects to be initialized in order to use DevX engine. Make priority
discovery an engine method and implement it for DevX using its API.

Cc: stable@dpdk.org

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 drivers/net/mlx5/linux/mlx5_os.c   |   1 -
 drivers/net/mlx5/mlx5_flow.c       |  98 +++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_flow.h       |   4 ++
 drivers/net/mlx5/mlx5_flow_dv.c    | 103 +++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_flow_verbs.c |  77 +++------------------
 5 files changed, 215 insertions(+), 68 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 4712bd6f9b..671dfface9 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1781,7 +1781,6 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 	priv->drop_queue.hrxq = mlx5_drop_action_create(eth_dev);
 	if (!priv->drop_queue.hrxq)
 		goto error;
-	/* Supported Verbs flow priority number detection. */
 	err = mlx5_flow_discover_priorities(eth_dev);
 	if (err < 0) {
 		err = -err;
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index a3fdce685e..e8d2678877 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -9362,3 +9362,101 @@ mlx5_dbg__print_pattern(const struct rte_flow_item *item)
 	}
 	printf("END\n");
 }
+
+/* Map of Verbs to Flow priority with 8 Verbs priorities. */
+static const uint32_t priority_map_3[][MLX5_PRIORITY_MAP_MAX] = {
+	{ 0, 1, 2 }, { 2, 3, 4 }, { 5, 6, 7 },
+};
+
+/* Map of Verbs to Flow priority with 16 Verbs priorities. */
+static const uint32_t priority_map_5[][MLX5_PRIORITY_MAP_MAX] = {
+	{ 0, 1, 2 }, { 3, 4, 5 }, { 6, 7, 8 },
+	{ 9, 10, 11 }, { 12, 13, 14 },
+};
+
+/**
+ * Discover the number of available flow priorities.
+ *
+ * @param dev
+ *   Ethernet device.
+ *
+ * @return
+ *   On success, number of available flow priorities.
+ *   On failure, a negative errno-style code and rte_errno is set.
+ */
+int
+mlx5_flow_discover_priorities(struct rte_eth_dev *dev)
+{
+	static const uint16_t vprio[] = {8, 16};
+	const struct mlx5_priv *priv = dev->data->dev_private;
+	const struct mlx5_flow_driver_ops *fops;
+	enum mlx5_flow_drv_type type;
+	int ret;
+
+	type = mlx5_flow_os_get_type();
+	if (type == MLX5_FLOW_TYPE_MAX) {
+		type = MLX5_FLOW_TYPE_VERBS;
+		if (priv->config.devx && priv->config.dv_flow_en)
+			type = MLX5_FLOW_TYPE_DV;
+	}
+	fops = flow_get_drv_ops(type);
+	if (fops->discover_priorities == NULL) {
+		DRV_LOG(ERR, "Priority discovery not supported");
+		rte_errno = ENOTSUP;
+		return -rte_errno;
+	}
+	ret = fops->discover_priorities(dev, vprio, RTE_DIM(vprio));
+	if (ret < 0)
+		return ret;
+	switch (ret) {
+	case 8:
+		ret = RTE_DIM(priority_map_3);
+		break;
+	case 16:
+		ret = RTE_DIM(priority_map_5);
+		break;
+	default:
+		rte_errno = ENOTSUP;
+		DRV_LOG(ERR,
+			"port %u maximum priority: %d expected 8/16",
+			dev->data->port_id, ret);
+		return -rte_errno;
+	}
+	DRV_LOG(INFO, "port %u supported flow priorities:"
+		" 0-%d for ingress or egress root table,"
+		" 0-%d for non-root table or transfer root table.",
+		dev->data->port_id, ret - 2,
+		MLX5_NON_ROOT_FLOW_MAX_PRIO - 1);
+	return ret;
+}
+
+/**
+ * Adjust flow priority based on the highest layer and the request priority.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[in] priority
+ *   The rule base priority.
+ * @param[in] subpriority
+ *   The priority based on the items.
+ *
+ * @return
+ *   The new priority.
+ */
+uint32_t
+mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
+			  uint32_t subpriority)
+{
+	uint32_t res = 0;
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	switch (priv->config.flow_prio) {
+	case RTE_DIM(priority_map_3):
+		res = priority_map_3[priority][subpriority];
+		break;
+	case RTE_DIM(priority_map_5):
+		res = priority_map_5[priority][subpriority];
+		break;
+	}
+	return  res;
+}
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 3724293d26..da39eeb596 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1229,6 +1229,9 @@ typedef int (*mlx5_flow_create_def_policy_t)
 			(struct rte_eth_dev *dev);
 typedef void (*mlx5_flow_destroy_def_policy_t)
 			(struct rte_eth_dev *dev);
+typedef int (*mlx5_flow_discover_priorities_t)
+			(struct rte_eth_dev *dev,
+			 const uint16_t *vprio, int vprio_n);
 
 struct mlx5_flow_driver_ops {
 	mlx5_flow_validate_t validate;
@@ -1263,6 +1266,7 @@ struct mlx5_flow_driver_ops {
 	mlx5_flow_action_update_t action_update;
 	mlx5_flow_action_query_t action_query;
 	mlx5_flow_sync_domain_t sync_domain;
+	mlx5_flow_discover_priorities_t discover_priorities;
 };
 
 /* mlx5_flow.c */
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 736227bc0c..b31445b51e 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -17828,6 +17828,108 @@ flow_dv_sync_domain(struct rte_eth_dev *dev, uint32_t domains, uint32_t flags)
 	return 0;
 }
 
+/**
+ * Discover the number of available flow priorities
+ * by trying to create a flow with the highest priority value
+ * for each possible number.
+ *
+ * @param[in] dev
+ *   Ethernet device.
+ * @param[in] vprio
+ *   List of possible number of available priorities.
+ * @param[in] vprio_n
+ *   Size of @p vprio array.
+ * @return
+ *   On success, number of available flow priorities.
+ *   On failure, a negative errno-style code and rte_errno is set.
+ */
+static int
+flow_dv_discover_priorities(struct rte_eth_dev *dev,
+			    const uint16_t *vprio, int vprio_n)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_indexed_pool *pool = priv->sh->ipool[MLX5_IPOOL_MLX5_FLOW];
+	struct rte_flow_item_eth eth;
+	struct rte_flow_item item = {
+		.type = RTE_FLOW_ITEM_TYPE_ETH,
+		.spec = &eth,
+		.mask = &eth,
+	};
+	struct mlx5_flow_dv_matcher matcher = {
+		.mask = {
+			.size = sizeof(matcher.mask.buf),
+		},
+	};
+	union mlx5_flow_tbl_key tbl_key;
+	struct mlx5_flow flow;
+	void *action;
+	struct rte_flow_error error;
+	uint8_t misc_mask;
+	int i, err, ret = -ENOTSUP;
+
+	/*
+	 * Prepare a flow with a catch-all pattern and a drop action.
+	 * Use drop queue, because shared drop action may be unavailable.
+	 */
+	action = priv->drop_queue.hrxq->action;
+	if (action == NULL) {
+		DRV_LOG(ERR, "Priority discovery requires a drop action");
+		rte_errno = ENOTSUP;
+		return -rte_errno;
+	}
+	memset(&flow, 0, sizeof(flow));
+	flow.handle = mlx5_ipool_zmalloc(pool, &flow.handle_idx);
+	if (flow.handle == NULL) {
+		DRV_LOG(ERR, "Cannot create flow handle");
+		rte_errno = ENOMEM;
+		return -rte_errno;
+	}
+	flow.ingress = true;
+	flow.dv.value.size = MLX5_ST_SZ_BYTES(fte_match_param);
+	flow.dv.actions[0] = action;
+	flow.dv.actions_n = 1;
+	memset(&eth, 0, sizeof(eth));
+	flow_dv_translate_item_eth(matcher.mask.buf, flow.dv.value.buf,
+				   &item, /* inner */ false, /* group */ 0);
+	matcher.crc = rte_raw_cksum(matcher.mask.buf, matcher.mask.size);
+	for (i = 0; i < vprio_n; i++) {
+		/* Configure the next proposed maximum priority. */
+		matcher.priority = vprio[i] - 1;
+		memset(&tbl_key, 0, sizeof(tbl_key));
+		err = flow_dv_matcher_register(dev, &matcher, &tbl_key, &flow,
+					       /* tunnel */ NULL,
+					       /* group */ 0,
+					       &error);
+		if (err != 0) {
+			/* This action is pure SW and must always succeed. */
+			DRV_LOG(ERR, "Cannot register matcher");
+			ret = -rte_errno;
+			break;
+		}
+		/* Try to apply the flow to HW. */
+		misc_mask = flow_dv_matcher_enable(flow.dv.value.buf);
+		__flow_dv_adjust_buf_size(&flow.dv.value.size, misc_mask);
+		err = mlx5_flow_os_create_flow
+				(flow.handle->dvh.matcher->matcher_object,
+				 (void *)&flow.dv.value, flow.dv.actions_n,
+				 flow.dv.actions, &flow.handle->drv_flow);
+		if (err == 0) {
+			claim_zero(mlx5_flow_os_destroy_flow
+						(flow.handle->drv_flow));
+			flow.handle->drv_flow = NULL;
+		}
+		claim_zero(flow_dv_matcher_release(dev, flow.handle));
+		if (err != 0)
+			break;
+		ret = vprio[i];
+	}
+	mlx5_ipool_free(pool, flow.handle_idx);
+	/* Set rte_errno if no expected priority value matched. */
+	if (ret < 0)
+		rte_errno = -ret;
+	return ret;
+}
+
 const struct mlx5_flow_driver_ops mlx5_flow_dv_drv_ops = {
 	.validate = flow_dv_validate,
 	.prepare = flow_dv_prepare,
@@ -17861,6 +17963,7 @@ const struct mlx5_flow_driver_ops mlx5_flow_dv_drv_ops = {
 	.action_update = flow_dv_action_update,
 	.action_query = flow_dv_action_query,
 	.sync_domain = flow_dv_sync_domain,
+	.discover_priorities = flow_dv_discover_priorities,
 };
 
 #endif /* HAVE_IBV_FLOW_DV_SUPPORT */
diff --git a/drivers/net/mlx5/mlx5_flow_verbs.c b/drivers/net/mlx5/mlx5_flow_verbs.c
index 7b3d0b320d..c87b257b37 100644
--- a/drivers/net/mlx5/mlx5_flow_verbs.c
+++ b/drivers/net/mlx5/mlx5_flow_verbs.c
@@ -28,17 +28,6 @@
 #define VERBS_SPEC_INNER(item_flags) \
 	(!!((item_flags) & MLX5_FLOW_LAYER_TUNNEL) ? IBV_FLOW_SPEC_INNER : 0)
 
-/* Map of Verbs to Flow priority with 8 Verbs priorities. */
-static const uint32_t priority_map_3[][MLX5_PRIORITY_MAP_MAX] = {
-	{ 0, 1, 2 }, { 2, 3, 4 }, { 5, 6, 7 },
-};
-
-/* Map of Verbs to Flow priority with 16 Verbs priorities. */
-static const uint32_t priority_map_5[][MLX5_PRIORITY_MAP_MAX] = {
-	{ 0, 1, 2 }, { 3, 4, 5 }, { 6, 7, 8 },
-	{ 9, 10, 11 }, { 12, 13, 14 },
-};
-
 /* Verbs specification header. */
 struct ibv_spec_header {
 	enum ibv_flow_spec_type type;
@@ -50,13 +39,17 @@ struct ibv_spec_header {
  *
  * @param[in] dev
  *   Pointer to the Ethernet device structure.
- *
+ * @param[in] vprio
+ *   Expected result variants.
+ * @param[in] vprio_n
+ *   Number of entries in @p vprio array.
  * @return
- *   number of supported flow priority on success, a negative errno
+ *   Number of supported flow priority on success, a negative errno
  *   value otherwise and rte_errno is set.
  */
-int
-mlx5_flow_discover_priorities(struct rte_eth_dev *dev)
+static int
+flow_verbs_discover_priorities(struct rte_eth_dev *dev,
+			       const uint16_t *vprio, int vprio_n)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct {
@@ -79,7 +72,6 @@ mlx5_flow_discover_priorities(struct rte_eth_dev *dev)
 	};
 	struct ibv_flow *flow;
 	struct mlx5_hrxq *drop = priv->drop_queue.hrxq;
-	uint16_t vprio[] = { 8, 16 };
 	int i;
 	int priority = 0;
 
@@ -87,7 +79,7 @@ mlx5_flow_discover_priorities(struct rte_eth_dev *dev)
 		rte_errno = ENOTSUP;
 		return -rte_errno;
 	}
-	for (i = 0; i != RTE_DIM(vprio); i++) {
+	for (i = 0; i != vprio_n; i++) {
 		flow_attr.attr.priority = vprio[i] - 1;
 		flow = mlx5_glue->create_flow(drop->qp, &flow_attr.attr);
 		if (!flow)
@@ -95,59 +87,9 @@ mlx5_flow_discover_priorities(struct rte_eth_dev *dev)
 		claim_zero(mlx5_glue->destroy_flow(flow));
 		priority = vprio[i];
 	}
-	switch (priority) {
-	case 8:
-		priority = RTE_DIM(priority_map_3);
-		break;
-	case 16:
-		priority = RTE_DIM(priority_map_5);
-		break;
-	default:
-		rte_errno = ENOTSUP;
-		DRV_LOG(ERR,
-			"port %u verbs maximum priority: %d expected 8/16",
-			dev->data->port_id, priority);
-		return -rte_errno;
-	}
-	DRV_LOG(INFO, "port %u supported flow priorities:"
-		" 0-%d for ingress or egress root table,"
-		" 0-%d for non-root table or transfer root table.",
-		dev->data->port_id, priority - 2,
-		MLX5_NON_ROOT_FLOW_MAX_PRIO - 1);
 	return priority;
 }
 
-/**
- * Adjust flow priority based on the highest layer and the request priority.
- *
- * @param[in] dev
- *   Pointer to the Ethernet device structure.
- * @param[in] priority
- *   The rule base priority.
- * @param[in] subpriority
- *   The priority based on the items.
- *
- * @return
- *   The new priority.
- */
-uint32_t
-mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
-				   uint32_t subpriority)
-{
-	uint32_t res = 0;
-	struct mlx5_priv *priv = dev->data->dev_private;
-
-	switch (priv->config.flow_prio) {
-	case RTE_DIM(priority_map_3):
-		res = priority_map_3[priority][subpriority];
-		break;
-	case RTE_DIM(priority_map_5):
-		res = priority_map_5[priority][subpriority];
-		break;
-	}
-	return  res;
-}
-
 /**
  * Get Verbs flow counter by index.
  *
@@ -2093,4 +2035,5 @@ const struct mlx5_flow_driver_ops mlx5_flow_verbs_drv_ops = {
 	.destroy = flow_verbs_destroy,
 	.query = flow_verbs_query,
 	.sync_domain = flow_verbs_sync_domain,
+	.discover_priorities = flow_verbs_discover_priorities,
 };
-- 
2.25.1


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [dpdk-dev] [PATCH v2 2/3] net/mlx5: create drop queue using DevX
  2021-07-29 14:00 ` [dpdk-dev] [PATCH v2 " Dmitry Kozlyuk
  2021-07-29 14:00   ` [dpdk-dev] [PATCH v2 1/3] net/mlx5: discover max flow priority using DevX Dmitry Kozlyuk
@ 2021-07-29 14:00   ` Dmitry Kozlyuk
  2021-07-29 14:00   ` [dpdk-dev] [PATCH v2 3/3] net/mlx5: preserve indirect actions across port restart Dmitry Kozlyuk
  2 siblings, 0 replies; 16+ messages in thread
From: Dmitry Kozlyuk @ 2021-07-29 14:00 UTC (permalink / raw)
  To: dev
  Cc: David Marchand, Andrew Rybchenko, Ori Kam, stable, Matan Azrad,
	Shahaf Shuler, Viacheslav Ovsiienko

Drop queue creation and destruction were not implemented for DevX
flow engine and Verbs engine methods were used as a workaround.
Implement these methods for DevX so that there is a valid queue ID
that can be used regardless of queue configuration via API.

Cc: stable@dpdk.org

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 drivers/net/mlx5/linux/mlx5_os.c |   4 -
 drivers/net/mlx5/mlx5_devx.c     | 204 ++++++++++++++++++++++++++-----
 2 files changed, 176 insertions(+), 32 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 671dfface9..791024c2c6 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1756,10 +1756,6 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 	}
 	if (config->devx && config->dv_flow_en && config->dest_tir) {
 		priv->obj_ops = devx_obj_ops;
-		priv->obj_ops.drop_action_create =
-						ibv_obj_ops.drop_action_create;
-		priv->obj_ops.drop_action_destroy =
-						ibv_obj_ops.drop_action_destroy;
 #ifndef HAVE_MLX5DV_DEVX_UAR_OFFSET
 		priv->obj_ops.txq_obj_modify = ibv_obj_ops.txq_obj_modify;
 #else
diff --git a/drivers/net/mlx5/mlx5_devx.c b/drivers/net/mlx5/mlx5_devx.c
index a1db53577a..447d6bafb9 100644
--- a/drivers/net/mlx5/mlx5_devx.c
+++ b/drivers/net/mlx5/mlx5_devx.c
@@ -226,17 +226,17 @@ mlx5_rx_devx_get_event(struct mlx5_rxq_obj *rxq_obj)
  *
  * @param dev
  *   Pointer to Ethernet device.
- * @param idx
- *   Queue index in DPDK Rx queue array.
+ * @param rxq_data
+ *   RX queue data.
  *
  * @return
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-mlx5_rxq_create_devx_rq_resources(struct rte_eth_dev *dev, uint16_t idx)
+mlx5_rxq_create_devx_rq_resources(struct rte_eth_dev *dev,
+				  struct mlx5_rxq_data *rxq_data)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_rxq_data *rxq_data = (*priv->rxqs)[idx];
 	struct mlx5_rxq_ctrl *rxq_ctrl =
 		container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
 	struct mlx5_devx_create_rq_attr rq_attr = { 0 };
@@ -289,20 +289,20 @@ mlx5_rxq_create_devx_rq_resources(struct rte_eth_dev *dev, uint16_t idx)
  *
  * @param dev
  *   Pointer to Ethernet device.
- * @param idx
- *   Queue index in DPDK Rx queue array.
+ * @param rxq_data
+ *   RX queue data.
  *
  * @return
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-mlx5_rxq_create_devx_cq_resources(struct rte_eth_dev *dev, uint16_t idx)
+mlx5_rxq_create_devx_cq_resources(struct rte_eth_dev *dev,
+				  struct mlx5_rxq_data *rxq_data)
 {
 	struct mlx5_devx_cq *cq_obj = 0;
 	struct mlx5_devx_cq_attr cq_attr = { 0 };
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_dev_ctx_shared *sh = priv->sh;
-	struct mlx5_rxq_data *rxq_data = (*priv->rxqs)[idx];
 	struct mlx5_rxq_ctrl *rxq_ctrl =
 		container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
 	unsigned int cqe_n = mlx5_rxq_cqe_num(rxq_data);
@@ -497,13 +497,13 @@ mlx5_rxq_devx_obj_new(struct rte_eth_dev *dev, uint16_t idx)
 		tmpl->fd = mlx5_os_get_devx_channel_fd(tmpl->devx_channel);
 	}
 	/* Create CQ using DevX API. */
-	ret = mlx5_rxq_create_devx_cq_resources(dev, idx);
+	ret = mlx5_rxq_create_devx_cq_resources(dev, rxq_data);
 	if (ret) {
 		DRV_LOG(ERR, "Failed to create CQ.");
 		goto error;
 	}
 	/* Create RQ using DevX API. */
-	ret = mlx5_rxq_create_devx_rq_resources(dev, idx);
+	ret = mlx5_rxq_create_devx_rq_resources(dev, rxq_data);
 	if (ret) {
 		DRV_LOG(ERR, "Port %u Rx queue %u RQ creation failure.",
 			dev->data->port_id, idx);
@@ -536,6 +536,11 @@ mlx5_rxq_devx_obj_new(struct rte_eth_dev *dev, uint16_t idx)
  *   Pointer to Ethernet device.
  * @param log_n
  *   Log of number of queues in the array.
+ * @param queues
+ *   List of RX queue indices or NULL, in which case
+ *   the attribute will be filled by drop queue ID.
+ * @param queues_n
+ *   Size of @p queues array or 0 if it is NULL.
  * @param ind_tbl
  *   DevX indirection table object.
  *
@@ -563,6 +568,11 @@ mlx5_devx_ind_table_create_rqt_attr(struct rte_eth_dev *dev,
 	}
 	rqt_attr->rqt_max_size = priv->config.ind_table_max_size;
 	rqt_attr->rqt_actual_size = rqt_n;
+	if (queues == NULL) {
+		for (i = 0; i < rqt_n; i++)
+			rqt_attr->rq_list[i] = priv->drop_queue.rxq->rq->id;
+		return rqt_attr;
+	}
 	for (i = 0; i != queues_n; ++i) {
 		struct mlx5_rxq_data *rxq = (*priv->rxqs)[queues[i]];
 		struct mlx5_rxq_ctrl *rxq_ctrl =
@@ -670,7 +680,8 @@ mlx5_devx_ind_table_destroy(struct mlx5_ind_table_obj *ind_tbl)
  * @param[in] hash_fields
  *   Verbs protocol hash field to make the RSS on.
  * @param[in] ind_tbl
- *   Indirection table for TIR.
+ *   Indirection table for TIR. If table queues array is NULL,
+ *   a TIR for drop queue is assumed.
  * @param[in] tunnel
  *   Tunnel type.
  * @param[out] tir_attr
@@ -686,19 +697,27 @@ mlx5_devx_tir_attr_set(struct rte_eth_dev *dev, const uint8_t *rss_key,
 		       int tunnel, struct mlx5_devx_tir_attr *tir_attr)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_rxq_data *rxq_data = (*priv->rxqs)[ind_tbl->queues[0]];
-	struct mlx5_rxq_ctrl *rxq_ctrl =
-		container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
-	enum mlx5_rxq_type rxq_obj_type = rxq_ctrl->type;
+	enum mlx5_rxq_type rxq_obj_type;
 	bool lro = true;
 	uint32_t i;
 
-	/* Enable TIR LRO only if all the queues were configured for. */
-	for (i = 0; i < ind_tbl->queues_n; ++i) {
-		if (!(*priv->rxqs)[ind_tbl->queues[i]]->lro) {
-			lro = false;
-			break;
+	/* NULL queues designate drop queue. */
+	if (ind_tbl->queues != NULL) {
+		struct mlx5_rxq_data *rxq_data =
+					(*priv->rxqs)[ind_tbl->queues[0]];
+		struct mlx5_rxq_ctrl *rxq_ctrl =
+			container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
+		rxq_obj_type = rxq_ctrl->type;
+
+		/* Enable TIR LRO only if all the queues were configured for. */
+		for (i = 0; i < ind_tbl->queues_n; ++i) {
+			if (!(*priv->rxqs)[ind_tbl->queues[i]]->lro) {
+				lro = false;
+				break;
+			}
 		}
+	} else {
+		rxq_obj_type = priv->drop_queue.rxq->rxq_ctrl->type;
 	}
 	memset(tir_attr, 0, sizeof(*tir_attr));
 	tir_attr->disp_type = MLX5_TIRC_DISP_TYPE_INDIRECT;
@@ -857,7 +876,7 @@ mlx5_devx_hrxq_modify(struct rte_eth_dev *dev, struct mlx5_hrxq *hrxq,
 }
 
 /**
- * Create a DevX drop action for Rx Hash queue.
+ * Create a DevX drop Rx queue.
  *
  * @param dev
  *   Pointer to Ethernet device.
@@ -866,14 +885,99 @@ mlx5_devx_hrxq_modify(struct rte_eth_dev *dev, struct mlx5_hrxq *hrxq,
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-mlx5_devx_drop_action_create(struct rte_eth_dev *dev)
+mlx5_rxq_devx_obj_drop_create(struct rte_eth_dev *dev)
 {
-	(void)dev;
-	DRV_LOG(ERR, "DevX drop action is not supported yet.");
-	rte_errno = ENOTSUP;
+	struct mlx5_priv *priv = dev->data->dev_private;
+	int socket_id = dev->device->numa_node;
+	struct mlx5_rxq_ctrl *rxq_ctrl;
+	struct mlx5_rxq_data *rxq_data;
+	struct mlx5_rxq_obj *rxq = NULL;
+	int ret;
+
+	/*
+	 * Initialize dummy control structures.
+	 * They are required to hold pointers for cleanup
+	 * and are only accessible via drop queue DevX objects.
+	 */
+	rxq_ctrl = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*rxq_ctrl),
+			       0, socket_id);
+	if (rxq_ctrl == NULL) {
+		DRV_LOG(ERR, "Port %u could not allocate drop queue control",
+			dev->data->port_id);
+		rte_errno = ENOMEM;
+		goto error;
+	}
+	rxq = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*rxq), 0, socket_id);
+	if (rxq == NULL) {
+		DRV_LOG(ERR, "Port %u could not allocate drop queue object",
+			dev->data->port_id);
+		rte_errno = ENOMEM;
+		goto error;
+	}
+	rxq->rxq_ctrl = rxq_ctrl;
+	rxq_ctrl->type = MLX5_RXQ_TYPE_STANDARD;
+	rxq_ctrl->priv = priv;
+	rxq_ctrl->obj = rxq;
+	rxq_data = &rxq_ctrl->rxq;
+	/* Create CQ using DevX API. */
+	ret = mlx5_rxq_create_devx_cq_resources(dev, rxq_data);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Port %u drop queue CQ creation failed.",
+			dev->data->port_id);
+		goto error;
+	}
+	/* Create RQ using DevX API. */
+	ret = mlx5_rxq_create_devx_rq_resources(dev, rxq_data);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Port %u drop queue RQ creation failed.",
+			dev->data->port_id);
+		rte_errno = ENOMEM;
+		goto error;
+	}
+	/* Change queue state to ready. */
+	ret = mlx5_devx_modify_rq(rxq, MLX5_RXQ_MOD_RST2RDY);
+	if (ret != 0)
+		goto error;
+	/* Initialize drop queue. */
+	priv->drop_queue.rxq = rxq;
+	return 0;
+error:
+	ret = rte_errno; /* Save rte_errno before cleanup. */
+	if (rxq != NULL) {
+		if (rxq->rq_obj.rq != NULL)
+			mlx5_devx_rq_destroy(&rxq->rq_obj);
+		if (rxq->cq_obj.cq != NULL)
+			mlx5_devx_cq_destroy(&rxq->cq_obj);
+		if (rxq->devx_channel)
+			mlx5_os_devx_destroy_event_channel
+							(rxq->devx_channel);
+		mlx5_free(rxq);
+	}
+	if (rxq_ctrl != NULL)
+		mlx5_free(rxq_ctrl);
+	rte_errno = ret; /* Restore rte_errno. */
 	return -rte_errno;
 }
 
+/**
+ * Release drop Rx queue resources.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ */
+static void
+mlx5_rxq_devx_obj_drop_release(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_rxq_obj *rxq = priv->drop_queue.rxq;
+	struct mlx5_rxq_ctrl *rxq_ctrl = rxq->rxq_ctrl;
+
+	mlx5_rxq_devx_obj_release(rxq);
+	mlx5_free(rxq);
+	mlx5_free(rxq_ctrl);
+	priv->drop_queue.rxq = NULL;
+}
+
 /**
  * Release a drop hash Rx queue.
  *
@@ -883,9 +987,53 @@ mlx5_devx_drop_action_create(struct rte_eth_dev *dev)
 static void
 mlx5_devx_drop_action_destroy(struct rte_eth_dev *dev)
 {
-	(void)dev;
-	DRV_LOG(ERR, "DevX drop action is not supported yet.");
-	rte_errno = ENOTSUP;
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_hrxq *hrxq = priv->drop_queue.hrxq;
+
+	if (hrxq->tir != NULL)
+		mlx5_devx_tir_destroy(hrxq);
+	if (hrxq->ind_table->ind_table != NULL)
+		mlx5_devx_ind_table_destroy(hrxq->ind_table);
+	if (priv->drop_queue.rxq->rq != NULL)
+		mlx5_rxq_devx_obj_drop_release(dev);
+}
+
+/**
+ * Create a DevX drop action for Rx Hash queue.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_devx_drop_action_create(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_hrxq *hrxq = priv->drop_queue.hrxq;
+	int ret;
+
+	ret = mlx5_rxq_devx_obj_drop_create(dev);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Cannot create drop RX queue");
+		return ret;
+	}
+	/* hrxq->ind_table queues are NULL, drop RX queue ID will be used */
+	ret = mlx5_devx_ind_table_new(dev, 0, hrxq->ind_table);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Cannot create drop hash RX queue indirection table");
+		goto error;
+	}
+	ret = mlx5_devx_hrxq_new(dev, hrxq, /* tunnel */ false);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Cannot create drop hash RX queue");
+		goto error;
+	}
+	return 0;
+error:
+	mlx5_devx_drop_action_destroy(dev);
+	return ret;
 }
 
 /**
-- 
2.25.1


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [dpdk-dev] [PATCH v2 3/3] net/mlx5: preserve indirect actions across port restart
  2021-07-29 14:00 ` [dpdk-dev] [PATCH v2 " Dmitry Kozlyuk
  2021-07-29 14:00   ` [dpdk-dev] [PATCH v2 1/3] net/mlx5: discover max flow priority using DevX Dmitry Kozlyuk
  2021-07-29 14:00   ` [dpdk-dev] [PATCH v2 2/3] net/mlx5: create drop queue " Dmitry Kozlyuk
@ 2021-07-29 14:00   ` Dmitry Kozlyuk
  2 siblings, 0 replies; 16+ messages in thread
From: Dmitry Kozlyuk @ 2021-07-29 14:00 UTC (permalink / raw)
  To: dev
  Cc: David Marchand, Andrew Rybchenko, Ori Kam, bingz, stable,
	Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko

MLX5 PMD uses reference counting to manage RX queue resources.
After port stop shared RSS actions kept references to RX queues,
preventing resource release. As a result, internal PMD mempool
for such queues had been exhausted after a number of port restarts.
Diagnostic message from rte_eth_dev_start():

    Rx queue allocation failed: Cannot allocate memory

Dereference RX queues used by indirect actions on port stop (detach)
and restore references on port start (attach) in order to allow RX queue
resource release, but keep indirect actions across the port restart.
Replace queue IDs in HW by drop queue ID on detach and restore actual
queue IDs on attach.

Since rte_flow_action_handle_create() does not specify indirect action
behavior across port restart, document it for MLX5 PMD.

Fixes: 4b61b8774be9 ("ethdev: introduce indirect flow action")
Cc: bingz@nvidia.com
Cc: stable@dpdk.org

Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 doc/guides/nics/mlx5.rst        |   8 ++
 drivers/net/mlx5/mlx5_flow.c    | 194 ++++++++++++++++++++++++++++----
 drivers/net/mlx5/mlx5_flow.h    |   2 +
 drivers/net/mlx5/mlx5_rx.h      |   4 +
 drivers/net/mlx5/mlx5_rxq.c     |  99 ++++++++++++++--
 drivers/net/mlx5/mlx5_trigger.c |  10 ++
 6 files changed, 283 insertions(+), 34 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index a00793907f..f103126301 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -1813,6 +1813,14 @@ directly but neither destroyed nor flushed.
 
 The application should re-create the flows as required after the port restart.
 
+Indirect actions persist across device configure, stop, and start.
+After reconfiguration an existing indirect action may become invalid
+because of incompatibility with the new configuration.
+Such actions cannot be used to create a flow rule,
+also ``rte_eth_dev_configure()`` and ``rte_eth_dev_start()`` may fail
+with the same diagnostics as ``rte_flow_action_handle_create()`` would give.
+It is a programmer's responsibility to remove or update offending actions.
+
 Notes for testpmd
 -----------------
 
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index e8d2678877..5343720ec9 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -1524,6 +1524,58 @@ mlx5_flow_validate_action_queue(const struct rte_flow_action *action,
 	return 0;
 }
 
+/**
+ * Validate queue numbers for device RSS.
+ *
+ * @param[in] dev
+ *   Configured device.
+ * @param[in] queues
+ *   Array of queue numbers.
+ * @param[in] queues_n
+ *   Size of the @p queues array.
+ * @param[out] error
+ *   On error, filled with a textual error description.
+ * @param[out] queue
+ *   On error, filled with an offending queue index in @p queues array.
+ *
+ * @return
+ *   0 on success, a negative errno code on error.
+ */
+static int
+mlx5_validate_rss_queues(const struct rte_eth_dev *dev,
+			 const uint16_t *queues, uint32_t queues_n,
+			 const char **error, uint32_t *queue_idx)
+{
+	const struct mlx5_priv *priv = dev->data->dev_private;
+	enum mlx5_rxq_type rxq_type = MLX5_RXQ_TYPE_UNDEFINED;
+	uint32_t i;
+
+	for (i = 0; i != queues_n; ++i) {
+		struct mlx5_rxq_ctrl *rxq_ctrl;
+
+		if (queues[i] >= priv->rxqs_n) {
+			*error = "queue index out of range";
+			*queue_idx = i;
+			return -EINVAL;
+		}
+		if (!(*priv->rxqs)[queues[i]]) {
+			*error =  "queue is not configured";
+			*queue_idx = i;
+			return -EINVAL;
+		}
+		rxq_ctrl = container_of((*priv->rxqs)[queues[i]],
+					struct mlx5_rxq_ctrl, rxq);
+		if (i == 0)
+			rxq_type = rxq_ctrl->type;
+		if (rxq_type != rxq_ctrl->type) {
+			*error = "combining hairpin and regular RSS queues is not supported";
+			*queue_idx = i;
+			return -ENOTSUP;
+		}
+	}
+	return 0;
+}
+
 /*
  * Validate the rss action.
  *
@@ -1544,8 +1596,9 @@ mlx5_validate_action_rss(struct rte_eth_dev *dev,
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	const struct rte_flow_action_rss *rss = action->conf;
-	enum mlx5_rxq_type rxq_type = MLX5_RXQ_TYPE_UNDEFINED;
-	unsigned int i;
+	int ret;
+	const char *message;
+	uint32_t queue_idx;
 
 	if (rss->func != RTE_ETH_HASH_FUNCTION_DEFAULT &&
 	    rss->func != RTE_ETH_HASH_FUNCTION_TOEPLITZ)
@@ -1609,27 +1662,12 @@ mlx5_validate_action_rss(struct rte_eth_dev *dev,
 		return rte_flow_error_set(error, EINVAL,
 					  RTE_FLOW_ERROR_TYPE_ACTION_CONF,
 					  NULL, "No queues configured");
-	for (i = 0; i != rss->queue_num; ++i) {
-		struct mlx5_rxq_ctrl *rxq_ctrl;
-
-		if (rss->queue[i] >= priv->rxqs_n)
-			return rte_flow_error_set
-				(error, EINVAL,
-				 RTE_FLOW_ERROR_TYPE_ACTION_CONF,
-				 &rss->queue[i], "queue index out of range");
-		if (!(*priv->rxqs)[rss->queue[i]])
-			return rte_flow_error_set
-				(error, EINVAL, RTE_FLOW_ERROR_TYPE_ACTION_CONF,
-				 &rss->queue[i], "queue is not configured");
-		rxq_ctrl = container_of((*priv->rxqs)[rss->queue[i]],
-					struct mlx5_rxq_ctrl, rxq);
-		if (i == 0)
-			rxq_type = rxq_ctrl->type;
-		if (rxq_type != rxq_ctrl->type)
-			return rte_flow_error_set
-				(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ACTION_CONF,
-				 &rss->queue[i],
-				 "combining hairpin and regular RSS queues is not supported");
+	ret = mlx5_validate_rss_queues(dev, rss->queue, rss->queue_num,
+				       &message, &queue_idx);
+	if (ret != 0) {
+		return rte_flow_error_set(error, -ret,
+					  RTE_FLOW_ERROR_TYPE_ACTION_CONF,
+					  &rss->queue[queue_idx], message);
 	}
 	return 0;
 }
@@ -8493,6 +8531,116 @@ mlx5_action_handle_flush(struct rte_eth_dev *dev)
 	return ret;
 }
 
+/**
+ * Validate existing indirect actions against current device configuration
+ * and attach them to device resources.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_action_handle_attach(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_indexed_pool *ipool =
+			priv->sh->ipool[MLX5_IPOOL_RSS_SHARED_ACTIONS];
+	struct mlx5_shared_action_rss *shared_rss, *shared_rss_last;
+	int ret = 0;
+	uint32_t idx;
+
+	ILIST_FOREACH(ipool, priv->rss_shared_actions, idx, shared_rss, next) {
+		struct mlx5_ind_table_obj *ind_tbl = shared_rss->ind_tbl;
+		const char *message;
+		uint32_t queue_idx;
+
+		ret = mlx5_validate_rss_queues(dev, ind_tbl->queues,
+					       ind_tbl->queues_n,
+					       &message, &queue_idx);
+		if (ret != 0) {
+			DRV_LOG(ERR, "Port %u cannot use queue %u in RSS: %s",
+				dev->data->port_id, ind_tbl->queues[queue_idx],
+				message);
+			break;
+		}
+	}
+	if (ret != 0)
+		return ret;
+	ILIST_FOREACH(ipool, priv->rss_shared_actions, idx, shared_rss, next) {
+		struct mlx5_ind_table_obj *ind_tbl = shared_rss->ind_tbl;
+
+		ret = mlx5_ind_table_obj_attach(dev, ind_tbl);
+		if (ret != 0) {
+			DRV_LOG(ERR, "Port %u could not attach "
+				"indirection table obj %p",
+				dev->data->port_id, (void *)ind_tbl);
+			goto error;
+		}
+	}
+	return 0;
+error:
+	shared_rss_last = shared_rss;
+	ILIST_FOREACH(ipool, priv->rss_shared_actions, idx, shared_rss, next) {
+		struct mlx5_ind_table_obj *ind_tbl = shared_rss->ind_tbl;
+
+		if (shared_rss == shared_rss_last)
+			break;
+		if (mlx5_ind_table_obj_detach(dev, ind_tbl) != 0)
+			DRV_LOG(CRIT, "Port %u could not detach "
+				"indirection table obj %p on rollback",
+				dev->data->port_id, (void *)ind_tbl);
+	}
+	return ret;
+}
+
+/**
+ * Detach indirect actions of the device from its resources.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_action_handle_detach(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_indexed_pool *ipool =
+			priv->sh->ipool[MLX5_IPOOL_RSS_SHARED_ACTIONS];
+	struct mlx5_shared_action_rss *shared_rss, *shared_rss_last;
+	int ret = 0;
+	uint32_t idx;
+
+	ILIST_FOREACH(ipool, priv->rss_shared_actions, idx, shared_rss, next) {
+		struct mlx5_ind_table_obj *ind_tbl = shared_rss->ind_tbl;
+
+		ret = mlx5_ind_table_obj_detach(dev, ind_tbl);
+		if (ret != 0) {
+			DRV_LOG(ERR, "Port %u could not detach "
+				"indirection table obj %p",
+				dev->data->port_id, (void *)ind_tbl);
+			goto error;
+		}
+	}
+	return 0;
+error:
+	shared_rss_last = shared_rss;
+	ILIST_FOREACH(ipool, priv->rss_shared_actions, idx, shared_rss, next) {
+		struct mlx5_ind_table_obj *ind_tbl = shared_rss->ind_tbl;
+
+		if (shared_rss == shared_rss_last)
+			break;
+		if (mlx5_ind_table_obj_attach(dev, ind_tbl) != 0)
+			DRV_LOG(CRIT, "Port %u could not attach "
+				"indirection table obj %p on rollback",
+				dev->data->port_id, (void *)ind_tbl);
+	}
+	return ret;
+}
+
 #ifndef HAVE_MLX5DV_DR
 #define MLX5_DOMAIN_SYNC_FLOW ((1 << 0) | (1 << 1))
 #else
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index da39eeb596..251d643f8c 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1575,6 +1575,8 @@ struct mlx5_flow_meter_sub_policy *mlx5_flow_meter_sub_policy_rss_prepare
 void mlx5_flow_destroy_sub_policy_with_rxq(struct rte_eth_dev *dev,
 		struct mlx5_flow_meter_policy *mtr_policy);
 int mlx5_flow_dv_discover_counter_offset_support(struct rte_eth_dev *dev);
+int mlx5_action_handle_attach(struct rte_eth_dev *dev);
+int mlx5_action_handle_detach(struct rte_eth_dev *dev);
 int mlx5_action_handle_flush(struct rte_eth_dev *dev);
 void mlx5_release_tunnel_hub(struct mlx5_dev_ctx_shared *sh, uint16_t port_id);
 int mlx5_alloc_tunnel_hub(struct mlx5_dev_ctx_shared *sh);
diff --git a/drivers/net/mlx5/mlx5_rx.h b/drivers/net/mlx5/mlx5_rx.h
index 3f2b99fb65..7319ad0264 100644
--- a/drivers/net/mlx5/mlx5_rx.h
+++ b/drivers/net/mlx5/mlx5_rx.h
@@ -222,6 +222,10 @@ int mlx5_ind_table_obj_modify(struct rte_eth_dev *dev,
 			      struct mlx5_ind_table_obj *ind_tbl,
 			      uint16_t *queues, const uint32_t queues_n,
 			      bool standalone);
+int mlx5_ind_table_obj_attach(struct rte_eth_dev *dev,
+			      struct mlx5_ind_table_obj *ind_tbl);
+int mlx5_ind_table_obj_detach(struct rte_eth_dev *dev,
+			      struct mlx5_ind_table_obj *ind_tbl);
 struct mlx5_list_entry *mlx5_hrxq_create_cb(void *tool_ctx, void *cb_ctx);
 int mlx5_hrxq_match_cb(void *tool_ctx, struct mlx5_list_entry *entry,
 		       void *cb_ctx);
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 49165f482e..1140f6067e 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -2024,6 +2024,26 @@ mlx5_ind_table_obj_new(struct rte_eth_dev *dev, const uint16_t *queues,
 	return ind_tbl;
 }
 
+static int
+mlx5_ind_table_obj_check_standalone(struct rte_eth_dev *dev __rte_unused,
+				    struct mlx5_ind_table_obj *ind_tbl)
+{
+	uint32_t refcnt;
+
+	refcnt = __atomic_load_n(&ind_tbl->refcnt, __ATOMIC_RELAXED);
+	if (refcnt <= 1)
+		return 0;
+	/*
+	 * Modification of indirection tables having more than 1
+	 * reference is unsupported.
+	 */
+	DRV_LOG(DEBUG,
+		"Port %u cannot modify indirection table %p (refcnt %u > 1).",
+		dev->data->port_id, (void *)ind_tbl, refcnt);
+	rte_errno = EINVAL;
+	return -rte_errno;
+}
+
 /**
  * Modify an indirection table.
  *
@@ -2056,18 +2076,8 @@ mlx5_ind_table_obj_modify(struct rte_eth_dev *dev,
 
 	MLX5_ASSERT(standalone);
 	RTE_SET_USED(standalone);
-	if (__atomic_load_n(&ind_tbl->refcnt, __ATOMIC_RELAXED) > 1) {
-		/*
-		 * Modification of indirection ntables having more than 1
-		 * reference unsupported. Intended for standalone indirection
-		 * tables only.
-		 */
-		DRV_LOG(DEBUG,
-			"Port %u cannot modify indirection table (refcnt> 1).",
-			dev->data->port_id);
-		rte_errno = EINVAL;
+	if (mlx5_ind_table_obj_check_standalone(dev, ind_tbl) < 0)
 		return -rte_errno;
-	}
 	for (i = 0; i != queues_n; ++i) {
 		if (!mlx5_rxq_get(dev, queues[i])) {
 			ret = -rte_errno;
@@ -2093,6 +2103,73 @@ mlx5_ind_table_obj_modify(struct rte_eth_dev *dev,
 	return ret;
 }
 
+/**
+ * Attach an indirection table to its queues.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param ind_table
+ *   Indirection table to attach.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_ind_table_obj_attach(struct rte_eth_dev *dev,
+			  struct mlx5_ind_table_obj *ind_tbl)
+{
+	unsigned int i;
+	int ret;
+
+	ret = mlx5_ind_table_obj_modify(dev, ind_tbl, ind_tbl->queues,
+					ind_tbl->queues_n, true);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Port %u could not modify indirect table obj %p",
+			dev->data->port_id, (void *)ind_tbl);
+		return ret;
+	}
+	for (i = 0; i < ind_tbl->queues_n; i++)
+		mlx5_rxq_get(dev, ind_tbl->queues[i]);
+	return 0;
+}
+
+/**
+ * Detach an indirection table from its queues.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param ind_table
+ *   Indirection table to detach.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_ind_table_obj_detach(struct rte_eth_dev *dev,
+			  struct mlx5_ind_table_obj *ind_tbl)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	const unsigned int n = rte_is_power_of_2(ind_tbl->queues_n) ?
+			       log2above(ind_tbl->queues_n) :
+			       log2above(priv->config.ind_table_max_size);
+	unsigned int i;
+	int ret;
+
+	ret = mlx5_ind_table_obj_check_standalone(dev, ind_tbl);
+	if (ret != 0)
+		return ret;
+	MLX5_ASSERT(priv->obj_ops.ind_table_modify);
+	ret = priv->obj_ops.ind_table_modify(dev, n, NULL, 0, ind_tbl);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Port %u could not modify indirect table obj %p",
+			dev->data->port_id, (void *)ind_tbl);
+		return ret;
+	}
+	for (i = 0; i < ind_tbl->queues_n; i++)
+		mlx5_rxq_release(dev, ind_tbl->queues[i]);
+	return ret;
+}
+
 int
 mlx5_hrxq_match_cb(void *tool_ctx __rte_unused, struct mlx5_list_entry *entry,
 		   void *cb_ctx)
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index a9d5d58fd9..6761a84a68 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -14,6 +14,7 @@
 #include <mlx5_malloc.h>
 
 #include "mlx5.h"
+#include "mlx5_flow.h"
 #include "mlx5_mr.h"
 #include "mlx5_rx.h"
 #include "mlx5_tx.h"
@@ -1115,6 +1116,14 @@ mlx5_dev_start(struct rte_eth_dev *dev)
 	mlx5_rxq_timestamp_set(dev);
 	/* Set a mask and offset of scheduling on timestamp into Tx queues. */
 	mlx5_txq_dynf_timestamp_set(dev);
+	/* Attach indirection table objects detached on port stop. */
+	ret = mlx5_action_handle_attach(dev);
+	if (ret) {
+		DRV_LOG(ERR,
+			"port %u failed to attach indirect actions: %s",
+			dev->data->port_id, rte_strerror(rte_errno));
+		goto error;
+	}
 	/*
 	 * In non-cached mode, it only needs to start the default mreg copy
 	 * action and no flow created by application exists anymore.
@@ -1187,6 +1196,7 @@ mlx5_dev_stop(struct rte_eth_dev *dev)
 	/* All RX queue flags will be cleared in the flush interface. */
 	mlx5_flow_list_flush(dev, MLX5_FLOW_TYPE_GEN, true);
 	mlx5_flow_meter_rxq_flush(dev);
+	mlx5_action_handle_detach(dev);
 	mlx5_rx_intr_vec_disable(dev);
 	priv->sh->port[priv->dev_port - 1].ih_port_id = RTE_MAX_ETHPORTS;
 	priv->sh->port[priv->dev_port - 1].devx_ih_port_id = RTE_MAX_ETHPORTS;
-- 
2.25.1


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2021-07-29 14:01 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-27  7:31 [dpdk-dev] [PATCH 0/4] net/mlx5: keep indirect actions across port restart Dmitry Kozlyuk
2021-07-27  7:31 ` [dpdk-dev] [PATCH 1/4] net/mlx5: discover max flow priority using DevX Dmitry Kozlyuk
2021-07-27  7:31 ` [dpdk-dev] [PATCH 2/4] net/mlx5: create drop queue " Dmitry Kozlyuk
2021-07-27  7:31 ` [dpdk-dev] [PATCH 3/4] net/mlx5: preserve indirect actions across port restart Dmitry Kozlyuk
2021-07-27  7:31 ` [dpdk-dev] [PATCH 4/4] ethdev: document indirect flow action life cycle Dmitry Kozlyuk
2021-07-28  9:50   ` Ori Kam
2021-07-28  8:05 ` [dpdk-dev] [PATCH 0/4] net/mlx5: keep indirect actions across port restart Andrew Rybchenko
2021-07-28 11:18   ` Dmitry Kozlyuk
2021-07-28 12:07     ` Ori Kam
2021-07-28 12:26     ` Andrew Rybchenko
2021-07-28 14:08       ` Dmitry Kozlyuk
2021-07-28 17:07         ` Ori Kam
2021-07-29 14:00 ` [dpdk-dev] [PATCH v2 " Dmitry Kozlyuk
2021-07-29 14:00   ` [dpdk-dev] [PATCH v2 1/3] net/mlx5: discover max flow priority using DevX Dmitry Kozlyuk
2021-07-29 14:00   ` [dpdk-dev] [PATCH v2 2/3] net/mlx5: create drop queue " Dmitry Kozlyuk
2021-07-29 14:00   ` [dpdk-dev] [PATCH v2 3/3] net/mlx5: preserve indirect actions across port restart Dmitry Kozlyuk
