* [dpdk-dev] [PATCH 0/5] Flow entities behavior on port restart
@ 2021-10-05  0:52 dkozlyuk
  2021-10-05  0:52 ` [dpdk-dev] [PATCH 1/5] ethdev: add capability to keep flow rules on restart dkozlyuk
                   ` (5 more replies)
  0 siblings, 6 replies; 96+ messages in thread
From: dkozlyuk @ 2021-10-05  0:52 UTC (permalink / raw)
  To: dev; +Cc: Dmitry Kozlyuk
From: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
It is unspecified whether flow rules and indirect actions are kept
when a port is stopped, possibly reconfigured, and started again.
Vendors approach the topic differently, e.g. mlx5 and i40e PMDs
disagree on whether flow rules can be kept, and mlx5 PMD would keep
indirect actions. In the end, applications are greatly affected
by whatever contract there is and need to know it.

It is proposed to advertise capabilities of keeping flow rules
and indirect actions (as a special case of shared objects) using
ethdev info. Then a bug is fixed in mlx5 PMD that prevented indirect RSS
action from being kept, and the driver starts advertising the new
capability.
Prior discussions:
1) http://inbox.dpdk.org/dev/20210727073121.895620-1-dkozlyuk@nvidia.com/
2) http://inbox.dpdk.org/dev/20210901085516.3647814-1-dkozlyuk@nvidia.com/
Dmitry Kozlyuk (5):
  ethdev: add capability to keep flow rules on restart
  ethdev: add capability to keep shared objects on restart
  net/mlx5: discover max flow priority using DevX
  net/mlx5: create drop queue using DevX
  net/mlx5: preserve indirect actions on restart
 doc/guides/prog_guide/rte_flow.rst |  21 +++
 drivers/net/mlx5/linux/mlx5_os.c   |   5 -
 drivers/net/mlx5/mlx5_devx.c       | 204 +++++++++++++++++---
 drivers/net/mlx5/mlx5_ethdev.c     |   1 +
 drivers/net/mlx5/mlx5_flow.c       | 292 ++++++++++++++++++++++++++---
 drivers/net/mlx5/mlx5_flow.h       |   6 +
 drivers/net/mlx5/mlx5_flow_dv.c    | 103 ++++++++++
 drivers/net/mlx5/mlx5_flow_verbs.c |  77 +-------
 drivers/net/mlx5/mlx5_rx.h         |   4 +
 drivers/net/mlx5/mlx5_rxq.c        |  99 ++++++++--
 drivers/net/mlx5/mlx5_trigger.c    |  10 +
 lib/ethdev/rte_ethdev.h            |   8 +
 12 files changed, 696 insertions(+), 134 deletions(-)
-- 
2.25.1
* [dpdk-dev] [PATCH 1/5] ethdev: add capability to keep flow rules on restart
  2021-10-05  0:52 [dpdk-dev] [PATCH 0/5] Flow entities behavior on port restart dkozlyuk
@ 2021-10-05  0:52 ` dkozlyuk
  2021-10-06  6:15   ` Ori Kam
  2021-10-06 17:15   ` Ajit Khaparde
  2021-10-05  0:52 ` [dpdk-dev] [PATCH 2/5] ethdev: add capability to keep shared objects " dkozlyuk
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 96+ messages in thread
From: dkozlyuk @ 2021-10-05  0:52 UTC (permalink / raw)
  To: dev
  Cc: Dmitry Kozlyuk, Ori Kam, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko
From: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Currently, it is not specified what happens to the flow rules when
the device is stopped, possibly reconfigured, then started.
If flow rules were kept, it could be convenient for application
developers, because they wouldn't need to save and restore them.
However, due to the number of flows and possible creation rate it is
impractical to save all flow rules in the DPDK layer. This means that
flow rule persistence really depends on whether the PMD and HW can
implement it efficiently. It is proposed for PMDs to advertise this
capability, if supported, using a new flag.

If the device is being reconfigured in a way that is incompatible with
existing flow rules, the PMD is required to report an error.
This is mandatory, because the flow API does not supply users with
capabilities, so this is the only way for a user to learn that
the configuration is invalid. For example, if the queue count changes and
the action of a flow rule specifies queues that are going away, the user
must update or remove the flow rule before removing the queues.
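
A minimal sketch of how an application might consume the proposed flag.
The flag value is the one added by this patch; `struct port_info` and
`must_recreate_flow_rules()` are hypothetical stand-ins so that the sketch
compiles without DPDK headers. In a real application the bitmask would
come from the `dev_capa` field filled in by `rte_eth_dev_info_get()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Value proposed in this patch; redefined only to keep the sketch standalone. */
#define RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP 0x00000004

/* Hypothetical stand-in for the capability part of struct rte_eth_dev_info. */
struct port_info {
	uint64_t dev_capa;
};

/* Return true if the application must re-create its flow rules after restart. */
static bool
must_recreate_flow_rules(const struct port_info *info)
{
	assert(info != NULL);
	return (info->dev_capa & RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP) == 0;
}
```

An application that gets `true` here would keep its own copy of the rules
and replay them after the port is started again; otherwise it can skip
that bookkeeping.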
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
---
 doc/guides/prog_guide/rte_flow.rst | 9 +++++++++
 lib/ethdev/rte_ethdev.h            | 2 ++
 2 files changed, 11 insertions(+)
diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
index 2b42d5ec8c..0a03097a7c 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -87,6 +87,15 @@ To avoid resource leaks on the PMD side, handles must be explicitly
 destroyed by the application before releasing associated resources such as
 queues and ports.
 
+By default flow rules are implicitly destroyed when the device is stopped.
+If the device advertises ``RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP``, flow rules persist
+across device stop and start with possible reconfiguration in between.
+Some configuration changes may be incompatible with existing flow rules;
+in this case ``rte_eth_dev_configure()`` or ``rte_eth_rx/tx_queue_setup()``
+will fail. At this point PMD developers are encouraged to log errors identical
+to the ones that would be emitted by ``rte_flow_create()`` if the new
+configuration was active.
+
 The following sections cover:
 
 - **Attributes** (represented by ``struct rte_flow_attr``): properties of a
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index afdc53b674..d24de5e8fa 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -1478,6 +1478,8 @@ struct rte_eth_conf {
 #define RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP 0x00000001
 /** Device supports Tx queue setup after device started. */
 #define RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002
+/** Device keeps flow rules across restart and reconfiguration. */
+#define RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP 0x00000004
 /**@}*/
 
 /*
-- 
2.25.1
* [dpdk-dev] [PATCH 2/5] ethdev: add capability to keep shared objects on restart
  2021-10-05  0:52 [dpdk-dev] [PATCH 0/5] Flow entities behavior on port restart dkozlyuk
  2021-10-05  0:52 ` [dpdk-dev] [PATCH 1/5] ethdev: add capability to keep flow rules on restart dkozlyuk
@ 2021-10-05  0:52 ` dkozlyuk
  2021-10-06  6:16   ` Ori Kam
  2021-10-13  8:32   ` Dmitry Kozlyuk
  2021-10-05  0:52 ` [dpdk-dev] [PATCH 3/5] net/mlx5: discover max flow priority using DevX dkozlyuk
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 96+ messages in thread
From: dkozlyuk @ 2021-10-05  0:52 UTC (permalink / raw)
  To: dev
  Cc: Dmitry Kozlyuk, Ori Kam, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko
From: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
rte_flow_action_handle_create() did not mention what happens
with an indirect action when a device is stopped, possibly reconfigured,
and started again. It is natural for some indirect actions to be
persistent, like counters and meters; keeping others just saves
application time and complexity. However, not all PMDs can support it.
It is proposed to add a device capability to indicate if indirect actions
are kept across the above sequence or implicitly destroyed.

In the future, indirect actions may not be the only type of objects
shared between flow rules. The capability bit intends to cover all
possible types of such objects, hence its name.

It may happen that in the future a PMD acquires support for a type of
shared objects that it cannot keep across a restart. It is undesirable
for the PMD to stop advertising the capability in that case, because
applications that don't use objects of the problematic type should
still be able to take advantage of it.
This is why PMDs are allowed to keep only a subset of shared objects,
provided that the vendor documents which ones are not kept.

If the device is being reconfigured in a way that is incompatible with
existing shared objects, the PMD is required to report an error.
This is mandatory, because the flow API does not supply users with
capabilities, so this is the only way for a user to learn that
the configuration is invalid. For example, if the queue count changes and
an RSS indirect action specifies queues that are going away, the user
must update the action before removing the queues or remove the action
and all flow rules that were using it.
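
The RSS example above can be sketched as a compatibility check that
a PMD might run when the Rx queue count is about to change. This is only
an illustration: `struct kept_rss_action` and
`check_rss_action_vs_queue_count()` are hypothetical names, not part of
ethdev or of any driver.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record a PMD might keep for an indirect RSS action. */
struct kept_rss_action {
	const uint16_t *queues; /* Rx queue indices the action spreads traffic to. */
	size_t queues_n;        /* Number of entries in queues. */
};

/*
 * Return 0 if every referenced queue still exists with new_rxq_n Rx queues,
 * or -EINVAL if the reconfiguration would remove a queue that is in use.
 */
static int
check_rss_action_vs_queue_count(const struct kept_rss_action *act,
				uint16_t new_rxq_n)
{
	size_t i;

	assert(act != NULL);
	for (i = 0; i < act->queues_n; i++) {
		if (act->queues[i] >= new_rxq_n)
			return -EINVAL;
	}
	return 0;
}
```

A negative result corresponds to the case where reconfiguration must fail
until the user updates or removes the action.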
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
---
 doc/guides/prog_guide/rte_flow.rst | 12 ++++++++++++
 lib/ethdev/rte_ethdev.h            |  6 ++++++
 2 files changed, 18 insertions(+)
diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
index 0a03097a7c..4597853bff 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -2794,6 +2794,18 @@ updated depend on the type of the ``action`` and different for every type.
 The indirect action specified data (e.g. counter) can be queried by
 ``rte_flow_action_handle_query()``.
 
+By default indirect actions are destroyed when the device is stopped.
+If the device advertises ``RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP``,
+indirect actions persist across the device stop and start with possible
+reconfiguration in between. Some configuration changes may be incompatible
+with existing indirect actions; in this case ``rte_eth_dev_configure()`` and/or
+``rte_eth_rx/tx_queue_setup()`` will fail. At this point PMD developers
+are encouraged to log errors identical to the ones that would be emitted by
+``rte_flow_action_handle_create()`` if the new configuration was active.
+Even if this capability is advertised, there may be kinds of indirect actions
+that the device cannot keep. They are implicitly destroyed at device stop.
+PMD developers must document such kinds of actions if applicable.
+
 .. _table_rte_flow_action_handle:
 
 .. table:: INDIRECT
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index d24de5e8fa..3d9a42672f 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -1480,6 +1480,12 @@ struct rte_eth_conf {
 #define RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002
 /** Device keeps flow rules across restart and reconfiguration. */
 #define RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP 0x00000004
+/**
+ * Device keeps objects that are shared between flow rules,
+ * e.g. indirect actions, across restart and reconfiguration.
+ * For a specific PMD this may not be applicable to certain action types.
+ */
+#define RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP 0x00000008
 /**@}*/
 
 /*
-- 
2.25.1
* [dpdk-dev] [PATCH 3/5] net/mlx5: discover max flow priority using DevX
  2021-10-05  0:52 [dpdk-dev] [PATCH 0/5] Flow entities behavior on port restart dkozlyuk
  2021-10-05  0:52 ` [dpdk-dev] [PATCH 1/5] ethdev: add capability to keep flow rules on restart dkozlyuk
  2021-10-05  0:52 ` [dpdk-dev] [PATCH 2/5] ethdev: add capability to keep shared objects " dkozlyuk
@ 2021-10-05  0:52 ` dkozlyuk
  2021-10-05  0:52 ` [dpdk-dev] [PATCH 4/5] net/mlx5: create drop queue " dkozlyuk
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 96+ messages in thread
From: dkozlyuk @ 2021-10-05  0:52 UTC (permalink / raw)
  To: dev; +Cc: Dmitry Kozlyuk, stable, Matan Azrad, Viacheslav Ovsiienko
From: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Maximum available flow priority was discovered using Verbs API
regardless of the selected flow engine. This required some Verbs
objects to be initialized in order to use DevX engine. Make priority
discovery an engine method and implement it for DevX using its API.
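
The engine-independent part of the discovery can be sketched as a probing
loop over candidate priority counts. The `probe_fn` callback and
`probe_below_limit()` are hypothetical stand-ins for the engine-specific
step of creating and destroying a test flow; they are not driver
functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical probe: returns true if a rule with this priority can be created. */
typedef bool (*probe_fn)(uint16_t priority, void *ctx);

/*
 * Try candidate priority counts in ascending order and return the largest
 * one whose highest priority value (count - 1) is accepted by the probe.
 * Return -ENOTSUP if none is accepted.
 */
static int
discover_priorities(const uint16_t *vprio, int vprio_n,
		    probe_fn probe, void *ctx)
{
	int i, ret = -ENOTSUP;

	assert(vprio != NULL && probe != NULL);
	for (i = 0; i < vprio_n; i++) {
		if (!probe(vprio[i] - 1, ctx))
			break;
		ret = vprio[i];
	}
	return ret;
}

/* Example probe for the sketch: accepts priorities below a limit passed via ctx. */
static bool
probe_below_limit(uint16_t priority, void *ctx)
{
	return priority < *(const uint16_t *)ctx;
}
```

Keeping the candidate list in the caller lets both engines share the
mapping from the discovered count (8 or 16) to the number of flow
priorities.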
Cc: stable@dpdk.org
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 drivers/net/mlx5/linux/mlx5_os.c   |   1 -
 drivers/net/mlx5/mlx5_flow.c       |  98 +++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_flow.h       |   4 ++
 drivers/net/mlx5/mlx5_flow_dv.c    | 103 +++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_flow_verbs.c |  77 +++------------------
 5 files changed, 215 insertions(+), 68 deletions(-)
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 3746057673..8ee7ada51b 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1830,7 +1830,6 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 	priv->drop_queue.hrxq = mlx5_drop_action_create(eth_dev);
 	if (!priv->drop_queue.hrxq)
 		goto error;
-	/* Supported Verbs flow priority number detection. */
 	err = mlx5_flow_discover_priorities(eth_dev);
 	if (err < 0) {
 		err = -err;
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index c914a7120c..bfc3e20c9a 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -9416,3 +9416,101 @@ mlx5_dbg__print_pattern(const struct rte_flow_item *item)
 	}
 	printf("END\n");
 }
+
+/* Map of Verbs to Flow priority with 8 Verbs priorities. */
+static const uint32_t priority_map_3[][MLX5_PRIORITY_MAP_MAX] = {
+	{ 0, 1, 2 }, { 2, 3, 4 }, { 5, 6, 7 },
+};
+
+/* Map of Verbs to Flow priority with 16 Verbs priorities. */
+static const uint32_t priority_map_5[][MLX5_PRIORITY_MAP_MAX] = {
+	{ 0, 1, 2 }, { 3, 4, 5 }, { 6, 7, 8 },
+	{ 9, 10, 11 }, { 12, 13, 14 },
+};
+
+/**
+ * Discover the number of available flow priorities.
+ *
+ * @param dev
+ *   Ethernet device.
+ *
+ * @return
+ *   On success, number of available flow priorities.
+ *   On failure, a negative errno-style code and rte_errno is set.
+ */
+int
+mlx5_flow_discover_priorities(struct rte_eth_dev *dev)
+{
+	static const uint16_t vprio[] = {8, 16};
+	const struct mlx5_priv *priv = dev->data->dev_private;
+	const struct mlx5_flow_driver_ops *fops;
+	enum mlx5_flow_drv_type type;
+	int ret;
+
+	type = mlx5_flow_os_get_type();
+	if (type == MLX5_FLOW_TYPE_MAX) {
+		type = MLX5_FLOW_TYPE_VERBS;
+		if (priv->config.devx && priv->config.dv_flow_en)
+			type = MLX5_FLOW_TYPE_DV;
+	}
+	fops = flow_get_drv_ops(type);
+	if (fops->discover_priorities == NULL) {
+		DRV_LOG(ERR, "Priority discovery not supported");
+		rte_errno = ENOTSUP;
+		return -rte_errno;
+	}
+	ret = fops->discover_priorities(dev, vprio, RTE_DIM(vprio));
+	if (ret < 0)
+		return ret;
+	switch (ret) {
+	case 8:
+		ret = RTE_DIM(priority_map_3);
+		break;
+	case 16:
+		ret = RTE_DIM(priority_map_5);
+		break;
+	default:
+		rte_errno = ENOTSUP;
+		DRV_LOG(ERR,
+			"port %u maximum priority: %d expected 8/16",
+			dev->data->port_id, ret);
+		return -rte_errno;
+	}
+	DRV_LOG(INFO, "port %u supported flow priorities:"
+		" 0-%d for ingress or egress root table,"
+		" 0-%d for non-root table or transfer root table.",
+		dev->data->port_id, ret - 2,
+		MLX5_NON_ROOT_FLOW_MAX_PRIO - 1);
+	return ret;
+}
+
+/**
+ * Adjust flow priority based on the highest layer and the request priority.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[in] priority
+ *   The rule base priority.
+ * @param[in] subpriority
+ *   The priority based on the items.
+ *
+ * @return
+ *   The new priority.
+ */
+uint32_t
+mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
+			  uint32_t subpriority)
+{
+	uint32_t res = 0;
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	switch (priv->config.flow_prio) {
+	case RTE_DIM(priority_map_3):
+		res = priority_map_3[priority][subpriority];
+		break;
+	case RTE_DIM(priority_map_5):
+		res = priority_map_5[priority][subpriority];
+		break;
+	}
+	return  res;
+}
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 5c68d4f7d7..8f94125f26 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1226,6 +1226,9 @@ typedef int (*mlx5_flow_create_def_policy_t)
 			(struct rte_eth_dev *dev);
 typedef void (*mlx5_flow_destroy_def_policy_t)
 			(struct rte_eth_dev *dev);
+typedef int (*mlx5_flow_discover_priorities_t)
+			(struct rte_eth_dev *dev,
+			 const uint16_t *vprio, int vprio_n);
 
 struct mlx5_flow_driver_ops {
 	mlx5_flow_validate_t validate;
@@ -1260,6 +1263,7 @@ struct mlx5_flow_driver_ops {
 	mlx5_flow_action_update_t action_update;
 	mlx5_flow_action_query_t action_query;
 	mlx5_flow_sync_domain_t sync_domain;
+	mlx5_flow_discover_priorities_t discover_priorities;
 };
 
 /* mlx5_flow.c */
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index c6370cd1d6..155745748f 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -17978,6 +17978,108 @@ flow_dv_sync_domain(struct rte_eth_dev *dev, uint32_t domains, uint32_t flags)
 	return 0;
 }
 
+/**
+ * Discover the number of available flow priorities
+ * by trying to create a flow with the highest priority value
+ * for each possible number.
+ *
+ * @param[in] dev
+ *   Ethernet device.
+ * @param[in] vprio
+ *   List of possible number of available priorities.
+ * @param[in] vprio_n
+ *   Size of @p vprio array.
+ * @return
+ *   On success, number of available flow priorities.
+ *   On failure, a negative errno-style code and rte_errno is set.
+ */
+static int
+flow_dv_discover_priorities(struct rte_eth_dev *dev,
+			    const uint16_t *vprio, int vprio_n)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_indexed_pool *pool = priv->sh->ipool[MLX5_IPOOL_MLX5_FLOW];
+	struct rte_flow_item_eth eth;
+	struct rte_flow_item item = {
+		.type = RTE_FLOW_ITEM_TYPE_ETH,
+		.spec = &eth,
+		.mask = &eth,
+	};
+	struct mlx5_flow_dv_matcher matcher = {
+		.mask = {
+			.size = sizeof(matcher.mask.buf),
+		},
+	};
+	union mlx5_flow_tbl_key tbl_key;
+	struct mlx5_flow flow;
+	void *action;
+	struct rte_flow_error error;
+	uint8_t misc_mask;
+	int i, err, ret = -ENOTSUP;
+
+	/*
+	 * Prepare a flow with a catch-all pattern and a drop action.
+	 * Use drop queue, because shared drop action may be unavailable.
+	 */
+	action = priv->drop_queue.hrxq->action;
+	if (action == NULL) {
+		DRV_LOG(ERR, "Priority discovery requires a drop action");
+		rte_errno = ENOTSUP;
+		return -rte_errno;
+	}
+	memset(&flow, 0, sizeof(flow));
+	flow.handle = mlx5_ipool_zmalloc(pool, &flow.handle_idx);
+	if (flow.handle == NULL) {
+		DRV_LOG(ERR, "Cannot create flow handle");
+		rte_errno = ENOMEM;
+		return -rte_errno;
+	}
+	flow.ingress = true;
+	flow.dv.value.size = MLX5_ST_SZ_BYTES(fte_match_param);
+	flow.dv.actions[0] = action;
+	flow.dv.actions_n = 1;
+	memset(&eth, 0, sizeof(eth));
+	flow_dv_translate_item_eth(matcher.mask.buf, flow.dv.value.buf,
+				   &item, /* inner */ false, /* group */ 0);
+	matcher.crc = rte_raw_cksum(matcher.mask.buf, matcher.mask.size);
+	for (i = 0; i < vprio_n; i++) {
+		/* Configure the next proposed maximum priority. */
+		matcher.priority = vprio[i] - 1;
+		memset(&tbl_key, 0, sizeof(tbl_key));
+		err = flow_dv_matcher_register(dev, &matcher, &tbl_key, &flow,
+					       /* tunnel */ NULL,
+					       /* group */ 0,
+					       &error);
+		if (err != 0) {
+			/* This action is pure SW and must always succeed. */
+			DRV_LOG(ERR, "Cannot register matcher");
+			ret = -rte_errno;
+			break;
+		}
+		/* Try to apply the flow to HW. */
+		misc_mask = flow_dv_matcher_enable(flow.dv.value.buf);
+		__flow_dv_adjust_buf_size(&flow.dv.value.size, misc_mask);
+		err = mlx5_flow_os_create_flow
+				(flow.handle->dvh.matcher->matcher_object,
+				 (void *)&flow.dv.value, flow.dv.actions_n,
+				 flow.dv.actions, &flow.handle->drv_flow);
+		if (err == 0) {
+			claim_zero(mlx5_flow_os_destroy_flow
+						(flow.handle->drv_flow));
+			flow.handle->drv_flow = NULL;
+		}
+		claim_zero(flow_dv_matcher_release(dev, flow.handle));
+		if (err != 0)
+			break;
+		ret = vprio[i];
+	}
+	mlx5_ipool_free(pool, flow.handle_idx);
+	/* Set rte_errno if no expected priority value matched. */
+	if (ret < 0)
+		rte_errno = -ret;
+	return ret;
+}
+
 const struct mlx5_flow_driver_ops mlx5_flow_dv_drv_ops = {
 	.validate = flow_dv_validate,
 	.prepare = flow_dv_prepare,
@@ -18011,6 +18113,7 @@ const struct mlx5_flow_driver_ops mlx5_flow_dv_drv_ops = {
 	.action_update = flow_dv_action_update,
 	.action_query = flow_dv_action_query,
 	.sync_domain = flow_dv_sync_domain,
+	.discover_priorities = flow_dv_discover_priorities,
 };
 
 #endif /* HAVE_IBV_FLOW_DV_SUPPORT */
diff --git a/drivers/net/mlx5/mlx5_flow_verbs.c b/drivers/net/mlx5/mlx5_flow_verbs.c
index b93fd4d2c9..72b9db6c7f 100644
--- a/drivers/net/mlx5/mlx5_flow_verbs.c
+++ b/drivers/net/mlx5/mlx5_flow_verbs.c
@@ -28,17 +28,6 @@
 #define VERBS_SPEC_INNER(item_flags) \
 	(!!((item_flags) & MLX5_FLOW_LAYER_TUNNEL) ? IBV_FLOW_SPEC_INNER : 0)
 
-/* Map of Verbs to Flow priority with 8 Verbs priorities. */
-static const uint32_t priority_map_3[][MLX5_PRIORITY_MAP_MAX] = {
-	{ 0, 1, 2 }, { 2, 3, 4 }, { 5, 6, 7 },
-};
-
-/* Map of Verbs to Flow priority with 16 Verbs priorities. */
-static const uint32_t priority_map_5[][MLX5_PRIORITY_MAP_MAX] = {
-	{ 0, 1, 2 }, { 3, 4, 5 }, { 6, 7, 8 },
-	{ 9, 10, 11 }, { 12, 13, 14 },
-};
-
 /* Verbs specification header. */
 struct ibv_spec_header {
 	enum ibv_flow_spec_type type;
@@ -50,13 +39,17 @@ struct ibv_spec_header {
  *
  * @param[in] dev
  *   Pointer to the Ethernet device structure.
- *
+ * @param[in] vprio
+ *   Expected result variants.
+ * @param[in] vprio_n
+ *   Number of entries in @p vprio array.
  * @return
- *   number of supported flow priority on success, a negative errno
+ *   Number of supported flow priority on success, a negative errno
  *   value otherwise and rte_errno is set.
  */
-int
-mlx5_flow_discover_priorities(struct rte_eth_dev *dev)
+static int
+flow_verbs_discover_priorities(struct rte_eth_dev *dev,
+			       const uint16_t *vprio, int vprio_n)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct {
@@ -79,7 +72,6 @@ mlx5_flow_discover_priorities(struct rte_eth_dev *dev)
 	};
 	struct ibv_flow *flow;
 	struct mlx5_hrxq *drop = priv->drop_queue.hrxq;
-	uint16_t vprio[] = { 8, 16 };
 	int i;
 	int priority = 0;
 
@@ -87,7 +79,7 @@ mlx5_flow_discover_priorities(struct rte_eth_dev *dev)
 		rte_errno = ENOTSUP;
 		return -rte_errno;
 	}
-	for (i = 0; i != RTE_DIM(vprio); i++) {
+	for (i = 0; i != vprio_n; i++) {
 		flow_attr.attr.priority = vprio[i] - 1;
 		flow = mlx5_glue->create_flow(drop->qp, &flow_attr.attr);
 		if (!flow)
@@ -95,59 +87,9 @@ mlx5_flow_discover_priorities(struct rte_eth_dev *dev)
 		claim_zero(mlx5_glue->destroy_flow(flow));
 		priority = vprio[i];
 	}
-	switch (priority) {
-	case 8:
-		priority = RTE_DIM(priority_map_3);
-		break;
-	case 16:
-		priority = RTE_DIM(priority_map_5);
-		break;
-	default:
-		rte_errno = ENOTSUP;
-		DRV_LOG(ERR,
-			"port %u verbs maximum priority: %d expected 8/16",
-			dev->data->port_id, priority);
-		return -rte_errno;
-	}
-	DRV_LOG(INFO, "port %u supported flow priorities:"
-		" 0-%d for ingress or egress root table,"
-		" 0-%d for non-root table or transfer root table.",
-		dev->data->port_id, priority - 2,
-		MLX5_NON_ROOT_FLOW_MAX_PRIO - 1);
 	return priority;
 }
 
-/**
- * Adjust flow priority based on the highest layer and the request priority.
- *
- * @param[in] dev
- *   Pointer to the Ethernet device structure.
- * @param[in] priority
- *   The rule base priority.
- * @param[in] subpriority
- *   The priority based on the items.
- *
- * @return
- *   The new priority.
- */
-uint32_t
-mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
-				   uint32_t subpriority)
-{
-	uint32_t res = 0;
-	struct mlx5_priv *priv = dev->data->dev_private;
-
-	switch (priv->config.flow_prio) {
-	case RTE_DIM(priority_map_3):
-		res = priority_map_3[priority][subpriority];
-		break;
-	case RTE_DIM(priority_map_5):
-		res = priority_map_5[priority][subpriority];
-		break;
-	}
-	return  res;
-}
-
 /**
  * Get Verbs flow counter by index.
  *
@@ -2105,4 +2047,5 @@ const struct mlx5_flow_driver_ops mlx5_flow_verbs_drv_ops = {
 	.destroy = flow_verbs_destroy,
 	.query = flow_verbs_query,
 	.sync_domain = flow_verbs_sync_domain,
+	.discover_priorities = flow_verbs_discover_priorities,
 };
-- 
2.25.1
* [dpdk-dev] [PATCH 4/5] net/mlx5: create drop queue using DevX
  2021-10-05  0:52 [dpdk-dev] [PATCH 0/5] Flow entities behavior on port restart dkozlyuk
                   ` (2 preceding siblings ...)
  2021-10-05  0:52 ` [dpdk-dev] [PATCH 3/5] net/mlx5: discover max flow priority using DevX dkozlyuk
@ 2021-10-05  0:52 ` dkozlyuk
  2021-10-05  0:52 ` [dpdk-dev] [PATCH 5/5] net/mlx5: preserve indirect actions on restart dkozlyuk
  2021-10-15 16:18 ` [dpdk-dev] [PATCH v2 0/5] Flow entities behavior on port restart Dmitry Kozlyuk
  5 siblings, 0 replies; 96+ messages in thread
From: dkozlyuk @ 2021-10-05  0:52 UTC (permalink / raw)
  To: dev; +Cc: Dmitry Kozlyuk, stable, Matan Azrad, Viacheslav Ovsiienko
From: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Drop queue creation and destruction were not implemented for the DevX
flow engine, and Verbs engine methods were used as a workaround.
Implement these methods for DevX so that there is a valid queue ID
that can be used regardless of queue configuration via API.
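
A minimal sketch of why a valid drop queue ID is useful: an indirection
table can be filled even when no regular Rx queues are given.
`fill_rq_list()` and its parameters are hypothetical, and the cyclic
repetition of queue IDs in the non-drop branch is an assumption of this
sketch rather than a statement about the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Fill rq_list[0..rqt_n) with hardware queue IDs.
 * With queues == NULL every slot points to the drop queue; otherwise
 * queue indices are translated through hw_ids and repeated cyclically.
 */
static void
fill_rq_list(uint32_t *rq_list, size_t rqt_n,
	     const uint16_t *queues, size_t queues_n,
	     const uint32_t *hw_ids, uint32_t drop_queue_id)
{
	size_t i;

	assert(rq_list != NULL);
	if (queues == NULL) {
		/* No regular queues: every entry designates the drop queue. */
		for (i = 0; i < rqt_n; i++)
			rq_list[i] = drop_queue_id;
		return;
	}
	assert(queues_n > 0 && hw_ids != NULL);
	for (i = 0; i < rqt_n; i++)
		rq_list[i] = hw_ids[queues[i % queues_n]];
}
```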
Cc: stable@dpdk.org
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 drivers/net/mlx5/linux/mlx5_os.c |   4 -
 drivers/net/mlx5/mlx5_devx.c     | 204 ++++++++++++++++++++++++++-----
 2 files changed, 176 insertions(+), 32 deletions(-)
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 8ee7ada51b..985f0bd489 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1790,10 +1790,6 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 	}
 	if (config->devx && config->dv_flow_en && config->dest_tir) {
 		priv->obj_ops = devx_obj_ops;
-		priv->obj_ops.drop_action_create =
-						ibv_obj_ops.drop_action_create;
-		priv->obj_ops.drop_action_destroy =
-						ibv_obj_ops.drop_action_destroy;
 #ifndef HAVE_MLX5DV_DEVX_UAR_OFFSET
 		priv->obj_ops.txq_obj_modify = ibv_obj_ops.txq_obj_modify;
 #else
diff --git a/drivers/net/mlx5/mlx5_devx.c b/drivers/net/mlx5/mlx5_devx.c
index a1db53577a..447d6bafb9 100644
--- a/drivers/net/mlx5/mlx5_devx.c
+++ b/drivers/net/mlx5/mlx5_devx.c
@@ -226,17 +226,17 @@ mlx5_rx_devx_get_event(struct mlx5_rxq_obj *rxq_obj)
  *
  * @param dev
  *   Pointer to Ethernet device.
- * @param idx
- *   Queue index in DPDK Rx queue array.
+ * @param rxq_data
+ *   RX queue data.
  *
  * @return
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-mlx5_rxq_create_devx_rq_resources(struct rte_eth_dev *dev, uint16_t idx)
+mlx5_rxq_create_devx_rq_resources(struct rte_eth_dev *dev,
+				  struct mlx5_rxq_data *rxq_data)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_rxq_data *rxq_data = (*priv->rxqs)[idx];
 	struct mlx5_rxq_ctrl *rxq_ctrl =
 		container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
 	struct mlx5_devx_create_rq_attr rq_attr = { 0 };
@@ -289,20 +289,20 @@ mlx5_rxq_create_devx_rq_resources(struct rte_eth_dev *dev, uint16_t idx)
  *
  * @param dev
  *   Pointer to Ethernet device.
- * @param idx
- *   Queue index in DPDK Rx queue array.
+ * @param rxq_data
+ *   RX queue data.
  *
  * @return
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-mlx5_rxq_create_devx_cq_resources(struct rte_eth_dev *dev, uint16_t idx)
+mlx5_rxq_create_devx_cq_resources(struct rte_eth_dev *dev,
+				  struct mlx5_rxq_data *rxq_data)
 {
 	struct mlx5_devx_cq *cq_obj = 0;
 	struct mlx5_devx_cq_attr cq_attr = { 0 };
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_dev_ctx_shared *sh = priv->sh;
-	struct mlx5_rxq_data *rxq_data = (*priv->rxqs)[idx];
 	struct mlx5_rxq_ctrl *rxq_ctrl =
 		container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
 	unsigned int cqe_n = mlx5_rxq_cqe_num(rxq_data);
@@ -497,13 +497,13 @@ mlx5_rxq_devx_obj_new(struct rte_eth_dev *dev, uint16_t idx)
 		tmpl->fd = mlx5_os_get_devx_channel_fd(tmpl->devx_channel);
 	}
 	/* Create CQ using DevX API. */
-	ret = mlx5_rxq_create_devx_cq_resources(dev, idx);
+	ret = mlx5_rxq_create_devx_cq_resources(dev, rxq_data);
 	if (ret) {
 		DRV_LOG(ERR, "Failed to create CQ.");
 		goto error;
 	}
 	/* Create RQ using DevX API. */
-	ret = mlx5_rxq_create_devx_rq_resources(dev, idx);
+	ret = mlx5_rxq_create_devx_rq_resources(dev, rxq_data);
 	if (ret) {
 		DRV_LOG(ERR, "Port %u Rx queue %u RQ creation failure.",
 			dev->data->port_id, idx);
@@ -536,6 +536,11 @@ mlx5_rxq_devx_obj_new(struct rte_eth_dev *dev, uint16_t idx)
  *   Pointer to Ethernet device.
  * @param log_n
  *   Log of number of queues in the array.
+ * @param queues
+ *   List of RX queue indices or NULL, in which case
+ *   the attribute will be filled by drop queue ID.
+ * @param queues_n
+ *   Size of @p queues array or 0 if it is NULL.
  * @param ind_tbl
  *   DevX indirection table object.
  *
@@ -563,6 +568,11 @@ mlx5_devx_ind_table_create_rqt_attr(struct rte_eth_dev *dev,
 	}
 	rqt_attr->rqt_max_size = priv->config.ind_table_max_size;
 	rqt_attr->rqt_actual_size = rqt_n;
+	if (queues == NULL) {
+		for (i = 0; i < rqt_n; i++)
+			rqt_attr->rq_list[i] = priv->drop_queue.rxq->rq->id;
+		return rqt_attr;
+	}
 	for (i = 0; i != queues_n; ++i) {
 		struct mlx5_rxq_data *rxq = (*priv->rxqs)[queues[i]];
 		struct mlx5_rxq_ctrl *rxq_ctrl =
@@ -670,7 +680,8 @@ mlx5_devx_ind_table_destroy(struct mlx5_ind_table_obj *ind_tbl)
  * @param[in] hash_fields
  *   Verbs protocol hash field to make the RSS on.
  * @param[in] ind_tbl
- *   Indirection table for TIR.
+ *   Indirection table for TIR. If table queues array is NULL,
+ *   a TIR for drop queue is assumed.
  * @param[in] tunnel
  *   Tunnel type.
  * @param[out] tir_attr
@@ -686,19 +697,27 @@ mlx5_devx_tir_attr_set(struct rte_eth_dev *dev, const uint8_t *rss_key,
 		       int tunnel, struct mlx5_devx_tir_attr *tir_attr)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_rxq_data *rxq_data = (*priv->rxqs)[ind_tbl->queues[0]];
-	struct mlx5_rxq_ctrl *rxq_ctrl =
-		container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
-	enum mlx5_rxq_type rxq_obj_type = rxq_ctrl->type;
+	enum mlx5_rxq_type rxq_obj_type;
 	bool lro = true;
 	uint32_t i;
 
-	/* Enable TIR LRO only if all the queues were configured for. */
-	for (i = 0; i < ind_tbl->queues_n; ++i) {
-		if (!(*priv->rxqs)[ind_tbl->queues[i]]->lro) {
-			lro = false;
-			break;
+	/* NULL queues designate drop queue. */
+	if (ind_tbl->queues != NULL) {
+		struct mlx5_rxq_data *rxq_data =
+					(*priv->rxqs)[ind_tbl->queues[0]];
+		struct mlx5_rxq_ctrl *rxq_ctrl =
+			container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
+		rxq_obj_type = rxq_ctrl->type;
+
+		/* Enable TIR LRO only if all the queues were configured for. */
+		for (i = 0; i < ind_tbl->queues_n; ++i) {
+			if (!(*priv->rxqs)[ind_tbl->queues[i]]->lro) {
+				lro = false;
+				break;
+			}
 		}
+	} else {
+		rxq_obj_type = priv->drop_queue.rxq->rxq_ctrl->type;
 	}
 	memset(tir_attr, 0, sizeof(*tir_attr));
 	tir_attr->disp_type = MLX5_TIRC_DISP_TYPE_INDIRECT;
@@ -857,7 +876,7 @@ mlx5_devx_hrxq_modify(struct rte_eth_dev *dev, struct mlx5_hrxq *hrxq,
 }
 
 /**
- * Create a DevX drop action for Rx Hash queue.
+ * Create a DevX drop Rx queue.
  *
  * @param dev
  *   Pointer to Ethernet device.
@@ -866,14 +885,99 @@ mlx5_devx_hrxq_modify(struct rte_eth_dev *dev, struct mlx5_hrxq *hrxq,
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-mlx5_devx_drop_action_create(struct rte_eth_dev *dev)
+mlx5_rxq_devx_obj_drop_create(struct rte_eth_dev *dev)
 {
-	(void)dev;
-	DRV_LOG(ERR, "DevX drop action is not supported yet.");
-	rte_errno = ENOTSUP;
+	struct mlx5_priv *priv = dev->data->dev_private;
+	int socket_id = dev->device->numa_node;
+	struct mlx5_rxq_ctrl *rxq_ctrl;
+	struct mlx5_rxq_data *rxq_data;
+	struct mlx5_rxq_obj *rxq = NULL;
+	int ret;
+
+	/*
+	 * Initialize dummy control structures.
+	 * They are required to hold pointers for cleanup
+	 * and are only accessible via drop queue DevX objects.
+	 */
+	rxq_ctrl = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*rxq_ctrl),
+			       0, socket_id);
+	if (rxq_ctrl == NULL) {
+		DRV_LOG(ERR, "Port %u could not allocate drop queue control",
+			dev->data->port_id);
+		rte_errno = ENOMEM;
+		goto error;
+	}
+	rxq = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*rxq), 0, socket_id);
+	if (rxq == NULL) {
+		DRV_LOG(ERR, "Port %u could not allocate drop queue object",
+			dev->data->port_id);
+		rte_errno = ENOMEM;
+		goto error;
+	}
+	rxq->rxq_ctrl = rxq_ctrl;
+	rxq_ctrl->type = MLX5_RXQ_TYPE_STANDARD;
+	rxq_ctrl->priv = priv;
+	rxq_ctrl->obj = rxq;
+	rxq_data = &rxq_ctrl->rxq;
+	/* Create CQ using DevX API. */
+	ret = mlx5_rxq_create_devx_cq_resources(dev, rxq_data);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Port %u drop queue CQ creation failed.",
+			dev->data->port_id);
+		goto error;
+	}
+	/* Create RQ using DevX API. */
+	ret = mlx5_rxq_create_devx_rq_resources(dev, rxq_data);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Port %u drop queue RQ creation failed.",
+			dev->data->port_id);
+		rte_errno = ENOMEM;
+		goto error;
+	}
+	/* Change queue state to ready. */
+	ret = mlx5_devx_modify_rq(rxq, MLX5_RXQ_MOD_RST2RDY);
+	if (ret != 0)
+		goto error;
+	/* Initialize drop queue. */
+	priv->drop_queue.rxq = rxq;
+	return 0;
+error:
+	ret = rte_errno; /* Save rte_errno before cleanup. */
+	if (rxq != NULL) {
+		if (rxq->rq_obj.rq != NULL)
+			mlx5_devx_rq_destroy(&rxq->rq_obj);
+		if (rxq->cq_obj.cq != NULL)
+			mlx5_devx_cq_destroy(&rxq->cq_obj);
+		if (rxq->devx_channel)
+			mlx5_os_devx_destroy_event_channel
+							(rxq->devx_channel);
+		mlx5_free(rxq);
+	}
+	if (rxq_ctrl != NULL)
+		mlx5_free(rxq_ctrl);
+	rte_errno = ret; /* Restore rte_errno. */
 	return -rte_errno;
 }
 
+/**
+ * Release drop Rx queue resources.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ */
+static void
+mlx5_rxq_devx_obj_drop_release(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_rxq_obj *rxq = priv->drop_queue.rxq;
+	struct mlx5_rxq_ctrl *rxq_ctrl = rxq->rxq_ctrl;
+
+	mlx5_rxq_devx_obj_release(rxq);
+	mlx5_free(rxq);
+	mlx5_free(rxq_ctrl);
+	priv->drop_queue.rxq = NULL;
+}
+
 /**
  * Release a drop hash Rx queue.
  *
@@ -883,9 +987,53 @@ mlx5_devx_drop_action_create(struct rte_eth_dev *dev)
 static void
 mlx5_devx_drop_action_destroy(struct rte_eth_dev *dev)
 {
-	(void)dev;
-	DRV_LOG(ERR, "DevX drop action is not supported yet.");
-	rte_errno = ENOTSUP;
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_hrxq *hrxq = priv->drop_queue.hrxq;
+
+	if (hrxq->tir != NULL)
+		mlx5_devx_tir_destroy(hrxq);
+	if (hrxq->ind_table->ind_table != NULL)
+		mlx5_devx_ind_table_destroy(hrxq->ind_table);
+	if (priv->drop_queue.rxq->rq != NULL)
+		mlx5_rxq_devx_obj_drop_release(dev);
+}
+
+/**
+ * Create a DevX drop action for Rx Hash queue.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_devx_drop_action_create(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_hrxq *hrxq = priv->drop_queue.hrxq;
+	int ret;
+
+	ret = mlx5_rxq_devx_obj_drop_create(dev);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Cannot create drop RX queue");
+		return ret;
+	}
+	/* hrxq->ind_table queues are NULL, drop RX queue ID will be used */
+	ret = mlx5_devx_ind_table_new(dev, 0, hrxq->ind_table);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Cannot create drop hash RX queue indirection table");
+		goto error;
+	}
+	ret = mlx5_devx_hrxq_new(dev, hrxq, /* tunnel */ false);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Cannot create drop hash RX queue");
+		goto error;
+	}
+	return 0;
+error:
+	mlx5_devx_drop_action_destroy(dev);
+	return ret;
 }
 
 /**
-- 
2.25.1
* [dpdk-dev] [PATCH 5/5] net/mlx5: preserve indirect actions on restart
  2021-10-05  0:52 [dpdk-dev] [PATCH 0/5] Flow entites behavior on port restart dkozlyuk
                   ` (3 preceding siblings ...)
  2021-10-05  0:52 ` [dpdk-dev] [PATCH 4/5] net/mlx5: create drop queue " dkozlyuk
@ 2021-10-05  0:52 ` dkozlyuk
  2021-10-15 16:18 ` [dpdk-dev] [PATCH v2 0/5] Flow entites behavior on port restart Dmitry Kozlyuk
  5 siblings, 0 replies; 96+ messages in thread
From: dkozlyuk @ 2021-10-05  0:52 UTC (permalink / raw)
  To: dev; +Cc: Dmitry Kozlyuk, bingz, stable, Matan Azrad, Viacheslav Ovsiienko
From: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
MLX5 PMD uses reference counting to manage RX queue resources.
After port stop, shared RSS actions kept references to RX queues,
preventing resource release. As a result, the internal PMD mempool
for such queues was exhausted after a number of port restarts.
Diagnostic message from rte_eth_dev_start():
    Rx queue allocation failed: Cannot allocate memory
Release references to RX queues held by indirect actions on port stop
(detach) and restore them on port start (attach), so that RX queue
resources can be released while indirect RSS is kept across the port restart.
Replace queue IDs in HW by drop queue ID on detach and restore actual
queue IDs on attach.
As a result, MLX5 PMD is able to keep all its indirect actions
across port restart. Advertise this capability to the ethdev layer.
Fixes: 4b61b8774be9 ("ethdev: introduce indirect flow action")
Cc: bingz@nvidia.com
Cc: stable@dpdk.org
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
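The detach/attach scheme described above can be sketched with a toy
model. Everything below (toy_port, toy_ind_tbl, TOY_DROP_QUEUE) is
hypothetical and is not the mlx5 PMD code. It only assumes one reference
counter per Rx queue and a drop queue ID on which the table can be parked
while the port is stopped.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_RXQ_MAX 8
#define TOY_DROP_QUEUE UINT16_MAX /* hypothetical stand-in for the drop queue ID */

/* Hypothetical port model: one reference counter per Rx queue. */
struct toy_port {
	unsigned int rxq_refcnt[TOY_RXQ_MAX];
	uint16_t rxqs_n; /* number of currently configured Rx queues */
};

/* Hypothetical model of the indirection table of an indirect RSS action. */
struct toy_ind_tbl {
	uint16_t queues[TOY_RXQ_MAX];    /* queue IDs requested by the application */
	uint16_t hw_queues[TOY_RXQ_MAX]; /* queue IDs currently programmed in "HW" */
	uint32_t queues_n;
	bool attached;
};

/* Port stop: point "HW" at the drop queue, then release queue references. */
static void
toy_ind_tbl_detach(struct toy_port *port, struct toy_ind_tbl *tbl)
{
	uint32_t i;

	if (!tbl->attached)
		return;
	for (i = 0; i < tbl->queues_n; i++) {
		tbl->hw_queues[i] = TOY_DROP_QUEUE;
		assert(port->rxq_refcnt[tbl->queues[i]] > 0);
		port->rxq_refcnt[tbl->queues[i]]--;
	}
	tbl->attached = false;
}

/* Port start: validate all queues first, then restore IDs and references. */
static int
toy_ind_tbl_attach(struct toy_port *port, struct toy_ind_tbl *tbl)
{
	uint32_t i;

	if (tbl->attached)
		return 0;
	for (i = 0; i < tbl->queues_n; i++)
		if (tbl->queues[i] >= port->rxqs_n)
			return -EINVAL; /* queue went away on reconfiguration */
	for (i = 0; i < tbl->queues_n; i++) {
		tbl->hw_queues[i] = tbl->queues[i];
		port->rxq_refcnt[tbl->queues[i]]++;
	}
	tbl->attached = true;
	return 0;
}

/* A queue can be released only when nothing references it. */
static bool
toy_rxq_releasable(const struct toy_port *port, uint16_t queue)
{
	return port->rxq_refcnt[queue] == 0;
}
```

In this model the table object itself survives the stop, so the
application handle stays valid, while the queues become releasable
between detach and attach. Validating before taking any reference means
a failed attach leaves the counters untouched.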
 drivers/net/mlx5/mlx5_ethdev.c  |   1 +
 drivers/net/mlx5/mlx5_flow.c    | 194 ++++++++++++++++++++++++++++----
 drivers/net/mlx5/mlx5_flow.h    |   2 +
 drivers/net/mlx5/mlx5_rx.h      |   4 +
 drivers/net/mlx5/mlx5_rxq.c     |  99 ++++++++++++++--
 drivers/net/mlx5/mlx5_trigger.c |  10 ++
 6 files changed, 276 insertions(+), 34 deletions(-)
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 82e2284d98..8ebfd0bccb 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -325,6 +325,7 @@ mlx5_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info)
 	info->reta_size = priv->reta_idx_n ?
 		priv->reta_idx_n : config->ind_table_max_size;
 	info->hash_key_size = MLX5_RSS_HASH_KEY_LEN;
+	info->dev_capa = RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP;
 	info->speed_capa = priv->link_speed_capa;
 	info->flow_type_rss_offloads = ~MLX5_RSS_HF_MASK;
 	mlx5_set_default_params(dev, info);
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index bfc3e20c9a..c10b911259 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -1560,6 +1560,58 @@ mlx5_flow_validate_action_queue(const struct rte_flow_action *action,
 	return 0;
 }
 
+/**
+ * Validate queue numbers for device RSS.
+ *
+ * @param[in] dev
+ *   Configured device.
+ * @param[in] queues
+ *   Array of queue numbers.
+ * @param[in] queues_n
+ *   Size of the @p queues array.
+ * @param[out] error
+ *   On error, filled with a textual error description.
+ * @param[out] queue_idx
+ *   On error, filled with an offending queue index in @p queues array.
+ *
+ * @return
+ *   0 on success, a negative errno code on error.
+ */
+static int
+mlx5_validate_rss_queues(const struct rte_eth_dev *dev,
+			 const uint16_t *queues, uint32_t queues_n,
+			 const char **error, uint32_t *queue_idx)
+{
+	const struct mlx5_priv *priv = dev->data->dev_private;
+	enum mlx5_rxq_type rxq_type = MLX5_RXQ_TYPE_UNDEFINED;
+	uint32_t i;
+
+	for (i = 0; i != queues_n; ++i) {
+		struct mlx5_rxq_ctrl *rxq_ctrl;
+
+		if (queues[i] >= priv->rxqs_n) {
+			*error = "queue index out of range";
+			*queue_idx = i;
+			return -EINVAL;
+		}
+		if (!(*priv->rxqs)[queues[i]]) {
+			*error = "queue is not configured";
+			*queue_idx = i;
+			return -EINVAL;
+		}
+		rxq_ctrl = container_of((*priv->rxqs)[queues[i]],
+					struct mlx5_rxq_ctrl, rxq);
+		if (i == 0)
+			rxq_type = rxq_ctrl->type;
+		if (rxq_type != rxq_ctrl->type) {
+			*error = "combining hairpin and regular RSS queues is not supported";
+			*queue_idx = i;
+			return -ENOTSUP;
+		}
+	}
+	return 0;
+}
+
 /*
  * Validate the rss action.
  *
@@ -1580,8 +1632,9 @@ mlx5_validate_action_rss(struct rte_eth_dev *dev,
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	const struct rte_flow_action_rss *rss = action->conf;
-	enum mlx5_rxq_type rxq_type = MLX5_RXQ_TYPE_UNDEFINED;
-	unsigned int i;
+	int ret;
+	const char *message;
+	uint32_t queue_idx;
 
 	if (rss->func != RTE_ETH_HASH_FUNCTION_DEFAULT &&
 	    rss->func != RTE_ETH_HASH_FUNCTION_TOEPLITZ)
@@ -1645,27 +1698,12 @@ mlx5_validate_action_rss(struct rte_eth_dev *dev,
 		return rte_flow_error_set(error, EINVAL,
 					  RTE_FLOW_ERROR_TYPE_ACTION_CONF,
 					  NULL, "No queues configured");
-	for (i = 0; i != rss->queue_num; ++i) {
-		struct mlx5_rxq_ctrl *rxq_ctrl;
-
-		if (rss->queue[i] >= priv->rxqs_n)
-			return rte_flow_error_set
-				(error, EINVAL,
-				 RTE_FLOW_ERROR_TYPE_ACTION_CONF,
-				 &rss->queue[i], "queue index out of range");
-		if (!(*priv->rxqs)[rss->queue[i]])
-			return rte_flow_error_set
-				(error, EINVAL, RTE_FLOW_ERROR_TYPE_ACTION_CONF,
-				 &rss->queue[i], "queue is not configured");
-		rxq_ctrl = container_of((*priv->rxqs)[rss->queue[i]],
-					struct mlx5_rxq_ctrl, rxq);
-		if (i == 0)
-			rxq_type = rxq_ctrl->type;
-		if (rxq_type != rxq_ctrl->type)
-			return rte_flow_error_set
-				(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ACTION_CONF,
-				 &rss->queue[i],
-				 "combining hairpin and regular RSS queues is not supported");
+	ret = mlx5_validate_rss_queues(dev, rss->queue, rss->queue_num,
+				       &message, &queue_idx);
+	if (ret != 0) {
+		return rte_flow_error_set(error, -ret,
+					  RTE_FLOW_ERROR_TYPE_ACTION_CONF,
+					  &rss->queue[queue_idx], message);
 	}
 	return 0;
 }
@@ -8547,6 +8585,116 @@ mlx5_action_handle_flush(struct rte_eth_dev *dev)
 	return ret;
 }
 
+/**
+ * Validate existing indirect actions against current device configuration
+ * and attach them to device resources.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_action_handle_attach(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_indexed_pool *ipool =
+			priv->sh->ipool[MLX5_IPOOL_RSS_SHARED_ACTIONS];
+	struct mlx5_shared_action_rss *shared_rss, *shared_rss_last;
+	int ret = 0;
+	uint32_t idx;
+
+	ILIST_FOREACH(ipool, priv->rss_shared_actions, idx, shared_rss, next) {
+		struct mlx5_ind_table_obj *ind_tbl = shared_rss->ind_tbl;
+		const char *message;
+		uint32_t queue_idx;
+
+		ret = mlx5_validate_rss_queues(dev, ind_tbl->queues,
+					       ind_tbl->queues_n,
+					       &message, &queue_idx);
+		if (ret != 0) {
+			DRV_LOG(ERR, "Port %u cannot use queue %u in RSS: %s",
+				dev->data->port_id, ind_tbl->queues[queue_idx],
+				message);
+			break;
+		}
+	}
+	if (ret != 0)
+		return ret;
+	ILIST_FOREACH(ipool, priv->rss_shared_actions, idx, shared_rss, next) {
+		struct mlx5_ind_table_obj *ind_tbl = shared_rss->ind_tbl;
+
+		ret = mlx5_ind_table_obj_attach(dev, ind_tbl);
+		if (ret != 0) {
+			DRV_LOG(ERR, "Port %u could not attach "
+				"indirection table obj %p",
+				dev->data->port_id, (void *)ind_tbl);
+			goto error;
+		}
+	}
+	return 0;
+error:
+	shared_rss_last = shared_rss;
+	ILIST_FOREACH(ipool, priv->rss_shared_actions, idx, shared_rss, next) {
+		struct mlx5_ind_table_obj *ind_tbl = shared_rss->ind_tbl;
+
+		if (shared_rss == shared_rss_last)
+			break;
+		if (mlx5_ind_table_obj_detach(dev, ind_tbl) != 0)
+			DRV_LOG(CRIT, "Port %u could not detach "
+				"indirection table obj %p on rollback",
+				dev->data->port_id, (void *)ind_tbl);
+	}
+	return ret;
+}
+
+/**
+ * Detach indirect actions of the device from its resources.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_action_handle_detach(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_indexed_pool *ipool =
+			priv->sh->ipool[MLX5_IPOOL_RSS_SHARED_ACTIONS];
+	struct mlx5_shared_action_rss *shared_rss, *shared_rss_last;
+	int ret = 0;
+	uint32_t idx;
+
+	ILIST_FOREACH(ipool, priv->rss_shared_actions, idx, shared_rss, next) {
+		struct mlx5_ind_table_obj *ind_tbl = shared_rss->ind_tbl;
+
+		ret = mlx5_ind_table_obj_detach(dev, ind_tbl);
+		if (ret != 0) {
+			DRV_LOG(ERR, "Port %u could not detach "
+				"indirection table obj %p",
+				dev->data->port_id, (void *)ind_tbl);
+			goto error;
+		}
+	}
+	return 0;
+error:
+	shared_rss_last = shared_rss;
+	ILIST_FOREACH(ipool, priv->rss_shared_actions, idx, shared_rss, next) {
+		struct mlx5_ind_table_obj *ind_tbl = shared_rss->ind_tbl;
+
+		if (shared_rss == shared_rss_last)
+			break;
+		if (mlx5_ind_table_obj_attach(dev, ind_tbl) != 0)
+			DRV_LOG(CRIT, "Port %u could not attach "
+				"indirection table obj %p on rollback",
+				dev->data->port_id, (void *)ind_tbl);
+	}
+	return ret;
+}
+
 #ifndef HAVE_MLX5DV_DR
 #define MLX5_DOMAIN_SYNC_FLOW ((1 << 0) | (1 << 1))
 #else
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 8f94125f26..6bc7946cc3 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1574,6 +1574,8 @@ void mlx5_flow_destroy_sub_policy_with_rxq(struct rte_eth_dev *dev,
 		struct mlx5_flow_meter_policy *mtr_policy);
 int mlx5_flow_dv_discover_counter_offset_support(struct rte_eth_dev *dev);
 int mlx5_flow_discover_dr_action_support(struct rte_eth_dev *dev);
+int mlx5_action_handle_attach(struct rte_eth_dev *dev);
+int mlx5_action_handle_detach(struct rte_eth_dev *dev);
 int mlx5_action_handle_flush(struct rte_eth_dev *dev);
 void mlx5_release_tunnel_hub(struct mlx5_dev_ctx_shared *sh, uint16_t port_id);
 int mlx5_alloc_tunnel_hub(struct mlx5_dev_ctx_shared *sh);
diff --git a/drivers/net/mlx5/mlx5_rx.h b/drivers/net/mlx5/mlx5_rx.h
index 3f2b99fb65..7319ad0264 100644
--- a/drivers/net/mlx5/mlx5_rx.h
+++ b/drivers/net/mlx5/mlx5_rx.h
@@ -222,6 +222,10 @@ int mlx5_ind_table_obj_modify(struct rte_eth_dev *dev,
 			      struct mlx5_ind_table_obj *ind_tbl,
 			      uint16_t *queues, const uint32_t queues_n,
 			      bool standalone);
+int mlx5_ind_table_obj_attach(struct rte_eth_dev *dev,
+			      struct mlx5_ind_table_obj *ind_tbl);
+int mlx5_ind_table_obj_detach(struct rte_eth_dev *dev,
+			      struct mlx5_ind_table_obj *ind_tbl);
 struct mlx5_list_entry *mlx5_hrxq_create_cb(void *tool_ctx, void *cb_ctx);
 int mlx5_hrxq_match_cb(void *tool_ctx, struct mlx5_list_entry *entry,
 		       void *cb_ctx);
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index abd8ce7989..cf4a29772c 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -2018,6 +2018,26 @@ mlx5_ind_table_obj_new(struct rte_eth_dev *dev, const uint16_t *queues,
 	return ind_tbl;
 }
 
+static int
+mlx5_ind_table_obj_check_standalone(struct rte_eth_dev *dev __rte_unused,
+				    struct mlx5_ind_table_obj *ind_tbl)
+{
+	uint32_t refcnt;
+
+	refcnt = __atomic_load_n(&ind_tbl->refcnt, __ATOMIC_RELAXED);
+	if (refcnt <= 1)
+		return 0;
+	/*
+	 * Modification of indirection tables having more than 1
+	 * reference is unsupported.
+	 */
+	DRV_LOG(DEBUG,
+		"Port %u cannot modify indirection table %p (refcnt %u > 1).",
+		dev->data->port_id, (void *)ind_tbl, refcnt);
+	rte_errno = EINVAL;
+	return -rte_errno;
+}
+
 /**
  * Modify an indirection table.
  *
@@ -2050,18 +2070,8 @@ mlx5_ind_table_obj_modify(struct rte_eth_dev *dev,
 
 	MLX5_ASSERT(standalone);
 	RTE_SET_USED(standalone);
-	if (__atomic_load_n(&ind_tbl->refcnt, __ATOMIC_RELAXED) > 1) {
-		/*
-		 * Modification of indirection ntables having more than 1
-		 * reference unsupported. Intended for standalone indirection
-		 * tables only.
-		 */
-		DRV_LOG(DEBUG,
-			"Port %u cannot modify indirection table (refcnt> 1).",
-			dev->data->port_id);
-		rte_errno = EINVAL;
+	if (mlx5_ind_table_obj_check_standalone(dev, ind_tbl) < 0)
 		return -rte_errno;
-	}
 	for (i = 0; i != queues_n; ++i) {
 		if (!mlx5_rxq_get(dev, queues[i])) {
 			ret = -rte_errno;
@@ -2087,6 +2097,73 @@ mlx5_ind_table_obj_modify(struct rte_eth_dev *dev,
 	return ret;
 }
 
+/**
+ * Attach an indirection table to its queues.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param ind_tbl
+ *   Indirection table to attach.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_ind_table_obj_attach(struct rte_eth_dev *dev,
+			  struct mlx5_ind_table_obj *ind_tbl)
+{
+	unsigned int i;
+	int ret;
+
+	ret = mlx5_ind_table_obj_modify(dev, ind_tbl, ind_tbl->queues,
+					ind_tbl->queues_n, true);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Port %u could not modify indirect table obj %p",
+			dev->data->port_id, (void *)ind_tbl);
+		return ret;
+	}
+	for (i = 0; i < ind_tbl->queues_n; i++)
+		mlx5_rxq_get(dev, ind_tbl->queues[i]);
+	return 0;
+}
+
+/**
+ * Detach an indirection table from its queues.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param ind_tbl
+ *   Indirection table to detach.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_ind_table_obj_detach(struct rte_eth_dev *dev,
+			  struct mlx5_ind_table_obj *ind_tbl)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	const unsigned int n = rte_is_power_of_2(ind_tbl->queues_n) ?
+			       log2above(ind_tbl->queues_n) :
+			       log2above(priv->config.ind_table_max_size);
+	unsigned int i;
+	int ret;
+
+	ret = mlx5_ind_table_obj_check_standalone(dev, ind_tbl);
+	if (ret != 0)
+		return ret;
+	MLX5_ASSERT(priv->obj_ops.ind_table_modify);
+	ret = priv->obj_ops.ind_table_modify(dev, n, NULL, 0, ind_tbl);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Port %u could not modify indirect table obj %p",
+			dev->data->port_id, (void *)ind_tbl);
+		return ret;
+	}
+	for (i = 0; i < ind_tbl->queues_n; i++)
+		mlx5_rxq_release(dev, ind_tbl->queues[i]);
+	return ret;
+}
+
 int
 mlx5_hrxq_match_cb(void *tool_ctx __rte_unused, struct mlx5_list_entry *entry,
 		   void *cb_ctx)
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index 54173bfacb..c3adf5082e 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -14,6 +14,7 @@
 #include <mlx5_malloc.h>
 
 #include "mlx5.h"
+#include "mlx5_flow.h"
 #include "mlx5_mr.h"
 #include "mlx5_rx.h"
 #include "mlx5_tx.h"
@@ -1113,6 +1114,14 @@ mlx5_dev_start(struct rte_eth_dev *dev)
 	mlx5_rxq_timestamp_set(dev);
 	/* Set a mask and offset of scheduling on timestamp into Tx queues. */
 	mlx5_txq_dynf_timestamp_set(dev);
+	/* Attach indirection table objects detached on port stop. */
+	ret = mlx5_action_handle_attach(dev);
+	if (ret) {
+		DRV_LOG(ERR,
+			"port %u failed to attach indirect actions: %s",
+			dev->data->port_id, rte_strerror(rte_errno));
+		goto error;
+	}
 	/*
 	 * In non-cached mode, it only needs to start the default mreg copy
 	 * action and no flow created by application exists anymore.
@@ -1185,6 +1194,7 @@ mlx5_dev_stop(struct rte_eth_dev *dev)
 	/* All RX queue flags will be cleared in the flush interface. */
 	mlx5_flow_list_flush(dev, MLX5_FLOW_TYPE_GEN, true);
 	mlx5_flow_meter_rxq_flush(dev);
+	mlx5_action_handle_detach(dev);
 	mlx5_rx_intr_vec_disable(dev);
 	priv->sh->port[priv->dev_port - 1].ih_port_id = RTE_MAX_ETHPORTS;
 	priv->sh->port[priv->dev_port - 1].devx_ih_port_id = RTE_MAX_ETHPORTS;
-- 
2.25.1
* Re: [dpdk-dev] [PATCH 1/5] ethdev: add capability to keep flow rules on restart
  2021-10-05  0:52 ` [dpdk-dev] [PATCH 1/5] ethdev: add capability to keep flow rules on restart dkozlyuk
@ 2021-10-06  6:15   ` Ori Kam
  2021-10-06  6:55     ` Somnath Kotur
  2021-10-06 17:15   ` Ajit Khaparde
  1 sibling, 1 reply; 96+ messages in thread
From: Ori Kam @ 2021-10-06  6:15 UTC (permalink / raw)
  To: Dmitry Kozlyuk, dev
  Cc: NBU-Contact-Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko
Hi Dmitry,
> -----Original Message-----
> From: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
> Sent: Tuesday, October 5, 2021 3:52 AM
> Subject: [PATCH 1/5] ethdev: add capability to keep flow rules on restart
> 
> From: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
> 
> Currently, it is not specified what happens to the flow rules when the device is
> stopped, possibly reconfigured, then started.
> If flow rules were kept, it could be convenient for application developers,
> because they wouldn't need to save and restore them.
> However, due to the number of flows and possible creation rate it is
> impractical to save all flow rules in DPDK layer. This means that flow rules
> persistence really depends on whether PMD and HW can implement it
> efficiently. It is proposed for PMDs to advertise this capability if supported
> using a new flag.
> 
> If the device is being reconfigured in a way that is incompatible with existing
> flow rules, PMD is required to report an error.
> This is mandatory, because flow API does not supply users with capabilities, so
> this is the only way for a user to learn that configuration is invalid. For
> example, if queue count changes and the action of a flow rule specifies queues
> that are going away, the user must update or remove the flow rule before
> removing the queues.
> 
> Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
> ---
Acked-by: Ori Kam <orika@nvidia.com>
Thanks,
Ori
* Re: [dpdk-dev] [PATCH 2/5] ethdev: add capability to keep shared objects on restart
  2021-10-05  0:52 ` [dpdk-dev] [PATCH 2/5] ethdev: add capability to keep shared objects " dkozlyuk
@ 2021-10-06  6:16   ` Ori Kam
  2021-10-13  8:32   ` Dmitry Kozlyuk
  1 sibling, 0 replies; 96+ messages in thread
From: Ori Kam @ 2021-10-06  6:16 UTC (permalink / raw)
  To: Dmitry Kozlyuk, dev
  Cc: NBU-Contact-Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko
Hi Dmitry,
> -----Original Message-----
> From: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
> Sent: Tuesday, October 5, 2021 3:52 AM
> To: dev@dpdk.org
> Cc: Dmitry Kozlyuk <dkozlyuk@nvidia.com>; Ori Kam <orika@nvidia.com>;
> NBU-Contact-Thomas Monjalon <thomas@monjalon.net>; Ferruh Yigit
> <ferruh.yigit@intel.com>; Andrew Rybchenko
> <andrew.rybchenko@oktetlabs.ru>
> Subject: [PATCH 2/5] ethdev: add capability to keep shared objects on restart
> 
> From: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
> 
> rte_flow_action_handle_create() did not mention what happens with an
> indirect action when a device is stopped, possibly reconfigured, and started
> again. It is natural for some indirect actions to be persistent, like counters and
> meters; keeping others just saves application time and complexity. However,
> not all PMDs can support it.
> It is proposed to add a device capability to indicate if indirect actions are kept
> across the above sequence or implicitly destroyed.
> 
> In the future, indirect actions may not be the only type of objects shared
> between flow rules. The capability bit intends to cover all possible types of such
> objects, hence its name.
> 
> It may happen that in the future a PMD acquires support for a type of shared
> objects that it cannot keep across a restart. It is undesirable to stop advertising
> the capability so that applications that don't use objects of the problematic type
> can still take advantage of it.
> This is why PMDs are allowed to keep only a subset of shared objects provided
> that the vendor mandatorily documents it.
> 
> If the device is being reconfigured in a way that is incompatible with existing
> shared objects, PMD is required to report an error.
> This is mandatory, because flow API does not supply users with capabilities, so
> this is the only way for a user to learn that configuration is invalid. For
> example, if queue count changes and RSS indirect action specifies queues that
> are going away, the user must update the action before removing the queues
> or remove the action and all flow rules that were using it.
> 
> Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
> ---
Acked-by: Ori Kam <orika@nvidia.com>
Thanks,
Ori
* Re: [dpdk-dev] [PATCH 1/5] ethdev: add capability to keep flow rules on restart
  2021-10-06  6:15   ` Ori Kam
@ 2021-10-06  6:55     ` Somnath Kotur
  0 siblings, 0 replies; 96+ messages in thread
From: Somnath Kotur @ 2021-10-06  6:55 UTC (permalink / raw)
  To: Ori Kam
  Cc: Dmitry Kozlyuk, dev, NBU-Contact-Thomas Monjalon, Ferruh Yigit,
	Andrew Rybchenko
On Wed, Oct 6, 2021 at 11:45 AM Ori Kam <orika@nvidia.com> wrote:
>
> Hi Dmitry,
>
> > -----Original Message-----
> > From: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
> > Sent: Tuesday, October 5, 2021 3:52 AM
> > Subject: [PATCH 1/5] ethdev: add capability to keep flow rules on restart
> >
> > From: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
> >
> > Currently, it is not specified what happens to the flow rules when the device is
> > stopped, possibly reconfigured, then started.
> > If flow rules were kept, it could be convenient for application developers,
> > because they wouldn't need to save and restore them.
> > However, due to the number of flows and possible creation rate it is
> > impractical to save all flow rules in DPDK layer. This means that flow rules
> > persistence really depends on whether PMD and HW can implement it
> > efficiently. It is proposed for PMDs to advertise this capability if supported
> > using a new flag.
> >
> > If the device is being reconfigured in a way that is incompatible with existing
> > flow rules, PMD is required to report an error.
> > This is mandatory, because flow API does not supply users with capabilities, so
> > this is the only way for a user to learn that configuration is invalid. For
> > example, if queue count changes and the action of a flow rule specifies queues
> > that are going away, the user must update or remove the flow rule before
> > removing the queues.
> >
> > Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
> > ---
>
> Acked-by: Ori Kam <orika@nvidia.com>
> Thanks,
> Ori
Acked-by: Somnath Kotur <somnath.kotur@broadcom.com>
* Re: [dpdk-dev] [PATCH 1/5] ethdev: add capability to keep flow rules on restart
  2021-10-05  0:52 ` [dpdk-dev] [PATCH 1/5] ethdev: add capability to keep flow rules on restart dkozlyuk
  2021-10-06  6:15   ` Ori Kam
@ 2021-10-06 17:15   ` Ajit Khaparde
  1 sibling, 0 replies; 96+ messages in thread
From: Ajit Khaparde @ 2021-10-06 17:15 UTC (permalink / raw)
  To: dkozlyuk
  Cc: dpdk-dev, Ori Kam, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko
On Mon, Oct 4, 2021 at 5:52 PM <dkozlyuk@oss.nvidia.com> wrote:
>
> From: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
>
> Currently, it is not specified what happens to the flow rules when
> the device is stopped, possibly reconfigured, then started.
> If flow rules were kept, it could be convenient for application
> developers, because they wouldn't need to save and restore them.
> However, due to the number of flows and possible creation rate it is
> impractical to save all flow rules in DPDK layer. This means that flow
> rules persistence really depends on whether PMD and HW can implement it
> efficiently. It is proposed for PMDs to advertise this capability
> if supported using a new flag.
>
> If the device is being reconfigured in a way that is incompatible with
> existing flow rules, PMD is required to report an error.
> This is mandatory, because flow API does not supply users with
> capabilities, so this is the only way for a user to learn that
> configuration is invalid. For example, if queue count changes and the
> action of a flow rule specifies queues that are going away, the user
> must update or remove the flow rule before removing the queues.
>
> Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
* Re: [dpdk-dev] [PATCH 2/5] ethdev: add capability to keep shared objects on restart
  2021-10-05  0:52 ` [dpdk-dev] [PATCH 2/5] ethdev: add capability to keep shared objects " dkozlyuk
  2021-10-06  6:16   ` Ori Kam
@ 2021-10-13  8:32   ` Dmitry Kozlyuk
  2021-10-14 13:46     ` Ferruh Yigit
  2021-10-14 14:14     ` Dmitry Kozlyuk
  1 sibling, 2 replies; 96+ messages in thread
From: Dmitry Kozlyuk @ 2021-10-13  8:32 UTC (permalink / raw)
  To: dev, Andrew Rybchenko, Ori Kam, Raslan Darawsheh
  Cc: NBU-Contact-Thomas Monjalon, Ferruh Yigit
This thread continues discussions on previous versions
to keep everything in the thread with final patches:
[1]: http://inbox.dpdk.org/dev/d5673b58-5aa6-ca35-5b60-d938e56cfee1@oktetlabs.ru/
[2]: http://inbox.dpdk.org/dev/DM8PR12MB5400997CCEC9169AC5AE0C89D6EA9@DM8PR12MB5400.namprd12.prod.outlook.com/
Please see below.
> -----Original Message-----
> From: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
> Sent: October 5, 2021 3:52
> To: dev@dpdk.org
> Cc: Dmitry Kozlyuk <dkozlyuk@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-
> Contact-Thomas Monjalon <thomas@monjalon.net>; Ferruh Yigit
> <ferruh.yigit@intel.com>; Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Subject: [PATCH 2/5] ethdev: add capability to keep shared objects on
> restart
> 
> From: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
> 
> rte_flow_action_handle_create() did not mention what happens with an
> indirect action when a device is stopped, possibly reconfigured, and
> started again. It is natural for some indirect actions to be persistent,
> like counters and meters; keeping others just saves application time and
> complexity. However, not all PMDs can support it.
> It is proposed to add a device capability to indicate if indirect actions
> are kept across the above sequence or implicitly destroyed.
> 
> In the future, indirect actions may not be the only type of objects shared
> between flow rules. The capability bit intends to cover all possible types
> of such objects, hence its name.
> 
> It may happen that in the future a PMD acquires support for a type of
> shared objects that it cannot keep across a restart. It is undesirable to
> stop advertising the capability so that applications that don't use
> objects of the problematic type can still take advantage of it.
> This is why PMDs are allowed to keep only a subset of shared objects
> provided that the vendor mandatorily documents it.
> 
> If the device is being reconfigured in a way that is incompatible with
> existing shared objects, PMD is required to report an error.
> This is mandatory, because flow API does not supply users with
> capabilities, so this is the only way for a user to learn that
> configuration is invalid. For example, if queue count changes and RSS
> indirect action specifies queues that are going away, the user must update
> the action before removing the queues or remove the action and all flow
> rules that were using it.
> 
> Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
> ---
> [...]
The current pain point is that capability bits may be insufficient
and a programmatic way is desired to check which types of objects
can be kept across restart, instead of documenting the limitations.
I support one of Ori's previous suggestions and want to clarify it [1]:
Ori: "Another way is to assume that if the action was created before port start it will be kept after port stop."
Andrew: "It does not sound like a solution. May be I simply don't know
target usecase."
What Ori suggests (offline discussion summary): Suppose an application
wants to check whether a shared object (indirect action) or a flow rule
of a particular kind can be kept across a restart. It calls
rte_flow_action_handle_create() or rte_flow_create() before
rte_eth_dev_start(). If the call succeeds, it means that 1) objects of
this kind can be kept across restart, and 2) the object created is
a normal one that will work after the port is started. This is logical,
because if the PMD can keep some kind of objects when the port is stopped,
it is likely to be able to create them when the port is not started.
It is subject to discussion whether "object kind" means only "type"
or a "type + transfer bit" combination; for mlx5 PMD it doesn't matter.
One minor drawback is that applications can only do the test when
the port is stopped, but it seems likely that the test really needs
to be done at startup anyway.
If this is acceptable:
1. Capability bits are not needed anymore.
2. ethdev patches can be accepted in RC1, since present behavior is undefined anyway.
3. PMD patches will need an update, which can be done by RC2.
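To make the proposed check concrete, here is a minimal sketch in plain C.
It models the PMD instead of calling DPDK: struct port_model,
model_action_create() and probe_kept_across_restart() are hypothetical
names used only for illustration. In a real application the creation step
would be rte_flow_action_handle_create() (or rte_flow_create()) on a
configured port before rte_eth_dev_start().

```c
#include <stdbool.h>

/* Hypothetical action types, standing in for real rte_flow action types. */
enum action_type { ACTION_RSS, ACTION_COUNT, ACTION_AGE, ACTION_TYPE_MAX };

/* Hypothetical model of a port and of what its PMD can do while stopped. */
struct port_model {
	bool started;
	bool creatable_when_stopped[ACTION_TYPE_MAX];
};

/* Models creation of an indirect action; returns true on success. */
static bool
model_action_create(const struct port_model *port, enum action_type type)
{
	if (port->started)
		return true; /* assumption: creation works on a started port */
	return port->creatable_when_stopped[type];
}

/*
 * The probe described above: try to create the object before the port
 * is started. Returns 1 if objects of this type are kept across restart,
 * 0 if they are not, and -1 if the probe cannot be run because the port
 * is already started.
 */
static int
probe_kept_across_restart(const struct port_model *port,
			  enum action_type type)
{
	if (port->started)
		return -1;
	return model_action_create(port, type) ? 1 : 0;
}
```

Note that on success the created object is not a throwaway: under the
proposal it is a normal object that the application keeps using.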
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [dpdk-dev] [PATCH 2/5] ethdev: add capability to keep shared objects on restart
  2021-10-13  8:32   ` Dmitry Kozlyuk
@ 2021-10-14 13:46     ` Ferruh Yigit
  2021-10-14 21:45       ` Dmitry Kozlyuk
  2021-10-14 14:14     ` Dmitry Kozlyuk
  1 sibling, 1 reply; 96+ messages in thread
From: Ferruh Yigit @ 2021-10-14 13:46 UTC (permalink / raw)
  To: Dmitry Kozlyuk, dev, Andrew Rybchenko, Ori Kam, Raslan Darawsheh
  Cc: NBU-Contact-Thomas Monjalon, Qi Zhang, jerinj, Maxime Coquelin
On 10/13/2021 9:32 AM, Dmitry Kozlyuk wrote:
> This thread continues discussions on previous versions
> to keep everything in the thread with final patches:
> 
> [1]: http://inbox.dpdk.org/dev/d5673b58-5aa6-ca35-5b60-d938e56cfee1@oktetlabs.ru/
> [2]: http://inbox.dpdk.org/dev/DM8PR12MB5400997CCEC9169AC5AE0C89D6EA9@DM8PR12MB5400.namprd12.prod.outlook.com/
> 
> Please see below.
> 
>> -----Original Message-----
>> From: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
>> Sent: October 5, 2021 3:52
>> To: dev@dpdk.org
>> Cc: Dmitry Kozlyuk <dkozlyuk@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-
>> Contact-Thomas Monjalon <thomas@monjalon.net>; Ferruh Yigit
>> <ferruh.yigit@intel.com>; Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>> Subject: [PATCH 2/5] ethdev: add capability to keep shared objects on
>> restart
>>
>> From: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
>>
>> rte_flow_action_handle_create() did not mention what happens with an
>> indirect action when a device is stopped, possibly reconfigured, and
>> started again. It is natural for some indirect actions to be persistent,
>> like counters and meters; keeping others just saves application time and
>> complexity. However, not all PMDs can support it.
>> It is proposed to add a device capability to indicate if indirect actions
>> are kept across the above sequence or implicitly destroyed.
>>
>> In the future, indirect actions may not be the only type of objects shared
>> between flow rules. The capability bit intends to cover all possible types
>> of such objects, hence its name.
>>
>> It may happen that in the future a PMD acquires support for a type of
>> shared objects that it cannot keep across a restart. It is undesirable to
>> stop advertising the capability so that applications that don't use
>> objects of the problematic type can still take advantage of it.
>> This is why PMDs are allowed to keep only a subset of shared objects
>> provided that the vendor mandatorily documents it.
>>
>> If the device is being reconfigured in a way that is incompatible with an
>> existing shared objects, PMD is required to report an error.
>> This is mandatory, because flow API does not supply users with
>> capabilities, so this is the only way for a user to learn that
>> configuration is invalid. For example, if queue count changes and RSS
>> indirect action specifies queues that are going away, the user must update
>> the action before removing the queues or remove the action and all flow
>> rules that were using it.
>>
>> Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
>> ---
>> [...]
> 
> Current pain point is that capability bits may be insufficient
> and a programmatic way is desired to check which types of objects
> can be kept across restart, instead of documenting the limitations.
> 
> I support one of previous Ori's suggestions and want to clarify it [1]:
> 
> Ori: "Another way is to assume that if the action was created before port start it will be kept after port stop."
> Andrew: "It does not sound like a solution. May be I simply don't know
> target usecase."
> 
> What Ori suggests (offline discussion summary): Suppose an application wants to check whether a shared object (indirect action) or a flow rule of a particular kind. It calls rte_flow_action_handle_create() or rte_flow_create() before rte_eth_dev_start(). If it succeeds, 1) it means objects of this type can be kept across restart, 2) it's a normal object created that will work after the port is started. This is logical, because if the PMD can keep some kind of objects when the port is stopped, it is likely to be able to create them when the port is not started. It is subject to discussion if "object kind" means only "type" or "type + transfer bit" combination; for mlx5 PMD it doesn't matter. One minor drawback is that applications can only do the test when the port is stopped, but it seems likely that the test really needs to be done at startup anyway.
> 
> If this is acceptable:
> 1. Capability bits are not needed anymore.
> 2. ethdev patches can be accepted in RC1, present behavior is undefined anyway.
> 3. PMD patches will need update that can be done by RC2.
> 
Hi Dmitry,
Are you planning to update the drivers yourself for -rc2?
Or do you mean that PMD maintainers should update their drivers
themselves? If so, do they know about it?
If the ethdev layer is updated in a way that impacts the drivers, then
either:
- all drivers are updated with the change
or
- PMDs are given time to implement it on their own schedule; meanwhile
they can report their support status by a flag
We have had multiple examples of the second case in the past, but it is
harder in this case.
For this case what about having three states:
- FLOW_RULE_KEEP
- FLOW_RULE_DESTROY
- FLOW_RULE_UNKNOWN
And set 'FLOW_RULE_UNKNOWN' for all drivers, to simulate the current status,
until each driver is updated.
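As a rough sketch of the three-state idea in C: only the three state names
come from the list above; the enum type, struct driver_model and
rules_known_to_survive() are hypothetical. Making UNKNOWN the zero value
means a driver that has not been updated reports it without any code change.

```c
#include <stdbool.h>

/* UNKNOWN is deliberately 0, so zero-initialized driver data reports it. */
enum flow_rule_restart_behavior {
	FLOW_RULE_UNKNOWN = 0,
	FLOW_RULE_KEEP,
	FLOW_RULE_DESTROY,
};

/* Hypothetical per-driver record. */
struct driver_model {
	const char *name;
	enum flow_rule_restart_behavior behavior;
};

/*
 * True only when the driver explicitly promises to keep flow rules.
 * How an application should treat UNKNOWN is an open question; this
 * helper simply does not count it as a promise.
 */
static bool
rules_known_to_survive(enum flow_rule_restart_behavior behavior)
{
	return behavior == FLOW_RULE_KEEP;
}
```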
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [dpdk-dev] [PATCH 2/5] ethdev: add capability to keep shared objects on restart
  2021-10-13  8:32   ` Dmitry Kozlyuk
  2021-10-14 13:46     ` Ferruh Yigit
@ 2021-10-14 14:14     ` Dmitry Kozlyuk
  2021-10-15  8:26       ` Andrew Rybchenko
  1 sibling, 1 reply; 96+ messages in thread
From: Dmitry Kozlyuk @ 2021-10-14 14:14 UTC (permalink / raw)
  To: dev, Andrew Rybchenko, Ori Kam, Raslan Darawsheh
  Cc: NBU-Contact-Thomas Monjalon, Ferruh Yigit
> -----Original Message-----
> From: Dmitry Kozlyuk
> Sent: October 13, 2021 11:33
> To: dev@dpdk.org; Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>; Ori
> Kam <orika@nvidia.com>; Raslan Darawsheh <rasland@nvidia.com>
> Cc: NBU-Contact-Thomas Monjalon <thomas@monjalon.net>; Ferruh Yigit
> <ferruh.yigit@intel.com>
> Subject: RE: [PATCH 2/5] ethdev: add capability to keep shared objects on
> restart
> 
> This thread continues discussions on previous versions to keep everything
> in the thread with final patches:
> 
> [1]: http://inbox.dpdk.org/dev/d5673b58-5aa6-ca35-5b60-
> d938e56cfee1@oktetlabs.ru/
> [2]:
> http://inbox.dpdk.org/dev/DM8PR12MB5400997CCEC9169AC5AE0C89D6EA9@DM8PR12MB
> 5400.namprd12.prod.outlook.com/
> 
> Please see below.
> 
> > -----Original Message-----
> > From: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
> > Sent: October 5, 2021 3:52
> > To: dev@dpdk.org
> > Cc: Dmitry Kozlyuk <dkozlyuk@nvidia.com>; Ori Kam <orika@nvidia.com>;
> > NBU- Contact-Thomas Monjalon <thomas@monjalon.net>; Ferruh Yigit
> > <ferruh.yigit@intel.com>; Andrew Rybchenko
> > <andrew.rybchenko@oktetlabs.ru>
> > Subject: [PATCH 2/5] ethdev: add capability to keep shared objects on
> > restart
> >
> > From: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
> >
> > rte_flow_action_handle_create() did not mention what happens with an
> > indirect action when a device is stopped, possibly reconfigured, and
> > started again. It is natural for some indirect actions to be
> > persistent, like counters and meters; keeping others just saves
> > application time and complexity. However, not all PMDs can support it.
> > It is proposed to add a device capability to indicate if indirect
> > actions are kept across the above sequence or implicitly destroyed.
> >
> > In the future, indirect actions may not be the only type of objects
> > shared between flow rules. The capability bit intends to cover all
> > possible types of such objects, hence its name.
> >
> > It may happen that in the future a PMD acquires support for a type of
> > shared objects that it cannot keep across a restart. It is undesirable
> > to stop advertising the capability so that applications that don't use
> > objects of the problematic type can still take advantage of it.
> > This is why PMDs are allowed to keep only a subset of shared objects
> > provided that the vendor mandatorily documents it.
> >
> > If the device is being reconfigured in a way that is incompatible with
> > an existing shared objects, PMD is required to report an error.
> > This is mandatory, because flow API does not supply users with
> > capabilities, so this is the only way for a user to learn that
> > configuration is invalid. For example, if queue count changes and RSS
> > indirect action specifies queues that are going away, the user must
> > update the action before removing the queues or remove the action and
> > all flow rules that were using it.
> >
> > Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
> > ---
> > [...]
> 
> Current pain point is that capability bits may be insufficient and a
> programmatic way is desired to check which types of objects can be kept
> across restart, instead of documenting the limitations.
> 
> I support one of previous Ori's suggestions and want to clarify it [1]:
> 
> Ori: "Another way is to assume that if the action was created before port
> start it will be kept after port stop."
> Andrew: "It does not sound like a solution. May be I simply don't know
> target usecase."
> 
> What Ori suggests (offline discussion summary): Suppose an application
> wants to check whether a shared object (indirect action) or a flow rule of
> a particular kind. It calls rte_flow_action_handle_create() or
> rte_flow_create() before rte_eth_dev_start(). If it succeeds, 1) it means
> objects of this type can be kept across restart, 2) it's a normal object
> created that will work after the port is started. This is logical, because
> if the PMD can keep some kind of objects when the port is stopped, it is
> likely to be able to create them when the port is not started. It is
> subject to discussion if "object kind" means only "type" or "type +
> transfer bit" combination; for mlx5 PMD it doesn't matter. One minor
> drawback is that applications can only do the test when the port is
> stopped, but it seems likely that the test really needs to be done at
> startup anyway.
> 
> If this is acceptable:
> 1. Capability bits are not needed anymore.
> 2. ethdev patches can be accepted in RC1, present behavior is undefined
> anyway.
> 3. PMD patches will need update that can be done by RC2.
Andrew, what do you think?
If you agree, do we need to include transfer bit into "kind"?
I'd like to conclude before RC1 and can update the docs quickly.
I've seen the proposal to advertise the capability
to create flow rules before device start as a flag.
I don't think it conflicts with Ori's suggestion,
because the flag doesn't imply that _any_ rule can be created,
nor does it say anything about indirect actions.
On the other hand, if a PMD can create a flow object (rule, etc.)
when the device is not started, it is logical to assume that
after the device is stopped it can move existing flow objects
to the same state as when the device was not started, then restore
them when it is started again.
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [dpdk-dev] [PATCH 2/5] ethdev: add capability to keep shared objects on restart
  2021-10-14 13:46     ` Ferruh Yigit
@ 2021-10-14 21:45       ` Dmitry Kozlyuk
  2021-10-14 21:48         ` Dmitry Kozlyuk
  2021-10-15 11:46         ` Ferruh Yigit
  0 siblings, 2 replies; 96+ messages in thread
From: Dmitry Kozlyuk @ 2021-10-14 21:45 UTC (permalink / raw)
  To: Ferruh Yigit, dev, Andrew Rybchenko, Ori Kam, Raslan Darawsheh
  Cc: NBU-Contact-Thomas Monjalon, Qi Zhang, jerinj, Maxime Coquelin
> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@intel.com>
> Sent: October 14, 2021 16:47
> To: Dmitry Kozlyuk <dkozlyuk@nvidia.com>; dev@dpdk.org; Andrew Rybchenko
> <andrew.rybchenko@oktetlabs.ru>; Ori Kam <orika@nvidia.com>; Raslan
> Darawsheh <rasland@nvidia.com>
> Cc: NBU-Contact-Thomas Monjalon <thomas@monjalon.net>; Qi Zhang
> <qi.z.zhang@intel.com>; jerinj@marvell.com; Maxime Coquelin
> <maxime.coquelin@redhat.com>
> Subject: Re: [PATCH 2/5] ethdev: add capability to keep shared objects on
> restart
> 
> On 10/13/2021 9:32 AM, Dmitry Kozlyuk wrote:
> > This thread continues discussions on previous versions to keep
> > everything in the thread with final patches:
> >
> > [1]:
> > http://inbox.dpdk.org/dev/d5673b58-5aa6-ca35-5b60-d938e56cfee1@oktetla
> > bs.ru/
> > [2]:
> > http://inbox.dpdk.org/dev/DM8PR12MB5400997CCEC9169AC5AE0C89D6EA9@DM8PR
> > 12MB5400.namprd12.prod.outlook.com/
> >
> > Please see below.
> >
> >> -----Original Message-----
> >> From: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
> >> Sent: October 5, 2021 3:52
> >> To: dev@dpdk.org
> >> Cc: Dmitry Kozlyuk <dkozlyuk@nvidia.com>; Ori Kam <orika@nvidia.com>;
> >> NBU- Contact-Thomas Monjalon <thomas@monjalon.net>; Ferruh Yigit
> >> <ferruh.yigit@intel.com>; Andrew Rybchenko
> >> <andrew.rybchenko@oktetlabs.ru>
> >> Subject: [PATCH 2/5] ethdev: add capability to keep shared objects on
> >> restart
> >>
> >> From: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
> >>
> >> rte_flow_action_handle_create() did not mention what happens with an
> >> indirect action when a device is stopped, possibly reconfigured, and
> >> started again. It is natural for some indirect actions to be
> >> persistent, like counters and meters; keeping others just saves
> >> application time and complexity. However, not all PMDs can support it.
> >> It is proposed to add a device capability to indicate if indirect
> >> actions are kept across the above sequence or implicitly destroyed.
> >>
> >> In the future, indirect actions may not be the only type of objects
> >> shared between flow rules. The capability bit intends to cover all
> >> possible types of such objects, hence its name.
> >>
> >> It may happen that in the future a PMD acquires support for a type of
> >> shared objects that it cannot keep across a restart. It is
> >> undesirable to stop advertising the capability so that applications
> >> that don't use objects of the problematic type can still take advantage
> of it.
> >> This is why PMDs are allowed to keep only a subset of shared objects
> >> provided that the vendor mandatorily documents it.
> >>
> >> If the device is being reconfigured in a way that is incompatible
> >> with an existing shared objects, PMD is required to report an error.
> >> This is mandatory, because flow API does not supply users with
> >> capabilities, so this is the only way for a user to learn that
> >> configuration is invalid. For example, if queue count changes and RSS
> >> indirect action specifies queues that are going away, the user must
> >> update the action before removing the queues or remove the action and
> >> all flow rules that were using it.
> >>
> >> Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
> >> ---
> >> [...]
> >
> > Current pain point is that capability bits may be insufficient and a
> > programmatic way is desired to check which types of objects can be
> > kept across restart, instead of documenting the limitations.
> >
> > I support one of previous Ori's suggestions and want to clarify it [1]:
> >
> > Ori: "Another way is to assume that if the action was created before
> port start it will be kept after port stop."
> > Andrew: "It does not sound like a solution. May be I simply don't know
> > target usecase."
> >
> > What Ori suggests (offline discussion summary): Suppose an application
> wants to check whether a shared object (indirect action) or a flow rule of
> a particular kind. It calls rte_flow_action_handle_create() or
> rte_flow_create() before rte_eth_dev_start(). If it succeeds, 1) it means
> objects of this type can be kept across restart, 2) it's a normal object
> created that will work after the port is started. This is logical, because
> if the PMD can keep some kind of objects when the port is stopped, it is
> likely to be able to create them when the port is not started. It is
> subject to discussion if "object kind" means only "type" or "type +
> transfer bit" combination; for mlx5 PMD it doesn't matter. One minor
> drawback is that applications can only do the test when the port is
> stopped, but it seems likely that the test really needs to be done at
> startup anyway.
> >
> > If this is acceptable:
> > 1. Capability bits are not needed anymore.
> > 2. ethdev patches can be accepted in RC1, present behavior is undefined
> anyway.
> > 3. PMD patches will need update that can be done by RC2.
> >
> 
> Hi Dmitry,
> 
> Are you planning to update drivers yourself on -rc2?
> Or do you mean PMD maintainers should update themselves, if so do they
> know about it?
> 
> If the ethdev layer is updated in a way to impact the drivers, it should
> be either:
> - all drivers updated with a change
> or
> - give PMDs time to implement it on their own time, meanwhile they can
> report their support status by a flag
> 
> We had multiple sample of second case in the past but it is harder for
> this case.
> 
> For this case what about having three states:
> - FLOW_RULE_KEEP
> - FLOW_RULE_DESTROY
> - FLOW_RULE_UNKNOWN
> 
> And set 'FLOW_RULE_UNKNOWN' for all drivers, to simulate current status,
> until driver is updated.
Hi Ferruh,
Indirect actions are only implemented by the mlx5 PMD,
and the patches will be in RC2.
If we don't use the flag, as per the latest suggestion,
nothing needs to be done for other PMDs.
The flag can just as well be kept with the following semantics:
0 => indirect actions are flushed on device stop
1 => at least some indirect actions are kept,
     application should check types it's interested in
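A minimal sketch of how an application might combine this flag with the
per-type check, in plain C. The function name and both parameters are
hypothetical; keep_flag stands for the device-level flag and
probe_succeeded for the result of trying to create an object of the type
in question before the port is started.

```c
#include <stdbool.h>

/*
 * Returns true if the application may rely on an indirect action of the
 * probed type surviving a port restart.
 */
static bool
indirect_action_survives(bool keep_flag, bool probe_succeeded)
{
	if (!keep_flag)
		return false; /* 0: all indirect actions are flushed on stop */
	/* 1: only some are kept, so the per-type result decides. */
	return probe_succeeded;
}
```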
Introducing an UNKNOWN state seems wrong to me.
What should an application do when it is reported?
Right now there is simply no way to learn how the PMD behaves,
but if the PMD provides a response, it can't be "I don't know what I do".
Here's what I understood from the code, assuming there are no bugs
like allowing the port to be stopped while keeping dangling flow handles:
bnxt        flush
bonding     depends
cnxk        can't figure out
cxgbe       keep
dpaa2       keep
e1000       keep
enic        flush
failsafe    depends
hinic       flush
hns3        keep
i40e        keep
iavf        keep
ice         keep
igc         keep
ipn3ke      keep
ixgbe       keep
mlx4        keep
mlx5        flush
mvpp2       keep
octeontx2   can't figure out
qede        keep
sfc         flush
softnic     flush
tap         keep
txgbe       keep
Currently one flag would be sufficient to describe PMD behavior:
they either keep or flush the flow rules.
If there are indeed no exceptions, which maintainers should confirm,
I can add flag reporting myself.
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [dpdk-dev] [PATCH 2/5] ethdev: add capability to keep shared objects on restart
  2021-10-14 21:45       ` Dmitry Kozlyuk
@ 2021-10-14 21:48         ` Dmitry Kozlyuk
  2021-10-15 11:46         ` Ferruh Yigit
  1 sibling, 0 replies; 96+ messages in thread
From: Dmitry Kozlyuk @ 2021-10-14 21:48 UTC (permalink / raw)
  To: Dmitry Kozlyuk, Ferruh Yigit, dev, Andrew Rybchenko, Ori Kam,
	Raslan Darawsheh
  Cc: NBU-Contact-Thomas Monjalon, Qi Zhang, jerinj, Maxime Coquelin
> Introducing UNKNOWN state seems wrong to me.
> What should an application do when it is reported?
> Now there's just no way to learn how the PMD behaves, but if it provides a
> response, it can't be "I don't know what I do".
> 
> Here's what I understood from the code, assuming there are no bugs Like
> allowing to stop the port and keep dangling flow handles:
> 
> bnxt        flush
> [...]
N.B.: This part is about flow rules. For indirect actions it is clear.
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [dpdk-dev] [PATCH 2/5] ethdev: add capability to keep shared objects on restart
  2021-10-14 14:14     ` Dmitry Kozlyuk
@ 2021-10-15  8:26       ` Andrew Rybchenko
  2021-10-15  9:04         ` Dmitry Kozlyuk
  0 siblings, 1 reply; 96+ messages in thread
From: Andrew Rybchenko @ 2021-10-15  8:26 UTC (permalink / raw)
  To: Dmitry Kozlyuk, dev, Ori Kam, Raslan Darawsheh
  Cc: NBU-Contact-Thomas Monjalon, Ferruh Yigit
On 10/14/21 5:14 PM, Dmitry Kozlyuk wrote:
> 
> 
>> -----Original Message-----
>> From: Dmitry Kozlyuk
>> Sent: October 13, 2021 11:33
>> To: dev@dpdk.org; Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>; Ori
>> Kam <orika@nvidia.com>; Raslan Darawsheh <rasland@nvidia.com>
>> Cc: NBU-Contact-Thomas Monjalon <thomas@monjalon.net>; Ferruh Yigit
>> <ferruh.yigit@intel.com>
>> Subject: RE: [PATCH 2/5] ethdev: add capability to keep shared objects on
>> restart
>>
>> This thread continues discussions on previous versions to keep everything
>> in the thread with final patches:
>>
>> [1]: http://inbox.dpdk.org/dev/d5673b58-5aa6-ca35-5b60-
>> d938e56cfee1@oktetlabs.ru/
>> [2]:
>> http://inbox.dpdk.org/dev/DM8PR12MB5400997CCEC9169AC5AE0C89D6EA9@DM8PR12MB
>> 5400.namprd12.prod.outlook.com/
>>
>> Please see below.
>>
>>> -----Original Message-----
>>> From: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
>>> Sent: October 5, 2021 3:52
>>> To: dev@dpdk.org
>>> Cc: Dmitry Kozlyuk <dkozlyuk@nvidia.com>; Ori Kam <orika@nvidia.com>;
>>> NBU- Contact-Thomas Monjalon <thomas@monjalon.net>; Ferruh Yigit
>>> <ferruh.yigit@intel.com>; Andrew Rybchenko
>>> <andrew.rybchenko@oktetlabs.ru>
>>> Subject: [PATCH 2/5] ethdev: add capability to keep shared objects on
>>> restart
>>>
>>> From: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
>>>
>>> rte_flow_action_handle_create() did not mention what happens with an
>>> indirect action when a device is stopped, possibly reconfigured, and
>>> started again. It is natural for some indirect actions to be
>>> persistent, like counters and meters; keeping others just saves
>>> application time and complexity. However, not all PMDs can support it.
>>> It is proposed to add a device capability to indicate if indirect
>>> actions are kept across the above sequence or implicitly destroyed.
>>>
>>> In the future, indirect actions may not be the only type of objects
>>> shared between flow rules. The capability bit intends to cover all
>>> possible types of such objects, hence its name.
>>>
>>> It may happen that in the future a PMD acquires support for a type of
>>> shared objects that it cannot keep across a restart. It is undesirable
>>> to stop advertising the capability so that applications that don't use
>>> objects of the problematic type can still take advantage of it.
>>> This is why PMDs are allowed to keep only a subset of shared objects
>>> provided that the vendor mandatorily documents it.
>>>
>>> If the device is being reconfigured in a way that is incompatible with
>>> an existing shared objects, PMD is required to report an error.
>>> This is mandatory, because flow API does not supply users with
>>> capabilities, so this is the only way for a user to learn that
>>> configuration is invalid. For example, if queue count changes and RSS
>>> indirect action specifies queues that are going away, the user must
>>> update the action before removing the queues or remove the action and
>>> all flow rules that were using it.
>>>
>>> Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
>>> ---
>>> [...]
>>
>> Current pain point is that capability bits may be insufficient and a
>> programmatic way is desired to check which types of objects can be kept
>> across restart, instead of documenting the limitations.
>>
>> I support one of previous Ori's suggestions and want to clarify it [1]:
>>
>> Ori: "Another way is to assume that if the action was created before port
>> start it will be kept after port stop."
>> Andrew: "It does not sound like a solution. May be I simply don't know
>> target usecase."
>>
>> What Ori suggests (offline discussion summary): Suppose an application
>> wants to check whether a shared object (indirect action) or a flow rule of
>> a particular kind. It calls rte_flow_action_handle_create() or
>> rte_flow_create() before rte_eth_dev_start(). If it succeeds, 1) it means
>> objects of this type can be kept across restart, 2) it's a normal object
>> created that will work after the port is started. This is logical, because
>> if the PMD can keep some kind of objects when the port is stopped, it is
>> likely to be able to create them when the port is not started. It is
>> subject to discussion if "object kind" means only "type" or "type +
>> transfer bit" combination; for mlx5 PMD it doesn't matter. One minor
>> drawback is that applications can only do the test when the port is
>> stopped, but it seems likely that the test really needs to be done at
>> startup anyway.
>>
>> If this is acceptable:
>> 1. Capability bits are not needed anymore.
>> 2. ethdev patches can be accepted in RC1, present behavior is undefined
>> anyway.
>> 3. PMD patches will need update that can be done by RC2.
> 
> Andrew, what do you think?
> If you agree, do we need to include transfer bit into "kind"?
> I'd like to conclude before RC1 and can update the docs quickly.
> 
> I've seen the proposition to advertise capability
> to create flow rules before device start as a flag.
> I don't think it conflicts with Ori's suggestion
> because the flag doesn't imply that _any_ rule can be created,
> neither does it say about indirect actions.
> On the other hand, if PMD can create a flow object (rule, etc.)
> when the device is not started, it is logical to assume that
> after the device is stopped it can move existing flow objects
> to the same state as when the device was not started, then restore
> when it is started again.
> 
Dmitry, thanks for the explanations. Ori's idea makes sense to
me now. The problem is to document it properly. We must define
rules for the check. Which bits in the check request matter, and
how should an application choose a rule to try? Which
status code should the PMD return to clearly say that
addition in started state is not supported and, therefore,
preserving across restart is not supported? Must the device be
configured before an attempt to check it? Should transfer and
non-transfer rules/indirect actions be checked separately?
Andrew.
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [dpdk-dev] [PATCH 2/5] ethdev: add capability to keep shared objects on restart
  2021-10-15  8:26       ` Andrew Rybchenko
@ 2021-10-15  9:04         ` Dmitry Kozlyuk
  2021-10-15  9:36           ` Andrew Rybchenko
  0 siblings, 1 reply; 96+ messages in thread
From: Dmitry Kozlyuk @ 2021-10-15  9:04 UTC (permalink / raw)
  To: Andrew Rybchenko, dev, Ori Kam, Raslan Darawsheh
  Cc: NBU-Contact-Thomas Monjalon, Ferruh Yigit
> [...]
> >>> From: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
> >>>
> >>> rte_flow_action_handle_create() did not mention what happens with an
> >>> indirect action when a device is stopped, possibly reconfigured, and
> >>> started again. It is natural for some indirect actions to be
> >>> persistent, like counters and meters; keeping others just saves
> >>> application time and complexity. However, not all PMDs can support it.
> >>> It is proposed to add a device capability to indicate if indirect
> >>> actions are kept across the above sequence or implicitly destroyed.
> >>>
> >>> In the future, indirect actions may not be the only type of objects
> >>> shared between flow rules. The capability bit intends to cover all
> >>> possible types of such objects, hence its name.
> >>>
> >>> It may happen that in the future a PMD acquires support for a type
> >>> of shared objects that it cannot keep across a restart. It is
> >>> undesirable to stop advertising the capability so that applications
> >>> that don't use objects of the problematic type can still take
> advantage of it.
> >>> This is why PMDs are allowed to keep only a subset of shared objects
> >>> provided that the vendor mandatorily documents it.
> >>>
> >>> If the device is being reconfigured in a way that is incompatible
> >>> with an existing shared objects, PMD is required to report an error.
> >>> This is mandatory, because flow API does not supply users with
> >>> capabilities, so this is the only way for a user to learn that
> >>> configuration is invalid. For example, if queue count changes and
> >>> RSS indirect action specifies queues that are going away, the user
> >>> must update the action before removing the queues or remove the
> >>> action and all flow rules that were using it.
> >>>
> >>> Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
> >>> ---
> >>> [...]
> >>
> >> Current pain point is that capability bits may be insufficient and a
> >> programmatic way is desired to check which types of objects can be
> >> kept across restart, instead of documenting the limitations.
> >>
> >> I support one of previous Ori's suggestions and want to clarify it [1]:
> >>
> >> Ori: "Another way is to assume that if the action was created before
> >> port start it will be kept after port stop."
> >> Andrew: "It does not sound like a solution. May be I simply don't
> >> know target usecase."
> >>
> >> What Ori suggests (offline discussion summary): Suppose an
> >> application wants to check whether a shared object (indirect action)
> >> or a flow rule of a particular kind. It calls
> >> rte_flow_action_handle_create() or
> >> rte_flow_create() before rte_eth_dev_start(). If it succeeds, 1) it
> >> means objects of this type can be kept across restart, 2) it's a
> >> normal object created that will work after the port is started. This
> >> is logical, because if the PMD can keep some kind of objects when the
> >> port is stopped, it is likely to be able to create them when the port
> >> is not started. It is subject to discussion if "object kind" means
> >> only "type" or "type + transfer bit" combination; for mlx5 PMD it
> >> doesn't matter. One minor drawback is that applications can only do
> >> the test when the port is stopped, but it seems likely that the test
> >> really needs to be done at startup anyway.
> >>
> >> If this is acceptable:
> >> 1. Capability bits are not needed anymore.
> >> 2. ethdev patches can be accepted in RC1, present behavior is
> >> undefined anyway.
> >> 3. PMD patches will need update that can be done by RC2.
> >
> > Andrew, what do you think?
> > If you agree, do we need to include transfer bit into "kind"?
> > I'd like to conclude before RC1 and can update the docs quickly.
> >
> > I've seen the proposition to advertise capability to create flow rules
> > before device start as a flag.
> > I don't think it conflicts with Ori's suggestion because the flag
> > doesn't imply that _any_ rule can be created, neither does it say
> > about indirect actions.
> > On the other hand, if PMD can create a flow object (rule, etc.) when
> > the device is not started, it is logical to assume that after the
> > device is stopped it can move existing flow objects to the same state
> > as when the device was not started, then restore when it is started
> > again.
>
> Dmitry, thanks for the explanations. Ori's idea makes sense to me now. The
> problem is to document it properly. We must define rules to check it.
> Which bits in the check request matter and how application should make a
> choice of rule to try.
This is a generalization of the last question about the transfer bit.
I call the bits that matter a "kind". As I see it:
rule kind = seq. of item types + seq. of action types
indirect action kind = action type
As Ori mentioned, for mlx5 PMD transfer bit doesn't affect object persistence.
If you or other PMD maintainers think it may be relevant, no problem,
because PMDs like mlx5 will just ignore it when checking. Then it will be:
rule kind = seq. of item types + seq. of action types + attr. transfer bit
indirect action kind = action type + indirect action conf. transfer bit
Including the transfer bit seems to be a safe choice from DPDK point of view,
but obviously it can force applications to do up to twice as many checks.
The application needs to construct a flow configuration
that (1) is valid and (2) has the kind the application is interested in.
It is worth noting that these checks are not about resource consumption,
i.e. it is sufficient to test an indirect RSS with one queue
to be confident that indirect RSS with any number of queues is preserved.
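
To make the notion of "kind" concrete, here is a minimal sketch of how
an application could compare two rule kinds. The struct and function
names are hypothetical (they are not part of rte_flow), and the type
sequences are plain integers standing in for item and action type values.
Specs, masks, and resource counts such as the number of RSS queues are
deliberately not part of the comparison.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical application-side description of a rule kind (not a DPDK type). */
struct rule_kind {
	const int *item_types;   /* sequence of item types */
	size_t n_items;
	const int *action_types; /* sequence of action types */
	size_t n_actions;
	bool transfer;           /* attr. transfer bit */
};

/* Compare two type sequences element by element. */
static bool
same_seq(const int *a, size_t na, const int *b, size_t nb)
{
	size_t i;

	if (na != nb)
		return false;
	for (i = 0; i < na; i++)
		if (a[i] != b[i])
			return false;
	return true;
}

/*
 * Two rules are of the same kind if their item and action type sequences
 * match; the transfer bit is compared only if it is part of the kind.
 */
static bool
rule_kind_equal(const struct rule_kind *a, const struct rule_kind *b,
		bool kind_includes_transfer)
{
	if (kind_includes_transfer && a->transfer != b->transfer)
		return false;
	return same_seq(a->item_types, a->n_items,
			b->item_types, b->n_items) &&
	       same_seq(a->action_types, a->n_actions,
			b->action_types, b->n_actions);
}
```

If the transfer bit is included, two rules that differ only in that bit
need separate checks, which is where the "up to twice as many" comes from.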
> Which status code should be returned by the PMD to
> clearly say that addition in started state is not supported and,
> therefore, preserving across restart is not supported.
I suggest a new DPDK-specific value in rte_errno.h.
> Must the device be configured before an attempt to check it?
Yes, because flow objects created by these checks are as good as any others
and AFAIK no PMD supports rte_flow calls before configuration is done.
> Should transfer and non-transfer rules/indirect actions be checked separately?
See above.
* Re: [dpdk-dev] [PATCH 2/5] ethdev: add capability to keep shared objects on restart
  2021-10-15  9:04         ` Dmitry Kozlyuk
@ 2021-10-15  9:36           ` Andrew Rybchenko
  0 siblings, 0 replies; 96+ messages in thread
From: Andrew Rybchenko @ 2021-10-15  9:36 UTC (permalink / raw)
  To: Dmitry Kozlyuk, dev, Ori Kam, Raslan Darawsheh
  Cc: NBU-Contact-Thomas Monjalon, Ferruh Yigit
On 10/15/21 12:04 PM, Dmitry Kozlyuk wrote:
>> [...]
>>>>> From: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
>>>>>
>>>>> rte_flow_action_handle_create() did not mention what happens with an
>>>>> indirect action when a device is stopped, possibly reconfigured, and
>>>>> started again. It is natural for some indirect actions to be
>>>>> persistent, like counters and meters; keeping others just saves
>>>>> application time and complexity. However, not all PMDs can support it.
>>>>> It is proposed to add a device capability to indicate if indirect
>>>>> actions are kept across the above sequence or implicitly destroyed.
>>>>>
>>>>> In the future, indirect actions may not be the only type of objects
>>>>> shared between flow rules. The capability bit intends to cover all
>>>>> possible types of such objects, hence its name.
>>>>>
>>>>> It may happen that in the future a PMD acquires support for a type
>>>>> of shared objects that it cannot keep across a restart. It is
>>>>> undesirable to stop advertising the capability so that applications
>>>>> that don't use objects of the problematic type can still take
>>>>> advantage of it.
>>>>> This is why PMDs are allowed to keep only a subset of shared objects
>>>>> provided that the vendor mandatorily documents it.
>>>>>
>>>>> If the device is being reconfigured in a way that is incompatible
>>>>> with existing shared objects, PMD is required to report an error.
>>>>> This is mandatory, because flow API does not supply users with
>>>>> capabilities, so this is the only way for a user to learn that
>>>>> configuration is invalid. For example, if queue count changes and
>>>>> RSS indirect action specifies queues that are going away, the user
>>>>> must update the action before removing the queues or remove the
>>>>> action and all flow rules that were using it.
>>>>>
>>>>> Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
>>>>> ---
>>>>> [...]
>>>>
>>>> Current pain point is that capability bits may be insufficient and a
>>>> programmatic way is desired to check which types of objects can be
>>>> kept across restart, instead of documenting the limitations.
>>>>
>>>> I support one of previous Ori's suggestions and want to clarify it [1]:
>>>>
>>>> Ori: "Another way is to assume that if the action was created before
>>>> port start it will be kept after port stop."
>>>> Andrew: "It does not sound like a solution. May be I simply don't
>>>> know target usecase."
>>>>
>>>> What Ori suggests (offline discussion summary): Suppose an
>>>> application wants to check whether a shared object (indirect action)
>>>> or a flow rule of a particular kind is kept across restart. It calls
>>>> rte_flow_action_handle_create() or
>>>> rte_flow_create() before rte_eth_dev_start(). If it succeeds, 1) it
>>>> means objects of this type can be kept across restart, 2) it's a
>>>> normal object created that will work after the port is started. This
>>>> is logical, because if the PMD can keep some kind of objects when the
>>>> port is stopped, it is likely to be able to create them when the port
>>>> is not started. It is subject to discussion if "object kind" means
>>>> only "type" or "type + transfer bit" combination; for mlx5 PMD it
>>>> doesn't matter. One minor drawback is that applications can only do
>>>> the test when the port is stopped, but it seems likely that the test
>>>> really needs to be done at startup anyway.
>>>>
>>>> If this is acceptable:
>>>> 1. Capability bits are not needed anymore.
>>>> 2. ethdev patches can be accepted in RC1, present behavior is
>>>> undefined anyway.
>>>> 3. PMD patches will need update that can be done by RC2.
>>>
>>> Andrew, what do you think?
>>> If you agree, do we need to include transfer bit into "kind"?
>>> I'd like to conclude before RC1 and can update the docs quickly.
>>>
>>> I've seen the proposition to advertise capability to create flow rules
>>> before device start as a flag.
>>> I don't think it conflicts with Ori's suggestion because the flag
>>> doesn't imply that _any_ rule can be created, neither does it say
>>> about indirect actions.
>>> On the other hand, if PMD can create a flow object (rule, etc.) when
>>> the device is not started, it is logical to assume that after the
>>> device is stopped it can move existing flow objects to the same state
>>> as when the device was not started, then restore when it is started
>>> again.
>>
>> Dmitry, thanks for the explanations. Ori's idea makes sense to me now. The
>> problem is to document it properly. We must define rules to check it.
>> Which bits in the check request matter and how application should make a
>> choice of rule to try.
> 
> This is a generalization of the last question about the transfer bit.
> I call the bits that matter a "kind". As I see it:
> 
> rule kind = seq. of item types + seq. of action types
> indirect action kind = action type
> 
> As Ori mentioned, for mlx5 PMD transfer bit doesn't affect object persistence.
> If you or other PMD maintainers think it may be relevant, no problem,
> because PMDs like mlx5 will just ignore it when checking. Then it will be:
> 
> rule kind = seq. of item types + seq. of action types + attr. transfer bit
> indirect action kind = action type + indirect action conf. transfer bit
> 
> Including the transfer bit seems to be a safe choice from DPDK point of view,
> but obviously it can force applications to do up to twice as many checks.
> 
> The application needs to construct a flow configuration
> that (1) is valid and (2) has the kind the application is interested in.
> It is worth noting that these checks are not about resource consumption,
> i.e. it is sufficient to test an indirect RSS with one queue
> to be confident that indirect RSS with any number of queues is preserved.
> 
>> Which status code should be returned by the PMD to
>> clearly say that addition in started state is not supported and,
>> therefore, preserving across restart is not supported.
> 
> I suggest a new DPDK-specific value in rte_errno.h.
> 
>> Must the device be configured before an attempt to check it?
> 
> Yes, because flow objects created by these checks are as good as any others
> and AFAIK no PMD supports rte_flow calls before configuration is done.
> 
>> Should transfer and non-transfer rules/indirect actions be checked separately?
> 
> See above.
> 
Please, try to put it into the patch for documentation
and I'll review the result. All my above questions
should be answered in the documentation.
* Re: [dpdk-dev] [PATCH 2/5] ethdev: add capability to keep shared objects on restart
  2021-10-14 21:45       ` Dmitry Kozlyuk
  2021-10-14 21:48         ` Dmitry Kozlyuk
@ 2021-10-15 11:46         ` Ferruh Yigit
  2021-10-15 12:35           ` Dmitry Kozlyuk
  1 sibling, 1 reply; 96+ messages in thread
From: Ferruh Yigit @ 2021-10-15 11:46 UTC (permalink / raw)
  To: Dmitry Kozlyuk, dev, Andrew Rybchenko, Ori Kam, Raslan Darawsheh
  Cc: NBU-Contact-Thomas Monjalon, Qi Zhang, jerinj, Maxime Coquelin
On 10/14/2021 10:45 PM, Dmitry Kozlyuk wrote:
>> -----Original Message-----
>> From: Ferruh Yigit <ferruh.yigit@intel.com>
>> Sent: October 14, 2021 16:47
>> To: Dmitry Kozlyuk <dkozlyuk@nvidia.com>; dev@dpdk.org; Andrew Rybchenko
>> <andrew.rybchenko@oktetlabs.ru>; Ori Kam <orika@nvidia.com>; Raslan
>> Darawsheh <rasland@nvidia.com>
>> Cc: NBU-Contact-Thomas Monjalon <thomas@monjalon.net>; Qi Zhang
>> <qi.z.zhang@intel.com>; jerinj@marvell.com; Maxime Coquelin
>> <maxime.coquelin@redhat.com>
>> Subject: Re: [PATCH 2/5] ethdev: add capability to keep shared objects on
>> restart
>>
>> On 10/13/2021 9:32 AM, Dmitry Kozlyuk wrote:
>>> This thread continues discussions on previous versions to keep
>>> everything in the thread with final patches:
>>>
>>> [1]:
>>> http://inbox.dpdk.org/dev/d5673b58-5aa6-ca35-5b60-d938e56cfee1@oktetla
>>> bs.ru/
>>> [2]:
>>> http://inbox.dpdk.org/dev/DM8PR12MB5400997CCEC9169AC5AE0C89D6EA9@DM8PR
>>> 12MB5400.namprd12.prod.outlook.com/
>>>
>>> Please see below.
>>>
>>>> -----Original Message-----
>>>> From: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
>>>> Sent: October 5, 2021 3:52
>>>> To: dev@dpdk.org
>>>> Cc: Dmitry Kozlyuk <dkozlyuk@nvidia.com>; Ori Kam <orika@nvidia.com>;
>>>> NBU- Contact-Thomas Monjalon <thomas@monjalon.net>; Ferruh Yigit
>>>> <ferruh.yigit@intel.com>; Andrew Rybchenko
>>>> <andrew.rybchenko@oktetlabs.ru>
>>>> Subject: [PATCH 2/5] ethdev: add capability to keep shared objects on
>>>> restart
>>>>
>>>> From: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
>>>>
>>>> rte_flow_action_handle_create() did not mention what happens with an
>>>> indirect action when a device is stopped, possibly reconfigured, and
>>>> started again. It is natural for some indirect actions to be
>>>> persistent, like counters and meters; keeping others just saves
>>>> application time and complexity. However, not all PMDs can support it.
>>>> It is proposed to add a device capability to indicate if indirect
>>>> actions are kept across the above sequence or implicitly destroyed.
>>>>
>>>> In the future, indirect actions may not be the only type of objects
>>>> shared between flow rules. The capability bit intends to cover all
>>>> possible types of such objects, hence its name.
>>>>
>>>> It may happen that in the future a PMD acquires support for a type of
>>>> shared objects that it cannot keep across a restart. It is
>>>> undesirable to stop advertising the capability so that applications
>>>> that don't use objects of the problematic type can still take advantage
>>>> of it.
>>>> This is why PMDs are allowed to keep only a subset of shared objects
>>>> provided that the vendor mandatorily documents it.
>>>>
>>>> If the device is being reconfigured in a way that is incompatible
>>>> with existing shared objects, PMD is required to report an error.
>>>> This is mandatory, because flow API does not supply users with
>>>> capabilities, so this is the only way for a user to learn that
>>>> configuration is invalid. For example, if queue count changes and RSS
>>>> indirect action specifies queues that are going away, the user must
>>>> update the action before removing the queues or remove the action and
>>>> all flow rules that were using it.
>>>>
>>>> Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
>>>> ---
>>>> [...]
>>>
>>> Current pain point is that capability bits may be insufficient and a
>>> programmatic way is desired to check which types of objects can be
>>> kept across restart, instead of documenting the limitations.
>>>
>>> I support one of previous Ori's suggestions and want to clarify it [1]:
>>>
>>> Ori: "Another way is to assume that if the action was created before
>>> port start it will be kept after port stop."
>>> Andrew: "It does not sound like a solution. May be I simply don't know
>>> target usecase."
>>>
>>> What Ori suggests (offline discussion summary): Suppose an application
>>> wants to check whether a shared object (indirect action) or a flow rule of
>>> a particular kind is kept across restart. It calls
>>> rte_flow_action_handle_create() or rte_flow_create() before
>>> rte_eth_dev_start(). If it succeeds, 1) it means objects of this type
>>> can be kept across restart, 2) it's a normal object created that will
>>> work after the port is started. This is logical, because if the PMD can
>>> keep some kind of objects when the port is stopped, it is likely to be
>>> able to create them when the port is not started. It is subject to
>>> discussion if "object kind" means only "type" or "type + transfer bit"
>>> combination; for mlx5 PMD it doesn't matter. One minor drawback is that
>>> applications can only do the test when the port is stopped, but it seems
>>> likely that the test really needs to be done at startup anyway.
>>>
>>> If this is acceptable:
>>> 1. Capability bits are not needed anymore.
>>> 2. ethdev patches can be accepted in RC1, present behavior is undefined
>>> anyway.
>>> 3. PMD patches will need update that can be done by RC2.
>>>
>>
>> Hi Dmitry,
>>
>> Are you planning to update drivers yourself on -rc2?
>> Or do you mean PMD maintainers should update themselves, if so do they
>> know about it?
>>
>> If the ethdev layer is updated in a way to impact the drivers, it should
>> be either:
>> - all drivers updated with a change
>> or
>> - give PMDs time to implement it on their own time, meanwhile they can
>> report their support status by a flag
>>
>> We had multiple sample of second case in the past but it is harder for
>> this case.
>>
>> For this case what about having three states:
>> - FLOW_RULE_KEEP
>> - FLOW_RULE_DESTROY
>> - FLOW_RULE_UNKNOWN
>>
>> And set 'FLOW_RULE_UNKNOWN' for all drivers, to simulate current status,
>> until driver is updated.
> 
> Hi Ferruh,
> 
> Indirect actions are only implemented by mlx5 PMD,
> the patches will be in RC2.
> If we don't use the flag as per the latest suggestion,
> nothing needs to be done for other PMDs.
> Flag can as well be kept with the following semantics:
> 0 => indirect actions are flushed on device stop
> 1 => at least some indirect actions are kept,
>       application should check types it's interested in
> 
My concern is about the 'flow rules', not indirect actions;
the patch mentions the capability is for both of them.
> Introducing UNKNOWN state seems wrong to me.
> What should an application do when it is reported?
> Now there's just no way to learn how the PMD behaves,
> but if it provides a response, it can't be "I don't know what I do".
> 
I agree 'unknown' state is not ideal, but my intention is to prevent
drivers that have not implemented this new feature from reporting a wrong
capability.
Without capability, application already doesn't know how underlying
PMD behaves, so this is by default 'unknown' state.
I suggest keeping that state until driver explicitly updates its state
to the correct value.
But having the list below is good. If you will update all drivers, then
there is no need for the 'unknown' state, but updating drivers may require
driver maintainers' acks, which can take some time.
Can you please clarify what your plan is regarding PMDs: will you update
them all, or will you only update mlx5 in -rc2?
And what is the exact plan for the -rc2 that you mention?
> Here's what I understood from the code, assuming there are no bugs
> like allowing to stop the port and keep dangling flow handles:
> 
> bnxt        flush
> bonding     depends
> cnxk        can't figure out
> cxgbe       keep
> dpaa2       keep
> e1000       keep
> enic        flush
> failsafe    depends
> hinic       flush
> hns3        keep
> i40e        keep
> iavf        keep
> ice         keep
> igc         keep
> ipn3ke      keep
> ixgbe       keep
> mlx4        keep
> mlx5        flush
> mvpp2       keep
> octeontx2   can't figure out
> qede        keep
> sfc         flush
> softnic     flush
> tap         keep
> txgbe       keep
> 
> Currently one flag would be sufficient to describe PMD behavior:
> they either keep or flush the flow rules.
> If there are indeed no exceptions, which maintainers should confirm,
> I can add flag reporting myself.
> 
* Re: [dpdk-dev] [PATCH 2/5] ethdev: add capability to keep shared objects on restart
  2021-10-15 11:46         ` Ferruh Yigit
@ 2021-10-15 12:35           ` Dmitry Kozlyuk
  2021-10-15 16:26             ` Ferruh Yigit
  0 siblings, 1 reply; 96+ messages in thread
From: Dmitry Kozlyuk @ 2021-10-15 12:35 UTC (permalink / raw)
  To: Ferruh Yigit, dev, Andrew Rybchenko, Ori Kam, Raslan Darawsheh
  Cc: NBU-Contact-Thomas Monjalon, Qi Zhang, jerinj, Maxime Coquelin
> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@intel.com>
> [...]
> > Introducing UNKNOWN state seems wrong to me.
> > What should an application do when it is reported?
> > Now there's just no way to learn how the PMD behaves,
> > but if it provides a response, it can't be "I don't know what I do".
> >
> 
> I agree 'unknown' state is not ideal, but my intention is to prevent
> drivers that have not implemented this new feature from reporting a wrong
> capability.
> 
> Without capability, application already doesn't know how underlying
> PMD behaves, so this is by default 'unknown' state.
> I suggest keeping that state until driver explicitly updates its state
> to the correct value.
My concern is that when all the drivers are changed to report a proper
capability, UNKNOWN remains in the API meaning "there's a bug in DPDK".
Instead of UNKNOWN response we can declare that rte_flow_flush()
must be called unless the application wants to keep the rules
and has made sure it's possible, or the behavior is undefined.
(Can be viewed as "UNKNOWN by default", but is simpler.)
This way neither UNKNOWN state is needed,
nor the bit saying the flow rules are flushed.
Here is why, let's consider KEEP and FLUSH combinations:
(1) FLUSH=0, KEEP=0 is equivalent to UNKNOWN, i.e. the application
                    must explicitly flush the rules itself
                    in order to get deterministic behavior.
(2) FLUSH=1, KEEP=0 means PMD flushes all rules on the device stop.
(3) FLUSH=0, KEEP=1 means PMD can keep at least some rules,
                    exact support must be checked with rte_flow_create()
                    when the device is stopped.
(4) FLUSH=1, KEEP=1 is forbidden.
If the application doesn't need the PMD to keep flow rules,
it can as well flush them always before the device stop
regardless of whether the driver does it automatically or not.
It's even simpler and probably as efficient. Testpmd does this.
If the application wants to take advantage of rule-keeping ability,
it just tests the KEEP bit. If it is unset that's the previous case,
application should call rte_flow_flush() before the device stop to be sure.
Otherwise, the application can test capability to keep flow rule kinds
it is interested in (see my reply to Andrew).
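
The reasoning above can be sketched in a few lines of C. This is only
an illustration: the enum and function names are hypothetical, and the
FLUSH bit is modeled solely to show why it is redundant.

```c
#include <stdbool.h>

/* Hypothetical names; what each (FLUSH, KEEP) combination means. */
enum stop_policy {
	STOP_POLICY_FLUSH_EXPLICITLY, /* (1) app calls rte_flow_flush() itself */
	STOP_POLICY_PMD_FLUSHES,      /* (2) PMD flushes all rules on stop */
	STOP_POLICY_PROBE_KINDS,      /* (3) some rules kept, probe each kind */
	STOP_POLICY_INVALID,          /* (4) forbidden combination */
};

static enum stop_policy
classify_stop_policy(bool flush, bool keep)
{
	if (flush && keep)
		return STOP_POLICY_INVALID;
	if (keep)
		return STOP_POLICY_PROBE_KINDS;
	if (flush)
		return STOP_POLICY_PMD_FLUSHES;
	return STOP_POLICY_FLUSH_EXPLICITLY;
}

/*
 * With only the KEEP bit advertised, the application should flush before
 * the device stop unless it wants to keep rules and the PMD can keep
 * at least some of them.
 */
static bool
app_should_flush_before_stop(bool keep_bit, bool app_wants_keep)
{
	return !(keep_bit && app_wants_keep);
}
```

Note that app_should_flush_before_stop() never looks at FLUSH:
cases (1) and (2) are handled the same way by the application.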
Result: no changes to PMDs are _immediately_ needed when such behavior
is documented. They can start advertising it whenever they like,
it's not even an RC2 task. Currently applications that relied on certain
behavior are non-portable anyway.
> But having the list below is good. If you will update all drivers, then
> there is no need for the 'unknown' state, but updating drivers may require
> driver maintainers' acks, which can take some time.
If you agree with what I suggest above, there will be no urgency.
The list can be used to notify maintainers that they can enhance
their PMD user experience whenever they like.
> Can you please clarify what your plan is regarding PMDs: will you update
> them all, or will you only update mlx5 in -rc2?
> And what is the exact plan for the -rc2 that you mention?
mlx5 PMD will be updated with the patches from this series.
Regarding indirect actions: no other PMD needs an update.
Regarding flow rules: if the above suggestion is accepted,
no PMDs need to be updated urgently.
* [dpdk-dev] [PATCH v2 0/5] Flow entites behavior on port restart
  2021-10-05  0:52 [dpdk-dev] [PATCH 0/5] Flow entites behavior on port restart dkozlyuk
                   ` (4 preceding siblings ...)
  2021-10-05  0:52 ` [dpdk-dev] [PATCH 5/5] net/mlx5: preserve indirect actions on restart dkozlyuk
@ 2021-10-15 16:18 ` Dmitry Kozlyuk
  2021-10-15 16:18   ` [dpdk-dev] [PATCH v2 1/5] ethdev: add capability to keep flow rules on restart Dmitry Kozlyuk
                     ` (5 more replies)
  5 siblings, 6 replies; 96+ messages in thread
From: Dmitry Kozlyuk @ 2021-10-15 16:18 UTC (permalink / raw)
  To: dev
It is unspecified whether flow rules and indirect actions are kept
when a port is stopped, possibly reconfigured, and started again.
Vendors approach the topic differently, e.g. mlx5 and i40e PMD
disagree in whether flow rules can be kept, and mlx5 PMD would keep
indirect actions. In the end, applications are greatly affected
by whatever contract there is and need to know it.
It is proposed to advertise capabilities of keeping flow rules
and indirect actions (as a special case of shared object)
using a combination of ethdev info and rte_flow calls.
Then a bug is fixed in mlx5 PMD that prevented indirect RSS action
from being kept, and the driver starts advertising the new capability.
Prior discussions:
1) http://inbox.dpdk.org/dev/20210727073121.895620-1-dkozlyuk@nvidia.com/
2) http://inbox.dpdk.org/dev/20210901085516.3647814-1-dkozlyuk@nvidia.com/
Dmitry Kozlyuk (5):
  ethdev: add capability to keep flow rules on restart
  ethdev: add capability to keep shared objects on restart
  net/mlx5: discover max flow priority using DevX
  net/mlx5: create drop queue using DevX
  net/mlx5: preserve indirect actions on restart
 doc/guides/prog_guide/rte_flow.rst |  51 +++++
 drivers/net/mlx5/linux/mlx5_os.c   |   5 -
 drivers/net/mlx5/mlx5_devx.c       | 211 ++++++++++++++++++---
 drivers/net/mlx5/mlx5_ethdev.c     |   1 +
 drivers/net/mlx5/mlx5_flow.c       | 292 ++++++++++++++++++++++++++---
 drivers/net/mlx5/mlx5_flow.h       |   6 +
 drivers/net/mlx5/mlx5_flow_dv.c    | 103 ++++++++++
 drivers/net/mlx5/mlx5_flow_verbs.c |  77 +-------
 drivers/net/mlx5/mlx5_rx.h         |   4 +
 drivers/net/mlx5/mlx5_rxq.c        |  99 ++++++++--
 drivers/net/mlx5/mlx5_trigger.c    |  10 +
 lib/ethdev/rte_ethdev.h            |  10 +
 lib/ethdev/rte_flow.h              |   1 +
 13 files changed, 733 insertions(+), 137 deletions(-)
-- 
2.25.1
* [dpdk-dev] [PATCH v2 1/5] ethdev: add capability to keep flow rules on restart
  2021-10-15 16:18 ` [dpdk-dev] [PATCH v2 0/5] Flow entites behavior on port restart Dmitry Kozlyuk
@ 2021-10-15 16:18   ` Dmitry Kozlyuk
  2021-10-18  8:56     ` Andrew Rybchenko
  2021-10-18 13:06     ` Zhang, Qi Z
  2021-10-15 16:18   ` [dpdk-dev] [PATCH v2 2/5] ethdev: add capability to keep shared objects " Dmitry Kozlyuk
                     ` (4 subsequent siblings)
  5 siblings, 2 replies; 96+ messages in thread
From: Dmitry Kozlyuk @ 2021-10-15 16:18 UTC (permalink / raw)
  To: dev; +Cc: Ori Kam, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko
Currently, it is not specified what happens to the flow rules when
the device is stopped, possibly reconfigured, then started.
If flow rules were kept, it could be convenient for application
developers, because they wouldn't need to save and restore them.
However, due to the number of flows and possible creation rate it is
impractical to save all flow rules in DPDK layer. This means that flow
rules persistence really depends on whether PMD and HW can implement it
efficiently. It can also be limited by the rule item and action types
and by the transfer bit of its attributes, which together comprise the rule kind.
Add a device capability bit for PMDs that can keep at least some
of the flow rules across restart. Without this capability behavior
is still unspecified, which is now explicitly stated.
Declare that the application can test for persistence of flow rules
of a particular kind by attempting to create a rule of that kind
when the device is stopped and checking for the specific error.
This is logical because if the PMD can create the flow rule
when the device is not started and use it after the start happens,
it is natural that it can move its internal flow rule object
to the same state when the device is stopped and restore the state
when the device is started.
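
For illustration only, the test procedure could be interpreted by an
application roughly as follows. The sketch assumes the application has
already called rte_flow_create() for a valid rule of the kind in question
while the port was stopped, and passes in whether a handle was returned
and which error type was reported. All names below are hypothetical local
stand-ins so that the sketch builds without DPDK headers; the capability
value mirrors the one proposed in this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copy of the proposed RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP value. */
#define CAPA_FLOW_RULE_KEEP UINT64_C(0x00000004)

/* Hypothetical reduction of enum rte_flow_error_type to what matters here. */
enum probe_error_type {
	PROBE_ERROR_NONE = 0,
	PROBE_ERROR_STATE, /* stands for RTE_FLOW_ERROR_TYPE_STATE */
	PROBE_ERROR_OTHER, /* any other error type */
};

enum keep_status {
	KEEP_YES,     /* rules of this kind are kept across restart */
	KEEP_NO,      /* rules of this kind are flushed on stop */
	KEEP_UNKNOWN, /* unspecified: flush explicitly to be safe */
};

/*
 * dev_capa: capability bits reported in the device info.
 * created:  true if rule creation on the stopped port returned a handle.
 * err:      error type reported when creation failed.
 */
static enum keep_status
interpret_probe(uint64_t dev_capa, bool created, enum probe_error_type err)
{
	if (!(dev_capa & CAPA_FLOW_RULE_KEEP))
		return KEEP_UNKNOWN; /* capability absent: behavior unspecified */
	if (created)
		return KEEP_YES;
	if (err == PROBE_ERROR_STATE)
		return KEEP_NO;
	/* The rule was rejected for an unrelated reason: nothing is learned. */
	return KEEP_UNKNOWN;
}
```

A rule created by a successful probe is a normal rule, so the application
either keeps using it or destroys it explicitly.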
If the device is being reconfigured in a way that is incompatible with
existing flow rules, PMD is required to report an error.
This is mandatory, because flow API does not supply users with
capabilities, so this is the only way for a user to learn that
configuration is invalid. For example, if queue count changes and the
action of a flow rule specifies queues that are going away, the user
must update or remove the flow rule before removing the queues.
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
---
 doc/guides/prog_guide/rte_flow.rst | 27 +++++++++++++++++++++++++++
 lib/ethdev/rte_ethdev.h            |  7 +++++++
 lib/ethdev/rte_flow.h              |  1 +
 3 files changed, 35 insertions(+)
diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
index 2b42d5ec8c..b0ced4209b 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -87,6 +87,33 @@ To avoid resource leaks on the PMD side, handles must be explicitly
 destroyed by the application before releasing associated resources such as
 queues and ports.
 
+By default it is unspecified if the flow rules persist after the device stop.
+If ``RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP`` is not advertised,
+then rules must be explicitly flushed before stopping the device
+if the application needs to ensure they are removed.
+If it is advertised, this means the PMD can keep at least some rules
+across the device stop and start with possible reconfiguration in between.
+However, it may be only supported for some kinds of rules.
+The kind is a combination of the following rule properties:
+
+- the sequence of item types;
+- the sequence of action types;
+- the value of the transfer attribute.
+
+To test if a particular kind of rules is kept, the application must try
+to create a valid rule of that kind when the device is stopped
+(after it has been configured or started previously).
+If it succeeds, all rules of the same kind are kept at the device stop.
+If it fails with an error of type ``RTE_FLOW_ERROR_TYPE_STATE``,
+rules of this kind are flushed when the device is stopped.
+Rules of a kept kind that are created when the device is stopped, including
+the rules created for the test, will be kept after the device is started.
+Some configuration changes may be incompatible with existing rules.
+In this case ``rte_eth_dev_configure()``, ``rte_eth_rx/tx_queue_setup()``,
+and/or ``rte_eth_dev_start()`` will fail with a log message from the PMD that
+should be similar to the one that would be emitted by ``rte_flow_create()``
+if an attempt was made to create the offending rule with the new configuration.
+
 The following sections cover:
 
 - **Attributes** (represented by ``struct rte_flow_attr``): properties of a
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index 6d80514ba7..a0b388bb25 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -90,6 +90,11 @@
  *     - flow director filtering mode (but not filtering rules)
  *     - NIC queue statistics mappings
  *
+ * The following configuration may be retained or not
+ * depending on the device capabilities:
+ *
+ *     - flow rules
+ *
  * Any other configuration will not be stored and will need to be re-entered
  * before a call to rte_eth_dev_start().
  *
@@ -1445,6 +1450,8 @@ struct rte_eth_conf {
 #define RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP 0x00000001
 /** Device supports Tx queue setup after device started. */
 #define RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002
+/** Device supports keeping flow rules across restart. */
+#define RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP 0x00000004
 /**@}*/
 
 /*
diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
index a89945061a..aa0182d021 100644
--- a/lib/ethdev/rte_flow.h
+++ b/lib/ethdev/rte_flow.h
@@ -3344,6 +3344,7 @@ enum rte_flow_error_type {
 	RTE_FLOW_ERROR_TYPE_ACTION_NUM, /**< Number of actions. */
 	RTE_FLOW_ERROR_TYPE_ACTION_CONF, /**< Action configuration. */
 	RTE_FLOW_ERROR_TYPE_ACTION, /**< Specific action. */
+	RTE_FLOW_ERROR_TYPE_STATE, /**< Current device state. */
 };
 
 /**
-- 
2.25.1
* [dpdk-dev] [PATCH v2 2/5] ethdev: add capability to keep shared objects on restart
  2021-10-15 16:18 ` [dpdk-dev] [PATCH v2 0/5] Flow entites behavior on port restart Dmitry Kozlyuk
  2021-10-15 16:18   ` [dpdk-dev] [PATCH v2 1/5] ethdev: add capability to keep flow rules on restart Dmitry Kozlyuk
@ 2021-10-15 16:18   ` Dmitry Kozlyuk
  2021-10-17  8:10     ` Ori Kam
  2021-10-15 16:18   ` [dpdk-dev] [PATCH v2 3/5] net/mlx5: discover max flow priority using DevX Dmitry Kozlyuk
                     ` (3 subsequent siblings)
  5 siblings, 1 reply; 96+ messages in thread
From: Dmitry Kozlyuk @ 2021-10-15 16:18 UTC (permalink / raw)
  To: dev; +Cc: Ori Kam, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko
rte_flow_action_handle_create() did not mention what happens
with an indirect action when a device is stopped, possibly reconfigured,
and started again. It is natural for some indirect actions to be
persistent, like counters and meters; keeping others just saves
application time and complexity. However, not all PMDs can support it.
Also the support may be limited by particular action kinds, that is,
combinations of action type and the value of the transfer bit
in its configuration.
Add a device capability to indicate if at least some indirect actions
are kept across the above sequence. Without this capability the behavior
is still unspecified, but now it is stated explicitly.
In the future, indirect actions may not be the only type of objects
shared between flow rules. The capability bit intends to cover all
possible types of such objects, hence its name.
Declare that the application can test for the persistence
of a particular indirect action kind by attempting to create
an indirect action of that kind when the device is stopped
and checking for the specific error type.
This is logical because if the PMD can create the flow rule
when the device is not started and use it after the start happens,
it is natural that it can move its internal flow shared object
to the same state when the device is stopped and restore the state
when the device is started.
If the device is being reconfigured in a way that is incompatible with
existing shared objects, PMD is required to report an error.
This is mandatory, because flow API does not supply users with
capabilities, so this is the only way for a user to learn that
configuration is invalid. For example, if queue count changes and RSS
indirect action specifies queues that are going away, the user must
update the action before removing the queues or remove the action and
all flow rules that were using it.
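
As a sketch of the queue count example, an application that tracks the
queue list of its RSS indirect action could check it against the new
Rx queue count before reconfiguring, instead of waiting for the PMD to
report an error. The function name and the assumption that the
application keeps its own copy of the queue list are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return true if every queue index referenced by the RSS action
 * still exists once the port has new_nb_rxq Rx queues.
 */
static bool
rss_queues_fit(const uint16_t *queues, size_t n_queues, uint16_t new_nb_rxq)
{
	size_t i;

	for (i = 0; i < n_queues; i++)
		if (queues[i] >= new_nb_rxq)
			return false; /* this queue is going away */
	return true;
}
```

If the check fails, the action has to be updated or destroyed (together
with the rules using it) before the queues are removed.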
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
---
 doc/guides/prog_guide/rte_flow.rst | 24 ++++++++++++++++++++++++
 lib/ethdev/rte_ethdev.h            |  3 +++
 2 files changed, 27 insertions(+)
diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
index b0ced4209b..bf96ad830f 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -2812,6 +2812,30 @@ updated depend on the type of the ``action`` and different for every type.
 The indirect action specified data (e.g. counter) can be queried by
 ``rte_flow_action_handle_query()``.
 
+By default it is unspecified if indirect actions persist after the device stop.
+If ``RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP`` is not advertised,
+then indirect actions must be explicitly destroyed before stopping the device
+if the application needs to ensure they are removed.
+If it is advertised, this means the PMD can keep at least some indirect actions
+across device stop and start with possible reconfiguration in between.
+However, it may be only supported for certain kinds of indirect actions.
+The kind is a combination of the action type and the value of its transfer bit.
+To test if a particular kind of indirect actions is kept,
+the application must try to create a valid indirect action of that kind
+when the device is stopped (after it has been configured or started previously).
+If it succeeds, all indirect actions of the same kind are kept
+when the device is stopped.
+If it fails with an error of type ``RTE_FLOW_ERROR_TYPE_STATE``,
+indirect actions of this kind are flushed when the device is stopped.
+Indirect actions of a kept kind that are created when the device is stopped,
+including the ones created for the test, will be kept after the device start.
+Some configuration changes may be incompatible with existing indirect actions.
+In this case ``rte_eth_dev_configure()``, ``rte_eth_rx/tx_queue_setup()``,
+and/or ``rte_eth_dev_start()`` will fail with a log message from the PMD that
+should be similar to the one that would be emitted
+by ``rte_flow_action_handle_create()`` if an attempt was made
+to create the offending rule with the new configuration.
+
 .. _table_rte_flow_action_handle:
 
 .. table:: INDIRECT
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index a0b388bb25..12fc7262eb 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -94,6 +94,7 @@
  * depending on the device capabilities:
  *
  *     - flow rules
+ *     - flow-related shared objects, e.g. indirect actions
  *
  * Any other configuration will not be stored and will need to be re-entered
  * before a call to rte_eth_dev_start().
@@ -1452,6 +1453,8 @@ struct rte_eth_conf {
 #define RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002
 /** Device supports keeping flow rules across restart. */
 #define RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP 0x00000004
+/** Device supports keeping shared flow objects across restart. */
+#define RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP 0x00000008
 /**@}*/
 
 /*
-- 
2.25.1
* [dpdk-dev] [PATCH v2 3/5] net/mlx5: discover max flow priority using DevX
  2021-10-15 16:18 ` [dpdk-dev] [PATCH v2 0/5] Flow entites behavior on port restart Dmitry Kozlyuk
  2021-10-15 16:18   ` [dpdk-dev] [PATCH v2 1/5] ethdev: add capability to keep flow rules on restart Dmitry Kozlyuk
  2021-10-15 16:18   ` [dpdk-dev] [PATCH v2 2/5] ethdev: add capability to keep shared objects " Dmitry Kozlyuk
@ 2021-10-15 16:18   ` Dmitry Kozlyuk
  2021-10-15 16:18   ` [dpdk-dev] [PATCH v2 4/5] net/mlx5: create drop queue " Dmitry Kozlyuk
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 96+ messages in thread
From: Dmitry Kozlyuk @ 2021-10-15 16:18 UTC (permalink / raw)
  To: dev; +Cc: stable, Matan Azrad, Viacheslav Ovsiienko
Maximum available flow priority was discovered using Verbs API
regardless of the selected flow engine. This required some Verbs
objects to be initialized in order to use DevX engine. Make priority
discovery an engine method and implement it for DevX using its API.
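
The discovery logic shared by both engines can be sketched in isolation
as follows. This is a simplified model, not the driver code: the
`try_priority_t` callback, `fake_device_probe()` and
`discover_on_fake_device()` are hypothetical and replace the
engine-specific attempt to create a flow with the proposed maximum
priority.

```c
#include <stdint.h>

/*
 * Hypothetical engine hook: returns 0 if a flow with the given
 * priority value can be created, non-zero otherwise.
 */
typedef int (*try_priority_t)(void *ctx, uint16_t priority);

/*
 * Try each candidate number of priorities in ascending order by probing
 * its highest priority value (candidate - 1). Return the last candidate
 * that worked, or -1 if none did.
 */
static int
discover_priorities(try_priority_t try_priority, void *ctx,
		    const uint16_t *vprio, int vprio_n)
{
	int found = -1;
	int i;

	for (i = 0; i < vprio_n; i++) {
		if (try_priority(ctx, (uint16_t)(vprio[i] - 1)) != 0)
			break;
		found = vprio[i];
	}
	return found;
}

/* Fake engine: ctx points to the number of priorities the device has. */
static int
fake_device_probe(void *ctx, uint16_t priority)
{
	const uint16_t *device_priorities = ctx;

	return priority < *device_priorities ? 0 : -1;
}

/* Convenience wrapper using the same candidate list as the patch. */
static int
discover_on_fake_device(uint16_t device_priorities)
{
	static const uint16_t vprio[] = { 8, 16 };

	return discover_priorities(fake_device_probe, &device_priorities,
				   vprio, 2);
}
```

In this model, mapping the result (8 or 16) to the number of flow
priorities stays in the common code, as the patch does in
mlx5_flow_discover_priorities().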
Cc: stable@dpdk.org
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 drivers/net/mlx5/linux/mlx5_os.c   |   1 -
 drivers/net/mlx5/mlx5_flow.c       |  98 +++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_flow.h       |   4 ++
 drivers/net/mlx5/mlx5_flow_dv.c    | 103 +++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_flow_verbs.c |  77 +++------------------
 5 files changed, 215 insertions(+), 68 deletions(-)
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 3746057673..8ee7ada51b 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1830,7 +1830,6 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 	priv->drop_queue.hrxq = mlx5_drop_action_create(eth_dev);
 	if (!priv->drop_queue.hrxq)
 		goto error;
-	/* Supported Verbs flow priority number detection. */
 	err = mlx5_flow_discover_priorities(eth_dev);
 	if (err < 0) {
 		err = -err;
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index c914a7120c..bfc3e20c9a 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -9416,3 +9416,101 @@ mlx5_dbg__print_pattern(const struct rte_flow_item *item)
 	}
 	printf("END\n");
 }
+
+/* Map of Verbs to Flow priority with 8 Verbs priorities. */
+static const uint32_t priority_map_3[][MLX5_PRIORITY_MAP_MAX] = {
+	{ 0, 1, 2 }, { 2, 3, 4 }, { 5, 6, 7 },
+};
+
+/* Map of Verbs to Flow priority with 16 Verbs priorities. */
+static const uint32_t priority_map_5[][MLX5_PRIORITY_MAP_MAX] = {
+	{ 0, 1, 2 }, { 3, 4, 5 }, { 6, 7, 8 },
+	{ 9, 10, 11 }, { 12, 13, 14 },
+};
+
+/**
+ * Discover the number of available flow priorities.
+ *
+ * @param dev
+ *   Ethernet device.
+ *
+ * @return
+ *   On success, number of available flow priorities.
+ *   On failure, a negative errno-style code and rte_errno is set.
+ */
+int
+mlx5_flow_discover_priorities(struct rte_eth_dev *dev)
+{
+	static const uint16_t vprio[] = {8, 16};
+	const struct mlx5_priv *priv = dev->data->dev_private;
+	const struct mlx5_flow_driver_ops *fops;
+	enum mlx5_flow_drv_type type;
+	int ret;
+
+	type = mlx5_flow_os_get_type();
+	if (type == MLX5_FLOW_TYPE_MAX) {
+		type = MLX5_FLOW_TYPE_VERBS;
+		if (priv->config.devx && priv->config.dv_flow_en)
+			type = MLX5_FLOW_TYPE_DV;
+	}
+	fops = flow_get_drv_ops(type);
+	if (fops->discover_priorities == NULL) {
+		DRV_LOG(ERR, "Priority discovery not supported");
+		rte_errno = ENOTSUP;
+		return -rte_errno;
+	}
+	ret = fops->discover_priorities(dev, vprio, RTE_DIM(vprio));
+	if (ret < 0)
+		return ret;
+	switch (ret) {
+	case 8:
+		ret = RTE_DIM(priority_map_3);
+		break;
+	case 16:
+		ret = RTE_DIM(priority_map_5);
+		break;
+	default:
+		rte_errno = ENOTSUP;
+		DRV_LOG(ERR,
+			"port %u maximum priority: %d expected 8/16",
+			dev->data->port_id, ret);
+		return -rte_errno;
+	}
+	DRV_LOG(INFO, "port %u supported flow priorities:"
+		" 0-%d for ingress or egress root table,"
+		" 0-%d for non-root table or transfer root table.",
+		dev->data->port_id, ret - 2,
+		MLX5_NON_ROOT_FLOW_MAX_PRIO - 1);
+	return ret;
+}
+
+/**
+ * Adjust flow priority based on the highest layer and the request priority.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[in] priority
+ *   The rule base priority.
+ * @param[in] subpriority
+ *   The priority based on the items.
+ *
+ * @return
+ *   The new priority.
+ */
+uint32_t
+mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
+			  uint32_t subpriority)
+{
+	uint32_t res = 0;
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	switch (priv->config.flow_prio) {
+	case RTE_DIM(priority_map_3):
+		res = priority_map_3[priority][subpriority];
+		break;
+	case RTE_DIM(priority_map_5):
+		res = priority_map_5[priority][subpriority];
+		break;
+	}
+	return  res;
+}
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 5c68d4f7d7..8f94125f26 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1226,6 +1226,9 @@ typedef int (*mlx5_flow_create_def_policy_t)
 			(struct rte_eth_dev *dev);
 typedef void (*mlx5_flow_destroy_def_policy_t)
 			(struct rte_eth_dev *dev);
+typedef int (*mlx5_flow_discover_priorities_t)
+			(struct rte_eth_dev *dev,
+			 const uint16_t *vprio, int vprio_n);
 
 struct mlx5_flow_driver_ops {
 	mlx5_flow_validate_t validate;
@@ -1260,6 +1263,7 @@ struct mlx5_flow_driver_ops {
 	mlx5_flow_action_update_t action_update;
 	mlx5_flow_action_query_t action_query;
 	mlx5_flow_sync_domain_t sync_domain;
+	mlx5_flow_discover_priorities_t discover_priorities;
 };
 
 /* mlx5_flow.c */
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index c6370cd1d6..155745748f 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -17978,6 +17978,108 @@ flow_dv_sync_domain(struct rte_eth_dev *dev, uint32_t domains, uint32_t flags)
 	return 0;
 }
 
+/**
+ * Discover the number of available flow priorities
+ * by trying to create a flow with the highest priority value
+ * for each possible number.
+ *
+ * @param[in] dev
+ *   Ethernet device.
+ * @param[in] vprio
+ *   List of possible number of available priorities.
+ * @param[in] vprio_n
+ *   Size of @p vprio array.
+ * @return
+ *   On success, number of available flow priorities.
+ *   On failure, a negative errno-style code and rte_errno is set.
+ */
+static int
+flow_dv_discover_priorities(struct rte_eth_dev *dev,
+			    const uint16_t *vprio, int vprio_n)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_indexed_pool *pool = priv->sh->ipool[MLX5_IPOOL_MLX5_FLOW];
+	struct rte_flow_item_eth eth;
+	struct rte_flow_item item = {
+		.type = RTE_FLOW_ITEM_TYPE_ETH,
+		.spec = &eth,
+		.mask = &eth,
+	};
+	struct mlx5_flow_dv_matcher matcher = {
+		.mask = {
+			.size = sizeof(matcher.mask.buf),
+		},
+	};
+	union mlx5_flow_tbl_key tbl_key;
+	struct mlx5_flow flow;
+	void *action;
+	struct rte_flow_error error;
+	uint8_t misc_mask;
+	int i, err, ret = -ENOTSUP;
+
+	/*
+	 * Prepare a flow with a catch-all pattern and a drop action.
+	 * Use drop queue, because shared drop action may be unavailable.
+	 */
+	action = priv->drop_queue.hrxq->action;
+	if (action == NULL) {
+		DRV_LOG(ERR, "Priority discovery requires a drop action");
+		rte_errno = ENOTSUP;
+		return -rte_errno;
+	}
+	memset(&flow, 0, sizeof(flow));
+	flow.handle = mlx5_ipool_zmalloc(pool, &flow.handle_idx);
+	if (flow.handle == NULL) {
+		DRV_LOG(ERR, "Cannot create flow handle");
+		rte_errno = ENOMEM;
+		return -rte_errno;
+	}
+	flow.ingress = true;
+	flow.dv.value.size = MLX5_ST_SZ_BYTES(fte_match_param);
+	flow.dv.actions[0] = action;
+	flow.dv.actions_n = 1;
+	memset(&eth, 0, sizeof(eth));
+	flow_dv_translate_item_eth(matcher.mask.buf, flow.dv.value.buf,
+				   &item, /* inner */ false, /* group */ 0);
+	matcher.crc = rte_raw_cksum(matcher.mask.buf, matcher.mask.size);
+	for (i = 0; i < vprio_n; i++) {
+		/* Configure the next proposed maximum priority. */
+		matcher.priority = vprio[i] - 1;
+		memset(&tbl_key, 0, sizeof(tbl_key));
+		err = flow_dv_matcher_register(dev, &matcher, &tbl_key, &flow,
+					       /* tunnel */ NULL,
+					       /* group */ 0,
+					       &error);
+		if (err != 0) {
+			/* This action is pure SW and must always succeed. */
+			DRV_LOG(ERR, "Cannot register matcher");
+			ret = -rte_errno;
+			break;
+		}
+		/* Try to apply the flow to HW. */
+		misc_mask = flow_dv_matcher_enable(flow.dv.value.buf);
+		__flow_dv_adjust_buf_size(&flow.dv.value.size, misc_mask);
+		err = mlx5_flow_os_create_flow
+				(flow.handle->dvh.matcher->matcher_object,
+				 (void *)&flow.dv.value, flow.dv.actions_n,
+				 flow.dv.actions, &flow.handle->drv_flow);
+		if (err == 0) {
+			claim_zero(mlx5_flow_os_destroy_flow
+						(flow.handle->drv_flow));
+			flow.handle->drv_flow = NULL;
+		}
+		claim_zero(flow_dv_matcher_release(dev, flow.handle));
+		if (err != 0)
+			break;
+		ret = vprio[i];
+	}
+	mlx5_ipool_free(pool, flow.handle_idx);
+	/* Set rte_errno if no expected priority value matched. */
+	if (ret < 0)
+		rte_errno = -ret;
+	return ret;
+}
+
 const struct mlx5_flow_driver_ops mlx5_flow_dv_drv_ops = {
 	.validate = flow_dv_validate,
 	.prepare = flow_dv_prepare,
@@ -18011,6 +18113,7 @@ const struct mlx5_flow_driver_ops mlx5_flow_dv_drv_ops = {
 	.action_update = flow_dv_action_update,
 	.action_query = flow_dv_action_query,
 	.sync_domain = flow_dv_sync_domain,
+	.discover_priorities = flow_dv_discover_priorities,
 };
 
 #endif /* HAVE_IBV_FLOW_DV_SUPPORT */
diff --git a/drivers/net/mlx5/mlx5_flow_verbs.c b/drivers/net/mlx5/mlx5_flow_verbs.c
index b93fd4d2c9..72b9db6c7f 100644
--- a/drivers/net/mlx5/mlx5_flow_verbs.c
+++ b/drivers/net/mlx5/mlx5_flow_verbs.c
@@ -28,17 +28,6 @@
 #define VERBS_SPEC_INNER(item_flags) \
 	(!!((item_flags) & MLX5_FLOW_LAYER_TUNNEL) ? IBV_FLOW_SPEC_INNER : 0)
 
-/* Map of Verbs to Flow priority with 8 Verbs priorities. */
-static const uint32_t priority_map_3[][MLX5_PRIORITY_MAP_MAX] = {
-	{ 0, 1, 2 }, { 2, 3, 4 }, { 5, 6, 7 },
-};
-
-/* Map of Verbs to Flow priority with 16 Verbs priorities. */
-static const uint32_t priority_map_5[][MLX5_PRIORITY_MAP_MAX] = {
-	{ 0, 1, 2 }, { 3, 4, 5 }, { 6, 7, 8 },
-	{ 9, 10, 11 }, { 12, 13, 14 },
-};
-
 /* Verbs specification header. */
 struct ibv_spec_header {
 	enum ibv_flow_spec_type type;
@@ -50,13 +39,17 @@ struct ibv_spec_header {
  *
  * @param[in] dev
  *   Pointer to the Ethernet device structure.
- *
+ * @param[in] vprio
+ *   Expected result variants.
+ * @param[in] vprio_n
+ *   Number of entries in @p vprio array.
  * @return
- *   number of supported flow priority on success, a negative errno
+ *   Number of supported flow priority on success, a negative errno
  *   value otherwise and rte_errno is set.
  */
-int
-mlx5_flow_discover_priorities(struct rte_eth_dev *dev)
+static int
+flow_verbs_discover_priorities(struct rte_eth_dev *dev,
+			       const uint16_t *vprio, int vprio_n)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct {
@@ -79,7 +72,6 @@ mlx5_flow_discover_priorities(struct rte_eth_dev *dev)
 	};
 	struct ibv_flow *flow;
 	struct mlx5_hrxq *drop = priv->drop_queue.hrxq;
-	uint16_t vprio[] = { 8, 16 };
 	int i;
 	int priority = 0;
 
@@ -87,7 +79,7 @@ mlx5_flow_discover_priorities(struct rte_eth_dev *dev)
 		rte_errno = ENOTSUP;
 		return -rte_errno;
 	}
-	for (i = 0; i != RTE_DIM(vprio); i++) {
+	for (i = 0; i != vprio_n; i++) {
 		flow_attr.attr.priority = vprio[i] - 1;
 		flow = mlx5_glue->create_flow(drop->qp, &flow_attr.attr);
 		if (!flow)
@@ -95,59 +87,9 @@ mlx5_flow_discover_priorities(struct rte_eth_dev *dev)
 		claim_zero(mlx5_glue->destroy_flow(flow));
 		priority = vprio[i];
 	}
-	switch (priority) {
-	case 8:
-		priority = RTE_DIM(priority_map_3);
-		break;
-	case 16:
-		priority = RTE_DIM(priority_map_5);
-		break;
-	default:
-		rte_errno = ENOTSUP;
-		DRV_LOG(ERR,
-			"port %u verbs maximum priority: %d expected 8/16",
-			dev->data->port_id, priority);
-		return -rte_errno;
-	}
-	DRV_LOG(INFO, "port %u supported flow priorities:"
-		" 0-%d for ingress or egress root table,"
-		" 0-%d for non-root table or transfer root table.",
-		dev->data->port_id, priority - 2,
-		MLX5_NON_ROOT_FLOW_MAX_PRIO - 1);
 	return priority;
 }
 
-/**
- * Adjust flow priority based on the highest layer and the request priority.
- *
- * @param[in] dev
- *   Pointer to the Ethernet device structure.
- * @param[in] priority
- *   The rule base priority.
- * @param[in] subpriority
- *   The priority based on the items.
- *
- * @return
- *   The new priority.
- */
-uint32_t
-mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
-				   uint32_t subpriority)
-{
-	uint32_t res = 0;
-	struct mlx5_priv *priv = dev->data->dev_private;
-
-	switch (priv->config.flow_prio) {
-	case RTE_DIM(priority_map_3):
-		res = priority_map_3[priority][subpriority];
-		break;
-	case RTE_DIM(priority_map_5):
-		res = priority_map_5[priority][subpriority];
-		break;
-	}
-	return  res;
-}
-
 /**
  * Get Verbs flow counter by index.
  *
@@ -2105,4 +2047,5 @@ const struct mlx5_flow_driver_ops mlx5_flow_verbs_drv_ops = {
 	.destroy = flow_verbs_destroy,
 	.query = flow_verbs_query,
 	.sync_domain = flow_verbs_sync_domain,
+	.discover_priorities = flow_verbs_discover_priorities,
 };
-- 
2.25.1
* [dpdk-dev] [PATCH v2 4/5] net/mlx5: create drop queue using DevX
  2021-10-15 16:18 ` [dpdk-dev] [PATCH v2 0/5] Flow entites behavior on port restart Dmitry Kozlyuk
                     ` (2 preceding siblings ...)
  2021-10-15 16:18   ` [dpdk-dev] [PATCH v2 3/5] net/mlx5: discover max flow priority using DevX Dmitry Kozlyuk
@ 2021-10-15 16:18   ` Dmitry Kozlyuk
  2021-10-15 16:18   ` [dpdk-dev] [PATCH v2 5/5] net/mlx5: preserve indirect actions on restart Dmitry Kozlyuk
  2021-10-19 12:37   ` [dpdk-dev] [PATCH v3 0/6] Flow entites behavior on port restart Dmitry Kozlyuk
  5 siblings, 0 replies; 96+ messages in thread
From: Dmitry Kozlyuk @ 2021-10-15 16:18 UTC (permalink / raw)
  To: dev; +Cc: stable, Matan Azrad, Viacheslav Ovsiienko
Drop queue creation and destruction were not implemented for DevX
flow engine and Verbs engine methods were used as a workaround.
Implement these methods for DevX so that there is a valid queue ID
that can be used regardless of queue configuration via API.
Cc: stable@dpdk.org
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 drivers/net/mlx5/linux/mlx5_os.c |   4 -
 drivers/net/mlx5/mlx5_devx.c     | 211 ++++++++++++++++++++++++++-----
 2 files changed, 180 insertions(+), 35 deletions(-)
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 8ee7ada51b..985f0bd489 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1790,10 +1790,6 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 	}
 	if (config->devx && config->dv_flow_en && config->dest_tir) {
 		priv->obj_ops = devx_obj_ops;
-		priv->obj_ops.drop_action_create =
-						ibv_obj_ops.drop_action_create;
-		priv->obj_ops.drop_action_destroy =
-						ibv_obj_ops.drop_action_destroy;
 #ifndef HAVE_MLX5DV_DEVX_UAR_OFFSET
 		priv->obj_ops.txq_obj_modify = ibv_obj_ops.txq_obj_modify;
 #else
diff --git a/drivers/net/mlx5/mlx5_devx.c b/drivers/net/mlx5/mlx5_devx.c
index a1db53577a..1e62108c94 100644
--- a/drivers/net/mlx5/mlx5_devx.c
+++ b/drivers/net/mlx5/mlx5_devx.c
@@ -226,17 +226,17 @@ mlx5_rx_devx_get_event(struct mlx5_rxq_obj *rxq_obj)
  *
  * @param dev
  *   Pointer to Ethernet device.
- * @param idx
- *   Queue index in DPDK Rx queue array.
+ * @param rxq_data
+ *   RX queue data.
  *
  * @return
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-mlx5_rxq_create_devx_rq_resources(struct rte_eth_dev *dev, uint16_t idx)
+mlx5_rxq_create_devx_rq_resources(struct rte_eth_dev *dev,
+				  struct mlx5_rxq_data *rxq_data)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_rxq_data *rxq_data = (*priv->rxqs)[idx];
 	struct mlx5_rxq_ctrl *rxq_ctrl =
 		container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
 	struct mlx5_devx_create_rq_attr rq_attr = { 0 };
@@ -289,20 +289,20 @@ mlx5_rxq_create_devx_rq_resources(struct rte_eth_dev *dev, uint16_t idx)
  *
  * @param dev
  *   Pointer to Ethernet device.
- * @param idx
- *   Queue index in DPDK Rx queue array.
+ * @param rxq_data
+ *   RX queue data.
  *
  * @return
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-mlx5_rxq_create_devx_cq_resources(struct rte_eth_dev *dev, uint16_t idx)
+mlx5_rxq_create_devx_cq_resources(struct rte_eth_dev *dev,
+				  struct mlx5_rxq_data *rxq_data)
 {
 	struct mlx5_devx_cq *cq_obj = 0;
 	struct mlx5_devx_cq_attr cq_attr = { 0 };
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_dev_ctx_shared *sh = priv->sh;
-	struct mlx5_rxq_data *rxq_data = (*priv->rxqs)[idx];
 	struct mlx5_rxq_ctrl *rxq_ctrl =
 		container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
 	unsigned int cqe_n = mlx5_rxq_cqe_num(rxq_data);
@@ -497,13 +497,13 @@ mlx5_rxq_devx_obj_new(struct rte_eth_dev *dev, uint16_t idx)
 		tmpl->fd = mlx5_os_get_devx_channel_fd(tmpl->devx_channel);
 	}
 	/* Create CQ using DevX API. */
-	ret = mlx5_rxq_create_devx_cq_resources(dev, idx);
+	ret = mlx5_rxq_create_devx_cq_resources(dev, rxq_data);
 	if (ret) {
 		DRV_LOG(ERR, "Failed to create CQ.");
 		goto error;
 	}
 	/* Create RQ using DevX API. */
-	ret = mlx5_rxq_create_devx_rq_resources(dev, idx);
+	ret = mlx5_rxq_create_devx_rq_resources(dev, rxq_data);
 	if (ret) {
 		DRV_LOG(ERR, "Port %u Rx queue %u RQ creation failure.",
 			dev->data->port_id, idx);
@@ -536,6 +536,11 @@ mlx5_rxq_devx_obj_new(struct rte_eth_dev *dev, uint16_t idx)
  *   Pointer to Ethernet device.
  * @param log_n
  *   Log of number of queues in the array.
+ * @param queues
+ *   List of RX queue indices or NULL, in which case
+ *   the attribute will be filled by drop queue ID.
+ * @param queues_n
+ *   Size of @p queues array or 0 if it is NULL.
  * @param ind_tbl
  *   DevX indirection table object.
  *
@@ -563,6 +568,11 @@ mlx5_devx_ind_table_create_rqt_attr(struct rte_eth_dev *dev,
 	}
 	rqt_attr->rqt_max_size = priv->config.ind_table_max_size;
 	rqt_attr->rqt_actual_size = rqt_n;
+	if (queues == NULL) {
+		for (i = 0; i < rqt_n; i++)
+			rqt_attr->rq_list[i] = priv->drop_queue.rxq->rq->id;
+		return rqt_attr;
+	}
 	for (i = 0; i != queues_n; ++i) {
 		struct mlx5_rxq_data *rxq = (*priv->rxqs)[queues[i]];
 		struct mlx5_rxq_ctrl *rxq_ctrl =
@@ -595,11 +605,12 @@ mlx5_devx_ind_table_new(struct rte_eth_dev *dev, const unsigned int log_n,
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_devx_rqt_attr *rqt_attr = NULL;
+	const uint16_t *queues = dev->data->dev_started ? ind_tbl->queues :
+							  NULL;
 
 	MLX5_ASSERT(ind_tbl);
-	rqt_attr = mlx5_devx_ind_table_create_rqt_attr(dev, log_n,
-							ind_tbl->queues,
-							ind_tbl->queues_n);
+	rqt_attr = mlx5_devx_ind_table_create_rqt_attr(dev, log_n, queues,
+						       ind_tbl->queues_n);
 	if (!rqt_attr)
 		return -rte_errno;
 	ind_tbl->rqt = mlx5_devx_cmd_create_rqt(priv->sh->ctx, rqt_attr);
@@ -670,7 +681,8 @@ mlx5_devx_ind_table_destroy(struct mlx5_ind_table_obj *ind_tbl)
  * @param[in] hash_fields
  *   Verbs protocol hash field to make the RSS on.
  * @param[in] ind_tbl
- *   Indirection table for TIR.
+ *   Indirection table for TIR. If table queues array is NULL,
+ *   a TIR for drop queue is assumed.
  * @param[in] tunnel
  *   Tunnel type.
  * @param[out] tir_attr
@@ -686,19 +698,27 @@ mlx5_devx_tir_attr_set(struct rte_eth_dev *dev, const uint8_t *rss_key,
 		       int tunnel, struct mlx5_devx_tir_attr *tir_attr)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_rxq_data *rxq_data = (*priv->rxqs)[ind_tbl->queues[0]];
-	struct mlx5_rxq_ctrl *rxq_ctrl =
-		container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
-	enum mlx5_rxq_type rxq_obj_type = rxq_ctrl->type;
+	enum mlx5_rxq_type rxq_obj_type;
 	bool lro = true;
 	uint32_t i;
 
-	/* Enable TIR LRO only if all the queues were configured for. */
-	for (i = 0; i < ind_tbl->queues_n; ++i) {
-		if (!(*priv->rxqs)[ind_tbl->queues[i]]->lro) {
-			lro = false;
-			break;
+	/* NULL queues designate drop queue. */
+	if (ind_tbl->queues != NULL) {
+		struct mlx5_rxq_data *rxq_data =
+					(*priv->rxqs)[ind_tbl->queues[0]];
+		struct mlx5_rxq_ctrl *rxq_ctrl =
+			container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
+		rxq_obj_type = rxq_ctrl->type;
+
+		/* Enable TIR LRO only if all the queues were configured for. */
+		for (i = 0; i < ind_tbl->queues_n; ++i) {
+			if (!(*priv->rxqs)[ind_tbl->queues[i]]->lro) {
+				lro = false;
+				break;
+			}
 		}
+	} else {
+		rxq_obj_type = priv->drop_queue.rxq->rxq_ctrl->type;
 	}
 	memset(tir_attr, 0, sizeof(*tir_attr));
 	tir_attr->disp_type = MLX5_TIRC_DISP_TYPE_INDIRECT;
@@ -857,7 +877,7 @@ mlx5_devx_hrxq_modify(struct rte_eth_dev *dev, struct mlx5_hrxq *hrxq,
 }
 
 /**
- * Create a DevX drop action for Rx Hash queue.
+ * Create a DevX drop Rx queue.
  *
  * @param dev
  *   Pointer to Ethernet device.
@@ -866,14 +886,99 @@ mlx5_devx_hrxq_modify(struct rte_eth_dev *dev, struct mlx5_hrxq *hrxq,
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-mlx5_devx_drop_action_create(struct rte_eth_dev *dev)
+mlx5_rxq_devx_obj_drop_create(struct rte_eth_dev *dev)
 {
-	(void)dev;
-	DRV_LOG(ERR, "DevX drop action is not supported yet.");
-	rte_errno = ENOTSUP;
+	struct mlx5_priv *priv = dev->data->dev_private;
+	int socket_id = dev->device->numa_node;
+	struct mlx5_rxq_ctrl *rxq_ctrl;
+	struct mlx5_rxq_data *rxq_data;
+	struct mlx5_rxq_obj *rxq = NULL;
+	int ret;
+
+	/*
+	 * Initialize dummy control structures.
+	 * They are required to hold pointers for cleanup
+	 * and are only accessible via drop queue DevX objects.
+	 */
+	rxq_ctrl = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*rxq_ctrl),
+			       0, socket_id);
+	if (rxq_ctrl == NULL) {
+		DRV_LOG(ERR, "Port %u could not allocate drop queue control",
+			dev->data->port_id);
+		rte_errno = ENOMEM;
+		goto error;
+	}
+	rxq = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*rxq), 0, socket_id);
+	if (rxq == NULL) {
+		DRV_LOG(ERR, "Port %u could not allocate drop queue object",
+			dev->data->port_id);
+		rte_errno = ENOMEM;
+		goto error;
+	}
+	rxq->rxq_ctrl = rxq_ctrl;
+	rxq_ctrl->type = MLX5_RXQ_TYPE_STANDARD;
+	rxq_ctrl->priv = priv;
+	rxq_ctrl->obj = rxq;
+	rxq_data = &rxq_ctrl->rxq;
+	/* Create CQ using DevX API. */
+	ret = mlx5_rxq_create_devx_cq_resources(dev, rxq_data);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Port %u drop queue CQ creation failed.",
+			dev->data->port_id);
+		goto error;
+	}
+	/* Create RQ using DevX API. */
+	ret = mlx5_rxq_create_devx_rq_resources(dev, rxq_data);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Port %u drop queue RQ creation failed.",
+			dev->data->port_id);
+		rte_errno = ENOMEM;
+		goto error;
+	}
+	/* Change queue state to ready. */
+	ret = mlx5_devx_modify_rq(rxq, MLX5_RXQ_MOD_RST2RDY);
+	if (ret != 0)
+		goto error;
+	/* Initialize drop queue. */
+	priv->drop_queue.rxq = rxq;
+	return 0;
+error:
+	ret = rte_errno; /* Save rte_errno before cleanup. */
+	if (rxq != NULL) {
+		if (rxq->rq_obj.rq != NULL)
+			mlx5_devx_rq_destroy(&rxq->rq_obj);
+		if (rxq->cq_obj.cq != NULL)
+			mlx5_devx_cq_destroy(&rxq->cq_obj);
+		if (rxq->devx_channel)
+			mlx5_os_devx_destroy_event_channel
+							(rxq->devx_channel);
+		mlx5_free(rxq);
+	}
+	if (rxq_ctrl != NULL)
+		mlx5_free(rxq_ctrl);
+	rte_errno = ret; /* Restore rte_errno. */
 	return -rte_errno;
 }
 
+/**
+ * Release drop Rx queue resources.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ */
+static void
+mlx5_rxq_devx_obj_drop_release(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_rxq_obj *rxq = priv->drop_queue.rxq;
+	struct mlx5_rxq_ctrl *rxq_ctrl = rxq->rxq_ctrl;
+
+	mlx5_rxq_devx_obj_release(rxq);
+	mlx5_free(rxq);
+	mlx5_free(rxq_ctrl);
+	priv->drop_queue.rxq = NULL;
+}
+
 /**
  * Release a drop hash Rx queue.
  *
@@ -883,9 +988,53 @@ mlx5_devx_drop_action_create(struct rte_eth_dev *dev)
 static void
 mlx5_devx_drop_action_destroy(struct rte_eth_dev *dev)
 {
-	(void)dev;
-	DRV_LOG(ERR, "DevX drop action is not supported yet.");
-	rte_errno = ENOTSUP;
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_hrxq *hrxq = priv->drop_queue.hrxq;
+
+	if (hrxq->tir != NULL)
+		mlx5_devx_tir_destroy(hrxq);
+	if (hrxq->ind_table->ind_table != NULL)
+		mlx5_devx_ind_table_destroy(hrxq->ind_table);
+	if (priv->drop_queue.rxq->rq != NULL)
+		mlx5_rxq_devx_obj_drop_release(dev);
+}
+
+/**
+ * Create a DevX drop action for Rx Hash queue.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_devx_drop_action_create(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_hrxq *hrxq = priv->drop_queue.hrxq;
+	int ret;
+
+	ret = mlx5_rxq_devx_obj_drop_create(dev);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Cannot create drop RX queue");
+		return ret;
+	}
+	/* hrxq->ind_table queues are NULL, drop RX queue ID will be used */
+	ret = mlx5_devx_ind_table_new(dev, 0, hrxq->ind_table);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Cannot create drop hash RX queue indirection table");
+		goto error;
+	}
+	ret = mlx5_devx_hrxq_new(dev, hrxq, /* tunnel */ false);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Cannot create drop hash RX queue");
+		goto error;
+	}
+	return 0;
+error:
+	mlx5_devx_drop_action_destroy(dev);
+	return ret;
 }
 
 /**
-- 
2.25.1
* [dpdk-dev] [PATCH v2 5/5] net/mlx5: preserve indirect actions on restart
  2021-10-15 16:18 ` [dpdk-dev] [PATCH v2 0/5] Flow entites behavior on port restart Dmitry Kozlyuk
                     ` (3 preceding siblings ...)
  2021-10-15 16:18   ` [dpdk-dev] [PATCH v2 4/5] net/mlx5: create drop queue " Dmitry Kozlyuk
@ 2021-10-15 16:18   ` Dmitry Kozlyuk
  2021-10-19 12:37   ` [dpdk-dev] [PATCH v3 0/6] Flow entites behavior on port restart Dmitry Kozlyuk
  5 siblings, 0 replies; 96+ messages in thread
From: Dmitry Kozlyuk @ 2021-10-15 16:18 UTC (permalink / raw)
  To: dev; +Cc: bingz, stable, Matan Azrad, Viacheslav Ovsiienko
MLX5 PMD uses reference counting to manage RX queue resources.
After port stop, shared RSS actions kept references to RX queues,
preventing resource release. As a result, the internal PMD mempool
for such queues was exhausted after a number of port restarts.
Diagnostic message from rte_eth_dev_start():
    Rx queue allocation failed: Cannot allocate memory
Dereference RX queues used by indirect actions on port stop (detach)
and restore references on port start (attach) in order to allow RX queue
resource release, but keep indirect RSS across the port restart.
Replace queue IDs in HW by drop queue ID on detach and restore actual
queue IDs on attach.
When the port is stopped, create indirect RSS in the detached state.
As a result, MLX5 PMD is able to keep all its indirect actions
across port restart. Advertise this capability.
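
The detach/attach idea can be modelled with plain reference counters.
The sketch below is an illustration only: all `sketch_*` names are
hypothetical and do not correspond to driver structures, and the
replacement of queue IDs in HW by the drop queue ID is not modelled.

```c
#include <stdint.h>

#define SKETCH_MAX_QUEUES 8

/* Hypothetical stand-in for an Rx queue with a reference counter. */
struct sketch_rxq {
	int refcnt;
};

/* Hypothetical stand-in for an indirect RSS action. */
struct sketch_shared_rss {
	uint16_t queues[SKETCH_MAX_QUEUES]; /* queue list, kept while detached */
	uint32_t queues_n;
	int attached; /* non-zero while queue references are held */
};

/* Port start: take a reference on every queue the action uses. */
static void
sketch_rss_attach(struct sketch_rxq *rxqs, struct sketch_shared_rss *rss)
{
	uint32_t i;

	if (rss->attached)
		return;
	for (i = 0; i < rss->queues_n; i++)
		rxqs[rss->queues[i]].refcnt++;
	rss->attached = 1;
}

/* Port stop: drop the references but remember the queue list. */
static void
sketch_rss_detach(struct sketch_rxq *rxqs, struct sketch_shared_rss *rss)
{
	uint32_t i;

	if (!rss->attached)
		return;
	for (i = 0; i < rss->queues_n; i++)
		rxqs[rss->queues[i]].refcnt--;
	rss->attached = 0;
}
```

With the counters back at zero after detach, queue resources can be
released on port stop, while the action object and its queue list
remain available for the next attach.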
Fixes: 4b61b8774be9 ("ethdev: introduce indirect flow action")
Cc: bingz@nvidia.com
Cc: stable@dpdk.org
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 drivers/net/mlx5/mlx5_ethdev.c  |   1 +
 drivers/net/mlx5/mlx5_flow.c    | 194 ++++++++++++++++++++++++++++----
 drivers/net/mlx5/mlx5_flow.h    |   2 +
 drivers/net/mlx5/mlx5_rx.h      |   4 +
 drivers/net/mlx5/mlx5_rxq.c     |  99 ++++++++++++++--
 drivers/net/mlx5/mlx5_trigger.c |  10 ++
 6 files changed, 276 insertions(+), 34 deletions(-)
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 82e2284d98..419fec3e4e 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -321,6 +321,7 @@ mlx5_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info)
 	info->rx_offload_capa = (mlx5_get_rx_port_offloads() |
 				 info->rx_queue_offload_capa);
 	info->tx_offload_capa = mlx5_get_tx_port_offloads(dev);
+	info->dev_capa = RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP;
 	info->if_index = mlx5_ifindex(dev);
 	info->reta_size = priv->reta_idx_n ?
 		priv->reta_idx_n : config->ind_table_max_size;
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index bfc3e20c9a..c10b911259 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -1560,6 +1560,58 @@ mlx5_flow_validate_action_queue(const struct rte_flow_action *action,
 	return 0;
 }
 
+/**
+ * Validate queue numbers for device RSS.
+ *
+ * @param[in] dev
+ *   Configured device.
+ * @param[in] queues
+ *   Array of queue numbers.
+ * @param[in] queues_n
+ *   Size of the @p queues array.
+ * @param[out] error
+ *   On error, filled with a textual error description.
+ * @param[out] queue
+ *   On error, filled with an offending queue index in @p queues array.
+ *
+ * @return
+ *   0 on success, a negative errno code on error.
+ */
+static int
+mlx5_validate_rss_queues(const struct rte_eth_dev *dev,
+			 const uint16_t *queues, uint32_t queues_n,
+			 const char **error, uint32_t *queue_idx)
+{
+	const struct mlx5_priv *priv = dev->data->dev_private;
+	enum mlx5_rxq_type rxq_type = MLX5_RXQ_TYPE_UNDEFINED;
+	uint32_t i;
+
+	for (i = 0; i != queues_n; ++i) {
+		struct mlx5_rxq_ctrl *rxq_ctrl;
+
+		if (queues[i] >= priv->rxqs_n) {
+			*error = "queue index out of range";
+			*queue_idx = i;
+			return -EINVAL;
+		}
+		if (!(*priv->rxqs)[queues[i]]) {
+			*error =  "queue is not configured";
+			*queue_idx = i;
+			return -EINVAL;
+		}
+		rxq_ctrl = container_of((*priv->rxqs)[queues[i]],
+					struct mlx5_rxq_ctrl, rxq);
+		if (i == 0)
+			rxq_type = rxq_ctrl->type;
+		if (rxq_type != rxq_ctrl->type) {
+			*error = "combining hairpin and regular RSS queues is not supported";
+			*queue_idx = i;
+			return -ENOTSUP;
+		}
+	}
+	return 0;
+}
+
 /*
  * Validate the rss action.
  *
@@ -1580,8 +1632,9 @@ mlx5_validate_action_rss(struct rte_eth_dev *dev,
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	const struct rte_flow_action_rss *rss = action->conf;
-	enum mlx5_rxq_type rxq_type = MLX5_RXQ_TYPE_UNDEFINED;
-	unsigned int i;
+	int ret;
+	const char *message;
+	uint32_t queue_idx;
 
 	if (rss->func != RTE_ETH_HASH_FUNCTION_DEFAULT &&
 	    rss->func != RTE_ETH_HASH_FUNCTION_TOEPLITZ)
@@ -1645,27 +1698,12 @@ mlx5_validate_action_rss(struct rte_eth_dev *dev,
 		return rte_flow_error_set(error, EINVAL,
 					  RTE_FLOW_ERROR_TYPE_ACTION_CONF,
 					  NULL, "No queues configured");
-	for (i = 0; i != rss->queue_num; ++i) {
-		struct mlx5_rxq_ctrl *rxq_ctrl;
-
-		if (rss->queue[i] >= priv->rxqs_n)
-			return rte_flow_error_set
-				(error, EINVAL,
-				 RTE_FLOW_ERROR_TYPE_ACTION_CONF,
-				 &rss->queue[i], "queue index out of range");
-		if (!(*priv->rxqs)[rss->queue[i]])
-			return rte_flow_error_set
-				(error, EINVAL, RTE_FLOW_ERROR_TYPE_ACTION_CONF,
-				 &rss->queue[i], "queue is not configured");
-		rxq_ctrl = container_of((*priv->rxqs)[rss->queue[i]],
-					struct mlx5_rxq_ctrl, rxq);
-		if (i == 0)
-			rxq_type = rxq_ctrl->type;
-		if (rxq_type != rxq_ctrl->type)
-			return rte_flow_error_set
-				(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ACTION_CONF,
-				 &rss->queue[i],
-				 "combining hairpin and regular RSS queues is not supported");
+	ret = mlx5_validate_rss_queues(dev, rss->queue, rss->queue_num,
+				       &message, &queue_idx);
+	if (ret != 0) {
+		return rte_flow_error_set(error, -ret,
+					  RTE_FLOW_ERROR_TYPE_ACTION_CONF,
+					  &rss->queue[queue_idx], message);
 	}
 	return 0;
 }
@@ -8547,6 +8585,116 @@ mlx5_action_handle_flush(struct rte_eth_dev *dev)
 	return ret;
 }
 
+/**
+ * Validate existing indirect actions against current device configuration
+ * and attach them to device resources.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_action_handle_attach(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_indexed_pool *ipool =
+			priv->sh->ipool[MLX5_IPOOL_RSS_SHARED_ACTIONS];
+	struct mlx5_shared_action_rss *shared_rss, *shared_rss_last;
+	int ret = 0;
+	uint32_t idx;
+
+	ILIST_FOREACH(ipool, priv->rss_shared_actions, idx, shared_rss, next) {
+		struct mlx5_ind_table_obj *ind_tbl = shared_rss->ind_tbl;
+		const char *message;
+		uint32_t queue_idx;
+
+		ret = mlx5_validate_rss_queues(dev, ind_tbl->queues,
+					       ind_tbl->queues_n,
+					       &message, &queue_idx);
+		if (ret != 0) {
+			DRV_LOG(ERR, "Port %u cannot use queue %u in RSS: %s",
+				dev->data->port_id, ind_tbl->queues[queue_idx],
+				message);
+			break;
+		}
+	}
+	if (ret != 0)
+		return ret;
+	ILIST_FOREACH(ipool, priv->rss_shared_actions, idx, shared_rss, next) {
+		struct mlx5_ind_table_obj *ind_tbl = shared_rss->ind_tbl;
+
+		ret = mlx5_ind_table_obj_attach(dev, ind_tbl);
+		if (ret != 0) {
+			DRV_LOG(ERR, "Port %u could not attach "
+				"indirection table obj %p",
+				dev->data->port_id, (void *)ind_tbl);
+			goto error;
+		}
+	}
+	return 0;
+error:
+	shared_rss_last = shared_rss;
+	ILIST_FOREACH(ipool, priv->rss_shared_actions, idx, shared_rss, next) {
+		struct mlx5_ind_table_obj *ind_tbl = shared_rss->ind_tbl;
+
+		if (shared_rss == shared_rss_last)
+			break;
+		if (mlx5_ind_table_obj_detach(dev, ind_tbl) != 0)
+			DRV_LOG(CRIT, "Port %u could not detach "
+				"indirection table obj %p on rollback",
+				dev->data->port_id, (void *)ind_tbl);
+	}
+	return ret;
+}
+
+/**
+ * Detach indirect actions of the device from its resources.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_action_handle_detach(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_indexed_pool *ipool =
+			priv->sh->ipool[MLX5_IPOOL_RSS_SHARED_ACTIONS];
+	struct mlx5_shared_action_rss *shared_rss, *shared_rss_last;
+	int ret = 0;
+	uint32_t idx;
+
+	ILIST_FOREACH(ipool, priv->rss_shared_actions, idx, shared_rss, next) {
+		struct mlx5_ind_table_obj *ind_tbl = shared_rss->ind_tbl;
+
+		ret = mlx5_ind_table_obj_detach(dev, ind_tbl);
+		if (ret != 0) {
+			DRV_LOG(ERR, "Port %u could not detach "
+				"indirection table obj %p",
+				dev->data->port_id, (void *)ind_tbl);
+			goto error;
+		}
+	}
+	return 0;
+error:
+	shared_rss_last = shared_rss;
+	ILIST_FOREACH(ipool, priv->rss_shared_actions, idx, shared_rss, next) {
+		struct mlx5_ind_table_obj *ind_tbl = shared_rss->ind_tbl;
+
+		if (shared_rss == shared_rss_last)
+			break;
+		if (mlx5_ind_table_obj_attach(dev, ind_tbl) != 0)
+			DRV_LOG(CRIT, "Port %u could not attach "
+				"indirection table obj %p on rollback",
+				dev->data->port_id, (void *)ind_tbl);
+	}
+	return ret;
+}
+
 #ifndef HAVE_MLX5DV_DR
 #define MLX5_DOMAIN_SYNC_FLOW ((1 << 0) | (1 << 1))
 #else
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 8f94125f26..6bc7946cc3 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1574,6 +1574,8 @@ void mlx5_flow_destroy_sub_policy_with_rxq(struct rte_eth_dev *dev,
 		struct mlx5_flow_meter_policy *mtr_policy);
 int mlx5_flow_dv_discover_counter_offset_support(struct rte_eth_dev *dev);
 int mlx5_flow_discover_dr_action_support(struct rte_eth_dev *dev);
+int mlx5_action_handle_attach(struct rte_eth_dev *dev);
+int mlx5_action_handle_detach(struct rte_eth_dev *dev);
 int mlx5_action_handle_flush(struct rte_eth_dev *dev);
 void mlx5_release_tunnel_hub(struct mlx5_dev_ctx_shared *sh, uint16_t port_id);
 int mlx5_alloc_tunnel_hub(struct mlx5_dev_ctx_shared *sh);
diff --git a/drivers/net/mlx5/mlx5_rx.h b/drivers/net/mlx5/mlx5_rx.h
index 2b7ad3e48b..d44c8078de 100644
--- a/drivers/net/mlx5/mlx5_rx.h
+++ b/drivers/net/mlx5/mlx5_rx.h
@@ -222,6 +222,10 @@ int mlx5_ind_table_obj_modify(struct rte_eth_dev *dev,
 			      struct mlx5_ind_table_obj *ind_tbl,
 			      uint16_t *queues, const uint32_t queues_n,
 			      bool standalone);
+int mlx5_ind_table_obj_attach(struct rte_eth_dev *dev,
+			      struct mlx5_ind_table_obj *ind_tbl);
+int mlx5_ind_table_obj_detach(struct rte_eth_dev *dev,
+			      struct mlx5_ind_table_obj *ind_tbl);
 struct mlx5_list_entry *mlx5_hrxq_create_cb(void *tool_ctx, void *cb_ctx);
 int mlx5_hrxq_match_cb(void *tool_ctx, struct mlx5_list_entry *entry,
 		       void *cb_ctx);
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index b68443bed5..fd2b5779ff 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -2015,6 +2015,26 @@ mlx5_ind_table_obj_new(struct rte_eth_dev *dev, const uint16_t *queues,
 	return ind_tbl;
 }
 
+static int
+mlx5_ind_table_obj_check_standalone(struct rte_eth_dev *dev __rte_unused,
+				    struct mlx5_ind_table_obj *ind_tbl)
+{
+	uint32_t refcnt;
+
+	refcnt = __atomic_load_n(&ind_tbl->refcnt, __ATOMIC_RELAXED);
+	if (refcnt <= 1)
+		return 0;
+	/*
+	 * Modification of indirection tables having more than 1
+	 * reference is unsupported.
+	 */
+	DRV_LOG(DEBUG,
+		"Port %u cannot modify indirection table %p (refcnt %u > 1).",
+		dev->data->port_id, (void *)ind_tbl, refcnt);
+	rte_errno = EINVAL;
+	return -rte_errno;
+}
+
 /**
  * Modify an indirection table.
  *
@@ -2047,18 +2067,8 @@ mlx5_ind_table_obj_modify(struct rte_eth_dev *dev,
 
 	MLX5_ASSERT(standalone);
 	RTE_SET_USED(standalone);
-	if (__atomic_load_n(&ind_tbl->refcnt, __ATOMIC_RELAXED) > 1) {
-		/*
-		 * Modification of indirection ntables having more than 1
-		 * reference unsupported. Intended for standalone indirection
-		 * tables only.
-		 */
-		DRV_LOG(DEBUG,
-			"Port %u cannot modify indirection table (refcnt> 1).",
-			dev->data->port_id);
-		rte_errno = EINVAL;
+	if (mlx5_ind_table_obj_check_standalone(dev, ind_tbl) < 0)
 		return -rte_errno;
-	}
 	for (i = 0; i != queues_n; ++i) {
 		if (!mlx5_rxq_get(dev, queues[i])) {
 			ret = -rte_errno;
@@ -2084,6 +2094,73 @@ mlx5_ind_table_obj_modify(struct rte_eth_dev *dev,
 	return ret;
 }
 
+/**
+ * Attach an indirection table to its queues.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param ind_tbl
+ *   Indirection table to attach.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_ind_table_obj_attach(struct rte_eth_dev *dev,
+			  struct mlx5_ind_table_obj *ind_tbl)
+{
+	unsigned int i;
+	int ret;
+
+	ret = mlx5_ind_table_obj_modify(dev, ind_tbl, ind_tbl->queues,
+					ind_tbl->queues_n, true);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Port %u could not modify indirect table obj %p",
+			dev->data->port_id, (void *)ind_tbl);
+		return ret;
+	}
+	for (i = 0; i < ind_tbl->queues_n; i++)
+		mlx5_rxq_get(dev, ind_tbl->queues[i]);
+	return 0;
+}
+
+/**
+ * Detach an indirection table from its queues.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param ind_tbl
+ *   Indirection table to detach.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_ind_table_obj_detach(struct rte_eth_dev *dev,
+			  struct mlx5_ind_table_obj *ind_tbl)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	const unsigned int n = rte_is_power_of_2(ind_tbl->queues_n) ?
+			       log2above(ind_tbl->queues_n) :
+			       log2above(priv->config.ind_table_max_size);
+	unsigned int i;
+	int ret;
+
+	ret = mlx5_ind_table_obj_check_standalone(dev, ind_tbl);
+	if (ret != 0)
+		return ret;
+	MLX5_ASSERT(priv->obj_ops.ind_table_modify);
+	ret = priv->obj_ops.ind_table_modify(dev, n, NULL, 0, ind_tbl);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Port %u could not modify indirect table obj %p",
+			dev->data->port_id, (void *)ind_tbl);
+		return ret;
+	}
+	for (i = 0; i < ind_tbl->queues_n; i++)
+		mlx5_rxq_release(dev, ind_tbl->queues[i]);
+	return ret;
+}
+
 int
 mlx5_hrxq_match_cb(void *tool_ctx __rte_unused, struct mlx5_list_entry *entry,
 		   void *cb_ctx)
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index 54173bfacb..c3adf5082e 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -14,6 +14,7 @@
 #include <mlx5_malloc.h>
 
 #include "mlx5.h"
+#include "mlx5_flow.h"
 #include "mlx5_mr.h"
 #include "mlx5_rx.h"
 #include "mlx5_tx.h"
@@ -1113,6 +1114,14 @@ mlx5_dev_start(struct rte_eth_dev *dev)
 	mlx5_rxq_timestamp_set(dev);
 	/* Set a mask and offset of scheduling on timestamp into Tx queues. */
 	mlx5_txq_dynf_timestamp_set(dev);
+	/* Attach indirection table objects detached on port stop. */
+	ret = mlx5_action_handle_attach(dev);
+	if (ret) {
+		DRV_LOG(ERR,
+			"port %u failed to attach indirect actions: %s",
+			dev->data->port_id, rte_strerror(rte_errno));
+		goto error;
+	}
 	/*
 	 * In non-cached mode, it only needs to start the default mreg copy
 	 * action and no flow created by application exists anymore.
@@ -1185,6 +1194,7 @@ mlx5_dev_stop(struct rte_eth_dev *dev)
 	/* All RX queue flags will be cleared in the flush interface. */
 	mlx5_flow_list_flush(dev, MLX5_FLOW_TYPE_GEN, true);
 	mlx5_flow_meter_rxq_flush(dev);
+	mlx5_action_handle_detach(dev);
 	mlx5_rx_intr_vec_disable(dev);
 	priv->sh->port[priv->dev_port - 1].ih_port_id = RTE_MAX_ETHPORTS;
 	priv->sh->port[priv->dev_port - 1].devx_ih_port_id = RTE_MAX_ETHPORTS;
-- 
2.25.1
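The attach path in the patch above attaches each indirection table in turn and,
if one fails, detaches the ones already attached. A minimal standalone sketch of
that rollback pattern follows. It is not driver code: struct demo_table,
demo_attach(), demo_detach() and demo_attach_all() are hypothetical names used
only for illustration, and the failure is simulated with a flag.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an indirection table object. */
struct demo_table {
	int attached;    /* 1 while attached to its queues */
	int fail_attach; /* set to 1 to simulate an attach failure */
};

static int
demo_attach(struct demo_table *t)
{
	if (t->fail_attach)
		return -1;
	t->attached = 1;
	return 0;
}

static void
demo_detach(struct demo_table *t)
{
	t->attached = 0;
}

/*
 * Attach every table. If one attach fails, detach the tables that were
 * attached before it and return the error, so all tables end up detached.
 */
static int
demo_attach_all(struct demo_table *tables, size_t n)
{
	size_t i, j;
	int ret;

	assert(tables != NULL || n == 0);
	for (i = 0; i < n; i++) {
		ret = demo_attach(&tables[i]);
		if (ret != 0) {
			for (j = 0; j < i; j++)
				demo_detach(&tables[j]);
			return ret;
		}
	}
	return 0;
}
```

The point of the pattern is that a failed port start leaves no table half
attached, so a later start attempt begins from the same state.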
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [dpdk-dev] [PATCH 2/5] ethdev: add capability to keep shared objects on restart
  2021-10-15 12:35           ` Dmitry Kozlyuk
@ 2021-10-15 16:26             ` Ferruh Yigit
  2021-10-16 20:32               ` Dmitry Kozlyuk
  0 siblings, 1 reply; 96+ messages in thread
From: Ferruh Yigit @ 2021-10-15 16:26 UTC (permalink / raw)
  To: Dmitry Kozlyuk, dev, Andrew Rybchenko, Ori Kam, Raslan Darawsheh
  Cc: NBU-Contact-Thomas Monjalon, Qi Zhang, jerinj, Maxime Coquelin
On 10/15/2021 1:35 PM, Dmitry Kozlyuk wrote:
>> -----Original Message-----
>> From: Ferruh Yigit <ferruh.yigit@intel.com>
>> [...]
>>> Introducing UNKNOWN state seems wrong to me.
>>> What should an application do when it is reported?
>>> Now there's just no way to learn how the PMD behaves,
>>> but if it provides a response, it can't be "I don't know what I do".
>>>
>>
>> I agree 'unknown' state is not ideal, but my intention is to prevent
>> drivers that have not implemented this new feature from reporting a wrong capability.
>>
>> Without capability, application already doesn't know how underlying
>> PMD behaves, so this is by default 'unknown' state.
>> I suggest keeping that state until driver explicitly updates its state
>> to the correct value.
> 
> My concern is that when all the drivers are changed to report a proper
> capability, UNKNOWN remains in the API meaning "there's a bug in DPDK".
> 
When all drivers are changed, of course we can remove the 'unknown' flag.
> Instead of UNKNOWN response we can declare that rte_flow_flush()
> must be called unless the application wants to keep the rules
> and has made sure it's possible, or the behavior is undefined.
> (Can be viewed as "UNKNOWN by default", but is simpler.)
> This way neither UNKNOWN state is needed,
> nor the bit saying the flow rules are flushed.
> Here is why, let's consider KEEP and FLUSH combinations:
> 
> (1) FLUSH=0, KEEP=0 is equivalent to UNKNOWN, i.e. the application
>                      must explicitly flush the rules itself
>                      in order to get deterministic behavior.
> (2) FLUSH=1, KEEP=0 means PMD flushes all rules on the device stop.
> (3) FLUSH=0, KEEP=1 means PMD can keep at least some rules,
>                      exact support must be checked with rte_flow_create()
>                      when the device is stopped.
> (4) FLUSH=1, KEEP=1 is forbidden.
> 
What is 'FLUSH' here? Are you proposing a new capability?
> If the application doesn't need the PMD to keep flow rules,
> it can as well flush them always before the device stop
> regardless of whether the driver does it automatically or not.
> It's even simpler and probably as efficient. Testpmd does this.
> If the application wants to take advantage of rule-keeping ability,
> it just tests the KEEP bit. If it is unset that's the previous case,
> application should call rte_flow_flush() before the device stop to be sure.
> Otherwise, the application can test capability to keep flow rule kinds
> it is interested in (see my reply to Andrew).
> 
Overall this is an optimization; the application can work around the lack
of this capability.
If a driver doesn't set the KEEP capability, it is not clear what that
means: either the driver doesn't keep rules or the driver is not updated yet.
I suggest updating the comment to clarify the meaning of the missing KEEP
flag.
And unless we have two explicit status flags, the application can never be
sure that the driver doesn't keep rules after stop. I don't know if the
application wants to know this.
Another concern is how PMD maintainers will know that there is something
to update here. I am sure many driver maintainers won't even be aware of
this; your patch doesn't even cc them. Your approach feels like you are
thinking of only a single PMD and ignoring the rest.
My intention was to have a way to track drivers that are not updated,
by marking them with the UNKNOWN flag. But this also doesn't work with new
drivers; they may forget to set the capability.
What about the following:
1) Clarify the KEEP flag meaning:
having KEEP: flow rules are kept after stop
missing KEEP: unknown behavior
2) Mark all PMDs with a useless flag:
dev_capa &= ~KEEP
Maintainers can remove or update this later, and we can easily track it.
> Result: no changes to PMDs are _immediately_ needed when such behavior
> is documented. They can start advertising it whenever they like,
> it's not even an RC2 task. Currently applications that relied on certain
> behavior are non-portable anyway.
> 
>> But having the list below is good; if you will update all drivers then
>> there is no need to have the 'unknown' state, but updating drivers may require
>> driver maintainers' ack, which can take some time.
> 
> If you agree with what I suggest above, there will be no urgency.
> The list can be used to notify maintainers that they can enhance
> their PMD user experience whenever they like.
> 
>> Can you please clarify what your plan is regarding PMDs: will you update
>> them all, or will you only update mlx5 in -rc2?
>> And what is the exact plan for the -rc2 that you mention?
> 
> mlx5 PMD will be updated with the patches from this series.
> Regarding indirect actions: no other PMD needs an update.
> Regarding flow rules: if the above suggestion is accepted,
> no PMDs need to be updated urgently.
> 
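The four FLUSH/KEEP combinations quoted above can be written down as a small
classifier. This is only an illustrative model of the proposal under
discussion; FLUSH is not an existing capability bit, and every name in the
sketch (demo_classify, demo_app_must_flush, the DEMO_* values) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum demo_restart_behavior {
	DEMO_UNKNOWN,     /* (1) FLUSH=0, KEEP=0 */
	DEMO_FLUSHES_ALL, /* (2) FLUSH=1, KEEP=0 */
	DEMO_KEEPS_SOME,  /* (3) FLUSH=0, KEEP=1 */
	DEMO_INVALID      /* (4) FLUSH=1, KEEP=1, forbidden */
};

/* Map a pair of hypothetical capability bits to one of the four cases. */
static enum demo_restart_behavior
demo_classify(bool flush, bool keep)
{
	if (flush && keep)
		return DEMO_INVALID;
	if (flush)
		return DEMO_FLUSHES_ALL;
	if (keep)
		return DEMO_KEEPS_SOME;
	return DEMO_UNKNOWN;
}

/*
 * Only in the unknown case does the application have to flush the rules
 * itself to get deterministic behavior.
 */
static bool
demo_app_must_flush(enum demo_restart_behavior b)
{
	assert(b != DEMO_INVALID);
	return b == DEMO_UNKNOWN;
}
```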
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [dpdk-dev] [PATCH 2/5] ethdev: add capability to keep shared objects on restart
  2021-10-15 16:26             ` Ferruh Yigit
@ 2021-10-16 20:32               ` Dmitry Kozlyuk
  2021-10-18  8:42                 ` Ferruh Yigit
  0 siblings, 1 reply; 96+ messages in thread
From: Dmitry Kozlyuk @ 2021-10-16 20:32 UTC (permalink / raw)
  To: Ferruh Yigit, dev, Andrew Rybchenko, Ori Kam, Raslan Darawsheh
  Cc: NBU-Contact-Thomas Monjalon, Qi Zhang, jerinj, Maxime Coquelin
> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@intel.com>
> Sent: October 15, 2021 19:27
> To: Dmitry Kozlyuk <dkozlyuk@nvidia.com>; dev@dpdk.org; Andrew Rybchenko
> <andrew.rybchenko@oktetlabs.ru>; Ori Kam <orika@nvidia.com>; Raslan
> Darawsheh <rasland@nvidia.com>
> Cc: NBU-Contact-Thomas Monjalon <thomas@monjalon.net>; Qi Zhang
> <qi.z.zhang@intel.com>; jerinj@marvell.com; Maxime Coquelin
> <maxime.coquelin@redhat.com>
> Subject: Re: [PATCH 2/5] ethdev: add capability to keep shared objects on
> restart
> 
> On 10/15/2021 1:35 PM, Dmitry Kozlyuk wrote:
> >> -----Original Message-----
> >> From: Ferruh Yigit <ferruh.yigit@intel.com>
> >> [...]
> >>> Introducing UNKNOWN state seems wrong to me.
> >>> What should an application do when it is reported?
> >>> Now there's just no way to learn how the PMD behaves,
> >>> but if it provides a response, it can't be "I don't know what I do".
> >>>
> >>
> >> I agree 'unknown' state is not ideal, but my intention is to prevent
> >> drivers that have not implemented this new feature from reporting a wrong capability.
> >>
> >> Without capability, application already doesn't know how underlying
> >> PMD behaves, so this is by default 'unknown' state.
> >> I suggest keeping that state until driver explicitly updates its state
> >> to the correct value.
> >
> > My concern is that when all the drivers are changed to report a proper
> > capability, UNKNOWN remains in the API meaning "there's a bug in DPDK".
> >
> 
> When all drivers are changed, of course we can remove the 'unknown' flag.
> 
> > Instead of UNKNOWN response we can declare that rte_flow_flush()
> > must be called unless the application wants to keep the rules
> > and has made sure it's possible, or the behavior is undefined.
> > (Can be viewed as "UNKNOWN by default", but is simpler.)
> > This way neither UNKNOWN state is needed,
> > nor the bit saying the flow rules are flushed.
> > Here is why, let's consider KEEP and FLUSH combinations:
> >
> > (1) FLUSH=0, KEEP=0 is equivalent to UNKNOWN, i.e. the application
> >                      must explicitly flush the rules itself
> >                      in order to get deterministic behavior.
> > (2) FLUSH=1, KEEP=0 means PMD flushes all rules on the device stop.
> > (3) FLUSH=0, KEEP=1 means PMD can keep at least some rules,
> >                      exact support must be checked with rte_flow_create()
> >                      when the device is stopped.
> > (4) FLUSH=1, KEEP=1 is forbidden.
> >
> 
> What is 'FLUSH' here? Are you proposing a new capability?
> 
> > If the application doesn't need the PMD to keep flow rules,
> > it can as well flush them always before the device stop
> > regardless of whether the driver does it automatically or not.
> > It's even simpler and probably as efficient. Testpmd does this.
> > If the application wants to take advantage of rule-keeping ability,
> > it just tests the KEEP bit. If it is unset that's the previous case,
> > application should call rte_flow_flush() before the device stop to be sure.
> > Otherwise, the application can test capability to keep flow rule kinds
> > it is interested in (see my reply to Andrew).
> >
> 
> Overall this is an optimization; the application can work around the lack
> of this capability.
> 
> If a driver doesn't set the KEEP capability, it is not clear what that
> means: either the driver doesn't keep rules or the driver is not updated yet.
> I suggest updating the comment to clarify the meaning of the missing KEEP
> flag.
> 
> And unless we have two explicit status flags, the application can never be
> sure that the driver doesn't keep rules after stop. I don't know if the
> application wants to know this.
> 
> Another concern is how PMD maintainers will know that there is something
> to update here. I am sure many driver maintainers won't even be aware of
> this; your patch doesn't even cc them. Your approach feels like you are
> thinking of only a single PMD and ignoring the rest.
> 
> My intention was to have a way to track drivers that are not updated,
> by marking them with the UNKNOWN flag. But this also doesn't work with new
> drivers; they may forget to set the capability.
> 
> 
> What about the following:
> 1) Clarify the KEEP flag meaning:
> having KEEP: flow rules are kept after stop
> missing KEEP: unknown behavior
> 
> 2) Mark all PMDs with a useless flag:
> dev_capa &= ~KEEP
> Maintainers can remove or update this later, and we can easily track it.
Item 1) is almost what I did in v2. The difference (or clarification) is that
if the bit is set, it doesn't mean that all rules are kept.
It allows the PMD to not support keeping some kinds of rules.
Please see the doc update about how the kind is defined
and how the application can test what is unsupported.
This complication is needed so that if a PMD cannot keep some exotic kind of rules,
it is not forced to remove the capability completely,
blocking optimizations even if the application doesn't use problematic rule kinds.
It makes the capability future-proof.
The second flag (FLUSH) would not be of much help.
Suppose it is not set, but the PMD can keep some kinds of rules.
The application still needs to test all the kinds it needs.
But it needs to do the same if the KEEP bit is set.
Only if FLUSH is set can the application skip the tests and rte_flow_flush(),
but these optimizations are small compared to keeping the rules themselves.
Item 2) need not be done, because the absence of the bit is *the* useless value:
it means the same unspecified behavior as at present.
It is worth noting that currently any application that relies on the PMD
to keep or flush the rules is non-portable, because the PMD is allowed to do anything.
To get reliable behavior, the application must explicitly clear the rules.
Regarding your concern about maintainers forgetting to update PMDs,
I think there are better task-tracking tools than constants in the code
(the authors of golang's context.TODO may disagree :)
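To make the single-bit contract concrete, here is a rough sketch of the
decision an application would take before stopping a port. It is a standalone
model, not DPDK code: DEMO_CAPA_FLOW_RULE_KEEP copies the value 0x00000004
shown for RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP in the v2 patch quoted in this
thread, and demo_plan_port_stop() and the DEMO_PLAN_* names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP from the v2 patch. */
#define DEMO_CAPA_FLOW_RULE_KEEP UINT64_C(0x00000004)

enum demo_stop_plan {
	/* Flush all rules explicitly before stopping the port. */
	DEMO_PLAN_FLUSH_BEFORE_STOP,
	/* Keep the rules, after probing each rule kind the application uses. */
	DEMO_PLAN_PROBE_RULE_KINDS
};

/*
 * dev_capa is the capability mask reported by the device;
 * app_wants_to_keep_rules says whether the application cares about
 * preserving rules across restart at all.
 */
static enum demo_stop_plan
demo_plan_port_stop(uint64_t dev_capa, bool app_wants_to_keep_rules)
{
	if (!app_wants_to_keep_rules)
		return DEMO_PLAN_FLUSH_BEFORE_STOP;
	if ((dev_capa & DEMO_CAPA_FLOW_RULE_KEEP) == 0)
		return DEMO_PLAN_FLUSH_BEFORE_STOP; /* behavior unspecified */
	return DEMO_PLAN_PROBE_RULE_KINDS;
}
```

A missing bit and an application that does not care lead to the same plan,
which is why a second FLUSH bit would add little.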
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [dpdk-dev] [PATCH v2 2/5] ethdev: add capability to keep shared objects on restart
  2021-10-15 16:18   ` [dpdk-dev] [PATCH v2 2/5] ethdev: add capability to keep shared objects " Dmitry Kozlyuk
@ 2021-10-17  8:10     ` Ori Kam
  2021-10-17  9:14       ` Dmitry Kozlyuk
  0 siblings, 1 reply; 96+ messages in thread
From: Ori Kam @ 2021-10-17  8:10 UTC (permalink / raw)
  To: Dmitry Kozlyuk, dev
  Cc: NBU-Contact-Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko
Hi Dmitry
> -----Original Message-----
> From: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
> Sent: Friday, October 15, 2021 7:18 PM
> To: dev@dpdk.org
> Subject: [PATCH v2 2/5] ethdev: add capability to keep shared objects on restart
> 
> rte_flow_action_handle_create() did not mention what happens with an indirect action when a
> device is stopped, possibly reconfigured, and started again. It is natural for some indirect actions to be
> persistent, like counters and meters; keeping others just saves application time and complexity.
> However, not all PMDs can support it.
> Also the support may be limited by particular action kinds, that is, combinations of action type and the
> value of the transfer bit in its configuration.
> 
> Add a device capability to indicate if at least some indirect actions are kept across the above sequence.
> Without this capability the behavior is still unspecified, but now it is stated explicitly.
> In the future, indirect actions may not be the only type of objects shared between flow rules. The
> capability bit intends to cover all possible types of such objects, hence its name.
> 
> Declare that the application can test for the persistence of a particular indirect action kind by
> attempting to create an indirect action of that kind when the device is stopped and checking for the
> specific error type.
> This is logical because if the PMD can create the flow rule when the device is not started and use it
> after the start happens, it is natural that it can move its internal flow shared object to the same state
> when the device is stopped and restore the state when the device is started.
> 
> If the device is being reconfigured in a way that is incompatible with existing shared objects, the PMD is
> required to report an error.
> This is mandatory, because flow API does not supply users with capabilities, so this is the only way for
> a user to learn that configuration is invalid. For example, if queue count changes and RSS indirect
> action specifies queues that are going away, the user must update the action before removing the
> queues or remove the action and all flow rules that were using it.
> 
> Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
> ---
>  doc/guides/prog_guide/rte_flow.rst | 24 ++++++++++++++++++++++++
>  lib/ethdev/rte_ethdev.h            |  3 +++
>  2 files changed, 27 insertions(+)
> 
> diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
> index b0ced4209b..bf96ad830f 100644
> --- a/doc/guides/prog_guide/rte_flow.rst
> +++ b/doc/guides/prog_guide/rte_flow.rst
> @@ -2812,6 +2812,30 @@ updated depend on the type of the ``action`` and different for every type.
>  The indirect action specified data (e.g. counter) can be queried by
> ``rte_flow_action_handle_query()``.
> 
> +By default it is unspecified if indirect actions persist after the device stop.
> +If ``RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP`` is not advertised, then
> +indirect actions must be explicitly destroyed before stopping the
> +device if the application needs to ensure they are removed.
I don't understand the above line. If indirect actions must be explicitly destroyed,
what does "if the application needs to ensure" mean?
I think it should say, just like now, that the application should destroy all shared objects before
stopping. If the application doesn't call destroy, the state of the action is undefined.
> +If it is advertised, this means the PMD can keep at least some indirect
> +actions across device stop and start with possible reconfiguration in between.
> +However, it may be only supported for certain kinds of indirect actions.
> +The kind is a combination of the action type and the value of its transfer bit.
> +To test if a particular kind of indirect actions is kept, the
> +application must try to create a valid indirect action of that kind
> +when the device is stopped (after it has been configured or started previously).
> +If it succeeds, all indirect actions of the same kind are kept when the
> +device is stopped.
> +If it fails with an error of type ``RTE_FLOW_ERROR_TYPE_STATE``,
> +indirect actions of this kind are flushed when the device is stopped.
> +Indirect actions of a kept kind that are created when the device is
> +stopped, including the ones created for the test, will be kept after the device start.
> +Some configuration changes may be incompatible with existing indirect actions.
> +In this case ``rte_eth_dev_configure()``,
> +``rte_eth_rx/tx_queue_setup()``, and/or ``rte_eth_dev_start()`` will
> +fail with a log message from the PMD that should be similar to the one
> +that would be emitted by ``rte_flow_action_handle_create()`` if an
> +attempt was made to create the offending rule with the new configuration.
> +
>  .. _table_rte_flow_action_handle:
> 
>  .. table:: INDIRECT
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index a0b388bb25..12fc7262eb 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -94,6 +94,7 @@
>   * depending on the device capabilities:
>   *
>   *     - flow rules
> + *     - flow-related shared objects, e.g. indirect actions
>   *
>   * Any other configuration will not be stored and will need to be re-entered
>   * before a call to rte_eth_dev_start().
> @@ -1452,6 +1453,8 @@ struct rte_eth_conf {
>  #define RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002
>  /** Device supports keeping flow rules across restart. */
>  #define RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP 0x00000004
> +/** Device supports keeping shared flow objects across restart. */
> +#define RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP 0x00000008
>  /**@}*/
> 
>  /*
> --
> 2.25.1
Best,
Ori
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [dpdk-dev] [PATCH v2 2/5] ethdev: add capability to keep shared objects on restart
  2021-10-17  8:10     ` Ori Kam
@ 2021-10-17  9:14       ` Dmitry Kozlyuk
  2021-10-17  9:45         ` Ori Kam
  0 siblings, 1 reply; 96+ messages in thread
From: Dmitry Kozlyuk @ 2021-10-17  9:14 UTC (permalink / raw)
  To: Ori Kam, dev; +Cc: NBU-Contact-Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko
> [...]
> > +By default it is unspecified if indirect actions persist after the device stop.
> > +If ``RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP`` is not advertised, then
> > +indirect actions must be explicitly destroyed before stopping the
> > +device if the application needs to ensure they are removed.
> 
> I don't understand the above line. If indirect actions must be explicitly
> destroyed, what does "if the application needs to ensure" mean?
> I think it should say, just like now, that the application should destroy all
> shared objects before stopping. If the application doesn't call destroy,
> the state of the action is undefined.
I had in mind that some applications don't care,
because they only stop the port before closing it.
Would the following be more explicit?
"If ``RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP`` is not advertised,
the state of indirect actions after the device stop is undefined.
Application must destroy all indirect actions before stopping the port
if it intends to start it later, or unwanted indirect actions can remain."
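The probing rule from the documentation text quoted earlier in this thread
(try to create an indirect action of a given kind while the port is stopped and
look at the error type) can be modeled as below. This is a sketch under stated
assumptions, not DPDK code: DEMO_CAPA_FLOW_SHARED_OBJECT_KEEP copies the value
0x00000008 from the quoted rte_ethdev.h hunk, the probe result enum stands in
for the outcome of rte_flow_action_handle_create(), and all demo_* names are
hypothetical.

```c
#include <stdint.h>

/* Assumed to mirror RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP. */
#define DEMO_CAPA_FLOW_SHARED_OBJECT_KEEP UINT64_C(0x00000008)

/* Outcome of trying to create one indirect action on a stopped port. */
enum demo_probe_result {
	DEMO_PROBE_OK,          /* creation succeeded */
	DEMO_PROBE_STATE_ERROR, /* failed with a "state" error type */
	DEMO_PROBE_OTHER_ERROR  /* failed for an unrelated reason */
};

enum demo_kind_persistence {
	DEMO_KIND_KEPT,     /* actions of this kind survive stop/start */
	DEMO_KIND_FLUSHED,  /* actions of this kind are flushed on stop */
	DEMO_KIND_UNDEFINED /* nothing can be concluded; destroy explicitly */
};

static enum demo_kind_persistence
demo_interpret_probe(uint64_t dev_capa, enum demo_probe_result probe)
{
	/* Without the capability the state after stop is undefined. */
	if ((dev_capa & DEMO_CAPA_FLOW_SHARED_OBJECT_KEEP) == 0)
		return DEMO_KIND_UNDEFINED;
	switch (probe) {
	case DEMO_PROBE_OK:
		return DEMO_KIND_KEPT;
	case DEMO_PROBE_STATE_ERROR:
		return DEMO_KIND_FLUSHED;
	default:
		return DEMO_KIND_UNDEFINED;
	}
}
```

In the undefined case the safe choice is the one stated above: destroy all
indirect actions before stopping the port.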
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [dpdk-dev] [PATCH v2 2/5] ethdev: add capability to keep shared objects on restart
  2021-10-17  9:14       ` Dmitry Kozlyuk
@ 2021-10-17  9:45         ` Ori Kam
  0 siblings, 0 replies; 96+ messages in thread
From: Ori Kam @ 2021-10-17  9:45 UTC (permalink / raw)
  To: Dmitry Kozlyuk, dev
  Cc: NBU-Contact-Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko
Hi Dmitry.
> -----Original Message-----
> From: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
> Sent: Sunday, October 17, 2021 12:15 PM
> Subject: RE: [PATCH v2 2/5] ethdev: add capability to keep shared objects on restart
> 
> > [...]
> > > +By default it is unspecified if indirect actions persist after the
> > device stop.
> > > +If ``RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP`` is not advertised,
> > > +then indirect actions must be explicitly destroyed before stopping
> > > +the device if the application needs to ensure they are removed.
> >
> > I don't understand the above line, if indirect actions must be
> > explicitly destroyed what does it means "if the application needs to
> > ensure?) I think it should say just like now, application should
> > destroy all shared objects before stopping. If application doesn't
> > call the destroy the state of the action is undefined.
> 
> I had in mind that some applications don't care, because they only stop the port before closing it.
> Would the following be more explicit?
> 
> "If ``RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP`` is not advertised, the state of indirect
> actions after the device stop is undefined.
> Application must destroy all indirect actions before stopping the port if it intends to start it later, or
> unwanted indirect actions can remain."
Sounds better, but I would also add something like this: if the PMD doesn't report keep, the application should
release resources before stopping / closing the port, otherwise a resource leak may happen.
This should also be in the flows patch.
Best,
Ori
* Re: [dpdk-dev] [PATCH 2/5] ethdev: add capability to keep shared objects on restart
  2021-10-16 20:32               ` Dmitry Kozlyuk
@ 2021-10-18  8:42                 ` Ferruh Yigit
  2021-10-18 11:13                   ` Dmitry Kozlyuk
  0 siblings, 1 reply; 96+ messages in thread
From: Ferruh Yigit @ 2021-10-18  8:42 UTC (permalink / raw)
  To: Dmitry Kozlyuk, dev, Andrew Rybchenko, Ori Kam, Raslan Darawsheh,
	NBU-Contact-Thomas Monjalon
  Cc: Qi Zhang, jerinj, Maxime Coquelin
On 10/16/2021 9:32 PM, Dmitry Kozlyuk wrote:
>> -----Original Message-----
>> From: Ferruh Yigit <ferruh.yigit@intel.com>
>> Sent: 15 October 2021 19:27
>> To: Dmitry Kozlyuk <dkozlyuk@nvidia.com>; dev@dpdk.org; Andrew Rybchenko
>> <andrew.rybchenko@oktetlabs.ru>; Ori Kam <orika@nvidia.com>; Raslan
>> Darawsheh <rasland@nvidia.com>
>> Cc: NBU-Contact-Thomas Monjalon <thomas@monjalon.net>; Qi Zhang
>> <qi.z.zhang@intel.com>; jerinj@marvell.com; Maxime Coquelin
>> <maxime.coquelin@redhat.com>
>> Subject: Re: [PATCH 2/5] ethdev: add capability to keep shared objects on
>> restart
>>
>>
>> On 10/15/2021 1:35 PM, Dmitry Kozlyuk wrote:
>>>> -----Original Message-----
>>>> From: Ferruh Yigit <ferruh.yigit@intel.com>
>>>> [...]
>>>>> Introducing UNKNOWN state seems wrong to me.
>>>>> What should an application do when it is reported?
>>>>> Now there's just no way to learn how the PMD behaves,
>>>>> but if it provides a response, it can't be "I don't know what I do".
>>>>>
>>>>
>>>> I agree 'unknown' state is not ideal, but my intentions is prevent
>>>> drivers that not implemented this new feature report wrong capability.
>>>>
>>>> Without capability, application already doesn't know how underlying
>>>> PMD behaves, so this is by default 'unknown' state.
>>>> I suggest keeping that state until driver explicitly updates its state
>>>> to the correct value.
>>>
>>> My concern is that when all the drivers are changed to report a proper
>>> capability, UNKNOWN remains in the API meaning "there's a bug in DPDK".
>>>
>>
>> When all drivers are changed, of course we can remove the 'unknown' flag.
>>
>>> Instead of UNKNOWN response we can declare that rte_flow_flush()
>>> must be called unless the application wants to keep the rules
>>> and has made sure it's possible, or the behavior is undefined.
>>> (Can be viewed as "UNKNOWN by default", but is simpler.)
>>> This way neither UNKNOWN state is needed,
>>> nor the bit saying the flow rules are flushed.
>>> Here is why, let's consider KEEP and FLUSH combinations:
>>>
>>> (1) FLUSH=0, KEEP=0 is equivalent to UNKNOWN, i.e. the application
>>>                       must explicitly flush the rules itself
>>>                       in order to get deterministic behavior.
>>> (2) FLUSH=1, KEEP=0 means PMD flushes all rules on the device stop.
>>> (3) FLUSH=0, KEEP=1 means PMD can keep at least some rules,
>>>                       exact support must be checked with rte_flow_create()
>>>                       when the device is stopped.
>>> (4) FLUSH=1, KEEP=1 is forbidden.
>>>
>>
>> What is 'FLUSH' here? Are you proposing a new capability?
>>
>>> If the application doesn't need the PMD to keep flow rules,
>>> it can as well flush them always before the device stop
>>> regardless of whether the driver does it automatically or not.
>>> It's even simpler and probably as efficient. Testpmd does this.
>>> If the application wants to take advantage of rule-keeping ability,
>>> it just tests the KEEP bit. If it is unset that's the previous case,
>>> application should call rte_flow_flush() before the device stop to be sure.
>>> Otherwise, the application can test capability to keep flow rule kinds
>>> it is interested in (see my reply to Andrew).
>>>
>>
>> Overall this is an optimization, application can workaround without this
>> capability.
>>
>> If driver doesn't set KEEP capability, it is not clear what does it
>> mean, driver doesn't keep rules or driver is not updated yet.
>> I suggest to update comment to clarify the meaning of the missing KEEP
>> flag.
>>
>> And unless we have two explicit status flags application can never be
>> sure that driver doesn't keep rules after stop. I don't know if
>> application wants to know this.
>>
>> Other concern is how PMD maintainers will know that there is something
>> to update here, I am sure many driver maintainers won't even be aware of
>> this, your patch doesn't even cc them. Your approach feels like you are
>> thinking only single PMD and ignore rest.
>>
>> My intention was to have a way to follow drivers that is not updated,
>> by marking them with UNKNOWN flag. But this also doesn't work with new
>> drivers, they may forget setting capability.
>>
>>
>> What about following:
>> 1) Clarify KEEP flag meaning:
>> having KEEP: flow rules are kept after stop
>> missing KEEP: unknown behavior
>>
>> 2) Mark all PMDs with useless flag:
>> dev_capa &= ~KEEP
>> Maintainer can remove or update this later, and we can easily track it.
> 
> Item 1) is almost what I did in v2. The difference (or clarification) is that
> if the bit is set, it doesn't mean that all rules are kept.
> It allows the PMD to not support keeping some kinds of rules.
> Please see the doc update about how the kind is defined
> and how the application can test what is unsupported.
> 
> This complication is needed so that if a PMD cannot keep some exotic kind of rules,
> it is not forced to remove the capability completely,
> blocking optimizations even if the application doesn't use problematic rule kinds.
> It makes the capability future-proof.
> 
> The second flag (FLUSH) would not be of much help.
> Consider it is not set, but the PMD can keep some kinds of rules.
> The application still needs to test all the kinds it needs.
> But it needs to do the same if the KEEP bit is set.
> Only if it is set the application can skip the tests and rte_flow_flush(),
> but these optimizations are small compared to keeping the rules itself.
> 
> Item 2) needs not to be done, because the absence of the bit is *the* useless value:
> it means the unspecified same behavior as it is presently.
> It is worth noting that currently any application that relies on the PMD
> to keep or flush the rules is non-portable, because PMD is allowed to do anything.
> To get a reliable behavior application must explicitly clear the rules.
> 
> Regarding your concern about maintainers forgetting to update PMDs,
> I think there are better task-tracking tools than constants in the code
> (the authors of golang's context.TODO may disagree :)
> 
Hi Dmitry,

This is a valid concern, and adding code to the drivers has proved to work.

There are multiple authors updating the ethdev layer and expecting PMD
maintainers to do the required changes. In your case you are updating the PMD
you care about, but how will other PMD maintainers even be aware that there is
something to change in their PMD?
With your change you are putting some responsibility on other maintainers
without even cc'ing them. And they certainly are not reading all emails
on the mailing list; they can't.

Task tracking is an option; in the past I tried to upstream a todo doc
for PMDs. But I can see the additional maintenance cost of tracking features
from a central point, compared to distributing it to the PMDs (adding code
to PMDs).

I think the best method is for whoever changes the ethdev layer to make the
relevant change in the PMDs, but this has the obvious problem that they may
not know enough about the PMDs to update them.

We have used the following option, and it worked in the past:
- When an ethdev feature requires a change in PMDs, ethdev supports both the
   new and the old method
- PMDs set a flag by default to request the old method, so there is no change
   in the PMD default behavior
- When a PMD does the required changes, it removes the flag
- This lets me (or another maintainer) track the update status and ping
   the relevant maintainers
- When all PMDs are updated, ethdev support for the old method is removed
- This method allows PMD maintainers to do the change in their own time
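
The steps above can be sketched roughly as follows. This is only an
illustration of the pattern, not code from ethdev: the flag
`DRV_FLAG_LEGACY_METHOD`, `struct driver`, `select_method()` and
`count_pending()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical transitional flag: "this PMD still requests the old method". */
#define DRV_FLAG_LEGACY_METHOD 0x1u

/* Hypothetical, minimal driver descriptor. */
struct driver {
	const char *name;
	uint32_t flags;
};

enum method { METHOD_OLD, METHOD_NEW };

/* ethdev side: both methods are supported while the transition lasts. */
static enum method
select_method(const struct driver *drv)
{
	assert(drv != NULL);
	return (drv->flags & DRV_FLAG_LEGACY_METHOD) ? METHOD_OLD : METHOD_NEW;
}

/* Tracking side: how many drivers still have to be converted. */
static size_t
count_pending(const struct driver *drvs, size_t n)
{
	size_t i, pending = 0;

	assert(drvs != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (drvs[i].flags & DRV_FLAG_LEGACY_METHOD)
			pending++;
	return pending;
}
```

Once count_pending() reaches zero, the flag and the old-method branch in
select_method() can be deleted together.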
* Re: [dpdk-dev] [PATCH v2 1/5] ethdev: add capability to keep flow rules on restart
  2021-10-15 16:18   ` [dpdk-dev] [PATCH v2 1/5] ethdev: add capability to keep flow rules on restart Dmitry Kozlyuk
@ 2021-10-18  8:56     ` Andrew Rybchenko
  2021-10-19 12:38       ` Dmitry Kozlyuk
  2021-10-18 13:06     ` Zhang, Qi Z
  1 sibling, 1 reply; 96+ messages in thread
From: Andrew Rybchenko @ 2021-10-18  8:56 UTC (permalink / raw)
  To: Dmitry Kozlyuk, dev; +Cc: Ori Kam, Thomas Monjalon, Ferruh Yigit
On 10/15/21 7:18 PM, Dmitry Kozlyuk wrote:
> Currently, it is not specified what happens to the flow rules when
> the device is stopped, possibly reconfigured, then started.
> If flow rules were kept, it could be convenient for application
> developers, because they wouldn't need to save and restore them.
> However, due to the number of flows and possible creation rate it is
> impractical to save all flow rules in DPDK layer. This means that flow
> rules persistence really depends on whether PMD and HW can implement it
> efficiently. It can also be limited by the rule item and action types,
> and its attributes transfer bit, which together comprise the rule kind.
> 
> Add a device capability bit for PMDs that can keep at least some
> of the flow rules across restart. Without this capability behavior
> is still unspecified, which is now explicitly stated.
> Declare that the application can test for persistence of flow rules
> of a particular kind by attempting to create a rule of that kind
> when the device is stopped and checking for the specific error.
stopped -> configured but not yet started
> This is logical because if the PMD can to create the flow rule
can to -> can
> when the device is not started and use it after the start happens,
> it is natural that it can move its internal flow rule object
> to the same state when the device is stopped and restore the state
> when the device is started.
> 
> If the device is being reconfigured in a way that is incompatible with
> existing flow rules, PMD is required to report an error.
> This is mandatory, because flow API does not supply users with
> capabilities, so this is the only way for a user to learn that
> configuration is invalid. For example, if queue count changes and the
> action of a flow rule specifies queues that are going away, the user
> must update or remove the flow rule before removing the queues.
> 
> Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
> ---
>  doc/guides/prog_guide/rte_flow.rst | 27 +++++++++++++++++++++++++++
>  lib/ethdev/rte_ethdev.h            |  7 +++++++
>  lib/ethdev/rte_flow.h              |  1 +
>  3 files changed, 35 insertions(+)
> 
> diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
> index 2b42d5ec8c..b0ced4209b 100644
> --- a/doc/guides/prog_guide/rte_flow.rst
> +++ b/doc/guides/prog_guide/rte_flow.rst
> @@ -87,6 +87,33 @@ To avoid resource leaks on the PMD side, handles must be explicitly
>  destroyed by the application before releasing associated resources such as
>  queues and ports.
>  
> +By default it is unspecified if the flow rules persist after the device stop.
or can be created before the first device start
> +If ``RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP`` is not advertised,
> +then rules must be explicitly flushed before stopping the device
> +if the application needs to ensure they are removed.
> +If it is advertised, this means the PMD can keep at least some rules
> +across the device stop and start with possible reconfiguration in between.
> +However, it may be only supported for some kinds of rules.
> +The kind is a combination of the following rule properties:
> +
> +- the sequence of item types;
> +- the sequence of action types;
> +- the value of the transfer attribute.
> +
> +To test if a particular kind of rules is kept, the application must try
> +to create a valid rule of that kind when the device is stopped
> +(after it has been configured or started previously).
> +If it succeeds, all rules of the same kind are kept at the device stop.
> +If it fails with an error of type ``RTE_FLOW_ERROR_TYPE_STATE``,
> +rules of this kind are flushed when the device is stopped.
> +Rules of a kept kind that are created when the device is stopped, including
> +the rules created for the test, will be kept after the device is started.
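
As I read it, the per-kind check above boils down to classifying the
result of one rte_flow_create() attempt made while the device is stopped.
A minimal sketch of that classification follows; the enums and
`classify_probe()` are hypothetical, `PROBE_ERROR_STATE` stands in for the
new ``RTE_FLOW_ERROR_TYPE_STATE``, and treating any other error as
"inconclusive" is my assumption, since the text does not say what it means.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the error types relevant to the probe. */
enum probe_error {
	PROBE_ERROR_NONE,  /* the rule was created */
	PROBE_ERROR_STATE, /* stands for RTE_FLOW_ERROR_TYPE_STATE */
	PROBE_ERROR_OTHER, /* any other rte_flow error type */
};

enum kind_status {
	KIND_KEPT,         /* rules of this kind survive the device stop */
	KIND_FLUSHED,      /* rules of this kind are flushed on stop */
	KIND_INCONCLUSIVE, /* assumption: the probe says nothing */
};

/* Classify the outcome of creating one valid rule on a stopped device. */
static enum kind_status
classify_probe(bool created, enum probe_error error)
{
	if (created) {
		assert(error == PROBE_ERROR_NONE);
		return KIND_KEPT;
	}
	if (error == PROBE_ERROR_STATE)
		return KIND_FLUSHED;
	return KIND_INCONCLUSIVE;
}
```

The application would have to run such a probe once for every kind of
rule it intends to keep.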
It must be defined what the application should expect for
rule kinds that were not tested.

To me the above check sounds extremely complicated and hardly
doable. Yes, some applications know the kinds of rules they would
like to create, but some, like OvS, do not. Please correct
me if I'm wrong. OvS knows which types of actions and even
which combinations of actions (harder, but still possible)
it would like to install. But the number of all possible combinations
of items together with all possible combinations of actions
could be very large.

Maybe I still misunderstand the above idea.
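
To give a feeling for the size of the problem, here is a back-of-the-envelope
sketch that counts rule kinds as defined above (item type sequence, action
type sequence, transfer bit). It is an upper bound under my own assumptions:
any type may appear at any position, repetition is allowed, and the function
names and the figures used below are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of non-empty sequences of length 1..max_len over n_types types,
 * repetition allowed: n + n^2 + ... + n^max_len.
 */
static uint64_t
count_sequences(uint64_t n_types, unsigned int max_len)
{
	uint64_t total = 0, power = 1;
	unsigned int k;

	assert(n_types > 0);
	for (k = 1; k <= max_len; k++) {
		assert(power <= UINT64_MAX / n_types); /* no overflow */
		power *= n_types;
		total += power;
	}
	return total;
}

/* Upper bound on kinds: item sequences x action sequences x transfer bit. */
static uint64_t
count_kinds(uint64_t n_items, unsigned int max_items,
	    uint64_t n_actions, unsigned int max_actions)
{
	return count_sequences(n_items, max_items) *
	       count_sequences(n_actions, max_actions) * 2;
}
```

With made-up figures of 10 item types in patterns of up to 4 items and
8 action types in lists of up to 3 actions, count_kinds(10, 4, 8, 3) gives
11110 * 584 * 2 = 12976480 kinds, far too many to probe one by one.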
> +Some configuration changes may be incompatible with existing rules.
> +In this case ``rte_eth_dev_configure()``, ``rte_eth_rx/tx_queue_setup()``,
> +and/or ``rte_eth_dev_start()`` will fail with a log message from the PMD that
> +should be similar to the one that would be emitted by ``rte_flow_create()``
> +if an attempt was made to create the offending rule with the new configuration.
> +
>  The following sections cover:
>  
>  - **Attributes** (represented by ``struct rte_flow_attr``): properties of a
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index 6d80514ba7..a0b388bb25 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -90,6 +90,11 @@
>   *     - flow director filtering mode (but not filtering rules)
>   *     - NIC queue statistics mappings
>   *
> + * The following configuration may be retained or not
> + * depending on the device capabilities:
> + *
> + *     - flow rules
> + *
>   * Any other configuration will not be stored and will need to be re-entered
>   * before a call to rte_eth_dev_start().
>   *
> @@ -1445,6 +1450,8 @@ struct rte_eth_conf {
>  #define RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP 0x00000001
>  /** Device supports Tx queue setup after device started. */
>  #define RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002
> +/** Device supports keeping flow rules across restart. */
> +#define RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP 0x00000004
>  /**@}*/
>  
>  /*
> diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
> index a89945061a..aa0182d021 100644
> --- a/lib/ethdev/rte_flow.h
> +++ b/lib/ethdev/rte_flow.h
> @@ -3344,6 +3344,7 @@ enum rte_flow_error_type {
>  	RTE_FLOW_ERROR_TYPE_ACTION_NUM, /**< Number of actions. */
>  	RTE_FLOW_ERROR_TYPE_ACTION_CONF, /**< Action configuration. */
>  	RTE_FLOW_ERROR_TYPE_ACTION, /**< Specific action. */
> +	RTE_FLOW_ERROR_TYPE_STATE, /**< Current device state. */
>  };
>  
>  /**
> 
* Re: [dpdk-dev] [PATCH 2/5] ethdev: add capability to keep shared objects on restart
  2021-10-18  8:42                 ` Ferruh Yigit
@ 2021-10-18 11:13                   ` Dmitry Kozlyuk
  2021-10-18 11:59                     ` Ferruh Yigit
  0 siblings, 1 reply; 96+ messages in thread
From: Dmitry Kozlyuk @ 2021-10-18 11:13 UTC (permalink / raw)
  To: Ferruh Yigit, dev, Andrew Rybchenko, Ori Kam, Raslan Darawsheh,
	NBU-Contact-Thomas Monjalon
  Cc: Qi Zhang, jerinj, Maxime Coquelin
> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@intel.com>
> Sent: 18 October 2021 11:42
> [...]
> 
> We have used the following option, and it worked in the past:
> - When an ethdev feature require change in PMDs, ethdev supports both new
>    and old method
> - PMDs set a flag by default to request old method, so there is no update
>    in the PMD default behavior
> - When PMD does the required changes, it removes the flag
> - This lets me (or other maintainer), to trace the update status and ping
>    relevant maintainers
> - When all PMDs updated, ethdev support for old method removed
> - This method allows PMD maintainers do the change on their own time
Hi Ferruh,

Thanks for sharing the experience.
You suggest updating PMDs with an explicit reset of this bit,
even though it will be zero anyway, to attract maintainers' attention.
From the user's perspective nothing changes: the KEEP bit is reset,
and there is no special value saying the PMD is not updated
that we would need to deprecate and remove later.
If this understanding is correct, I can certainly add a patch
updating the relevant PMDs.
* Re: [dpdk-dev] [PATCH 2/5] ethdev: add capability to keep shared objects on restart
  2021-10-18 11:13                   ` Dmitry Kozlyuk
@ 2021-10-18 11:59                     ` Ferruh Yigit
  0 siblings, 0 replies; 96+ messages in thread
From: Ferruh Yigit @ 2021-10-18 11:59 UTC (permalink / raw)
  To: Dmitry Kozlyuk, dev, Andrew Rybchenko, Ori Kam, Raslan Darawsheh,
	NBU-Contact-Thomas Monjalon
  Cc: Qi Zhang, jerinj, Maxime Coquelin
On 10/18/2021 12:13 PM, Dmitry Kozlyuk wrote:
>> -----Original Message-----
>> From: Ferruh Yigit <ferruh.yigit@intel.com>
>> Sent: 18 октября 2021 г. 11:42
>> To: Dmitry Kozlyuk <dkozlyuk@nvidia.com>; dev@dpdk.org; Andrew Rybchenko
>> <andrew.rybchenko@oktetlabs.ru>; Ori Kam <orika@nvidia.com>; Raslan
>> Darawsheh <rasland@nvidia.com>; NBU-Contact-Thomas Monjalon
>> <thomas@monjalon.net>
>> Cc: Qi Zhang <qi.z.zhang@intel.com>; jerinj@marvell.com; Maxime Coquelin
>> <maxime.coquelin@redhat.com>
>> Subject: Re: [PATCH 2/5] ethdev: add capability to keep shared objects on
>> restart
>>
>> External email: Use caution opening links or attachments
>>
>>
>> On 10/16/2021 9:32 PM, Dmitry Kozlyuk wrote:
>>>> -----Original Message-----
>>>> From: Ferruh Yigit <ferruh.yigit@intel.com>
>>>> Sent: 15 октября 2021 г. 19:27
>>>> To: Dmitry Kozlyuk <dkozlyuk@nvidia.com>; dev@dpdk.org; Andrew
>> Rybchenko
>>>> <andrew.rybchenko@oktetlabs.ru>; Ori Kam <orika@nvidia.com>; Raslan
>>>> Darawsheh <rasland@nvidia.com>
>>>> Cc: NBU-Contact-Thomas Monjalon <thomas@monjalon.net>; Qi Zhang
>>>> <qi.z.zhang@intel.com>; jerinj@marvell.com; Maxime Coquelin
>>>> <maxime.coquelin@redhat.com>
>>>> Subject: Re: [PATCH 2/5] ethdev: add capability to keep shared objects
>> on
>>>> restart
>>>>
>>>> External email: Use caution opening links or attachments
>>>>
>>>>
>>>> On 10/15/2021 1:35 PM, Dmitry Kozlyuk wrote:
>>>>>> -----Original Message-----
>>>>>> From: Ferruh Yigit <ferruh.yigit@intel.com>
>>>>>> [...]
>>>>>>> Introducing UNKNOWN state seems wrong to me.
>>>>>>> What should an application do when it is reported?
>>>>>>> Now there's just no way to learn how the PMD behaves,
>>>>>>> but if it provides a response, it can't be "I don't know what I do".
>>>>>>>
>>>>>>
>>>>>> I agree 'unknown' state is not ideal, but my intentions is prevent
>>>>>> drivers that not implemented this new feature report wrong
>> capability.
>>>>>>
>>>>>> Without capability, application already doesn't know how underlying
>>>>>> PMD behaves, so this is by default 'unknown' state.
>>>>>> I suggest keeping that state until driver explicitly updates its
>> state
>>>>>> to the correct value.
>>>>>
>>>>> My concern is that when all the drivers are changed to report a proper
>>>>> capability, UNKNOWN remains in the API meaning "there's a bug in
>> DPDK".
>>>>>
>>>>
>>>> When all drivers are changed, of course we can remove the 'unknown'
>> flag.
>>>>
>>>>> Instead of UNKNOWN response we can declare that rte_flow_flush()
>>>>> must be called unless the application wants to keep the rules
>>>>> and has made sure it's possible, or the behavior is undefined.
>>>>> (Can be viewed as "UNKNOWN by default", but is simpler.)
>>>>> This way neither UNKNOWN state is needed,
>>>>> nor the bit saying the flow rules are flushed.
>>>>> Here is why, let's consider KEEP and FLUSH combinations:
>>>>>
>>>>> (1) FLUSH=0, KEEP=0 is equivalent to UNKNOWN, i.e. the application
>>>>>                        must explicitly flush the rules itself
>>>>>                        in order to get deterministic behavior.
>>>>> (2) FLUSH=1, KEEP=0 means PMD flushes all rules on the device stop.
>>>>> (3) FLUSH=0, KEEP=1 means PMD can keep at least some rules,
>>>>>                        exact support must be checked with
>>>> rte_flow_create()
>>>>>                        when the device is stopped.
>>>>> (4) FLUSH=1, KEEP=1 is forbidden.
>>>>>
>>>>
>>>> What is 'FLUSH' here? Are you proposing a new capability?
>>>>
>>>>> If the application doesn't need the PMD to keep flow rules,
>>>>> it can as well flush them always before the device stop
>>>>> regardless of whether the driver does it automatically or not.
>>>>> It's even simpler and probably as efficient. Testpmd does this.
>>>>> If the application wants to take advantage of rule-keeping ability,
>>>>> it just tests the KEEP bit. If it is unset that's the previous case,
>>>>> application should call rte_flow_flush() before the device stop to be
>>>> sure.
>>>>> Otherwise, the application can test capability to keep flow rule kinds
>>>>> it is interested in (see my reply to Andrew).
>>>>>
>>>>
>>>> Overall this is an optimization, application can workaround without
>> this
>>>> capability.
>>>>
>>>> If driver doesn't set KEEP capability, it is not clear what does it
>>>> mean, driver doesn't keep rules or driver is not updated yet.
>>>> I suggest to update comment to clarify the meaning of the missing KEEP
>>>> flag.
>>>>
>>>> And unless we have two explicit status flags application can never be
>>>> sure that driver doesn't keep rules after stop. I am don't know if
>>>> application wants to know this.
>>>>
>>>> Other concern is how PMD maintainers will know that there is something
>>>> to update here. I am sure many driver maintainers won't even be aware
>>>> of this; your patch doesn't even cc them. Your approach feels like you
>>>> are thinking only of a single PMD and ignoring the rest.
>>>>
>>>> My intention was to have a way to follow drivers that are not updated,
>>>> by marking them with UNKNOWN flag. But this also doesn't work with new
>>>> drivers, they may forget to set the capability.
>>>>
>>>>
>>>> What about following:
>>>> 1) Clarify KEEP flag meaning:
>>>> having KEEP: flow rules are kept after stop
>>>> missing KEEP: unknown behavior
>>>>
>>>> 2) Mark all PMDs with useless flag:
>>>> dev_capa &= ~KEEP
>>>> Maintainer can remove or update this later, and we can easily track it.
>>>
>>> Item 1) is almost what I did in v2. The difference (or clarification) is that
>>> if the bit is set, it doesn't mean that all rules are kept.
>>> It allows the PMD to not support keeping some kinds of rules.
>>> Please see the doc update about how the kind is defined
>>> and how the application can test what is unsupported.
>>>
>>> This complication is needed so that if a PMD cannot keep some exotic kind of rules,
>>> it is not forced to remove the capability completely,
>>> blocking optimizations even if the application doesn't use problematic rule kinds.
>>> It makes the capability future-proof.
>>>
>>> The second flag (FLUSH) would not be of much help.
>>> Consider it is not set, but the PMD can keep some kinds of rules.
>>> The application still needs to test all the kinds it needs.
>>> But it needs to do the same if the KEEP bit is set.
>>> Only if it is set the application can skip the tests and rte_flow_flush(),
>>> but these optimizations are small compared to keeping the rules itself.
>>>
>>> Item 2) need not be done, because the absence of the bit is *the* useless value:
>>> it means the same unspecified behavior as at present.
>>> It is worth noting that currently any application that relies on the PMD
>>> to keep or flush the rules is non-portable, because PMD is allowed to do anything.
>>> To get a reliable behavior application must explicitly clear the rules.
>>>
>>> Regarding your concern about maintainers forgetting to update PMDs,
>>> I think there are better task-tracking tools than constants in the code
>>> (the authors of golang's context.TODO may disagree :)
>>>
>>
>> Hi Dmitry,
>>
>> This is a valid concern, and adding code to the drivers proved that it
>> works.
>>
>> There are multiple authors updating the ethdev layer and expecting that PMD
>> maintainers will do the required changes. In your case you are updating the
>> PMD you are concerned with, but how will other PMD maintainers even be aware
>> that there is something to change in their PMD?
>> By your change you are putting some responsibility on other maintainers,
>> without even cc'ing them. And for sure they are not reading all emails
>> on the mailing list, they can't.
>>
>> Task-tracking is an option, in the past I tried to upstream some todo doc
>> for PMDs. But I can see the additional maintenance cost of tracking features
>> from a central point, compared to distributing it to PMDs (adding code
>> to PMDs).
>>
>> I think the best method is for whoever changes the ethdev layer to do the
>> relevant change in the PMDs, but it has the obvious problem that they may
>> not know enough about the PMDs to update them.
>>
>> We have used the following option, and it worked in the past:
>> - When an ethdev feature requires a change in PMDs, ethdev supports both
>>     the new and the old method
>> - PMDs set a flag by default to request the old method, so there is no
>>     update in the PMD default behavior
>> - When PMD does the required changes, it removes the flag
>> - This lets me (or other maintainer) trace the update status and ping
>>     relevant maintainers
>> - When all PMDs are updated, ethdev support for the old method is removed
>> - This method allows PMD maintainers to do the change in their own time
> 
> Hi Ferruh,
> 
> Thanks for sharing the experience.
> You suggest updating PMDs with an explicit reset of this bit,
> despite that it will be zero anyway, to attract maintainers' attention.
Ack, but please add a brief comment to clarify the intention.
>  From user's perspective it will be all the same: KEEP bit reset,
> not a special value saying the PMD is not updated
> that we would need to deprecate and remove later.
Ack, only it needs to be clear to the application that a PMD not advertising
the KEEP flag means the behavior is undefined, it does NOT mean the PMD flushes rules.
You said this was already updated in v2, I am just stressing it.
> If this understanding is correct, then for sure I can add a patch
> updating the relevant PMDs.
> 
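For illustration only, the explicit reset with a clarifying comment discussed above could look roughly like the sketch below. It is self-contained and does not use DPDK headers: `SKETCH_CAPA_FLOW_RULE_KEEP`, `struct sketch_dev_info` and `sketch_pmd_infos_get()` are hypothetical stand-ins for the capability bit proposed in this series (the value 0x00000004 is taken from the patch), the ethdev info structure and a PMD info callback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP; value copied from the patch. */
#define SKETCH_CAPA_FLOW_RULE_KEEP UINT64_C(0x00000004)

/* Hypothetical, reduced stand-in for the ethdev info structure. */
struct sketch_dev_info {
	uint64_t dev_capa;
};

/* Hypothetical info callback of a PMD that has not been reviewed yet. */
static void
sketch_pmd_infos_get(struct sketch_dev_info *info, uint64_t base_capa)
{
	assert(info != NULL);
	info->dev_capa = base_capa;
	/*
	 * Whether this PMD keeps flow rules across restart is not verified.
	 * The explicit reset is a marker for the maintainer to revisit;
	 * a cleared bit means "unspecified", not "rules are flushed".
	 */
	info->dev_capa &= ~SKETCH_CAPA_FLOW_RULE_KEEP;
}
```

From the application's point of view the result is the same as if the line were absent; the only purpose of the reset is to make unreviewed drivers easy to find with grep.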
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [dpdk-dev] [PATCH v2 1/5] ethdev: add capability to keep flow rules on restart
  2021-10-15 16:18   ` [dpdk-dev] [PATCH v2 1/5] ethdev: add capability to keep flow rules on restart Dmitry Kozlyuk
  2021-10-18  8:56     ` Andrew Rybchenko
@ 2021-10-18 13:06     ` Zhang, Qi Z
  2021-10-18 22:51       ` Dmitry Kozlyuk
  1 sibling, 1 reply; 96+ messages in thread
From: Zhang, Qi Z @ 2021-10-18 13:06 UTC (permalink / raw)
  To: Dmitry Kozlyuk, dev
  Cc: Ori Kam, Thomas Monjalon, Yigit, Ferruh, Andrew Rybchenko
> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Dmitry Kozlyuk
> Sent: Saturday, October 16, 2021 12:18 AM
> To: dev@dpdk.org
> Cc: Ori Kam <orika@oss.nvidia.com>; Thomas Monjalon
> <thomas@monjalon.net>; Yigit, Ferruh <ferruh.yigit@intel.com>; Andrew
> Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Subject: [dpdk-dev] [PATCH v2 1/5] ethdev: add capability to keep flow rules
> on restart
> 
> Currently, it is not specified what happens to the flow rules when the device is
> stopped, possibly reconfigured, then started.
> If flow rules were kept, it could be convenient for application developers,
> because they wouldn't need to save and restore them.
> However, due to the number of flows and possible creation rate it is
> impractical to save all flow rules in DPDK layer. This means that flow rules
> persistence really depends on whether PMD and HW can implement it
> efficiently. It can also be limited by the rule item and action types, and its
> attributes transfer bit, which together comprise the rule kind.
> 
> Add a device capability bit for PMDs that can keep at least some of the flow
> rules across restart. Without this capability behavior is still unspecified, which
> is now explicitly stated.
> Declare that the application can test for persistence of flow rules of a particular
> kind by attempting to create a rule of that kind when the device is stopped
> and checking for the specific error.
> This is logical because if the PMD can create the flow rule when the device
> is not started and use it after the start happens, it is natural that it can move
> its internal flow rule object to the same state when the device is stopped and
> restore the state when the device is started.
> 
> If the device is being reconfigured in a way that is incompatible with existing
> flow rules, PMD is required to report an error.
> This is mandatory, because flow API does not supply users with capabilities, so
> this is the only way for a user to learn that configuration is invalid.
What if a PMD does not flush rules during the start/stop cycle, but simply wants to flush rules during dev_config?
Is it reasonable to take the above as a typical implementation, to avoid all the complexity of handling conflicts such as these?
1. Queues are destroyed and re-created with a different number, which may impact the "to queue" action.
2. The hash key may be overwritten, which impacts the RSS result.
3. Offload flag changes may impact data path selection, which causes the mark action not to work.
....
> example, if queue count changes and the action of a flow rule specifies queues
> that are going away, the user must update or remove the flow rule before
> removing the queues.
> 
> Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
> ---
>  doc/guides/prog_guide/rte_flow.rst | 27 +++++++++++++++++++++++++++
>  lib/ethdev/rte_ethdev.h            |  7 +++++++
>  lib/ethdev/rte_flow.h              |  1 +
>  3 files changed, 35 insertions(+)
> 
> diff --git a/doc/guides/prog_guide/rte_flow.rst
> b/doc/guides/prog_guide/rte_flow.rst
> index 2b42d5ec8c..b0ced4209b 100644
> --- a/doc/guides/prog_guide/rte_flow.rst
> +++ b/doc/guides/prog_guide/rte_flow.rst
> @@ -87,6 +87,33 @@ To avoid resource leaks on the PMD side, handles must
> be explicitly  destroyed by the application before releasing associated
> resources such as  queues and ports.
> 
> +By default it is unspecified if the flow rules persist after the device stop.
> +If ``RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP`` is not advertised, then rules
> +must be explicitly flushed before stopping the device if the
> +application needs to ensure they are removed.
> +If it is advertised, this means the PMD can keep at least some rules
> +across the device stop and start with possible reconfiguration in between.
> +However, it may be only supported for some kinds of rules.
> +The kind is a combination of the following rule properties:
> +
> +- the sequence of item types;
> +- the sequence of action types;
> +- the value of the transfer attribute.
> +
> +To test if a particular kind of rules is kept, the application must try
> +to create a valid rule of that kind when the device is stopped (after
> +it has been configured or started previously).
> +If it succeeds, all rules of the same kind are kept at the device stop.
> +If it fails with an error of type ``RTE_FLOW_ERROR_TYPE_STATE``, rules
> +of this kind are flushed when the device is stopped.
> +Rules of a kept kind that are created when the device is stopped,
> +including the rules created for the test, will be kept after the device is started.
> +Some configuration changes may be incompatible with existing rules.
> +In this case ``rte_eth_dev_configure()``,
> +``rte_eth_rx/tx_queue_setup()``, and/or ``rte_eth_dev_start()`` will
> +fail with a log message from the PMD that should be similar to the one
> +that would be emitted by ``rte_flow_create()`` if an attempt was made to
> create the offending rule with the new configuration.
> +
>  The following sections cover:
> 
>  - **Attributes** (represented by ``struct rte_flow_attr``): properties of a diff
> --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h index
> 6d80514ba7..a0b388bb25 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -90,6 +90,11 @@
>   *     - flow director filtering mode (but not filtering rules)
>   *     - NIC queue statistics mappings
>   *
> + * The following configuration may be retained or not
> + * depending on the device capabilities:
> + *
> + *     - flow rules
> + *
>   * Any other configuration will not be stored and will need to be re-entered
>   * before a call to rte_eth_dev_start().
>   *
> @@ -1445,6 +1450,8 @@ struct rte_eth_conf {  #define
> RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP 0x00000001
>  /** Device supports Tx queue setup after device started. */  #define
> RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002
> +/** Device supports keeping flow rules across restart. */ #define
> +RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP 0x00000004
>  /**@}*/
> 
>  /*
> diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h index
> a89945061a..aa0182d021 100644
> --- a/lib/ethdev/rte_flow.h
> +++ b/lib/ethdev/rte_flow.h
> @@ -3344,6 +3344,7 @@ enum rte_flow_error_type {
>  	RTE_FLOW_ERROR_TYPE_ACTION_NUM, /**< Number of actions. */
>  	RTE_FLOW_ERROR_TYPE_ACTION_CONF, /**< Action configuration. */
>  	RTE_FLOW_ERROR_TYPE_ACTION, /**< Specific action. */
> +	RTE_FLOW_ERROR_TYPE_STATE, /**< Current device state. */
>  };
> 
>  /**
> --
> 2.25.1
* Re: [dpdk-dev] [PATCH v2 1/5] ethdev: add capability to keep flow rules on restart
  2021-10-18 13:06     ` Zhang, Qi Z
@ 2021-10-18 22:51       ` Dmitry Kozlyuk
  2021-10-19  1:00         ` Zhang, Qi Z
  0 siblings, 1 reply; 96+ messages in thread
From: Dmitry Kozlyuk @ 2021-10-18 22:51 UTC (permalink / raw)
  To: Zhang, Qi Z, dev
  Cc: Ori Kam, NBU-Contact-Thomas Monjalon, Yigit, Ferruh, Andrew Rybchenko
> -----Original Message-----
> From: Zhang, Qi Z <qi.z.zhang@intel.com>
> Sent: October 18, 2021 16:06
> To: Dmitry Kozlyuk <dkozlyuk@nvidia.com>; dev@dpdk.org
> Cc: Ori Kam <orika@nvidia.com>; NBU-Contact-Thomas Monjalon
> <thomas@monjalon.net>; Yigit, Ferruh <ferruh.yigit@intel.com>; Andrew
> Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Subject: RE: [dpdk-dev] [PATCH v2 1/5] ethdev: add capability to keep flow
> rules on restart
> 
> > -----Original Message-----
> > From: dev <dev-bounces@dpdk.org> On Behalf Of Dmitry Kozlyuk
> > Sent: Saturday, October 16, 2021 12:18 AM
> > To: dev@dpdk.org
> > Cc: Ori Kam <orika@oss.nvidia.com>; Thomas Monjalon
> > <thomas@monjalon.net>; Yigit, Ferruh <ferruh.yigit@intel.com>; Andrew
> > Rybchenko <andrew.rybchenko@oktetlabs.ru>
> > Subject: [dpdk-dev] [PATCH v2 1/5] ethdev: add capability to keep flow
> rules
> > on restart
> >
> > Currently, it is not specified what happens to the flow rules when the
> device is
> > stopped, possibly reconfigured, then started.
> > If flow rules were kept, it could be convenient for application
> developers,
> > because they wouldn't need to save and restore them.
> > However, due to the number of flows and possible creation rate it is
> > impractical to save all flow rules in DPDK layer. This means that flow
> rules
> > persistence really depends on whether PMD and HW can implement it
> > efficiently. It can also be limited by the rule item and action types,
> and its
> > attributes transfer bit, which together comprise the rule kind.
> >
> > Add a device capability bit for PMDs that can keep at least some of the
> flow
> > rules across restart. Without this capability behavior is still
> unspecified, which
> > is now explicitly stated.
> > Declare that the application can test for persistence of flow rules of a
> particular
> > kind by attempting to create a rule of that kind when the device is
> stopped
> > and checking for the specific error.
> > This is logical because if the PMD can create the flow rule when the
> device
> > is not started and use it after the start happens, it is natural that it
> can move
> > its internal flow rule object to the same state when the device is
> stopped and
> > restore the state when the device is started.
> >
> > If the device is being reconfigured in a way that is incompatible with
> existing
> > flow rules, PMD is required to report an error.
> > This is mandatory, because flow API does not supply users with
> capabilities, so
> > this is the only way for a user to learn that configuration is invalid.
> 
> What if a PMD does not flush rules during start /stop cycle, but just want
> to simply flush rules during dev_config?
> Is it reasonable to take above as an typical implementation to avoid all
> the complexity for handling the conflicts?
> 
> 1. queues are destroyed and re-created with a different number which may
> impact "to queue" action.
> 2. hash key may be overwritten which impact RSS result.
> 3. offload flags changes may impact data path selection which cause mark
> action does not work.
> ....
Hello Qi,
Yes, it sounds reasonable that rules need not persist across reconfiguration.
Unlike indirect actions, they are too numerous for PMD to track and check.
I'm not sure rte_eth_dev_configure() should be specified to implicitly flush them.
Some PMDs may wish to preserve the rules even then in the future,
so we don't want applications to rely on configure flushing the rules.
It can be specified that applications should flush the rules themselves before reconfiguring.
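To make the suggested application-side contract concrete, here is a minimal self-contained sketch. It models a port with a plain struct instead of calling DPDK: `struct sketch_port` and the `sketch_port_*()` helpers are hypothetical stand-ins for the port state, rte_flow_flush(), rte_eth_dev_stop(), rte_eth_dev_configure() and rte_eth_dev_start(), and the modeled PMD is assumed to refuse reconfiguration while rules remain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a port; not a DPDK structure. */
struct sketch_port {
	bool started;
	unsigned int nb_rules;  /* flow rules currently installed */
	unsigned int nb_queues;
};

/* Stand-in for rte_flow_flush(): removes every rule on the port. */
static void
sketch_port_flow_flush(struct sketch_port *p)
{
	p->nb_rules = 0;
}

/* Stand-in for rte_eth_dev_stop(); this model keeps the rules on stop. */
static void
sketch_port_stop(struct sketch_port *p)
{
	p->started = false;
}

/* Stand-in for rte_eth_dev_configure(): fails while started or rules remain. */
static int
sketch_port_configure(struct sketch_port *p, unsigned int nb_queues)
{
	if (p->started || p->nb_rules != 0)
		return -1;
	p->nb_queues = nb_queues;
	return 0;
}

/* Stand-in for rte_eth_dev_start(). */
static void
sketch_port_start(struct sketch_port *p)
{
	p->started = true;
}

/* Application side: never rely on configure flushing the rules implicitly. */
static int
app_reconfigure(struct sketch_port *p, unsigned int nb_queues)
{
	int ret;

	assert(p != NULL);
	/* Flushing before the stop is safe whether or not the PMD keeps rules. */
	sketch_port_flow_flush(p);
	sketch_port_stop(p);
	ret = sketch_port_configure(p, nb_queues);
	if (ret != 0)
		return ret;
	sketch_port_start(p);
	return 0;
}
```

The application is then expected to re-create the rules it needs after the port is started again.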
* Re: [dpdk-dev] [PATCH v2 1/5] ethdev: add capability to keep flow rules on restart
  2021-10-18 22:51       ` Dmitry Kozlyuk
@ 2021-10-19  1:00         ` Zhang, Qi Z
  0 siblings, 0 replies; 96+ messages in thread
From: Zhang, Qi Z @ 2021-10-19  1:00 UTC (permalink / raw)
  To: Dmitry Kozlyuk, dev
  Cc: Ori Kam, NBU-Contact-Thomas Monjalon, Yigit, Ferruh, Andrew Rybchenko
> -----Original Message-----
> From: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
> Sent: Tuesday, October 19, 2021 6:51 AM
> To: Zhang, Qi Z <qi.z.zhang@intel.com>; dev@dpdk.org
> Cc: Ori Kam <orika@nvidia.com>; NBU-Contact-Thomas Monjalon
> <thomas@monjalon.net>; Yigit, Ferruh <ferruh.yigit@intel.com>; Andrew
> Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Subject: RE: [dpdk-dev] [PATCH v2 1/5] ethdev: add capability to keep flow
> rules on restart
> 
> > -----Original Message-----
> > From: Zhang, Qi Z <qi.z.zhang@intel.com>
> > Sent: October 18, 2021 16:06
> > To: Dmitry Kozlyuk <dkozlyuk@nvidia.com>; dev@dpdk.org
> > Cc: Ori Kam <orika@nvidia.com>; NBU-Contact-Thomas Monjalon
> > <thomas@monjalon.net>; Yigit, Ferruh <ferruh.yigit@intel.com>; Andrew
> > Rybchenko <andrew.rybchenko@oktetlabs.ru>
> > Subject: RE: [dpdk-dev] [PATCH v2 1/5] ethdev: add capability to keep
> > flow rules on restart
> >
> > > -----Original Message-----
> > > From: dev <dev-bounces@dpdk.org> On Behalf Of Dmitry Kozlyuk
> > > Sent: Saturday, October 16, 2021 12:18 AM
> > > To: dev@dpdk.org
> > > Cc: Ori Kam <orika@oss.nvidia.com>; Thomas Monjalon
> > > <thomas@monjalon.net>; Yigit, Ferruh <ferruh.yigit@intel.com>;
> > > Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> > > Subject: [dpdk-dev] [PATCH v2 1/5] ethdev: add capability to keep
> > > flow
> > rules
> > > on restart
> > >
> > > Currently, it is not specified what happens to the flow rules when
> > > the
> > device is
> > > stopped, possibly reconfigured, then started.
> > > If flow rules were kept, it could be convenient for application
> > developers,
> > > because they wouldn't need to save and restore them.
> > > However, due to the number of flows and possible creation rate it is
> > > impractical to save all flow rules in DPDK layer. This means that
> > > flow
> > rules
> > > persistence really depends on whether PMD and HW can implement it
> > > efficiently. It can also be limited by the rule item and action
> > > types,
> > and its
> > > attributes transfer bit, which together comprise the rule kind.
> > >
> > > Add a device capability bit for PMDs that can keep at least some of
> > > the
> > flow
> > > rules across restart. Without this capability behavior is still
> > unspecified, which
> > > is now explicitly stated.
> > > Declare that the application can test for persistence of flow rules
> > > of a
> > particular
> > > kind by attempting to create a rule of that kind when the device is
> > stopped
> > > and checking for the specific error.
> > > This is logical because if the PMD can create the flow rule when
> > > the
> > device
> > > is not started and use it after the start happens, it is natural
> > > that it
> > can move
> > > its internal flow rule object to the same state when the device is
> > stopped and
> > > restore the state when the device is started.
> > >
> > > If the device is being reconfigured in a way that is incompatible
> > > with
> > existing
> > > flow rules, PMD is required to report an error.
> > > This is mandatory, because flow API does not supply users with
> > capabilities, so
> > > this is the only way for a user to learn that configuration is invalid.
> >
> > What if a PMD does not flush rules during start /stop cycle, but just
> > want to simply flush rules during dev_config?
> > Is it reasonable to take above as an typical implementation to avoid
> > all the complexity for handling the conflicts?
> >
> > 1. queues are destroyed and re-created with a different number which
> > may impact "to queue" action.
> > 2. hash key may be overwritten which impact RSS result.
> > 3. offload flags changes may impact data path selection which cause
> > mark action does not work.
> > ....
> 
> Hello Qi,
> 
> Yes, it sounds reasonable that rules do need not to persist across
> reconfiguration.
> Unlike indirect actions, they are too numerous for PMD to track and check.
> I'm not sure rte_eth_dev_configure() should be specified to implicitly flush
> them.
> Some PMDs may wish to preserve the rules even then in the future, so we
> don't want applications to rely on configure flushing the rules.
> It can be specified that applications should flush the rules themselves before.
OK, I'm trying to figure out how to set this "keep" capability for a PMD that doesn't want the application to re-create rules after dev_stop, but still wants rules to be flushed before reconfiguration.
I think the answer is that it should expose the "keep" capability and simply return an error in dev_configure if any rules exist. Thanks.
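A rough PMD-side sketch of that answer, not taken from any real driver: `struct keep_pmd` and the `keep_pmd_*()` functions are hypothetical, `KEEP_PMD_CAPA_FLOW_RULE_KEEP` stands in for the proposed capability bit, and returning -EBUSY is only an assumption about which error code such a PMD might choose.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the proposed capability bit; value copied from the patch. */
#define KEEP_PMD_CAPA_FLOW_RULE_KEEP UINT64_C(0x00000004)

/* Hypothetical private state of a PMD that keeps rules across stop/start. */
struct keep_pmd {
	uint64_t dev_capa;
	unsigned int nb_rules;
	int started;
};

static void
keep_pmd_init(struct keep_pmd *pmd)
{
	pmd->dev_capa = KEEP_PMD_CAPA_FLOW_RULE_KEEP; /* advertise "keep" */
	pmd->nb_rules = 0;
	pmd->started = 0;
}

/* dev_stop: rules are left in place, only the port state changes. */
static int
keep_pmd_dev_stop(struct keep_pmd *pmd)
{
	pmd->started = 0;
	return 0;
}

/* dev_configure: refuse while any rule exists instead of checking conflicts. */
static int
keep_pmd_dev_configure(struct keep_pmd *pmd)
{
	assert(pmd != NULL);
	if (pmd->nb_rules != 0)
		return -EBUSY; /* assumed error code, see the note above */
	/* ... apply the new configuration here ... */
	return 0;
}
```

This keeps the driver free from tracking which rules depend on queues, hash keys or offload flags: the application must flush first, and the error tells it so.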
* [dpdk-dev] [PATCH v3 0/6] Flow entites behavior on port restart
  2021-10-15 16:18 ` [dpdk-dev] [PATCH v2 0/5] Flow entites behavior on port restart Dmitry Kozlyuk
                     ` (4 preceding siblings ...)
  2021-10-15 16:18   ` [dpdk-dev] [PATCH v2 5/5] net/mlx5: preserve indirect actions on restart Dmitry Kozlyuk
@ 2021-10-19 12:37   ` Dmitry Kozlyuk
  2021-10-19 12:37     ` [dpdk-dev] [PATCH v3 1/6] ethdev: add capability to keep flow rules on restart Dmitry Kozlyuk
                       ` (7 more replies)
  5 siblings, 8 replies; 96+ messages in thread
From: Dmitry Kozlyuk @ 2021-10-19 12:37 UTC (permalink / raw)
  To: dev
It is unspecified whether flow rules and indirect actions are kept
when a port is stopped, possibly reconfigured, and started again.
Vendors approach the topic differently, e.g. mlx5 and i40e PMD
disagree in whether flow rules can be kept, and mlx5 PMD would keep
indirect actions. In the end, applications are greatly affected
by whatever contract there is and need to know it.
It is proposed to advertise capabilities of keeping flow rules
and indirect actions (as a special case of shared object)
using a combination of ethdev info and rte_flow calls.
Then a bug is fixed in mlx5 PMD that prevented indirect RSS action
from being kept, and the driver starts advertising the new capability.
Prior discussions:
1) http://inbox.dpdk.org/dev/20210727073121.895620-1-dkozlyuk@nvidia.com/
2) http://inbox.dpdk.org/dev/20210901085516.3647814-1-dkozlyuk@nvidia.com/
v3:  1. Add a patch 3/6 to update all PMDs that implement rte_flow
        with an explicit reset of the new capability (Ferruh).
     2. Change how the support of keeping particular kinds
        of flow rules is determined, improve wording (Andrew).
     3. Do not require keeping rules and indirect actions
        across reconfiguration (Qi Zhang).
     4. Improve wording (Ori).
Dmitry Kozlyuk (6):
  ethdev: add capability to keep flow rules on restart
  ethdev: add capability to keep shared objects on restart
  net: advertise no support for keeping flow rules
  net/mlx5: discover max flow priority using DevX
  net/mlx5: create drop queue using DevX
  net/mlx5: preserve indirect actions on restart
 doc/guides/prog_guide/rte_flow.rst      |  49 ++++
 drivers/net/bnxt/bnxt_ethdev.c          |   1 +
 drivers/net/bnxt/bnxt_reps.c            |   1 +
 drivers/net/cnxk/cnxk_ethdev_ops.c      |   1 +
 drivers/net/cxgbe/cxgbe_ethdev.c        |   2 +
 drivers/net/dpaa2/dpaa2_ethdev.c        |   1 +
 drivers/net/e1000/em_ethdev.c           |   2 +
 drivers/net/e1000/igb_ethdev.c          |   1 +
 drivers/net/enic/enic_ethdev.c          |   1 +
 drivers/net/failsafe/failsafe_ops.c     |   1 +
 drivers/net/hinic/hinic_pmd_ethdev.c    |   2 +
 drivers/net/hns3/hns3_ethdev.c          |   1 +
 drivers/net/hns3/hns3_ethdev_vf.c       |   1 +
 drivers/net/i40e/i40e_ethdev.c          |   1 +
 drivers/net/i40e/i40e_vf_representor.c  |   2 +
 drivers/net/iavf/iavf_ethdev.c          |   1 +
 drivers/net/ice/ice_dcf_ethdev.c        |   1 +
 drivers/net/igc/igc_ethdev.c            |   1 +
 drivers/net/ipn3ke/ipn3ke_representor.c |   1 +
 drivers/net/mlx5/linux/mlx5_os.c        |   5 -
 drivers/net/mlx5/mlx5_devx.c            | 211 ++++++++++++++---
 drivers/net/mlx5/mlx5_ethdev.c          |   1 +
 drivers/net/mlx5/mlx5_flow.c            | 292 ++++++++++++++++++++++--
 drivers/net/mlx5/mlx5_flow.h            |   6 +
 drivers/net/mlx5/mlx5_flow_dv.c         | 103 +++++++++
 drivers/net/mlx5/mlx5_flow_verbs.c      |  77 +------
 drivers/net/mlx5/mlx5_rx.h              |   4 +
 drivers/net/mlx5/mlx5_rxq.c             |  99 +++++++-
 drivers/net/mlx5/mlx5_trigger.c         |  10 +
 drivers/net/mvpp2/mrvl_ethdev.c         |   2 +
 drivers/net/octeontx2/otx2_ethdev_ops.c |   1 +
 drivers/net/qede/qede_ethdev.c          |   1 +
 drivers/net/sfc/sfc_ethdev.c            |   1 +
 drivers/net/softnic/rte_eth_softnic.c   |   1 +
 drivers/net/tap/rte_eth_tap.c           |   1 +
 drivers/net/txgbe/txgbe_ethdev.c        |   1 +
 drivers/net/txgbe/txgbe_ethdev_vf.c     |   1 +
 lib/ethdev/rte_ethdev.h                 |  10 +
 lib/ethdev/rte_flow.h                   |   1 +
 39 files changed, 762 insertions(+), 137 deletions(-)
-- 
2.25.1
* [dpdk-dev] [PATCH v3 1/6] ethdev: add capability to keep flow rules on restart
  2021-10-19 12:37   ` [dpdk-dev] [PATCH v3 0/6] Flow entites behavior on port restart Dmitry Kozlyuk
@ 2021-10-19 12:37     ` Dmitry Kozlyuk
  2021-10-19 15:22       ` Ori Kam
                         ` (2 more replies)
  2021-10-19 12:37     ` [dpdk-dev] [PATCH v3 2/6] ethdev: add capability to keep shared objects " Dmitry Kozlyuk
                       ` (6 subsequent siblings)
  7 siblings, 3 replies; 96+ messages in thread
From: Dmitry Kozlyuk @ 2021-10-19 12:37 UTC (permalink / raw)
  To: dev; +Cc: Qi Zhang, Ori Kam, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko
Previously, it was not specified what happens to the flow rules
when the device is stopped, possibly reconfigured, then started.
If flow rules were kept, it could be convenient for application
developers, because they wouldn't need to save and restore them.
However, due to the number of flows and possible creation rate it is
impractical to save all flow rules in DPDK layer. This means that flow
rules persistence really depends on whether PMD and HW can implement it
efficiently. It can also be limited by the rule item and action types,
and its attributes transfer bit (a combination of an item/action type
and a value of the transfer bit is called a rule feature).
Add a device capability bit for PMDs that can keep at least some
of the flow rules across restart. Without this capability behavior
is still unspecified and it is declared that the application must
flush the rules before stopping the device.
Allow the application to test for persistence of rules using
a particular feature by attempting to create a flow rule
using that feature when the device is stopped
and checking for the specific error.
This is logical because if the PMD can create the flow rule
when the device is not started and use it after the start happens,
it is natural that it can move its internal flow rule object
to the same state when the device is stopped and restore the state
when the device is started.
Rule persistence across reconfiguration is not required,
because tracking all the rules and configuration-dependent resources
they use may be infeasible. In case a PMD cannot keep the rules
across reconfiguration, it is allowed just to report an error.
Application must then flush the rules before attempting it.
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
---
 doc/guides/prog_guide/rte_flow.rst | 25 +++++++++++++++++++++++++
 lib/ethdev/rte_ethdev.h            |  7 +++++++
 lib/ethdev/rte_flow.h              |  1 +
 3 files changed, 33 insertions(+)
diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
index 2b42d5ec8c..ff67b211e3 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -87,6 +87,31 @@ To avoid resource leaks on the PMD side, handles must be explicitly
 destroyed by the application before releasing associated resources such as
 queues and ports.
 
+If ``RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP`` is not advertised,
+rules cannot be created until the device is started for the first time
+and cannot be kept when the device is stopped.
+However, PMD also does not flush them automatically on stop,
+so the application must call ``rte_flow_flush()`` or ``rte_flow_destroy()``
+before stopping the device to ensure no rules remain.
+
+If ``RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP`` is advertised, this means
+the PMD can keep at least some rules across the device stop and start.
+However, ``rte_eth_dev_configure()`` may fail if any rules remain,
+so the application must flush them before attempting a reconfiguration.
+Keeping may be unsupported for some types of rule items and actions,
+as well as depending on the value of flow attributes transfer bit.
+A combination of an item or action type and a value of the transfer bit
+is called a rule feature.
+To test if rules with a particular feature are kept, the application must try
+to create a valid rule using this feature when the device is stopped
+(after it has been configured or started previously).
+If it fails with an error of type ``RTE_FLOW_ERROR_TYPE_STATE``,
+rules using this feature are flushed when the device is stopped.
+If it succeeds, such rules will be kept when the device is stopped,
+provided they do not use other features that are not supported.
+Rules that are created when the device is stopped, including the rules
+created for the test, will be kept after the device is started.
+
 The following sections cover:
 
 - **Attributes** (represented by ``struct rte_flow_attr``): properties of a
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index 6d80514ba7..a0b388bb25 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -90,6 +90,11 @@
  *     - flow director filtering mode (but not filtering rules)
  *     - NIC queue statistics mappings
  *
+ * The following configuration may be retained or not
+ * depending on the device capabilities:
+ *
+ *     - flow rules
+ *
  * Any other configuration will not be stored and will need to be re-entered
  * before a call to rte_eth_dev_start().
  *
@@ -1445,6 +1450,8 @@ struct rte_eth_conf {
 #define RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP 0x00000001
 /** Device supports Tx queue setup after device started. */
 #define RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002
+/** Device supports keeping flow rules across restart. */
+#define RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP 0x00000004
 /**@}*/
 
 /*
diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
index a89945061a..aa0182d021 100644
--- a/lib/ethdev/rte_flow.h
+++ b/lib/ethdev/rte_flow.h
@@ -3344,6 +3344,7 @@ enum rte_flow_error_type {
 	RTE_FLOW_ERROR_TYPE_ACTION_NUM, /**< Number of actions. */
 	RTE_FLOW_ERROR_TYPE_ACTION_CONF, /**< Action configuration. */
 	RTE_FLOW_ERROR_TYPE_ACTION, /**< Specific action. */
+	RTE_FLOW_ERROR_TYPE_STATE, /**< Current device state. */
 };
 
 /**
-- 
2.25.1
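As an illustration of the probing procedure described in the documentation hunk above, the following self-contained sketch models it without DPDK. Everything in it is hypothetical: `enum probe_feature` stands for a rule feature (an item or action type combined with a transfer bit value), `probe_flow_create()` stands for rte_flow_create() called on a stopped device, and `PROBE_ERR_STATE` stands for RTE_FLOW_ERROR_TYPE_STATE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the error types relevant to the probe. */
enum probe_error { PROBE_ERR_NONE, PROBE_ERR_STATE, PROBE_ERR_OTHER };

/* Hypothetical rule features; real ones also carry the transfer bit value. */
enum probe_feature { PROBE_FEAT_QUEUE, PROBE_FEAT_RSS, PROBE_FEAT_COUNT };

enum probe_result { PROBE_KEPT, PROBE_NOT_KEPT, PROBE_UNKNOWN };

/* Hypothetical PMD model: which features can be kept while stopped. */
struct probe_pmd {
	bool started;
	bool keeps[PROBE_FEAT_COUNT];
};

/* Stand-in for rule creation; fails with STATE if stopped and not kept. */
static int
probe_flow_create(const struct probe_pmd *pmd, enum probe_feature feat,
		  enum probe_error *err)
{
	if (!pmd->started && !pmd->keeps[feat]) {
		*err = PROBE_ERR_STATE;
		return -1;
	}
	*err = PROBE_ERR_NONE;
	return 0;
}

/* Application side: probe one feature while the device is stopped. */
static enum probe_result
probe_feature_kept(const struct probe_pmd *pmd, enum probe_feature feat)
{
	enum probe_error err;

	assert(pmd != NULL);
	assert(!pmd->started); /* the probe is only meaningful when stopped */
	if (probe_flow_create(pmd, feat, &err) == 0)
		return PROBE_KEPT;
	/* Only the STATE error answers the question; others are inconclusive. */
	return err == PROBE_ERR_STATE ? PROBE_NOT_KEPT : PROBE_UNKNOWN;
}
```

Per the text above, a rule created successfully for the test stays installed after the device is started, so a real application would need to destroy it if it is not wanted.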
* [dpdk-dev] [PATCH v3 2/6] ethdev: add capability to keep shared objects on restart
  2021-10-19 12:37   ` [dpdk-dev] [PATCH v3 0/6] Flow entites behavior on port restart Dmitry Kozlyuk
  2021-10-19 12:37     ` [dpdk-dev] [PATCH v3 1/6] ethdev: add capability to keep flow rules on restart Dmitry Kozlyuk
@ 2021-10-19 12:37     ` Dmitry Kozlyuk
  2021-10-19 15:22       ` Ori Kam
  2021-10-19 12:37     ` [dpdk-dev] [PATCH v3 3/6] net: advertise no support for keeping flow rules Dmitry Kozlyuk
                       ` (5 subsequent siblings)
  7 siblings, 1 reply; 96+ messages in thread
From: Dmitry Kozlyuk @ 2021-10-19 12:37 UTC (permalink / raw)
  To: dev; +Cc: Ori Kam, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko
rte_flow_action_handle_create() did not mention what happens
with an indirect action when a device is stopped and started again.
It is natural for some indirect actions, like counter, to be persistent.
Keeping others at least saves application time and complexity.
However, not all PMDs can support it, or the support may be limited
by particular action kinds, that is, combinations of action type
and the value of the transfer bit in its configuration.
Add a device capability to indicate if at least some indirect actions
are kept across the above sequence. Without this capability the behavior
is still unspecified, and the application is required to destroy
the indirect actions before stopping the device.
In the future, indirect actions may not be the only type of objects
shared between flow rules. The capability bit intends to cover all
possible types of such objects, hence its name.
Declare that the application can test for the persistence
of a particular indirect action kind by attempting to create
an indirect action of that kind when the device is stopped
and checking for the specific error type.
This is logical because if the PMD can create an indirect action
when the device is not started and use it after the start happens,
it is natural that it can move its internal flow shared object
to the same state when the device is stopped and restore the state
when the device is started.
Indirect action persistence across reconfigurations is not required.
In case a PMD cannot keep the indirect actions across reconfiguration,
it is allowed just to report an error.
The application must then flush the indirect actions before reconfiguring.
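
The probing procedure described above can be sketched from the application's
point of view. This is a minimal, hedged model, not ethdev or driver code:
the app_* names and the probe_rc / err_is_state parameters are hypothetical,
and the capability value is a local copy of the one posted in this patch so
that the snippet builds without DPDK headers. In a real application the
inputs would come from rte_eth_dev_info_get() and from calling
rte_flow_action_handle_create() while the port is stopped.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copy of the bit added by this patch (assumption: value as posted). */
#define APP_CAPA_FLOW_SHARED_OBJECT_KEEP UINT64_C(0x00000008)

/* Hypothetical outcome of probing one indirect action kind. */
enum app_keep_status {
	APP_KIND_KEPT,     /* actions of this kind survive stop/start */
	APP_KIND_NOT_KEPT, /* application must destroy them before stop */
	APP_KIND_UNKNOWN,  /* probe failed for an unrelated reason */
};

/*
 * Classify the result of creating an indirect action on a stopped port.
 * probe_rc is 0 on success and negative on failure; err_is_state tells
 * whether the reported error type was RTE_FLOW_ERROR_TYPE_STATE.
 */
static enum app_keep_status
app_classify_probe(uint64_t dev_capa, int probe_rc, bool err_is_state)
{
	/* Without the capability, nothing may be assumed to be kept. */
	if ((dev_capa & APP_CAPA_FLOW_SHARED_OBJECT_KEEP) == 0)
		return APP_KIND_NOT_KEPT;
	if (probe_rc == 0)
		return APP_KIND_KEPT;
	return err_is_state ? APP_KIND_NOT_KEPT : APP_KIND_UNKNOWN;
}
```

An application would run such a check once per action kind (action type plus
transfer bit) and cache the result, since the answer applies to all indirect
actions of that kind.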
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
---
 doc/guides/prog_guide/rte_flow.rst | 24 ++++++++++++++++++++++++
 lib/ethdev/rte_ethdev.h            |  3 +++
 2 files changed, 27 insertions(+)
diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
index ff67b211e3..19e17f453d 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -2810,6 +2810,30 @@ updated depend on the type of the ``action`` and different for every type.
 The indirect action specified data (e.g. counter) can be queried by
 ``rte_flow_action_handle_query()``.
 
+If ``RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP`` is not advertised,
+indirect actions cannot be created until the device is started for the first time
+and cannot be kept when the device is stopped.
+However, PMD also does not flush them automatically on stop,
+so the application must call ``rte_flow_action_handle_destroy()``
+before stopping the device to ensure no indirect actions remain.
+
+If ``RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP`` is advertised,
+this means that the PMD can keep at least some indirect actions
+across device stop and start.
+However, ``rte_eth_dev_configure()`` may fail if any indirect actions remain,
+so the application must destroy them before attempting a reconfiguration.
+Keeping may be only supported for certain kinds of indirect actions.
+A kind is a combination of an action type and a value of its transfer bit.
+To test if a particular kind of indirect actions is kept,
+the application must try to create a valid indirect action of that kind
+when the device is stopped (after it has been configured or started previously).
+If it fails with an error of type ``RTE_FLOW_ERROR_TYPE_STATE``,
+indirect actions of this kind are flushed when the device is stopped.
+If it succeeds, all indirect actions of the same kind are kept
+when the device is stopped.
+Indirect actions of a kept kind that are created when the device is stopped,
+including the ones created for the test, will be kept after the device start.
+
 .. _table_rte_flow_action_handle:
 
 .. table:: INDIRECT
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index a0b388bb25..12fc7262eb 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -94,6 +94,7 @@
  * depending on the device capabilities:
  *
  *     - flow rules
+ *     - flow-related shared objects, e.g. indirect actions
  *
  * Any other configuration will not be stored and will need to be re-entered
  * before a call to rte_eth_dev_start().
@@ -1452,6 +1453,8 @@ struct rte_eth_conf {
 #define RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002
 /** Device supports keeping flow rules across restart. */
 #define RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP 0x00000004
+/** Device supports keeping shared flow objects across restart. */
+#define RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP 0x00000008
 /**@}*/
 
 /*
-- 
2.25.1
* [dpdk-dev] [PATCH v3 3/6] net: advertise no support for keeping flow rules
  2021-10-19 12:37   ` [dpdk-dev] [PATCH v3 0/6] Flow entites behavior on port restart Dmitry Kozlyuk
  2021-10-19 12:37     ` [dpdk-dev] [PATCH v3 1/6] ethdev: add capability to keep flow rules on restart Dmitry Kozlyuk
  2021-10-19 12:37     ` [dpdk-dev] [PATCH v3 2/6] ethdev: add capability to keep shared objects " Dmitry Kozlyuk
@ 2021-10-19 12:37     ` Dmitry Kozlyuk
  2021-10-20 10:08       ` Andrew Rybchenko
  2021-10-19 12:37     ` [dpdk-dev] [PATCH v3 4/6] net/mlx5: discover max flow priority using DevX Dmitry Kozlyuk
                       ` (4 subsequent siblings)
  7 siblings, 1 reply; 96+ messages in thread
From: Dmitry Kozlyuk @ 2021-10-19 12:37 UTC (permalink / raw)
  To: dev
  Cc: Ferruh Yigit, Ajit Khaparde, Somnath Kotur, Nithin Dabilpuram,
	Kiran Kumar K, Sunil Kumar Kori, Satha Rao, Rahul Lakkireddy,
	Hemant Agrawal, Sachin Saxena, Haiyue Wang, John Daley,
	Hyong Youb Kim, Gaetan Rivet, Ziyang Xuan, Xiaoyun Wang,
	Guoyang Zhou, Min Hu (Connor),
	Yisen Zhuang, Lijun Ou, Beilei Xing, Jingjing Wu, Qiming Yang,
	Qi Zhang, Rosen Xu, Liron Himi, Jerin Jacob, Rasesh Mody,
	Devendra Singh Rawat, Andrew Rybchenko, Jasvinder Singh,
	Cristian Dumitrescu, Keith Wiles, Jiawen Wu, Jian Wang
When the RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP capability bit is zero,
the specified behavior is the same as it had been before
this bit was introduced. Explicitly reset it in all PMDs
supporting rte_flow API in order to attract the attention
of maintainers, who should eventually choose to advertise
the new capability or not. It is already known that
mlx4 and mlx5 will not support this capability.
For RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP,
a similar action is not performed,
because no PMD except mlx5 supports indirect actions.
Any PMD that starts doing so will anyway have to consider
all relevant API, including this capability.
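
The per-driver change in this patch is a single bit-clear on dev_capa.
As a minimal sketch of why it can follow any existing assignment: clearing
the bit leaves every other capability bit untouched and is a no-op when the
bit was never set. The helper name clear_flow_rule_keep is hypothetical, and
the constants are local copies of the values in this series so that the
snippet builds without DPDK headers.

```c
#include <stdint.h>

/* Local copies of ethdev capability bits (values as posted in this series). */
#define CAPA_RUNTIME_RX_QUEUE_SETUP UINT64_C(0x00000001)
#define CAPA_RUNTIME_TX_QUEUE_SETUP UINT64_C(0x00000002)
#define CAPA_FLOW_RULE_KEEP         UINT64_C(0x00000004)

/* Mirrors the "dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP" idiom. */
static uint64_t
clear_flow_rule_keep(uint64_t dev_capa)
{
	return dev_capa & ~CAPA_FLOW_RULE_KEEP;
}
```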
Suggested-by: Ferruh Yigit <ferruh.yigit@intel.com>
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
---
 drivers/net/bnxt/bnxt_ethdev.c          | 1 +
 drivers/net/bnxt/bnxt_reps.c            | 1 +
 drivers/net/cnxk/cnxk_ethdev_ops.c      | 1 +
 drivers/net/cxgbe/cxgbe_ethdev.c        | 2 ++
 drivers/net/dpaa2/dpaa2_ethdev.c        | 1 +
 drivers/net/e1000/em_ethdev.c           | 2 ++
 drivers/net/e1000/igb_ethdev.c          | 1 +
 drivers/net/enic/enic_ethdev.c          | 1 +
 drivers/net/failsafe/failsafe_ops.c     | 1 +
 drivers/net/hinic/hinic_pmd_ethdev.c    | 2 ++
 drivers/net/hns3/hns3_ethdev.c          | 1 +
 drivers/net/hns3/hns3_ethdev_vf.c       | 1 +
 drivers/net/i40e/i40e_ethdev.c          | 1 +
 drivers/net/i40e/i40e_vf_representor.c  | 2 ++
 drivers/net/iavf/iavf_ethdev.c          | 1 +
 drivers/net/ice/ice_dcf_ethdev.c        | 1 +
 drivers/net/igc/igc_ethdev.c            | 1 +
 drivers/net/ipn3ke/ipn3ke_representor.c | 1 +
 drivers/net/mvpp2/mrvl_ethdev.c         | 2 ++
 drivers/net/octeontx2/otx2_ethdev_ops.c | 1 +
 drivers/net/qede/qede_ethdev.c          | 1 +
 drivers/net/sfc/sfc_ethdev.c            | 1 +
 drivers/net/softnic/rte_eth_softnic.c   | 1 +
 drivers/net/tap/rte_eth_tap.c           | 1 +
 drivers/net/txgbe/txgbe_ethdev.c        | 1 +
 drivers/net/txgbe/txgbe_ethdev_vf.c     | 1 +
 26 files changed, 31 insertions(+)
diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c
index aa7e7fdc85..1a6e0128ff 100644
--- a/drivers/net/bnxt/bnxt_ethdev.c
+++ b/drivers/net/bnxt/bnxt_ethdev.c
@@ -1009,6 +1009,7 @@ static int bnxt_dev_info_get_op(struct rte_eth_dev *eth_dev,
 	dev_info->speed_capa = bnxt_get_speed_capabilities(bp);
 	dev_info->dev_capa = RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP |
 			     RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP;
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 
 	dev_info->default_rxconf = (struct rte_eth_rxconf) {
 		.rx_thresh = {
diff --git a/drivers/net/bnxt/bnxt_reps.c b/drivers/net/bnxt/bnxt_reps.c
index df05619c3f..0697f820db 100644
--- a/drivers/net/bnxt/bnxt_reps.c
+++ b/drivers/net/bnxt/bnxt_reps.c
@@ -525,6 +525,7 @@ int bnxt_rep_dev_info_get_op(struct rte_eth_dev *eth_dev,
 	dev_info->max_tx_queues = max_rx_rings;
 	dev_info->reta_size = bnxt_rss_hash_tbl_size(parent_bp);
 	dev_info->hash_key_size = 40;
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 
 	/* MTU specifics */
 	dev_info->min_mtu = RTE_ETHER_MIN_MTU;
diff --git a/drivers/net/cnxk/cnxk_ethdev_ops.c b/drivers/net/cnxk/cnxk_ethdev_ops.c
index b6cc5286c6..a2b85c9411 100644
--- a/drivers/net/cnxk/cnxk_ethdev_ops.c
+++ b/drivers/net/cnxk/cnxk_ethdev_ops.c
@@ -68,6 +68,7 @@ cnxk_nix_info_get(struct rte_eth_dev *eth_dev, struct rte_eth_dev_info *devinfo)
 	devinfo->speed_capa = dev->speed_capa;
 	devinfo->dev_capa = RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP |
 			    RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP;
+	devinfo->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 	return 0;
 }
 
diff --git a/drivers/net/cxgbe/cxgbe_ethdev.c b/drivers/net/cxgbe/cxgbe_ethdev.c
index cd9aa9f84b..a1207fcc17 100644
--- a/drivers/net/cxgbe/cxgbe_ethdev.c
+++ b/drivers/net/cxgbe/cxgbe_ethdev.c
@@ -131,6 +131,8 @@ int cxgbe_dev_info_get(struct rte_eth_dev *eth_dev,
 	device_info->max_vfs = adapter->params.arch.vfcount;
 	device_info->max_vmdq_pools = 0; /* XXX: For now no support for VMDQ */
 
+	device_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
+
 	device_info->rx_queue_offload_capa = 0UL;
 	device_info->rx_offload_capa = CXGBE_RX_OFFLOADS;
 
diff --git a/drivers/net/dpaa2/dpaa2_ethdev.c b/drivers/net/dpaa2/dpaa2_ethdev.c
index ff8ae89922..6f8d4d3ad8 100644
--- a/drivers/net/dpaa2/dpaa2_ethdev.c
+++ b/drivers/net/dpaa2/dpaa2_ethdev.c
@@ -255,6 +255,7 @@ dpaa2_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	dev_info->speed_capa = ETH_LINK_SPEED_1G |
 			ETH_LINK_SPEED_2_5G |
 			ETH_LINK_SPEED_10G;
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 
 	dev_info->max_hash_mac_addrs = 0;
 	dev_info->max_vfs = 0;
diff --git a/drivers/net/e1000/em_ethdev.c b/drivers/net/e1000/em_ethdev.c
index a0ca371b02..897c15d6da 100644
--- a/drivers/net/e1000/em_ethdev.c
+++ b/drivers/net/e1000/em_ethdev.c
@@ -1108,6 +1108,8 @@ eth_em_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 			ETH_LINK_SPEED_100M_HD | ETH_LINK_SPEED_100M |
 			ETH_LINK_SPEED_1G;
 
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
+
 	/* Preferred queue parameters */
 	dev_info->default_rxportconf.nb_queues = 1;
 	dev_info->default_txportconf.nb_queues = 1;
diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index 6510cd7ceb..49e354b388 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -2178,6 +2178,7 @@ eth_igb_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	dev_info->tx_queue_offload_capa = igb_get_tx_queue_offloads_capa(dev);
 	dev_info->tx_offload_capa = igb_get_tx_port_offloads_capa(dev) |
 				    dev_info->tx_queue_offload_capa;
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 
 	switch (hw->mac.type) {
 	case e1000_82575:
diff --git a/drivers/net/enic/enic_ethdev.c b/drivers/net/enic/enic_ethdev.c
index b03e56bc25..d445a11e4d 100644
--- a/drivers/net/enic/enic_ethdev.c
+++ b/drivers/net/enic/enic_ethdev.c
@@ -469,6 +469,7 @@ static int enicpmd_dev_info_get(struct rte_eth_dev *eth_dev,
 	device_info->rx_offload_capa = enic->rx_offload_capa;
 	device_info->tx_offload_capa = enic->tx_offload_capa;
 	device_info->tx_queue_offload_capa = enic->tx_queue_offload_capa;
+	device_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 	device_info->default_rxconf = (struct rte_eth_rxconf) {
 		.rx_free_thresh = ENIC_DEFAULT_RX_FREE_THRESH
 	};
diff --git a/drivers/net/failsafe/failsafe_ops.c b/drivers/net/failsafe/failsafe_ops.c
index d0030af061..3040ce0de6 100644
--- a/drivers/net/failsafe/failsafe_ops.c
+++ b/drivers/net/failsafe/failsafe_ops.c
@@ -1222,6 +1222,7 @@ fs_dev_infos_get(struct rte_eth_dev *dev,
 	infos->dev_capa =
 		RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP |
 		RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP;
+	infos->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 
 	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_PROBED) {
 		struct rte_eth_dev_info sub_info;
diff --git a/drivers/net/hinic/hinic_pmd_ethdev.c b/drivers/net/hinic/hinic_pmd_ethdev.c
index cd4dad8588..9567974cc9 100644
--- a/drivers/net/hinic/hinic_pmd_ethdev.c
+++ b/drivers/net/hinic/hinic_pmd_ethdev.c
@@ -752,6 +752,8 @@ hinic_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info)
 				DEV_TX_OFFLOAD_TCP_TSO |
 				DEV_TX_OFFLOAD_MULTI_SEGS;
 
+	info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
+
 	info->hash_key_size = HINIC_RSS_KEY_SIZE;
 	info->reta_size = HINIC_RSS_INDIR_SIZE;
 	info->flow_type_rss_offloads = HINIC_RSS_OFFLOAD_ALL;
diff --git a/drivers/net/hns3/hns3_ethdev.c b/drivers/net/hns3/hns3_ethdev.c
index cabf73ffbc..6cfcbe1375 100644
--- a/drivers/net/hns3/hns3_ethdev.c
+++ b/drivers/net/hns3/hns3_ethdev.c
@@ -2752,6 +2752,7 @@ hns3_dev_infos_get(struct rte_eth_dev *eth_dev, struct rte_eth_dev_info *info)
 	if (hns3_dev_indep_txrx_supported(hw))
 		info->dev_capa = RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP |
 				 RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP;
+	info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 
 	if (hns3_dev_ptp_supported(hw))
 		info->rx_offload_capa |= DEV_RX_OFFLOAD_TIMESTAMP;
diff --git a/drivers/net/hns3/hns3_ethdev_vf.c b/drivers/net/hns3/hns3_ethdev_vf.c
index 8d9b7979c8..25f73e62c6 100644
--- a/drivers/net/hns3/hns3_ethdev_vf.c
+++ b/drivers/net/hns3/hns3_ethdev_vf.c
@@ -994,6 +994,7 @@ hns3vf_dev_infos_get(struct rte_eth_dev *eth_dev, struct rte_eth_dev_info *info)
 	if (hns3_dev_indep_txrx_supported(hw))
 		info->dev_capa = RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP |
 				 RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP;
+	info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 
 	info->rx_desc_lim = (struct rte_eth_desc_lim) {
 		.nb_max = HNS3_MAX_RING_DESC,
diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 1fc3d897a8..13b0ccfbf7 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -3747,6 +3747,7 @@ i40e_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	dev_info->dev_capa =
 		RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP |
 		RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP;
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 
 	dev_info->hash_key_size = (I40E_PFQF_HKEY_MAX_INDEX + 1) *
 						sizeof(uint32_t);
diff --git a/drivers/net/i40e/i40e_vf_representor.c b/drivers/net/i40e/i40e_vf_representor.c
index 0481b55381..6c06c8992b 100644
--- a/drivers/net/i40e/i40e_vf_representor.c
+++ b/drivers/net/i40e/i40e_vf_representor.c
@@ -35,6 +35,8 @@ i40e_vf_representor_dev_infos_get(struct rte_eth_dev *ethdev,
 	/* get dev info for the vdev */
 	dev_info->device = ethdev->device;
 
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
+
 	dev_info->max_rx_queues = ethdev->data->nb_rx_queues;
 	dev_info->max_tx_queues = ethdev->data->nb_tx_queues;
 
diff --git a/drivers/net/iavf/iavf_ethdev.c b/drivers/net/iavf/iavf_ethdev.c
index 5a5a7f59e1..83914b05cc 100644
--- a/drivers/net/iavf/iavf_ethdev.c
+++ b/drivers/net/iavf/iavf_ethdev.c
@@ -959,6 +959,7 @@ iavf_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	dev_info->reta_size = vf->vf_res->rss_lut_size;
 	dev_info->flow_type_rss_offloads = IAVF_RSS_OFFLOAD_ALL;
 	dev_info->max_mac_addrs = IAVF_NUM_MACADDR_MAX;
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 	dev_info->rx_offload_capa =
 		DEV_RX_OFFLOAD_VLAN_STRIP |
 		DEV_RX_OFFLOAD_QINQ_STRIP |
diff --git a/drivers/net/ice/ice_dcf_ethdev.c b/drivers/net/ice/ice_dcf_ethdev.c
index 91f6558742..dab11b21ef 100644
--- a/drivers/net/ice/ice_dcf_ethdev.c
+++ b/drivers/net/ice/ice_dcf_ethdev.c
@@ -676,6 +676,7 @@ ice_dcf_dev_info_get(struct rte_eth_dev *dev,
 	dev_info->hash_key_size = hw->vf_res->rss_key_size;
 	dev_info->reta_size = hw->vf_res->rss_lut_size;
 	dev_info->flow_type_rss_offloads = ICE_RSS_OFFLOAD_ALL;
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 
 	dev_info->rx_offload_capa =
 		DEV_RX_OFFLOAD_VLAN_STRIP |
diff --git a/drivers/net/igc/igc_ethdev.c b/drivers/net/igc/igc_ethdev.c
index 0e41c85d29..d55bc4babd 100644
--- a/drivers/net/igc/igc_ethdev.c
+++ b/drivers/net/igc/igc_ethdev.c
@@ -1488,6 +1488,7 @@ eth_igc_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	dev_info->min_rx_bufsize = 256; /* See BSIZE field of RCTL register. */
 	dev_info->max_rx_pktlen = MAX_RX_JUMBO_FRAME_SIZE;
 	dev_info->max_mac_addrs = hw->mac.rar_entry_count;
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 	dev_info->rx_offload_capa = IGC_RX_OFFLOAD_ALL;
 	dev_info->tx_offload_capa = IGC_TX_OFFLOAD_ALL;
 	dev_info->rx_queue_offload_capa = DEV_RX_OFFLOAD_VLAN_STRIP;
diff --git a/drivers/net/ipn3ke/ipn3ke_representor.c b/drivers/net/ipn3ke/ipn3ke_representor.c
index 694435a4ae..e7c1968ead 100644
--- a/drivers/net/ipn3ke/ipn3ke_representor.c
+++ b/drivers/net/ipn3ke/ipn3ke_representor.c
@@ -97,6 +97,7 @@ ipn3ke_rpst_dev_infos_get(struct rte_eth_dev *ethdev,
 	dev_info->dev_capa =
 		RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP |
 		RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP;
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 
 	dev_info->switch_info.name = ethdev->device->name;
 	dev_info->switch_info.domain_id = rpst->switch_domain_id;
diff --git a/drivers/net/mvpp2/mrvl_ethdev.c b/drivers/net/mvpp2/mrvl_ethdev.c
index 65d011300a..71ff094495 100644
--- a/drivers/net/mvpp2/mrvl_ethdev.c
+++ b/drivers/net/mvpp2/mrvl_ethdev.c
@@ -1718,6 +1718,8 @@ mrvl_dev_infos_get(struct rte_eth_dev *dev,
 {
 	struct mrvl_priv *priv = dev->data->dev_private;
 
+	info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
+
 	info->speed_capa = ETH_LINK_SPEED_10M |
 			   ETH_LINK_SPEED_100M |
 			   ETH_LINK_SPEED_1G |
diff --git a/drivers/net/octeontx2/otx2_ethdev_ops.c b/drivers/net/octeontx2/otx2_ethdev_ops.c
index 552e6bd43d..54cb1f4b44 100644
--- a/drivers/net/octeontx2/otx2_ethdev_ops.c
+++ b/drivers/net/octeontx2/otx2_ethdev_ops.c
@@ -611,6 +611,7 @@ otx2_nix_info_get(struct rte_eth_dev *eth_dev, struct rte_eth_dev_info *devinfo)
 
 	devinfo->dev_capa = RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP |
 				RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP;
+	devinfo->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 
 	return 0;
 }
diff --git a/drivers/net/qede/qede_ethdev.c b/drivers/net/qede/qede_ethdev.c
index fd8c62a182..403c63cc68 100644
--- a/drivers/net/qede/qede_ethdev.c
+++ b/drivers/net/qede/qede_ethdev.c
@@ -1373,6 +1373,7 @@ qede_dev_info_get(struct rte_eth_dev *eth_dev,
 	dev_info->max_rx_pktlen = (uint32_t)ETH_TX_MAX_NON_LSO_PKT_LEN;
 	dev_info->rx_desc_lim = qede_rx_desc_lim;
 	dev_info->tx_desc_lim = qede_tx_desc_lim;
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 
 	if (IS_PF(edev))
 		dev_info->max_rx_queues = (uint16_t)RTE_MIN(
diff --git a/drivers/net/sfc/sfc_ethdev.c b/drivers/net/sfc/sfc_ethdev.c
index 9dc5e5b3a3..50908005e8 100644
--- a/drivers/net/sfc/sfc_ethdev.c
+++ b/drivers/net/sfc/sfc_ethdev.c
@@ -183,6 +183,7 @@ sfc_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 
 	dev_info->dev_capa = RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP |
 			     RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP;
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 
 	if (mae->status == SFC_MAE_STATUS_SUPPORTED) {
 		dev_info->switch_info.name = dev->device->driver->name;
diff --git a/drivers/net/softnic/rte_eth_softnic.c b/drivers/net/softnic/rte_eth_softnic.c
index b3b55b9035..3622049afa 100644
--- a/drivers/net/softnic/rte_eth_softnic.c
+++ b/drivers/net/softnic/rte_eth_softnic.c
@@ -93,6 +93,7 @@ pmd_dev_infos_get(struct rte_eth_dev *dev __rte_unused,
 	dev_info->max_rx_pktlen = UINT32_MAX;
 	dev_info->max_rx_queues = UINT16_MAX;
 	dev_info->max_tx_queues = UINT16_MAX;
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 
 	return 0;
 }
diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
index 046f17669d..f8f1eb96f4 100644
--- a/drivers/net/tap/rte_eth_tap.c
+++ b/drivers/net/tap/rte_eth_tap.c
@@ -1006,6 +1006,7 @@ tap_dev_info(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	 * functions together and not in partial combinations
 	 */
 	dev_info->flow_type_rss_offloads = ~TAP_RSS_HF_MASK;
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 
 	return 0;
 }
diff --git a/drivers/net/txgbe/txgbe_ethdev.c b/drivers/net/txgbe/txgbe_ethdev.c
index b267da462b..21758851e5 100644
--- a/drivers/net/txgbe/txgbe_ethdev.c
+++ b/drivers/net/txgbe/txgbe_ethdev.c
@@ -2613,6 +2613,7 @@ txgbe_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	dev_info->max_vfs = pci_dev->max_vfs;
 	dev_info->max_vmdq_pools = ETH_64_POOLS;
 	dev_info->vmdq_queue_num = dev_info->max_rx_queues;
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 	dev_info->rx_queue_offload_capa = txgbe_get_rx_queue_offloads(dev);
 	dev_info->rx_offload_capa = (txgbe_get_rx_port_offloads(dev) |
 				     dev_info->rx_queue_offload_capa);
diff --git a/drivers/net/txgbe/txgbe_ethdev_vf.c b/drivers/net/txgbe/txgbe_ethdev_vf.c
index 896da8a887..bba234a5e9 100644
--- a/drivers/net/txgbe/txgbe_ethdev_vf.c
+++ b/drivers/net/txgbe/txgbe_ethdev_vf.c
@@ -487,6 +487,7 @@ txgbevf_dev_info_get(struct rte_eth_dev *dev,
 	dev_info->max_hash_mac_addrs = TXGBE_VMDQ_NUM_UC_MAC;
 	dev_info->max_vfs = pci_dev->max_vfs;
 	dev_info->max_vmdq_pools = ETH_64_POOLS;
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 	dev_info->rx_queue_offload_capa = txgbe_get_rx_queue_offloads(dev);
 	dev_info->rx_offload_capa = (txgbe_get_rx_port_offloads(dev) |
 				     dev_info->rx_queue_offload_capa);
-- 
2.25.1
* [dpdk-dev] [PATCH v3 4/6] net/mlx5: discover max flow priority using DevX
  2021-10-19 12:37   ` [dpdk-dev] [PATCH v3 0/6] Flow entites behavior on port restart Dmitry Kozlyuk
                       ` (2 preceding siblings ...)
  2021-10-19 12:37     ` [dpdk-dev] [PATCH v3 3/6] net: advertise no support for keeping flow rules Dmitry Kozlyuk
@ 2021-10-19 12:37     ` Dmitry Kozlyuk
  2021-10-19 12:37     ` [dpdk-dev] [PATCH v3 5/6] net/mlx5: create drop queue " Dmitry Kozlyuk
                       ` (3 subsequent siblings)
  7 siblings, 0 replies; 96+ messages in thread
From: Dmitry Kozlyuk @ 2021-10-19 12:37 UTC (permalink / raw)
  To: dev; +Cc: stable, Matan Azrad, Viacheslav Ovsiienko
Maximum available flow priority was discovered using Verbs API
regardless of the selected flow engine. This required some Verbs
objects to be initialized in order to use the DevX engine. Make priority
discovery an engine method and implement it for DevX using its API.
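
The discovery approach shared by both engines can be modeled independently
of Verbs or DevX: try each candidate number of priorities in ascending order
by creating a flow at the highest priority value it implies, and keep the
last candidate that succeeds. The sketch below is only an illustration under
that assumption; probe_fn, fake_hw_probe and discover_priorities_model are
hypothetical names, and the fake probe stands in for real flow creation.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical probe: returns 0 if a flow at this priority can be created. */
typedef int (*probe_fn)(void *ctx, uint16_t priority);

/* Fake hardware for illustration: ctx points to the number of priorities. */
static int
fake_hw_probe(void *ctx, uint16_t priority)
{
	const uint16_t *hw_prio_n = ctx;

	return priority < *hw_prio_n ? 0 : -EINVAL;
}

/*
 * Return the largest candidate from vprio[] (ascending) whose highest
 * priority value (candidate - 1) is accepted, or -ENOTSUP if none is.
 */
static int
discover_priorities_model(probe_fn probe, void *ctx,
			  const uint16_t *vprio, int vprio_n)
{
	int i, ret = -ENOTSUP;

	for (i = 0; i < vprio_n; i++) {
		if (probe(ctx, (uint16_t)(vprio[i] - 1)) != 0)
			break;
		ret = vprio[i];
	}
	return ret;
}
```

Passing the probe as a function pointer corresponds to making discovery an
engine method: the common code owns the candidate list and the mapping of
the result, while each engine only supplies its own way to create a flow.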
Cc: stable@dpdk.org
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 drivers/net/mlx5/linux/mlx5_os.c   |   1 -
 drivers/net/mlx5/mlx5_flow.c       |  98 +++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_flow.h       |   4 ++
 drivers/net/mlx5/mlx5_flow_dv.c    | 103 +++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_flow_verbs.c |  77 +++------------------
 5 files changed, 215 insertions(+), 68 deletions(-)
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 3746057673..8ee7ada51b 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1830,7 +1830,6 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 	priv->drop_queue.hrxq = mlx5_drop_action_create(eth_dev);
 	if (!priv->drop_queue.hrxq)
 		goto error;
-	/* Supported Verbs flow priority number detection. */
 	err = mlx5_flow_discover_priorities(eth_dev);
 	if (err < 0) {
 		err = -err;
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index c914a7120c..bfc3e20c9a 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -9416,3 +9416,101 @@ mlx5_dbg__print_pattern(const struct rte_flow_item *item)
 	}
 	printf("END\n");
 }
+
+/* Map of Verbs to Flow priority with 8 Verbs priorities. */
+static const uint32_t priority_map_3[][MLX5_PRIORITY_MAP_MAX] = {
+	{ 0, 1, 2 }, { 2, 3, 4 }, { 5, 6, 7 },
+};
+
+/* Map of Verbs to Flow priority with 16 Verbs priorities. */
+static const uint32_t priority_map_5[][MLX5_PRIORITY_MAP_MAX] = {
+	{ 0, 1, 2 }, { 3, 4, 5 }, { 6, 7, 8 },
+	{ 9, 10, 11 }, { 12, 13, 14 },
+};
+
+/**
+ * Discover the number of available flow priorities.
+ *
+ * @param dev
+ *   Ethernet device.
+ *
+ * @return
+ *   On success, number of available flow priorities.
+ *   On failure, a negative errno-style code and rte_errno is set.
+ */
+int
+mlx5_flow_discover_priorities(struct rte_eth_dev *dev)
+{
+	static const uint16_t vprio[] = {8, 16};
+	const struct mlx5_priv *priv = dev->data->dev_private;
+	const struct mlx5_flow_driver_ops *fops;
+	enum mlx5_flow_drv_type type;
+	int ret;
+
+	type = mlx5_flow_os_get_type();
+	if (type == MLX5_FLOW_TYPE_MAX) {
+		type = MLX5_FLOW_TYPE_VERBS;
+		if (priv->config.devx && priv->config.dv_flow_en)
+			type = MLX5_FLOW_TYPE_DV;
+	}
+	fops = flow_get_drv_ops(type);
+	if (fops->discover_priorities == NULL) {
+		DRV_LOG(ERR, "Priority discovery not supported");
+		rte_errno = ENOTSUP;
+		return -rte_errno;
+	}
+	ret = fops->discover_priorities(dev, vprio, RTE_DIM(vprio));
+	if (ret < 0)
+		return ret;
+	switch (ret) {
+	case 8:
+		ret = RTE_DIM(priority_map_3);
+		break;
+	case 16:
+		ret = RTE_DIM(priority_map_5);
+		break;
+	default:
+		rte_errno = ENOTSUP;
+		DRV_LOG(ERR,
+			"port %u maximum priority: %d expected 8/16",
+			dev->data->port_id, ret);
+		return -rte_errno;
+	}
+	DRV_LOG(INFO, "port %u supported flow priorities:"
+		" 0-%d for ingress or egress root table,"
+		" 0-%d for non-root table or transfer root table.",
+		dev->data->port_id, ret - 2,
+		MLX5_NON_ROOT_FLOW_MAX_PRIO - 1);
+	return ret;
+}
+
+/**
+ * Adjust flow priority based on the highest layer and the request priority.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[in] priority
+ *   The rule base priority.
+ * @param[in] subpriority
+ *   The priority based on the items.
+ *
+ * @return
+ *   The new priority.
+ */
+uint32_t
+mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
+			  uint32_t subpriority)
+{
+	uint32_t res = 0;
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	switch (priv->config.flow_prio) {
+	case RTE_DIM(priority_map_3):
+		res = priority_map_3[priority][subpriority];
+		break;
+	case RTE_DIM(priority_map_5):
+		res = priority_map_5[priority][subpriority];
+		break;
+	}
+	return  res;
+}
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 5c68d4f7d7..8f94125f26 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1226,6 +1226,9 @@ typedef int (*mlx5_flow_create_def_policy_t)
 			(struct rte_eth_dev *dev);
 typedef void (*mlx5_flow_destroy_def_policy_t)
 			(struct rte_eth_dev *dev);
+typedef int (*mlx5_flow_discover_priorities_t)
+			(struct rte_eth_dev *dev,
+			 const uint16_t *vprio, int vprio_n);
 
 struct mlx5_flow_driver_ops {
 	mlx5_flow_validate_t validate;
@@ -1260,6 +1263,7 @@ struct mlx5_flow_driver_ops {
 	mlx5_flow_action_update_t action_update;
 	mlx5_flow_action_query_t action_query;
 	mlx5_flow_sync_domain_t sync_domain;
+	mlx5_flow_discover_priorities_t discover_priorities;
 };
 
 /* mlx5_flow.c */
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index c6370cd1d6..155745748f 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -17978,6 +17978,108 @@ flow_dv_sync_domain(struct rte_eth_dev *dev, uint32_t domains, uint32_t flags)
 	return 0;
 }
 
+/**
+ * Discover the number of available flow priorities
+ * by trying to create a flow with the highest priority value
+ * for each possible number.
+ *
+ * @param[in] dev
+ *   Ethernet device.
+ * @param[in] vprio
+ *   List of possible number of available priorities.
+ * @param[in] vprio_n
+ *   Size of @p vprio array.
+ * @return
+ *   On success, number of available flow priorities.
+ *   On failure, a negative errno-style code and rte_errno is set.
+ */
+static int
+flow_dv_discover_priorities(struct rte_eth_dev *dev,
+			    const uint16_t *vprio, int vprio_n)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_indexed_pool *pool = priv->sh->ipool[MLX5_IPOOL_MLX5_FLOW];
+	struct rte_flow_item_eth eth;
+	struct rte_flow_item item = {
+		.type = RTE_FLOW_ITEM_TYPE_ETH,
+		.spec = &eth,
+		.mask = &eth,
+	};
+	struct mlx5_flow_dv_matcher matcher = {
+		.mask = {
+			.size = sizeof(matcher.mask.buf),
+		},
+	};
+	union mlx5_flow_tbl_key tbl_key;
+	struct mlx5_flow flow;
+	void *action;
+	struct rte_flow_error error;
+	uint8_t misc_mask;
+	int i, err, ret = -ENOTSUP;
+
+	/*
+	 * Prepare a flow with a catch-all pattern and a drop action.
+	 * Use drop queue, because shared drop action may be unavailable.
+	 */
+	action = priv->drop_queue.hrxq->action;
+	if (action == NULL) {
+		DRV_LOG(ERR, "Priority discovery requires a drop action");
+		rte_errno = ENOTSUP;
+		return -rte_errno;
+	}
+	memset(&flow, 0, sizeof(flow));
+	flow.handle = mlx5_ipool_zmalloc(pool, &flow.handle_idx);
+	if (flow.handle == NULL) {
+		DRV_LOG(ERR, "Cannot create flow handle");
+		rte_errno = ENOMEM;
+		return -rte_errno;
+	}
+	flow.ingress = true;
+	flow.dv.value.size = MLX5_ST_SZ_BYTES(fte_match_param);
+	flow.dv.actions[0] = action;
+	flow.dv.actions_n = 1;
+	memset(&eth, 0, sizeof(eth));
+	flow_dv_translate_item_eth(matcher.mask.buf, flow.dv.value.buf,
+				   &item, /* inner */ false, /* group */ 0);
+	matcher.crc = rte_raw_cksum(matcher.mask.buf, matcher.mask.size);
+	for (i = 0; i < vprio_n; i++) {
+		/* Configure the next proposed maximum priority. */
+		matcher.priority = vprio[i] - 1;
+		memset(&tbl_key, 0, sizeof(tbl_key));
+		err = flow_dv_matcher_register(dev, &matcher, &tbl_key, &flow,
+					       /* tunnel */ NULL,
+					       /* group */ 0,
+					       &error);
+		if (err != 0) {
+			/* This action is pure SW and must always succeed. */
+			DRV_LOG(ERR, "Cannot register matcher");
+			ret = -rte_errno;
+			break;
+		}
+		/* Try to apply the flow to HW. */
+		misc_mask = flow_dv_matcher_enable(flow.dv.value.buf);
+		__flow_dv_adjust_buf_size(&flow.dv.value.size, misc_mask);
+		err = mlx5_flow_os_create_flow
+				(flow.handle->dvh.matcher->matcher_object,
+				 (void *)&flow.dv.value, flow.dv.actions_n,
+				 flow.dv.actions, &flow.handle->drv_flow);
+		if (err == 0) {
+			claim_zero(mlx5_flow_os_destroy_flow
+						(flow.handle->drv_flow));
+			flow.handle->drv_flow = NULL;
+		}
+		claim_zero(flow_dv_matcher_release(dev, flow.handle));
+		if (err != 0)
+			break;
+		ret = vprio[i];
+	}
+	mlx5_ipool_free(pool, flow.handle_idx);
+	/* Set rte_errno if no expected priority value matched. */
+	if (ret < 0)
+		rte_errno = -ret;
+	return ret;
+}
+
 const struct mlx5_flow_driver_ops mlx5_flow_dv_drv_ops = {
 	.validate = flow_dv_validate,
 	.prepare = flow_dv_prepare,
@@ -18011,6 +18113,7 @@ const struct mlx5_flow_driver_ops mlx5_flow_dv_drv_ops = {
 	.action_update = flow_dv_action_update,
 	.action_query = flow_dv_action_query,
 	.sync_domain = flow_dv_sync_domain,
+	.discover_priorities = flow_dv_discover_priorities,
 };
 
 #endif /* HAVE_IBV_FLOW_DV_SUPPORT */
diff --git a/drivers/net/mlx5/mlx5_flow_verbs.c b/drivers/net/mlx5/mlx5_flow_verbs.c
index b93fd4d2c9..72b9db6c7f 100644
--- a/drivers/net/mlx5/mlx5_flow_verbs.c
+++ b/drivers/net/mlx5/mlx5_flow_verbs.c
@@ -28,17 +28,6 @@
 #define VERBS_SPEC_INNER(item_flags) \
 	(!!((item_flags) & MLX5_FLOW_LAYER_TUNNEL) ? IBV_FLOW_SPEC_INNER : 0)
 
-/* Map of Verbs to Flow priority with 8 Verbs priorities. */
-static const uint32_t priority_map_3[][MLX5_PRIORITY_MAP_MAX] = {
-	{ 0, 1, 2 }, { 2, 3, 4 }, { 5, 6, 7 },
-};
-
-/* Map of Verbs to Flow priority with 16 Verbs priorities. */
-static const uint32_t priority_map_5[][MLX5_PRIORITY_MAP_MAX] = {
-	{ 0, 1, 2 }, { 3, 4, 5 }, { 6, 7, 8 },
-	{ 9, 10, 11 }, { 12, 13, 14 },
-};
-
 /* Verbs specification header. */
 struct ibv_spec_header {
 	enum ibv_flow_spec_type type;
@@ -50,13 +39,17 @@ struct ibv_spec_header {
  *
  * @param[in] dev
  *   Pointer to the Ethernet device structure.
- *
+ * @param[in] vprio
+ *   Expected result variants.
+ * @param[in] vprio_n
+ *   Number of entries in @p vprio array.
  * @return
- *   number of supported flow priority on success, a negative errno
+ *   Number of supported flow priority on success, a negative errno
  *   value otherwise and rte_errno is set.
  */
-int
-mlx5_flow_discover_priorities(struct rte_eth_dev *dev)
+static int
+flow_verbs_discover_priorities(struct rte_eth_dev *dev,
+			       const uint16_t *vprio, int vprio_n)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct {
@@ -79,7 +72,6 @@ mlx5_flow_discover_priorities(struct rte_eth_dev *dev)
 	};
 	struct ibv_flow *flow;
 	struct mlx5_hrxq *drop = priv->drop_queue.hrxq;
-	uint16_t vprio[] = { 8, 16 };
 	int i;
 	int priority = 0;
 
@@ -87,7 +79,7 @@ mlx5_flow_discover_priorities(struct rte_eth_dev *dev)
 		rte_errno = ENOTSUP;
 		return -rte_errno;
 	}
-	for (i = 0; i != RTE_DIM(vprio); i++) {
+	for (i = 0; i != vprio_n; i++) {
 		flow_attr.attr.priority = vprio[i] - 1;
 		flow = mlx5_glue->create_flow(drop->qp, &flow_attr.attr);
 		if (!flow)
@@ -95,59 +87,9 @@ mlx5_flow_discover_priorities(struct rte_eth_dev *dev)
 		claim_zero(mlx5_glue->destroy_flow(flow));
 		priority = vprio[i];
 	}
-	switch (priority) {
-	case 8:
-		priority = RTE_DIM(priority_map_3);
-		break;
-	case 16:
-		priority = RTE_DIM(priority_map_5);
-		break;
-	default:
-		rte_errno = ENOTSUP;
-		DRV_LOG(ERR,
-			"port %u verbs maximum priority: %d expected 8/16",
-			dev->data->port_id, priority);
-		return -rte_errno;
-	}
-	DRV_LOG(INFO, "port %u supported flow priorities:"
-		" 0-%d for ingress or egress root table,"
-		" 0-%d for non-root table or transfer root table.",
-		dev->data->port_id, priority - 2,
-		MLX5_NON_ROOT_FLOW_MAX_PRIO - 1);
 	return priority;
 }
 
-/**
- * Adjust flow priority based on the highest layer and the request priority.
- *
- * @param[in] dev
- *   Pointer to the Ethernet device structure.
- * @param[in] priority
- *   The rule base priority.
- * @param[in] subpriority
- *   The priority based on the items.
- *
- * @return
- *   The new priority.
- */
-uint32_t
-mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
-				   uint32_t subpriority)
-{
-	uint32_t res = 0;
-	struct mlx5_priv *priv = dev->data->dev_private;
-
-	switch (priv->config.flow_prio) {
-	case RTE_DIM(priority_map_3):
-		res = priority_map_3[priority][subpriority];
-		break;
-	case RTE_DIM(priority_map_5):
-		res = priority_map_5[priority][subpriority];
-		break;
-	}
-	return  res;
-}
-
 /**
  * Get Verbs flow counter by index.
  *
@@ -2105,4 +2047,5 @@ const struct mlx5_flow_driver_ops mlx5_flow_verbs_drv_ops = {
 	.destroy = flow_verbs_destroy,
 	.query = flow_verbs_query,
 	.sync_domain = flow_verbs_sync_domain,
+	.discover_priorities = flow_verbs_discover_priorities,
 };
-- 
2.25.1
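Note on the discovery loops in this patch: both engines walk a list of
candidate priority counts in ascending order, try to install a rule at
priority (candidate - 1), and remember the last candidate that worked.
A minimal standalone sketch of that idea follows. The callback type
probe_fn, the helper toy_probe() and the -1 "nothing matched" result
are assumptions made for illustration; they are not part of the driver.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical probe: returns 0 if a rule with this priority can be created. */
typedef int (*probe_fn)(uint16_t priority, void *ctx);

/*
 * Try candidates in ascending order. Candidate N is accepted when
 * priority N - 1 is usable. Stop at the first failure.
 * Returns the largest accepted candidate, or -1 if none was accepted
 * (the -1 convention is a choice of this sketch).
 */
static int
discover_max_priority(const uint16_t *vprio, size_t vprio_n,
		      probe_fn probe, void *ctx)
{
	int best = -1;
	size_t i;

	for (i = 0; i < vprio_n; i++) {
		if (probe((uint16_t)(vprio[i] - 1), ctx) != 0)
			break;
		best = vprio[i];
	}
	return best;
}

/* Toy probe for illustration: accepts priorities below the limit in ctx. */
static int
toy_probe(uint16_t priority, void *ctx)
{
	const uint16_t limit = *(const uint16_t *)ctx;

	return priority < limit ? 0 : -1;
}
```

With candidates { 8, 16 } and a toy limit of 8, the first probe (priority 7)
succeeds and the second (priority 15) fails, so the sketch reports 8.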
^ permalink raw reply	[flat|nested] 96+ messages in thread
* [dpdk-dev] [PATCH v3 5/6] net/mlx5: create drop queue using DevX
  2021-10-19 12:37   ` [dpdk-dev] [PATCH v3 0/6] Flow entites behavior on port restart Dmitry Kozlyuk
                       ` (3 preceding siblings ...)
  2021-10-19 12:37     ` [dpdk-dev] [PATCH v3 4/6] net/mlx5: discover max flow priority using DevX Dmitry Kozlyuk
@ 2021-10-19 12:37     ` Dmitry Kozlyuk
  2021-10-19 12:37     ` [dpdk-dev] [PATCH v3 6/6] net/mlx5: preserve indirect actions on restart Dmitry Kozlyuk
                       ` (2 subsequent siblings)
  7 siblings, 0 replies; 96+ messages in thread
From: Dmitry Kozlyuk @ 2021-10-19 12:37 UTC (permalink / raw)
  To: dev; +Cc: stable, Matan Azrad, Viacheslav Ovsiienko
Drop queue creation and destruction were not implemented for the DevX
flow engine, and Verbs engine methods were used as a workaround.
Implement these methods for DevX so that there is a valid queue ID
that can be used regardless of queue configuration via API.
Cc: stable@dpdk.org
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 drivers/net/mlx5/linux/mlx5_os.c |   4 -
 drivers/net/mlx5/mlx5_devx.c     | 211 ++++++++++++++++++++++++++-----
 2 files changed, 180 insertions(+), 35 deletions(-)
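Note: the key idea behind the "valid queue ID" is that an indirection
table can be filled entirely with the drop queue ID when no application
queues are given. The sketch below illustrates this under simplifying
assumptions: fill_rq_list() is a hypothetical helper, it takes hardware
queue IDs directly instead of looking them up through priv->rxqs, and
the modulo wrap-around for real queues is an assumption of the sketch.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Fill an indirection table of rqt_n entries.
 * If no queue list is given, every entry points to the drop queue.
 * Otherwise the given queue IDs are repeated until the table is full
 * (wrap-around policy is an assumption of this sketch).
 */
static void
fill_rq_list(uint32_t *rq_list, size_t rqt_n,
	     const uint32_t *queue_ids, size_t queues_n,
	     uint32_t drop_rq_id)
{
	size_t i;

	if (queue_ids == NULL || queues_n == 0) {
		for (i = 0; i < rqt_n; i++)
			rq_list[i] = drop_rq_id;
		return;
	}
	for (i = 0; i < rqt_n; i++)
		rq_list[i] = queue_ids[i % queues_n];
}
```

Because the table always references an existing queue object, it stays
valid in hardware even while no Rx queues are configured.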
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 8ee7ada51b..985f0bd489 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1790,10 +1790,6 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 	}
 	if (config->devx && config->dv_flow_en && config->dest_tir) {
 		priv->obj_ops = devx_obj_ops;
-		priv->obj_ops.drop_action_create =
-						ibv_obj_ops.drop_action_create;
-		priv->obj_ops.drop_action_destroy =
-						ibv_obj_ops.drop_action_destroy;
 #ifndef HAVE_MLX5DV_DEVX_UAR_OFFSET
 		priv->obj_ops.txq_obj_modify = ibv_obj_ops.txq_obj_modify;
 #else
diff --git a/drivers/net/mlx5/mlx5_devx.c b/drivers/net/mlx5/mlx5_devx.c
index a1db53577a..1e62108c94 100644
--- a/drivers/net/mlx5/mlx5_devx.c
+++ b/drivers/net/mlx5/mlx5_devx.c
@@ -226,17 +226,17 @@ mlx5_rx_devx_get_event(struct mlx5_rxq_obj *rxq_obj)
  *
  * @param dev
  *   Pointer to Ethernet device.
- * @param idx
- *   Queue index in DPDK Rx queue array.
+ * @param rxq_data
+ *   RX queue data.
  *
  * @return
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-mlx5_rxq_create_devx_rq_resources(struct rte_eth_dev *dev, uint16_t idx)
+mlx5_rxq_create_devx_rq_resources(struct rte_eth_dev *dev,
+				  struct mlx5_rxq_data *rxq_data)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_rxq_data *rxq_data = (*priv->rxqs)[idx];
 	struct mlx5_rxq_ctrl *rxq_ctrl =
 		container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
 	struct mlx5_devx_create_rq_attr rq_attr = { 0 };
@@ -289,20 +289,20 @@ mlx5_rxq_create_devx_rq_resources(struct rte_eth_dev *dev, uint16_t idx)
  *
  * @param dev
  *   Pointer to Ethernet device.
- * @param idx
- *   Queue index in DPDK Rx queue array.
+ * @param rxq_data
+ *   RX queue data.
  *
  * @return
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-mlx5_rxq_create_devx_cq_resources(struct rte_eth_dev *dev, uint16_t idx)
+mlx5_rxq_create_devx_cq_resources(struct rte_eth_dev *dev,
+				  struct mlx5_rxq_data *rxq_data)
 {
 	struct mlx5_devx_cq *cq_obj = 0;
 	struct mlx5_devx_cq_attr cq_attr = { 0 };
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_dev_ctx_shared *sh = priv->sh;
-	struct mlx5_rxq_data *rxq_data = (*priv->rxqs)[idx];
 	struct mlx5_rxq_ctrl *rxq_ctrl =
 		container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
 	unsigned int cqe_n = mlx5_rxq_cqe_num(rxq_data);
@@ -497,13 +497,13 @@ mlx5_rxq_devx_obj_new(struct rte_eth_dev *dev, uint16_t idx)
 		tmpl->fd = mlx5_os_get_devx_channel_fd(tmpl->devx_channel);
 	}
 	/* Create CQ using DevX API. */
-	ret = mlx5_rxq_create_devx_cq_resources(dev, idx);
+	ret = mlx5_rxq_create_devx_cq_resources(dev, rxq_data);
 	if (ret) {
 		DRV_LOG(ERR, "Failed to create CQ.");
 		goto error;
 	}
 	/* Create RQ using DevX API. */
-	ret = mlx5_rxq_create_devx_rq_resources(dev, idx);
+	ret = mlx5_rxq_create_devx_rq_resources(dev, rxq_data);
 	if (ret) {
 		DRV_LOG(ERR, "Port %u Rx queue %u RQ creation failure.",
 			dev->data->port_id, idx);
@@ -536,6 +536,11 @@ mlx5_rxq_devx_obj_new(struct rte_eth_dev *dev, uint16_t idx)
  *   Pointer to Ethernet device.
  * @param log_n
  *   Log of number of queues in the array.
+ * @param queues
+ *   List of RX queue indices or NULL, in which case
+ *   the attribute will be filled by drop queue ID.
+ * @param queues_n
+ *   Size of @p queues array or 0 if it is NULL.
  * @param ind_tbl
  *   DevX indirection table object.
  *
@@ -563,6 +568,11 @@ mlx5_devx_ind_table_create_rqt_attr(struct rte_eth_dev *dev,
 	}
 	rqt_attr->rqt_max_size = priv->config.ind_table_max_size;
 	rqt_attr->rqt_actual_size = rqt_n;
+	if (queues == NULL) {
+		for (i = 0; i < rqt_n; i++)
+			rqt_attr->rq_list[i] = priv->drop_queue.rxq->rq->id;
+		return rqt_attr;
+	}
 	for (i = 0; i != queues_n; ++i) {
 		struct mlx5_rxq_data *rxq = (*priv->rxqs)[queues[i]];
 		struct mlx5_rxq_ctrl *rxq_ctrl =
@@ -595,11 +605,12 @@ mlx5_devx_ind_table_new(struct rte_eth_dev *dev, const unsigned int log_n,
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_devx_rqt_attr *rqt_attr = NULL;
+	const uint16_t *queues = dev->data->dev_started ? ind_tbl->queues :
+							  NULL;
 
 	MLX5_ASSERT(ind_tbl);
-	rqt_attr = mlx5_devx_ind_table_create_rqt_attr(dev, log_n,
-							ind_tbl->queues,
-							ind_tbl->queues_n);
+	rqt_attr = mlx5_devx_ind_table_create_rqt_attr(dev, log_n, queues,
+						       ind_tbl->queues_n);
 	if (!rqt_attr)
 		return -rte_errno;
 	ind_tbl->rqt = mlx5_devx_cmd_create_rqt(priv->sh->ctx, rqt_attr);
@@ -670,7 +681,8 @@ mlx5_devx_ind_table_destroy(struct mlx5_ind_table_obj *ind_tbl)
  * @param[in] hash_fields
  *   Verbs protocol hash field to make the RSS on.
  * @param[in] ind_tbl
- *   Indirection table for TIR.
+ *   Indirection table for TIR. If table queues array is NULL,
+ *   a TIR for drop queue is assumed.
  * @param[in] tunnel
  *   Tunnel type.
  * @param[out] tir_attr
@@ -686,19 +698,27 @@ mlx5_devx_tir_attr_set(struct rte_eth_dev *dev, const uint8_t *rss_key,
 		       int tunnel, struct mlx5_devx_tir_attr *tir_attr)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_rxq_data *rxq_data = (*priv->rxqs)[ind_tbl->queues[0]];
-	struct mlx5_rxq_ctrl *rxq_ctrl =
-		container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
-	enum mlx5_rxq_type rxq_obj_type = rxq_ctrl->type;
+	enum mlx5_rxq_type rxq_obj_type;
 	bool lro = true;
 	uint32_t i;
 
-	/* Enable TIR LRO only if all the queues were configured for. */
-	for (i = 0; i < ind_tbl->queues_n; ++i) {
-		if (!(*priv->rxqs)[ind_tbl->queues[i]]->lro) {
-			lro = false;
-			break;
+	/* NULL queues designate drop queue. */
+	if (ind_tbl->queues != NULL) {
+		struct mlx5_rxq_data *rxq_data =
+					(*priv->rxqs)[ind_tbl->queues[0]];
+		struct mlx5_rxq_ctrl *rxq_ctrl =
+			container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
+		rxq_obj_type = rxq_ctrl->type;
+
+		/* Enable TIR LRO only if all the queues were configured for. */
+		for (i = 0; i < ind_tbl->queues_n; ++i) {
+			if (!(*priv->rxqs)[ind_tbl->queues[i]]->lro) {
+				lro = false;
+				break;
+			}
 		}
+	} else {
+		rxq_obj_type = priv->drop_queue.rxq->rxq_ctrl->type;
 	}
 	memset(tir_attr, 0, sizeof(*tir_attr));
 	tir_attr->disp_type = MLX5_TIRC_DISP_TYPE_INDIRECT;
@@ -857,7 +877,7 @@ mlx5_devx_hrxq_modify(struct rte_eth_dev *dev, struct mlx5_hrxq *hrxq,
 }
 
 /**
- * Create a DevX drop action for Rx Hash queue.
+ * Create a DevX drop Rx queue.
  *
  * @param dev
  *   Pointer to Ethernet device.
@@ -866,14 +886,99 @@ mlx5_devx_hrxq_modify(struct rte_eth_dev *dev, struct mlx5_hrxq *hrxq,
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-mlx5_devx_drop_action_create(struct rte_eth_dev *dev)
+mlx5_rxq_devx_obj_drop_create(struct rte_eth_dev *dev)
 {
-	(void)dev;
-	DRV_LOG(ERR, "DevX drop action is not supported yet.");
-	rte_errno = ENOTSUP;
+	struct mlx5_priv *priv = dev->data->dev_private;
+	int socket_id = dev->device->numa_node;
+	struct mlx5_rxq_ctrl *rxq_ctrl;
+	struct mlx5_rxq_data *rxq_data;
+	struct mlx5_rxq_obj *rxq = NULL;
+	int ret;
+
+	/*
+	 * Initialize dummy control structures.
+	 * They are required to hold pointers for cleanup
+	 * and are only accessible via drop queue DevX objects.
+	 */
+	rxq_ctrl = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*rxq_ctrl),
+			       0, socket_id);
+	if (rxq_ctrl == NULL) {
+		DRV_LOG(ERR, "Port %u could not allocate drop queue control",
+			dev->data->port_id);
+		rte_errno = ENOMEM;
+		goto error;
+	}
+	rxq = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*rxq), 0, socket_id);
+	if (rxq == NULL) {
+		DRV_LOG(ERR, "Port %u could not allocate drop queue object",
+			dev->data->port_id);
+		rte_errno = ENOMEM;
+		goto error;
+	}
+	rxq->rxq_ctrl = rxq_ctrl;
+	rxq_ctrl->type = MLX5_RXQ_TYPE_STANDARD;
+	rxq_ctrl->priv = priv;
+	rxq_ctrl->obj = rxq;
+	rxq_data = &rxq_ctrl->rxq;
+	/* Create CQ using DevX API. */
+	ret = mlx5_rxq_create_devx_cq_resources(dev, rxq_data);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Port %u drop queue CQ creation failed.",
+			dev->data->port_id);
+		goto error;
+	}
+	/* Create RQ using DevX API. */
+	ret = mlx5_rxq_create_devx_rq_resources(dev, rxq_data);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Port %u drop queue RQ creation failed.",
+			dev->data->port_id);
+		rte_errno = ENOMEM;
+		goto error;
+	}
+	/* Change queue state to ready. */
+	ret = mlx5_devx_modify_rq(rxq, MLX5_RXQ_MOD_RST2RDY);
+	if (ret != 0)
+		goto error;
+	/* Initialize drop queue. */
+	priv->drop_queue.rxq = rxq;
+	return 0;
+error:
+	ret = rte_errno; /* Save rte_errno before cleanup. */
+	if (rxq != NULL) {
+		if (rxq->rq_obj.rq != NULL)
+			mlx5_devx_rq_destroy(&rxq->rq_obj);
+		if (rxq->cq_obj.cq != NULL)
+			mlx5_devx_cq_destroy(&rxq->cq_obj);
+		if (rxq->devx_channel)
+			mlx5_os_devx_destroy_event_channel
+							(rxq->devx_channel);
+		mlx5_free(rxq);
+	}
+	if (rxq_ctrl != NULL)
+		mlx5_free(rxq_ctrl);
+	rte_errno = ret; /* Restore rte_errno. */
 	return -rte_errno;
 }
 
+/**
+ * Release drop Rx queue resources.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ */
+static void
+mlx5_rxq_devx_obj_drop_release(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_rxq_obj *rxq = priv->drop_queue.rxq;
+	struct mlx5_rxq_ctrl *rxq_ctrl = rxq->rxq_ctrl;
+
+	mlx5_rxq_devx_obj_release(rxq);
+	mlx5_free(rxq);
+	mlx5_free(rxq_ctrl);
+	priv->drop_queue.rxq = NULL;
+}
+
 /**
  * Release a drop hash Rx queue.
  *
@@ -883,9 +988,53 @@ mlx5_devx_drop_action_create(struct rte_eth_dev *dev)
 static void
 mlx5_devx_drop_action_destroy(struct rte_eth_dev *dev)
 {
-	(void)dev;
-	DRV_LOG(ERR, "DevX drop action is not supported yet.");
-	rte_errno = ENOTSUP;
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_hrxq *hrxq = priv->drop_queue.hrxq;
+
+	if (hrxq->tir != NULL)
+		mlx5_devx_tir_destroy(hrxq);
+	if (hrxq->ind_table->ind_table != NULL)
+		mlx5_devx_ind_table_destroy(hrxq->ind_table);
+	if (priv->drop_queue.rxq->rq != NULL)
+		mlx5_rxq_devx_obj_drop_release(dev);
+}
+
+/**
+ * Create a DevX drop action for Rx Hash queue.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_devx_drop_action_create(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_hrxq *hrxq = priv->drop_queue.hrxq;
+	int ret;
+
+	ret = mlx5_rxq_devx_obj_drop_create(dev);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Cannot create drop RX queue");
+		return ret;
+	}
+	/* hrxq->ind_table queues are NULL, drop RX queue ID will be used */
+	ret = mlx5_devx_ind_table_new(dev, 0, hrxq->ind_table);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Cannot create drop hash RX queue indirection table");
+		goto error;
+	}
+	ret = mlx5_devx_hrxq_new(dev, hrxq, /* tunnel */ false);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Cannot create drop hash RX queue");
+		goto error;
+	}
+	return 0;
+error:
+	mlx5_devx_drop_action_destroy(dev);
+	return ret;
 }
 
 /**
-- 
2.25.1
^ permalink raw reply	[flat|nested] 96+ messages in thread
* [dpdk-dev] [PATCH v3 6/6] net/mlx5: preserve indirect actions on restart
  2021-10-19 12:37   ` [dpdk-dev] [PATCH v3 0/6] Flow entites behavior on port restart Dmitry Kozlyuk
                       ` (4 preceding siblings ...)
  2021-10-19 12:37     ` [dpdk-dev] [PATCH v3 5/6] net/mlx5: create drop queue " Dmitry Kozlyuk
@ 2021-10-19 12:37     ` Dmitry Kozlyuk
  2021-10-20 10:12     ` [dpdk-dev] [PATCH v3 0/6] Flow entites behavior on port restart Andrew Rybchenko
  2021-10-21  6:34     ` [dpdk-dev] [PATCH v4 " Dmitry Kozlyuk
  7 siblings, 0 replies; 96+ messages in thread
From: Dmitry Kozlyuk @ 2021-10-19 12:37 UTC (permalink / raw)
  To: dev; +Cc: bingz, stable, Matan Azrad, Viacheslav Ovsiienko
MLX5 PMD uses reference counting to manage RX queue resources.
After port stop, shared RSS actions kept references to RX queues,
preventing resource release. As a result, the internal PMD mempool
for such queues was exhausted after a number of port restarts.
Diagnostic message from rte_eth_dev_start():
    Rx queue allocation failed: Cannot allocate memory
Dereference RX queues used by indirect actions on port stop (detach)
and restore references on port start (attach) in order to allow RX queue
resource release, but keep indirect RSS across the port restart.
Replace queue IDs in HW by drop queue ID on detach and restore actual
queue IDs on attach.
When the port is stopped, create indirect RSS in the detached state.
As a result, MLX5 PMD is able to keep all its indirect actions
across port restart. Advertise this capability.
Fixes: 4b61b8774be9 ("ethdev: introduce indirect flow action")
Cc: bingz@nvidia.com
Cc: stable@dpdk.org
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 drivers/net/mlx5/mlx5_ethdev.c  |   1 +
 drivers/net/mlx5/mlx5_flow.c    | 194 ++++++++++++++++++++++++++++----
 drivers/net/mlx5/mlx5_flow.h    |   2 +
 drivers/net/mlx5/mlx5_rx.h      |   4 +
 drivers/net/mlx5/mlx5_rxq.c     |  99 ++++++++++++++--
 drivers/net/mlx5/mlx5_trigger.c |  10 ++
 6 files changed, 276 insertions(+), 34 deletions(-)
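Note: the detach and attach handlers in this patch share one pattern:
apply the operation to each indirection table in list order and, if one
fails, undo the operation on the tables already processed. A minimal
standalone sketch of the detach direction is given below. struct table,
table_detach(), table_attach() and detach_all() are hypothetical names
used only for illustration; the driver iterates an indexed pool list
rather than a plain array.

```c
#include <stddef.h>

/* Hypothetical stand-in for an indirection table object. */
struct table {
	int attached;    /* 1 while the table references real queues */
	int fail_detach; /* test knob: make detach fail for this table */
};

static int
table_detach(struct table *t)
{
	if (t->fail_detach)
		return -1;
	t->attached = 0;
	return 0;
}

static int
table_attach(struct table *t)
{
	t->attached = 1;
	return 0;
}

/*
 * Detach all tables. On failure, re-attach the tables that were
 * already detached so that the caller sees an unchanged state.
 */
static int
detach_all(struct table *tables, size_t n)
{
	size_t i, j;
	int ret = 0;

	for (i = 0; i < n; i++) {
		ret = table_detach(&tables[i]);
		if (ret != 0)
			goto rollback;
	}
	return 0;
rollback:
	/* Only entries [0, i) were modified; entry i failed untouched. */
	for (j = 0; j < i; j++)
		(void)table_attach(&tables[j]);
	return ret;
}
```

The attach direction is symmetric: swap the two helpers, so a failed
attach detaches whatever was attached before the failure.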
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 82e2284d98..419fec3e4e 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -321,6 +321,7 @@ mlx5_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info)
 	info->rx_offload_capa = (mlx5_get_rx_port_offloads() |
 				 info->rx_queue_offload_capa);
 	info->tx_offload_capa = mlx5_get_tx_port_offloads(dev);
+	info->dev_capa = RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP;
 	info->if_index = mlx5_ifindex(dev);
 	info->reta_size = priv->reta_idx_n ?
 		priv->reta_idx_n : config->ind_table_max_size;
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index bfc3e20c9a..c10b911259 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -1560,6 +1560,58 @@ mlx5_flow_validate_action_queue(const struct rte_flow_action *action,
 	return 0;
 }
 
+/**
+ * Validate queue numbers for device RSS.
+ *
+ * @param[in] dev
+ *   Configured device.
+ * @param[in] queues
+ *   Array of queue numbers.
+ * @param[in] queues_n
+ *   Size of the @p queues array.
+ * @param[out] error
+ *   On error, filled with a textual error description.
+ * @param[out] queue_idx
+ *   On error, filled with an offending queue index in @p queues array.
+ *
+ * @return
+ *   0 on success, a negative errno code on error.
+ */
+static int
+mlx5_validate_rss_queues(const struct rte_eth_dev *dev,
+			 const uint16_t *queues, uint32_t queues_n,
+			 const char **error, uint32_t *queue_idx)
+{
+	const struct mlx5_priv *priv = dev->data->dev_private;
+	enum mlx5_rxq_type rxq_type = MLX5_RXQ_TYPE_UNDEFINED;
+	uint32_t i;
+
+	for (i = 0; i != queues_n; ++i) {
+		struct mlx5_rxq_ctrl *rxq_ctrl;
+
+		if (queues[i] >= priv->rxqs_n) {
+			*error = "queue index out of range";
+			*queue_idx = i;
+			return -EINVAL;
+		}
+		if (!(*priv->rxqs)[queues[i]]) {
+			*error =  "queue is not configured";
+			*queue_idx = i;
+			return -EINVAL;
+		}
+		rxq_ctrl = container_of((*priv->rxqs)[queues[i]],
+					struct mlx5_rxq_ctrl, rxq);
+		if (i == 0)
+			rxq_type = rxq_ctrl->type;
+		if (rxq_type != rxq_ctrl->type) {
+			*error = "combining hairpin and regular RSS queues is not supported";
+			*queue_idx = i;
+			return -ENOTSUP;
+		}
+	}
+	return 0;
+}
+
 /*
  * Validate the rss action.
  *
@@ -1580,8 +1632,9 @@ mlx5_validate_action_rss(struct rte_eth_dev *dev,
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	const struct rte_flow_action_rss *rss = action->conf;
-	enum mlx5_rxq_type rxq_type = MLX5_RXQ_TYPE_UNDEFINED;
-	unsigned int i;
+	int ret;
+	const char *message;
+	uint32_t queue_idx;
 
 	if (rss->func != RTE_ETH_HASH_FUNCTION_DEFAULT &&
 	    rss->func != RTE_ETH_HASH_FUNCTION_TOEPLITZ)
@@ -1645,27 +1698,12 @@ mlx5_validate_action_rss(struct rte_eth_dev *dev,
 		return rte_flow_error_set(error, EINVAL,
 					  RTE_FLOW_ERROR_TYPE_ACTION_CONF,
 					  NULL, "No queues configured");
-	for (i = 0; i != rss->queue_num; ++i) {
-		struct mlx5_rxq_ctrl *rxq_ctrl;
-
-		if (rss->queue[i] >= priv->rxqs_n)
-			return rte_flow_error_set
-				(error, EINVAL,
-				 RTE_FLOW_ERROR_TYPE_ACTION_CONF,
-				 &rss->queue[i], "queue index out of range");
-		if (!(*priv->rxqs)[rss->queue[i]])
-			return rte_flow_error_set
-				(error, EINVAL, RTE_FLOW_ERROR_TYPE_ACTION_CONF,
-				 &rss->queue[i], "queue is not configured");
-		rxq_ctrl = container_of((*priv->rxqs)[rss->queue[i]],
-					struct mlx5_rxq_ctrl, rxq);
-		if (i == 0)
-			rxq_type = rxq_ctrl->type;
-		if (rxq_type != rxq_ctrl->type)
-			return rte_flow_error_set
-				(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ACTION_CONF,
-				 &rss->queue[i],
-				 "combining hairpin and regular RSS queues is not supported");
+	ret = mlx5_validate_rss_queues(dev, rss->queue, rss->queue_num,
+				       &message, &queue_idx);
+	if (ret != 0) {
+		return rte_flow_error_set(error, -ret,
+					  RTE_FLOW_ERROR_TYPE_ACTION_CONF,
+					  &rss->queue[queue_idx], message);
 	}
 	return 0;
 }
@@ -8547,6 +8585,116 @@ mlx5_action_handle_flush(struct rte_eth_dev *dev)
 	return ret;
 }
 
+/**
+ * Validate existing indirect actions against current device configuration
+ * and attach them to device resources.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_action_handle_attach(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_indexed_pool *ipool =
+			priv->sh->ipool[MLX5_IPOOL_RSS_SHARED_ACTIONS];
+	struct mlx5_shared_action_rss *shared_rss, *shared_rss_last;
+	int ret = 0;
+	uint32_t idx;
+
+	ILIST_FOREACH(ipool, priv->rss_shared_actions, idx, shared_rss, next) {
+		struct mlx5_ind_table_obj *ind_tbl = shared_rss->ind_tbl;
+		const char *message;
+		uint32_t queue_idx;
+
+		ret = mlx5_validate_rss_queues(dev, ind_tbl->queues,
+					       ind_tbl->queues_n,
+					       &message, &queue_idx);
+		if (ret != 0) {
+			DRV_LOG(ERR, "Port %u cannot use queue %u in RSS: %s",
+				dev->data->port_id, ind_tbl->queues[queue_idx],
+				message);
+			break;
+		}
+	}
+	if (ret != 0)
+		return ret;
+	ILIST_FOREACH(ipool, priv->rss_shared_actions, idx, shared_rss, next) {
+		struct mlx5_ind_table_obj *ind_tbl = shared_rss->ind_tbl;
+
+		ret = mlx5_ind_table_obj_attach(dev, ind_tbl);
+		if (ret != 0) {
+			DRV_LOG(ERR, "Port %u could not attach "
+				"indirection table obj %p",
+				dev->data->port_id, (void *)ind_tbl);
+			goto error;
+		}
+	}
+	return 0;
+error:
+	shared_rss_last = shared_rss;
+	ILIST_FOREACH(ipool, priv->rss_shared_actions, idx, shared_rss, next) {
+		struct mlx5_ind_table_obj *ind_tbl = shared_rss->ind_tbl;
+
+		if (shared_rss == shared_rss_last)
+			break;
+		if (mlx5_ind_table_obj_detach(dev, ind_tbl) != 0)
+			DRV_LOG(CRIT, "Port %u could not detach "
+				"indirection table obj %p on rollback",
+				dev->data->port_id, (void *)ind_tbl);
+	}
+	return ret;
+}
+
+/**
+ * Detach indirect actions of the device from its resources.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_action_handle_detach(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_indexed_pool *ipool =
+			priv->sh->ipool[MLX5_IPOOL_RSS_SHARED_ACTIONS];
+	struct mlx5_shared_action_rss *shared_rss, *shared_rss_last;
+	int ret = 0;
+	uint32_t idx;
+
+	ILIST_FOREACH(ipool, priv->rss_shared_actions, idx, shared_rss, next) {
+		struct mlx5_ind_table_obj *ind_tbl = shared_rss->ind_tbl;
+
+		ret = mlx5_ind_table_obj_detach(dev, ind_tbl);
+		if (ret != 0) {
+			DRV_LOG(ERR, "Port %u could not detach "
+				"indirection table obj %p",
+				dev->data->port_id, (void *)ind_tbl);
+			goto error;
+		}
+	}
+	return 0;
+error:
+	shared_rss_last = shared_rss;
+	ILIST_FOREACH(ipool, priv->rss_shared_actions, idx, shared_rss, next) {
+		struct mlx5_ind_table_obj *ind_tbl = shared_rss->ind_tbl;
+
+		if (shared_rss == shared_rss_last)
+			break;
+		if (mlx5_ind_table_obj_attach(dev, ind_tbl) != 0)
+			DRV_LOG(CRIT, "Port %u could not attach "
+				"indirection table obj %p on rollback",
+				dev->data->port_id, (void *)ind_tbl);
+	}
+	return ret;
+}
+
 #ifndef HAVE_MLX5DV_DR
 #define MLX5_DOMAIN_SYNC_FLOW ((1 << 0) | (1 << 1))
 #else
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 8f94125f26..6bc7946cc3 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1574,6 +1574,8 @@ void mlx5_flow_destroy_sub_policy_with_rxq(struct rte_eth_dev *dev,
 		struct mlx5_flow_meter_policy *mtr_policy);
 int mlx5_flow_dv_discover_counter_offset_support(struct rte_eth_dev *dev);
 int mlx5_flow_discover_dr_action_support(struct rte_eth_dev *dev);
+int mlx5_action_handle_attach(struct rte_eth_dev *dev);
+int mlx5_action_handle_detach(struct rte_eth_dev *dev);
 int mlx5_action_handle_flush(struct rte_eth_dev *dev);
 void mlx5_release_tunnel_hub(struct mlx5_dev_ctx_shared *sh, uint16_t port_id);
 int mlx5_alloc_tunnel_hub(struct mlx5_dev_ctx_shared *sh);
diff --git a/drivers/net/mlx5/mlx5_rx.h b/drivers/net/mlx5/mlx5_rx.h
index 2b7ad3e48b..d44c8078de 100644
--- a/drivers/net/mlx5/mlx5_rx.h
+++ b/drivers/net/mlx5/mlx5_rx.h
@@ -222,6 +222,10 @@ int mlx5_ind_table_obj_modify(struct rte_eth_dev *dev,
 			      struct mlx5_ind_table_obj *ind_tbl,
 			      uint16_t *queues, const uint32_t queues_n,
 			      bool standalone);
+int mlx5_ind_table_obj_attach(struct rte_eth_dev *dev,
+			      struct mlx5_ind_table_obj *ind_tbl);
+int mlx5_ind_table_obj_detach(struct rte_eth_dev *dev,
+			      struct mlx5_ind_table_obj *ind_tbl);
 struct mlx5_list_entry *mlx5_hrxq_create_cb(void *tool_ctx, void *cb_ctx);
 int mlx5_hrxq_match_cb(void *tool_ctx, struct mlx5_list_entry *entry,
 		       void *cb_ctx);
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index b68443bed5..fd2b5779ff 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -2015,6 +2015,26 @@ mlx5_ind_table_obj_new(struct rte_eth_dev *dev, const uint16_t *queues,
 	return ind_tbl;
 }
 
+static int
+mlx5_ind_table_obj_check_standalone(struct rte_eth_dev *dev __rte_unused,
+				    struct mlx5_ind_table_obj *ind_tbl)
+{
+	uint32_t refcnt;
+
+	refcnt = __atomic_load_n(&ind_tbl->refcnt, __ATOMIC_RELAXED);
+	if (refcnt <= 1)
+		return 0;
+	/*
+	 * Modification of indirection tables having more than 1
+	 * reference is unsupported.
+	 */
+	DRV_LOG(DEBUG,
+		"Port %u cannot modify indirection table %p (refcnt %u > 1).",
+		dev->data->port_id, (void *)ind_tbl, refcnt);
+	rte_errno = EINVAL;
+	return -rte_errno;
+}
+
 /**
  * Modify an indirection table.
  *
@@ -2047,18 +2067,8 @@ mlx5_ind_table_obj_modify(struct rte_eth_dev *dev,
 
 	MLX5_ASSERT(standalone);
 	RTE_SET_USED(standalone);
-	if (__atomic_load_n(&ind_tbl->refcnt, __ATOMIC_RELAXED) > 1) {
-		/*
-		 * Modification of indirection ntables having more than 1
-		 * reference unsupported. Intended for standalone indirection
-		 * tables only.
-		 */
-		DRV_LOG(DEBUG,
-			"Port %u cannot modify indirection table (refcnt> 1).",
-			dev->data->port_id);
-		rte_errno = EINVAL;
+	if (mlx5_ind_table_obj_check_standalone(dev, ind_tbl) < 0)
 		return -rte_errno;
-	}
 	for (i = 0; i != queues_n; ++i) {
 		if (!mlx5_rxq_get(dev, queues[i])) {
 			ret = -rte_errno;
@@ -2084,6 +2094,73 @@ mlx5_ind_table_obj_modify(struct rte_eth_dev *dev,
 	return ret;
 }
 
+/**
+ * Attach an indirection table to its queues.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param ind_tbl
+ *   Indirection table to attach.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_ind_table_obj_attach(struct rte_eth_dev *dev,
+			  struct mlx5_ind_table_obj *ind_tbl)
+{
+	unsigned int i;
+	int ret;
+
+	ret = mlx5_ind_table_obj_modify(dev, ind_tbl, ind_tbl->queues,
+					ind_tbl->queues_n, true);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Port %u could not modify indirect table obj %p",
+			dev->data->port_id, (void *)ind_tbl);
+		return ret;
+	}
+	for (i = 0; i < ind_tbl->queues_n; i++)
+		mlx5_rxq_get(dev, ind_tbl->queues[i]);
+	return 0;
+}
+
+/**
+ * Detach an indirection table from its queues.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param ind_tbl
+ *   Indirection table to detach.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_ind_table_obj_detach(struct rte_eth_dev *dev,
+			  struct mlx5_ind_table_obj *ind_tbl)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	const unsigned int n = rte_is_power_of_2(ind_tbl->queues_n) ?
+			       log2above(ind_tbl->queues_n) :
+			       log2above(priv->config.ind_table_max_size);
+	unsigned int i;
+	int ret;
+
+	ret = mlx5_ind_table_obj_check_standalone(dev, ind_tbl);
+	if (ret != 0)
+		return ret;
+	MLX5_ASSERT(priv->obj_ops.ind_table_modify);
+	ret = priv->obj_ops.ind_table_modify(dev, n, NULL, 0, ind_tbl);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Port %u could not modify indirect table obj %p",
+			dev->data->port_id, (void *)ind_tbl);
+		return ret;
+	}
+	for (i = 0; i < ind_tbl->queues_n; i++)
+		mlx5_rxq_release(dev, ind_tbl->queues[i]);
+	return ret;
+}
+
 int
 mlx5_hrxq_match_cb(void *tool_ctx __rte_unused, struct mlx5_list_entry *entry,
 		   void *cb_ctx)
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index 54173bfacb..c3adf5082e 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -14,6 +14,7 @@
 #include <mlx5_malloc.h>
 
 #include "mlx5.h"
+#include "mlx5_flow.h"
 #include "mlx5_mr.h"
 #include "mlx5_rx.h"
 #include "mlx5_tx.h"
@@ -1113,6 +1114,14 @@ mlx5_dev_start(struct rte_eth_dev *dev)
 	mlx5_rxq_timestamp_set(dev);
 	/* Set a mask and offset of scheduling on timestamp into Tx queues. */
 	mlx5_txq_dynf_timestamp_set(dev);
+	/* Attach indirection table objects detached on port stop. */
+	ret = mlx5_action_handle_attach(dev);
+	if (ret) {
+		DRV_LOG(ERR,
+			"port %u failed to attach indirect actions: %s",
+			dev->data->port_id, rte_strerror(rte_errno));
+		goto error;
+	}
 	/*
 	 * In non-cached mode, it only needs to start the default mreg copy
 	 * action and no flow created by application exists anymore.
@@ -1185,6 +1194,7 @@ mlx5_dev_stop(struct rte_eth_dev *dev)
 	/* All RX queue flags will be cleared in the flush interface. */
 	mlx5_flow_list_flush(dev, MLX5_FLOW_TYPE_GEN, true);
 	mlx5_flow_meter_rxq_flush(dev);
+	mlx5_action_handle_detach(dev);
 	mlx5_rx_intr_vec_disable(dev);
 	priv->sh->port[priv->dev_port - 1].ih_port_id = RTE_MAX_ETHPORTS;
 	priv->sh->port[priv->dev_port - 1].devx_ih_port_id = RTE_MAX_ETHPORTS;
-- 
2.25.1
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [dpdk-dev] [PATCH v2 1/5] ethdev: add capability to keep flow rules on restart
  2021-10-18  8:56     ` Andrew Rybchenko
@ 2021-10-19 12:38       ` Dmitry Kozlyuk
  0 siblings, 0 replies; 96+ messages in thread
From: Dmitry Kozlyuk @ 2021-10-19 12:38 UTC (permalink / raw)
  To: Andrew Rybchenko, dev; +Cc: Ori Kam, NBU-Contact-Thomas Monjalon, Ferruh Yigit
> > [...]
> > Add a device capability bit for PMDs that can keep at least some
> > of the flow rules across restart. Without this capability behavior
> > is still unspecified, which is now explicitly stated.
> > Declare that the application can test for persitence of flow rules
> > of a particular kind by attempting to create a rule of that kind
> > when the device is stopped and checking for the specific error.
> 
> stopped -> configured but not yet started
Correct, fixed in v3.
> > diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
> > index 2b42d5ec8c..b0ced4209b 100644
> > --- a/doc/guides/prog_guide/rte_flow.rst
> > +++ b/doc/guides/prog_guide/rte_flow.rst
> > @@ -87,6 +87,33 @@ To avoid resource leaks on the PMD side, handles must be explicitly
> >  destroyed by the application before releasing associated resources such as
> >  queues and ports.
> >
> > +By default it is unspecified if the flow rules persist after the device stop.
> 
> or can be created before the first device start
Correct, fixed in v3.
> 
> > +If ``RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP`` is not advertised,
> > +then rules must be explicitly flushed before stopping the device
> > +if the application needs to ensure they are removed.
> > +If it is advertised, this means the PMD can keep at least some rules
> > +across the device stop and start with possible reconfiguration in between.
> > +However, it may be only supported for some kinds of rules.
> > +The kind is a combination of the following rule properties:
> > +
> > +- the sequence of item types;
> > +- the sequence of action types;
> > +- the value of the transfer attribute.
> > +
> > +To test if a particular kind of rules is kept, the application must try
> > +to create a valid rule of that kind when the device is stopped
> > +(after it has been configured or started previously).
> > +If it succeeds, all rules of the same kind are kept at the device stop.
> > +If it fails with an error of type ``RTE_FLOW_ERROR_TYPE_STATE``,
> > +rules of this kind are flushed when the device is stopped.
> > +Rules of a kept kind that are created when the device is stopped, including
> > +the rules created for the test, will be kept after the device is started.
> 
> It must be defined what application should expect for
> not tested rule kinds.
> 
> For me about check sounds extremely complicated and hardly
> doable. Yes, some applications know kinds of rule it would
> like to create, but some, like OvS, do not. Please, correct
> me if I'm wrong. OvS knows which types of actions and even
> possible combinations of actions (harder, but still possible)
> it would like to install. But all possible combinations of
> items together with all possible combinations of actions
> could be very-very big.
> 
> May be I still misunderstand the above idea.
This is a very valid concern.
After discussing it offline, Ori and I concluded
that an item/action type + transfer bit ("a feature") is enough.
That is, if some feature cannot be kept, no rules using it can be kept.
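To illustrate the intended semantics, here is a minimal sketch in plain C.
It is not rte_flow code: `struct feature`, `feature_kept()` and `rule_kept()`
are hypothetical names, and the list of non-kept features stands in for
whatever a particular PMD cannot preserve.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: a feature is an item or action type plus the transfer bit. */
struct feature {
	int type;      /* stand-in for an item or action type */
	bool transfer; /* value of the transfer attribute */
};

/* True unless the feature is on the (hypothetical) PMD list of non-kept features. */
bool
feature_kept(const struct feature *f, const struct feature *not_kept, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (not_kept[i].type == f->type &&
		    not_kept[i].transfer == f->transfer)
			return false;
	return true;
}

/* A rule is kept only if every feature it uses is kept. */
bool
rule_kept(const struct feature *rule, size_t rule_n,
	  const struct feature *not_kept, size_t n)
{
	size_t i;

	assert(rule != NULL || rule_n == 0);
	for (i = 0; i < rule_n; i++)
		if (!feature_kept(&rule[i], not_kept, n))
			return false;
	return true;
}
```

With this model the application only has to probe each feature once,
not every combination of items and actions.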
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [dpdk-dev] [PATCH v3 2/6] ethdev: add capability to keep shared objects on restart
  2021-10-19 12:37     ` [dpdk-dev] [PATCH v3 2/6] ethdev: add capability to keep shared objects " Dmitry Kozlyuk
@ 2021-10-19 15:22       ` Ori Kam
  0 siblings, 0 replies; 96+ messages in thread
From: Ori Kam @ 2021-10-19 15:22 UTC (permalink / raw)
  To: Dmitry Kozlyuk, dev
  Cc: NBU-Contact-Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko
> -----Original Message-----
> From: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
> Sent: Tuesday, October 19, 2021 3:37 PM
> To: dev@dpdk.org
> Cc: Ori Kam <orika@nvidia.com>; NBU-Contact-Thomas Monjalon <thomas@monjalon.net>; Ferruh
> Yigit <ferruh.yigit@intel.com>; Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Subject: [PATCH v3 2/6] ethdev: add capability to keep shared objects on restart
> 
> rte_flow_action_handle_create() did not mention what happens with an indirect action when a device
> is stopped and started again.
> It is natural for some indirect actions, like counter, to be persistent.
> Keeping others at least saves application time and complexity.
> However, not all PMDs can support it, or the support may be limited by particular action kinds, that is,
> combinations of action type and the value of the transfer bit in its configuration.
> 
> Add a device capability to indicate if at least some indirect actions are kept across the above sequence.
> Without this capability the behavior is still unspecified, and application is required to destroy the
> indirect actions before stopping the device.
> In the future, indirect actions may not be the only type of objects shared between flow rules. The
> capability bit intends to cover all possible types of such objects, hence its name.
> 
> Declare that the application can test for the persistence of a particular indirect action kind by
> attempting to create an indirect action of that kind when the device is stopped and checking for the
> specific error type.
> This is logical because if the PMD can to create an indirect action when the device is not started and
> use it after the start happens, it is natural that it can move its internal flow shared object to the same
> state when the device is stopped and restore the state when the device is started.
> 
> Indirect action persistence across a reconfigurations is not required.
> In case a PMD cannot keep the indirect actions across reconfiguration, it is allowed just to report an
> error.
> Application must then flush the indirect actions before attempting it.
> 
> Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
> ---
>  doc/guides/prog_guide/rte_flow.rst | 24 ++++++++++++++++++++++++
>  lib/ethdev/rte_ethdev.h            |  3 +++
>  2 files changed, 27 insertions(+)
> 
> diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
> index ff67b211e3..19e17f453d 100644
> --- a/doc/guides/prog_guide/rte_flow.rst
> +++ b/doc/guides/prog_guide/rte_flow.rst
> @@ -2810,6 +2810,30 @@ updated depend on the type of the ``action`` and different for every type.
>  The indirect action specified data (e.g. counter) can be queried by
> ``rte_flow_action_handle_query()``.
> 
> +If ``RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP`` is not advertised,
> +indirect actions cannot be created until the device is started for the
> +first time and cannot be kept when the device is stopped.
> +However, PMD also does not flush them automatically on stop, so the
> +application must call ``rte_flow_action_handle_destroy()``
> +before stopping the device to ensure no indirect actions remain.
> +
> +If ``RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP`` is advertised, this
> +means that the PMD can keep at least some indirect actions across
> +device stop and start.
> +However, ``rte_eth_dev_configure()`` may fail if any indirect actions
> +remain, so the application must destroy them before attempting a reconfiguration.
> +Keeping may be only supported for certain kinds of indirect actions.
> +A kind is a combination of an action type and a value of its transfer bit.
> +To test if a particular kind of indirect actions is kept, the
> +application must try to create a valid indirect action of that kind
> +when the device is stopped (after it has been configured or started previously).
> +If it fails with an error of type ``RTE_FLOW_ERROR_TYPE_STATE``,
> +indirect actions of this kind are flushed when the device is stopped.
> +If it succeeds, all indirect actions of the same kind are kept when the
> +device is stopped.
> +Indirect actions of a kept kind that are created when the device is
> +stopped, including the ones created for the test, will be kept after the device start.
> +
>  .. _table_rte_flow_action_handle:
> 
>  .. table:: INDIRECT
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index a0b388bb25..12fc7262eb 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -94,6 +94,7 @@
>   * depending on the device capabilities:
>   *
>   *     - flow rules
> + *     - flow-related shared objects, e.g. indirect actions
>   *
>   * Any other configuration will not be stored and will need to be re-entered
>   * before a call to rte_eth_dev_start().
> @@ -1452,6 +1453,8 @@ struct rte_eth_conf {
>  #define RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002
>  /** Device supports keeping flow rules across restart. */
>  #define RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP 0x00000004
> +/** Device supports keeping shared flow objects across restart. */
> +#define RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP 0x00000008
>  /**@}*/
> 
>  /*
> --
> 2.25.1
Acked-by: Ori Kam <orika@nvidia.com>
Best,
Ori
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [dpdk-dev] [PATCH v3 1/6] ethdev: add capability to keep flow rules on restart
  2021-10-19 12:37     ` [dpdk-dev] [PATCH v3 1/6] ethdev: add capability to keep flow rules on restart Dmitry Kozlyuk
@ 2021-10-19 15:22       ` Ori Kam
  2021-10-19 16:38       ` Ferruh Yigit
  2021-10-20 10:39       ` Andrew Rybchenko
  2 siblings, 0 replies; 96+ messages in thread
From: Ori Kam @ 2021-10-19 15:22 UTC (permalink / raw)
  To: Dmitry Kozlyuk, dev
  Cc: Qi Zhang, Ori Kam, NBU-Contact-Thomas Monjalon, Ferruh Yigit,
	Andrew Rybchenko
Hi Dmitry,
> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Dmitry Kozlyuk
> Sent: Tuesday, October 19, 2021 3:37 PM
> To: dev@dpdk.org
> Cc: Qi Zhang <qi.z.zhang@intel.com>; Ori Kam <orika@oss.nvidia.com>; NBU-Contact-Thomas
> Monjalon <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@intel.com>; Andrew Rybchenko
> <andrew.rybchenko@oktetlabs.ru>
> Subject: [dpdk-dev] [PATCH v3 1/6] ethdev: add capability to keep flow rules on restart
> 
> Previously, it was not specified what happens to the flow rules when the device is stopped, possibly
> reconfigured, then started.
> If flow rules were kept, it could be convenient for application developers, because they wouldn't need
> to save and restore them.
> However, due to the number of flows and possible creation rate it is impractical to save all flow rules in
> DPDK layer. This means that flow rules persistence really depends on whether PMD and HW can
> implement it efficiently. It can also be limited by the rule item and action types, and its attributes
> transfer bit (a combination of an item/action type and a value of the transfer bit is called a ruel
> feature).
> 
> Add a device capability bit for PMDs that can keep at least some of the flow rules across restart.
> Without this capability behavior is still unspecified and it is declared that the application must flush the
> rules before stopping the device.
> Allow the application to test for persitence of rules using a particular feature by attempting to create a
> flow rule using that feature when the device is stopped and checking for the specific error.
> This is logical because if the PMD can to create the flow rule when the device is not started and use it
> after the start happens, it is natural that it can move its internal flow rule object to the same state
> when the device is stopped and restore the state when the device is started.
> 
> Rule persistence across a reconfigurations is not required, because tracking all the rules and
> configuration-dependent resources they use may be infeasible. In case a PMD cannot keep the rules
> across reconfiguration, it is allowed just to report an error.
> Application must then flush the rules before attempting it.
> 
> Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
> ---
>  doc/guides/prog_guide/rte_flow.rst | 25 +++++++++++++++++++++++++
>  lib/ethdev/rte_ethdev.h            |  7 +++++++
>  lib/ethdev/rte_flow.h              |  1 +
>  3 files changed, 33 insertions(+)
> 
> diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
> index 2b42d5ec8c..ff67b211e3 100644
> --- a/doc/guides/prog_guide/rte_flow.rst
> +++ b/doc/guides/prog_guide/rte_flow.rst
> @@ -87,6 +87,31 @@ To avoid resource leaks on the PMD side, handles must be explicitly
>  destroyed by the application before releasing associated resources such as
>  queues and ports.
> 
> +If ``RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP`` is not advertised, rules cannot
> +be created until the device is started for the first time and cannot be
> +kept when the device is stopped.
> +However, PMD also does not flush them automatically on stop, so the
> +application must call ``rte_flow_flush()`` or ``rte_flow_destroy()``
> +before stopping the device to ensure no rules remain.
> +
> +If ``RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP`` is advertised, this means the
> +PMD can keep at least some rules across the device stop and start.
> +However, ``rte_eth_dev_configure()`` may fail if any rules remain, so
> +the application must flush them before attempting a reconfiguration.
> +Keeping may be unsupported for some types of rule items and actions, as
> +well as depending on the value of flow attributes transfer bit.
> +A combination of an item or action type and a value of the transfer bit
> +is called a rule feature.
> +To test if rules with a particular feature are kept, the application
> +must try to create a valid rule using this feature when the device is
> +stopped (after it has been configured or started previously).
> +If it fails with an error of type ``RTE_FLOW_ERROR_TYPE_STATE``, rules
> +using this feature are flushed when the device is stopped.
> +If it suceeds, such rules will be kept when the device is stopped,
> +provided they do not use other features that are not supported.
> +Rules that are created when the device is stopped, including the rules
> +created for the test, will be kept after the device is started.
> +
>  The following sections cover:
> 
>  - **Attributes** (represented by ``struct rte_flow_attr``): properties of a
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index 6d80514ba7..a0b388bb25 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -90,6 +90,11 @@
>   *     - flow director filtering mode (but not filtering rules)
>   *     - NIC queue statistics mappings
>   *
> + * The following configuration may be retained or not
> + * depending on the device capabilities:
> + *
> + *     - flow rules
> + *
>   * Any other configuration will not be stored and will need to be re-entered
>   * before a call to rte_eth_dev_start().
>   *
> @@ -1445,6 +1450,8 @@ struct rte_eth_conf {
>  #define RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP 0x00000001
>  /** Device supports Tx queue setup after device started. */
>  #define RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002
> +/** Device supports keeping flow rules across restart. */
> +#define RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP 0x00000004
>  /**@}*/
> 
>  /*
> diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
> index a89945061a..aa0182d021 100644
> --- a/lib/ethdev/rte_flow.h
> +++ b/lib/ethdev/rte_flow.h
> @@ -3344,6 +3344,7 @@ enum rte_flow_error_type {
>  	RTE_FLOW_ERROR_TYPE_ACTION_NUM, /**< Number of actions. */
>  	RTE_FLOW_ERROR_TYPE_ACTION_CONF, /**< Action configuration. */
>  	RTE_FLOW_ERROR_TYPE_ACTION, /**< Specific action. */
> +	RTE_FLOW_ERROR_TYPE_STATE, /**< Current device state. */
>  };
> 
>  /**
> --
> 2.25.1
Acked-by: Ori Kam <orika@nvidia.com>
Best,
Ori
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [dpdk-dev] [PATCH v3 1/6] ethdev: add capability to keep flow rules on restart
  2021-10-19 12:37     ` [dpdk-dev] [PATCH v3 1/6] ethdev: add capability to keep flow rules on restart Dmitry Kozlyuk
  2021-10-19 15:22       ` Ori Kam
@ 2021-10-19 16:38       ` Ferruh Yigit
  2021-10-19 17:13         ` Dmitry Kozlyuk
  2021-10-20 10:39       ` Andrew Rybchenko
  2 siblings, 1 reply; 96+ messages in thread
From: Ferruh Yigit @ 2021-10-19 16:38 UTC (permalink / raw)
  To: Dmitry Kozlyuk, dev, Qi Zhang; +Cc: Ori Kam, Thomas Monjalon, Andrew Rybchenko
On 10/19/2021 1:37 PM, Dmitry Kozlyuk wrote:
> diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
> index 2b42d5ec8c..ff67b211e3 100644
> --- a/doc/guides/prog_guide/rte_flow.rst
> +++ b/doc/guides/prog_guide/rte_flow.rst
> @@ -87,6 +87,31 @@ To avoid resource leaks on the PMD side, handles must be explicitly
>   destroyed by the application before releasing associated resources such as
>   queues and ports.
>   
> +If ``RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP`` is not advertised,
> +rules cannot be created until the device is started for the first time
> +and cannot be kept when the device is stopped.
So the flag means two things:
1) rules cannot be created until the device is started for the first time
2) rules cannot be kept when the device is stopped
Can't there be a case where one is true but the other is not? I thought
the flag was only for (2).
> +However, PMD also does not flush them automatically on stop,
> +so the application must call ``rte_flow_flush()`` or ``rte_flow_destroy()``
> +before stopping the device to ensure no rules remain.
> +
> +If ``RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP`` is advertised, this means
> +the PMD can keep at least some rules across the device stop and start.
> +However, ``rte_eth_dev_configure()`` may fail if any rules remain,
> +so the application must flush them before attempting a reconfiguration.
If there are any remaining rules, should ``rte_eth_dev_configure()`` fail,
or is the PMD allowed to flush the rules itself?
As far as I know some Intel PMDs flush remaining rules in configure itself
without failing, @Qi can correct me if I am wrong.
> +Keeping may be unsupported for some types of rule items and actions,
> +as well as depending on the value of flow attributes transfer bit.
> +A combination of an item or action type and a value of the transfer bit
> +is called a rule feature.
> +To test if rules with a particular feature are kept, the application must try
> +to create a valid rule using this feature when the device is stopped
> +(after it has been configured or started previously).
> +If it fails with an error of type ``RTE_FLOW_ERROR_TYPE_STATE``,
> +rules using this feature are flushed when the device is stopped.
> +If it suceeds, such rules will be kept when the device is stopped,
> +provided they do not use other features that are not supported.
> +Rules that are created when the device is stopped, including the rules
> +created for the test, will be kept after the device is started.
> +
I understand the intention, but I don't know if this is true for all devices.
Can't there be a case where a driver can't create a rule when it is stopped,
but can keep the rules after stop? Or the other way around: a driver can
create a rule when it is stopped, but can't keep the rule after stop.
I feel we are missing comments from other vendors on whether this logic
works for them.
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [dpdk-dev] [PATCH v3 1/6] ethdev: add capability to keep flow rules on restart
  2021-10-19 16:38       ` Ferruh Yigit
@ 2021-10-19 17:13         ` Dmitry Kozlyuk
  0 siblings, 0 replies; 96+ messages in thread
From: Dmitry Kozlyuk @ 2021-10-19 17:13 UTC (permalink / raw)
  To: Ferruh Yigit, dev, Qi Zhang
  Cc: Ori Kam, NBU-Contact-Thomas Monjalon, Andrew Rybchenko
> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@intel.com>
> Sent: October 19, 2021 19:39
> To: Dmitry Kozlyuk <dkozlyuk@nvidia.com>; dev@dpdk.org; Qi Zhang
> <qi.z.zhang@intel.com>
> Cc: Ori Kam <orika@nvidia.com>; NBU-Contact-Thomas Monjalon
> <thomas@monjalon.net>; Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Subject: Re: [PATCH v3 1/6] ethdev: add capability to keep flow rules on
> restart
> 
> On 10/19/2021 1:37 PM, Dmitry Kozlyuk wrote:
> > diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
> > index 2b42d5ec8c..ff67b211e3 100644
> > --- a/doc/guides/prog_guide/rte_flow.rst
> > +++ b/doc/guides/prog_guide/rte_flow.rst
> > @@ -87,6 +87,31 @@ To avoid resource leaks on the PMD side, handles must be explicitly
> >   destroyed by the application before releasing associated resources such as
> >   queues and ports.
> >
> > +If ``RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP`` is not advertised,
> > +rules cannot be created until the device is started for the first time
> > +and cannot be kept when the device is stopped.
> 
> So flag means two things:
> 1) rules cannot be created until the device is started for the first time
> 2) rules cannot be kept when the device is stopped
> 
> Can't be a case one is true but other is not? I was thinking flag is
> only for (2).
It theoretically can, but it doesn't seem feasible
to separate these capabilities:
a) Suppose a PMD can create rules before the device is started.
They are in some special state in which they are not applied to the traffic.
When the device is started, these rules begin to be applied.
When the device is stopped, what would make the PMD unable to move
the rules back to the detached state they were in before the start?
And then attach them back?
b) Suppose a PMD can keep the rules between stop and start.
It must be able to move them to the detached state described above
on the device stop and attach them back when it is started again.
What would prevent the PMD from creating rules in a detached state
before the device is started for the first time?
That's what I had in mind before,
and now it is just stated explicitly per Andrew's suggestion:
https://inbox.dpdk.org/dev/5ec7101f-169e-cbd0-87bb-810b7476c7d0@oktetlabs.ru
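The argument in (a) and (b) can be sketched as a two-state machine.
This is only a model: `enum rule_state` and the two transition helpers
are hypothetical names and do not correspond to any real PMD code.
The point is that both cases need exactly the same pair of transitions.

```c
#include <assert.h>

/* Hypothetical rule states inside a PMD; names are illustrative only. */
enum rule_state {
	RULE_DETACHED, /* rule exists but is not applied to traffic */
	RULE_ATTACHED, /* rule is applied to traffic */
};

/* On device start, a detached rule begins to be applied. */
enum rule_state
rule_on_dev_start(enum rule_state s)
{
	assert(s == RULE_DETACHED);
	return RULE_ATTACHED;
}

/* On device stop, an attached rule goes back to the detached state. */
enum rule_state
rule_on_dev_stop(enum rule_state s)
{
	assert(s == RULE_ATTACHED);
	return RULE_DETACHED;
}
```

A rule created before the first start and a rule kept across a stop
are both simply in RULE_DETACHED until the next start.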
> > +However, PMD also does not flush them automatically on stop,
> > +so the application must call ``rte_flow_flush()`` or ``rte_flow_destroy()``
> > +before stopping the device to ensure no rules remain.
> > +
> > +If ``RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP`` is advertised, this means
> > +the PMD can keep at least some rules across the device stop and start.
> > +However, ``rte_eth_dev_configure()`` may fail if any rules remain,
> > +so the application must flush them before attempting a reconfiguration.
> 
> If there are any remaining rules, should we fail the
> ``rte_eth_dev_configure()``,
> or is it allowed PMD to flush the rules itself?
>
> As far as I know some Intel PMDs flush remaining rules in configure itself
> without failing, @Qi can correct me if I am wrong.
An implicit flush makes the API non-orthogonal,
which only makes sense if it gives some performance benefit.
> > +Keeping may be unsupported for some types of rule items and actions,
> > +as well as depending on the value of flow attributes transfer bit.
> > +A combination of an item or action type and a value of the transfer bit
> > +is called a rule feature.
> > +To test if rules with a particular feature are kept, the application must try
> > +to create a valid rule using this feature when the device is stopped
> > +(after it has been configured or started previously).
> > +If it fails with an error of type ``RTE_FLOW_ERROR_TYPE_STATE``,
> > +rules using this feature are flushed when the device is stopped.
> > +If it suceeds, such rules will be kept when the device is stopped,
> > +provided they do not use other features that are not supported.
> > +Rules that are created when the device is stopped, including the rules
> > +created for the test, will be kept after the device is started.
> > +
> 
> I understand the intention, but I don't know if this is true for all
> devices.
> Can't there be a case that driver can't create rule when it is stopped,
> but it can keep the rules after stop. Or other-way around, driver can
> create rule when it is stopped, but can't keep rule after stop.
Isn't it the same consideration as the first comment?
If so, it's about the ability to have a rule in a detached state
given its features.
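For illustration, the way an application could interpret the probe
described in the quoted documentation can be sketched as below.
This is a self-contained model, not DPDK code: the capability value is
copied from the patch, while `enum err_type`, `enum keep_verdict` and
`classify_probe()` are hypothetical stand-ins (in a real application the
inputs would come from the device info and from the rte_flow error).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copy of the value the patch assigns to RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP. */
#define CAPA_FLOW_RULE_KEEP UINT64_C(0x00000004)

/* Stand-ins for rte_flow error types; the numeric values are arbitrary. */
enum err_type { ERR_NONE, ERR_STATE, ERR_OTHER };

enum keep_verdict { KEEP_NO, KEEP_YES, KEEP_UNKNOWN };

/*
 * Interpret the result of creating a test rule while the device is stopped.
 * "created" tells whether the creation succeeded, "err" is the error type
 * reported on failure.
 */
enum keep_verdict
classify_probe(uint64_t dev_capa, bool created, enum err_type err)
{
	/* A successful creation must not carry an error. */
	assert(!(created && err != ERR_NONE));
	if (!(dev_capa & CAPA_FLOW_RULE_KEEP))
		return KEEP_NO;      /* the application must flush before stop */
	if (created)
		return KEEP_YES;     /* rules using this feature are kept */
	if (err == ERR_STATE)
		return KEEP_NO;      /* this feature cannot be in a detached state */
	return KEEP_UNKNOWN;         /* the rule was rejected for another reason */
}
```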
> I am feeling we are missing comments from different vendors if this logic
> works for them.
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [dpdk-dev] [PATCH v3 3/6] net: advertise no support for keeping flow rules
  2021-10-19 12:37     ` [dpdk-dev] [PATCH v3 3/6] net: advertise no support for keeping flow rules Dmitry Kozlyuk
@ 2021-10-20 10:08       ` Andrew Rybchenko
  2021-10-20 22:20         ` Dmitry Kozlyuk
  0 siblings, 1 reply; 96+ messages in thread
From: Andrew Rybchenko @ 2021-10-20 10:08 UTC (permalink / raw)
  To: Dmitry Kozlyuk, dev
  Cc: Ferruh Yigit, Ajit Khaparde, Somnath Kotur, Nithin Dabilpuram,
	Kiran Kumar K, Sunil Kumar Kori, Satha Rao, Rahul Lakkireddy,
	Hemant Agrawal, Sachin Saxena, Haiyue Wang, John Daley,
	Hyong Youb Kim, Gaetan Rivet, Ziyang Xuan, Xiaoyun Wang,
	Guoyang Zhou, Min Hu (Connor),
	Yisen Zhuang, Lijun Ou, Beilei Xing, Jingjing Wu, Qiming Yang,
	Qi Zhang, Rosen Xu, Liron Himi, Jerin Jacob, Rasesh Mody,
	Devendra Singh Rawat, Jasvinder Singh, Cristian Dumitrescu,
	Keith Wiles, Jiawen Wu, Jian Wang
On 10/19/21 3:37 PM, Dmitry Kozlyuk wrote:
> When RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP capability bit is zero,
> the specified behavior is the same as it had been before
> this bit was introduced. Explicitly reset it in all PMDs
> supporting rte_flow API in order to attract the attention
> of maintainers, who should eventually choose to advertise
> the new capability or not. It is already known that
> mlx4 and mlx5 will not support this capability.
> 
> For RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP
> similar action is not performed,
> because no PMD except mlx5 supports indirect actions.
> Any PMD that starts doing so will anyway have to consider
> all relevant API, including this capability.
> 
> Suggested-by: Ferruh Yigit <ferruh.yigit@intel.com>
> Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
[snip]
> diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c
> index aa7e7fdc85..1a6e0128ff 100644
> --- a/drivers/net/bnxt/bnxt_ethdev.c
> +++ b/drivers/net/bnxt/bnxt_ethdev.c
> @@ -1009,6 +1009,7 @@ static int bnxt_dev_info_get_op(struct rte_eth_dev *eth_dev,
>  	dev_info->speed_capa = bnxt_get_speed_capabilities(bp);
>  	dev_info->dev_capa = RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP |
>  			     RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP;
> +	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
Sorry, but here and everywhere below I see no point in clearing
the bit explicitly when it is not actually set.
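To show what I mean, here is a standalone sketch. The macro values are
the ones defined in rte_ethdev.h by this series; the short macro names
and the helper function are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the RTE_ETH_DEV_CAPA_* values from the patch. */
#define CAPA_RUNTIME_RX_QUEUE_SETUP UINT64_C(0x00000001)
#define CAPA_RUNTIME_TX_QUEUE_SETUP UINT64_C(0x00000002)
#define CAPA_FLOW_RULE_KEEP         UINT64_C(0x00000004)

/* Mirrors the assignment followed by the explicit clear in the hunk above. */
uint64_t
capa_with_explicit_clear(void)
{
	uint64_t capa = CAPA_RUNTIME_RX_QUEUE_SETUP |
			CAPA_RUNTIME_TX_QUEUE_SETUP;
	uint64_t before = capa;

	capa &= ~CAPA_FLOW_RULE_KEEP;
	/* The bit was never set, so clearing it changes nothing. */
	assert(capa == before);
	return capa;
}
```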
[snip]
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [dpdk-dev] [PATCH v3 0/6] Flow entites behavior on port restart
  2021-10-19 12:37   ` [dpdk-dev] [PATCH v3 0/6] Flow entites behavior on port restart Dmitry Kozlyuk
                       ` (5 preceding siblings ...)
  2021-10-19 12:37     ` [dpdk-dev] [PATCH v3 6/6] net/mlx5: preserve indirect actions on restart Dmitry Kozlyuk
@ 2021-10-20 10:12     ` Andrew Rybchenko
  2021-10-20 13:21       ` Dmitry Kozlyuk
  2021-10-21  6:34     ` [dpdk-dev] [PATCH v4 " Dmitry Kozlyuk
  7 siblings, 1 reply; 96+ messages in thread
From: Andrew Rybchenko @ 2021-10-20 10:12 UTC (permalink / raw)
  To: Dmitry Kozlyuk, dev
On 10/19/21 3:37 PM, Dmitry Kozlyuk wrote:
> It is unspecified whether flow rules and indirect actions are kept
> when a port is stopped, possibly reconfigured, and started again.
> Vendors approach the topic differently, e.g. mlx5 and i40e PMD
> disagree in whether flow rules can be kept, and mlx5 PMD would keep
> indirect actions. In the end, applications are greatly affected
> by whatever contract there is and need to know it.
> 
> It is proposed to advertise capabilities of keeping flow rules
> and indirect actions (as a special case of shared object)
> using a combination of ethdev info and rte_flow calls.
> Then a bug is fixed in mlx5 PMD that prevented indirect RSS action
> from being kept, and the driver starts advertising the new capability.
> 
> Prior discussions:
> 1) http://inbox.dpdk.org/dev/20210727073121.895620-1-dkozlyuk@nvidia.com/
> 2) http://inbox.dpdk.org/dev/20210901085516.3647814-1-dkozlyuk@nvidia.com/
Is there a real use case for keeping flow rules or indirect
actions?
Why would an application want to restart a port?
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [dpdk-dev] [PATCH v3 1/6] ethdev: add capability to keep flow rules on restart
  2021-10-19 12:37     ` [dpdk-dev] [PATCH v3 1/6] ethdev: add capability to keep flow rules on restart Dmitry Kozlyuk
  2021-10-19 15:22       ` Ori Kam
  2021-10-19 16:38       ` Ferruh Yigit
@ 2021-10-20 10:39       ` Andrew Rybchenko
  2021-10-20 11:40         ` Dmitry Kozlyuk
  2 siblings, 1 reply; 96+ messages in thread
From: Andrew Rybchenko @ 2021-10-20 10:39 UTC (permalink / raw)
  To: Dmitry Kozlyuk, dev; +Cc: Qi Zhang, Ori Kam, Thomas Monjalon, Ferruh Yigit
On 10/19/21 3:37 PM, Dmitry Kozlyuk wrote:
> Previously, it was not specified what happens to the flow rules
> when the device is stopped, possibly reconfigured, then started.
> If flow rules were kept, it could be convenient for application
> developers, because they wouldn't need to save and restore them.
> However, due to the number of flows and possible creation rate it is
> impractical to save all flow rules in DPDK layer. This means that flow
> rules persistence really depends on whether PMD and HW can implement it
> efficiently. It can also be limited by the rule item and action types,
> and its attributes transfer bit (a combination of an item/action type
> and a value of the transfer bit is called a ruel feature).
> 
> Add a device capability bit for PMDs that can keep at least some
> of the flow rules across restart. Without this capability behavior
> is still unspecified and it is declared that the application must
> flush the rules before stopping the device.
> Allow the application to test for persitence of rules using
persitence -> persistence
> a particular feature by attempting to create a flow rule
> using that feature when the device is stopped
> and checking for the specific error.
> This is logical because if the PMD can to create the flow rule
> when the device is not started and use it after the start happens,
> it is natural that it can move its internal flow rule object
> to the same state when the device is stopped and restore the state
> when the device is started.
> 
> Rule persistence across a reconfigurations is not required,
> because tracking all the rules and configuration-dependent resources
> they use may be infeasible. In case a PMD cannot keep the rules
> across reconfiguration, it is allowed just to report an error.
> Application must then flush the rules before attempting it.
> 
> Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
> ---
>  doc/guides/prog_guide/rte_flow.rst | 25 +++++++++++++++++++++++++
>  lib/ethdev/rte_ethdev.h            |  7 +++++++
>  lib/ethdev/rte_flow.h              |  1 +
>  3 files changed, 33 insertions(+)
> 
> diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
> index 2b42d5ec8c..ff67b211e3 100644
> --- a/doc/guides/prog_guide/rte_flow.rst
> +++ b/doc/guides/prog_guide/rte_flow.rst
> @@ -87,6 +87,31 @@ To avoid resource leaks on the PMD side, handles must be explicitly
>  destroyed by the application before releasing associated resources such as
>  queues and ports.
>  
> +If ``RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP`` is not advertised,
> +rules cannot be created until the device is started for the first time
> +and cannot be kept when the device is stopped.
> +However, PMD also does not flush them automatically on stop,
> +so the application must call ``rte_flow_flush()`` or ``rte_flow_destroy()``
> +before stopping the device to ensure no rules remain.
> +
> +If ``RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP`` is advertised, this means
> +the PMD can keep at least some rules across the device stop and start.
> +However, ``rte_eth_dev_configure()`` may fail if any rules remain,
> +so the application must flush them before attempting a reconfiguration.
> +Keeping may be unsupported for some types of rule items and actions,
> +as well as depending on the value of flow attributes transfer bit.
> +A combination of an item or action type and a value of the transfer bit
> +is called a rule feature.
As I said before a combination is very hard to test and
unfriendly to applications. Do we really need to make it
that complex?
Which PMDs are going to support it? Which cases will really
be distinguished and will have different support (keep or not)?
> +To test if rules with a particular feature are kept, the application must try
> +to create a valid rule using this feature when the device is stopped
> +(after it has been configured or started previously).
Sorry, it hardly makes sense. Does it suggest an application
to:
 1. configure
 2. start
 3. stop
 4. check/create flow rules
 5. start again
as a regular start sequence instead of just configure+start.
IMHO, it must be possible to check just after configure without
start. Otherwise it looks really bad.
> +If it fails with an error of type ``RTE_FLOW_ERROR_TYPE_STATE``,
> +rules using this feature are flushed when the device is stopped.
Which entity does flush it?
> +If it suceeds, such rules will be kept when the device is stopped,
suceeds -> succeeds
kept and functional? I.e. transfer rules still route traffic to
other ports which could be up and running.
> +provided they do not use other features that are not supported.
> +Rules that are created when the device is stopped, including the rules
> +created for the test, will be kept after the device is started.
> +
>  The following sections cover:
>  
>  - **Attributes** (represented by ``struct rte_flow_attr``): properties of a
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [dpdk-dev] [PATCH v3 1/6] ethdev: add capability to keep flow rules on restart
  2021-10-20 10:39       ` Andrew Rybchenko
@ 2021-10-20 11:40         ` Dmitry Kozlyuk
  2021-10-20 13:40           ` Ori Kam
  0 siblings, 1 reply; 96+ messages in thread
From: Dmitry Kozlyuk @ 2021-10-20 11:40 UTC (permalink / raw)
  To: Andrew Rybchenko, dev
  Cc: Qi Zhang, Ori Kam, NBU-Contact-Thomas Monjalon, Ferruh Yigit
> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Sent: 20 October 2021 13:40
> To: Dmitry Kozlyuk <dkozlyuk@nvidia.com>; dev@dpdk.org
> Cc: Qi Zhang <qi.z.zhang@intel.com>; Ori Kam <orika@nvidia.com>; NBU-
> Contact-Thomas Monjalon <thomas@monjalon.net>; Ferruh Yigit
> <ferruh.yigit@intel.com>
> Subject: Re: [PATCH v3 1/6] ethdev: add capability to keep flow rules on
> restart
> 
> 
> On 10/19/21 3:37 PM, Dmitry Kozlyuk wrote:
> > Previously, it was not specified what happens to the flow rules
> > when the device is stopped, possibly reconfigured, then started.
> > If flow rules were kept, it could be convenient for application
> > developers, because they wouldn't need to save and restore them.
> > However, due to the number of flows and possible creation rate it is
> > impractical to save all flow rules in DPDK layer. This means that flow
> > rules persistence really depends on whether PMD and HW can implement it
> > efficiently. It can also be limited by the rule item and action types,
> > and its attributes transfer bit (a combination of an item/action type
> > and a value of the transfer bit is called a ruel feature).
> >
> > Add a device capability bit for PMDs that can keep at least some
> > of the flow rules across restart. Without this capability behavior
> > is still unspecified and it is declared that the application must
> > flush the rules before stopping the device.
> > Allow the application to test for persitence of rules using
> 
> persitence -> persistence
>
> > a particular feature by attempting to create a flow rule
> > using that feature when the device is stopped
> > and checking for the specific error.
> > This is logical because if the PMD can to create the flow rule
> > when the device is not started and use it after the start happens,
> > it is natural that it can move its internal flow rule object
> > to the same state when the device is stopped and restore the state
> > when the device is started.
> >
> > Rule persistence across a reconfigurations is not required,
> > because tracking all the rules and configuration-dependent resources
> > they use may be infeasible. In case a PMD cannot keep the rules
> > across reconfiguration, it is allowed just to report an error.
> > Application must then flush the rules before attempting it.
> >
> > Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
> > ---
> >  doc/guides/prog_guide/rte_flow.rst | 25 +++++++++++++++++++++++++
> >  lib/ethdev/rte_ethdev.h            |  7 +++++++
> >  lib/ethdev/rte_flow.h              |  1 +
> >  3 files changed, 33 insertions(+)
> >
> > diff --git a/doc/guides/prog_guide/rte_flow.rst
> b/doc/guides/prog_guide/rte_flow.rst
> > index 2b42d5ec8c..ff67b211e3 100644
> > --- a/doc/guides/prog_guide/rte_flow.rst
> > +++ b/doc/guides/prog_guide/rte_flow.rst
> > @@ -87,6 +87,31 @@ To avoid resource leaks on the PMD side, handles must
> be explicitly
> >  destroyed by the application before releasing associated resources such
> as
> >  queues and ports.
> >
> > +If ``RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP`` is not advertised,
> > +rules cannot be created until the device is started for the first time
> > +and cannot be kept when the device is stopped.
> > +However, PMD also does not flush them automatically on stop,
> > +so the application must call ``rte_flow_flush()`` or
> ``rte_flow_destroy()``
> > +before stopping the device to ensure no rules remain.
> > +
> > +If ``RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP`` is advertised, this means
> > +the PMD can keep at least some rules across the device stop and start.
> > +However, ``rte_eth_dev_configure()`` may fail if any rules remain,
> > +so the application must flush them before attempting a reconfiguration.
> > +Keeping may be unsupported for some types of rule items and actions,
> > +as well as depending on the value of flow attributes transfer bit.
> > +A combination of an item or action type and a value of the transfer bit
> > +is called a rule feature.
> 
> As I said before a combination is very hard to test and
> unfriendly to applications. Do we really need to make it
> that complex?
Maybe the wording is not explicit enough,
but it attempts to address exactly your previous comment.
In v3, applications don't need to check for a full combination
of item types, action types, and a transfer bit value.
Instead, they only need to check for a combination
of one type (of an item or an action) with a transfer bit value.
There is an example below in the text.
> Which PMDs are going to support it? Which cases will really
> be distinguished and will have different support (keep or not)?
> 
> > +To test if rules with a particular feature are kept, the application
> must try
> > +to create a valid rule using this feature when the device is stopped
> > +(after it has been configured or started previously).
> 
> Sorry, it hardly makes sense. Does it suggest an application
> to:
>  1. configure
>  2. start
>  3. stop
>  4. check/create flow rules
>  5. start again
> as a regular start sequence instead of just configure+start.
> IMHO, it must be possible to check just after configure without
> start. Otherwise it looks really bad.
Of course, the following sequence is meant:
1. Configure
2. Try to create flow rules with needed features,
   check for RTE_FLOW_ERROR_TYPE_STATE.
   If and only if the test rules are not needed, destroy them.
3. Start
The sequence you outlined is also possible, but it is not necessary.
It may even be useful, for example, if an application is switching the workload
and has a new set of rule features to check.
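A minimal sketch of the configure/probe/start sequence in C, written as
a self-contained model rather than real DPDK code. Everything named
`mock_*` or `MOCK_*` is a hypothetical stand-in (for the port, for
rte_flow_create(), and for the error type proposed in this series),
so that the logic can be compiled without DPDK:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names are DPDK API. */
enum mock_error_type {
	MOCK_ERROR_TYPE_NONE = 0,
	MOCK_ERROR_TYPE_STATE, /* models the proposed RTE_FLOW_ERROR_TYPE_STATE */
	MOCK_ERROR_TYPE_OTHER, /* any unrelated failure */
};

/* A rule feature: one item or action type plus a transfer bit value. */
struct mock_feature {
	int type;      /* hypothetical ID of an item or action type */
	bool transfer; /* value of the transfer bit */
};

/* A port that is not started, with the features its mock PMD can keep. */
struct mock_port {
	bool configured;
	const struct mock_feature *keepable;
	size_t n_keepable;
};

/* Models creating a valid rule with one feature on a not-started port. */
static enum mock_error_type
mock_flow_create(const struct mock_port *port, const struct mock_feature *f)
{
	size_t i;

	assert(port != NULL && f != NULL);
	if (!port->configured)
		return MOCK_ERROR_TYPE_OTHER;
	for (i = 0; i < port->n_keepable; i++)
		if (port->keepable[i].type == f->type &&
		    port->keepable[i].transfer == f->transfer)
			return MOCK_ERROR_TYPE_NONE;
	return MOCK_ERROR_TYPE_STATE;
}

/*
 * Step 2 of the sequence: probe every needed feature before start.
 * kept[i] tells whether rules using feature i survive a stop.
 * Returns the number of kept features, or -1 on an unrelated error.
 */
static int
probe_features(const struct mock_port *port,
	       const struct mock_feature *needed, size_t n, bool *kept)
{
	size_t i;
	int n_kept = 0;

	assert(port != NULL && (n == 0 || (needed != NULL && kept != NULL)));
	for (i = 0; i < n; i++) {
		switch (mock_flow_create(port, &needed[i])) {
		case MOCK_ERROR_TYPE_NONE:
			kept[i] = true; /* the test rule may stay if needed */
			n_kept++;
			break;
		case MOCK_ERROR_TYPE_STATE:
			kept[i] = false; /* not kept across stop */
			break;
		default:
			return -1;
		}
	}
	return n_kept;
}
```

An application would run probe_features() once after configuration and
again only if its set of needed features changes.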
> 
> > +If it fails with an error of type ``RTE_FLOW_ERROR_TYPE_STATE``,
> > +rules using this feature are flushed when the device is stopped.
> 
> Which entity does flush it?
The PMD does.
The overall approach is as follows:
no capability => no guarantees, the application must manage the entities itself;
have capability => the PMD manages the entities, but it may be unable to keep some.
> > +If it suceeds, such rules will be kept when the device is stopped,
> 
> suceeds -> succeeds
> 
> kept and functional? I.e. transfer rules still route traffic to
> other ports which could be up and running.
Unless you or anyone else objects, it's kept, but non-functional,
because the semantics of port stop is to stop traffic processing.
A transfer rule can mention any port, but I think it should be controlled
via the one that created it.
In any case, this must be stated explicitly in v4.
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [dpdk-dev] [PATCH v3 0/6] Flow entites behavior on port restart
  2021-10-20 10:12     ` [dpdk-dev] [PATCH v3 0/6] Flow entites behavior on port restart Andrew Rybchenko
@ 2021-10-20 13:21       ` Dmitry Kozlyuk
  0 siblings, 0 replies; 96+ messages in thread
From: Dmitry Kozlyuk @ 2021-10-20 13:21 UTC (permalink / raw)
  To: Andrew Rybchenko, dev
> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Sent: 20 October 2021 13:12
> To: Dmitry Kozlyuk <dkozlyuk@nvidia.com>; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v3 0/6] Flow entites behavior on port
> restart
> 
> 
> On 10/19/21 3:37 PM, Dmitry Kozlyuk wrote:
> > It is unspecified whether flow rules and indirect actions are kept
> > when a port is stopped, possibly reconfigured, and started again.
> > Vendors approach the topic differently, e.g. mlx5 and i40e PMD
> > disagree in whether flow rules can be kept, and mlx5 PMD would keep
> > indirect actions. In the end, applications are greatly affected
> > by whatever contract there is and need to know it.
> >
> > It is proposed to advertise capabilities of keeping flow rules
> > and indirect actions (as a special case of shared object)
> > using a combination of ethdev info and rte_flow calls.
> > Then a bug is fixed in mlx5 PMD that prevented indirect RSS action
> > from being kept, and the driver starts advertising the new capability.
> >
> > Prior discussions:
> > 1) http://inbox.dpdk.org/dev/20210727073121.895620-1-
> dkozlyuk@nvidia.com/
> > 2) http://inbox.dpdk.org/dev/20210901085516.3647814-1-
> dkozlyuk@nvidia.com/
> 
> Is there real usecase for keeping flow rules or indirect
> actions?
> Why does application want to restart port?
Sorry, I don't know of real use cases that would already use this feature.
But on the other hand, there was no well-defined API to enable such apps.
I can imagine apps adding queues (if their setup after the start is unsupported)
and enabling offloads when available resources or traffic patterns change,
e.g. a DDoS attack starts and checksum calculation now wastes cycles on garbage.
For indirect actions, as patch 2/6 mentions, persistent semantics
are either natural (counters, meters) or just convenient.
Working around with shifts for counters or tolerances for meters
is possible of course, but it increases application state to manage.
 
For rules,
1. It is worth noting that an app can create many of them
before stopping a port, like a rule for each connection.
Saving the application from tracking them in such case is a big advantage.
If they cannot be restored before the port is started again,
there will be a time gap when the traffic is flowing, but no rules process it.
This alone could be covered by a distinct capability proposed earlier [1].
2. However, nowadays, there are apps that create rules on their datapath.
If rules are kept, such apps can reconfigure ports without
either losing the rules or having to track them very fast.
As it is explained in the comments to patch 1/6 and 2/6,
now that rules can exist when the port is not started,
it is logical that they need not be destroyed when the port is stopped.
[1]: http://patchwork.dpdk.org/project/dpdk/patch/20211005171914.2936-1-xhavli56@stud.fit.vutbr.cz/ 
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [dpdk-dev] [PATCH v3 1/6] ethdev: add capability to keep flow rules on restart
  2021-10-20 11:40         ` Dmitry Kozlyuk
@ 2021-10-20 13:40           ` Ori Kam
  0 siblings, 0 replies; 96+ messages in thread
From: Ori Kam @ 2021-10-20 13:40 UTC (permalink / raw)
  To: Dmitry Kozlyuk, Andrew Rybchenko, dev
  Cc: Qi Zhang, NBU-Contact-Thomas Monjalon, Ferruh Yigit
Hi Dmitry,
> -----Original Message-----
> From: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
> Sent: Wednesday, October 20, 2021 2:40 PM
> Subject: RE: [PATCH v3 1/6] ethdev: add capability to keep flow rules on restart
> 
> 
> 
> > -----Original Message-----
> > From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> > Sent: 20 October 2021 13:40
> > To: Dmitry Kozlyuk <dkozlyuk@nvidia.com>; dev@dpdk.org
> > Cc: Qi Zhang <qi.z.zhang@intel.com>; Ori Kam <orika@nvidia.com>; NBU-
> > Contact-Thomas Monjalon <thomas@monjalon.net>; Ferruh Yigit
> > <ferruh.yigit@intel.com>
> > Subject: Re: [PATCH v3 1/6] ethdev: add capability to keep flow rules on
> > restart
> >
> >
> > On 10/19/21 3:37 PM, Dmitry Kozlyuk wrote:
> > > Previously, it was not specified what happens to the flow rules
> > > when the device is stopped, possibly reconfigured, then started.
> > > If flow rules were kept, it could be convenient for application
> > > developers, because they wouldn't need to save and restore them.
> > > However, due to the number of flows and possible creation rate it is
> > > impractical to save all flow rules in DPDK layer. This means that flow
> > > rules persistence really depends on whether PMD and HW can implement it
> > > efficiently. It can also be limited by the rule item and action types,
> > > and its attributes transfer bit (a combination of an item/action type
> > > and a value of the transfer bit is called a ruel feature).
> > >
> > > Add a device capability bit for PMDs that can keep at least some
> > > of the flow rules across restart. Without this capability behavior
> > > is still unspecified and it is declared that the application must
> > > flush the rules before stopping the device.
> > > Allow the application to test for persitence of rules using
> >
> > persitence -> persistence
> >
> > > a particular feature by attempting to create a flow rule
> > > using that feature when the device is stopped
> > > and checking for the specific error.
> > > This is logical because if the PMD can to create the flow rule
> > > when the device is not started and use it after the start happens,
> > > it is natural that it can move its internal flow rule object
> > > to the same state when the device is stopped and restore the state
> > > when the device is started.
> > >
> > > Rule persistence across a reconfigurations is not required,
> > > because tracking all the rules and configuration-dependent resources
> > > they use may be infeasible. In case a PMD cannot keep the rules
> > > across reconfiguration, it is allowed just to report an error.
> > > Application must then flush the rules before attempting it.
> > >
> > > Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
> > > ---
> > >  doc/guides/prog_guide/rte_flow.rst | 25 +++++++++++++++++++++++++
> > >  lib/ethdev/rte_ethdev.h            |  7 +++++++
> > >  lib/ethdev/rte_flow.h              |  1 +
> > >  3 files changed, 33 insertions(+)
> > >
> > > diff --git a/doc/guides/prog_guide/rte_flow.rst
> > b/doc/guides/prog_guide/rte_flow.rst
> > > index 2b42d5ec8c..ff67b211e3 100644
> > > --- a/doc/guides/prog_guide/rte_flow.rst
> > > +++ b/doc/guides/prog_guide/rte_flow.rst
> > > @@ -87,6 +87,31 @@ To avoid resource leaks on the PMD side, handles must
> > be explicitly
> > >  destroyed by the application before releasing associated resources such
> > as
> > >  queues and ports.
> > >
> > > +If ``RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP`` is not advertised,
> > > +rules cannot be created until the device is started for the first time
> > > +and cannot be kept when the device is stopped.
> > > +However, PMD also does not flush them automatically on stop,
> > > +so the application must call ``rte_flow_flush()`` or
> > ``rte_flow_destroy()``
> > > +before stopping the device to ensure no rules remain.
> > > +
> > > +If ``RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP`` is advertised, this means
> > > +the PMD can keep at least some rules across the device stop and start.
> > > +However, ``rte_eth_dev_configure()`` may fail if any rules remain,
> > > +so the application must flush them before attempting a reconfiguration.
> > > +Keeping may be unsupported for some types of rule items and actions,
> > > +as well as depending on the value of flow attributes transfer bit.
> > > +A combination of an item or action type and a value of the transfer bit
> > > +is called a rule feature.
> >
> > As I said before a combination is very hard to test and
> > unfriendly to applications. Do we really need to make it
> > that complex?
> 
> Maybe the wording is not explicit enough,
> but it exactly attempts to address your previous comment.
> In v3, applications don't need to check for a full combination
> of item types, actions types, and a transfer bit value.
> Instead, they only need to check for a combination
> of one type (of an item or an action) with a transfer bit value.
> There is an example below in the text.
> 
> > Which PMDs are going to support it? Which cases will really
> > be distinguished and will have different support (keep or not)?
> >
> > > +To test if rules with a particular feature are kept, the application
> > must try
> > > +to create a valid rule using this feature when the device is stopped
> > > +(after it has been configured or started previously).
> >
> > Sorry, it hardly makes sense. Does it suggest an application
> > to:
> >  1. configure
> >  2. start
> >  3. stop
> >  4. check/create flow rules
> >  5. start again
> > as a regular start sequence instead of just configure+start.
> > IMHO, it must be possible to check just after configure without
> > start. Otherwise it looks really bad.
> 
> Of course, the following sequence is meant:
> 1. Configure
> 2. Try to create flow rules with needed features,
>    check for RTE_FLOW_ERROR_TYPE_STATE.
>    If and only if the test rules are not needed, destroy them.
> 3. Start
> 
> The sequence you outlined is also possible, but it is not necessary.
> It may even be useful, for example, if an application is switching the workload
> and has a new set of rule features to check.
> 
> >
> > > +If it fails with an error of type ``RTE_FLOW_ERROR_TYPE_STATE``,
> > > +rules using this feature are flushed when the device is stopped.
> >
> > Which entity does flush it?
> 
> PMD does.
> Overall approach is as follows:
> no capability => no guarantees, the application must manage the entities itself;
> have capability => PMD manages the entities, only it may be unable to keep some.
> 
I don't think it should be the PMD's responsibility.
It is always the application's:
if the PMD returns ``RTE_FLOW_ERROR_TYPE_STATE``, it means that the application
must destroy the rules before stop.
Since the PMD said it can't keep the flows, it may mean that it can't track them,
so it can't remove them.
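As a rough sketch of what this means on the application side (a model
with hypothetical names, not DPDK API): the application remembers for
each rule whether all of its features passed the probe, and destroys
the rest before stopping the port.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical application-side record of one rule; not DPDK API. */
struct app_rule {
	bool in_use;   /* the slot holds a live rule */
	bool keepable; /* every feature of the rule passed the probe */
};

/*
 * Destroy every live rule that cannot be kept; to be called before
 * stopping the port. Returns the number of rules destroyed.
 */
static size_t
destroy_unkeepable_rules(struct app_rule *rules, size_t n)
{
	size_t i, destroyed = 0;

	assert(rules != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (!rules[i].in_use || rules[i].keepable)
			continue;
		/* A real application would call rte_flow_destroy() here. */
		rules[i].in_use = false;
		destroyed++;
	}
	return destroyed;
}
```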
> > > +If it suceeds, such rules will be kept when the device is stopped,
> >
> > suceeds -> succeeds
> >
> > kept and functional? I.e. transfer rules still route traffic to
> > other ports which could be up and running.
> 
> Unless you or anyone object, it's kept, but non-functional,
> because the semantic of port stop is to stop traffic processing.
> A transfer rule can mention any port, but I think it should be controlled
> via the one that created it.
> In any case, this must be stated explicitly in v4.
Best,
Ori
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [dpdk-dev] [PATCH v3 3/6] net: advertise no support for keeping flow rules
  2021-10-20 10:08       ` Andrew Rybchenko
@ 2021-10-20 22:20         ` Dmitry Kozlyuk
  0 siblings, 0 replies; 96+ messages in thread
From: Dmitry Kozlyuk @ 2021-10-20 22:20 UTC (permalink / raw)
  To: Andrew Rybchenko, Ferruh Yigit; +Cc: dev
> > diff --git a/drivers/net/bnxt/bnxt_ethdev.c
> b/drivers/net/bnxt/bnxt_ethdev.c
> > index aa7e7fdc85..1a6e0128ff 100644
> > --- a/drivers/net/bnxt/bnxt_ethdev.c
> > +++ b/drivers/net/bnxt/bnxt_ethdev.c
> > @@ -1009,6 +1009,7 @@ static int bnxt_dev_info_get_op(struct rte_eth_dev
> *eth_dev,
> >       dev_info->speed_capa = bnxt_get_speed_capabilities(bp);
> >       dev_info->dev_capa = RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP |
> >                            RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP;
> > +     dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
> 
> Sorry, but here and everywhere below I see no point to cleanup
> the bit explicitly when it is not actually set.
> 
> [snip]
(removed maintainers from CC)
As Ferruh explained, this line serves as a TODO item for maintainers
to either remove the useless line or to support the capability and advertise it.
If deemed unnecessary in the end, this patch can be safely dropped.
http://inbox.dpdk.org/dev/6e148fd8-c7b9-dd90-d286-a54ff0faf713@intel.com/
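For reference, the no-op nature of the line can be checked with a
standalone C sketch. The macro values are copied from patch 1/6; the
shortened `CAPA_*` names and the helper function are hypothetical and
exist only for this illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Values as in patch 1/6; shortened names are for illustration only. */
#define CAPA_RUNTIME_RX_QUEUE_SETUP 0x00000001
#define CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002
#define CAPA_FLOW_RULE_KEEP         0x00000004

/* Mirrors the bnxt hunk: set two capabilities, then clear a third. */
static uint64_t
bnxt_like_dev_capa(void)
{
	uint64_t capa = CAPA_RUNTIME_RX_QUEUE_SETUP |
			CAPA_RUNTIME_TX_QUEUE_SETUP;
	uint64_t before = capa;

	capa &= ~(uint64_t)CAPA_FLOW_RULE_KEEP;
	assert(capa == before); /* the bit was never set, nothing changes */
	return capa;
}
```

So the line changes nothing at run time; its only value is as a marker
in the source.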
^ permalink raw reply	[flat|nested] 96+ messages in thread
* [dpdk-dev] [PATCH v4 0/6] Flow entites behavior on port restart
  2021-10-19 12:37   ` [dpdk-dev] [PATCH v3 0/6] Flow entites behavior on port restart Dmitry Kozlyuk
                       ` (6 preceding siblings ...)
  2021-10-20 10:12     ` [dpdk-dev] [PATCH v3 0/6] Flow entites behavior on port restart Andrew Rybchenko
@ 2021-10-21  6:34     ` Dmitry Kozlyuk
  2021-10-21  6:34       ` [dpdk-dev] [PATCH v4 1/6] ethdev: add capability to keep flow rules on restart Dmitry Kozlyuk
                         ` (8 more replies)
  7 siblings, 9 replies; 96+ messages in thread
From: Dmitry Kozlyuk @ 2021-10-21  6:34 UTC (permalink / raw)
  To: dev; +Cc: Ori Kam
It is unspecified whether flow rules and indirect actions are kept
when a port is stopped, possibly reconfigured, and started again.
Vendors approach the topic differently, e.g. mlx5 and i40e PMD
disagree in whether flow rules can be kept, and mlx5 PMD would keep
indirect actions. In the end, applications are greatly affected
by whatever contract there is and need to know it.
Applications may wish to restart the port to reconfigure it,
e.g. switch offloads or even modify queues.
Keeping rte_flow entities enables application improvements:
1. Keeping the rules across restart comes with the ability
   to create rules before the device is started. This allows
   having all the rules created at the moment of start,
   so that there is no time frame when traffic is already coming,
   but the rules are not yet created (restored).
2. When a rule or an indirect action has some associated state,
   such as a counter, the application is spared the need to keep
   additional state in order to cope with information loss
   if such an entity were destroyed.
It is proposed to advertise capabilities of keeping flow rules
and indirect actions (as a special case of shared object)
using a combination of ethdev info and rte_flow calls.
Then a bug is fixed in mlx5 PMD that prevented indirect RSS action
from being kept, and the driver starts advertising the new capability.
Prior discussions:
1) http://inbox.dpdk.org/dev/20210727073121.895620-1-dkozlyuk@nvidia.com/
2) http://inbox.dpdk.org/dev/20210901085516.3647814-1-dkozlyuk@nvidia.com/
v4:  1. Fix rebase conflicts (CI).
     2. State rule behavior when a port is not started or stopped (Ori).
     3. Improve wording on rule features, add examples (Andrew).
     4. State that rules/actions that cannot be kept while others can be
        must be destroyed by the application (Andrew/Ori).
     5. Add rationale to the cover letter (Andrew).
Dmitry Kozlyuk (6):
  ethdev: add capability to keep flow rules on restart
  ethdev: add capability to keep shared objects on restart
  net: advertise no support for keeping flow rules
  net/mlx5: discover max flow priority using DevX
  net/mlx5: create drop queue using DevX
  net/mlx5: preserve indirect actions on restart
 doc/guides/prog_guide/rte_flow.rst      |  57 +++++
 drivers/net/bnxt/bnxt_ethdev.c          |   1 +
 drivers/net/bnxt/bnxt_reps.c            |   1 +
 drivers/net/cnxk/cnxk_ethdev_ops.c      |   1 +
 drivers/net/cxgbe/cxgbe_ethdev.c        |   2 +
 drivers/net/dpaa2/dpaa2_ethdev.c        |   1 +
 drivers/net/e1000/em_ethdev.c           |   2 +
 drivers/net/e1000/igb_ethdev.c          |   1 +
 drivers/net/enic/enic_ethdev.c          |   1 +
 drivers/net/failsafe/failsafe_ops.c     |   1 +
 drivers/net/hinic/hinic_pmd_ethdev.c    |   2 +
 drivers/net/hns3/hns3_ethdev.c          |   1 +
 drivers/net/hns3/hns3_ethdev_vf.c       |   1 +
 drivers/net/i40e/i40e_ethdev.c          |   1 +
 drivers/net/i40e/i40e_vf_representor.c  |   2 +
 drivers/net/iavf/iavf_ethdev.c          |   1 +
 drivers/net/ice/ice_dcf_ethdev.c        |   1 +
 drivers/net/igc/igc_ethdev.c            |   1 +
 drivers/net/ipn3ke/ipn3ke_representor.c |   1 +
 drivers/net/mlx5/linux/mlx5_os.c        |   5 -
 drivers/net/mlx5/mlx5_devx.c            | 211 ++++++++++++++---
 drivers/net/mlx5/mlx5_ethdev.c          |   1 +
 drivers/net/mlx5/mlx5_flow.c            | 292 ++++++++++++++++++++++--
 drivers/net/mlx5/mlx5_flow.h            |   6 +
 drivers/net/mlx5/mlx5_flow_dv.c         | 103 +++++++++
 drivers/net/mlx5/mlx5_flow_verbs.c      |  77 +------
 drivers/net/mlx5/mlx5_rx.h              |   4 +
 drivers/net/mlx5/mlx5_rxq.c             |  99 +++++++-
 drivers/net/mlx5/mlx5_trigger.c         |  10 +
 drivers/net/mvpp2/mrvl_ethdev.c         |   2 +
 drivers/net/octeontx2/otx2_ethdev_ops.c |   1 +
 drivers/net/qede/qede_ethdev.c          |   1 +
 drivers/net/sfc/sfc_ethdev.c            |   1 +
 drivers/net/softnic/rte_eth_softnic.c   |   1 +
 drivers/net/tap/rte_eth_tap.c           |   1 +
 drivers/net/txgbe/txgbe_ethdev.c        |   1 +
 drivers/net/txgbe/txgbe_ethdev_vf.c     |   1 +
 lib/ethdev/rte_ethdev.h                 |  10 +
 lib/ethdev/rte_flow.h                   |   1 +
 39 files changed, 770 insertions(+), 137 deletions(-)
-- 
2.25.1
^ permalink raw reply	[flat|nested] 96+ messages in thread
* [dpdk-dev] [PATCH v4 1/6] ethdev: add capability to keep flow rules on restart
  2021-10-21  6:34     ` [dpdk-dev] [PATCH v4 " Dmitry Kozlyuk
@ 2021-10-21  6:34       ` Dmitry Kozlyuk
  2021-10-21  7:36         ` Ori Kam
                           ` (2 more replies)
  2021-10-21  6:34       ` [dpdk-dev] [PATCH v4 2/6] ethdev: add capability to keep shared objects " Dmitry Kozlyuk
                         ` (7 subsequent siblings)
  8 siblings, 3 replies; 96+ messages in thread
From: Dmitry Kozlyuk @ 2021-10-21  6:34 UTC (permalink / raw)
  To: dev; +Cc: Ori Kam, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko
Previously, it was not specified what happens to the flow rules
when the device is stopped, possibly reconfigured, then started.
If flow rules were kept, it could be convenient for application
developers, because they wouldn't need to save and restore them.
However, due to the number of flows and possible creation rate it is
impractical to save all flow rules in DPDK layer. This means that flow
rules persistence really depends on whether PMD and HW can implement it
efficiently. It can also be limited by the rule item and action types,
and the transfer bit of its attributes (a combination of an item/action type
and a value of the transfer bit is called a rule feature).
Add a device capability bit for PMDs that can keep at least some
of the flow rules across restart. Without this capability behavior
is still unspecified and it is declared that the application must
flush the rules before stopping the device.
Allow the application to test for persistence of rules using
a particular feature by attempting to create a flow rule
using that feature when the device is stopped
and checking for the specific error.
This is logical because if the PMD can create the flow rule
when the device is not started and use it after the start happens,
it is natural that it can move its internal flow rule object
to the same state when the device is stopped and restore the state
when the device is started.
Rule persistence across reconfiguration is not required,
because tracking all the rules and configuration-dependent resources
they use may be infeasible. In case a PMD cannot keep the rules
across reconfiguration, it is allowed just to report an error.
Application must then flush the rules before attempting it.
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
---
 doc/guides/prog_guide/rte_flow.rst | 31 ++++++++++++++++++++++++++++++
 lib/ethdev/rte_ethdev.h            |  7 +++++++
 lib/ethdev/rte_flow.h              |  1 +
 3 files changed, 39 insertions(+)
diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
index aeba374182..9beaae3df3 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -87,6 +87,37 @@ To avoid resource leaks on the PMD side, handles must be explicitly
 destroyed by the application before releasing associated resources such as
 queues and ports.
 
+When the device is stopped, its rules do not process the traffic.
+In particular, transfer rules created using some device
+stop affecting the traffic even if they refer to different ports.
+
+If ``RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP`` is not advertised,
+rules cannot be created until the device is started for the first time
+and cannot be kept when the device is stopped.
+However, PMD also does not flush them automatically on stop,
+so the application must call ``rte_flow_flush()`` or ``rte_flow_destroy()``
+before stopping the device to ensure no rules remain.
+
+If ``RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP`` is advertised, this means
+the PMD can keep at least some rules across the device stop and start.
+However, ``rte_eth_dev_configure()`` may fail if any rules remain,
+so the application must flush them before attempting a reconfiguration.
+Keeping may be unsupported for some types of rule items and actions,
+and may also depend on the value of the flow attributes' transfer bit.
+A combination of a single item or action type
+and a value of the transfer bit is called a rule feature.
+For example: a COUNT action with the transfer bit set.
+To test if rules with a particular feature are kept, the application must try
+to create a valid rule using this feature when the device is not started
+(either before the first start or after a stop).
+If it fails with an error of type ``RTE_FLOW_ERROR_TYPE_STATE``,
+all rules using this feature must be flushed by the application
+before stopping the device.
+If it succeeds, such rules will be kept when the device is stopped,
+provided they do not use other features that are not supported.
+Rules that are created when the device is stopped, including the rules
+created for the test, will be kept after the device is started.
+
 The following sections cover:
 
 - **Attributes** (represented by ``struct rte_flow_attr``): properties of a
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index 014270d316..9cf23fecce 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -90,6 +90,11 @@
  *     - flow director filtering mode (but not filtering rules)
  *     - NIC queue statistics mappings
  *
+ * The following configuration may be retained or not
+ * depending on the device capabilities:
+ *
+ *     - flow rules
+ *
  * Any other configuration will not be stored and will need to be re-entered
  * before a call to rte_eth_dev_start().
  *
@@ -1445,6 +1450,8 @@ struct rte_eth_conf {
 #define RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP 0x00000001
 /** Device supports Tx queue setup after device started. */
 #define RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002
+/** Device supports keeping flow rules across restart. */
+#define RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP 0x00000004
 /**@}*/
 
 /*
diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
index 2b6efeef8c..16ef33819b 100644
--- a/lib/ethdev/rte_flow.h
+++ b/lib/ethdev/rte_flow.h
@@ -3676,6 +3676,7 @@ enum rte_flow_error_type {
 	RTE_FLOW_ERROR_TYPE_ACTION_NUM, /**< Number of actions. */
 	RTE_FLOW_ERROR_TYPE_ACTION_CONF, /**< Action configuration. */
 	RTE_FLOW_ERROR_TYPE_ACTION, /**< Specific action. */
+	RTE_FLOW_ERROR_TYPE_STATE, /**< Current device state. */
 };
 
 /**
-- 
2.25.1
^ permalink raw reply	[flat|nested] 96+ messages in thread
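For illustration only, here is a minimal sketch of how an application might interpret the rule-feature test described in the rte_flow.rst text of the patch above. It does not call DPDK: the macro is a local copy of the value the patch assigns to RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP, and the enums and the function name are hypothetical. Treating a clear capability bit as "must flush", and any other failure as inconclusive, are assumptions of this sketch rather than statements of the patch.

```c
#include <stdint.h>

/* Local stand-in; 0x4 is the value the patch assigns to
 * RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP. */
#define SKETCH_CAPA_FLOW_RULE_KEEP UINT64_C(0x00000004)

/* Hypothetical summary of one probe: an attempt to create a valid rule
 * using a single feature while the port is not started. */
enum sketch_probe_result {
	SKETCH_PROBE_OK,        /* the rule was created */
	SKETCH_PROBE_ERR_STATE, /* failed with RTE_FLOW_ERROR_TYPE_STATE */
	SKETCH_PROBE_ERR_OTHER, /* failed for an unrelated reason */
};

enum sketch_verdict {
	SKETCH_KEPT,         /* rules with this feature survive stop/start */
	SKETCH_MUST_FLUSH,   /* flush such rules before stopping the port */
	SKETCH_INCONCLUSIVE, /* the probe says nothing about this feature */
};

enum sketch_verdict
sketch_rule_feature_verdict(uint64_t dev_capa, enum sketch_probe_result res)
{
	/* Assumption: with the bit clear nothing is promised, be conservative. */
	if ((dev_capa & SKETCH_CAPA_FLOW_RULE_KEEP) == 0)
		return SKETCH_MUST_FLUSH;
	switch (res) {
	case SKETCH_PROBE_OK:
		return SKETCH_KEPT;
	case SKETCH_PROBE_ERR_STATE:
		return SKETCH_MUST_FLUSH;
	default:
		/* Assumption: another error means the probe rule was not valid. */
		return SKETCH_INCONCLUSIVE;
	}
}
```

A rule that combines several features would need one probe per feature, since a rule is kept only if none of its features is unsupported.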
* [dpdk-dev] [PATCH v4 2/6] ethdev: add capability to keep shared objects on restart
  2021-10-21  6:34     ` [dpdk-dev] [PATCH v4 " Dmitry Kozlyuk
  2021-10-21  6:34       ` [dpdk-dev] [PATCH v4 1/6] ethdev: add capability to keep flow rules on restart Dmitry Kozlyuk
@ 2021-10-21  6:34       ` Dmitry Kozlyuk
  2021-10-21  7:37         ` Ori Kam
                           ` (2 more replies)
  2021-10-21  6:35       ` [dpdk-dev] [PATCH v4 3/6] net: advertise no support for keeping flow rules Dmitry Kozlyuk
                         ` (6 subsequent siblings)
  8 siblings, 3 replies; 96+ messages in thread
From: Dmitry Kozlyuk @ 2021-10-21  6:34 UTC (permalink / raw)
  To: dev; +Cc: Ori Kam, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko
rte_flow_action_handle_create() did not mention what happens
with an indirect action when a device is stopped and started again.
It is natural for some indirect actions, like counter, to be persistent.
Keeping others at least saves application time and complexity.
However, not all PMDs can support it, or the support may be limited
by particular action kinds, that is, combinations of action type
and the value of the transfer bit in its configuration.
Add a device capability to indicate if at least some indirect actions
are kept across the above sequence. Without this capability the behavior
is still unspecified, and the application is required to destroy
the indirect actions before stopping the device.
In the future, indirect actions may not be the only type of objects
shared between flow rules. The capability bit intends to cover all
possible types of such objects, hence its name.
Declare that the application can test for the persistence
of a particular indirect action kind by attempting to create
an indirect action of that kind when the device is stopped
and checking for the specific error type.
This is logical because if the PMD can create an indirect action
when the device is not started and use it after the start happens,
it is natural that it can move its internal flow shared object
to the same state when the device is stopped and restore the state
when the device is started.
Indirect action persistence across reconfiguration is not required.
In case a PMD cannot keep the indirect actions across reconfiguration,
it is allowed just to report an error.
The application must then destroy the indirect actions before attempting it.
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
---
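For illustration only, a minimal sketch of the cleanup an application could perform before stopping a port, based on the per-kind test described above. It does not call DPDK: the macro is a local copy of the value this patch assigns to RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP, and the structures, the numeric action types, and the function names are hypothetical bookkeeping on the application side.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in; 0x8 is the value the patch assigns to
 * RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP. */
#define SKETCH_CAPA_SHARED_OBJECT_KEEP UINT64_C(0x00000008)

/* A kind is an action type plus the value of its transfer bit. */
struct sketch_kind {
	int action_type; /* hypothetical numeric action type */
	bool transfer;
};

/* Hypothetical application-side record of one indirect action. */
struct sketch_handle {
	struct sketch_kind kind;
	bool alive;
};

static bool
sketch_kind_equal(struct sketch_kind a, struct sketch_kind b)
{
	return a.action_type == b.action_type && a.transfer == b.transfer;
}

/*
 * Mark as destroyed every live handle whose kind is not known to be kept.
 * "kept" lists the kinds whose creation succeeded while the port was stopped.
 * Returns the number of handles marked.
 */
size_t
sketch_destroy_before_stop(struct sketch_handle *h, size_t n, uint64_t dev_capa,
			   const struct sketch_kind *kept, size_t n_kept)
{
	size_t destroyed = 0;

	for (size_t i = 0; i < n; i++) {
		bool keep = false;

		if (!h[i].alive)
			continue;
		if (dev_capa & SKETCH_CAPA_SHARED_OBJECT_KEEP) {
			for (size_t k = 0; k < n_kept && !keep; k++)
				keep = sketch_kind_equal(h[i].kind, kept[k]);
		}
		if (!keep) {
			/* A real application would call
			 * rte_flow_action_handle_destroy() here. */
			h[i].alive = false;
			destroyed++;
		}
	}
	return destroyed;
}

/* Usage example: only kind { type 1, transfer bit reset } passed the test. */
size_t
sketch_demo(uint64_t dev_capa)
{
	const struct sketch_kind kept[] = { { 1, false } };
	struct sketch_handle h[] = {
		{ { 1, false }, true }, /* kept kind */
		{ { 1, true }, true },  /* same type, transfer bit set */
		{ { 2, false }, true }, /* different type */
	};

	return sketch_destroy_before_stop(h, 3, dev_capa, kept, 1);
}
```

The transfer bit is part of the comparison on purpose: the same action type may be kept with the bit reset and not kept with it set.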
 doc/guides/prog_guide/rte_flow.rst | 26 ++++++++++++++++++++++++++
 lib/ethdev/rte_ethdev.h            |  3 +++
 2 files changed, 29 insertions(+)
diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
index 9beaae3df3..bef143862b 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -2965,6 +2965,32 @@ updated depend on the type of the ``action`` and different for every type.
 The indirect action specified data (e.g. counter) can be queried by
 ``rte_flow_action_handle_query()``.
 
+If ``RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP`` is not advertised,
+indirect actions cannot be created until the device is started for the first time
+and cannot be kept when the device is stopped.
+However, the PMD also does not flush them automatically on stop,
+so the application must call ``rte_flow_action_handle_destroy()``
+before stopping the device to ensure no indirect actions remain.
+
+If ``RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP`` is advertised,
+this means that the PMD can keep at least some indirect actions
+across device stop and start.
+However, ``rte_eth_dev_configure()`` may fail if any indirect actions remain,
+so the application must destroy them before attempting a reconfiguration.
+Keeping may be only supported for certain kinds of indirect actions.
+A kind is a combination of an action type and a value of its transfer bit.
+For example: an indirect counter with the transfer bit reset.
+To test if a particular kind of indirect actions is kept,
+the application must try to create a valid indirect action of that kind
+when the device is not started (either before the first start or after a stop).
+If it fails with an error of type ``RTE_FLOW_ERROR_TYPE_STATE``,
+the application must destroy all indirect actions of this kind
+before stopping the device.
+If it succeeds, all indirect actions of the same kind are kept
+when the device is stopped.
+Indirect actions of a kept kind that are created when the device is stopped,
+including the ones created for the test, will be kept after the device start.
+
 .. _table_rte_flow_action_handle:
 
 .. table:: INDIRECT
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index 9cf23fecce..5375844484 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -94,6 +94,7 @@
  * depending on the device capabilities:
  *
  *     - flow rules
+ *     - flow-related shared objects, e.g. indirect actions
  *
  * Any other configuration will not be stored and will need to be re-entered
  * before a call to rte_eth_dev_start().
@@ -1452,6 +1453,8 @@ struct rte_eth_conf {
 #define RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002
 /** Device supports keeping flow rules across restart. */
 #define RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP 0x00000004
+/** Device supports keeping shared flow objects across restart. */
+#define RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP 0x00000008
 /**@}*/
 
 /*
-- 
2.25.1
^ permalink raw reply	[flat|nested] 96+ messages in thread
* [dpdk-dev] [PATCH v4 3/6] net: advertise no support for keeping flow rules
  2021-10-21  6:34     ` [dpdk-dev] [PATCH v4 " Dmitry Kozlyuk
  2021-10-21  6:34       ` [dpdk-dev] [PATCH v4 1/6] ethdev: add capability to keep flow rules on restart Dmitry Kozlyuk
  2021-10-21  6:34       ` [dpdk-dev] [PATCH v4 2/6] ethdev: add capability to keep shared objects " Dmitry Kozlyuk
@ 2021-10-21  6:35       ` Dmitry Kozlyuk
  2021-10-21 18:26         ` Ajit Khaparde
                           ` (2 more replies)
  2021-10-21  6:35       ` [dpdk-dev] [PATCH v4 4/6] net/mlx5: discover max flow priority using DevX Dmitry Kozlyuk
                         ` (5 subsequent siblings)
  8 siblings, 3 replies; 96+ messages in thread
From: Dmitry Kozlyuk @ 2021-10-21  6:35 UTC (permalink / raw)
  To: dev
  Cc: Ori Kam, Ferruh Yigit, Ajit Khaparde, Somnath Kotur,
	Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori, Satha Rao,
	Rahul Lakkireddy, Hemant Agrawal, Sachin Saxena, Haiyue Wang,
	John Daley, Hyong Youb Kim, Gaetan Rivet, Ziyang Xuan,
	Xiaoyun Wang, Guoyang Zhou, Min Hu (Connor),
	Yisen Zhuang, Lijun Ou, Beilei Xing, Jingjing Wu, Qiming Yang,
	Qi Zhang, Rosen Xu, Liron Himi, Jerin Jacob, Rasesh Mody,
	Devendra Singh Rawat, Andrew Rybchenko, Jasvinder Singh,
	Cristian Dumitrescu, Keith Wiles, Jiawen Wu, Jian Wang
When RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP capability bit is zero,
the specified behavior is the same as it had been before
this bit was introduced. Explicitly reset it in all PMDs
supporting rte_flow API in order to attract the attention
of maintainers, who should eventually choose to advertise
the new capability or not. It is already known that
mlx4 and mlx5 will not support this capability.
For RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP,
a similar action is not performed,
because no PMD except mlx5 supports indirect actions.
Any PMD that starts doing so will anyway have to consider
all relevant API, including this capability.
Suggested-by: Ferruh Yigit <ferruh.yigit@intel.com>
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
---
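For illustration only, a minimal sketch of what the one-line change repeated in every driver below does to the capability mask. The macros are local copies of the values shown in the rte_ethdev.h hunks of this series, the function name is hypothetical, and the sketch assumes the capability field is 64 bits wide.

```c
#include <stdint.h>

/* Local copies of the values shown in the rte_ethdev.h hunks. */
#define SKETCH_CAPA_RUNTIME_RX_QUEUE_SETUP 0x00000001
#define SKETCH_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002
#define SKETCH_CAPA_FLOW_RULE_KEEP         0x00000004

/* Same expression as the one added to each driver. */
uint64_t
sketch_clear_rule_keep(uint64_t dev_capa)
{
	/*
	 * ~0x4 is computed as an int (-5) and is sign-extended when combined
	 * with the 64-bit operand, so bits 32..63 are preserved.
	 */
	dev_capa &= ~SKETCH_CAPA_FLOW_RULE_KEEP;
	return dev_capa;
}
```

Where a driver has just assigned the Rx and Tx runtime setup bits (0x3), the result is still 0x3: the statement changes no value there and only records the decision in the code.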
 drivers/net/bnxt/bnxt_ethdev.c          | 1 +
 drivers/net/bnxt/bnxt_reps.c            | 1 +
 drivers/net/cnxk/cnxk_ethdev_ops.c      | 1 +
 drivers/net/cxgbe/cxgbe_ethdev.c        | 2 ++
 drivers/net/dpaa2/dpaa2_ethdev.c        | 1 +
 drivers/net/e1000/em_ethdev.c           | 2 ++
 drivers/net/e1000/igb_ethdev.c          | 1 +
 drivers/net/enic/enic_ethdev.c          | 1 +
 drivers/net/failsafe/failsafe_ops.c     | 1 +
 drivers/net/hinic/hinic_pmd_ethdev.c    | 2 ++
 drivers/net/hns3/hns3_ethdev.c          | 1 +
 drivers/net/hns3/hns3_ethdev_vf.c       | 1 +
 drivers/net/i40e/i40e_ethdev.c          | 1 +
 drivers/net/i40e/i40e_vf_representor.c  | 2 ++
 drivers/net/iavf/iavf_ethdev.c          | 1 +
 drivers/net/ice/ice_dcf_ethdev.c        | 1 +
 drivers/net/igc/igc_ethdev.c            | 1 +
 drivers/net/ipn3ke/ipn3ke_representor.c | 1 +
 drivers/net/mvpp2/mrvl_ethdev.c         | 2 ++
 drivers/net/octeontx2/otx2_ethdev_ops.c | 1 +
 drivers/net/qede/qede_ethdev.c          | 1 +
 drivers/net/sfc/sfc_ethdev.c            | 1 +
 drivers/net/softnic/rte_eth_softnic.c   | 1 +
 drivers/net/tap/rte_eth_tap.c           | 1 +
 drivers/net/txgbe/txgbe_ethdev.c        | 1 +
 drivers/net/txgbe/txgbe_ethdev_vf.c     | 1 +
 26 files changed, 31 insertions(+)
diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c
index f385723a9f..dbdcdb1ec4 100644
--- a/drivers/net/bnxt/bnxt_ethdev.c
+++ b/drivers/net/bnxt/bnxt_ethdev.c
@@ -1008,6 +1008,7 @@ static int bnxt_dev_info_get_op(struct rte_eth_dev *eth_dev,
 	dev_info->speed_capa = bnxt_get_speed_capabilities(bp);
 	dev_info->dev_capa = RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP |
 			     RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP;
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 
 	dev_info->default_rxconf = (struct rte_eth_rxconf) {
 		.rx_thresh = {
diff --git a/drivers/net/bnxt/bnxt_reps.c b/drivers/net/bnxt/bnxt_reps.c
index b7e88e013a..34b5df6018 100644
--- a/drivers/net/bnxt/bnxt_reps.c
+++ b/drivers/net/bnxt/bnxt_reps.c
@@ -526,6 +526,7 @@ int bnxt_rep_dev_info_get_op(struct rte_eth_dev *eth_dev,
 	dev_info->max_tx_queues = max_rx_rings;
 	dev_info->reta_size = bnxt_rss_hash_tbl_size(parent_bp);
 	dev_info->hash_key_size = 40;
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 
 	/* MTU specifics */
 	dev_info->min_mtu = RTE_ETHER_MIN_MTU;
diff --git a/drivers/net/cnxk/cnxk_ethdev_ops.c b/drivers/net/cnxk/cnxk_ethdev_ops.c
index d0924df761..b598512322 100644
--- a/drivers/net/cnxk/cnxk_ethdev_ops.c
+++ b/drivers/net/cnxk/cnxk_ethdev_ops.c
@@ -68,6 +68,7 @@ cnxk_nix_info_get(struct rte_eth_dev *eth_dev, struct rte_eth_dev_info *devinfo)
 	devinfo->speed_capa = dev->speed_capa;
 	devinfo->dev_capa = RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP |
 			    RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP;
+	devinfo->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 	return 0;
 }
 
diff --git a/drivers/net/cxgbe/cxgbe_ethdev.c b/drivers/net/cxgbe/cxgbe_ethdev.c
index f77b297600..e654ccc854 100644
--- a/drivers/net/cxgbe/cxgbe_ethdev.c
+++ b/drivers/net/cxgbe/cxgbe_ethdev.c
@@ -131,6 +131,8 @@ int cxgbe_dev_info_get(struct rte_eth_dev *eth_dev,
 	device_info->max_vfs = adapter->params.arch.vfcount;
 	device_info->max_vmdq_pools = 0; /* XXX: For now no support for VMDQ */
 
+	device_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
+
 	device_info->rx_queue_offload_capa = 0UL;
 	device_info->rx_offload_capa = CXGBE_RX_OFFLOADS;
 
diff --git a/drivers/net/dpaa2/dpaa2_ethdev.c b/drivers/net/dpaa2/dpaa2_ethdev.c
index a0270e7852..19f35262e5 100644
--- a/drivers/net/dpaa2/dpaa2_ethdev.c
+++ b/drivers/net/dpaa2/dpaa2_ethdev.c
@@ -254,6 +254,7 @@ dpaa2_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	dev_info->speed_capa = ETH_LINK_SPEED_1G |
 			ETH_LINK_SPEED_2_5G |
 			ETH_LINK_SPEED_10G;
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 
 	dev_info->max_hash_mac_addrs = 0;
 	dev_info->max_vfs = 0;
diff --git a/drivers/net/e1000/em_ethdev.c b/drivers/net/e1000/em_ethdev.c
index 73152dec6e..3d546c5517 100644
--- a/drivers/net/e1000/em_ethdev.c
+++ b/drivers/net/e1000/em_ethdev.c
@@ -1106,6 +1106,8 @@ eth_em_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 			ETH_LINK_SPEED_100M_HD | ETH_LINK_SPEED_100M |
 			ETH_LINK_SPEED_1G;
 
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
+
 	/* Preferred queue parameters */
 	dev_info->default_rxportconf.nb_queues = 1;
 	dev_info->default_txportconf.nb_queues = 1;
diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index dbe811a1ad..d1e61ea345 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -2174,6 +2174,7 @@ eth_igb_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	dev_info->tx_queue_offload_capa = igb_get_tx_queue_offloads_capa(dev);
 	dev_info->tx_offload_capa = igb_get_tx_port_offloads_capa(dev) |
 				    dev_info->tx_queue_offload_capa;
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 
 	switch (hw->mac.type) {
 	case e1000_82575:
diff --git a/drivers/net/enic/enic_ethdev.c b/drivers/net/enic/enic_ethdev.c
index 8df7332bc5..4e8ccfd832 100644
--- a/drivers/net/enic/enic_ethdev.c
+++ b/drivers/net/enic/enic_ethdev.c
@@ -469,6 +469,7 @@ static int enicpmd_dev_info_get(struct rte_eth_dev *eth_dev,
 	device_info->rx_offload_capa = enic->rx_offload_capa;
 	device_info->tx_offload_capa = enic->tx_offload_capa;
 	device_info->tx_queue_offload_capa = enic->tx_queue_offload_capa;
+	device_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 	device_info->default_rxconf = (struct rte_eth_rxconf) {
 		.rx_free_thresh = ENIC_DEFAULT_RX_FREE_THRESH
 	};
diff --git a/drivers/net/failsafe/failsafe_ops.c b/drivers/net/failsafe/failsafe_ops.c
index 29de39910c..9e9c688961 100644
--- a/drivers/net/failsafe/failsafe_ops.c
+++ b/drivers/net/failsafe/failsafe_ops.c
@@ -1220,6 +1220,7 @@ fs_dev_infos_get(struct rte_eth_dev *dev,
 	infos->dev_capa =
 		RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP |
 		RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP;
+	infos->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 
 	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_PROBED) {
 		struct rte_eth_dev_info sub_info;
diff --git a/drivers/net/hinic/hinic_pmd_ethdev.c b/drivers/net/hinic/hinic_pmd_ethdev.c
index c2374ebb67..ff287321c5 100644
--- a/drivers/net/hinic/hinic_pmd_ethdev.c
+++ b/drivers/net/hinic/hinic_pmd_ethdev.c
@@ -751,6 +751,8 @@ hinic_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info)
 				DEV_TX_OFFLOAD_TCP_TSO |
 				DEV_TX_OFFLOAD_MULTI_SEGS;
 
+	info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
+
 	info->hash_key_size = HINIC_RSS_KEY_SIZE;
 	info->reta_size = HINIC_RSS_INDIR_SIZE;
 	info->flow_type_rss_offloads = HINIC_RSS_OFFLOAD_ALL;
diff --git a/drivers/net/hns3/hns3_ethdev.c b/drivers/net/hns3/hns3_ethdev.c
index 693048f587..4177c0db41 100644
--- a/drivers/net/hns3/hns3_ethdev.c
+++ b/drivers/net/hns3/hns3_ethdev.c
@@ -2707,6 +2707,7 @@ hns3_dev_infos_get(struct rte_eth_dev *eth_dev, struct rte_eth_dev_info *info)
 	if (hns3_dev_get_support(hw, INDEP_TXRX))
 		info->dev_capa = RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP |
 				 RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP;
+	info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 
 	if (hns3_dev_get_support(hw, PTP))
 		info->rx_offload_capa |= DEV_RX_OFFLOAD_TIMESTAMP;
diff --git a/drivers/net/hns3/hns3_ethdev_vf.c b/drivers/net/hns3/hns3_ethdev_vf.c
index 54dbd4b798..b53e9be091 100644
--- a/drivers/net/hns3/hns3_ethdev_vf.c
+++ b/drivers/net/hns3/hns3_ethdev_vf.c
@@ -965,6 +965,7 @@ hns3vf_dev_infos_get(struct rte_eth_dev *eth_dev, struct rte_eth_dev_info *info)
 	if (hns3_dev_get_support(hw, INDEP_TXRX))
 		info->dev_capa = RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP |
 				 RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP;
+	info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 
 	info->rx_desc_lim = (struct rte_eth_desc_lim) {
 		.nb_max = HNS3_MAX_RING_DESC,
diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 0a4db0891d..e472cee167 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -3751,6 +3751,7 @@ i40e_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	dev_info->dev_capa =
 		RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP |
 		RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP;
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 
 	dev_info->hash_key_size = (I40E_PFQF_HKEY_MAX_INDEX + 1) *
 						sizeof(uint32_t);
diff --git a/drivers/net/i40e/i40e_vf_representor.c b/drivers/net/i40e/i40e_vf_representor.c
index 12d5a2e48a..4d5a4af292 100644
--- a/drivers/net/i40e/i40e_vf_representor.c
+++ b/drivers/net/i40e/i40e_vf_representor.c
@@ -35,6 +35,8 @@ i40e_vf_representor_dev_infos_get(struct rte_eth_dev *ethdev,
 	/* get dev info for the vdev */
 	dev_info->device = ethdev->device;
 
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
+
 	dev_info->max_rx_queues = ethdev->data->nb_rx_queues;
 	dev_info->max_tx_queues = ethdev->data->nb_tx_queues;
 
diff --git a/drivers/net/iavf/iavf_ethdev.c b/drivers/net/iavf/iavf_ethdev.c
index 611f1f7722..9bb5bdf465 100644
--- a/drivers/net/iavf/iavf_ethdev.c
+++ b/drivers/net/iavf/iavf_ethdev.c
@@ -960,6 +960,7 @@ iavf_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	dev_info->reta_size = vf->vf_res->rss_lut_size;
 	dev_info->flow_type_rss_offloads = IAVF_RSS_OFFLOAD_ALL;
 	dev_info->max_mac_addrs = IAVF_NUM_MACADDR_MAX;
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 	dev_info->rx_offload_capa =
 		DEV_RX_OFFLOAD_VLAN_STRIP |
 		DEV_RX_OFFLOAD_QINQ_STRIP |
diff --git a/drivers/net/ice/ice_dcf_ethdev.c b/drivers/net/ice/ice_dcf_ethdev.c
index b8a537cb85..05a7ccf71e 100644
--- a/drivers/net/ice/ice_dcf_ethdev.c
+++ b/drivers/net/ice/ice_dcf_ethdev.c
@@ -673,6 +673,7 @@ ice_dcf_dev_info_get(struct rte_eth_dev *dev,
 	dev_info->hash_key_size = hw->vf_res->rss_key_size;
 	dev_info->reta_size = hw->vf_res->rss_lut_size;
 	dev_info->flow_type_rss_offloads = ICE_RSS_OFFLOAD_ALL;
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 
 	dev_info->rx_offload_capa =
 		DEV_RX_OFFLOAD_VLAN_STRIP |
diff --git a/drivers/net/igc/igc_ethdev.c b/drivers/net/igc/igc_ethdev.c
index 2a1ed90b64..7d4cc408ba 100644
--- a/drivers/net/igc/igc_ethdev.c
+++ b/drivers/net/igc/igc_ethdev.c
@@ -1480,6 +1480,7 @@ eth_igc_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	dev_info->min_rx_bufsize = 256; /* See BSIZE field of RCTL register. */
 	dev_info->max_rx_pktlen = MAX_RX_JUMBO_FRAME_SIZE;
 	dev_info->max_mac_addrs = hw->mac.rar_entry_count;
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 	dev_info->rx_offload_capa = IGC_RX_OFFLOAD_ALL;
 	dev_info->tx_offload_capa = IGC_TX_OFFLOAD_ALL;
 	dev_info->rx_queue_offload_capa = DEV_RX_OFFLOAD_VLAN_STRIP;
diff --git a/drivers/net/ipn3ke/ipn3ke_representor.c b/drivers/net/ipn3ke/ipn3ke_representor.c
index 063a9c6a6f..d40947162d 100644
--- a/drivers/net/ipn3ke/ipn3ke_representor.c
+++ b/drivers/net/ipn3ke/ipn3ke_representor.c
@@ -96,6 +96,7 @@ ipn3ke_rpst_dev_infos_get(struct rte_eth_dev *ethdev,
 	dev_info->dev_capa =
 		RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP |
 		RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP;
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 
 	dev_info->switch_info.name = ethdev->device->name;
 	dev_info->switch_info.domain_id = rpst->switch_domain_id;
diff --git a/drivers/net/mvpp2/mrvl_ethdev.c b/drivers/net/mvpp2/mrvl_ethdev.c
index a6458d2ce9..a6d67ea093 100644
--- a/drivers/net/mvpp2/mrvl_ethdev.c
+++ b/drivers/net/mvpp2/mrvl_ethdev.c
@@ -1709,6 +1709,8 @@ mrvl_dev_infos_get(struct rte_eth_dev *dev,
 {
 	struct mrvl_priv *priv = dev->data->dev_private;
 
+	info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
+
 	info->speed_capa = ETH_LINK_SPEED_10M |
 			   ETH_LINK_SPEED_100M |
 			   ETH_LINK_SPEED_1G |
diff --git a/drivers/net/octeontx2/otx2_ethdev_ops.c b/drivers/net/octeontx2/otx2_ethdev_ops.c
index 22a8af5cba..cad5416ba2 100644
--- a/drivers/net/octeontx2/otx2_ethdev_ops.c
+++ b/drivers/net/octeontx2/otx2_ethdev_ops.c
@@ -583,6 +583,7 @@ otx2_nix_info_get(struct rte_eth_dev *eth_dev, struct rte_eth_dev_info *devinfo)
 
 	devinfo->dev_capa = RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP |
 				RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP;
+	devinfo->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 
 	return 0;
 }
diff --git a/drivers/net/qede/qede_ethdev.c b/drivers/net/qede/qede_ethdev.c
index 27f6932dc7..5bcc97d314 100644
--- a/drivers/net/qede/qede_ethdev.c
+++ b/drivers/net/qede/qede_ethdev.c
@@ -1367,6 +1367,7 @@ qede_dev_info_get(struct rte_eth_dev *eth_dev,
 	dev_info->max_rx_pktlen = (uint32_t)ETH_TX_MAX_NON_LSO_PKT_LEN;
 	dev_info->rx_desc_lim = qede_rx_desc_lim;
 	dev_info->tx_desc_lim = qede_tx_desc_lim;
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 
 	if (IS_PF(edev))
 		dev_info->max_rx_queues = (uint16_t)RTE_MIN(
diff --git a/drivers/net/sfc/sfc_ethdev.c b/drivers/net/sfc/sfc_ethdev.c
index f5986b610f..8951495841 100644
--- a/drivers/net/sfc/sfc_ethdev.c
+++ b/drivers/net/sfc/sfc_ethdev.c
@@ -186,6 +186,7 @@ sfc_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 
 	dev_info->dev_capa = RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP |
 			     RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP;
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 
 	if (mae->status == SFC_MAE_STATUS_SUPPORTED ||
 	    mae->status == SFC_MAE_STATUS_ADMIN) {
diff --git a/drivers/net/softnic/rte_eth_softnic.c b/drivers/net/softnic/rte_eth_softnic.c
index b3b55b9035..3622049afa 100644
--- a/drivers/net/softnic/rte_eth_softnic.c
+++ b/drivers/net/softnic/rte_eth_softnic.c
@@ -93,6 +93,7 @@ pmd_dev_infos_get(struct rte_eth_dev *dev __rte_unused,
 	dev_info->max_rx_pktlen = UINT32_MAX;
 	dev_info->max_rx_queues = UINT16_MAX;
 	dev_info->max_tx_queues = UINT16_MAX;
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 
 	return 0;
 }
diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
index e4f1ad4521..5e19bd8d4b 100644
--- a/drivers/net/tap/rte_eth_tap.c
+++ b/drivers/net/tap/rte_eth_tap.c
@@ -1006,6 +1006,7 @@ tap_dev_info(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	 * functions together and not in partial combinations
 	 */
 	dev_info->flow_type_rss_offloads = ~TAP_RSS_HF_MASK;
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 
 	return 0;
 }
diff --git a/drivers/net/txgbe/txgbe_ethdev.c b/drivers/net/txgbe/txgbe_ethdev.c
index 7b46ffb686..6d64c657d9 100644
--- a/drivers/net/txgbe/txgbe_ethdev.c
+++ b/drivers/net/txgbe/txgbe_ethdev.c
@@ -2603,6 +2603,7 @@ txgbe_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	dev_info->max_vfs = pci_dev->max_vfs;
 	dev_info->max_vmdq_pools = ETH_64_POOLS;
 	dev_info->vmdq_queue_num = dev_info->max_rx_queues;
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 	dev_info->rx_queue_offload_capa = txgbe_get_rx_queue_offloads(dev);
 	dev_info->rx_offload_capa = (txgbe_get_rx_port_offloads(dev) |
 				     dev_info->rx_queue_offload_capa);
diff --git a/drivers/net/txgbe/txgbe_ethdev_vf.c b/drivers/net/txgbe/txgbe_ethdev_vf.c
index 43dc0ed39b..0d464c5a4c 100644
--- a/drivers/net/txgbe/txgbe_ethdev_vf.c
+++ b/drivers/net/txgbe/txgbe_ethdev_vf.c
@@ -487,6 +487,7 @@ txgbevf_dev_info_get(struct rte_eth_dev *dev,
 	dev_info->max_hash_mac_addrs = TXGBE_VMDQ_NUM_UC_MAC;
 	dev_info->max_vfs = pci_dev->max_vfs;
 	dev_info->max_vmdq_pools = ETH_64_POOLS;
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 	dev_info->rx_queue_offload_capa = txgbe_get_rx_queue_offloads(dev);
 	dev_info->rx_offload_capa = (txgbe_get_rx_port_offloads(dev) |
 				     dev_info->rx_queue_offload_capa);
-- 
2.25.1
^ permalink raw reply	[flat|nested] 96+ messages in thread
* [dpdk-dev] [PATCH v4 4/6] net/mlx5: discover max flow priority using DevX
  2021-10-21  6:34     ` [dpdk-dev] [PATCH v4 " Dmitry Kozlyuk
                         ` (2 preceding siblings ...)
  2021-10-21  6:35       ` [dpdk-dev] [PATCH v4 3/6] net: advertise no support for keeping flow rules Dmitry Kozlyuk
@ 2021-10-21  6:35       ` Dmitry Kozlyuk
  2021-10-21  6:35       ` [dpdk-dev] [PATCH v4 5/6] net/mlx5: create drop queue " Dmitry Kozlyuk
                         ` (4 subsequent siblings)
  8 siblings, 0 replies; 96+ messages in thread
From: Dmitry Kozlyuk @ 2021-10-21  6:35 UTC (permalink / raw)
  To: dev; +Cc: Ori Kam, stable, Matan Azrad, Viacheslav Ovsiienko
Maximum available flow priority was discovered using Verbs API
regardless of the selected flow engine. This required some Verbs
objects to be initialized in order to use DevX engine. Make priority
discovery an engine method and implement it for DevX using its API.
Cc: stable@dpdk.org
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
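For illustration only, a minimal sketch of the discovery scheme used below: try the highest priority value of each candidate count in turn, keep the last one that works, then map the result to the number of flow priorities. It does not touch mlx5 or DevX: the callback type, the simulated device, and all function names are hypothetical, and only the candidate counts (8, 16) and the table sizes (3, 5) are taken from the patch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical probe: returns 0 if a flow at "priority" can be created. */
typedef int (*sketch_try_prio_t)(uint16_t priority, void *ctx);

/*
 * Try the highest priority value of each candidate count, in array order.
 * The last candidate that works is the answer; stop at the first failure.
 */
int
sketch_discover(const uint16_t *vprio, size_t vprio_n,
		sketch_try_prio_t try_prio, void *ctx)
{
	int ret = -ENOTSUP;

	for (size_t i = 0; i < vprio_n; i++) {
		if (try_prio((uint16_t)(vprio[i] - 1), ctx) != 0)
			break;
		ret = vprio[i];
	}
	return ret;
}

/* Map the discovered count to the number of flow priorities, that is,
 * the row counts of priority_map_3 and priority_map_5 in the patch. */
int
sketch_flow_priorities(int discovered)
{
	switch (discovered) {
	case 8:
		return 3;
	case 16:
		return 5;
	default:
		return -ENOTSUP;
	}
}

/* Simulated device that accepts priorities 0..(*count - 1). */
static int
sketch_sim_try(uint16_t priority, void *ctx)
{
	return priority < *(const uint16_t *)ctx ? 0 : -1;
}

/* Usage example: run discovery against the simulated device. */
int
sketch_discover_sim(uint16_t hw_prio_count)
{
	static const uint16_t vprio[] = { 8, 16 };

	return sketch_discover(vprio, 2, sketch_sim_try, &hw_prio_count);
}
```

Because the loop stops at the first failure, the candidate list is expected to be sorted in ascending order, as the { 8, 16 } array in the patch is.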
 drivers/net/mlx5/linux/mlx5_os.c   |   1 -
 drivers/net/mlx5/mlx5_flow.c       |  98 +++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_flow.h       |   4 ++
 drivers/net/mlx5/mlx5_flow_dv.c    | 103 +++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_flow_verbs.c |  77 +++------------------
 5 files changed, 215 insertions(+), 68 deletions(-)
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 26a8d75b99..60d2c398db 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1849,7 +1849,6 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 	priv->drop_queue.hrxq = mlx5_drop_action_create(eth_dev);
 	if (!priv->drop_queue.hrxq)
 		goto error;
-	/* Supported Verbs flow priority number detection. */
 	err = mlx5_flow_discover_priorities(eth_dev);
 	if (err < 0) {
 		err = -err;
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index b4d0b7b5ef..2768244c2e 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -9480,3 +9480,101 @@ mlx5_flow_expand_rss_adjust_node(const struct rte_flow_item *pattern,
 		return node;
 	}
 }
+
+/* Map of Verbs to Flow priority with 8 Verbs priorities. */
+static const uint32_t priority_map_3[][MLX5_PRIORITY_MAP_MAX] = {
+	{ 0, 1, 2 }, { 2, 3, 4 }, { 5, 6, 7 },
+};
+
+/* Map of Verbs to Flow priority with 16 Verbs priorities. */
+static const uint32_t priority_map_5[][MLX5_PRIORITY_MAP_MAX] = {
+	{ 0, 1, 2 }, { 3, 4, 5 }, { 6, 7, 8 },
+	{ 9, 10, 11 }, { 12, 13, 14 },
+};
+
+/**
+ * Discover the number of available flow priorities.
+ *
+ * @param dev
+ *   Ethernet device.
+ *
+ * @return
+ *   On success, number of available flow priorities.
+ *   On failure, a negative errno-style code and rte_errno is set.
+ */
+int
+mlx5_flow_discover_priorities(struct rte_eth_dev *dev)
+{
+	static const uint16_t vprio[] = {8, 16};
+	const struct mlx5_priv *priv = dev->data->dev_private;
+	const struct mlx5_flow_driver_ops *fops;
+	enum mlx5_flow_drv_type type;
+	int ret;
+
+	type = mlx5_flow_os_get_type();
+	if (type == MLX5_FLOW_TYPE_MAX) {
+		type = MLX5_FLOW_TYPE_VERBS;
+		if (priv->config.devx && priv->config.dv_flow_en)
+			type = MLX5_FLOW_TYPE_DV;
+	}
+	fops = flow_get_drv_ops(type);
+	if (fops->discover_priorities == NULL) {
+		DRV_LOG(ERR, "Priority discovery not supported");
+		rte_errno = ENOTSUP;
+		return -rte_errno;
+	}
+	ret = fops->discover_priorities(dev, vprio, RTE_DIM(vprio));
+	if (ret < 0)
+		return ret;
+	switch (ret) {
+	case 8:
+		ret = RTE_DIM(priority_map_3);
+		break;
+	case 16:
+		ret = RTE_DIM(priority_map_5);
+		break;
+	default:
+		rte_errno = ENOTSUP;
+		DRV_LOG(ERR,
+			"port %u maximum priority: %d expected 8/16",
+			dev->data->port_id, ret);
+		return -rte_errno;
+	}
+	DRV_LOG(INFO, "port %u supported flow priorities:"
+		" 0-%d for ingress or egress root table,"
+		" 0-%d for non-root table or transfer root table.",
+		dev->data->port_id, ret - 2,
+		MLX5_NON_ROOT_FLOW_MAX_PRIO - 1);
+	return ret;
+}
+
+/**
+ * Adjust flow priority based on the highest layer and the request priority.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[in] priority
+ *   The rule base priority.
+ * @param[in] subpriority
+ *   The priority based on the items.
+ *
+ * @return
+ *   The new priority.
+ */
+uint32_t
+mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
+			  uint32_t subpriority)
+{
+	uint32_t res = 0;
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	switch (priv->config.flow_prio) {
+	case RTE_DIM(priority_map_3):
+		res = priority_map_3[priority][subpriority];
+		break;
+	case RTE_DIM(priority_map_5):
+		res = priority_map_5[priority][subpriority];
+		break;
+	}
+	return  res;
+}
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 5c68d4f7d7..8f94125f26 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1226,6 +1226,9 @@ typedef int (*mlx5_flow_create_def_policy_t)
 			(struct rte_eth_dev *dev);
 typedef void (*mlx5_flow_destroy_def_policy_t)
 			(struct rte_eth_dev *dev);
+typedef int (*mlx5_flow_discover_priorities_t)
+			(struct rte_eth_dev *dev,
+			 const uint16_t *vprio, int vprio_n);
 
 struct mlx5_flow_driver_ops {
 	mlx5_flow_validate_t validate;
@@ -1260,6 +1263,7 @@ struct mlx5_flow_driver_ops {
 	mlx5_flow_action_update_t action_update;
 	mlx5_flow_action_query_t action_query;
 	mlx5_flow_sync_domain_t sync_domain;
+	mlx5_flow_discover_priorities_t discover_priorities;
 };
 
 /* mlx5_flow.c */
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index e31d4d8468..5163f518d7 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -17883,6 +17883,108 @@ flow_dv_sync_domain(struct rte_eth_dev *dev, uint32_t domains, uint32_t flags)
 	return 0;
 }
 
+/**
+ * Discover the number of available flow priorities
+ * by trying to create a flow with the highest priority value
+ * for each possible number.
+ *
+ * @param[in] dev
+ *   Ethernet device.
+ * @param[in] vprio
+ *   List of possible number of available priorities.
+ * @param[in] vprio_n
+ *   Size of @p vprio array.
+ * @return
+ *   On success, number of available flow priorities.
+ *   On failure, a negative errno-style code and rte_errno is set.
+ */
+static int
+flow_dv_discover_priorities(struct rte_eth_dev *dev,
+			    const uint16_t *vprio, int vprio_n)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_indexed_pool *pool = priv->sh->ipool[MLX5_IPOOL_MLX5_FLOW];
+	struct rte_flow_item_eth eth;
+	struct rte_flow_item item = {
+		.type = RTE_FLOW_ITEM_TYPE_ETH,
+		.spec = &eth,
+		.mask = &eth,
+	};
+	struct mlx5_flow_dv_matcher matcher = {
+		.mask = {
+			.size = sizeof(matcher.mask.buf),
+		},
+	};
+	union mlx5_flow_tbl_key tbl_key;
+	struct mlx5_flow flow;
+	void *action;
+	struct rte_flow_error error;
+	uint8_t misc_mask;
+	int i, err, ret = -ENOTSUP;
+
+	/*
+	 * Prepare a flow with a catch-all pattern and a drop action.
+	 * Use drop queue, because shared drop action may be unavailable.
+	 */
+	action = priv->drop_queue.hrxq->action;
+	if (action == NULL) {
+		DRV_LOG(ERR, "Priority discovery requires a drop action");
+		rte_errno = ENOTSUP;
+		return -rte_errno;
+	}
+	memset(&flow, 0, sizeof(flow));
+	flow.handle = mlx5_ipool_zmalloc(pool, &flow.handle_idx);
+	if (flow.handle == NULL) {
+		DRV_LOG(ERR, "Cannot create flow handle");
+		rte_errno = ENOMEM;
+		return -rte_errno;
+	}
+	flow.ingress = true;
+	flow.dv.value.size = MLX5_ST_SZ_BYTES(fte_match_param);
+	flow.dv.actions[0] = action;
+	flow.dv.actions_n = 1;
+	memset(&eth, 0, sizeof(eth));
+	flow_dv_translate_item_eth(matcher.mask.buf, flow.dv.value.buf,
+				   &item, /* inner */ false, /* group */ 0);
+	matcher.crc = rte_raw_cksum(matcher.mask.buf, matcher.mask.size);
+	for (i = 0; i < vprio_n; i++) {
+		/* Configure the next proposed maximum priority. */
+		matcher.priority = vprio[i] - 1;
+		memset(&tbl_key, 0, sizeof(tbl_key));
+		err = flow_dv_matcher_register(dev, &matcher, &tbl_key, &flow,
+					       /* tunnel */ NULL,
+					       /* group */ 0,
+					       &error);
+		if (err != 0) {
+			/* This action is pure SW and must always succeed. */
+			DRV_LOG(ERR, "Cannot register matcher");
+			ret = -rte_errno;
+			break;
+		}
+		/* Try to apply the flow to HW. */
+		misc_mask = flow_dv_matcher_enable(flow.dv.value.buf);
+		__flow_dv_adjust_buf_size(&flow.dv.value.size, misc_mask);
+		err = mlx5_flow_os_create_flow
+				(flow.handle->dvh.matcher->matcher_object,
+				 (void *)&flow.dv.value, flow.dv.actions_n,
+				 flow.dv.actions, &flow.handle->drv_flow);
+		if (err == 0) {
+			claim_zero(mlx5_flow_os_destroy_flow
+						(flow.handle->drv_flow));
+			flow.handle->drv_flow = NULL;
+		}
+		claim_zero(flow_dv_matcher_release(dev, flow.handle));
+		if (err != 0)
+			break;
+		ret = vprio[i];
+	}
+	mlx5_ipool_free(pool, flow.handle_idx);
+	/* Set rte_errno if no expected priority value matched. */
+	if (ret < 0)
+		rte_errno = -ret;
+	return ret;
+}
+
 const struct mlx5_flow_driver_ops mlx5_flow_dv_drv_ops = {
 	.validate = flow_dv_validate,
 	.prepare = flow_dv_prepare,
@@ -17916,6 +18018,7 @@ const struct mlx5_flow_driver_ops mlx5_flow_dv_drv_ops = {
 	.action_update = flow_dv_action_update,
 	.action_query = flow_dv_action_query,
 	.sync_domain = flow_dv_sync_domain,
+	.discover_priorities = flow_dv_discover_priorities,
 };
 
 #endif /* HAVE_IBV_FLOW_DV_SUPPORT */
diff --git a/drivers/net/mlx5/mlx5_flow_verbs.c b/drivers/net/mlx5/mlx5_flow_verbs.c
index 1627c3905f..c8cf7ef29c 100644
--- a/drivers/net/mlx5/mlx5_flow_verbs.c
+++ b/drivers/net/mlx5/mlx5_flow_verbs.c
@@ -28,17 +28,6 @@
 #define VERBS_SPEC_INNER(item_flags) \
 	(!!((item_flags) & MLX5_FLOW_LAYER_TUNNEL) ? IBV_FLOW_SPEC_INNER : 0)
 
-/* Map of Verbs to Flow priority with 8 Verbs priorities. */
-static const uint32_t priority_map_3[][MLX5_PRIORITY_MAP_MAX] = {
-	{ 0, 1, 2 }, { 2, 3, 4 }, { 5, 6, 7 },
-};
-
-/* Map of Verbs to Flow priority with 16 Verbs priorities. */
-static const uint32_t priority_map_5[][MLX5_PRIORITY_MAP_MAX] = {
-	{ 0, 1, 2 }, { 3, 4, 5 }, { 6, 7, 8 },
-	{ 9, 10, 11 }, { 12, 13, 14 },
-};
-
 /* Verbs specification header. */
 struct ibv_spec_header {
 	enum ibv_flow_spec_type type;
@@ -50,13 +39,17 @@ struct ibv_spec_header {
  *
  * @param[in] dev
  *   Pointer to the Ethernet device structure.
- *
+ * @param[in] vprio
+ *   Expected result variants.
+ * @param[in] vprio_n
+ *   Number of entries in @p vprio array.
  * @return
- *   number of supported flow priority on success, a negative errno
+ *   Number of supported flow priority on success, a negative errno
  *   value otherwise and rte_errno is set.
  */
-int
-mlx5_flow_discover_priorities(struct rte_eth_dev *dev)
+static int
+flow_verbs_discover_priorities(struct rte_eth_dev *dev,
+			       const uint16_t *vprio, int vprio_n)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct {
@@ -79,7 +72,6 @@ mlx5_flow_discover_priorities(struct rte_eth_dev *dev)
 	};
 	struct ibv_flow *flow;
 	struct mlx5_hrxq *drop = priv->drop_queue.hrxq;
-	uint16_t vprio[] = { 8, 16 };
 	int i;
 	int priority = 0;
 
@@ -87,7 +79,7 @@ mlx5_flow_discover_priorities(struct rte_eth_dev *dev)
 		rte_errno = ENOTSUP;
 		return -rte_errno;
 	}
-	for (i = 0; i != RTE_DIM(vprio); i++) {
+	for (i = 0; i != vprio_n; i++) {
 		flow_attr.attr.priority = vprio[i] - 1;
 		flow = mlx5_glue->create_flow(drop->qp, &flow_attr.attr);
 		if (!flow)
@@ -95,59 +87,9 @@ mlx5_flow_discover_priorities(struct rte_eth_dev *dev)
 		claim_zero(mlx5_glue->destroy_flow(flow));
 		priority = vprio[i];
 	}
-	switch (priority) {
-	case 8:
-		priority = RTE_DIM(priority_map_3);
-		break;
-	case 16:
-		priority = RTE_DIM(priority_map_5);
-		break;
-	default:
-		rte_errno = ENOTSUP;
-		DRV_LOG(ERR,
-			"port %u verbs maximum priority: %d expected 8/16",
-			dev->data->port_id, priority);
-		return -rte_errno;
-	}
-	DRV_LOG(INFO, "port %u supported flow priorities:"
-		" 0-%d for ingress or egress root table,"
-		" 0-%d for non-root table or transfer root table.",
-		dev->data->port_id, priority - 2,
-		MLX5_NON_ROOT_FLOW_MAX_PRIO - 1);
 	return priority;
 }
 
-/**
- * Adjust flow priority based on the highest layer and the request priority.
- *
- * @param[in] dev
- *   Pointer to the Ethernet device structure.
- * @param[in] priority
- *   The rule base priority.
- * @param[in] subpriority
- *   The priority based on the items.
- *
- * @return
- *   The new priority.
- */
-uint32_t
-mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
-				   uint32_t subpriority)
-{
-	uint32_t res = 0;
-	struct mlx5_priv *priv = dev->data->dev_private;
-
-	switch (priv->config.flow_prio) {
-	case RTE_DIM(priority_map_3):
-		res = priority_map_3[priority][subpriority];
-		break;
-	case RTE_DIM(priority_map_5):
-		res = priority_map_5[priority][subpriority];
-		break;
-	}
-	return  res;
-}
-
 /**
  * Get Verbs flow counter by index.
  *
@@ -2087,4 +2029,5 @@ const struct mlx5_flow_driver_ops mlx5_flow_verbs_drv_ops = {
 	.destroy = flow_verbs_destroy,
 	.query = flow_verbs_query,
 	.sync_domain = flow_verbs_sync_domain,
+	.discover_priorities = flow_verbs_discover_priorities,
 };
-- 
2.25.1
* [dpdk-dev] [PATCH v4 5/6] net/mlx5: create drop queue using DevX
  2021-10-21  6:34     ` [dpdk-dev] [PATCH v4 " Dmitry Kozlyuk
                         ` (3 preceding siblings ...)
  2021-10-21  6:35       ` [dpdk-dev] [PATCH v4 4/6] net/mlx5: discover max flow priority using DevX Dmitry Kozlyuk
@ 2021-10-21  6:35       ` Dmitry Kozlyuk
  2021-10-21  6:35       ` [dpdk-dev] [PATCH v4 6/6] net/mlx5: preserve indirect actions on restart Dmitry Kozlyuk
                         ` (3 subsequent siblings)
  8 siblings, 0 replies; 96+ messages in thread
From: Dmitry Kozlyuk @ 2021-10-21  6:35 UTC (permalink / raw)
  To: dev; +Cc: Ori Kam, stable, Matan Azrad, Viacheslav Ovsiienko
Drop queue creation and destruction were not implemented for the DevX
flow engine, so Verbs engine methods were used as a workaround.
Implement these methods for DevX so that there is a valid queue ID
that can be used regardless of queue configuration via API.
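
For illustration only, the idea of pointing every indirection table
entry at the drop queue when no queue list is given can be sketched as
below. This is a simplified, self-contained model, not the driver code:
struct sketch_rqt_attr, sketch_rqt_fill() and the fixed table size are
hypothetical names and assumptions, and wrapping the queue list to fill
the table is assumed here rather than taken from this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_RQT_MAX 8

/* Hypothetical stand-in for an RQT attribute; not the mlx5 structure. */
struct sketch_rqt_attr {
	uint32_t actual_size;
	uint32_t rq_list[SKETCH_RQT_MAX];
};

/*
 * Fill the table with HW queue IDs.
 * A NULL queue list designates the drop queue: every entry gets its ID.
 * Otherwise the queue list is repeated until the table is full
 * (an assumption of this sketch).
 */
static void
sketch_rqt_fill(struct sketch_rqt_attr *attr, uint32_t rqt_n,
		const uint32_t *rq_ids, const uint16_t *queues,
		uint32_t queues_n, uint32_t drop_rq_id)
{
	uint32_t i;

	assert(rqt_n <= SKETCH_RQT_MAX);
	attr->actual_size = rqt_n;
	if (queues == NULL) {
		for (i = 0; i < rqt_n; i++)
			attr->rq_list[i] = drop_rq_id;
		return;
	}
	assert(queues_n > 0);
	for (i = 0; i < rqt_n; i++)
		attr->rq_list[i] = rq_ids[queues[i % queues_n]];
}
```

With a NULL list the table never refers to an application Rx queue,
which is why a valid drop queue ID is needed independently of queue
configuration.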
Cc: stable@dpdk.org
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 drivers/net/mlx5/linux/mlx5_os.c |   4 -
 drivers/net/mlx5/mlx5_devx.c     | 211 ++++++++++++++++++++++++++-----
 2 files changed, 180 insertions(+), 35 deletions(-)
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 60d2c398db..a8d4b9ae88 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1809,10 +1809,6 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 	}
 	if (config->devx && config->dv_flow_en && config->dest_tir) {
 		priv->obj_ops = devx_obj_ops;
-		priv->obj_ops.drop_action_create =
-						ibv_obj_ops.drop_action_create;
-		priv->obj_ops.drop_action_destroy =
-						ibv_obj_ops.drop_action_destroy;
 #ifndef HAVE_MLX5DV_DEVX_UAR_OFFSET
 		priv->obj_ops.txq_obj_modify = ibv_obj_ops.txq_obj_modify;
 #else
diff --git a/drivers/net/mlx5/mlx5_devx.c b/drivers/net/mlx5/mlx5_devx.c
index a1db53577a..1e62108c94 100644
--- a/drivers/net/mlx5/mlx5_devx.c
+++ b/drivers/net/mlx5/mlx5_devx.c
@@ -226,17 +226,17 @@ mlx5_rx_devx_get_event(struct mlx5_rxq_obj *rxq_obj)
  *
  * @param dev
  *   Pointer to Ethernet device.
- * @param idx
- *   Queue index in DPDK Rx queue array.
+ * @param rxq_data
+ *   RX queue data.
  *
  * @return
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-mlx5_rxq_create_devx_rq_resources(struct rte_eth_dev *dev, uint16_t idx)
+mlx5_rxq_create_devx_rq_resources(struct rte_eth_dev *dev,
+				  struct mlx5_rxq_data *rxq_data)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_rxq_data *rxq_data = (*priv->rxqs)[idx];
 	struct mlx5_rxq_ctrl *rxq_ctrl =
 		container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
 	struct mlx5_devx_create_rq_attr rq_attr = { 0 };
@@ -289,20 +289,20 @@ mlx5_rxq_create_devx_rq_resources(struct rte_eth_dev *dev, uint16_t idx)
  *
  * @param dev
  *   Pointer to Ethernet device.
- * @param idx
- *   Queue index in DPDK Rx queue array.
+ * @param rxq_data
+ *   RX queue data.
  *
  * @return
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-mlx5_rxq_create_devx_cq_resources(struct rte_eth_dev *dev, uint16_t idx)
+mlx5_rxq_create_devx_cq_resources(struct rte_eth_dev *dev,
+				  struct mlx5_rxq_data *rxq_data)
 {
 	struct mlx5_devx_cq *cq_obj = 0;
 	struct mlx5_devx_cq_attr cq_attr = { 0 };
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_dev_ctx_shared *sh = priv->sh;
-	struct mlx5_rxq_data *rxq_data = (*priv->rxqs)[idx];
 	struct mlx5_rxq_ctrl *rxq_ctrl =
 		container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
 	unsigned int cqe_n = mlx5_rxq_cqe_num(rxq_data);
@@ -497,13 +497,13 @@ mlx5_rxq_devx_obj_new(struct rte_eth_dev *dev, uint16_t idx)
 		tmpl->fd = mlx5_os_get_devx_channel_fd(tmpl->devx_channel);
 	}
 	/* Create CQ using DevX API. */
-	ret = mlx5_rxq_create_devx_cq_resources(dev, idx);
+	ret = mlx5_rxq_create_devx_cq_resources(dev, rxq_data);
 	if (ret) {
 		DRV_LOG(ERR, "Failed to create CQ.");
 		goto error;
 	}
 	/* Create RQ using DevX API. */
-	ret = mlx5_rxq_create_devx_rq_resources(dev, idx);
+	ret = mlx5_rxq_create_devx_rq_resources(dev, rxq_data);
 	if (ret) {
 		DRV_LOG(ERR, "Port %u Rx queue %u RQ creation failure.",
 			dev->data->port_id, idx);
@@ -536,6 +536,11 @@ mlx5_rxq_devx_obj_new(struct rte_eth_dev *dev, uint16_t idx)
  *   Pointer to Ethernet device.
  * @param log_n
  *   Log of number of queues in the array.
+ * @param queues
+ *   List of RX queue indices or NULL, in which case
+ *   the attribute will be filled by drop queue ID.
+ * @param queues_n
+ *   Size of @p queues array or 0 if it is NULL.
  * @param ind_tbl
  *   DevX indirection table object.
  *
@@ -563,6 +568,11 @@ mlx5_devx_ind_table_create_rqt_attr(struct rte_eth_dev *dev,
 	}
 	rqt_attr->rqt_max_size = priv->config.ind_table_max_size;
 	rqt_attr->rqt_actual_size = rqt_n;
+	if (queues == NULL) {
+		for (i = 0; i < rqt_n; i++)
+			rqt_attr->rq_list[i] = priv->drop_queue.rxq->rq->id;
+		return rqt_attr;
+	}
 	for (i = 0; i != queues_n; ++i) {
 		struct mlx5_rxq_data *rxq = (*priv->rxqs)[queues[i]];
 		struct mlx5_rxq_ctrl *rxq_ctrl =
@@ -595,11 +605,12 @@ mlx5_devx_ind_table_new(struct rte_eth_dev *dev, const unsigned int log_n,
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_devx_rqt_attr *rqt_attr = NULL;
+	const uint16_t *queues = dev->data->dev_started ? ind_tbl->queues :
+							  NULL;
 
 	MLX5_ASSERT(ind_tbl);
-	rqt_attr = mlx5_devx_ind_table_create_rqt_attr(dev, log_n,
-							ind_tbl->queues,
-							ind_tbl->queues_n);
+	rqt_attr = mlx5_devx_ind_table_create_rqt_attr(dev, log_n, queues,
+						       ind_tbl->queues_n);
 	if (!rqt_attr)
 		return -rte_errno;
 	ind_tbl->rqt = mlx5_devx_cmd_create_rqt(priv->sh->ctx, rqt_attr);
@@ -670,7 +681,8 @@ mlx5_devx_ind_table_destroy(struct mlx5_ind_table_obj *ind_tbl)
  * @param[in] hash_fields
  *   Verbs protocol hash field to make the RSS on.
  * @param[in] ind_tbl
- *   Indirection table for TIR.
+ *   Indirection table for TIR. If table queues array is NULL,
+ *   a TIR for drop queue is assumed.
  * @param[in] tunnel
  *   Tunnel type.
  * @param[out] tir_attr
@@ -686,19 +698,27 @@ mlx5_devx_tir_attr_set(struct rte_eth_dev *dev, const uint8_t *rss_key,
 		       int tunnel, struct mlx5_devx_tir_attr *tir_attr)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_rxq_data *rxq_data = (*priv->rxqs)[ind_tbl->queues[0]];
-	struct mlx5_rxq_ctrl *rxq_ctrl =
-		container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
-	enum mlx5_rxq_type rxq_obj_type = rxq_ctrl->type;
+	enum mlx5_rxq_type rxq_obj_type;
 	bool lro = true;
 	uint32_t i;
 
-	/* Enable TIR LRO only if all the queues were configured for. */
-	for (i = 0; i < ind_tbl->queues_n; ++i) {
-		if (!(*priv->rxqs)[ind_tbl->queues[i]]->lro) {
-			lro = false;
-			break;
+	/* NULL queues designate drop queue. */
+	if (ind_tbl->queues != NULL) {
+		struct mlx5_rxq_data *rxq_data =
+					(*priv->rxqs)[ind_tbl->queues[0]];
+		struct mlx5_rxq_ctrl *rxq_ctrl =
+			container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
+		rxq_obj_type = rxq_ctrl->type;
+
+		/* Enable TIR LRO only if all the queues were configured for. */
+		for (i = 0; i < ind_tbl->queues_n; ++i) {
+			if (!(*priv->rxqs)[ind_tbl->queues[i]]->lro) {
+				lro = false;
+				break;
+			}
 		}
+	} else {
+		rxq_obj_type = priv->drop_queue.rxq->rxq_ctrl->type;
 	}
 	memset(tir_attr, 0, sizeof(*tir_attr));
 	tir_attr->disp_type = MLX5_TIRC_DISP_TYPE_INDIRECT;
@@ -857,7 +877,7 @@ mlx5_devx_hrxq_modify(struct rte_eth_dev *dev, struct mlx5_hrxq *hrxq,
 }
 
 /**
- * Create a DevX drop action for Rx Hash queue.
+ * Create a DevX drop Rx queue.
  *
  * @param dev
  *   Pointer to Ethernet device.
@@ -866,14 +886,99 @@ mlx5_devx_hrxq_modify(struct rte_eth_dev *dev, struct mlx5_hrxq *hrxq,
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-mlx5_devx_drop_action_create(struct rte_eth_dev *dev)
+mlx5_rxq_devx_obj_drop_create(struct rte_eth_dev *dev)
 {
-	(void)dev;
-	DRV_LOG(ERR, "DevX drop action is not supported yet.");
-	rte_errno = ENOTSUP;
+	struct mlx5_priv *priv = dev->data->dev_private;
+	int socket_id = dev->device->numa_node;
+	struct mlx5_rxq_ctrl *rxq_ctrl;
+	struct mlx5_rxq_data *rxq_data;
+	struct mlx5_rxq_obj *rxq = NULL;
+	int ret;
+
+	/*
+	 * Initialize dummy control structures.
+	 * They are required to hold pointers for cleanup
+	 * and are only accessible via drop queue DevX objects.
+	 */
+	rxq_ctrl = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*rxq_ctrl),
+			       0, socket_id);
+	if (rxq_ctrl == NULL) {
+		DRV_LOG(ERR, "Port %u could not allocate drop queue control",
+			dev->data->port_id);
+		rte_errno = ENOMEM;
+		goto error;
+	}
+	rxq = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*rxq), 0, socket_id);
+	if (rxq == NULL) {
+		DRV_LOG(ERR, "Port %u could not allocate drop queue object",
+			dev->data->port_id);
+		rte_errno = ENOMEM;
+		goto error;
+	}
+	rxq->rxq_ctrl = rxq_ctrl;
+	rxq_ctrl->type = MLX5_RXQ_TYPE_STANDARD;
+	rxq_ctrl->priv = priv;
+	rxq_ctrl->obj = rxq;
+	rxq_data = &rxq_ctrl->rxq;
+	/* Create CQ using DevX API. */
+	ret = mlx5_rxq_create_devx_cq_resources(dev, rxq_data);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Port %u drop queue CQ creation failed.",
+			dev->data->port_id);
+		goto error;
+	}
+	/* Create RQ using DevX API. */
+	ret = mlx5_rxq_create_devx_rq_resources(dev, rxq_data);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Port %u drop queue RQ creation failed.",
+			dev->data->port_id);
+		rte_errno = ENOMEM;
+		goto error;
+	}
+	/* Change queue state to ready. */
+	ret = mlx5_devx_modify_rq(rxq, MLX5_RXQ_MOD_RST2RDY);
+	if (ret != 0)
+		goto error;
+	/* Initialize drop queue. */
+	priv->drop_queue.rxq = rxq;
+	return 0;
+error:
+	ret = rte_errno; /* Save rte_errno before cleanup. */
+	if (rxq != NULL) {
+		if (rxq->rq_obj.rq != NULL)
+			mlx5_devx_rq_destroy(&rxq->rq_obj);
+		if (rxq->cq_obj.cq != NULL)
+			mlx5_devx_cq_destroy(&rxq->cq_obj);
+		if (rxq->devx_channel)
+			mlx5_os_devx_destroy_event_channel
+							(rxq->devx_channel);
+		mlx5_free(rxq);
+	}
+	if (rxq_ctrl != NULL)
+		mlx5_free(rxq_ctrl);
+	rte_errno = ret; /* Restore rte_errno. */
 	return -rte_errno;
 }
 
+/**
+ * Release drop Rx queue resources.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ */
+static void
+mlx5_rxq_devx_obj_drop_release(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_rxq_obj *rxq = priv->drop_queue.rxq;
+	struct mlx5_rxq_ctrl *rxq_ctrl = rxq->rxq_ctrl;
+
+	mlx5_rxq_devx_obj_release(rxq);
+	mlx5_free(rxq);
+	mlx5_free(rxq_ctrl);
+	priv->drop_queue.rxq = NULL;
+}
+
 /**
  * Release a drop hash Rx queue.
  *
@@ -883,9 +988,53 @@ mlx5_devx_drop_action_create(struct rte_eth_dev *dev)
 static void
 mlx5_devx_drop_action_destroy(struct rte_eth_dev *dev)
 {
-	(void)dev;
-	DRV_LOG(ERR, "DevX drop action is not supported yet.");
-	rte_errno = ENOTSUP;
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_hrxq *hrxq = priv->drop_queue.hrxq;
+
+	if (hrxq->tir != NULL)
+		mlx5_devx_tir_destroy(hrxq);
+	if (hrxq->ind_table->ind_table != NULL)
+		mlx5_devx_ind_table_destroy(hrxq->ind_table);
+	if (priv->drop_queue.rxq->rq != NULL)
+		mlx5_rxq_devx_obj_drop_release(dev);
+}
+
+/**
+ * Create a DevX drop action for Rx Hash queue.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_devx_drop_action_create(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_hrxq *hrxq = priv->drop_queue.hrxq;
+	int ret;
+
+	ret = mlx5_rxq_devx_obj_drop_create(dev);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Cannot create drop RX queue");
+		return ret;
+	}
+	/* hrxq->ind_table queues are NULL, drop RX queue ID will be used */
+	ret = mlx5_devx_ind_table_new(dev, 0, hrxq->ind_table);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Cannot create drop hash RX queue indirection table");
+		goto error;
+	}
+	ret = mlx5_devx_hrxq_new(dev, hrxq, /* tunnel */ false);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Cannot create drop hash RX queue");
+		goto error;
+	}
+	return 0;
+error:
+	mlx5_devx_drop_action_destroy(dev);
+	return ret;
 }
 
 /**
-- 
2.25.1
* [dpdk-dev] [PATCH v4 6/6] net/mlx5: preserve indirect actions on restart
  2021-10-21  6:34     ` [dpdk-dev] [PATCH v4 " Dmitry Kozlyuk
                         ` (4 preceding siblings ...)
  2021-10-21  6:35       ` [dpdk-dev] [PATCH v4 5/6] net/mlx5: create drop queue " Dmitry Kozlyuk
@ 2021-10-21  6:35       ` Dmitry Kozlyuk
  2021-10-26 11:46       ` [dpdk-dev] [PATCH v4 0/6] Flow entites behavior on port restart Ferruh Yigit
                         ` (2 subsequent siblings)
  8 siblings, 0 replies; 96+ messages in thread
From: Dmitry Kozlyuk @ 2021-10-21  6:35 UTC (permalink / raw)
  To: dev; +Cc: Ori Kam, bingz, stable, Matan Azrad, Viacheslav Ovsiienko
MLX5 PMD uses reference counting to manage RX queue resources.
After port stop, shared RSS actions kept references to RX queues,
preventing resource release. As a result, the internal PMD mempool
for such queues was exhausted after a number of port restarts.
Diagnostic message from rte_eth_dev_start():
    Rx queue allocation failed: Cannot allocate memory
Dereference RX queues used by indirect actions on port stop (detach)
and restore references on port start (attach) in order to allow RX queue
resource release, but keep indirect RSS across the port restart.
Replace queue IDs in HW by drop queue ID on detach and restore actual
queue IDs on attach.
When the port is stopped, create indirect RSS in the detached state.
As a result, MLX5 PMD is able to keep all its indirect actions
across port restart. Advertise this capability.
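
For illustration only, the detach/attach idea described above can be
modelled as follows. This is a simplified, self-contained sketch, not
the driver code: struct sketch_port, struct sketch_ind_tbl,
sketch_attach() and sketch_detach() are hypothetical names, and the
"attached" flag is an assumption of the sketch standing in for the HW
table pointing at real queues or at the drop queue.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_RXQ_MAX 4

/* Hypothetical model of a port: one reference counter per Rx queue. */
struct sketch_port {
	uint32_t rxq_refcnt[SKETCH_RXQ_MAX];
};

/* Hypothetical model of an indirect RSS indirection table. */
struct sketch_ind_tbl {
	const uint16_t *queues; /* Queue IDs, remembered across restart. */
	uint32_t queues_n;
	int attached; /* 0 means the HW table targets the drop queue. */
};

/* On port stop: drop the queue references, keep the queue list. */
static void
sketch_detach(struct sketch_port *port, struct sketch_ind_tbl *tbl)
{
	uint32_t i;

	if (!tbl->attached)
		return;
	for (i = 0; i < tbl->queues_n; i++) {
		assert(port->rxq_refcnt[tbl->queues[i]] > 0);
		port->rxq_refcnt[tbl->queues[i]]--;
	}
	tbl->attached = 0;
}

/* On port start: validate all queues first, then take the references. */
static int
sketch_attach(struct sketch_port *port, struct sketch_ind_tbl *tbl)
{
	uint32_t i;

	for (i = 0; i < tbl->queues_n; i++) {
		if (tbl->queues[i] >= SKETCH_RXQ_MAX)
			return -1; /* No counter has been touched yet. */
	}
	if (tbl->attached)
		return 0;
	for (i = 0; i < tbl->queues_n; i++)
		port->rxq_refcnt[tbl->queues[i]]++;
	tbl->attached = 1;
	return 0;
}
```

Once detached, the counters of the listed queues can reach zero, so the
queues can be released on stop while the action itself survives.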
Fixes: 4b61b8774be9 ("ethdev: introduce indirect flow action")
Cc: bingz@nvidia.com
Cc: stable@dpdk.org
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 drivers/net/mlx5/mlx5_ethdev.c  |   1 +
 drivers/net/mlx5/mlx5_flow.c    | 194 ++++++++++++++++++++++++++++----
 drivers/net/mlx5/mlx5_flow.h    |   2 +
 drivers/net/mlx5/mlx5_rx.h      |   4 +
 drivers/net/mlx5/mlx5_rxq.c     |  99 ++++++++++++++--
 drivers/net/mlx5/mlx5_trigger.c |  10 ++
 6 files changed, 276 insertions(+), 34 deletions(-)
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 82e2284d98..419fec3e4e 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -321,6 +321,7 @@ mlx5_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info)
 	info->rx_offload_capa = (mlx5_get_rx_port_offloads() |
 				 info->rx_queue_offload_capa);
 	info->tx_offload_capa = mlx5_get_tx_port_offloads(dev);
+	info->dev_capa = RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP;
 	info->if_index = mlx5_ifindex(dev);
 	info->reta_size = priv->reta_idx_n ?
 		priv->reta_idx_n : config->ind_table_max_size;
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 2768244c2e..df1e927534 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -1583,6 +1583,58 @@ mlx5_flow_validate_action_queue(const struct rte_flow_action *action,
 	return 0;
 }
 
+/**
+ * Validate queue numbers for device RSS.
+ *
+ * @param[in] dev
+ *   Configured device.
+ * @param[in] queues
+ *   Array of queue numbers.
+ * @param[in] queues_n
+ *   Size of the @p queues array.
+ * @param[out] error
+ *   On error, filled with a textual error description.
+ * @param[out] queue_idx
+ *   On error, filled with an offending queue index in @p queues array.
+ *
+ * @return
+ *   0 on success, a negative errno code on error.
+ */
+static int
+mlx5_validate_rss_queues(const struct rte_eth_dev *dev,
+			 const uint16_t *queues, uint32_t queues_n,
+			 const char **error, uint32_t *queue_idx)
+{
+	const struct mlx5_priv *priv = dev->data->dev_private;
+	enum mlx5_rxq_type rxq_type = MLX5_RXQ_TYPE_UNDEFINED;
+	uint32_t i;
+
+	for (i = 0; i != queues_n; ++i) {
+		struct mlx5_rxq_ctrl *rxq_ctrl;
+
+		if (queues[i] >= priv->rxqs_n) {
+			*error = "queue index out of range";
+			*queue_idx = i;
+			return -EINVAL;
+		}
+		if (!(*priv->rxqs)[queues[i]]) {
+			*error =  "queue is not configured";
+			*queue_idx = i;
+			return -EINVAL;
+		}
+		rxq_ctrl = container_of((*priv->rxqs)[queues[i]],
+					struct mlx5_rxq_ctrl, rxq);
+		if (i == 0)
+			rxq_type = rxq_ctrl->type;
+		if (rxq_type != rxq_ctrl->type) {
+			*error = "combining hairpin and regular RSS queues is not supported";
+			*queue_idx = i;
+			return -ENOTSUP;
+		}
+	}
+	return 0;
+}
+
 /*
  * Validate the rss action.
  *
@@ -1603,8 +1655,9 @@ mlx5_validate_action_rss(struct rte_eth_dev *dev,
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	const struct rte_flow_action_rss *rss = action->conf;
-	enum mlx5_rxq_type rxq_type = MLX5_RXQ_TYPE_UNDEFINED;
-	unsigned int i;
+	int ret;
+	const char *message;
+	uint32_t queue_idx;
 
 	if (rss->func != RTE_ETH_HASH_FUNCTION_DEFAULT &&
 	    rss->func != RTE_ETH_HASH_FUNCTION_TOEPLITZ)
@@ -1668,27 +1721,12 @@ mlx5_validate_action_rss(struct rte_eth_dev *dev,
 		return rte_flow_error_set(error, EINVAL,
 					  RTE_FLOW_ERROR_TYPE_ACTION_CONF,
 					  NULL, "No queues configured");
-	for (i = 0; i != rss->queue_num; ++i) {
-		struct mlx5_rxq_ctrl *rxq_ctrl;
-
-		if (rss->queue[i] >= priv->rxqs_n)
-			return rte_flow_error_set
-				(error, EINVAL,
-				 RTE_FLOW_ERROR_TYPE_ACTION_CONF,
-				 &rss->queue[i], "queue index out of range");
-		if (!(*priv->rxqs)[rss->queue[i]])
-			return rte_flow_error_set
-				(error, EINVAL, RTE_FLOW_ERROR_TYPE_ACTION_CONF,
-				 &rss->queue[i], "queue is not configured");
-		rxq_ctrl = container_of((*priv->rxqs)[rss->queue[i]],
-					struct mlx5_rxq_ctrl, rxq);
-		if (i == 0)
-			rxq_type = rxq_ctrl->type;
-		if (rxq_type != rxq_ctrl->type)
-			return rte_flow_error_set
-				(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ACTION_CONF,
-				 &rss->queue[i],
-				 "combining hairpin and regular RSS queues is not supported");
+	ret = mlx5_validate_rss_queues(dev, rss->queue, rss->queue_num,
+				       &message, &queue_idx);
+	if (ret != 0) {
+		return rte_flow_error_set(error, -ret,
+					  RTE_FLOW_ERROR_TYPE_ACTION_CONF,
+					  &rss->queue[queue_idx], message);
 	}
 	return 0;
 }
@@ -8570,6 +8608,116 @@ mlx5_action_handle_flush(struct rte_eth_dev *dev)
 	return ret;
 }
 
+/**
+ * Validate existing indirect actions against current device configuration
+ * and attach them to device resources.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_action_handle_attach(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_indexed_pool *ipool =
+			priv->sh->ipool[MLX5_IPOOL_RSS_SHARED_ACTIONS];
+	struct mlx5_shared_action_rss *shared_rss, *shared_rss_last;
+	int ret = 0;
+	uint32_t idx;
+
+	ILIST_FOREACH(ipool, priv->rss_shared_actions, idx, shared_rss, next) {
+		struct mlx5_ind_table_obj *ind_tbl = shared_rss->ind_tbl;
+		const char *message;
+		uint32_t queue_idx;
+
+		ret = mlx5_validate_rss_queues(dev, ind_tbl->queues,
+					       ind_tbl->queues_n,
+					       &message, &queue_idx);
+		if (ret != 0) {
+			DRV_LOG(ERR, "Port %u cannot use queue %u in RSS: %s",
+				dev->data->port_id, ind_tbl->queues[queue_idx],
+				message);
+			break;
+		}
+	}
+	if (ret != 0)
+		return ret;
+	ILIST_FOREACH(ipool, priv->rss_shared_actions, idx, shared_rss, next) {
+		struct mlx5_ind_table_obj *ind_tbl = shared_rss->ind_tbl;
+
+		ret = mlx5_ind_table_obj_attach(dev, ind_tbl);
+		if (ret != 0) {
+			DRV_LOG(ERR, "Port %u could not attach "
+				"indirection table obj %p",
+				dev->data->port_id, (void *)ind_tbl);
+			goto error;
+		}
+	}
+	return 0;
+error:
+	shared_rss_last = shared_rss;
+	ILIST_FOREACH(ipool, priv->rss_shared_actions, idx, shared_rss, next) {
+		struct mlx5_ind_table_obj *ind_tbl = shared_rss->ind_tbl;
+
+		if (shared_rss == shared_rss_last)
+			break;
+		if (mlx5_ind_table_obj_detach(dev, ind_tbl) != 0)
+			DRV_LOG(CRIT, "Port %u could not detach "
+				"indirection table obj %p on rollback",
+				dev->data->port_id, (void *)ind_tbl);
+	}
+	return ret;
+}
+
+/**
+ * Detach indirect actions of the device from its resources.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_action_handle_detach(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_indexed_pool *ipool =
+			priv->sh->ipool[MLX5_IPOOL_RSS_SHARED_ACTIONS];
+	struct mlx5_shared_action_rss *shared_rss, *shared_rss_last;
+	int ret = 0;
+	uint32_t idx;
+
+	ILIST_FOREACH(ipool, priv->rss_shared_actions, idx, shared_rss, next) {
+		struct mlx5_ind_table_obj *ind_tbl = shared_rss->ind_tbl;
+
+		ret = mlx5_ind_table_obj_detach(dev, ind_tbl);
+		if (ret != 0) {
+			DRV_LOG(ERR, "Port %u could not detach "
+				"indirection table obj %p",
+				dev->data->port_id, (void *)ind_tbl);
+			goto error;
+		}
+	}
+	return 0;
+error:
+	shared_rss_last = shared_rss;
+	ILIST_FOREACH(ipool, priv->rss_shared_actions, idx, shared_rss, next) {
+		struct mlx5_ind_table_obj *ind_tbl = shared_rss->ind_tbl;
+
+		if (shared_rss == shared_rss_last)
+			break;
+		if (mlx5_ind_table_obj_attach(dev, ind_tbl) != 0)
+			DRV_LOG(CRIT, "Port %u could not attach "
+				"indirection table obj %p on rollback",
+				dev->data->port_id, (void *)ind_tbl);
+	}
+	return ret;
+}
+
 #ifndef HAVE_MLX5DV_DR
 #define MLX5_DOMAIN_SYNC_FLOW ((1 << 0) | (1 << 1))
 #else
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 8f94125f26..6bc7946cc3 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1574,6 +1574,8 @@ void mlx5_flow_destroy_sub_policy_with_rxq(struct rte_eth_dev *dev,
 		struct mlx5_flow_meter_policy *mtr_policy);
 int mlx5_flow_dv_discover_counter_offset_support(struct rte_eth_dev *dev);
 int mlx5_flow_discover_dr_action_support(struct rte_eth_dev *dev);
+int mlx5_action_handle_attach(struct rte_eth_dev *dev);
+int mlx5_action_handle_detach(struct rte_eth_dev *dev);
 int mlx5_action_handle_flush(struct rte_eth_dev *dev);
 void mlx5_release_tunnel_hub(struct mlx5_dev_ctx_shared *sh, uint16_t port_id);
 int mlx5_alloc_tunnel_hub(struct mlx5_dev_ctx_shared *sh);
diff --git a/drivers/net/mlx5/mlx5_rx.h b/drivers/net/mlx5/mlx5_rx.h
index a90cb497d1..6d010059f1 100644
--- a/drivers/net/mlx5/mlx5_rx.h
+++ b/drivers/net/mlx5/mlx5_rx.h
@@ -222,6 +222,10 @@ int mlx5_ind_table_obj_modify(struct rte_eth_dev *dev,
 			      struct mlx5_ind_table_obj *ind_tbl,
 			      uint16_t *queues, const uint32_t queues_n,
 			      bool standalone);
+int mlx5_ind_table_obj_attach(struct rte_eth_dev *dev,
+			      struct mlx5_ind_table_obj *ind_tbl);
+int mlx5_ind_table_obj_detach(struct rte_eth_dev *dev,
+			      struct mlx5_ind_table_obj *ind_tbl);
 struct mlx5_list_entry *mlx5_hrxq_create_cb(void *tool_ctx, void *cb_ctx);
 int mlx5_hrxq_match_cb(void *tool_ctx, struct mlx5_list_entry *entry,
 		       void *cb_ctx);
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 60673d014d..47124f6e81 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -2028,6 +2028,26 @@ mlx5_ind_table_obj_new(struct rte_eth_dev *dev, const uint16_t *queues,
 	return ind_tbl;
 }
 
+static int
+mlx5_ind_table_obj_check_standalone(struct rte_eth_dev *dev __rte_unused,
+				    struct mlx5_ind_table_obj *ind_tbl)
+{
+	uint32_t refcnt;
+
+	refcnt = __atomic_load_n(&ind_tbl->refcnt, __ATOMIC_RELAXED);
+	if (refcnt <= 1)
+		return 0;
+	/*
+	 * Modification of indirection tables having more than 1
+	 * reference is unsupported.
+	 */
+	DRV_LOG(DEBUG,
+		"Port %u cannot modify indirection table %p (refcnt %u > 1).",
+		dev->data->port_id, (void *)ind_tbl, refcnt);
+	rte_errno = EINVAL;
+	return -rte_errno;
+}
+
 /**
  * Modify an indirection table.
  *
@@ -2060,18 +2080,8 @@ mlx5_ind_table_obj_modify(struct rte_eth_dev *dev,
 
 	MLX5_ASSERT(standalone);
 	RTE_SET_USED(standalone);
-	if (__atomic_load_n(&ind_tbl->refcnt, __ATOMIC_RELAXED) > 1) {
-		/*
-		 * Modification of indirection ntables having more than 1
-		 * reference unsupported. Intended for standalone indirection
-		 * tables only.
-		 */
-		DRV_LOG(DEBUG,
-			"Port %u cannot modify indirection table (refcnt> 1).",
-			dev->data->port_id);
-		rte_errno = EINVAL;
+	if (mlx5_ind_table_obj_check_standalone(dev, ind_tbl) < 0)
 		return -rte_errno;
-	}
 	for (i = 0; i != queues_n; ++i) {
 		if (!mlx5_rxq_get(dev, queues[i])) {
 			ret = -rte_errno;
@@ -2097,6 +2107,73 @@ mlx5_ind_table_obj_modify(struct rte_eth_dev *dev,
 	return ret;
 }
 
+/**
+ * Attach an indirection table to its queues.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param ind_tbl
+ *   Indirection table to attach.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_ind_table_obj_attach(struct rte_eth_dev *dev,
+			  struct mlx5_ind_table_obj *ind_tbl)
+{
+	unsigned int i;
+	int ret;
+
+	ret = mlx5_ind_table_obj_modify(dev, ind_tbl, ind_tbl->queues,
+					ind_tbl->queues_n, true);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Port %u could not modify indirect table obj %p",
+			dev->data->port_id, (void *)ind_tbl);
+		return ret;
+	}
+	for (i = 0; i < ind_tbl->queues_n; i++)
+		mlx5_rxq_get(dev, ind_tbl->queues[i]);
+	return 0;
+}
+
+/**
+ * Detach an indirection table from its queues.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param ind_tbl
+ *   Indirection table to detach.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_ind_table_obj_detach(struct rte_eth_dev *dev,
+			  struct mlx5_ind_table_obj *ind_tbl)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	const unsigned int n = rte_is_power_of_2(ind_tbl->queues_n) ?
+			       log2above(ind_tbl->queues_n) :
+			       log2above(priv->config.ind_table_max_size);
+	unsigned int i;
+	int ret;
+
+	ret = mlx5_ind_table_obj_check_standalone(dev, ind_tbl);
+	if (ret != 0)
+		return ret;
+	MLX5_ASSERT(priv->obj_ops.ind_table_modify);
+	ret = priv->obj_ops.ind_table_modify(dev, n, NULL, 0, ind_tbl);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Port %u could not modify indirect table obj %p",
+			dev->data->port_id, (void *)ind_tbl);
+		return ret;
+	}
+	for (i = 0; i < ind_tbl->queues_n; i++)
+		mlx5_rxq_release(dev, ind_tbl->queues[i]);
+	return ret;
+}
+
 int
 mlx5_hrxq_match_cb(void *tool_ctx __rte_unused, struct mlx5_list_entry *entry,
 		   void *cb_ctx)
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index 3cbf5816a1..6295c6b3e9 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -14,6 +14,7 @@
 #include <mlx5_malloc.h>
 
 #include "mlx5.h"
+#include "mlx5_flow.h"
 #include "mlx5_mr.h"
 #include "mlx5_rx.h"
 #include "mlx5_tx.h"
@@ -1161,6 +1162,14 @@ mlx5_dev_start(struct rte_eth_dev *dev)
 	mlx5_rxq_timestamp_set(dev);
 	/* Set a mask and offset of scheduling on timestamp into Tx queues. */
 	mlx5_txq_dynf_timestamp_set(dev);
+	/* Attach indirection table objects detached on port stop. */
+	ret = mlx5_action_handle_attach(dev);
+	if (ret) {
+		DRV_LOG(ERR,
+			"port %u failed to attach indirect actions: %s",
+			dev->data->port_id, rte_strerror(rte_errno));
+		goto error;
+	}
 	/*
 	 * In non-cached mode, it only needs to start the default mreg copy
 	 * action and no flow created by application exists anymore.
@@ -1238,6 +1247,7 @@ mlx5_dev_stop(struct rte_eth_dev *dev)
 	/* All RX queue flags will be cleared in the flush interface. */
 	mlx5_flow_list_flush(dev, MLX5_FLOW_TYPE_GEN, true);
 	mlx5_flow_meter_rxq_flush(dev);
+	mlx5_action_handle_detach(dev);
 	mlx5_rx_intr_vec_disable(dev);
 	priv->sh->port[priv->dev_port - 1].ih_port_id = RTE_MAX_ETHPORTS;
 	priv->sh->port[priv->dev_port - 1].devx_ih_port_id = RTE_MAX_ETHPORTS;
-- 
2.25.1
^ permalink raw reply	[flat|nested] 96+ messages in thread
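Note on the table size used by mlx5_ind_table_obj_detach() in the patch above: the log2 size `n` is derived from the queue count when that count is a power of two, and from the configured maximum table size otherwise. The following is a minimal standalone sketch of that selection, not driver code. ceil_log2() and is_power_of_2() are stand-ins written for this note and are only assumed to behave like the driver's log2above() and rte_is_power_of_2(); ind_table_log_size() is a hypothetical name.

```c
#include <stdbool.h>

/* Stand-in for log2above(): smallest l such that (1 << l) >= v. */
static unsigned int
ceil_log2(unsigned int v)
{
	unsigned int l = 0;

	while (l < 31 && (1u << l) < v)
		l++;
	return l;
}

/* Stand-in for rte_is_power_of_2(): true for 1, 2, 4, 8, ... */
static bool
is_power_of_2(unsigned int v)
{
	return v != 0 && (v & (v - 1)) == 0;
}

/*
 * Hypothetical helper mirroring the expression in the patch:
 * use the queue count if it is a power of two,
 * otherwise fall back to the maximum indirection table size.
 */
static unsigned int
ind_table_log_size(unsigned int queues_n, unsigned int ind_table_max_size)
{
	return is_power_of_2(queues_n) ? ceil_log2(queues_n) :
					 ceil_log2(ind_table_max_size);
}
```

With 8 queues the table has 2^3 entries; with 6 queues and a maximum size of 512 it has 2^9 entries.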
* Re: [dpdk-dev] [PATCH v4 1/6] ethdev: add capability to keep flow rules on restart
  2021-10-21  6:34       ` [dpdk-dev] [PATCH v4 1/6] ethdev: add capability to keep flow rules on restart Dmitry Kozlyuk
@ 2021-10-21  7:36         ` Ori Kam
  2021-10-28 18:33         ` Ajit Khaparde
  2021-11-01 15:02         ` Andrew Rybchenko
  2 siblings, 0 replies; 96+ messages in thread
From: Ori Kam @ 2021-10-21  7:36 UTC (permalink / raw)
  To: Dmitry Kozlyuk, dev
  Cc: Ori Kam, NBU-Contact-Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko
Hi Dmitry,
> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Dmitry Kozlyuk
> Sent: Thursday, October 21, 2021 9:35 AM
> To: dev@dpdk.org
> Cc: Ori Kam <orika@oss.nvidia.com>; NBU-Contact-Thomas Monjalon <thomas@monjalon.net>;
> Ferruh Yigit <ferruh.yigit@intel.com>; Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Subject: [dpdk-dev] [PATCH v4 1/6] ethdev: add capability to keep flow rules on restart
> 
> Previously, it was not specified what happens to the flow rules
> when the device is stopped, possibly reconfigured, then started.
> If flow rules were kept, it could be convenient for application
> developers, because they wouldn't need to save and restore them.
> However, due to the number of flows and possible creation rate it is
> impractical to save all flow rules in DPDK layer. This means that flow
> rules persistence really depends on whether PMD and HW can implement it
> efficiently. It can also be limited by the rule item and action types,
> and its attributes transfer bit (a combination of an item/action type
> and a value of the transfer bit is called a rule feature).
> 
> Add a device capability bit for PMDs that can keep at least some
> of the flow rules across restart. Without this capability behavior
> is still unspecified and it is declared that the application must
> flush the rules before stopping the device.
> Allow the application to test for persistence of rules using
> a particular feature by attempting to create a flow rule
> using that feature when the device is stopped
> and checking for the specific error.
> This is logical because if the PMD can create the flow rule
> when the device is not started and use it after the start happens,
> it is natural that it can move its internal flow rule object
> to the same state when the device is stopped and restore the state
> when the device is started.
> 
> Rule persistence across a reconfiguration is not required,
> because tracking all the rules and configuration-dependent resources
> they use may be infeasible. In case a PMD cannot keep the rules
> across reconfiguration, it is allowed just to report an error.
> Application must then flush the rules before attempting it.
> 
> Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
> ---
>  doc/guides/prog_guide/rte_flow.rst | 31 ++++++++++++++++++++++++++++++
>  lib/ethdev/rte_ethdev.h            |  7 +++++++
>  lib/ethdev/rte_flow.h              |  1 +
>  3 files changed, 39 insertions(+)
> 
> diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
> index aeba374182..9beaae3df3 100644
> --- a/doc/guides/prog_guide/rte_flow.rst
> +++ b/doc/guides/prog_guide/rte_flow.rst
> @@ -87,6 +87,37 @@ To avoid resource leaks on the PMD side, handles must be explicitly
>  destroyed by the application before releasing associated resources such as
>  queues and ports.
> 
> +When the device is stopped, its rules do not process the traffic.
> +In particular, transfer rules created using some device
> +stop affecting the traffic even if they refer to different ports.
> +
> +If ``RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP`` is not advertised,
> +rules cannot be created until the device is started for the first time
> +and cannot be kept when the device is stopped.
> +However, PMD also does not flush them automatically on stop,
> +so the application must call ``rte_flow_flush()`` or ``rte_flow_destroy()``
> +before stopping the device to ensure no rules remain.
> +
> +If ``RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP`` is advertised, this means
> +the PMD can keep at least some rules across the device stop and start.
> +However, ``rte_eth_dev_configure()`` may fail if any rules remain,
> +so the application must flush them before attempting a reconfiguration.
> +Keeping may be unsupported for some types of rule items and actions,
> +and may also depend on the value of the flow attributes transfer bit.
> +A combination of a single item or action type
> +and a value of the transfer bit is called a rule feature.
> +For example: a COUNT action with the transfer bit set.
> +To test if rules with a particular feature are kept, the application must try
> +to create a valid rule using this feature when the device is not started
> +(either before the first start or after a stop).
> +If it fails with an error of type ``RTE_FLOW_ERROR_TYPE_STATE``,
> +all rules using this feature must be flushed by the application
> +before stopping the device.
> +If it succeeds, such rules will be kept when the device is stopped,
> +provided they do not use other features that are not supported.
> +Rules that are created when the device is stopped, including the rules
> +created for the test, will be kept after the device is started.
> +
>  The following sections cover:
> 
>  - **Attributes** (represented by ``struct rte_flow_attr``): properties of a
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index 014270d316..9cf23fecce 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -90,6 +90,11 @@
>   *     - flow director filtering mode (but not filtering rules)
>   *     - NIC queue statistics mappings
>   *
> + * The following configuration may be retained or not
> + * depending on the device capabilities:
> + *
> + *     - flow rules
> + *
>   * Any other configuration will not be stored and will need to be re-entered
>   * before a call to rte_eth_dev_start().
>   *
> @@ -1445,6 +1450,8 @@ struct rte_eth_conf {
>  #define RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP 0x00000001
>  /** Device supports Tx queue setup after device started. */
>  #define RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002
> +/** Device supports keeping flow rules across restart. */
> +#define RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP 0x00000004
>  /**@}*/
> 
>  /*
> diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
> index 2b6efeef8c..16ef33819b 100644
> --- a/lib/ethdev/rte_flow.h
> +++ b/lib/ethdev/rte_flow.h
> @@ -3676,6 +3676,7 @@ enum rte_flow_error_type {
>  	RTE_FLOW_ERROR_TYPE_ACTION_NUM, /**< Number of actions. */
>  	RTE_FLOW_ERROR_TYPE_ACTION_CONF, /**< Action configuration. */
>  	RTE_FLOW_ERROR_TYPE_ACTION, /**< Specific action. */
> +	RTE_FLOW_ERROR_TYPE_STATE, /**< Current device state. */
>  };
> 
>  /**
> --
> 2.25.1
Acked-by: Ori Kam <orika@nvidia.com>
Best,
Ori
^ permalink raw reply	[flat|nested] 96+ messages in thread
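Note on the probing procedure described in patch 1/6 above: the application combines the device capability bit with the result of creating a rule while the port is stopped. The following is a minimal standalone sketch of that decision, not part of the series. EXAMPLE_CAPA_FLOW_RULE_KEEP repeats the value the patch gives RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP; enum probe_result, enum keep_verdict and rule_feature_keep_verdict() are hypothetical names. Treating errors other than RTE_FLOW_ERROR_TYPE_STATE as inconclusive is an assumption, since the patch text only specifies the STATE case.

```c
#include <stdint.h>

/* Same value as RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP in the patch above. */
#define EXAMPLE_CAPA_FLOW_RULE_KEEP 0x00000004

/* Hypothetical summary of a rule creation attempt on a stopped port. */
enum probe_result {
	PROBE_OK,        /* rule was created */
	PROBE_ERR_STATE, /* failed with RTE_FLOW_ERROR_TYPE_STATE */
	PROBE_ERR_OTHER, /* failed for any other reason */
};

enum keep_verdict {
	KEEP_NO,      /* flush rules using this feature before stop */
	KEEP_YES,     /* rules using this feature survive stop and start */
	KEEP_UNKNOWN, /* inconclusive probe (assumption, see note above) */
};

static enum keep_verdict
rule_feature_keep_verdict(uint64_t dev_capa, enum probe_result probe)
{
	/* Without the capability no rule may remain across a stop. */
	if ((dev_capa & EXAMPLE_CAPA_FLOW_RULE_KEEP) == 0)
		return KEEP_NO;
	switch (probe) {
	case PROBE_OK:
		return KEEP_YES;
	case PROBE_ERR_STATE:
		return KEEP_NO;
	default:
		return KEEP_UNKNOWN;
	}
}
```

In a real application dev_capa would come from rte_eth_dev_info_get() and the probe result from rte_flow_create() and the error type it reports.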
* Re: [dpdk-dev] [PATCH v4 2/6] ethdev: add capability to keep shared objects on restart
  2021-10-21  6:34       ` [dpdk-dev] [PATCH v4 2/6] ethdev: add capability to keep shared objects " Dmitry Kozlyuk
@ 2021-10-21  7:37         ` Ori Kam
  2021-10-21 18:28         ` Ajit Khaparde
  2021-11-01 15:04         ` Andrew Rybchenko
  2 siblings, 0 replies; 96+ messages in thread
From: Ori Kam @ 2021-10-21  7:37 UTC (permalink / raw)
  To: Dmitry Kozlyuk, dev
  Cc: NBU-Contact-Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko
Hi Dmitry,
> -----Original Message-----
> From: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
> Sent: Thursday, October 21, 2021 9:35 AM
> Subject: [PATCH v4 2/6] ethdev: add capability to keep shared objects on restart
> 
> rte_flow_action_handle_create() did not mention what happens
> with an indirect action when a device is stopped and started again.
> It is natural for some indirect actions, like counter, to be persistent.
> Keeping others at least saves application time and complexity.
> However, not all PMDs can support it, or the support may be limited
> by particular action kinds, that is, combinations of action type
> and the value of the transfer bit in its configuration.
> 
> Add a device capability to indicate if at least some indirect actions
> are kept across the above sequence. Without this capability the behavior
> is still unspecified, and application is required to destroy
> the indirect actions before stopping the device.
> In the future, indirect actions may not be the only type of objects
> shared between flow rules. The capability bit intends to cover all
> possible types of such objects, hence its name.
> 
> Declare that the application can test for the persistence
> of a particular indirect action kind by attempting to create
> an indirect action of that kind when the device is stopped
> and checking for the specific error type.
> This is logical because if the PMD can create an indirect action
> when the device is not started and use it after the start happens,
> it is natural that it can move its internal flow shared object
> to the same state when the device is stopped and restore the state
> when the device is started.
> 
> Indirect action persistence across a reconfiguration is not required.
> In case a PMD cannot keep the indirect actions across reconfiguration,
> it is allowed just to report an error.
> Application must then flush the indirect actions before attempting it.
> 
> Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
> ---
>  doc/guides/prog_guide/rte_flow.rst | 26 ++++++++++++++++++++++++++
>  lib/ethdev/rte_ethdev.h            |  3 +++
>  2 files changed, 29 insertions(+)
> 
> diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
> index 9beaae3df3..bef143862b 100644
> --- a/doc/guides/prog_guide/rte_flow.rst
> +++ b/doc/guides/prog_guide/rte_flow.rst
> @@ -2965,6 +2965,32 @@ updated depend on the type of the ``action`` and different for every type.
>  The indirect action specified data (e.g. counter) can be queried by
>  ``rte_flow_action_handle_query()``.
> 
> +If ``RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP`` is not advertised,
> +indirect actions cannot be created until the device is started for the first time
> +and cannot be kept when the device is stopped.
> +However, PMD also does not flush them automatically on stop,
> +so the application must call ``rte_flow_action_handle_destroy()``
> +before stopping the device to ensure no indirect actions remain.
> +
> +If ``RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP`` is advertised,
> +this means that the PMD can keep at least some indirect actions
> +across device stop and start.
> +However, ``rte_eth_dev_configure()`` may fail if any indirect actions remain,
> +so the application must destroy them before attempting a reconfiguration.
> +Keeping may be only supported for certain kinds of indirect actions.
> +A kind is a combination of an action type and a value of its transfer bit.
> +For example: an indirect counter with the transfer bit reset.
> +To test if a particular kind of indirect actions is kept,
> +the application must try to create a valid indirect action of that kind
> +when the device is not started (either before the first start or after a stop).
> +If it fails with an error of type ``RTE_FLOW_ERROR_TYPE_STATE``,
> +application must destroy all indirect actions of this kind
> +before stopping the device.
> +If it succeeds, all indirect actions of the same kind are kept
> +when the device is stopped.
> +Indirect actions of a kept kind that are created when the device is stopped,
> +including the ones created for the test, will be kept after the device start.
> +
>  .. _table_rte_flow_action_handle:
> 
>  .. table:: INDIRECT
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index 9cf23fecce..5375844484 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -94,6 +94,7 @@
>   * depending on the device capabilities:
>   *
>   *     - flow rules
> + *     - flow-related shared objects, e.g. indirect actions
>   *
>   * Any other configuration will not be stored and will need to be re-entered
>   * before a call to rte_eth_dev_start().
> @@ -1452,6 +1453,8 @@ struct rte_eth_conf {
>  #define RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002
>  /** Device supports keeping flow rules across restart. */
>  #define RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP 0x00000004
> +/** Device supports keeping shared flow objects across restart. */
> +#define RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP 0x00000008
>  /**@}*/
> 
>  /*
> --
> 2.25.1
Acked-by: Ori Kam <orika@nvidia.com>
Best,
Ori
^ permalink raw reply	[flat|nested] 96+ messages in thread
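Note on indirect action kinds from patch 2/6 above: a kind is an action type combined with the value of the transfer bit, and each kind is probed separately. The following is a minimal standalone sketch of how an application might record probe results and decide which handles to destroy before stopping the port. All names here (struct action_kind, struct kind_probe, must_destroy_before_stop()) are hypothetical, and treating kinds that were never probed as not kept is a conservative assumption rather than something the patch states.

```c
#include <stdbool.h>
#include <stddef.h>

/* A kind: action type plus the value of the transfer bit. */
struct action_kind {
	int type;      /* application's copy of the action type */
	bool transfer; /* value of the transfer bit */
};

/* Result of probing one kind while the port was stopped. */
struct kind_probe {
	struct action_kind kind;
	bool kept; /* true if creation succeeded on the stopped port */
};

/*
 * Return true if indirect actions of the given kind
 * must be destroyed before stopping the port.
 */
static bool
must_destroy_before_stop(const struct kind_probe *probes, size_t n,
			 struct action_kind kind)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (probes[i].kind.type == kind.type &&
		    probes[i].kind.transfer == kind.transfer)
			return !probes[i].kept;
	}
	/* Kind was never probed: assume it is not kept. */
	return true;
}
```

The probe table would be filled once, using rte_flow_action_handle_create() on a stopped port and checking for RTE_FLOW_ERROR_TYPE_STATE.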
* Re: [dpdk-dev] [PATCH v4 3/6] net: advertise no support for keeping flow rules
  2021-10-21  6:35       ` [dpdk-dev] [PATCH v4 3/6] net: advertise no support for keeping flow rules Dmitry Kozlyuk
@ 2021-10-21 18:26         ` Ajit Khaparde
  2021-10-22  1:38           ` Somnath Kotur
  2021-10-27  7:11         ` Hyong Youb Kim (hyonkim)
  2021-11-01 15:06         ` Andrew Rybchenko
  2 siblings, 1 reply; 96+ messages in thread
From: Ajit Khaparde @ 2021-10-21 18:26 UTC (permalink / raw)
  To: Dmitry Kozlyuk
  Cc: dpdk-dev, Ori Kam, Ferruh Yigit, Somnath Kotur,
	Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori, Satha Rao,
	Rahul Lakkireddy, Hemant Agrawal, Sachin Saxena, Haiyue Wang,
	John Daley, Hyong Youb Kim, Gaetan Rivet, Ziyang Xuan,
	Xiaoyun Wang, Guoyang Zhou, Min Hu (Connor),
	Yisen Zhuang, Lijun Ou, Beilei Xing, Jingjing Wu, Qiming Yang,
	Qi Zhang, Rosen Xu, Liron Himi, Jerin Jacob, Rasesh Mody,
	Devendra Singh Rawat, Andrew Rybchenko, Jasvinder Singh,
	Cristian Dumitrescu, Keith Wiles, Jiawen Wu, Jian Wang
On Wed, Oct 20, 2021 at 11:36 PM Dmitry Kozlyuk <dkozlyuk@nvidia.com> wrote:
>
> When RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP capability bit is zero,
> the specified behavior is the same as it had been before
> this bit was introduced. Explicitly reset it in all PMDs
> supporting rte_flow API in order to attract the attention
> of maintainers, who should eventually choose to advertise
> the new capability or not. It is already known that
> mlx4 and mlx5 will not support this capability.
>
> For RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP
> similar action is not performed,
> because no PMD except mlx5 supports indirect actions.
> Any PMD that starts doing so will anyway have to consider
> all relevant API, including this capability.
>
> Suggested-by: Ferruh Yigit <ferruh.yigit@intel.com>
> Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [dpdk-dev] [PATCH v4 2/6] ethdev: add capability to keep shared objects on restart
  2021-10-21  6:34       ` [dpdk-dev] [PATCH v4 2/6] ethdev: add capability to keep shared objects " Dmitry Kozlyuk
  2021-10-21  7:37         ` Ori Kam
@ 2021-10-21 18:28         ` Ajit Khaparde
  2021-11-01 15:04         ` Andrew Rybchenko
  2 siblings, 0 replies; 96+ messages in thread
From: Ajit Khaparde @ 2021-10-21 18:28 UTC (permalink / raw)
  To: Dmitry Kozlyuk
  Cc: dpdk-dev, Ori Kam, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko
On Wed, Oct 20, 2021 at 11:35 PM Dmitry Kozlyuk <dkozlyuk@oss.nvidia.com> wrote:
>
> rte_flow_action_handle_create() did not mention what happens
> with an indirect action when a device is stopped and started again.
> It is natural for some indirect actions, like counter, to be persistent.
> Keeping others at least saves application time and complexity.
> However, not all PMDs can support it, or the support may be limited
> by particular action kinds, that is, combinations of action type
> and the value of the transfer bit in its configuration.
>
> Add a device capability to indicate if at least some indirect actions
> are kept across the above sequence. Without this capability the behavior
> is still unspecified, and application is required to destroy
> the indirect actions before stopping the device.
> In the future, indirect actions may not be the only type of objects
> shared between flow rules. The capability bit intends to cover all
> possible types of such objects, hence its name.
>
> Declare that the application can test for the persistence
> of a particular indirect action kind by attempting to create
> an indirect action of that kind when the device is stopped
> and checking for the specific error type.
> This is logical because if the PMD can create an indirect action
> when the device is not started and use it after the start happens,
> it is natural that it can move its internal flow shared object
> to the same state when the device is stopped and restore the state
> when the device is started.
>
> Indirect action persistence across a reconfiguration is not required.
> In case a PMD cannot keep the indirect actions across reconfiguration,
> it is allowed just to report an error.
> Application must then flush the indirect actions before attempting it.
>
> Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
>
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [dpdk-dev] [PATCH v4 3/6] net: advertise no support for keeping flow rules
  2021-10-21 18:26         ` Ajit Khaparde
@ 2021-10-22  1:38           ` Somnath Kotur
  0 siblings, 0 replies; 96+ messages in thread
From: Somnath Kotur @ 2021-10-22  1:38 UTC (permalink / raw)
  To: Ajit Khaparde
  Cc: Dmitry Kozlyuk, dpdk-dev, Ori Kam, Ferruh Yigit,
	Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori, Satha Rao,
	Rahul Lakkireddy, Hemant Agrawal, Sachin Saxena, Haiyue Wang,
	John Daley, Hyong Youb Kim, Gaetan Rivet, Ziyang Xuan,
	Xiaoyun Wang, Guoyang Zhou, Min Hu (Connor),
	Yisen Zhuang, Lijun Ou, Beilei Xing, Jingjing Wu, Qiming Yang,
	Qi Zhang, Rosen Xu, Liron Himi, Jerin Jacob, Rasesh Mody,
	Devendra Singh Rawat, Andrew Rybchenko, Jasvinder Singh,
	Cristian Dumitrescu, Keith Wiles, Jiawen Wu, Jian Wang
On Thu, 21 Oct 2021, 23:56 Ajit Khaparde, <ajit.khaparde@broadcom.com>
wrote:
> On Wed, Oct 20, 2021 at 11:36 PM Dmitry Kozlyuk <dkozlyuk@nvidia.com>
> wrote:
> >
> > When RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP capability bit is zero,
> > the specified behavior is the same as it had been before
> > this bit was introduced. Explicitly reset it in all PMDs
> > supporting rte_flow API in order to attract the attention
> > of maintainers, who should eventually choose to advertise
> > the new capability or not. It is already known that
> > mlx4 and mlx5 will not support this capability.
> >
> > For RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP
> > similar action is not performed,
> > because no PMD except mlx5 supports indirect actions.
> > Any PMD that starts doing so will anyway have to consider
> > all relevant API, including this capability.
> >
> > Suggested-by: Ferruh Yigit <ferruh.yigit@intel.com>
> > Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
> Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
>
Acked-by: Somnath Kotur
<somnath.kotur@broadcom.com>
>
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [dpdk-dev] [PATCH v4 0/6] Flow entites behavior on port restart
  2021-10-21  6:34     ` [dpdk-dev] [PATCH v4 " Dmitry Kozlyuk
                         ` (5 preceding siblings ...)
  2021-10-21  6:35       ` [dpdk-dev] [PATCH v4 6/6] net/mlx5: preserve indirect actions on restart Dmitry Kozlyuk
@ 2021-10-26 11:46       ` Ferruh Yigit
  2021-11-01 13:43         ` Ferruh Yigit
  2021-11-02 13:49       ` Ferruh Yigit
  2021-11-02 13:54       ` [dpdk-dev] [PATCH v5 " Dmitry Kozlyuk
  8 siblings, 1 reply; 96+ messages in thread
From: Ferruh Yigit @ 2021-10-26 11:46 UTC (permalink / raw)
  To: Ajit Khaparde, Somnath Kotur, Nithin Dabilpuram, Kiran Kumar K,
	Sunil Kumar Kori, Satha Rao, Rahul Lakkireddy, Hemant Agrawal,
	Sachin Saxena, Haiyue Wang, John Daley, Hyong Youb Kim,
	Gaetan Rivet, Ziyang Xuan, Xiaoyun Wang, Guoyang Zhou,
	Min Hu (Connor),
	Yisen Zhuang, Lijun Ou, Beilei Xing, Jingjing Wu, Beilei Xing,
	Qiming Yang, Qi Zhang, Rosen Xu, Liron Himi, Jerin Jacob,
	Nithin Dabilpuram, Kiran Kumar K, Rasesh Mody,
	Devendra Singh Rawat, Andrew Rybchenko, Jasvinder Singh,
	Cristian Dumitrescu, Keith Wiles, Jiawen Wu, Jian Wang
  Cc: Ori Kam, Dmitry Kozlyuk, dev
On 10/21/2021 7:34 AM, Dmitry Kozlyuk wrote:
> It is unspecified whether flow rules and indirect actions are kept
> when a port is stopped, possibly reconfigured, and started again.
> Vendors approach the topic differently, e.g. mlx5 and i40e PMD
> disagree in whether flow rules can be kept, and mlx5 PMD would keep
> indirect actions. In the end, applications are greatly affected
> by whatever contract there is and need to know it.
> 
> Applications may wish to restart the port to reconfigure it,
> e.g. switch offloads or even modify queues.
> Keeping rte_flow entities enables application improvements:
> 1. Keeping the rules across restart comes with the ability
>     to create rules before the device is started. This allows
>     to have all the rules created at the moment of start,
>     so that there is no time frame when traffic is coming already,
>     but the rules are not yet created (restored).
> 2. When a rule or an indirect action has some associated state,
>     such as a counter, application saves the need to keep
>     additional state in order to cope with information loss
>     if such an entity would be destroyed.
> 
> It is proposed to advertise capabilities of keeping flow rules
> and indirect actions (as a special case of shared object)
> using a combination of ethdev info and rte_flow calls.
> Then a bug is fixed in mlx5 PMD that prevented indirect RSS action
> from being kept, and the driver starts advertising the new capability.
> 
> Prior discussions:
> 1) http://inbox.dpdk.org/dev/20210727073121.895620-1-dkozlyuk@nvidia.com/
> 2) http://inbox.dpdk.org/dev/20210901085516.3647814-1-dkozlyuk@nvidia.com/
> 
> v4:  1. Fix rebase conflicts (CI).
>       2. State rule behavior when a port is not started or stopped (Ori).
>       3. Improve wording on rule features, add examples (Andrew).
>       4. State that rules/actions that cannot be kept while other can be
>          must be destroyed by the application (Andrew/Ori).
>       5. Add rationale to the cover letter (Andrew).
> 
> Dmitry Kozlyuk (6):
>    ethdev: add capability to keep flow rules on restart
>    ethdev: add capability to keep shared objects on restart
>    net: advertise no support for keeping flow rules
>    net/mlx5: discover max flow priority using DevX
>    net/mlx5: create drop queue using DevX
>    net/mlx5: preserve indirect actions on restart
> 
Requesting review from PMD maintainers.
Since this patch tries to define behavior on keeping/flushing flow rules
after port stop/start/configure, better to get more feedback from various
vendors, please review/comment on patch so that we can get it for -rc2.
If there is no comment the patch can go in as it is for -rc2.
Thanks,
ferruh
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [dpdk-dev] [PATCH v4 3/6] net: advertise no support for keeping flow rules
  2021-10-21  6:35       ` [dpdk-dev] [PATCH v4 3/6] net: advertise no support for keeping flow rules Dmitry Kozlyuk
  2021-10-21 18:26         ` Ajit Khaparde
@ 2021-10-27  7:11         ` Hyong Youb Kim (hyonkim)
  2021-11-01 15:06         ` Andrew Rybchenko
  2 siblings, 0 replies; 96+ messages in thread
From: Hyong Youb Kim (hyonkim) @ 2021-10-27  7:11 UTC (permalink / raw)
  To: Dmitry Kozlyuk, dev
  Cc: Ori Kam, Ferruh Yigit, Ajit Khaparde, Somnath Kotur,
	Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori, Satha Rao,
	Rahul Lakkireddy, Hemant Agrawal, Sachin Saxena, Haiyue Wang,
	John Daley (johndale),
	Gaetan Rivet, Ziyang Xuan, Xiaoyun Wang, Guoyang Zhou,
	Min Hu (Connor),
	Yisen Zhuang, Lijun Ou, Beilei Xing, Jingjing Wu, Qiming Yang,
	Qi Zhang, Rosen Xu, Liron Himi, Jerin Jacob, Rasesh Mody,
	Devendra Singh Rawat, Andrew Rybchenko, Jasvinder Singh,
	Cristian Dumitrescu, Keith Wiles, Jiawen Wu, Jian Wang
> -----Original Message-----
> From: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
> Sent: Thursday, October 21, 2021 3:35 PM
[...]
> Subject: [PATCH v4 3/6] net: advertise no support for keeping flow rules
> 
> When RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP capability bit is zero,
> the specified behavior is the same as it had been before
> this bit was introduced. Explicitly reset it in all PMDs
> supporting rte_flow API in order to attract the attention
> of maintainers, who should eventually choose to advertise
> the new capability or not. It is already known that
> mlx4 and mlx5 will not support this capability.
> 
> For RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP
> similar action is not performed,
> because no PMD except mlx5 supports indirect actions.
> Any PMD that starts doing so will anyway have to consider
> all relevant API, including this capability.
> 
> Suggested-by: Ferruh Yigit <ferruh.yigit@intel.com>
> Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
> ---
For net/enic,
Acked-by: Hyong Youb Kim <hyonkim@cisco.com>
-Hyong
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [dpdk-dev] [PATCH v4 1/6] ethdev: add capability to keep flow rules on restart
  2021-10-21  6:34       ` [dpdk-dev] [PATCH v4 1/6] ethdev: add capability to keep flow rules on restart Dmitry Kozlyuk
  2021-10-21  7:36         ` Ori Kam
@ 2021-10-28 18:33         ` Ajit Khaparde
  2021-11-01 15:02         ` Andrew Rybchenko
  2 siblings, 0 replies; 96+ messages in thread
From: Ajit Khaparde @ 2021-10-28 18:33 UTC (permalink / raw)
  To: Dmitry Kozlyuk
  Cc: dpdk-dev, Ori Kam, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko
On Wed, Oct 20, 2021 at 11:35 PM Dmitry Kozlyuk <dkozlyuk@oss.nvidia.com> wrote:
>
> Previously, it was not specified what happens to the flow rules
> when the device is stopped, possibly reconfigured, then started.
> If flow rules were kept, it could be convenient for application
> developers, because they wouldn't need to save and restore them.
> However, due to the number of flows and possible creation rate it is
> impractical to save all flow rules in DPDK layer. This means that flow
> rules persistence really depends on whether PMD and HW can implement it
> efficiently. It can also be limited by the rule item and action types,
> and its attributes transfer bit (a combination of an item/action type
> and a value of the transfer bit is called a rule feature).
>
> Add a device capability bit for PMDs that can keep at least some
> of the flow rules across restart. Without this capability behavior
> is still unspecified and it is declared that the application must
> flush the rules before stopping the device.
> Allow the application to test for persistence of rules using
> a particular feature by attempting to create a flow rule
> using that feature when the device is stopped
> and checking for the specific error.
> This is logical because if the PMD can create the flow rule
> when the device is not started and use it after the start happens,
> it is natural that it can move its internal flow rule object
> to the same state when the device is stopped and restore the state
> when the device is started.
>
> Rule persistence across a reconfiguration is not required,
> because tracking all the rules and configuration-dependent resources
> they use may be infeasible. In case a PMD cannot keep the rules
> across reconfiguration, it is allowed just to report an error.
> Application must then flush the rules before attempting it.
>
> Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [dpdk-dev] [PATCH v4 0/6] Flow entites behavior on port restart
  2021-10-26 11:46       ` [dpdk-dev] [PATCH v4 0/6] Flow entites behavior on port restart Ferruh Yigit
@ 2021-11-01 13:43         ` Ferruh Yigit
  0 siblings, 0 replies; 96+ messages in thread
From: Ferruh Yigit @ 2021-11-01 13:43 UTC (permalink / raw)
  To: Ajit Khaparde, Somnath Kotur, Nithin Dabilpuram, Kiran Kumar K,
	Sunil Kumar Kori, Satha Rao, Rahul Lakkireddy, Hemant Agrawal,
	Sachin Saxena, Haiyue Wang, John Daley, Hyong Youb Kim,
	Gaetan Rivet, Ziyang Xuan, Xiaoyun Wang, Guoyang Zhou,
	Min Hu (Connor),
	Yisen Zhuang, Lijun Ou, Beilei Xing, Jingjing Wu, Beilei Xing,
	Qiming Yang, Qi Zhang, Rosen Xu, Liron Himi, Jerin Jacob,
	Nithin Dabilpuram, Kiran Kumar K, Rasesh Mody,
	Devendra Singh Rawat, Andrew Rybchenko, Jasvinder Singh,
	Cristian Dumitrescu, Keith Wiles, Jiawen Wu, Jian Wang
  Cc: Ori Kam, Dmitry Kozlyuk, dev
On 10/26/2021 12:46 PM, Ferruh Yigit wrote:
> On 10/21/2021 7:34 AM, Dmitry Kozlyuk wrote:
>> It is unspecified whether flow rules and indirect actions are kept
>> when a port is stopped, possibly reconfigured, and started again.
>> Vendors approach the topic differently, e.g. mlx5 and i40e PMD
>> disagree in whether flow rules can be kept, and mlx5 PMD would keep
>> indirect actions. In the end, applications are greatly affected
>> by whatever contract there is and need to know it.
>>
>> Applications may wish to restart the port to reconfigure it,
>> e.g. switch offloads or even modify queues.
>> Keeping rte_flow entities enables application improvements:
>> 1. Keeping the rules across restart comes with the ability
>>     to create rules before the device is started. This allows
>>     to have all the rules created at the moment of start,
>>     so that there is no time frame when traffic is coming already,
>>     but the rules are not yet created (restored).
>> 2. When a rule or an indirect action has some associated state,
>>     such as a counter, application saves the need to keep
>>     additional state in order to cope with information loss
>>     if such an entity would be destroyed.
>>
>> It is proposed to advertise capabilities of keeping flow rules
>> and indirect actions (as a special case of shared object)
>> using a combination of ethdev info and rte_flow calls.
>> Then a bug is fixed in mlx5 PMD that prevented indirect RSS action
>> from being kept, and the driver starts advertising the new capability.
>>
>> Prior discussions:
>> 1) http://inbox.dpdk.org/dev/20210727073121.895620-1-dkozlyuk@nvidia.com/
>> 2) http://inbox.dpdk.org/dev/20210901085516.3647814-1-dkozlyuk@nvidia.com/
>>
>> v4:  1. Fix rebase conflicts (CI).
>>       2. State rule behavior when a port is not started or stopped (Ori).
>>       3. Improve wording on rule features, add examples (Andrew).
>>       4. State that rules/actions that cannot be kept while others can be
>>          must be destroyed by the application (Andrew/Ori).
>>       5. Add rationale to the cover letter (Andrew).
>>
>> Dmitry Kozlyuk (6):
>>    ethdev: add capability to keep flow rules on restart
>>    ethdev: add capability to keep shared objects on restart
>>    net: advertise no support for keeping flow rules
>>    net/mlx5: discover max flow priority using DevX
>>    net/mlx5: create drop queue using DevX
>>    net/mlx5: preserve indirect actions on restart
>>
> 
> Requesting review from PMD maintainers.
> 
> Since this patch tries to define behavior on keeping/flushing flow rules
> after port stop/start/configure, better to get more feedback from various
> vendors, please review/comment on patch so that we can get it for -rc2.
> 
> If there is no comment the patch can go in as it is for -rc2.
> 
Last review call before merge.
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [dpdk-dev] [PATCH v4 1/6] ethdev: add capability to keep flow rules on restart
  2021-10-21  6:34       ` [dpdk-dev] [PATCH v4 1/6] ethdev: add capability to keep flow rules on restart Dmitry Kozlyuk
  2021-10-21  7:36         ` Ori Kam
  2021-10-28 18:33         ` Ajit Khaparde
@ 2021-11-01 15:02         ` Andrew Rybchenko
  2021-11-01 15:56           ` Dmitry Kozlyuk
  2 siblings, 1 reply; 96+ messages in thread
From: Andrew Rybchenko @ 2021-11-01 15:02 UTC (permalink / raw)
  To: Dmitry Kozlyuk, dev; +Cc: Ori Kam, Thomas Monjalon, Ferruh Yigit
On 10/21/21 9:34 AM, Dmitry Kozlyuk wrote:
> Previously, it was not specified what happens to the flow rules
> when the device is stopped, possibly reconfigured, then started.
> If flow rules were kept, it could be convenient for application
> developers, because they wouldn't need to save and restore them.
> However, due to the number of flows and possible creation rate it is
> impractical to save all flow rules in DPDK layer. This means that flow
> rules persistence really depends on whether PMD and HW can implement it
> efficiently. It can also be limited by the rule item and action types,
> and its attributes transfer bit (a combination of an item/action type
> and a value of the transfer bit is called a rule feature).
> 
> Add a device capability bit for PMDs that can keep at least some
> of the flow rules across restart. Without this capability behavior
> is still unspecified and it is declared that the application must
> flush the rules before stopping the device.
> Allow the application to test for persistence of rules using
> a particular feature by attempting to create a flow rule
> using that feature when the device is stopped
> and checking for the specific error.
> This is logical because if the PMD can create the flow rule
> when the device is not started and use it after the start happens,
> it is natural that it can move its internal flow rule object
> to the same state when the device is stopped and restore the state
> when the device is started.
> 
> Rule persistence across reconfigurations is not required,
> because tracking all the rules and configuration-dependent resources
> they use may be infeasible. In case a PMD cannot keep the rules
> across reconfiguration, it is allowed just to report an error.
> Application must then flush the rules before attempting it.
> 
> Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
with some review notes below
> ---
>   doc/guides/prog_guide/rte_flow.rst | 31 ++++++++++++++++++++++++++++++
>   lib/ethdev/rte_ethdev.h            |  7 +++++++
>   lib/ethdev/rte_flow.h              |  1 +
>   3 files changed, 39 insertions(+)
> 
> diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
> index aeba374182..9beaae3df3 100644
> --- a/doc/guides/prog_guide/rte_flow.rst
> +++ b/doc/guides/prog_guide/rte_flow.rst
> @@ -87,6 +87,37 @@ To avoid resource leaks on the PMD side, handles must be explicitly
>   destroyed by the application before releasing associated resources such as
>   queues and ports.
>   
> +When the device is stopped, its rules do not process the traffic.
> +In particular, transfer rules created using some device
> +stop affecting the traffic even if they refer to different ports.
> +
> +If ``RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP`` is not advertised,
> +rules cannot be created until the device is started for the first time
> +and cannot be kept when the device is stopped.
> +However, PMD also does not flush them automatically on stop,
> +so the application must call ``rte_flow_flush()`` or ``rte_flow_destroy()``
> +before stopping the device to ensure no rules remain.
> +
> +If ``RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP`` is advertised, this means
> +the PMD can keep at least some rules across the device stop and start.
> +However, ``rte_eth_dev_configure()`` may fail if any rules remain,
> +so the application must flush them before attempting a reconfiguration.
> +Keeping may be unsupported for some types of rule items and actions,
> +and may also depend on the value of the flow attributes transfer bit.
> +A combination of a single item or action type
> +and a value of the transfer bit is called a rule feature.
> +For example: a COUNT action with the transfer bit set.
> +To test if rules with a particular feature are kept, the application must try
> +to create a valid rule using this feature when the device is not started
> +(either before the first start or after a stop).
> +If it fails with an error of type ``RTE_FLOW_ERROR_TYPE_STATE``,
> +all rules using this feature must be flushed by the application
> +before stopping the device.
> +If it succeeds, such rules will be kept when the device is stopped,
> +provided they do not use other features that are not supported.
> +Rules that are created when the device is stopped, including the rules
> +created for the test, will be kept after the device is started.
> +
>   The following sections cover:
>   
>   - **Attributes** (represented by ``struct rte_flow_attr``): properties of a
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index 014270d316..9cf23fecce 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -90,6 +90,11 @@
>    *     - flow director filtering mode (but not filtering rules)
>    *     - NIC queue statistics mappings
>    *
> + * The following configuration may be retained or not
> + * depending on the device capabilities:
> + *
> + *     - flow rules
> + *
>    * Any other configuration will not be stored and will need to be re-entered
>    * before a call to rte_eth_dev_start().
>    *
> @@ -1445,6 +1450,8 @@ struct rte_eth_conf {
>   #define RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP 0x00000001
>   /** Device supports Tx queue setup after device started. */
>   #define RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002
> +/** Device supports keeping flow rules across restart. */
> +#define RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP 0x00000004
RTE_BIT64(2) since previous two are already defined using RTE_BIT32()
in next-net
Don't we need an experimental markup in the documentation to
make it possible to refine the feature in the near future
without API breakage? If yes, it must be mentioned in the
rte_flow.rst documentation as well.
>   /**@}*/
>   
>   /*
> diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
> index 2b6efeef8c..16ef33819b 100644
> --- a/lib/ethdev/rte_flow.h
> +++ b/lib/ethdev/rte_flow.h
> @@ -3676,6 +3676,7 @@ enum rte_flow_error_type {
>   	RTE_FLOW_ERROR_TYPE_ACTION_NUM, /**< Number of actions. */
>   	RTE_FLOW_ERROR_TYPE_ACTION_CONF, /**< Action configuration. */
>   	RTE_FLOW_ERROR_TYPE_ACTION, /**< Specific action. */
> +	RTE_FLOW_ERROR_TYPE_STATE, /**< Current device state. */
>   };
>   
>   /**
> 
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [dpdk-dev] [PATCH v4 2/6] ethdev: add capability to keep shared objects on restart
  2021-10-21  6:34       ` [dpdk-dev] [PATCH v4 2/6] ethdev: add capability to keep shared objects " Dmitry Kozlyuk
  2021-10-21  7:37         ` Ori Kam
  2021-10-21 18:28         ` Ajit Khaparde
@ 2021-11-01 15:04         ` Andrew Rybchenko
  2 siblings, 0 replies; 96+ messages in thread
From: Andrew Rybchenko @ 2021-11-01 15:04 UTC (permalink / raw)
  To: Dmitry Kozlyuk, dev; +Cc: Ori Kam, Thomas Monjalon, Ferruh Yigit
On 10/21/21 9:34 AM, Dmitry Kozlyuk wrote:
> rte_flow_action_handle_create() did not mention what happens
> with an indirect action when a device is stopped and started again.
> It is natural for some indirect actions, like counter, to be persistent.
> Keeping others at least saves application time and complexity.
> However, not all PMDs can support it, or the support may be limited
> by particular action kinds, that is, combinations of action type
> and the value of the transfer bit in its configuration.
> 
> Add a device capability to indicate if at least some indirect actions
> are kept across the above sequence. Without this capability the behavior
> is still unspecified, and application is required to destroy
> the indirect actions before stopping the device.
> In the future, indirect actions may not be the only type of objects
> shared between flow rules. The capability bit intends to cover all
> possible types of such objects, hence its name.
> 
> Declare that the application can test for the persistence
> of a particular indirect action kind by attempting to create
> an indirect action of that kind when the device is stopped
> and checking for the specific error type.
> This is logical because if the PMD can create an indirect action
> when the device is not started and use it after the start happens,
> it is natural that it can move its internal flow shared object
> to the same state when the device is stopped and restore the state
> when the device is started.
> 
> Indirect action persistence across reconfigurations is not required.
> In case a PMD cannot keep the indirect actions across reconfiguration,
> it is allowed just to report an error.
> Application must then flush the indirect actions before attempting it.
> 
> Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
with below review notes processed
> ---
>   doc/guides/prog_guide/rte_flow.rst | 26 ++++++++++++++++++++++++++
>   lib/ethdev/rte_ethdev.h            |  3 +++
>   2 files changed, 29 insertions(+)
> 
> diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
> index 9beaae3df3..bef143862b 100644
> --- a/doc/guides/prog_guide/rte_flow.rst
> +++ b/doc/guides/prog_guide/rte_flow.rst
> @@ -2965,6 +2965,32 @@ updated depend on the type of the ``action`` and different for every type.
>   The indirect action specified data (e.g. counter) can be queried by
>   ``rte_flow_action_handle_query()``.
>   
> +If ``RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP`` is not advertised,
> +indirect actions cannot be created until the device is started for the first time
> +and cannot be kept when the device is stopped.
> +However, PMD also does not flush them automatically on stop,
> +so the application must call ``rte_flow_action_handle_destroy()``
> +before stopping the device to ensure no indirect actions remain.
> +
> +If ``RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP`` is advertised,
> +this means that the PMD can keep at least some indirect actions
> +across device stop and start.
> +However, ``rte_eth_dev_configure()`` may fail if any indirect actions remain,
> +so the application must destroy them before attempting a reconfiguration.
> +Keeping may be only supported for certain kinds of indirect actions.
> +A kind is a combination of an action type and a value of its transfer bit.
> +For example: an indirect counter with the transfer bit reset.
> +To test if a particular kind of indirect actions is kept,
> +the application must try to create a valid indirect action of that kind
> +when the device is not started (either before the first start or after a stop).
> +If it fails with an error of type ``RTE_FLOW_ERROR_TYPE_STATE``,
> +application must destroy all indirect actions of this kind
> +before stopping the device.
> +If it succeeds, all indirect actions of the same kind are kept
> +when the device is stopped.
> +Indirect actions of a kept kind that are created when the device is stopped,
> +including the ones created for the test, will be kept after the device start.
> +
>   .. _table_rte_flow_action_handle:
>   
>   .. table:: INDIRECT
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index 9cf23fecce..5375844484 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -94,6 +94,7 @@
>    * depending on the device capabilities:
>    *
>    *     - flow rules
> + *     - flow-related shared objects, e.g. indirect actions
>    *
>    * Any other configuration will not be stored and will need to be re-entered
>    * before a call to rte_eth_dev_start().
> @@ -1452,6 +1453,8 @@ struct rte_eth_conf {
>   #define RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002
>   /** Device supports keeping flow rules across restart. */
>   #define RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP 0x00000004
> +/** Device supports keeping shared flow objects across restart. */
> +#define RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP 0x00000008
RTE_BIT32(3) plus experimental markup
>   /**@}*/
>   
>   /*
> 
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [dpdk-dev] [PATCH v4 3/6] net: advertise no support for keeping flow rules
  2021-10-21  6:35       ` [dpdk-dev] [PATCH v4 3/6] net: advertise no support for keeping flow rules Dmitry Kozlyuk
  2021-10-21 18:26         ` Ajit Khaparde
  2021-10-27  7:11         ` Hyong Youb Kim (hyonkim)
@ 2021-11-01 15:06         ` Andrew Rybchenko
  2021-11-01 16:59           ` Ferruh Yigit
  2 siblings, 1 reply; 96+ messages in thread
From: Andrew Rybchenko @ 2021-11-01 15:06 UTC (permalink / raw)
  To: Dmitry Kozlyuk, dev
  Cc: Ori Kam, Ferruh Yigit, Ajit Khaparde, Somnath Kotur,
	Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori, Satha Rao,
	Rahul Lakkireddy, Hemant Agrawal, Sachin Saxena, Haiyue Wang,
	John Daley, Hyong Youb Kim, Gaetan Rivet, Ziyang Xuan,
	Xiaoyun Wang, Guoyang Zhou, Min Hu (Connor),
	Yisen Zhuang, Lijun Ou, Beilei Xing, Jingjing Wu, Qiming Yang,
	Qi Zhang, Rosen Xu, Liron Himi, Jerin Jacob, Rasesh Mody,
	Devendra Singh Rawat, Jasvinder Singh, Cristian Dumitrescu,
	Keith Wiles, Jiawen Wu, Jian Wang
On 10/21/21 9:35 AM, Dmitry Kozlyuk wrote:
> When RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP capability bit is zero,
> the specified behavior is the same as it had been before
> this bit was introduced. Explicitly reset it in all PMDs
> supporting rte_flow API in order to attract the attention
> of maintainers, who should eventually choose to advertise
> the new capability or not. It is already known that
> mlx4 and mlx5 will not support this capability.
> 
> For RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP
> similar action is not performed,
> because no PMD except mlx5 supports indirect actions.
> Any PMD that starts doing so will anyway have to consider
> all relevant API, including this capability.
> 
> Suggested-by: Ferruh Yigit <ferruh.yigit@intel.com>
> Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
I'm sorry, but I still think that the patch is confusing.
No strong opinion, but personally I'd go without it.
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [dpdk-dev] [PATCH v4 1/6] ethdev: add capability to keep flow rules on restart
  2021-11-01 15:02         ` Andrew Rybchenko
@ 2021-11-01 15:56           ` Dmitry Kozlyuk
  0 siblings, 0 replies; 96+ messages in thread
From: Dmitry Kozlyuk @ 2021-11-01 15:56 UTC (permalink / raw)
  To: Andrew Rybchenko, dev; +Cc: Ori Kam, NBU-Contact-Thomas Monjalon, Ferruh Yigit
> > @@ -1445,6 +1450,8 @@ struct rte_eth_conf {
> >   #define RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP 0x00000001
> >   /** Device supports Tx queue setup after device started. */
> >   #define RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002
> > +/** Device supports keeping flow rules across restart. */
> > +#define RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP 0x00000004
> 
> RTE_BIT64(2) since previous two are already defined using RTE_BIT32()
> in next-net
> 
> Don't we need an experimental markup in the documentation to
> make it possible to refine the feature in the near future
> without API breakage? If yes, it must be mentioned in the
> rte_flow.rst documentation as well.
It seems constants are not usually marked as experimental,
but I will add warnings to rte_flow.rst anyway.
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [dpdk-dev] [PATCH v4 3/6] net: advertise no support for keeping flow rules
  2021-11-01 15:06         ` Andrew Rybchenko
@ 2021-11-01 16:59           ` Ferruh Yigit
  0 siblings, 0 replies; 96+ messages in thread
From: Ferruh Yigit @ 2021-11-01 16:59 UTC (permalink / raw)
  To: Andrew Rybchenko, Dmitry Kozlyuk, dev
  Cc: Ori Kam, Ajit Khaparde, Somnath Kotur, Nithin Dabilpuram,
	Kiran Kumar K, Sunil Kumar Kori, Satha Rao, Rahul Lakkireddy,
	Hemant Agrawal, Sachin Saxena, Haiyue Wang, John Daley,
	Hyong Youb Kim, Gaetan Rivet, Ziyang Xuan, Xiaoyun Wang,
	Guoyang Zhou, Min Hu (Connor),
	Yisen Zhuang, Lijun Ou, Beilei Xing, Jingjing Wu, Qiming Yang,
	Qi Zhang, Rosen Xu, Liron Himi, Jerin Jacob, Rasesh Mody,
	Devendra Singh Rawat, Jasvinder Singh, Cristian Dumitrescu,
	Keith Wiles, Jiawen Wu, Jian Wang
On 11/1/2021 3:06 PM, Andrew Rybchenko wrote:
> On 10/21/21 9:35 AM, Dmitry Kozlyuk wrote:
>> When RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP capability bit is zero,
>> the specified behavior is the same as it had been before
>> this bit was introduced. Explicitly reset it in all PMDs
>> supporting rte_flow API in order to attract the attention
>> of maintainers, who should eventually choose to advertise
>> the new capability or not. It is already known that
>> mlx4 and mlx5 will not support this capability.
>>
>> For RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP
>> similar action is not performed,
>> because no PMD except mlx5 supports indirect actions.
>> Any PMD that starts doing so will anyway have to consider
>> all relevant API, including this capability.
>>
>> Suggested-by: Ferruh Yigit <ferruh.yigit@intel.com>
>> Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
> 
> I'm sorry, but I still think that the patch is confusing.
> No strong opinion, but personally I'd go without it.
It is confusing alright to add a flag that has no impact.
But again my concern is some PMDs maintainers not being aware of
this new capability flag that has an impact in the user application.
See the level of comments to the patch from PMD maintainers.
This way gives a visibility to both PMD maintainers that some action
is required, and gives visibility to application developers and maintainers
to track the support. By time the  should go away.
Updating ethdev without updating all drivers is hard to manage.
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [dpdk-dev] [PATCH v4 0/6] Flow entites behavior on port restart
  2021-10-21  6:34     ` [dpdk-dev] [PATCH v4 " Dmitry Kozlyuk
                         ` (6 preceding siblings ...)
  2021-10-26 11:46       ` [dpdk-dev] [PATCH v4 0/6] Flow entites behavior on port restart Ferruh Yigit
@ 2021-11-02 13:49       ` Ferruh Yigit
  2021-11-02 13:54       ` [dpdk-dev] [PATCH v5 " Dmitry Kozlyuk
  8 siblings, 0 replies; 96+ messages in thread
From: Ferruh Yigit @ 2021-11-02 13:49 UTC (permalink / raw)
  To: Dmitry Kozlyuk, dev; +Cc: Ori Kam
On 10/21/2021 7:34 AM, Dmitry Kozlyuk wrote:
> It is unspecified whether flow rules and indirect actions are kept
> when a port is stopped, possibly reconfigured, and started again.
> Vendors approach the topic differently, e.g. mlx5 and i40e PMD
> disagree in whether flow rules can be kept, and mlx5 PMD would keep
> indirect actions. In the end, applications are greatly affected
> by whatever contract there is and need to know it.
> 
> Applications may wish to restart the port to reconfigure it,
> e.g. switch offloads or even modify queues.
> Keeping rte_flow entities enables application improvements:
> 1. Keeping the rules across restart comes with the ability
>     to create rules before the device is started. This allows
>     all the rules to be created by the moment of start,
>     so that there is no time frame when traffic is already arriving
>     but the rules are not yet created (restored).
> 2. When a rule or an indirect action has some associated state,
>     such as a counter, the application no longer needs to keep
>     additional state in order to cope with the information loss
>     that would occur if such an entity were destroyed.
> 
> It is proposed to advertise capabilities of keeping flow rules
> and indirect actions (as a special case of shared object)
> using a combination of ethdev info and rte_flow calls.
> Then a bug is fixed in mlx5 PMD that prevented indirect RSS action
> from being kept, and the driver starts advertising the new capability.
> 
> Prior discussions:
> 1) http://inbox.dpdk.org/dev/20210727073121.895620-1-dkozlyuk@nvidia.com/
> 2) http://inbox.dpdk.org/dev/20210901085516.3647814-1-dkozlyuk@nvidia.com/
> 
> v4:  1. Fix rebase conflicts (CI).
>       2. State rule behavior when a port is not started or stopped (Ori).
>       3. Improve wording on rule features, add examples (Andrew).
>       4. State that rules/actions that cannot be kept while others can be
>          must be destroyed by the application (Andrew/Ori).
>       5. Add rationale to the cover letter (Andrew).
> 
> Dmitry Kozlyuk (6):
>    ethdev: add capability to keep flow rules on restart
>    ethdev: add capability to keep shared objects on restart
>    net: advertise no support for keeping flow rules
>    net/mlx5: discover max flow priority using DevX
>    net/mlx5: create drop queue using DevX
>    net/mlx5: preserve indirect actions on restart
> 
Hi Dmitry,
Can you please rebase this set on latest next-net?
There are some changes both in ethdev and mlx5.
^ permalink raw reply	[flat|nested] 96+ messages in thread
* [dpdk-dev] [PATCH v5 0/6] Flow entites behavior on port restart
  2021-10-21  6:34     ` [dpdk-dev] [PATCH v4 " Dmitry Kozlyuk
                         ` (7 preceding siblings ...)
  2021-11-02 13:49       ` Ferruh Yigit
@ 2021-11-02 13:54       ` Dmitry Kozlyuk
  2021-11-02 13:54         ` [dpdk-dev] [PATCH v5 1/6] ethdev: add capability to keep flow rules on restart Dmitry Kozlyuk
                           ` (7 more replies)
  8 siblings, 8 replies; 96+ messages in thread
From: Dmitry Kozlyuk @ 2021-11-02 13:54 UTC (permalink / raw)
  To: dev
It is unspecified whether flow rules and indirect actions are kept
when a port is stopped, possibly reconfigured, and started again.
Vendors approach the topic differently, e.g. mlx5 and i40e PMD
disagree in whether flow rules can be kept, and mlx5 PMD would keep
indirect actions. In the end, applications are greatly affected
by whatever contract there is and need to know it.
Applications may wish to restart the port to reconfigure it,
e.g. switch offloads or even modify queues.
Keeping rte_flow entities enables application improvements:
1. Keeping the rules across restart comes with the ability
   to create rules before the device is started. This allows
   all the rules to be created by the moment of start,
   so that there is no time frame when traffic is already arriving
   but the rules are not yet created (restored).
2. When a rule or an indirect action has some associated state,
   such as a counter, the application no longer needs to keep
   additional state in order to cope with the information loss
   that would occur if such an entity were destroyed.
It is proposed to advertise capabilities of keeping flow rules
and indirect actions (as a special case of shared object)
using a combination of ethdev info and rte_flow calls.
Then a bug is fixed in mlx5 PMD that prevented indirect RSS action
from being kept, and the driver starts advertising the new capability.
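As a rough illustration of the ethdev info part of this proposal, the sketch
below shows how an application might decide what to clean up before stopping
a port. It is self-contained and does not use DPDK headers: the structure and
macro names are stand-ins invented for this example. The bit position of the
flow rule capability follows RTE_BIT64(3) from patch 1/6 of this series; the
bit position of the shared object capability is an assumption. A real
application would read dev_capa from rte_eth_dev_info_get() instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP and
 * RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP (not the real headers). */
#define CAPA_FLOW_RULE_KEEP          (UINT64_C(1) << 3) /* RTE_BIT64(3) in patch 1/6 */
#define CAPA_FLOW_SHARED_OBJECT_KEEP (UINT64_C(1) << 4) /* assumed bit position */

/* Hypothetical mirror of the dev_capa field of struct rte_eth_dev_info. */
struct dev_info_stub {
	uint64_t dev_capa;
};

/* What the application has to do before stopping the port. */
struct stop_plan {
	bool must_flush_all_rules;       /* no rule can be kept */
	bool must_destroy_all_indirect;  /* no indirect action can be kept */
};

static struct stop_plan
plan_port_stop(const struct dev_info_stub *info)
{
	struct stop_plan plan;

	assert(info != NULL);
	/* Without the capability nothing is kept, so everything must go.
	 * With it, only "at least some" entities are kept: each rule
	 * feature or action kind still has to be probed separately. */
	plan.must_flush_all_rules =
		!(info->dev_capa & CAPA_FLOW_RULE_KEEP);
	plan.must_destroy_all_indirect =
		!(info->dev_capa & CAPA_FLOW_SHARED_OBJECT_KEEP);
	return plan;
}
```

When a flag in the returned plan is false, it only means that a blanket
flush is not mandatory; whether a given rule or indirect action survives
still depends on the per-feature test described in the patches.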
Prior discussions:
1) http://inbox.dpdk.org/dev/20210727073121.895620-1-dkozlyuk@nvidia.com/
2) http://inbox.dpdk.org/dev/20210901085516.3647814-1-dkozlyuk@nvidia.com/
v5:
     1. Fix rebase conflicts.
     2. Add warnings about experimental status (Andrew).
v4:  1. Fix rebase conflicts (CI).
     2. State rule behavior when a port is not started or stopped (Ori).
     3. Improve wording on rule features, add examples (Andrew).
     4. State that rules/actions that cannot be kept while others can be
        must be destroyed by the application (Andrew/Ori).
     5. Add rationale to the cover letter (Andrew).
Dmitry Kozlyuk (6):
  ethdev: add capability to keep flow rules on restart
  ethdev: add capability to keep shared objects on restart
  net: advertise no support for keeping flow rules
  net/mlx5: discover max flow priority using DevX
  net/mlx5: create drop queue using DevX
  net/mlx5: preserve indirect actions on restart
 doc/guides/prog_guide/rte_flow.rst      |  67 ++++++
 drivers/net/bnxt/bnxt_ethdev.c          |   1 +
 drivers/net/bnxt/bnxt_reps.c            |   1 +
 drivers/net/cnxk/cnxk_ethdev_ops.c      |   1 +
 drivers/net/cxgbe/cxgbe_ethdev.c        |   2 +
 drivers/net/dpaa2/dpaa2_ethdev.c        |   1 +
 drivers/net/e1000/em_ethdev.c           |   2 +
 drivers/net/e1000/igb_ethdev.c          |   1 +
 drivers/net/enic/enic_ethdev.c          |   1 +
 drivers/net/failsafe/failsafe_ops.c     |   1 +
 drivers/net/hinic/hinic_pmd_ethdev.c    |   2 +
 drivers/net/hns3/hns3_ethdev.c          |   1 +
 drivers/net/hns3/hns3_ethdev_vf.c       |   1 +
 drivers/net/i40e/i40e_ethdev.c          |   1 +
 drivers/net/i40e/i40e_vf_representor.c  |   2 +
 drivers/net/iavf/iavf_ethdev.c          |   1 +
 drivers/net/ice/ice_dcf_ethdev.c        |   1 +
 drivers/net/igc/igc_ethdev.c            |   1 +
 drivers/net/ipn3ke/ipn3ke_representor.c |   1 +
 drivers/net/mlx5/linux/mlx5_os.c        |   5 -
 drivers/net/mlx5/mlx5_devx.c            | 211 ++++++++++++++---
 drivers/net/mlx5/mlx5_ethdev.c          |   1 +
 drivers/net/mlx5/mlx5_flow.c            | 292 ++++++++++++++++++++++--
 drivers/net/mlx5/mlx5_flow.h            |   6 +
 drivers/net/mlx5/mlx5_flow_dv.c         | 103 +++++++++
 drivers/net/mlx5/mlx5_flow_verbs.c      |  74 +-----
 drivers/net/mlx5/mlx5_rx.h              |   4 +
 drivers/net/mlx5/mlx5_rxq.c             |  99 +++++++-
 drivers/net/mlx5/mlx5_trigger.c         |  10 +
 drivers/net/mvpp2/mrvl_ethdev.c         |   2 +
 drivers/net/octeontx2/otx2_ethdev_ops.c |   1 +
 drivers/net/qede/qede_ethdev.c          |   1 +
 drivers/net/sfc/sfc_ethdev.c            |   1 +
 drivers/net/softnic/rte_eth_softnic.c   |   1 +
 drivers/net/tap/rte_eth_tap.c           |   1 +
 drivers/net/txgbe/txgbe_ethdev.c        |   1 +
 drivers/net/txgbe/txgbe_ethdev_vf.c     |   1 +
 lib/ethdev/rte_ethdev.h                 |  10 +
 lib/ethdev/rte_flow.h                   |   1 +
 39 files changed, 781 insertions(+), 133 deletions(-)
-- 
2.25.1
^ permalink raw reply	[flat|nested] 96+ messages in thread
* [dpdk-dev] [PATCH v5 1/6] ethdev: add capability to keep flow rules on restart
  2021-11-02 13:54       ` [dpdk-dev] [PATCH v5 " Dmitry Kozlyuk
@ 2021-11-02 13:54         ` Dmitry Kozlyuk
  2021-11-02 13:54         ` [dpdk-dev] [PATCH v5 2/6] ethdev: add capability to keep shared objects " Dmitry Kozlyuk
                           ` (6 subsequent siblings)
  7 siblings, 0 replies; 96+ messages in thread
From: Dmitry Kozlyuk @ 2021-11-02 13:54 UTC (permalink / raw)
  To: dev; +Cc: Ori Kam, Andrew Rybchenko, Thomas Monjalon, Ferruh Yigit
Previously, it was not specified what happens to the flow rules
when the device is stopped, possibly reconfigured, then started.
If flow rules were kept, it could be convenient for application
developers, because they wouldn't need to save and restore them.
However, due to the number of flows and possible creation rate it is
impractical to save all flow rules in DPDK layer. This means that flow
rules persistence really depends on whether PMD and HW can implement it
efficiently. It can also be limited by the rule item and action types,
and its attributes transfer bit (a combination of an item/action type
and a value of the transfer bit is called a rule feature).
Add a device capability bit for PMDs that can keep at least some
of the flow rules across restart. Without this capability behavior
is still unspecified and it is declared that the application must
flush the rules before stopping the device.
Allow the application to test for persistence of rules using
a particular feature by attempting to create a flow rule
using that feature when the device is stopped
and checking for the specific error.
This is logical because if the PMD can create the flow rule
when the device is not started and use it after the start happens,
it is natural that it can move its internal flow rule object
to the same state when the device is stopped and restore the state
when the device is started.
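The test described above could be interpreted by an application roughly as
follows. This is a minimal sketch under stated assumptions, not the API added
by this patch: the enum and struct are simplified stand-ins for
enum rte_flow_error_type and struct rte_flow_error, and classify_probe() is
a hypothetical helper that receives the result of a rule creation attempt
(for example, from rte_flow_create()) made while the device was not started.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for enum rte_flow_error_type. */
enum flow_error_type {
	FLOW_ERROR_TYPE_NONE = 0,
	FLOW_ERROR_TYPE_ACTION, /* some unrelated failure cause */
	FLOW_ERROR_TYPE_STATE,  /* mirrors RTE_FLOW_ERROR_TYPE_STATE */
};

/* Simplified stand-in for struct rte_flow_error. */
struct flow_error {
	enum flow_error_type type;
};

enum keep_status {
	KEEP_YES,     /* rules using the feature are kept across restart */
	KEEP_NO,      /* such rules must be flushed before stopping */
	KEEP_UNKNOWN, /* creation failed for a reason unrelated to state */
};

/*
 * Interpret the outcome of creating a valid test rule while the device
 * is not started. `handle` is the returned rule handle (NULL on failure),
 * `err` is the error reported by the creation call.
 */
static enum keep_status
classify_probe(const void *handle, const struct flow_error *err)
{
	assert(err != NULL);
	if (handle != NULL)
		return KEEP_YES; /* the test rule itself stays after start */
	if (err->type == FLOW_ERROR_TYPE_STATE)
		return KEEP_NO;
	return KEEP_UNKNOWN; /* e.g. the rule was not valid for this PMD */
}
```

The KEEP_UNKNOWN outcome is a choice made for this sketch: the patch only
assigns a meaning to success and to the state error, so any other failure is
treated here as an inconclusive test.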
Rule persistence across reconfigurations is not required,
because tracking all the rules and configuration-dependent resources
they use may be infeasible. In case a PMD cannot keep the rules
across reconfiguration, it is allowed just to report an error.
Application must then flush the rules before attempting it.
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
---
 doc/guides/prog_guide/rte_flow.rst | 36 ++++++++++++++++++++++++++++++
 lib/ethdev/rte_ethdev.h            |  7 ++++++
 lib/ethdev/rte_flow.h              |  1 +
 3 files changed, 44 insertions(+)
diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
index 2d2d87f1db..e01a079230 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -87,6 +87,42 @@ To avoid resource leaks on the PMD side, handles must be explicitly
 destroyed by the application before releasing associated resources such as
 queues and ports.
 
+.. warning::
+
+   The following description of rule persistence is an experimental behavior
+   that may change without prior notice.
+
+When the device is stopped, its rules do not process the traffic.
+In particular, transfer rules created using some device
+stop affecting the traffic even if they refer to different ports.
+
+If ``RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP`` is not advertised,
+rules cannot be created until the device is started for the first time
+and cannot be kept when the device is stopped.
+However, PMD also does not flush them automatically on stop,
+so the application must call ``rte_flow_flush()`` or ``rte_flow_destroy()``
+before stopping the device to ensure no rules remain.
+
+If ``RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP`` is advertised, this means
+the PMD can keep at least some rules across the device stop and start.
+However, ``rte_eth_dev_configure()`` may fail if any rules remain,
+so the application must flush them before attempting a reconfiguration.
+Keeping may be unsupported for some types of rule items and actions,
+and may also depend on the value of the flow attributes transfer bit.
+A combination of a single item or action type
+and a value of the transfer bit is called a rule feature.
+For example: a COUNT action with the transfer bit set.
+To test if rules with a particular feature are kept, the application must try
+to create a valid rule using this feature when the device is not started
+(either before the first start or after a stop).
+If it fails with an error of type ``RTE_FLOW_ERROR_TYPE_STATE``,
+all rules using this feature must be flushed by the application
+before stopping the device.
+If it succeeds, such rules will be kept when the device is stopped,
+provided they do not use other features that are not supported.
+Rules that are created when the device is stopped, including the rules
+created for the test, will be kept after the device is started.
+
 The following sections cover:
 
 - **Attributes** (represented by ``struct rte_flow_attr``): properties of a
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index 24f30b4b28..a18e6ab887 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -90,6 +90,11 @@
  *     - flow director filtering mode (but not filtering rules)
  *     - NIC queue statistics mappings
  *
+ * The following configuration may be retained or not
+ * depending on the device capabilities:
+ *
+ *     - flow rules
+ *
  * Any other configuration will not be stored and will need to be re-entered
  * before a call to rte_eth_dev_start().
  *
@@ -1691,6 +1696,8 @@ struct rte_eth_conf {
  * mbuf->port field.
  */
 #define RTE_ETH_DEV_CAPA_RXQ_SHARE              RTE_BIT64(2)
+/** Device supports keeping flow rules across restart. */
+#define RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP         RTE_BIT64(3)
 /**@}*/
 
 /*
diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
index 85ab29b320..ebcd3a3c8e 100644
--- a/lib/ethdev/rte_flow.h
+++ b/lib/ethdev/rte_flow.h
@@ -3748,6 +3748,7 @@ enum rte_flow_error_type {
 	RTE_FLOW_ERROR_TYPE_ACTION_NUM, /**< Number of actions. */
 	RTE_FLOW_ERROR_TYPE_ACTION_CONF, /**< Action configuration. */
 	RTE_FLOW_ERROR_TYPE_ACTION, /**< Specific action. */
+	RTE_FLOW_ERROR_TYPE_STATE, /**< Current device state. */
 };
 
 /**
-- 
2.25.1
^ permalink raw reply	[flat|nested] 96+ messages in thread
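The probing procedure documented in patch 1/6 can be read as a small decision table. Below is a minimal sketch of it, not code from the patch. All demo_* names are hypothetical stand-ins so the fragment builds without DPDK; a real application would take the capability from rte_eth_dev_info_get() and the probe result from rte_flow_create() called while the port is stopped. Treating errors other than RTE_FLOW_ERROR_TYPE_STATE as "undecided" is an assumption of this sketch, since the documentation only covers a valid probe rule.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP, which the patch defines as RTE_BIT64(3). */
#define DEMO_CAPA_FLOW_RULE_KEEP (UINT64_C(1) << 3)

/* Outcome of creating a probe rule while the port is stopped (hypothetical enum). */
enum demo_probe_result {
	DEMO_PROBE_OK,        /* creation succeeded */
	DEMO_PROBE_ERR_STATE, /* failed with error type RTE_FLOW_ERROR_TYPE_STATE */
	DEMO_PROBE_ERR_OTHER, /* failed for another reason, e.g. an invalid probe rule */
};

/* What the application should do with rules using the probed feature. */
enum demo_verdict {
	DEMO_RULES_KEPT, /* rules with this feature survive stop/start */
	DEMO_MUST_FLUSH, /* flush such rules before stopping the port */
	DEMO_UNDECIDED,  /* assumption: the probe says nothing, fix the probe rule */
};

static enum demo_verdict
demo_rule_feature_verdict(uint64_t dev_capa, enum demo_probe_result probe)
{
	assert(probe <= DEMO_PROBE_ERR_OTHER);
	/* Without the capability no rule may remain across a stop. */
	if ((dev_capa & DEMO_CAPA_FLOW_RULE_KEEP) == 0)
		return DEMO_MUST_FLUSH;
	switch (probe) {
	case DEMO_PROBE_OK:
		return DEMO_RULES_KEPT;
	case DEMO_PROBE_ERR_STATE:
		return DEMO_MUST_FLUSH;
	default:
		return DEMO_UNDECIDED;
	}
}
```

Note that a DEMO_RULES_KEPT verdict covers a single rule feature only. A rule combining several features is kept only if every one of them passes the probe, and all rules still have to be flushed before rte_eth_dev_configure().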
* [dpdk-dev] [PATCH v5 2/6] ethdev: add capability to keep shared objects on restart
  2021-11-02 13:54       ` [dpdk-dev] [PATCH v5 " Dmitry Kozlyuk
  2021-11-02 13:54         ` [dpdk-dev] [PATCH v5 1/6] ethdev: add capability to keep flow rules on restart Dmitry Kozlyuk
@ 2021-11-02 13:54         ` Dmitry Kozlyuk
  2021-11-02 13:54         ` [dpdk-dev] [PATCH v5 3/6] net: advertise no support for keeping flow rules Dmitry Kozlyuk
                           ` (5 subsequent siblings)
  7 siblings, 0 replies; 96+ messages in thread
From: Dmitry Kozlyuk @ 2021-11-02 13:54 UTC (permalink / raw)
  To: dev; +Cc: Ori Kam, Andrew Rybchenko, Thomas Monjalon, Ferruh Yigit
rte_flow_action_handle_create() did not mention what happens
with an indirect action when a device is stopped and started again.
It is natural for some indirect actions, like counter, to be persistent.
Keeping others at least saves application time and complexity.
However, not all PMDs can support it, or the support may be limited
by particular action kinds, that is, combinations of action type
and the value of the transfer bit in its configuration.
Add a device capability to indicate if at least some indirect actions
are kept across the above sequence. Without this capability the behavior
is still unspecified, and the application is required to destroy
the indirect actions before stopping the device.
In the future, indirect actions may not be the only type of objects
shared between flow rules. The capability bit intends to cover all
possible types of such objects, hence its name.
Declare that the application can test for the persistence
of a particular indirect action kind by attempting to create
an indirect action of that kind when the device is stopped
and checking for the specific error type.
This is logical because if the PMD can create an indirect action
when the device is not started and use it after the start happens,
it is natural that it can move its internal flow shared object
to the same state when the device is stopped and restore the state
when the device is started.
Indirect action persistence across reconfiguration is not required.
If a PMD cannot keep the indirect actions across reconfiguration,
it is allowed just to report an error.
The application must then flush the indirect actions before attempting it.
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
---
 doc/guides/prog_guide/rte_flow.rst | 31 ++++++++++++++++++++++++++++++
 lib/ethdev/rte_ethdev.h            |  3 +++
 2 files changed, 34 insertions(+)
diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
index e01a079230..77de8da973 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -2995,6 +2995,37 @@ updated depend on the type of the ``action`` and different for every type.
 The indirect action specified data (e.g. counter) can be queried by
 ``rte_flow_action_handle_query()``.
 
+.. warning::
+
+   The following description of indirect action persistence
+   is an experimental behavior that may change without prior notice.
+
+If ``RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP`` is not advertised,
+indirect actions cannot be created until the device is started for the first time
+and cannot be kept when the device is stopped.
+However, the PMD also does not flush them automatically on stop,
+so the application must call ``rte_flow_action_handle_destroy()``
+before stopping the device to ensure no indirect actions remain.
+
+If ``RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP`` is advertised,
+this means that the PMD can keep at least some indirect actions
+across device stop and start.
+However, ``rte_eth_dev_configure()`` may fail if any indirect actions remain,
+so the application must destroy them before attempting a reconfiguration.
+Keeping may be only supported for certain kinds of indirect actions.
+A kind is a combination of an action type and a value of its transfer bit.
+For example: an indirect counter with the transfer bit reset.
+To test if a particular kind of indirect actions is kept,
+the application must try to create a valid indirect action of that kind
+when the device is not started (either before the first start or after a stop).
+If it fails with an error of type ``RTE_FLOW_ERROR_TYPE_STATE``,
+the application must destroy all indirect actions of this kind
+before stopping the device.
+If it succeeds, all indirect actions of the same kind are kept
+when the device is stopped.
+Indirect actions of a kept kind that are created when the device is stopped,
+including the ones created for the test, will be kept after the device start.
+
 .. _table_rte_flow_action_handle:
 
 .. table:: INDIRECT
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index a18e6ab887..5f803ad1e6 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -94,6 +94,7 @@
  * depending on the device capabilities:
  *
  *     - flow rules
+ *     - flow-related shared objects, e.g. indirect actions
  *
  * Any other configuration will not be stored and will need to be re-entered
  * before a call to rte_eth_dev_start().
@@ -1698,6 +1699,8 @@ struct rte_eth_conf {
 #define RTE_ETH_DEV_CAPA_RXQ_SHARE              RTE_BIT64(2)
 /** Device supports keeping flow rules across restart. */
 #define RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP         RTE_BIT64(3)
+/** Device supports keeping shared flow objects across restart. */
+#define RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP RTE_BIT64(4)
 /**@}*/
 
 /*
-- 
2.25.1
^ permalink raw reply	[flat|nested] 96+ messages in thread
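For indirect actions the same contract applies per kind (action type plus transfer bit). The sketch below shows how an application might decide which handles to destroy before a stop. It is not part of the patch: the demo_* names, the struct layout and the "alive" bookkeeping are hypothetical, and the kind_kept flag is assumed to hold the result of the stopped-port creation probe described above. In a real application the destruction would be a call to rte_flow_action_handle_destroy().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP, RTE_BIT64(4) in the patch. */
#define DEMO_CAPA_FLOW_SHARED_OBJECT_KEEP (UINT64_C(1) << 4)

/* Application-side record of one indirect action (hypothetical layout). */
struct demo_indirect_action {
	int type;       /* action type identifier */
	bool transfer;  /* value of the transfer bit */
	bool kind_kept; /* did the probe for this (type, transfer) kind succeed? */
	bool alive;     /* false once the handle has been destroyed */
};

/*
 * Mark as destroyed every action that would not survive a port stop.
 * Returns the number of actions destroyed by this call.
 */
static size_t
demo_destroy_before_stop(uint64_t dev_capa,
			 struct demo_indirect_action *acts, size_t n)
{
	const bool capa = (dev_capa & DEMO_CAPA_FLOW_SHARED_OBJECT_KEEP) != 0;
	size_t destroyed = 0;
	size_t i;

	assert(acts != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (!acts[i].alive)
			continue;
		/* Only kinds that passed the probe may be left in place. */
		if (capa && acts[i].kind_kept)
			continue;
		acts[i].alive = false;
		destroyed++;
	}
	return destroyed;
}
```

Calling the same helper with dev_capa set to 0 destroys everything that is still alive, which matches what the documentation asks for before a reconfiguration.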
* [dpdk-dev] [PATCH v5 3/6] net: advertise no support for keeping flow rules
  2021-11-02 13:54       ` [dpdk-dev] [PATCH v5 " Dmitry Kozlyuk
  2021-11-02 13:54         ` [dpdk-dev] [PATCH v5 1/6] ethdev: add capability to keep flow rules on restart Dmitry Kozlyuk
  2021-11-02 13:54         ` [dpdk-dev] [PATCH v5 2/6] ethdev: add capability to keep shared objects " Dmitry Kozlyuk
@ 2021-11-02 13:54         ` Dmitry Kozlyuk
  2021-11-02 13:54         ` [dpdk-dev] [PATCH v5 4/6] net/mlx5: discover max flow priority using DevX Dmitry Kozlyuk
                           ` (4 subsequent siblings)
  7 siblings, 0 replies; 96+ messages in thread
From: Dmitry Kozlyuk @ 2021-11-02 13:54 UTC (permalink / raw)
  To: dev
  Cc: Ferruh Yigit, Ajit Khaparde, Somnath Kotur, Hyong Youb Kim,
	Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori, Satha Rao,
	Rahul Lakkireddy, Hemant Agrawal, Sachin Saxena, Haiyue Wang,
	John Daley, Gaetan Rivet, Ziyang Xuan, Xiaoyun Wang,
	Guoyang Zhou, Min Hu (Connor),
	Yisen Zhuang, Lijun Ou, Beilei Xing, Jingjing Wu, Qiming Yang,
	Qi Zhang, Rosen Xu, Liron Himi, Jerin Jacob, Rasesh Mody,
	Devendra Singh Rawat, Andrew Rybchenko, Jasvinder Singh,
	Cristian Dumitrescu, Keith Wiles, Jiawen Wu, Jian Wang
When RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP capability bit is zero,
the specified behavior is the same as it had been before
this bit was introduced. Explicitly reset it in all PMDs
supporting rte_flow API in order to attract the attention
of maintainers, who should eventually choose to advertise
the new capability or not. It is already known that
mlx4 and mlx5 will not support this capability.
For RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP
similar action is not performed,
because no PMD except mlx5 supports indirect actions.
Any PMD that starts doing so will anyway have to consider
all relevant API, including this capability.
Suggested-by: Ferruh Yigit <ferruh.yigit@intel.com>
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Somnath Kotur <somnath.kotur@broadcom.com>
Acked-by: Hyong Youb Kim <hyonkim@cisco.com>
---
 drivers/net/bnxt/bnxt_ethdev.c          | 1 +
 drivers/net/bnxt/bnxt_reps.c            | 1 +
 drivers/net/cnxk/cnxk_ethdev_ops.c      | 1 +
 drivers/net/cxgbe/cxgbe_ethdev.c        | 2 ++
 drivers/net/dpaa2/dpaa2_ethdev.c        | 1 +
 drivers/net/e1000/em_ethdev.c           | 2 ++
 drivers/net/e1000/igb_ethdev.c          | 1 +
 drivers/net/enic/enic_ethdev.c          | 1 +
 drivers/net/failsafe/failsafe_ops.c     | 1 +
 drivers/net/hinic/hinic_pmd_ethdev.c    | 2 ++
 drivers/net/hns3/hns3_ethdev.c          | 1 +
 drivers/net/hns3/hns3_ethdev_vf.c       | 1 +
 drivers/net/i40e/i40e_ethdev.c          | 1 +
 drivers/net/i40e/i40e_vf_representor.c  | 2 ++
 drivers/net/iavf/iavf_ethdev.c          | 1 +
 drivers/net/ice/ice_dcf_ethdev.c        | 1 +
 drivers/net/igc/igc_ethdev.c            | 1 +
 drivers/net/ipn3ke/ipn3ke_representor.c | 1 +
 drivers/net/mvpp2/mrvl_ethdev.c         | 2 ++
 drivers/net/octeontx2/otx2_ethdev_ops.c | 1 +
 drivers/net/qede/qede_ethdev.c          | 1 +
 drivers/net/sfc/sfc_ethdev.c            | 1 +
 drivers/net/softnic/rte_eth_softnic.c   | 1 +
 drivers/net/tap/rte_eth_tap.c           | 1 +
 drivers/net/txgbe/txgbe_ethdev.c        | 1 +
 drivers/net/txgbe/txgbe_ethdev_vf.c     | 1 +
 26 files changed, 31 insertions(+)
diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c
index 5a34bb96d0..7e3ee3d357 100644
--- a/drivers/net/bnxt/bnxt_ethdev.c
+++ b/drivers/net/bnxt/bnxt_ethdev.c
@@ -1006,6 +1006,7 @@ static int bnxt_dev_info_get_op(struct rte_eth_dev *eth_dev,
 	dev_info->speed_capa = bnxt_get_speed_capabilities(bp);
 	dev_info->dev_capa = RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP |
 			     RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP;
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 
 	dev_info->default_rxconf = (struct rte_eth_rxconf) {
 		.rx_thresh = {
diff --git a/drivers/net/bnxt/bnxt_reps.c b/drivers/net/bnxt/bnxt_reps.c
index 1c07db3ca9..01460a0846 100644
--- a/drivers/net/bnxt/bnxt_reps.c
+++ b/drivers/net/bnxt/bnxt_reps.c
@@ -526,6 +526,7 @@ int bnxt_rep_dev_info_get_op(struct rte_eth_dev *eth_dev,
 	dev_info->max_tx_queues = max_rx_rings;
 	dev_info->reta_size = bnxt_rss_hash_tbl_size(parent_bp);
 	dev_info->hash_key_size = 40;
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 
 	/* MTU specifics */
 	dev_info->min_mtu = RTE_ETHER_MIN_MTU;
diff --git a/drivers/net/cnxk/cnxk_ethdev_ops.c b/drivers/net/cnxk/cnxk_ethdev_ops.c
index 6746430265..62306b6cd6 100644
--- a/drivers/net/cnxk/cnxk_ethdev_ops.c
+++ b/drivers/net/cnxk/cnxk_ethdev_ops.c
@@ -68,6 +68,7 @@ cnxk_nix_info_get(struct rte_eth_dev *eth_dev, struct rte_eth_dev_info *devinfo)
 	devinfo->speed_capa = dev->speed_capa;
 	devinfo->dev_capa = RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP |
 			    RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP;
+	devinfo->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 	return 0;
 }
 
diff --git a/drivers/net/cxgbe/cxgbe_ethdev.c b/drivers/net/cxgbe/cxgbe_ethdev.c
index 4758321778..e7ea76180f 100644
--- a/drivers/net/cxgbe/cxgbe_ethdev.c
+++ b/drivers/net/cxgbe/cxgbe_ethdev.c
@@ -131,6 +131,8 @@ int cxgbe_dev_info_get(struct rte_eth_dev *eth_dev,
 	device_info->max_vfs = adapter->params.arch.vfcount;
 	device_info->max_vmdq_pools = 0; /* XXX: For now no support for VMDQ */
 
+	device_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
+
 	device_info->rx_queue_offload_capa = 0UL;
 	device_info->rx_offload_capa = CXGBE_RX_OFFLOADS;
 
diff --git a/drivers/net/dpaa2/dpaa2_ethdev.c b/drivers/net/dpaa2/dpaa2_ethdev.c
index 73d17f7b3c..a3706439d5 100644
--- a/drivers/net/dpaa2/dpaa2_ethdev.c
+++ b/drivers/net/dpaa2/dpaa2_ethdev.c
@@ -254,6 +254,7 @@ dpaa2_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	dev_info->speed_capa = RTE_ETH_LINK_SPEED_1G |
 			RTE_ETH_LINK_SPEED_2_5G |
 			RTE_ETH_LINK_SPEED_10G;
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 
 	dev_info->max_hash_mac_addrs = 0;
 	dev_info->max_vfs = 0;
diff --git a/drivers/net/e1000/em_ethdev.c b/drivers/net/e1000/em_ethdev.c
index 18fea4e0ac..31c4870086 100644
--- a/drivers/net/e1000/em_ethdev.c
+++ b/drivers/net/e1000/em_ethdev.c
@@ -1101,6 +1101,8 @@ eth_em_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 			RTE_ETH_LINK_SPEED_100M_HD | RTE_ETH_LINK_SPEED_100M |
 			RTE_ETH_LINK_SPEED_1G;
 
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
+
 	/* Preferred queue parameters */
 	dev_info->default_rxportconf.nb_queues = 1;
 	dev_info->default_txportconf.nb_queues = 1;
diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index ff06575f03..d0e2bc9814 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -2168,6 +2168,7 @@ eth_igb_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	dev_info->tx_queue_offload_capa = igb_get_tx_queue_offloads_capa(dev);
 	dev_info->tx_offload_capa = igb_get_tx_port_offloads_capa(dev) |
 				    dev_info->tx_queue_offload_capa;
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 
 	switch (hw->mac.type) {
 	case e1000_82575:
diff --git a/drivers/net/enic/enic_ethdev.c b/drivers/net/enic/enic_ethdev.c
index c8bdaf1a8e..163be09809 100644
--- a/drivers/net/enic/enic_ethdev.c
+++ b/drivers/net/enic/enic_ethdev.c
@@ -469,6 +469,7 @@ static int enicpmd_dev_info_get(struct rte_eth_dev *eth_dev,
 	device_info->rx_offload_capa = enic->rx_offload_capa;
 	device_info->tx_offload_capa = enic->tx_offload_capa;
 	device_info->tx_queue_offload_capa = enic->tx_queue_offload_capa;
+	device_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 	device_info->default_rxconf = (struct rte_eth_rxconf) {
 		.rx_free_thresh = ENIC_DEFAULT_RX_FREE_THRESH
 	};
diff --git a/drivers/net/failsafe/failsafe_ops.c b/drivers/net/failsafe/failsafe_ops.c
index 822883bc2f..55e21d635c 100644
--- a/drivers/net/failsafe/failsafe_ops.c
+++ b/drivers/net/failsafe/failsafe_ops.c
@@ -1227,6 +1227,7 @@ fs_dev_infos_get(struct rte_eth_dev *dev,
 	infos->dev_capa =
 		RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP |
 		RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP;
+	infos->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 
 	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_PROBED) {
 		struct rte_eth_dev_info sub_info;
diff --git a/drivers/net/hinic/hinic_pmd_ethdev.c b/drivers/net/hinic/hinic_pmd_ethdev.c
index 9cabd3e0c1..1853511c3b 100644
--- a/drivers/net/hinic/hinic_pmd_ethdev.c
+++ b/drivers/net/hinic/hinic_pmd_ethdev.c
@@ -751,6 +751,8 @@ hinic_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info)
 				RTE_ETH_TX_OFFLOAD_TCP_TSO |
 				RTE_ETH_TX_OFFLOAD_MULTI_SEGS;
 
+	info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
+
 	info->hash_key_size = HINIC_RSS_KEY_SIZE;
 	info->reta_size = HINIC_RSS_INDIR_SIZE;
 	info->flow_type_rss_offloads = HINIC_RSS_OFFLOAD_ALL;
diff --git a/drivers/net/hns3/hns3_ethdev.c b/drivers/net/hns3/hns3_ethdev.c
index 1437a07372..ee0af7756d 100644
--- a/drivers/net/hns3/hns3_ethdev.c
+++ b/drivers/net/hns3/hns3_ethdev.c
@@ -2707,6 +2707,7 @@ hns3_dev_infos_get(struct rte_eth_dev *eth_dev, struct rte_eth_dev_info *info)
 	if (hns3_dev_get_support(hw, INDEP_TXRX))
 		info->dev_capa = RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP |
 				 RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP;
+	info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 
 	if (hns3_dev_get_support(hw, PTP))
 		info->rx_offload_capa |= RTE_ETH_RX_OFFLOAD_TIMESTAMP;
diff --git a/drivers/net/hns3/hns3_ethdev_vf.c b/drivers/net/hns3/hns3_ethdev_vf.c
index 873924927c..2cd8161086 100644
--- a/drivers/net/hns3/hns3_ethdev_vf.c
+++ b/drivers/net/hns3/hns3_ethdev_vf.c
@@ -965,6 +965,7 @@ hns3vf_dev_infos_get(struct rte_eth_dev *eth_dev, struct rte_eth_dev_info *info)
 	if (hns3_dev_get_support(hw, INDEP_TXRX))
 		info->dev_capa = RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP |
 				 RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP;
+	info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 
 	info->rx_desc_lim = (struct rte_eth_desc_lim) {
 		.nb_max = HNS3_MAX_RING_DESC,
diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 62e374d19e..9ea5f303ff 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -3750,6 +3750,7 @@ i40e_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	dev_info->dev_capa =
 		RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP |
 		RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP;
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 
 	dev_info->hash_key_size = (I40E_PFQF_HKEY_MAX_INDEX + 1) *
 						sizeof(uint32_t);
diff --git a/drivers/net/i40e/i40e_vf_representor.c b/drivers/net/i40e/i40e_vf_representor.c
index 663c46b91d..7f8e81858e 100644
--- a/drivers/net/i40e/i40e_vf_representor.c
+++ b/drivers/net/i40e/i40e_vf_representor.c
@@ -35,6 +35,8 @@ i40e_vf_representor_dev_infos_get(struct rte_eth_dev *ethdev,
 	/* get dev info for the vdev */
 	dev_info->device = ethdev->device;
 
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
+
 	dev_info->max_rx_queues = ethdev->data->nb_rx_queues;
 	dev_info->max_tx_queues = ethdev->data->nb_tx_queues;
 
diff --git a/drivers/net/iavf/iavf_ethdev.c b/drivers/net/iavf/iavf_ethdev.c
index f892306f18..48f3a94a95 100644
--- a/drivers/net/iavf/iavf_ethdev.c
+++ b/drivers/net/iavf/iavf_ethdev.c
@@ -943,6 +943,7 @@ iavf_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	dev_info->reta_size = vf->vf_res->rss_lut_size;
 	dev_info->flow_type_rss_offloads = IAVF_RSS_OFFLOAD_ALL;
 	dev_info->max_mac_addrs = IAVF_NUM_MACADDR_MAX;
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 	dev_info->rx_offload_capa =
 		RTE_ETH_RX_OFFLOAD_VLAN_STRIP |
 		RTE_ETH_RX_OFFLOAD_QINQ_STRIP |
diff --git a/drivers/net/ice/ice_dcf_ethdev.c b/drivers/net/ice/ice_dcf_ethdev.c
index 7c71a48010..ca2107f9c6 100644
--- a/drivers/net/ice/ice_dcf_ethdev.c
+++ b/drivers/net/ice/ice_dcf_ethdev.c
@@ -663,6 +663,7 @@ ice_dcf_dev_info_get(struct rte_eth_dev *dev,
 	dev_info->hash_key_size = hw->vf_res->rss_key_size;
 	dev_info->reta_size = hw->vf_res->rss_lut_size;
 	dev_info->flow_type_rss_offloads = ICE_RSS_OFFLOAD_ALL;
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 
 	dev_info->rx_offload_capa =
 		RTE_ETH_RX_OFFLOAD_VLAN_STRIP |
diff --git a/drivers/net/igc/igc_ethdev.c b/drivers/net/igc/igc_ethdev.c
index 8189ad412a..3e2bf14b94 100644
--- a/drivers/net/igc/igc_ethdev.c
+++ b/drivers/net/igc/igc_ethdev.c
@@ -1477,6 +1477,7 @@ eth_igc_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	dev_info->min_rx_bufsize = 256; /* See BSIZE field of RCTL register. */
 	dev_info->max_rx_pktlen = MAX_RX_JUMBO_FRAME_SIZE;
 	dev_info->max_mac_addrs = hw->mac.rar_entry_count;
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 	dev_info->rx_offload_capa = IGC_RX_OFFLOAD_ALL;
 	dev_info->tx_offload_capa = IGC_TX_OFFLOAD_ALL;
 	dev_info->rx_queue_offload_capa = RTE_ETH_RX_OFFLOAD_VLAN_STRIP;
diff --git a/drivers/net/ipn3ke/ipn3ke_representor.c b/drivers/net/ipn3ke/ipn3ke_representor.c
index 1708858575..de325c7d29 100644
--- a/drivers/net/ipn3ke/ipn3ke_representor.c
+++ b/drivers/net/ipn3ke/ipn3ke_representor.c
@@ -96,6 +96,7 @@ ipn3ke_rpst_dev_infos_get(struct rte_eth_dev *ethdev,
 	dev_info->dev_capa =
 		RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP |
 		RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP;
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 
 	dev_info->switch_info.name = ethdev->device->name;
 	dev_info->switch_info.domain_id = rpst->switch_domain_id;
diff --git a/drivers/net/mvpp2/mrvl_ethdev.c b/drivers/net/mvpp2/mrvl_ethdev.c
index 25f213bda5..9c7fe13f7f 100644
--- a/drivers/net/mvpp2/mrvl_ethdev.c
+++ b/drivers/net/mvpp2/mrvl_ethdev.c
@@ -1709,6 +1709,8 @@ mrvl_dev_infos_get(struct rte_eth_dev *dev,
 {
 	struct mrvl_priv *priv = dev->data->dev_private;
 
+	info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
+
 	info->speed_capa = RTE_ETH_LINK_SPEED_10M |
 			   RTE_ETH_LINK_SPEED_100M |
 			   RTE_ETH_LINK_SPEED_1G |
diff --git a/drivers/net/octeontx2/otx2_ethdev_ops.c b/drivers/net/octeontx2/otx2_ethdev_ops.c
index d5caaa326a..48781514c3 100644
--- a/drivers/net/octeontx2/otx2_ethdev_ops.c
+++ b/drivers/net/octeontx2/otx2_ethdev_ops.c
@@ -583,6 +583,7 @@ otx2_nix_info_get(struct rte_eth_dev *eth_dev, struct rte_eth_dev_info *devinfo)
 
 	devinfo->dev_capa = RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP |
 				RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP;
+	devinfo->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 
 	return 0;
 }
diff --git a/drivers/net/qede/qede_ethdev.c b/drivers/net/qede/qede_ethdev.c
index 8ca00e7f6c..3e9aaeecd3 100644
--- a/drivers/net/qede/qede_ethdev.c
+++ b/drivers/net/qede/qede_ethdev.c
@@ -1367,6 +1367,7 @@ qede_dev_info_get(struct rte_eth_dev *eth_dev,
 	dev_info->max_rx_pktlen = (uint32_t)ETH_TX_MAX_NON_LSO_PKT_LEN;
 	dev_info->rx_desc_lim = qede_rx_desc_lim;
 	dev_info->tx_desc_lim = qede_tx_desc_lim;
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 
 	if (IS_PF(edev))
 		dev_info->max_rx_queues = (uint16_t)RTE_MIN(
diff --git a/drivers/net/sfc/sfc_ethdev.c b/drivers/net/sfc/sfc_ethdev.c
index 833d833a04..6b0a7e6b0c 100644
--- a/drivers/net/sfc/sfc_ethdev.c
+++ b/drivers/net/sfc/sfc_ethdev.c
@@ -186,6 +186,7 @@ sfc_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 
 	dev_info->dev_capa = RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP |
 			     RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP;
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 
 	if (mae->status == SFC_MAE_STATUS_SUPPORTED ||
 	    mae->status == SFC_MAE_STATUS_ADMIN) {
diff --git a/drivers/net/softnic/rte_eth_softnic.c b/drivers/net/softnic/rte_eth_softnic.c
index 3ef33818a9..8c098cad5b 100644
--- a/drivers/net/softnic/rte_eth_softnic.c
+++ b/drivers/net/softnic/rte_eth_softnic.c
@@ -93,6 +93,7 @@ pmd_dev_infos_get(struct rte_eth_dev *dev __rte_unused,
 	dev_info->max_rx_pktlen = UINT32_MAX;
 	dev_info->max_rx_queues = UINT16_MAX;
 	dev_info->max_tx_queues = UINT16_MAX;
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 
 	return 0;
 }
diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
index a9a7658147..37ac18f951 100644
--- a/drivers/net/tap/rte_eth_tap.c
+++ b/drivers/net/tap/rte_eth_tap.c
@@ -1006,6 +1006,7 @@ tap_dev_info(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	 * functions together and not in partial combinations
 	 */
 	dev_info->flow_type_rss_offloads = ~TAP_RSS_HF_MASK;
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 
 	return 0;
 }
diff --git a/drivers/net/txgbe/txgbe_ethdev.c b/drivers/net/txgbe/txgbe_ethdev.c
index 169272ded5..4f6db99221 100644
--- a/drivers/net/txgbe/txgbe_ethdev.c
+++ b/drivers/net/txgbe/txgbe_ethdev.c
@@ -2597,6 +2597,7 @@ txgbe_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	dev_info->max_vfs = pci_dev->max_vfs;
 	dev_info->max_vmdq_pools = RTE_ETH_64_POOLS;
 	dev_info->vmdq_queue_num = dev_info->max_rx_queues;
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 	dev_info->rx_queue_offload_capa = txgbe_get_rx_queue_offloads(dev);
 	dev_info->rx_offload_capa = (txgbe_get_rx_port_offloads(dev) |
 				     dev_info->rx_queue_offload_capa);
diff --git a/drivers/net/txgbe/txgbe_ethdev_vf.c b/drivers/net/txgbe/txgbe_ethdev_vf.c
index 4dda55b0c2..67ae69dec3 100644
--- a/drivers/net/txgbe/txgbe_ethdev_vf.c
+++ b/drivers/net/txgbe/txgbe_ethdev_vf.c
@@ -487,6 +487,7 @@ txgbevf_dev_info_get(struct rte_eth_dev *dev,
 	dev_info->max_hash_mac_addrs = TXGBE_VMDQ_NUM_UC_MAC;
 	dev_info->max_vfs = pci_dev->max_vfs;
 	dev_info->max_vmdq_pools = RTE_ETH_64_POOLS;
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 	dev_info->rx_queue_offload_capa = txgbe_get_rx_queue_offloads(dev);
 	dev_info->rx_offload_capa = (txgbe_get_rx_port_offloads(dev) |
 				     dev_info->rx_queue_offload_capa);
-- 
2.25.1
^ permalink raw reply	[flat|nested] 96+ messages in thread
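Every hunk in patch 3/6 uses the same "dev_capa &= ~FLAG" idiom. A minimal sketch of what that statement does follows. The demo_* macros are local stand-ins; bit 3 mirrors the RTE_BIT64(3) definition from patch 1/6, while the positions chosen for the two runtime queue setup flags are assumptions made only for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_CAPA_RUNTIME_RX_QUEUE_SETUP (UINT64_C(1) << 0) /* assumed position */
#define DEMO_CAPA_RUNTIME_TX_QUEUE_SETUP (UINT64_C(1) << 1) /* assumed position */
#define DEMO_CAPA_FLOW_RULE_KEEP         (UINT64_C(1) << 3) /* as in patch 1/6 */

/*
 * Mimic a PMD info callback: advertise the runtime queue setup
 * capabilities, then explicitly clear the flow rule keep bit.
 * "initial" is whatever value the field held on entry.
 */
static uint64_t
demo_fill_dev_capa(uint64_t initial)
{
	uint64_t dev_capa = initial;

	dev_capa |= DEMO_CAPA_RUNTIME_RX_QUEUE_SETUP |
		    DEMO_CAPA_RUNTIME_TX_QUEUE_SETUP;
	/* Clears exactly one bit; all other bits keep their value. */
	dev_capa &= ~DEMO_CAPA_FLOW_RULE_KEEP;
	assert((dev_capa & DEMO_CAPA_FLOW_RULE_KEEP) == 0);
	return dev_capa;
}
```

The clear is a no-op when the bit is already zero, so the added lines do not change behavior. As the commit message says, they act as a marker for maintainers.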
* [dpdk-dev] [PATCH v5 4/6] net/mlx5: discover max flow priority using DevX
  2021-11-02 13:54       ` [dpdk-dev] [PATCH v5 " Dmitry Kozlyuk
                           ` (2 preceding siblings ...)
  2021-11-02 13:54         ` [dpdk-dev] [PATCH v5 3/6] net: advertise no support for keeping flow rules Dmitry Kozlyuk
@ 2021-11-02 13:54         ` Dmitry Kozlyuk
  2021-11-02 13:54         ` [dpdk-dev] [PATCH v5 5/6] net/mlx5: create drop queue " Dmitry Kozlyuk
                           ` (3 subsequent siblings)
  7 siblings, 0 replies; 96+ messages in thread
From: Dmitry Kozlyuk @ 2021-11-02 13:54 UTC (permalink / raw)
  To: dev; +Cc: stable, Matan Azrad, Viacheslav Ovsiienko
Maximum available flow priority was discovered using Verbs API
regardless of the selected flow engine. This required some Verbs
objects to be initialized in order to use DevX engine. Make priority
discovery an engine method and implement it for DevX using its API.
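A rough sketch of the "engine method" shape described above, assuming nothing beyond what the diff below shows: the generic code looks up an optional callback in an ops table, fails with ENOTSUP when the engine lacks it, and maps the discovered 8 or 16 priorities to 3 or 5 flow priorities (the sizes of priority_map_3 and priority_map_5). All demo_* names are hypothetical, and the callback here takes no device argument to keep the fragment self-contained.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Per-engine callback: returns one of the candidate values or a negative errno. */
typedef int (*demo_discover_priorities_t)(const uint16_t *vprio, int vprio_n);

struct demo_flow_driver_ops {
	demo_discover_priorities_t discover_priorities; /* may be NULL */
};

/* Example engine that pretends the hardware accepts the largest candidate. */
static int
demo_engine_accepts_largest(const uint16_t *vprio, int vprio_n)
{
	return vprio_n > 0 ? (int)vprio[vprio_n - 1] : -ENOTSUP;
}

/* Generic part: dispatch to the engine, then translate the result. */
static int
demo_discover_priorities(const struct demo_flow_driver_ops *ops)
{
	static const uint16_t vprio[] = {8, 16};
	int ret;

	assert(ops != NULL);
	if (ops->discover_priorities == NULL)
		return -ENOTSUP;
	ret = ops->discover_priorities(vprio, 2);
	if (ret < 0)
		return ret;
	switch (ret) {
	case 8:
		return 3; /* number of rows in priority_map_3 */
	case 16:
		return 5; /* number of rows in priority_map_5 */
	default:
		return -ENOTSUP;
	}
}
```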
Cc: stable@dpdk.org
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 drivers/net/mlx5/linux/mlx5_os.c   |   1 -
 drivers/net/mlx5/mlx5_flow.c       |  98 +++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_flow.h       |   4 ++
 drivers/net/mlx5/mlx5_flow_dv.c    | 103 +++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_flow_verbs.c |  74 +++------------------
 5 files changed, 216 insertions(+), 64 deletions(-)
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 72bbb665cf..34546635c4 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1720,7 +1720,6 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 	priv->drop_queue.hrxq = mlx5_drop_action_create(eth_dev);
 	if (!priv->drop_queue.hrxq)
 		goto error;
-	/* Supported Verbs flow priority number detection. */
 	err = mlx5_flow_discover_priorities(eth_dev);
 	if (err < 0) {
 		err = -err;
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 5d19ef1e82..850eb353fd 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -9570,3 +9570,101 @@ mlx5_flow_expand_rss_adjust_node(const struct rte_flow_item *pattern,
 		return node;
 	}
 }
+
+/* Map of Verbs to Flow priority with 8 Verbs priorities. */
+static const uint32_t priority_map_3[][MLX5_PRIORITY_MAP_MAX] = {
+	{ 0, 1, 2 }, { 2, 3, 4 }, { 5, 6, 7 },
+};
+
+/* Map of Verbs to Flow priority with 16 Verbs priorities. */
+static const uint32_t priority_map_5[][MLX5_PRIORITY_MAP_MAX] = {
+	{ 0, 1, 2 }, { 3, 4, 5 }, { 6, 7, 8 },
+	{ 9, 10, 11 }, { 12, 13, 14 },
+};
+
+/**
+ * Discover the number of available flow priorities.
+ *
+ * @param dev
+ *   Ethernet device.
+ *
+ * @return
+ *   On success, number of available flow priorities.
+ *   On failure, a negative errno-style code and rte_errno is set.
+ */
+int
+mlx5_flow_discover_priorities(struct rte_eth_dev *dev)
+{
+	static const uint16_t vprio[] = {8, 16};
+	const struct mlx5_priv *priv = dev->data->dev_private;
+	const struct mlx5_flow_driver_ops *fops;
+	enum mlx5_flow_drv_type type;
+	int ret;
+
+	type = mlx5_flow_os_get_type();
+	if (type == MLX5_FLOW_TYPE_MAX) {
+		type = MLX5_FLOW_TYPE_VERBS;
+		if (priv->sh->devx && priv->config.dv_flow_en)
+			type = MLX5_FLOW_TYPE_DV;
+	}
+	fops = flow_get_drv_ops(type);
+	if (fops->discover_priorities == NULL) {
+		DRV_LOG(ERR, "Priority discovery not supported");
+		rte_errno = ENOTSUP;
+		return -rte_errno;
+	}
+	ret = fops->discover_priorities(dev, vprio, RTE_DIM(vprio));
+	if (ret < 0)
+		return ret;
+	switch (ret) {
+	case 8:
+		ret = RTE_DIM(priority_map_3);
+		break;
+	case 16:
+		ret = RTE_DIM(priority_map_5);
+		break;
+	default:
+		rte_errno = ENOTSUP;
+		DRV_LOG(ERR,
+			"port %u maximum priority: %d expected 8/16",
+			dev->data->port_id, ret);
+		return -rte_errno;
+	}
+	DRV_LOG(INFO, "port %u supported flow priorities:"
+		" 0-%d for ingress or egress root table,"
+		" 0-%d for non-root table or transfer root table.",
+		dev->data->port_id, ret - 2,
+		MLX5_NON_ROOT_FLOW_MAX_PRIO - 1);
+	return ret;
+}
+
+/**
+ * Adjust flow priority based on the highest layer and the request priority.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[in] priority
+ *   The rule base priority.
+ * @param[in] subpriority
+ *   The priority based on the items.
+ *
+ * @return
+ *   The new priority.
+ */
+uint32_t
+mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
+			  uint32_t subpriority)
+{
+	uint32_t res = 0;
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	switch (priv->config.flow_prio) {
+	case RTE_DIM(priority_map_3):
+		res = priority_map_3[priority][subpriority];
+		break;
+	case RTE_DIM(priority_map_5):
+		res = priority_map_5[priority][subpriority];
+		break;
+	}
+	return  res;
+}
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 4a16f30fb7..2c9d3759b8 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1229,6 +1229,9 @@ typedef int (*mlx5_flow_create_def_policy_t)
 			(struct rte_eth_dev *dev);
 typedef void (*mlx5_flow_destroy_def_policy_t)
 			(struct rte_eth_dev *dev);
+typedef int (*mlx5_flow_discover_priorities_t)
+			(struct rte_eth_dev *dev,
+			 const uint16_t *vprio, int vprio_n);
 
 struct mlx5_flow_driver_ops {
 	mlx5_flow_validate_t validate;
@@ -1263,6 +1266,7 @@ struct mlx5_flow_driver_ops {
 	mlx5_flow_action_update_t action_update;
 	mlx5_flow_action_query_t action_query;
 	mlx5_flow_sync_domain_t sync_domain;
+	mlx5_flow_discover_priorities_t discover_priorities;
 };
 
 /* mlx5_flow.c */
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 9cba22ca2d..3d59c72550 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -17909,6 +17909,108 @@ flow_dv_sync_domain(struct rte_eth_dev *dev, uint32_t domains, uint32_t flags)
 	return 0;
 }
 
+/**
+ * Discover the number of available flow priorities
+ * by trying to create a flow with the highest priority value
+ * for each possible number.
+ *
+ * @param[in] dev
+ *   Ethernet device.
+ * @param[in] vprio
+ *   List of possible number of available priorities.
+ * @param[in] vprio_n
+ *   Size of @p vprio array.
+ * @return
+ *   On success, number of available flow priorities.
+ *   On failure, a negative errno-style code and rte_errno is set.
+ */
+static int
+flow_dv_discover_priorities(struct rte_eth_dev *dev,
+			    const uint16_t *vprio, int vprio_n)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_indexed_pool *pool = priv->sh->ipool[MLX5_IPOOL_MLX5_FLOW];
+	struct rte_flow_item_eth eth;
+	struct rte_flow_item item = {
+		.type = RTE_FLOW_ITEM_TYPE_ETH,
+		.spec = &eth,
+		.mask = &eth,
+	};
+	struct mlx5_flow_dv_matcher matcher = {
+		.mask = {
+			.size = sizeof(matcher.mask.buf),
+		},
+	};
+	union mlx5_flow_tbl_key tbl_key;
+	struct mlx5_flow flow;
+	void *action;
+	struct rte_flow_error error;
+	uint8_t misc_mask;
+	int i, err, ret = -ENOTSUP;
+
+	/*
+	 * Prepare a flow with a catch-all pattern and a drop action.
+	 * Use drop queue, because shared drop action may be unavailable.
+	 */
+	action = priv->drop_queue.hrxq->action;
+	if (action == NULL) {
+		DRV_LOG(ERR, "Priority discovery requires a drop action");
+		rte_errno = ENOTSUP;
+		return -rte_errno;
+	}
+	memset(&flow, 0, sizeof(flow));
+	flow.handle = mlx5_ipool_zmalloc(pool, &flow.handle_idx);
+	if (flow.handle == NULL) {
+		DRV_LOG(ERR, "Cannot create flow handle");
+		rte_errno = ENOMEM;
+		return -rte_errno;
+	}
+	flow.ingress = true;
+	flow.dv.value.size = MLX5_ST_SZ_BYTES(fte_match_param);
+	flow.dv.actions[0] = action;
+	flow.dv.actions_n = 1;
+	memset(&eth, 0, sizeof(eth));
+	flow_dv_translate_item_eth(matcher.mask.buf, flow.dv.value.buf,
+				   &item, /* inner */ false, /* group */ 0);
+	matcher.crc = rte_raw_cksum(matcher.mask.buf, matcher.mask.size);
+	for (i = 0; i < vprio_n; i++) {
+		/* Configure the next proposed maximum priority. */
+		matcher.priority = vprio[i] - 1;
+		memset(&tbl_key, 0, sizeof(tbl_key));
+		err = flow_dv_matcher_register(dev, &matcher, &tbl_key, &flow,
+					       /* tunnel */ NULL,
+					       /* group */ 0,
+					       &error);
+		if (err != 0) {
+			/* This action is pure SW and must always succeed. */
+			DRV_LOG(ERR, "Cannot register matcher");
+			ret = -rte_errno;
+			break;
+		}
+		/* Try to apply the flow to HW. */
+		misc_mask = flow_dv_matcher_enable(flow.dv.value.buf);
+		__flow_dv_adjust_buf_size(&flow.dv.value.size, misc_mask);
+		err = mlx5_flow_os_create_flow
+				(flow.handle->dvh.matcher->matcher_object,
+				 (void *)&flow.dv.value, flow.dv.actions_n,
+				 flow.dv.actions, &flow.handle->drv_flow);
+		if (err == 0) {
+			claim_zero(mlx5_flow_os_destroy_flow
+						(flow.handle->drv_flow));
+			flow.handle->drv_flow = NULL;
+		}
+		claim_zero(flow_dv_matcher_release(dev, flow.handle));
+		if (err != 0)
+			break;
+		ret = vprio[i];
+	}
+	mlx5_ipool_free(pool, flow.handle_idx);
+	/* Set rte_errno if no expected priority value matched. */
+	if (ret < 0)
+		rte_errno = -ret;
+	return ret;
+}
+
 const struct mlx5_flow_driver_ops mlx5_flow_dv_drv_ops = {
 	.validate = flow_dv_validate,
 	.prepare = flow_dv_prepare,
@@ -17942,6 +18044,7 @@ const struct mlx5_flow_driver_ops mlx5_flow_dv_drv_ops = {
 	.action_update = flow_dv_action_update,
 	.action_query = flow_dv_action_query,
 	.sync_domain = flow_dv_sync_domain,
+	.discover_priorities = flow_dv_discover_priorities,
 };
 
 #endif /* HAVE_IBV_FLOW_DV_SUPPORT */
diff --git a/drivers/net/mlx5/mlx5_flow_verbs.c b/drivers/net/mlx5/mlx5_flow_verbs.c
index 176d867202..92dc9903f3 100644
--- a/drivers/net/mlx5/mlx5_flow_verbs.c
+++ b/drivers/net/mlx5/mlx5_flow_verbs.c
@@ -28,17 +28,6 @@
 #define VERBS_SPEC_INNER(item_flags) \
 	(!!((item_flags) & MLX5_FLOW_LAYER_TUNNEL) ? IBV_FLOW_SPEC_INNER : 0)
 
-/* Map of Verbs to Flow priority with 8 Verbs priorities. */
-static const uint32_t priority_map_3[][MLX5_PRIORITY_MAP_MAX] = {
-	{ 0, 1, 2 }, { 2, 3, 4 }, { 5, 6, 7 },
-};
-
-/* Map of Verbs to Flow priority with 16 Verbs priorities. */
-static const uint32_t priority_map_5[][MLX5_PRIORITY_MAP_MAX] = {
-	{ 0, 1, 2 }, { 3, 4, 5 }, { 6, 7, 8 },
-	{ 9, 10, 11 }, { 12, 13, 14 },
-};
-
 /* Verbs specification header. */
 struct ibv_spec_header {
 	enum ibv_flow_spec_type type;
@@ -50,13 +39,17 @@ struct ibv_spec_header {
  *
  * @param[in] dev
  *   Pointer to the Ethernet device structure.
- *
+ * @param[in] vprio
+ *   Expected result variants.
+ * @param[in] vprio_n
+ *   Number of entries in @p vprio array.
  * @return
- *   number of supported flow priority on success, a negative errno
+ *   Number of supported flow priority on success, a negative errno
  *   value otherwise and rte_errno is set.
  */
-int
-mlx5_flow_discover_priorities(struct rte_eth_dev *dev)
+static int
+flow_verbs_discover_priorities(struct rte_eth_dev *dev,
+			       const uint16_t *vprio, int vprio_n)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct {
@@ -79,20 +72,19 @@ mlx5_flow_discover_priorities(struct rte_eth_dev *dev)
 	};
 	struct ibv_flow *flow;
 	struct mlx5_hrxq *drop = priv->drop_queue.hrxq;
-	uint16_t vprio[] = { 8, 16 };
 	int i;
 	int priority = 0;
 
 #if defined(HAVE_MLX5DV_DR_DEVX_PORT) || defined(HAVE_MLX5DV_DR_DEVX_PORT_V35)
 	/* If DevX supported, driver must support 16 verbs flow priorities. */
-	priority = RTE_DIM(priority_map_5);
+	priority = 16;
 	goto out;
 #endif
 	if (!drop->qp) {
 		rte_errno = ENOTSUP;
 		return -rte_errno;
 	}
-	for (i = 0; i != RTE_DIM(vprio); i++) {
+	for (i = 0; i != vprio_n; i++) {
 		flow_attr.attr.priority = vprio[i] - 1;
 		flow = mlx5_glue->create_flow(drop->qp, &flow_attr.attr);
 		if (!flow)
@@ -100,20 +92,6 @@ mlx5_flow_discover_priorities(struct rte_eth_dev *dev)
 		claim_zero(mlx5_glue->destroy_flow(flow));
 		priority = vprio[i];
 	}
-	switch (priority) {
-	case 8:
-		priority = RTE_DIM(priority_map_3);
-		break;
-	case 16:
-		priority = RTE_DIM(priority_map_5);
-		break;
-	default:
-		rte_errno = ENOTSUP;
-		DRV_LOG(ERR,
-			"port %u verbs maximum priority: %d expected 8/16",
-			dev->data->port_id, priority);
-		return -rte_errno;
-	}
 #if defined(HAVE_MLX5DV_DR_DEVX_PORT) || defined(HAVE_MLX5DV_DR_DEVX_PORT_V35)
 out:
 #endif
@@ -125,37 +103,6 @@ mlx5_flow_discover_priorities(struct rte_eth_dev *dev)
 	return priority;
 }
 
-/**
- * Adjust flow priority based on the highest layer and the request priority.
- *
- * @param[in] dev
- *   Pointer to the Ethernet device structure.
- * @param[in] priority
- *   The rule base priority.
- * @param[in] subpriority
- *   The priority based on the items.
- *
- * @return
- *   The new priority.
- */
-uint32_t
-mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
-				   uint32_t subpriority)
-{
-	uint32_t res = 0;
-	struct mlx5_priv *priv = dev->data->dev_private;
-
-	switch (priv->config.flow_prio) {
-	case RTE_DIM(priority_map_3):
-		res = priority_map_3[priority][subpriority];
-		break;
-	case RTE_DIM(priority_map_5):
-		res = priority_map_5[priority][subpriority];
-		break;
-	}
-	return  res;
-}
-
 /**
  * Get Verbs flow counter by index.
  *
@@ -2095,4 +2042,5 @@ const struct mlx5_flow_driver_ops mlx5_flow_verbs_drv_ops = {
 	.destroy = flow_verbs_destroy,
 	.query = flow_verbs_query,
 	.sync_domain = flow_verbs_sync_domain,
+	.discover_priorities = flow_verbs_discover_priorities,
 };
-- 
2.25.1
* [dpdk-dev] [PATCH v5 5/6] net/mlx5: create drop queue using DevX
  2021-11-02 13:54       ` [dpdk-dev] [PATCH v5 " Dmitry Kozlyuk
                           ` (3 preceding siblings ...)
  2021-11-02 13:54         ` [dpdk-dev] [PATCH v5 4/6] net/mlx5: discover max flow priority using DevX Dmitry Kozlyuk
@ 2021-11-02 13:54         ` Dmitry Kozlyuk
  2021-11-02 13:54         ` [dpdk-dev] [PATCH v5 6/6] net/mlx5: preserve indirect actions on restart Dmitry Kozlyuk
                           ` (2 subsequent siblings)
  7 siblings, 0 replies; 96+ messages in thread
From: Dmitry Kozlyuk @ 2021-11-02 13:54 UTC (permalink / raw)
  To: dev; +Cc: stable, Matan Azrad, Viacheslav Ovsiienko
Drop queue creation and destruction were not implemented for the DevX
flow engine, and the Verbs engine methods were used as a workaround.
Implement these methods for DevX so that there is a valid queue ID
that can be used regardless of queue configuration via API.
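To illustrate why a always-valid drop queue ID is useful, here is a minimal standalone sketch (not the PMD code) of filling an indirection table: when no queue list is supplied, every entry points at the drop queue. The names fill_rq_list and DROP_RQ_ID are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hardware ID of the drop queue (assumption for this sketch). */
#define DROP_RQ_ID 0xFFu

/*
 * Fill an indirection table of rqt_n entries.
 * With a NULL or empty queue list, every entry refers to the drop queue,
 * so the table is valid even when no Rx queue is configured.
 * Otherwise the queue list is repeated until the table is full.
 */
static void
fill_rq_list(uint32_t *rq_list, unsigned int rqt_n,
	     const uint32_t *queue_ids, unsigned int queues_n)
{
	unsigned int i;

	assert(rq_list != NULL);
	if (queue_ids == NULL || queues_n == 0) {
		for (i = 0; i < rqt_n; i++)
			rq_list[i] = DROP_RQ_ID;
		return;
	}
	for (i = 0; i < rqt_n; i++)
		rq_list[i] = queue_ids[i % queues_n];
}
```

In this sketch a caller would pass NULL while the port is stopped and the real queue IDs once it is started.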
Cc: stable@dpdk.org
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 drivers/net/mlx5/linux/mlx5_os.c |   4 -
 drivers/net/mlx5/mlx5_devx.c     | 211 ++++++++++++++++++++++++++-----
 2 files changed, 180 insertions(+), 35 deletions(-)
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 34546635c4..091763b745 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1685,10 +1685,6 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 	}
 	if (sh->devx && config->dv_flow_en && config->dest_tir) {
 		priv->obj_ops = devx_obj_ops;
-		priv->obj_ops.drop_action_create =
-						ibv_obj_ops.drop_action_create;
-		priv->obj_ops.drop_action_destroy =
-						ibv_obj_ops.drop_action_destroy;
 		mlx5_queue_counter_id_prepare(eth_dev);
 		priv->obj_ops.lb_dummy_queue_create =
 					mlx5_rxq_ibv_obj_dummy_lb_create;
diff --git a/drivers/net/mlx5/mlx5_devx.c b/drivers/net/mlx5/mlx5_devx.c
index 7ed774e804..424f77be79 100644
--- a/drivers/net/mlx5/mlx5_devx.c
+++ b/drivers/net/mlx5/mlx5_devx.c
@@ -226,18 +226,18 @@ mlx5_rx_devx_get_event(struct mlx5_rxq_obj *rxq_obj)
  *
  * @param dev
  *   Pointer to Ethernet device.
- * @param idx
- *   Queue index in DPDK Rx queue array.
+ * @param rxq_data
+ *   RX queue data.
  *
  * @return
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-mlx5_rxq_create_devx_rq_resources(struct rte_eth_dev *dev, uint16_t idx)
+mlx5_rxq_create_devx_rq_resources(struct rte_eth_dev *dev,
+				  struct mlx5_rxq_data *rxq_data)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_common_device *cdev = priv->sh->cdev;
-	struct mlx5_rxq_data *rxq_data = (*priv->rxqs)[idx];
 	struct mlx5_rxq_ctrl *rxq_ctrl =
 		container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
 	struct mlx5_devx_create_rq_attr rq_attr = { 0 };
@@ -290,20 +290,20 @@ mlx5_rxq_create_devx_rq_resources(struct rte_eth_dev *dev, uint16_t idx)
  *
  * @param dev
  *   Pointer to Ethernet device.
- * @param idx
- *   Queue index in DPDK Rx queue array.
+ * @param rxq_data
+ *   RX queue data.
  *
  * @return
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-mlx5_rxq_create_devx_cq_resources(struct rte_eth_dev *dev, uint16_t idx)
+mlx5_rxq_create_devx_cq_resources(struct rte_eth_dev *dev,
+				  struct mlx5_rxq_data *rxq_data)
 {
 	struct mlx5_devx_cq *cq_obj = 0;
 	struct mlx5_devx_cq_attr cq_attr = { 0 };
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_dev_ctx_shared *sh = priv->sh;
-	struct mlx5_rxq_data *rxq_data = (*priv->rxqs)[idx];
 	struct mlx5_rxq_ctrl *rxq_ctrl =
 		container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
 	unsigned int cqe_n = mlx5_rxq_cqe_num(rxq_data);
@@ -498,13 +498,13 @@ mlx5_rxq_devx_obj_new(struct rte_eth_dev *dev, uint16_t idx)
 		tmpl->fd = mlx5_os_get_devx_channel_fd(tmpl->devx_channel);
 	}
 	/* Create CQ using DevX API. */
-	ret = mlx5_rxq_create_devx_cq_resources(dev, idx);
+	ret = mlx5_rxq_create_devx_cq_resources(dev, rxq_data);
 	if (ret) {
 		DRV_LOG(ERR, "Failed to create CQ.");
 		goto error;
 	}
 	/* Create RQ using DevX API. */
-	ret = mlx5_rxq_create_devx_rq_resources(dev, idx);
+	ret = mlx5_rxq_create_devx_rq_resources(dev, rxq_data);
 	if (ret) {
 		DRV_LOG(ERR, "Port %u Rx queue %u RQ creation failure.",
 			dev->data->port_id, idx);
@@ -537,6 +537,11 @@ mlx5_rxq_devx_obj_new(struct rte_eth_dev *dev, uint16_t idx)
  *   Pointer to Ethernet device.
  * @param log_n
  *   Log of number of queues in the array.
+ * @param queues
+ *   List of RX queue indices or NULL, in which case
+ *   the attribute will be filled by drop queue ID.
+ * @param queues_n
+ *   Size of @p queues array or 0 if it is NULL.
  * @param ind_tbl
  *   DevX indirection table object.
  *
@@ -564,6 +569,11 @@ mlx5_devx_ind_table_create_rqt_attr(struct rte_eth_dev *dev,
 	}
 	rqt_attr->rqt_max_size = priv->config.ind_table_max_size;
 	rqt_attr->rqt_actual_size = rqt_n;
+	if (queues == NULL) {
+		for (i = 0; i < rqt_n; i++)
+			rqt_attr->rq_list[i] = priv->drop_queue.rxq->rq->id;
+		return rqt_attr;
+	}
 	for (i = 0; i != queues_n; ++i) {
 		struct mlx5_rxq_data *rxq = (*priv->rxqs)[queues[i]];
 		struct mlx5_rxq_ctrl *rxq_ctrl =
@@ -596,11 +606,12 @@ mlx5_devx_ind_table_new(struct rte_eth_dev *dev, const unsigned int log_n,
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_devx_rqt_attr *rqt_attr = NULL;
+	const uint16_t *queues = dev->data->dev_started ? ind_tbl->queues :
+							  NULL;
 
 	MLX5_ASSERT(ind_tbl);
-	rqt_attr = mlx5_devx_ind_table_create_rqt_attr(dev, log_n,
-							ind_tbl->queues,
-							ind_tbl->queues_n);
+	rqt_attr = mlx5_devx_ind_table_create_rqt_attr(dev, log_n, queues,
+						       ind_tbl->queues_n);
 	if (!rqt_attr)
 		return -rte_errno;
 	ind_tbl->rqt = mlx5_devx_cmd_create_rqt(priv->sh->cdev->ctx, rqt_attr);
@@ -671,7 +682,8 @@ mlx5_devx_ind_table_destroy(struct mlx5_ind_table_obj *ind_tbl)
  * @param[in] hash_fields
  *   Verbs protocol hash field to make the RSS on.
  * @param[in] ind_tbl
- *   Indirection table for TIR.
+ *   Indirection table for TIR. If table queues array is NULL,
+ *   a TIR for drop queue is assumed.
  * @param[in] tunnel
  *   Tunnel type.
  * @param[out] tir_attr
@@ -687,19 +699,27 @@ mlx5_devx_tir_attr_set(struct rte_eth_dev *dev, const uint8_t *rss_key,
 		       int tunnel, struct mlx5_devx_tir_attr *tir_attr)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_rxq_data *rxq_data = (*priv->rxqs)[ind_tbl->queues[0]];
-	struct mlx5_rxq_ctrl *rxq_ctrl =
-		container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
-	enum mlx5_rxq_type rxq_obj_type = rxq_ctrl->type;
+	enum mlx5_rxq_type rxq_obj_type;
 	bool lro = true;
 	uint32_t i;
 
-	/* Enable TIR LRO only if all the queues were configured for. */
-	for (i = 0; i < ind_tbl->queues_n; ++i) {
-		if (!(*priv->rxqs)[ind_tbl->queues[i]]->lro) {
-			lro = false;
-			break;
+	/* NULL queues designate drop queue. */
+	if (ind_tbl->queues != NULL) {
+		struct mlx5_rxq_data *rxq_data =
+					(*priv->rxqs)[ind_tbl->queues[0]];
+		struct mlx5_rxq_ctrl *rxq_ctrl =
+			container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
+		rxq_obj_type = rxq_ctrl->type;
+
+		/* Enable TIR LRO only if all the queues were configured for. */
+		for (i = 0; i < ind_tbl->queues_n; ++i) {
+			if (!(*priv->rxqs)[ind_tbl->queues[i]]->lro) {
+				lro = false;
+				break;
+			}
 		}
+	} else {
+		rxq_obj_type = priv->drop_queue.rxq->rxq_ctrl->type;
 	}
 	memset(tir_attr, 0, sizeof(*tir_attr));
 	tir_attr->disp_type = MLX5_TIRC_DISP_TYPE_INDIRECT;
@@ -858,7 +878,7 @@ mlx5_devx_hrxq_modify(struct rte_eth_dev *dev, struct mlx5_hrxq *hrxq,
 }
 
 /**
- * Create a DevX drop action for Rx Hash queue.
+ * Create a DevX drop Rx queue.
  *
  * @param dev
  *   Pointer to Ethernet device.
@@ -867,14 +887,99 @@ mlx5_devx_hrxq_modify(struct rte_eth_dev *dev, struct mlx5_hrxq *hrxq,
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-mlx5_devx_drop_action_create(struct rte_eth_dev *dev)
+mlx5_rxq_devx_obj_drop_create(struct rte_eth_dev *dev)
 {
-	(void)dev;
-	DRV_LOG(ERR, "DevX drop action is not supported yet.");
-	rte_errno = ENOTSUP;
+	struct mlx5_priv *priv = dev->data->dev_private;
+	int socket_id = dev->device->numa_node;
+	struct mlx5_rxq_ctrl *rxq_ctrl;
+	struct mlx5_rxq_data *rxq_data;
+	struct mlx5_rxq_obj *rxq = NULL;
+	int ret;
+
+	/*
+	 * Initialize dummy control structures.
+	 * They are required to hold pointers for cleanup
+	 * and are only accessible via drop queue DevX objects.
+	 */
+	rxq_ctrl = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*rxq_ctrl),
+			       0, socket_id);
+	if (rxq_ctrl == NULL) {
+		DRV_LOG(ERR, "Port %u could not allocate drop queue control",
+			dev->data->port_id);
+		rte_errno = ENOMEM;
+		goto error;
+	}
+	rxq = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*rxq), 0, socket_id);
+	if (rxq == NULL) {
+		DRV_LOG(ERR, "Port %u could not allocate drop queue object",
+			dev->data->port_id);
+		rte_errno = ENOMEM;
+		goto error;
+	}
+	rxq->rxq_ctrl = rxq_ctrl;
+	rxq_ctrl->type = MLX5_RXQ_TYPE_STANDARD;
+	rxq_ctrl->priv = priv;
+	rxq_ctrl->obj = rxq;
+	rxq_data = &rxq_ctrl->rxq;
+	/* Create CQ using DevX API. */
+	ret = mlx5_rxq_create_devx_cq_resources(dev, rxq_data);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Port %u drop queue CQ creation failed.",
+			dev->data->port_id);
+		goto error;
+	}
+	/* Create RQ using DevX API. */
+	ret = mlx5_rxq_create_devx_rq_resources(dev, rxq_data);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Port %u drop queue RQ creation failed.",
+			dev->data->port_id);
+		rte_errno = ENOMEM;
+		goto error;
+	}
+	/* Change queue state to ready. */
+	ret = mlx5_devx_modify_rq(rxq, MLX5_RXQ_MOD_RST2RDY);
+	if (ret != 0)
+		goto error;
+	/* Initialize drop queue. */
+	priv->drop_queue.rxq = rxq;
+	return 0;
+error:
+	ret = rte_errno; /* Save rte_errno before cleanup. */
+	if (rxq != NULL) {
+		if (rxq->rq_obj.rq != NULL)
+			mlx5_devx_rq_destroy(&rxq->rq_obj);
+		if (rxq->cq_obj.cq != NULL)
+			mlx5_devx_cq_destroy(&rxq->cq_obj);
+		if (rxq->devx_channel)
+			mlx5_os_devx_destroy_event_channel
+							(rxq->devx_channel);
+		mlx5_free(rxq);
+	}
+	if (rxq_ctrl != NULL)
+		mlx5_free(rxq_ctrl);
+	rte_errno = ret; /* Restore rte_errno. */
 	return -rte_errno;
 }
 
+/**
+ * Release drop Rx queue resources.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ */
+static void
+mlx5_rxq_devx_obj_drop_release(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_rxq_obj *rxq = priv->drop_queue.rxq;
+	struct mlx5_rxq_ctrl *rxq_ctrl = rxq->rxq_ctrl;
+
+	mlx5_rxq_devx_obj_release(rxq);
+	mlx5_free(rxq);
+	mlx5_free(rxq_ctrl);
+	priv->drop_queue.rxq = NULL;
+}
+
 /**
  * Release a drop hash Rx queue.
  *
@@ -884,9 +989,53 @@ mlx5_devx_drop_action_create(struct rte_eth_dev *dev)
 static void
 mlx5_devx_drop_action_destroy(struct rte_eth_dev *dev)
 {
-	(void)dev;
-	DRV_LOG(ERR, "DevX drop action is not supported yet.");
-	rte_errno = ENOTSUP;
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_hrxq *hrxq = priv->drop_queue.hrxq;
+
+	if (hrxq->tir != NULL)
+		mlx5_devx_tir_destroy(hrxq);
+	if (hrxq->ind_table->ind_table != NULL)
+		mlx5_devx_ind_table_destroy(hrxq->ind_table);
+	if (priv->drop_queue.rxq->rq != NULL)
+		mlx5_rxq_devx_obj_drop_release(dev);
+}
+
+/**
+ * Create a DevX drop action for Rx Hash queue.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_devx_drop_action_create(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_hrxq *hrxq = priv->drop_queue.hrxq;
+	int ret;
+
+	ret = mlx5_rxq_devx_obj_drop_create(dev);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Cannot create drop RX queue");
+		return ret;
+	}
+	/* hrxq->ind_table queues are NULL, drop RX queue ID will be used */
+	ret = mlx5_devx_ind_table_new(dev, 0, hrxq->ind_table);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Cannot create drop hash RX queue indirection table");
+		goto error;
+	}
+	ret = mlx5_devx_hrxq_new(dev, hrxq, /* tunnel */ false);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Cannot create drop hash RX queue");
+		goto error;
+	}
+	return 0;
+error:
+	mlx5_devx_drop_action_destroy(dev);
+	return ret;
 }
 
 /**
-- 
2.25.1
* [dpdk-dev] [PATCH v5 6/6] net/mlx5: preserve indirect actions on restart
  2021-11-02 13:54       ` [dpdk-dev] [PATCH v5 " Dmitry Kozlyuk
                           ` (4 preceding siblings ...)
  2021-11-02 13:54         ` [dpdk-dev] [PATCH v5 5/6] net/mlx5: create drop queue " Dmitry Kozlyuk
@ 2021-11-02 13:54         ` Dmitry Kozlyuk
  2021-11-02 14:23         ` [dpdk-dev] [PATCH v5 0/6] Flow entites behavior on port restart Ferruh Yigit
  2021-11-02 17:01         ` [dpdk-dev] [PATCH v6 " Dmitry Kozlyuk
  7 siblings, 0 replies; 96+ messages in thread
From: Dmitry Kozlyuk @ 2021-11-02 13:54 UTC (permalink / raw)
  To: dev; +Cc: bingz, stable, Matan Azrad, Viacheslav Ovsiienko
MLX5 PMD uses reference counting to manage RX queue resources.
After port stop, shared RSS actions kept references to RX queues,
preventing resource release. As a result, the internal PMD mempool
for such queues was exhausted after a number of port restarts.
Diagnostic message from rte_eth_dev_start():
    Rx queue allocation failed: Cannot allocate memory
Dereference RX queues used by indirect actions on port stop (detach)
and restore references on port start (attach) in order to allow RX queue
resource release, but keep indirect RSS across the port restart.
Replace queue IDs in HW by drop queue ID on detach and restore actual
queue IDs on attach.
When the port is stopped, create indirect RSS in the detached state.
As a result, MLX5 PMD is able to keep all its indirect actions
across port restart. Advertise this capability.
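The detach/attach scheme described above can be sketched in isolation as follows. This is a toy model under stated assumptions, not the driver implementation: struct toy_port, struct toy_rss, DROP_QUEUE_ID and both functions are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define N_QUEUES 4
/* Hypothetical placeholder standing in for the drop queue ID. */
#define DROP_QUEUE_ID UINT16_MAX

struct toy_port {
	unsigned int refcnt[N_QUEUES]; /* references held on each Rx queue */
};

struct toy_rss {
	uint16_t queues[N_QUEUES];   /* queue list requested by the application */
	unsigned int queues_n;
	uint16_t hw_table[N_QUEUES]; /* what the "hardware" table points at */
	int attached;
};

/* Port stop: point HW entries at the drop queue and drop the references. */
static void
toy_rss_detach(struct toy_port *port, struct toy_rss *rss)
{
	unsigned int i;

	assert(rss->attached);
	for (i = 0; i < rss->queues_n; i++) {
		rss->hw_table[i] = DROP_QUEUE_ID;
		assert(port->refcnt[rss->queues[i]] > 0);
		port->refcnt[rss->queues[i]]--;
	}
	rss->attached = 0;
}

/* Port start: take the references again and restore the real queue IDs. */
static void
toy_rss_attach(struct toy_port *port, struct toy_rss *rss)
{
	unsigned int i;

	assert(!rss->attached);
	for (i = 0; i < rss->queues_n; i++) {
		port->refcnt[rss->queues[i]]++;
		rss->hw_table[i] = rss->queues[i];
	}
	rss->attached = 1;
}
```

The requested queue list is kept untouched while detached, which is what lets the action survive the restart: only the references and the HW table contents change.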
Fixes: 4b61b8774be9 ("ethdev: introduce indirect flow action")
Cc: bingz@nvidia.com
Cc: stable@dpdk.org
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 drivers/net/mlx5/mlx5_ethdev.c  |   1 +
 drivers/net/mlx5/mlx5_flow.c    | 194 ++++++++++++++++++++++++++++----
 drivers/net/mlx5/mlx5_flow.h    |   2 +
 drivers/net/mlx5/mlx5_rx.h      |   4 +
 drivers/net/mlx5/mlx5_rxq.c     |  99 ++++++++++++++--
 drivers/net/mlx5/mlx5_trigger.c |  10 ++
 6 files changed, 276 insertions(+), 34 deletions(-)
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index f2b78c3cc6..81fa8845bb 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -321,6 +321,7 @@ mlx5_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info)
 	info->rx_offload_capa = (mlx5_get_rx_port_offloads() |
 				 info->rx_queue_offload_capa);
 	info->tx_offload_capa = mlx5_get_tx_port_offloads(dev);
+	info->dev_capa = RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP;
 	info->if_index = mlx5_ifindex(dev);
 	info->reta_size = priv->reta_idx_n ?
 		priv->reta_idx_n : config->ind_table_max_size;
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 850eb353fd..671a5a34d9 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -1588,6 +1588,58 @@ mlx5_flow_validate_action_queue(const struct rte_flow_action *action,
 	return 0;
 }
 
+/**
+ * Validate queue numbers for device RSS.
+ *
+ * @param[in] dev
+ *   Configured device.
+ * @param[in] queues
+ *   Array of queue numbers.
+ * @param[in] queues_n
+ *   Size of the @p queues array.
+ * @param[out] error
+ *   On error, filled with a textual error description.
+ * @param[out] queue
+ *   On error, filled with an offending queue index in @p queues array.
+ *
+ * @return
+ *   0 on success, a negative errno code on error.
+ */
+static int
+mlx5_validate_rss_queues(const struct rte_eth_dev *dev,
+			 const uint16_t *queues, uint32_t queues_n,
+			 const char **error, uint32_t *queue_idx)
+{
+	const struct mlx5_priv *priv = dev->data->dev_private;
+	enum mlx5_rxq_type rxq_type = MLX5_RXQ_TYPE_UNDEFINED;
+	uint32_t i;
+
+	for (i = 0; i != queues_n; ++i) {
+		struct mlx5_rxq_ctrl *rxq_ctrl;
+
+		if (queues[i] >= priv->rxqs_n) {
+			*error = "queue index out of range";
+			*queue_idx = i;
+			return -EINVAL;
+		}
+		if (!(*priv->rxqs)[queues[i]]) {
+			*error =  "queue is not configured";
+			*queue_idx = i;
+			return -EINVAL;
+		}
+		rxq_ctrl = container_of((*priv->rxqs)[queues[i]],
+					struct mlx5_rxq_ctrl, rxq);
+		if (i == 0)
+			rxq_type = rxq_ctrl->type;
+		if (rxq_type != rxq_ctrl->type) {
+			*error = "combining hairpin and regular RSS queues is not supported";
+			*queue_idx = i;
+			return -ENOTSUP;
+		}
+	}
+	return 0;
+}
+
 /*
  * Validate the rss action.
  *
@@ -1608,8 +1660,9 @@ mlx5_validate_action_rss(struct rte_eth_dev *dev,
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	const struct rte_flow_action_rss *rss = action->conf;
-	enum mlx5_rxq_type rxq_type = MLX5_RXQ_TYPE_UNDEFINED;
-	unsigned int i;
+	int ret;
+	const char *message;
+	uint32_t queue_idx;
 
 	if (rss->func != RTE_ETH_HASH_FUNCTION_DEFAULT &&
 	    rss->func != RTE_ETH_HASH_FUNCTION_TOEPLITZ)
@@ -1673,27 +1726,12 @@ mlx5_validate_action_rss(struct rte_eth_dev *dev,
 		return rte_flow_error_set(error, EINVAL,
 					  RTE_FLOW_ERROR_TYPE_ACTION_CONF,
 					  NULL, "No queues configured");
-	for (i = 0; i != rss->queue_num; ++i) {
-		struct mlx5_rxq_ctrl *rxq_ctrl;
-
-		if (rss->queue[i] >= priv->rxqs_n)
-			return rte_flow_error_set
-				(error, EINVAL,
-				 RTE_FLOW_ERROR_TYPE_ACTION_CONF,
-				 &rss->queue[i], "queue index out of range");
-		if (!(*priv->rxqs)[rss->queue[i]])
-			return rte_flow_error_set
-				(error, EINVAL, RTE_FLOW_ERROR_TYPE_ACTION_CONF,
-				 &rss->queue[i], "queue is not configured");
-		rxq_ctrl = container_of((*priv->rxqs)[rss->queue[i]],
-					struct mlx5_rxq_ctrl, rxq);
-		if (i == 0)
-			rxq_type = rxq_ctrl->type;
-		if (rxq_type != rxq_ctrl->type)
-			return rte_flow_error_set
-				(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ACTION_CONF,
-				 &rss->queue[i],
-				 "combining hairpin and regular RSS queues is not supported");
+	ret = mlx5_validate_rss_queues(dev, rss->queue, rss->queue_num,
+				       &message, &queue_idx);
+	if (ret != 0) {
+		return rte_flow_error_set(error, -ret,
+					  RTE_FLOW_ERROR_TYPE_ACTION_CONF,
+					  &rss->queue[queue_idx], message);
 	}
 	return 0;
 }
@@ -8660,6 +8698,116 @@ mlx5_action_handle_flush(struct rte_eth_dev *dev)
 	return ret;
 }
 
+/**
+ * Validate existing indirect actions against current device configuration
+ * and attach them to device resources.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_action_handle_attach(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_indexed_pool *ipool =
+			priv->sh->ipool[MLX5_IPOOL_RSS_SHARED_ACTIONS];
+	struct mlx5_shared_action_rss *shared_rss, *shared_rss_last;
+	int ret = 0;
+	uint32_t idx;
+
+	ILIST_FOREACH(ipool, priv->rss_shared_actions, idx, shared_rss, next) {
+		struct mlx5_ind_table_obj *ind_tbl = shared_rss->ind_tbl;
+		const char *message;
+		uint32_t queue_idx;
+
+		ret = mlx5_validate_rss_queues(dev, ind_tbl->queues,
+					       ind_tbl->queues_n,
+					       &message, &queue_idx);
+		if (ret != 0) {
+			DRV_LOG(ERR, "Port %u cannot use queue %u in RSS: %s",
+				dev->data->port_id, ind_tbl->queues[queue_idx],
+				message);
+			break;
+		}
+	}
+	if (ret != 0)
+		return ret;
+	ILIST_FOREACH(ipool, priv->rss_shared_actions, idx, shared_rss, next) {
+		struct mlx5_ind_table_obj *ind_tbl = shared_rss->ind_tbl;
+
+		ret = mlx5_ind_table_obj_attach(dev, ind_tbl);
+		if (ret != 0) {
+			DRV_LOG(ERR, "Port %u could not attach "
+				"indirection table obj %p",
+				dev->data->port_id, (void *)ind_tbl);
+			goto error;
+		}
+	}
+	return 0;
+error:
+	shared_rss_last = shared_rss;
+	ILIST_FOREACH(ipool, priv->rss_shared_actions, idx, shared_rss, next) {
+		struct mlx5_ind_table_obj *ind_tbl = shared_rss->ind_tbl;
+
+		if (shared_rss == shared_rss_last)
+			break;
+		if (mlx5_ind_table_obj_detach(dev, ind_tbl) != 0)
+			DRV_LOG(CRIT, "Port %u could not detach "
+				"indirection table obj %p on rollback",
+				dev->data->port_id, (void *)ind_tbl);
+	}
+	return ret;
+}
+
+/**
+ * Detach indirect actions of the device from its resources.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_action_handle_detach(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_indexed_pool *ipool =
+			priv->sh->ipool[MLX5_IPOOL_RSS_SHARED_ACTIONS];
+	struct mlx5_shared_action_rss *shared_rss, *shared_rss_last;
+	int ret = 0;
+	uint32_t idx;
+
+	ILIST_FOREACH(ipool, priv->rss_shared_actions, idx, shared_rss, next) {
+		struct mlx5_ind_table_obj *ind_tbl = shared_rss->ind_tbl;
+
+		ret = mlx5_ind_table_obj_detach(dev, ind_tbl);
+		if (ret != 0) {
+			DRV_LOG(ERR, "Port %u could not detach "
+				"indirection table obj %p",
+				dev->data->port_id, (void *)ind_tbl);
+			goto error;
+		}
+	}
+	return 0;
+error:
+	shared_rss_last = shared_rss;
+	ILIST_FOREACH(ipool, priv->rss_shared_actions, idx, shared_rss, next) {
+		struct mlx5_ind_table_obj *ind_tbl = shared_rss->ind_tbl;
+
+		if (shared_rss == shared_rss_last)
+			break;
+		if (mlx5_ind_table_obj_attach(dev, ind_tbl) != 0)
+			DRV_LOG(CRIT, "Port %u could not attach "
+				"indirection table obj %p on rollback",
+				dev->data->port_id, (void *)ind_tbl);
+	}
+	return ret;
+}
+
 #ifndef HAVE_MLX5DV_DR
 #define MLX5_DOMAIN_SYNC_FLOW ((1 << 0) | (1 << 1))
 #else
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 2c9d3759b8..6eb254f115 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1577,6 +1577,8 @@ void mlx5_flow_destroy_sub_policy_with_rxq(struct rte_eth_dev *dev,
 		struct mlx5_flow_meter_policy *mtr_policy);
 int mlx5_flow_dv_discover_counter_offset_support(struct rte_eth_dev *dev);
 int mlx5_flow_discover_dr_action_support(struct rte_eth_dev *dev);
+int mlx5_action_handle_attach(struct rte_eth_dev *dev);
+int mlx5_action_handle_detach(struct rte_eth_dev *dev);
 int mlx5_action_handle_flush(struct rte_eth_dev *dev);
 void mlx5_release_tunnel_hub(struct mlx5_dev_ctx_shared *sh, uint16_t port_id);
 int mlx5_alloc_tunnel_hub(struct mlx5_dev_ctx_shared *sh);
diff --git a/drivers/net/mlx5/mlx5_rx.h b/drivers/net/mlx5/mlx5_rx.h
index 4952fe1455..69b1263339 100644
--- a/drivers/net/mlx5/mlx5_rx.h
+++ b/drivers/net/mlx5/mlx5_rx.h
@@ -211,6 +211,10 @@ int mlx5_ind_table_obj_modify(struct rte_eth_dev *dev,
 			      struct mlx5_ind_table_obj *ind_tbl,
 			      uint16_t *queues, const uint32_t queues_n,
 			      bool standalone);
+int mlx5_ind_table_obj_attach(struct rte_eth_dev *dev,
+			      struct mlx5_ind_table_obj *ind_tbl);
+int mlx5_ind_table_obj_detach(struct rte_eth_dev *dev,
+			      struct mlx5_ind_table_obj *ind_tbl);
 struct mlx5_list_entry *mlx5_hrxq_create_cb(void *tool_ctx, void *cb_ctx);
 int mlx5_hrxq_match_cb(void *tool_ctx, struct mlx5_list_entry *entry,
 		       void *cb_ctx);
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 4f02fe02b9..9220bb2c15 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -2032,6 +2032,26 @@ mlx5_ind_table_obj_new(struct rte_eth_dev *dev, const uint16_t *queues,
 	return ind_tbl;
 }
 
+static int
+mlx5_ind_table_obj_check_standalone(struct rte_eth_dev *dev __rte_unused,
+				    struct mlx5_ind_table_obj *ind_tbl)
+{
+	uint32_t refcnt;
+
+	refcnt = __atomic_load_n(&ind_tbl->refcnt, __ATOMIC_RELAXED);
+	if (refcnt <= 1)
+		return 0;
+	/*
+	 * Modification of indirection tables having more than 1
+	 * reference is unsupported.
+	 */
+	DRV_LOG(DEBUG,
+		"Port %u cannot modify indirection table %p (refcnt %u > 1).",
+		dev->data->port_id, (void *)ind_tbl, refcnt);
+	rte_errno = EINVAL;
+	return -rte_errno;
+}
+
 /**
  * Modify an indirection table.
  *
@@ -2064,18 +2084,8 @@ mlx5_ind_table_obj_modify(struct rte_eth_dev *dev,
 
 	MLX5_ASSERT(standalone);
 	RTE_SET_USED(standalone);
-	if (__atomic_load_n(&ind_tbl->refcnt, __ATOMIC_RELAXED) > 1) {
-		/*
-		 * Modification of indirection ntables having more than 1
-		 * reference unsupported. Intended for standalone indirection
-		 * tables only.
-		 */
-		DRV_LOG(DEBUG,
-			"Port %u cannot modify indirection table (refcnt> 1).",
-			dev->data->port_id);
-		rte_errno = EINVAL;
+	if (mlx5_ind_table_obj_check_standalone(dev, ind_tbl) < 0)
 		return -rte_errno;
-	}
 	for (i = 0; i != queues_n; ++i) {
 		if (!mlx5_rxq_get(dev, queues[i])) {
 			ret = -rte_errno;
@@ -2101,6 +2111,73 @@ mlx5_ind_table_obj_modify(struct rte_eth_dev *dev,
 	return ret;
 }
 
+/**
+ * Attach an indirection table to its queues.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param ind_tbl
+ *   Indirection table to attach.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_ind_table_obj_attach(struct rte_eth_dev *dev,
+			  struct mlx5_ind_table_obj *ind_tbl)
+{
+	unsigned int i;
+	int ret;
+
+	ret = mlx5_ind_table_obj_modify(dev, ind_tbl, ind_tbl->queues,
+					ind_tbl->queues_n, true);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Port %u could not modify indirect table obj %p",
+			dev->data->port_id, (void *)ind_tbl);
+		return ret;
+	}
+	for (i = 0; i < ind_tbl->queues_n; i++)
+		mlx5_rxq_get(dev, ind_tbl->queues[i]);
+	return 0;
+}
+
+/**
+ * Detach an indirection table from its queues.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param ind_tbl
+ *   Indirection table to detach.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_ind_table_obj_detach(struct rte_eth_dev *dev,
+			  struct mlx5_ind_table_obj *ind_tbl)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	const unsigned int n = rte_is_power_of_2(ind_tbl->queues_n) ?
+			       log2above(ind_tbl->queues_n) :
+			       log2above(priv->config.ind_table_max_size);
+	unsigned int i;
+	int ret;
+
+	ret = mlx5_ind_table_obj_check_standalone(dev, ind_tbl);
+	if (ret != 0)
+		return ret;
+	MLX5_ASSERT(priv->obj_ops.ind_table_modify);
+	ret = priv->obj_ops.ind_table_modify(dev, n, NULL, 0, ind_tbl);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Port %u could not modify indirect table obj %p",
+			dev->data->port_id, (void *)ind_tbl);
+		return ret;
+	}
+	for (i = 0; i < ind_tbl->queues_n; i++)
+		mlx5_rxq_release(dev, ind_tbl->queues[i]);
+	return ret;
+}
+
 int
 mlx5_hrxq_match_cb(void *tool_ctx __rte_unused, struct mlx5_list_entry *entry,
 		   void *cb_ctx)
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index d916c8addc..ebeeae279e 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -14,6 +14,7 @@
 #include <mlx5_malloc.h>
 
 #include "mlx5.h"
+#include "mlx5_flow.h"
 #include "mlx5_rx.h"
 #include "mlx5_tx.h"
 #include "mlx5_utils.h"
@@ -1162,6 +1163,14 @@ mlx5_dev_start(struct rte_eth_dev *dev)
 	mlx5_rxq_timestamp_set(dev);
 	/* Set a mask and offset of scheduling on timestamp into Tx queues. */
 	mlx5_txq_dynf_timestamp_set(dev);
+	/* Attach indirection table objects detached on port stop. */
+	ret = mlx5_action_handle_attach(dev);
+	if (ret) {
+		DRV_LOG(ERR,
+			"port %u failed to attach indirect actions: %s",
+			dev->data->port_id, rte_strerror(rte_errno));
+		goto error;
+	}
 	/*
 	 * In non-cached mode, it only needs to start the default mreg copy
 	 * action and no flow created by application exists anymore.
@@ -1239,6 +1248,7 @@ mlx5_dev_stop(struct rte_eth_dev *dev)
 	/* All RX queue flags will be cleared in the flush interface. */
 	mlx5_flow_list_flush(dev, MLX5_FLOW_TYPE_GEN, true);
 	mlx5_flow_meter_rxq_flush(dev);
+	mlx5_action_handle_detach(dev);
 	mlx5_rx_intr_vec_disable(dev);
 	priv->sh->port[priv->dev_port - 1].ih_port_id = RTE_MAX_ETHPORTS;
 	priv->sh->port[priv->dev_port - 1].devx_ih_port_id = RTE_MAX_ETHPORTS;
-- 
2.25.1
* Re: [dpdk-dev] [PATCH v5 0/6] Flow entites behavior on port restart
  2021-11-02 13:54       ` [dpdk-dev] [PATCH v5 " Dmitry Kozlyuk
                           ` (5 preceding siblings ...)
  2021-11-02 13:54         ` [dpdk-dev] [PATCH v5 6/6] net/mlx5: preserve indirect actions on restart Dmitry Kozlyuk
@ 2021-11-02 14:23         ` Ferruh Yigit
  2021-11-02 17:02           ` Dmitry Kozlyuk
  2021-11-02 17:01         ` [dpdk-dev] [PATCH v6 " Dmitry Kozlyuk
  7 siblings, 1 reply; 96+ messages in thread
From: Ferruh Yigit @ 2021-11-02 14:23 UTC (permalink / raw)
  To: Dmitry Kozlyuk, dev
On 11/2/2021 1:54 PM, Dmitry Kozlyuk wrote:
> It is unspecified whether flow rules and indirect actions are kept
> when a port is stopped, possibly reconfigured, and started again.
> Vendors approach the topic differently, e.g. mlx5 and i40e PMD
> disagree in whether flow rules can be kept, and mlx5 PMD would keep
> indirect actions. In the end, applications are greatly affected
> by whatever contract there is and need to know it.
> 
> Applications may wish to restart the port to reconfigure it,
> e.g. switch offloads or even modify queues.
> Keeping rte_flow entities enables application improvements:
> 1. Since keeping the rules across restart comes with the ability
>     to create rules before the device is started. This allows
>     to have all the rules created at the moment of start,
>     so that there is no time frame when traffic is coming already,
>     but the rules are not yet created (restored).
> 2. When a rule or an indirect action has some associated state,
>     such as a counter, application saves the need to keep
>     additional state in order to cope with information loss
>     if such an entity would be destroyed.
> 
> It is proposed to advertise capabilities of keeping flow rules
> and indirect actions (as a special case of shared object)
> using a combination of ethdev info and rte_flow calls.
> Then a bug is fixed in mlx5 PMD that prevented indirect RSS action
> from being kept, and the driver starts advertising the new capability.
> 
> Prior discussions:
> 1) http://inbox.dpdk.org/dev/20210727073121.895620-1-dkozlyuk@nvidia.com/
> 2) http://inbox.dpdk.org/dev/20210901085516.3647814-1-dkozlyuk@nvidia.com/
> 
> v5:
>       1. Fix rebase conflicts.
I am still getting conflicts. Did you rebase it on top of next-net?
* [dpdk-dev] [PATCH v6 0/6] Flow entites behavior on port restart
  2021-11-02 13:54       ` [dpdk-dev] [PATCH v5 " Dmitry Kozlyuk
                           ` (6 preceding siblings ...)
  2021-11-02 14:23         ` [dpdk-dev] [PATCH v5 0/6] Flow entites behavior on port restart Ferruh Yigit
@ 2021-11-02 17:01         ` Dmitry Kozlyuk
  2021-11-02 17:01           ` [dpdk-dev] [PATCH v6 1/6] ethdev: add capability to keep flow rules on restart Dmitry Kozlyuk
                             ` (6 more replies)
  7 siblings, 7 replies; 96+ messages in thread
From: Dmitry Kozlyuk @ 2021-11-02 17:01 UTC (permalink / raw)
  To: dev
It is unspecified whether flow rules and indirect actions are kept
when a port is stopped, possibly reconfigured, and started again.
Vendors approach the topic differently, e.g. mlx5 and i40e PMD
disagree in whether flow rules can be kept, and mlx5 PMD would keep
indirect actions. In the end, applications are greatly affected
by whatever contract there is and need to know it.
Applications may wish to restart the port to reconfigure it,
e.g. switch offloads or even modify queues.
Keeping rte_flow entities enables application improvements:
1. Keeping the rules across restart comes with the ability
   to create rules before the device is started. This allows
   having all the rules created at the moment of start,
   so that there is no time frame when traffic is already coming
   but the rules are not yet created (restored).
2. When a rule or an indirect action has some associated state,
   such as a counter, the application is spared the need to keep
   additional state in order to cope with information loss
   if such an entity were destroyed.
It is proposed to advertise capabilities of keeping flow rules
and indirect actions (as a special case of shared object)
using a combination of ethdev info and rte_flow calls.
Then a bug is fixed in mlx5 PMD that prevented indirect RSS action
from being kept, and the driver starts advertising the new capability.
Prior discussions:
1) http://inbox.dpdk.org/dev/20210727073121.895620-1-dkozlyuk@nvidia.com/
2) http://inbox.dpdk.org/dev/20210901085516.3647814-1-dkozlyuk@nvidia.com/
v6:
     Rebase on next-net commit 87f4496c74e6 and fix conflicts.
v5:
     1. Fix rebase conflicts.
     2. Add warnings about experimental status (Andrew).
v4:
     1. Fix rebase conflicts (CI).
     2. State rule behavior when a port is not started or stopped (Ori).
     3. Improve wording on rule features, add examples (Andrew).
     4. State that rules/actions that cannot be kept while others can be
        must be destroyed by the application (Andrew/Ori).
     5. Add rationale to the cover letter (Andrew).
Dmitry Kozlyuk (6):
  ethdev: add capability to keep flow rules on restart
  ethdev: add capability to keep shared objects on restart
  net: advertise no support for keeping flow rules
  net/mlx5: discover max flow priority using DevX
  net/mlx5: create drop queue using DevX
  net/mlx5: preserve indirect actions on restart
 doc/guides/prog_guide/rte_flow.rst      |  67 ++++++
 drivers/net/bnxt/bnxt_ethdev.c          |   1 +
 drivers/net/bnxt/bnxt_reps.c            |   1 +
 drivers/net/cnxk/cnxk_ethdev_ops.c      |   1 +
 drivers/net/cxgbe/cxgbe_ethdev.c        |   2 +
 drivers/net/dpaa2/dpaa2_ethdev.c        |   1 +
 drivers/net/e1000/em_ethdev.c           |   2 +
 drivers/net/e1000/igb_ethdev.c          |   1 +
 drivers/net/enic/enic_ethdev.c          |   1 +
 drivers/net/failsafe/failsafe_ops.c     |   1 +
 drivers/net/hinic/hinic_pmd_ethdev.c    |   2 +
 drivers/net/hns3/hns3_ethdev.c          |   1 +
 drivers/net/hns3/hns3_ethdev_vf.c       |   1 +
 drivers/net/i40e/i40e_ethdev.c          |   1 +
 drivers/net/i40e/i40e_vf_representor.c  |   2 +
 drivers/net/iavf/iavf_ethdev.c          |   1 +
 drivers/net/ice/ice_dcf_ethdev.c        |   1 +
 drivers/net/igc/igc_ethdev.c            |   1 +
 drivers/net/ipn3ke/ipn3ke_representor.c |   1 +
 drivers/net/mlx5/linux/mlx5_os.c        |   4 -
 drivers/net/mlx5/mlx5_devx.c            | 211 ++++++++++++++---
 drivers/net/mlx5/mlx5_ethdev.c          |   1 +
 drivers/net/mlx5/mlx5_flow.c            | 292 ++++++++++++++++++++++--
 drivers/net/mlx5/mlx5_flow.h            |   6 +
 drivers/net/mlx5/mlx5_flow_dv.c         | 103 +++++++++
 drivers/net/mlx5/mlx5_flow_verbs.c      |  74 +-----
 drivers/net/mlx5/mlx5_rx.h              |   4 +
 drivers/net/mlx5/mlx5_rxq.c             |  99 +++++++-
 drivers/net/mlx5/mlx5_trigger.c         |  10 +
 drivers/net/mvpp2/mrvl_ethdev.c         |   2 +
 drivers/net/octeontx2/otx2_ethdev_ops.c |   1 +
 drivers/net/qede/qede_ethdev.c          |   1 +
 drivers/net/sfc/sfc_ethdev.c            |   1 +
 drivers/net/softnic/rte_eth_softnic.c   |   1 +
 drivers/net/tap/rte_eth_tap.c           |   1 +
 drivers/net/txgbe/txgbe_ethdev.c        |   1 +
 drivers/net/txgbe/txgbe_ethdev_vf.c     |   1 +
 lib/ethdev/rte_ethdev.h                 |  10 +
 lib/ethdev/rte_flow.h                   |   1 +
 39 files changed, 781 insertions(+), 132 deletions(-)
-- 
2.25.1
* [dpdk-dev] [PATCH v6 1/6] ethdev: add capability to keep flow rules on restart
  2021-11-02 17:01         ` [dpdk-dev] [PATCH v6 " Dmitry Kozlyuk
@ 2021-11-02 17:01           ` Dmitry Kozlyuk
  2021-11-02 17:01           ` [dpdk-dev] [PATCH v6 2/6] ethdev: add capability to keep shared objects " Dmitry Kozlyuk
                             ` (5 subsequent siblings)
  6 siblings, 0 replies; 96+ messages in thread
From: Dmitry Kozlyuk @ 2021-11-02 17:01 UTC (permalink / raw)
  To: dev; +Cc: Ori Kam, Andrew Rybchenko, Thomas Monjalon, Ferruh Yigit
Previously, it was not specified what happens to the flow rules
when the device is stopped, possibly reconfigured, then started.
If flow rules were kept, it could be convenient for application
developers, because they wouldn't need to save and restore them.
However, due to the number of flows and possible creation rate it is
impractical to save all flow rules in DPDK layer. This means that flow
rules persistence really depends on whether PMD and HW can implement it
efficiently. It can also be limited by the rule item and action types,
and by the transfer bit of the rule attributes (a combination of an
item/action type and a value of the transfer bit is called a rule feature).
Add a device capability bit for PMDs that can keep at least some
of the flow rules across restart. Without this capability behavior
is still unspecified and it is declared that the application must
flush the rules before stopping the device.
Allow the application to test for persistence of rules using
a particular feature by attempting to create a flow rule
using that feature when the device is stopped
and checking for the specific error.
This is logical because if the PMD can create the flow rule
when the device is not started and use it after the start happens,
it is natural that it can move its internal flow rule object
to the same state when the device is stopped and restore the state
when the device is started.
Rule persistence across reconfiguration is not required,
because tracking all the rules and configuration-dependent resources
they use may be infeasible. In case a PMD cannot keep the rules
across reconfiguration, it is allowed just to report an error.
The application must then flush the rules before attempting reconfiguration.
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
---
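A note for application writers, not part of the commit: the probing
procedure described above could be reduced to a small decision helper
along these lines. This is only a sketch under stated assumptions. All
sketch_/SKETCH_ names are hypothetical stand-ins so that it builds without
DPDK headers; real code would read dev_capa from rte_eth_dev_info_get()
and take the probe result from rte_flow_create() called while the port
is not started.

```c
#include <stdint.h>

/* Stand-in for RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP, RTE_BIT64(3) in this patch. */
#define SKETCH_CAPA_FLOW_RULE_KEEP (UINT64_C(1) << 3)

/* Outcome of creating one valid rule using the feature on a stopped port. */
enum sketch_probe_result {
	SKETCH_PROBE_CREATED,   /* rule creation succeeded */
	SKETCH_PROBE_ERR_STATE, /* error of type RTE_FLOW_ERROR_TYPE_STATE */
	SKETCH_PROBE_ERR_OTHER, /* any other error, e.g. the rule was invalid */
};

enum sketch_verdict {
	SKETCH_KEPT,              /* rules using this feature survive stop/start */
	SKETCH_FLUSH_BEFORE_STOP, /* application must destroy such rules first */
	SKETCH_INCONCLUSIVE,      /* probe rule needs fixing, nothing learned */
};

/*
 * Hypothetical helper: combine the capability bit and the probe result.
 * SKETCH_KEPT speaks only for the probed feature; a rule that also uses
 * an unsupported feature is still not kept.
 */
enum sketch_verdict
sketch_rule_feature_verdict(uint64_t dev_capa, enum sketch_probe_result probe)
{
	/* Without the capability, no rule may remain when the port stops. */
	if ((dev_capa & SKETCH_CAPA_FLOW_RULE_KEEP) == 0)
		return SKETCH_FLUSH_BEFORE_STOP;
	switch (probe) {
	case SKETCH_PROBE_CREATED:
		return SKETCH_KEPT;
	case SKETCH_PROBE_ERR_STATE:
		return SKETCH_FLUSH_BEFORE_STOP;
	default:
		return SKETCH_INCONCLUSIVE;
	}
}
```

Treating errors other than the state error as inconclusive is a design
choice of this sketch: the documentation only assigns a meaning to
success and to RTE_FLOW_ERROR_TYPE_STATE.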
 doc/guides/prog_guide/rte_flow.rst | 36 ++++++++++++++++++++++++++++++
 lib/ethdev/rte_ethdev.h            |  7 ++++++
 lib/ethdev/rte_flow.h              |  1 +
 3 files changed, 44 insertions(+)
diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
index 2d2d87f1db..e01a079230 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -87,6 +87,42 @@ To avoid resource leaks on the PMD side, handles must be explicitly
 destroyed by the application before releasing associated resources such as
 queues and ports.
 
+.. warning::
+
+   The following description of rule persistence is an experimental behavior
+   that may change without prior notice.
+
+When the device is stopped, its rules do not process the traffic.
+In particular, transfer rules created using some device
+stop affecting the traffic even if they refer to different ports.
+
+If ``RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP`` is not advertised,
+rules cannot be created until the device is started for the first time
+and cannot be kept when the device is stopped.
+However, PMD also does not flush them automatically on stop,
+so the application must call ``rte_flow_flush()`` or ``rte_flow_destroy()``
+before stopping the device to ensure no rules remain.
+
+If ``RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP`` is advertised, this means
+the PMD can keep at least some rules across the device stop and start.
+However, ``rte_eth_dev_configure()`` may fail if any rules remain,
+so the application must flush them before attempting a reconfiguration.
+Keeping may be unsupported for some types of rule items and actions,
+and may also depend on the value of the flow attributes' transfer bit.
+A combination of a single item or action type
+and a value of the transfer bit is called a rule feature.
+For example: a COUNT action with the transfer bit set.
+To test if rules with a particular feature are kept, the application must try
+to create a valid rule using this feature when the device is not started
+(either before the first start or after a stop).
+If it fails with an error of type ``RTE_FLOW_ERROR_TYPE_STATE``,
+all rules using this feature must be flushed by the application
+before stopping the device.
+If it succeeds, such rules will be kept when the device is stopped,
+provided they do not use other features that are not supported.
+Rules that are created when the device is stopped, including the rules
+created for the test, will be kept after the device is started.
+
 The following sections cover:
 
 - **Attributes** (represented by ``struct rte_flow_attr``): properties of a
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index 24f30b4b28..a18e6ab887 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -90,6 +90,11 @@
  *     - flow director filtering mode (but not filtering rules)
  *     - NIC queue statistics mappings
  *
+ * The following configuration may be retained or not
+ * depending on the device capabilities:
+ *
+ *     - flow rules
+ *
  * Any other configuration will not be stored and will need to be re-entered
  * before a call to rte_eth_dev_start().
  *
@@ -1691,6 +1696,8 @@ struct rte_eth_conf {
  * mbuf->port field.
  */
 #define RTE_ETH_DEV_CAPA_RXQ_SHARE              RTE_BIT64(2)
+/** Device supports keeping flow rules across restart. */
+#define RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP         RTE_BIT64(3)
 /**@}*/
 
 /*
diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
index 85ab29b320..ebcd3a3c8e 100644
--- a/lib/ethdev/rte_flow.h
+++ b/lib/ethdev/rte_flow.h
@@ -3748,6 +3748,7 @@ enum rte_flow_error_type {
 	RTE_FLOW_ERROR_TYPE_ACTION_NUM, /**< Number of actions. */
 	RTE_FLOW_ERROR_TYPE_ACTION_CONF, /**< Action configuration. */
 	RTE_FLOW_ERROR_TYPE_ACTION, /**< Specific action. */
+	RTE_FLOW_ERROR_TYPE_STATE, /**< Current device state. */
 };
 
 /**
-- 
2.25.1
* [dpdk-dev] [PATCH v6 2/6] ethdev: add capability to keep shared objects on restart
  2021-11-02 17:01         ` [dpdk-dev] [PATCH v6 " Dmitry Kozlyuk
  2021-11-02 17:01           ` [dpdk-dev] [PATCH v6 1/6] ethdev: add capability to keep flow rules on restart Dmitry Kozlyuk
@ 2021-11-02 17:01           ` Dmitry Kozlyuk
  2021-11-02 17:01           ` [dpdk-dev] [PATCH v6 3/6] net: advertise no support for keeping flow rules Dmitry Kozlyuk
                             ` (4 subsequent siblings)
  6 siblings, 0 replies; 96+ messages in thread
From: Dmitry Kozlyuk @ 2021-11-02 17:01 UTC (permalink / raw)
  To: dev; +Cc: Ori Kam, Andrew Rybchenko, Thomas Monjalon, Ferruh Yigit
rte_flow_action_handle_create() did not mention what happens
with an indirect action when a device is stopped and started again.
It is natural for some indirect actions, like counter, to be persistent.
Keeping others at least saves application time and complexity.
However, not all PMDs can support it, or the support may be limited
by particular action kinds, that is, combinations of action type
and the value of the transfer bit in its configuration.
Add a device capability to indicate if at least some indirect actions
are kept across the above sequence. Without this capability the behavior
is still unspecified, and application is required to destroy
the indirect actions before stopping the device.
In the future, indirect actions may not be the only type of objects
shared between flow rules. The capability bit intends to cover all
possible types of such objects, hence its name.
Declare that the application can test for the persistence
of a particular indirect action kind by attempting to create
an indirect action of that kind when the device is stopped
and checking for the specific error type.
This is logical because if the PMD can create an indirect action
when the device is not started and use it after the start happens,
it is natural that it can move its internal flow shared object
to the same state when the device is stopped and restore the state
when the device is started.
Indirect action persistence across reconfiguration is not required.
In case a PMD cannot keep the indirect actions across reconfiguration,
it is allowed just to report an error.
The application must then destroy the indirect actions before attempting
reconfiguration.
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
---
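A note for application writers, not part of the commit: since
persistence is decided per kind (action type plus transfer bit), an
application might cache one probe outcome per kind and consult the cache
before stopping the port. The following is a sketch only. Every
sketch_/SKETCH_ name, the two action type stand-ins and the example data
are hypothetical; nothing here is claimed about which kinds a real PMD
keeps.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for values of enum rte_flow_action_type. */
enum { SKETCH_ACTION_COUNT = 1, SKETCH_ACTION_AGE = 2 };

/* Cached probe outcome for one kind of indirect action. */
struct sketch_kind {
	int action_type;
	bool transfer; /* transfer bit of the indirect action configuration */
	bool kept;     /* creation succeeded while the port was not started */
};

/* Application-side record of one live indirect action handle. */
struct sketch_handle {
	int action_type;
	bool transfer;
};

/* A kind that was never probed is treated as not kept (conservative). */
bool
sketch_kind_is_kept(const struct sketch_kind *kinds, size_t n_kinds,
		    int action_type, bool transfer)
{
	size_t i;

	for (i = 0; i < n_kinds; i++)
		if (kinds[i].action_type == action_type &&
		    kinds[i].transfer == transfer)
			return kinds[i].kept;
	return false;
}

/* Count the handles that must be destroyed before stopping the port. */
size_t
sketch_count_to_destroy(const struct sketch_handle *handles, size_t n_handles,
			const struct sketch_kind *kinds, size_t n_kinds)
{
	size_t i, n = 0;

	for (i = 0; i < n_handles; i++)
		if (!sketch_kind_is_kept(kinds, n_kinds,
					 handles[i].action_type,
					 handles[i].transfer))
			n++;
	return n;
}

/* Made-up example data: one kept kind, one kind that is not kept. */
const struct sketch_kind sketch_example_kinds[] = {
	{ SKETCH_ACTION_COUNT, false, true },
	{ SKETCH_ACTION_AGE, false, false },
};
const struct sketch_handle sketch_example_handles[] = {
	{ SKETCH_ACTION_COUNT, false }, /* kept */
	{ SKETCH_ACTION_AGE, false },   /* probed, not kept */
	{ SKETCH_ACTION_COUNT, true },  /* kind never probed */
};
```

With the example data above, two of the three handles would have to be
destroyed with rte_flow_action_handle_destroy() before the stop.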
 doc/guides/prog_guide/rte_flow.rst | 31 ++++++++++++++++++++++++++++++
 lib/ethdev/rte_ethdev.h            |  3 +++
 2 files changed, 34 insertions(+)
diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
index e01a079230..77de8da973 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -2995,6 +2995,37 @@ updated depend on the type of the ``action`` and different for every type.
 The indirect action specified data (e.g. counter) can be queried by
 ``rte_flow_action_handle_query()``.
 
+.. warning::
+
+   The following description of indirect action persistence
+   is an experimental behavior that may change without prior notice.
+
+If ``RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP`` is not advertised,
+indirect actions cannot be created until the device is started for the first time
+and cannot be kept when the device is stopped.
+However, PMD also does not flush them automatically on stop,
+so the application must call ``rte_flow_action_handle_destroy()``
+before stopping the device to ensure no indirect actions remain.
+
+If ``RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP`` is advertised,
+this means that the PMD can keep at least some indirect actions
+across device stop and start.
+However, ``rte_eth_dev_configure()`` may fail if any indirect actions remain,
+so the application must destroy them before attempting a reconfiguration.
+Keeping may be only supported for certain kinds of indirect actions.
+A kind is a combination of an action type and a value of its transfer bit.
+For example: an indirect counter with the transfer bit reset.
+To test if a particular kind of indirect actions is kept,
+the application must try to create a valid indirect action of that kind
+when the device is not started (either before the first start or after a stop).
+If it fails with an error of type ``RTE_FLOW_ERROR_TYPE_STATE``,
+application must destroy all indirect actions of this kind
+before stopping the device.
+If it succeeds, all indirect actions of the same kind are kept
+when the device is stopped.
+Indirect actions of a kept kind that are created when the device is stopped,
+including the ones created for the test, will be kept after the device start.
+
 .. _table_rte_flow_action_handle:
 
 .. table:: INDIRECT
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index a18e6ab887..5f803ad1e6 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -94,6 +94,7 @@
  * depending on the device capabilities:
  *
  *     - flow rules
+ *     - flow-related shared objects, e.g. indirect actions
  *
  * Any other configuration will not be stored and will need to be re-entered
  * before a call to rte_eth_dev_start().
@@ -1698,6 +1699,8 @@ struct rte_eth_conf {
 #define RTE_ETH_DEV_CAPA_RXQ_SHARE              RTE_BIT64(2)
 /** Device supports keeping flow rules across restart. */
 #define RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP         RTE_BIT64(3)
+/** Device supports keeping shared flow objects across restart. */
+#define RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP RTE_BIT64(4)
 /**@}*/
 
 /*
-- 
2.25.1
* [dpdk-dev] [PATCH v6 3/6] net: advertise no support for keeping flow rules
  2021-11-02 17:01         ` [dpdk-dev] [PATCH v6 " Dmitry Kozlyuk
  2021-11-02 17:01           ` [dpdk-dev] [PATCH v6 1/6] ethdev: add capability to keep flow rules on restart Dmitry Kozlyuk
  2021-11-02 17:01           ` [dpdk-dev] [PATCH v6 2/6] ethdev: add capability to keep shared objects " Dmitry Kozlyuk
@ 2021-11-02 17:01           ` Dmitry Kozlyuk
  2021-11-02 17:01           ` [dpdk-dev] [PATCH v6 4/6] net/mlx5: discover max flow priority using DevX Dmitry Kozlyuk
                             ` (3 subsequent siblings)
  6 siblings, 0 replies; 96+ messages in thread
From: Dmitry Kozlyuk @ 2021-11-02 17:01 UTC (permalink / raw)
  To: dev
  Cc: Ferruh Yigit, Ajit Khaparde, Somnath Kotur, Hyong Youb Kim,
	Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori, Satha Rao,
	Rahul Lakkireddy, Hemant Agrawal, Sachin Saxena, Haiyue Wang,
	John Daley, Gaetan Rivet, Ziyang Xuan, Xiaoyun Wang,
	Guoyang Zhou, Min Hu (Connor),
	Yisen Zhuang, Lijun Ou, Beilei Xing, Jingjing Wu, Qiming Yang,
	Qi Zhang, Rosen Xu, Liron Himi, Jerin Jacob, Rasesh Mody,
	Devendra Singh Rawat, Andrew Rybchenko, Jasvinder Singh,
	Cristian Dumitrescu, Keith Wiles, Jiawen Wu, Jian Wang
When RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP capability bit is zero,
the specified behavior is the same as it had been before
this bit was introduced. Explicitly reset it in all PMDs
supporting rte_flow API in order to attract the attention
of maintainers, who should eventually choose to advertise
the new capability or not. It is already known that
mlx4 and mlx5 will not support this capability.
For RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP
similar action is not performed,
because no PMD except mlx5 supports indirect actions.
Any PMD that starts doing so will anyway have to consider
all relevant API, including this capability.
Suggested-by: Ferruh Yigit <ferruh.yigit@intel.com>
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Somnath Kotur <somnath.kotur@broadcom.com>
Acked-by: Hyong Youb Kim <hyonkim@cisco.com>
---
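A note for reviewers, not part of the commit: the `&= ~` form used in
every hunk below clears exactly one bit and leaves whatever the driver
has already put into dev_capa untouched. A minimal sketch of that
property follows. The SKETCH_ macros are stand-ins so that it builds
without DPDK headers; bit 3 mirrors the definition added in patch 1/6,
while the positions chosen for the two runtime queue setup bits are an
assumption made for illustration.

```c
#include <stdint.h>

/* Same idea as RTE_BIT64(): a 64-bit constant with a single bit set. */
#define SKETCH_BIT64(nr) (UINT64_C(1) << (nr))

#define SKETCH_CAPA_RUNTIME_RX_QUEUE_SETUP SKETCH_BIT64(0) /* assumed position */
#define SKETCH_CAPA_RUNTIME_TX_QUEUE_SETUP SKETCH_BIT64(1) /* assumed position */
#define SKETCH_CAPA_FLOW_RULE_KEEP         SKETCH_BIT64(3) /* as in patch 1/6 */

/*
 * Hypothetical helper doing what each driver hunk does inline.
 * The mask is 64 bits wide, so its complement is too, and bits 32..63
 * of dev_capa are preserved.
 */
uint64_t
sketch_clear_flow_rule_keep(uint64_t dev_capa)
{
	return dev_capa & ~SKETCH_CAPA_FLOW_RULE_KEEP;
}
```

On a zero-initialized dev_capa the operation changes nothing, which
matches the commit message: the reset is there to draw maintainers'
attention, not to alter behavior.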
 drivers/net/bnxt/bnxt_ethdev.c          | 1 +
 drivers/net/bnxt/bnxt_reps.c            | 1 +
 drivers/net/cnxk/cnxk_ethdev_ops.c      | 1 +
 drivers/net/cxgbe/cxgbe_ethdev.c        | 2 ++
 drivers/net/dpaa2/dpaa2_ethdev.c        | 1 +
 drivers/net/e1000/em_ethdev.c           | 2 ++
 drivers/net/e1000/igb_ethdev.c          | 1 +
 drivers/net/enic/enic_ethdev.c          | 1 +
 drivers/net/failsafe/failsafe_ops.c     | 1 +
 drivers/net/hinic/hinic_pmd_ethdev.c    | 2 ++
 drivers/net/hns3/hns3_ethdev.c          | 1 +
 drivers/net/hns3/hns3_ethdev_vf.c       | 1 +
 drivers/net/i40e/i40e_ethdev.c          | 1 +
 drivers/net/i40e/i40e_vf_representor.c  | 2 ++
 drivers/net/iavf/iavf_ethdev.c          | 1 +
 drivers/net/ice/ice_dcf_ethdev.c        | 1 +
 drivers/net/igc/igc_ethdev.c            | 1 +
 drivers/net/ipn3ke/ipn3ke_representor.c | 1 +
 drivers/net/mvpp2/mrvl_ethdev.c         | 2 ++
 drivers/net/octeontx2/otx2_ethdev_ops.c | 1 +
 drivers/net/qede/qede_ethdev.c          | 1 +
 drivers/net/sfc/sfc_ethdev.c            | 1 +
 drivers/net/softnic/rte_eth_softnic.c   | 1 +
 drivers/net/tap/rte_eth_tap.c           | 1 +
 drivers/net/txgbe/txgbe_ethdev.c        | 1 +
 drivers/net/txgbe/txgbe_ethdev_vf.c     | 1 +
 26 files changed, 31 insertions(+)
diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c
index c8dad8a7c5..257e6b0d6a 100644
--- a/drivers/net/bnxt/bnxt_ethdev.c
+++ b/drivers/net/bnxt/bnxt_ethdev.c
@@ -1000,6 +1000,7 @@ static int bnxt_dev_info_get_op(struct rte_eth_dev *eth_dev,
 	dev_info->speed_capa = bnxt_get_speed_capabilities(bp);
 	dev_info->dev_capa = RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP |
 			     RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP;
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 
 	dev_info->default_rxconf = (struct rte_eth_rxconf) {
 		.rx_thresh = {
diff --git a/drivers/net/bnxt/bnxt_reps.c b/drivers/net/bnxt/bnxt_reps.c
index 92beea3558..19da24b41d 100644
--- a/drivers/net/bnxt/bnxt_reps.c
+++ b/drivers/net/bnxt/bnxt_reps.c
@@ -546,6 +546,7 @@ int bnxt_rep_dev_info_get_op(struct rte_eth_dev *eth_dev,
 	dev_info->max_tx_queues = max_rx_rings;
 	dev_info->reta_size = bnxt_rss_hash_tbl_size(parent_bp);
 	dev_info->hash_key_size = 40;
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 
 	/* MTU specifics */
 	dev_info->min_mtu = RTE_ETHER_MIN_MTU;
diff --git a/drivers/net/cnxk/cnxk_ethdev_ops.c b/drivers/net/cnxk/cnxk_ethdev_ops.c
index 6746430265..62306b6cd6 100644
--- a/drivers/net/cnxk/cnxk_ethdev_ops.c
+++ b/drivers/net/cnxk/cnxk_ethdev_ops.c
@@ -68,6 +68,7 @@ cnxk_nix_info_get(struct rte_eth_dev *eth_dev, struct rte_eth_dev_info *devinfo)
 	devinfo->speed_capa = dev->speed_capa;
 	devinfo->dev_capa = RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP |
 			    RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP;
+	devinfo->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 	return 0;
 }
 
diff --git a/drivers/net/cxgbe/cxgbe_ethdev.c b/drivers/net/cxgbe/cxgbe_ethdev.c
index 4758321778..e7ea76180f 100644
--- a/drivers/net/cxgbe/cxgbe_ethdev.c
+++ b/drivers/net/cxgbe/cxgbe_ethdev.c
@@ -131,6 +131,8 @@ int cxgbe_dev_info_get(struct rte_eth_dev *eth_dev,
 	device_info->max_vfs = adapter->params.arch.vfcount;
 	device_info->max_vmdq_pools = 0; /* XXX: For now no support for VMDQ */
 
+	device_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
+
 	device_info->rx_queue_offload_capa = 0UL;
 	device_info->rx_offload_capa = CXGBE_RX_OFFLOADS;
 
diff --git a/drivers/net/dpaa2/dpaa2_ethdev.c b/drivers/net/dpaa2/dpaa2_ethdev.c
index 73d17f7b3c..a3706439d5 100644
--- a/drivers/net/dpaa2/dpaa2_ethdev.c
+++ b/drivers/net/dpaa2/dpaa2_ethdev.c
@@ -254,6 +254,7 @@ dpaa2_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	dev_info->speed_capa = RTE_ETH_LINK_SPEED_1G |
 			RTE_ETH_LINK_SPEED_2_5G |
 			RTE_ETH_LINK_SPEED_10G;
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 
 	dev_info->max_hash_mac_addrs = 0;
 	dev_info->max_vfs = 0;
diff --git a/drivers/net/e1000/em_ethdev.c b/drivers/net/e1000/em_ethdev.c
index 18fea4e0ac..31c4870086 100644
--- a/drivers/net/e1000/em_ethdev.c
+++ b/drivers/net/e1000/em_ethdev.c
@@ -1101,6 +1101,8 @@ eth_em_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 			RTE_ETH_LINK_SPEED_100M_HD | RTE_ETH_LINK_SPEED_100M |
 			RTE_ETH_LINK_SPEED_1G;
 
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
+
 	/* Preferred queue parameters */
 	dev_info->default_rxportconf.nb_queues = 1;
 	dev_info->default_txportconf.nb_queues = 1;
diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index ff06575f03..d0e2bc9814 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -2168,6 +2168,7 @@ eth_igb_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	dev_info->tx_queue_offload_capa = igb_get_tx_queue_offloads_capa(dev);
 	dev_info->tx_offload_capa = igb_get_tx_port_offloads_capa(dev) |
 				    dev_info->tx_queue_offload_capa;
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 
 	switch (hw->mac.type) {
 	case e1000_82575:
diff --git a/drivers/net/enic/enic_ethdev.c b/drivers/net/enic/enic_ethdev.c
index c8bdaf1a8e..163be09809 100644
--- a/drivers/net/enic/enic_ethdev.c
+++ b/drivers/net/enic/enic_ethdev.c
@@ -469,6 +469,7 @@ static int enicpmd_dev_info_get(struct rte_eth_dev *eth_dev,
 	device_info->rx_offload_capa = enic->rx_offload_capa;
 	device_info->tx_offload_capa = enic->tx_offload_capa;
 	device_info->tx_queue_offload_capa = enic->tx_queue_offload_capa;
+	device_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 	device_info->default_rxconf = (struct rte_eth_rxconf) {
 		.rx_free_thresh = ENIC_DEFAULT_RX_FREE_THRESH
 	};
diff --git a/drivers/net/failsafe/failsafe_ops.c b/drivers/net/failsafe/failsafe_ops.c
index 822883bc2f..55e21d635c 100644
--- a/drivers/net/failsafe/failsafe_ops.c
+++ b/drivers/net/failsafe/failsafe_ops.c
@@ -1227,6 +1227,7 @@ fs_dev_infos_get(struct rte_eth_dev *dev,
 	infos->dev_capa =
 		RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP |
 		RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP;
+	infos->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 
 	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_PROBED) {
 		struct rte_eth_dev_info sub_info;
diff --git a/drivers/net/hinic/hinic_pmd_ethdev.c b/drivers/net/hinic/hinic_pmd_ethdev.c
index 9cabd3e0c1..1853511c3b 100644
--- a/drivers/net/hinic/hinic_pmd_ethdev.c
+++ b/drivers/net/hinic/hinic_pmd_ethdev.c
@@ -751,6 +751,8 @@ hinic_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info)
 				RTE_ETH_TX_OFFLOAD_TCP_TSO |
 				RTE_ETH_TX_OFFLOAD_MULTI_SEGS;
 
+	info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
+
 	info->hash_key_size = HINIC_RSS_KEY_SIZE;
 	info->reta_size = HINIC_RSS_INDIR_SIZE;
 	info->flow_type_rss_offloads = HINIC_RSS_OFFLOAD_ALL;
diff --git a/drivers/net/hns3/hns3_ethdev.c b/drivers/net/hns3/hns3_ethdev.c
index 56eca03833..03447c8d4a 100644
--- a/drivers/net/hns3/hns3_ethdev.c
+++ b/drivers/net/hns3/hns3_ethdev.c
@@ -2598,6 +2598,7 @@ hns3_dev_infos_get(struct rte_eth_dev *eth_dev, struct rte_eth_dev_info *info)
 	if (hns3_dev_get_support(hw, INDEP_TXRX))
 		info->dev_capa = RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP |
 				 RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP;
+	info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 
 	if (hns3_dev_get_support(hw, PTP))
 		info->rx_offload_capa |= RTE_ETH_RX_OFFLOAD_TIMESTAMP;
diff --git a/drivers/net/hns3/hns3_ethdev_vf.c b/drivers/net/hns3/hns3_ethdev_vf.c
index 675db44e85..4a0d73fc29 100644
--- a/drivers/net/hns3/hns3_ethdev_vf.c
+++ b/drivers/net/hns3/hns3_ethdev_vf.c
@@ -699,6 +699,7 @@ hns3vf_dev_infos_get(struct rte_eth_dev *eth_dev, struct rte_eth_dev_info *info)
 	if (hns3_dev_get_support(hw, INDEP_TXRX))
 		info->dev_capa = RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP |
 				 RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP;
+	info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 
 	info->rx_desc_lim = (struct rte_eth_desc_lim) {
 		.nb_max = HNS3_MAX_RING_DESC,
diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 62e374d19e..9ea5f303ff 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -3750,6 +3750,7 @@ i40e_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	dev_info->dev_capa =
 		RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP |
 		RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP;
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 
 	dev_info->hash_key_size = (I40E_PFQF_HKEY_MAX_INDEX + 1) *
 						sizeof(uint32_t);
diff --git a/drivers/net/i40e/i40e_vf_representor.c b/drivers/net/i40e/i40e_vf_representor.c
index 663c46b91d..7f8e81858e 100644
--- a/drivers/net/i40e/i40e_vf_representor.c
+++ b/drivers/net/i40e/i40e_vf_representor.c
@@ -35,6 +35,8 @@ i40e_vf_representor_dev_infos_get(struct rte_eth_dev *ethdev,
 	/* get dev info for the vdev */
 	dev_info->device = ethdev->device;
 
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
+
 	dev_info->max_rx_queues = ethdev->data->nb_rx_queues;
 	dev_info->max_tx_queues = ethdev->data->nb_tx_queues;
 
diff --git a/drivers/net/iavf/iavf_ethdev.c b/drivers/net/iavf/iavf_ethdev.c
index 8ae15652cd..7bdf09b199 100644
--- a/drivers/net/iavf/iavf_ethdev.c
+++ b/drivers/net/iavf/iavf_ethdev.c
@@ -1057,6 +1057,7 @@ iavf_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	dev_info->reta_size = vf->vf_res->rss_lut_size;
 	dev_info->flow_type_rss_offloads = IAVF_RSS_OFFLOAD_ALL;
 	dev_info->max_mac_addrs = IAVF_NUM_MACADDR_MAX;
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 	dev_info->rx_offload_capa =
 		RTE_ETH_RX_OFFLOAD_VLAN_STRIP |
 		RTE_ETH_RX_OFFLOAD_QINQ_STRIP |
diff --git a/drivers/net/ice/ice_dcf_ethdev.c b/drivers/net/ice/ice_dcf_ethdev.c
index 4d9484e994..d1e6757641 100644
--- a/drivers/net/ice/ice_dcf_ethdev.c
+++ b/drivers/net/ice/ice_dcf_ethdev.c
@@ -663,6 +663,7 @@ ice_dcf_dev_info_get(struct rte_eth_dev *dev,
 	dev_info->hash_key_size = hw->vf_res->rss_key_size;
 	dev_info->reta_size = hw->vf_res->rss_lut_size;
 	dev_info->flow_type_rss_offloads = ICE_RSS_OFFLOAD_ALL;
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 
 	dev_info->rx_offload_capa =
 		RTE_ETH_RX_OFFLOAD_VLAN_STRIP |
diff --git a/drivers/net/igc/igc_ethdev.c b/drivers/net/igc/igc_ethdev.c
index 8189ad412a..3e2bf14b94 100644
--- a/drivers/net/igc/igc_ethdev.c
+++ b/drivers/net/igc/igc_ethdev.c
@@ -1477,6 +1477,7 @@ eth_igc_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	dev_info->min_rx_bufsize = 256; /* See BSIZE field of RCTL register. */
 	dev_info->max_rx_pktlen = MAX_RX_JUMBO_FRAME_SIZE;
 	dev_info->max_mac_addrs = hw->mac.rar_entry_count;
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 	dev_info->rx_offload_capa = IGC_RX_OFFLOAD_ALL;
 	dev_info->tx_offload_capa = IGC_TX_OFFLOAD_ALL;
 	dev_info->rx_queue_offload_capa = RTE_ETH_RX_OFFLOAD_VLAN_STRIP;
diff --git a/drivers/net/ipn3ke/ipn3ke_representor.c b/drivers/net/ipn3ke/ipn3ke_representor.c
index 1708858575..de325c7d29 100644
--- a/drivers/net/ipn3ke/ipn3ke_representor.c
+++ b/drivers/net/ipn3ke/ipn3ke_representor.c
@@ -96,6 +96,7 @@ ipn3ke_rpst_dev_infos_get(struct rte_eth_dev *ethdev,
 	dev_info->dev_capa =
 		RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP |
 		RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP;
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 
 	dev_info->switch_info.name = ethdev->device->name;
 	dev_info->switch_info.domain_id = rpst->switch_domain_id;
diff --git a/drivers/net/mvpp2/mrvl_ethdev.c b/drivers/net/mvpp2/mrvl_ethdev.c
index 25f213bda5..9c7fe13f7f 100644
--- a/drivers/net/mvpp2/mrvl_ethdev.c
+++ b/drivers/net/mvpp2/mrvl_ethdev.c
@@ -1709,6 +1709,8 @@ mrvl_dev_infos_get(struct rte_eth_dev *dev,
 {
 	struct mrvl_priv *priv = dev->data->dev_private;
 
+	info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
+
 	info->speed_capa = RTE_ETH_LINK_SPEED_10M |
 			   RTE_ETH_LINK_SPEED_100M |
 			   RTE_ETH_LINK_SPEED_1G |
diff --git a/drivers/net/octeontx2/otx2_ethdev_ops.c b/drivers/net/octeontx2/otx2_ethdev_ops.c
index d5caaa326a..48781514c3 100644
--- a/drivers/net/octeontx2/otx2_ethdev_ops.c
+++ b/drivers/net/octeontx2/otx2_ethdev_ops.c
@@ -583,6 +583,7 @@ otx2_nix_info_get(struct rte_eth_dev *eth_dev, struct rte_eth_dev_info *devinfo)
 
 	devinfo->dev_capa = RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP |
 				RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP;
+	devinfo->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 
 	return 0;
 }
diff --git a/drivers/net/qede/qede_ethdev.c b/drivers/net/qede/qede_ethdev.c
index 8ca00e7f6c..3e9aaeecd3 100644
--- a/drivers/net/qede/qede_ethdev.c
+++ b/drivers/net/qede/qede_ethdev.c
@@ -1367,6 +1367,7 @@ qede_dev_info_get(struct rte_eth_dev *eth_dev,
 	dev_info->max_rx_pktlen = (uint32_t)ETH_TX_MAX_NON_LSO_PKT_LEN;
 	dev_info->rx_desc_lim = qede_rx_desc_lim;
 	dev_info->tx_desc_lim = qede_tx_desc_lim;
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 
 	if (IS_PF(edev))
 		dev_info->max_rx_queues = (uint16_t)RTE_MIN(
diff --git a/drivers/net/sfc/sfc_ethdev.c b/drivers/net/sfc/sfc_ethdev.c
index 833d833a04..6b0a7e6b0c 100644
--- a/drivers/net/sfc/sfc_ethdev.c
+++ b/drivers/net/sfc/sfc_ethdev.c
@@ -186,6 +186,7 @@ sfc_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 
 	dev_info->dev_capa = RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP |
 			     RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP;
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 
 	if (mae->status == SFC_MAE_STATUS_SUPPORTED ||
 	    mae->status == SFC_MAE_STATUS_ADMIN) {
diff --git a/drivers/net/softnic/rte_eth_softnic.c b/drivers/net/softnic/rte_eth_softnic.c
index 3ef33818a9..8c098cad5b 100644
--- a/drivers/net/softnic/rte_eth_softnic.c
+++ b/drivers/net/softnic/rte_eth_softnic.c
@@ -93,6 +93,7 @@ pmd_dev_infos_get(struct rte_eth_dev *dev __rte_unused,
 	dev_info->max_rx_pktlen = UINT32_MAX;
 	dev_info->max_rx_queues = UINT16_MAX;
 	dev_info->max_tx_queues = UINT16_MAX;
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 
 	return 0;
 }
diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
index a9a7658147..37ac18f951 100644
--- a/drivers/net/tap/rte_eth_tap.c
+++ b/drivers/net/tap/rte_eth_tap.c
@@ -1006,6 +1006,7 @@ tap_dev_info(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	 * functions together and not in partial combinations
 	 */
 	dev_info->flow_type_rss_offloads = ~TAP_RSS_HF_MASK;
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 
 	return 0;
 }
diff --git a/drivers/net/txgbe/txgbe_ethdev.c b/drivers/net/txgbe/txgbe_ethdev.c
index fde9914e49..5c31ba5358 100644
--- a/drivers/net/txgbe/txgbe_ethdev.c
+++ b/drivers/net/txgbe/txgbe_ethdev.c
@@ -2597,6 +2597,7 @@ txgbe_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	dev_info->max_vfs = pci_dev->max_vfs;
 	dev_info->max_vmdq_pools = RTE_ETH_64_POOLS;
 	dev_info->vmdq_queue_num = dev_info->max_rx_queues;
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 	dev_info->rx_queue_offload_capa = txgbe_get_rx_queue_offloads(dev);
 	dev_info->rx_offload_capa = (txgbe_get_rx_port_offloads(dev) |
 				     dev_info->rx_queue_offload_capa);
diff --git a/drivers/net/txgbe/txgbe_ethdev_vf.c b/drivers/net/txgbe/txgbe_ethdev_vf.c
index 4dda55b0c2..67ae69dec3 100644
--- a/drivers/net/txgbe/txgbe_ethdev_vf.c
+++ b/drivers/net/txgbe/txgbe_ethdev_vf.c
@@ -487,6 +487,7 @@ txgbevf_dev_info_get(struct rte_eth_dev *dev,
 	dev_info->max_hash_mac_addrs = TXGBE_VMDQ_NUM_UC_MAC;
 	dev_info->max_vfs = pci_dev->max_vfs;
 	dev_info->max_vmdq_pools = RTE_ETH_64_POOLS;
+	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
 	dev_info->rx_queue_offload_capa = txgbe_get_rx_queue_offloads(dev);
 	dev_info->rx_offload_capa = (txgbe_get_rx_port_offloads(dev) |
 				     dev_info->rx_queue_offload_capa);
-- 
2.25.1
* [dpdk-dev] [PATCH v6 4/6] net/mlx5: discover max flow priority using DevX
  2021-11-02 17:01         ` [dpdk-dev] [PATCH v6 " Dmitry Kozlyuk
                             ` (2 preceding siblings ...)
  2021-11-02 17:01           ` [dpdk-dev] [PATCH v6 3/6] net: advertise no support for keeping flow rules Dmitry Kozlyuk
@ 2021-11-02 17:01           ` Dmitry Kozlyuk
  2021-11-02 17:01           ` [dpdk-dev] [PATCH v6 5/6] net/mlx5: create drop queue " Dmitry Kozlyuk
                             ` (2 subsequent siblings)
  6 siblings, 0 replies; 96+ messages in thread
From: Dmitry Kozlyuk @ 2021-11-02 17:01 UTC (permalink / raw)
  To: dev; +Cc: stable, Matan Azrad, Viacheslav Ovsiienko
The maximum available flow priority was discovered using the Verbs API
regardless of the selected flow engine. This required some Verbs
objects to be initialized even when the DevX engine was in use. Make
priority discovery an engine method and implement it for DevX using
the DevX API.
Cc: stable@dpdk.org
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
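A note for readers of the archive: the dispatch described in the commit
message can be sketched roughly as below. This is a standalone
illustration, not the driver code. All sketch_* names and the hw_prio_n
field are hypothetical stand-ins; the real engine methods probe the
device by creating and destroying a flow rule. Only the candidate list
{8, 16} and the mapping to 3 or 5 flow priorities are taken from the
patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a device. */
struct sketch_dev {
	uint16_t hw_prio_n; /* number of priorities the "hardware" accepts */
};

/* Engine method: return the largest candidate accepted by the device. */
typedef int (*sketch_discover_t)(const struct sketch_dev *dev,
				 const uint16_t *vprio, int vprio_n);

struct sketch_engine_ops {
	sketch_discover_t discover_priorities; /* may be NULL */
};

/* Probe candidates in ascending order and stop at the first rejection. */
static int
sketch_probe(const struct sketch_dev *dev, const uint16_t *vprio, int vprio_n)
{
	int i, ret = -ENOTSUP;

	assert(vprio != NULL && vprio_n > 0);
	for (i = 0; i < vprio_n; i++) {
		/* The highest priority value for N priorities is N - 1. */
		if (vprio[i] - 1 >= dev->hw_prio_n)
			break;
		ret = vprio[i];
	}
	return ret;
}

static const struct sketch_engine_ops sketch_ops = {
	.discover_priorities = sketch_probe,
};

/* Generic layer: dispatch to the engine, then map 8 -> 3 and 16 -> 5. */
static int
sketch_discover(const struct sketch_engine_ops *ops,
		const struct sketch_dev *dev)
{
	static const uint16_t vprio[] = { 8, 16 };
	int ret;

	if (ops->discover_priorities == NULL)
		return -ENOTSUP;
	ret = ops->discover_priorities(dev, vprio,
				       (int)(sizeof(vprio) / sizeof(vprio[0])));
	if (ret < 0)
		return ret;
	switch (ret) {
	case 8:
		return 3;
	case 16:
		return 5;
	default:
		return -ENOTSUP;
	}
}
```

The point of the split is that the generic layer owns the candidate list
and the mapping, while each engine only answers which candidate the
device accepts.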
 drivers/net/mlx5/mlx5_flow.c       |  98 +++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_flow.h       |   4 ++
 drivers/net/mlx5/mlx5_flow_dv.c    | 103 +++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_flow_verbs.c |  74 +++------------------
 4 files changed, 216 insertions(+), 63 deletions(-)
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 2385a0b550..3d8dd974ce 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -9700,3 +9700,101 @@ mlx5_flow_expand_rss_adjust_node(const struct rte_flow_item *pattern,
 	}
 	return node;
 }
+
+/* Map of Verbs to Flow priority with 8 Verbs priorities. */
+static const uint32_t priority_map_3[][MLX5_PRIORITY_MAP_MAX] = {
+	{ 0, 1, 2 }, { 2, 3, 4 }, { 5, 6, 7 },
+};
+
+/* Map of Verbs to Flow priority with 16 Verbs priorities. */
+static const uint32_t priority_map_5[][MLX5_PRIORITY_MAP_MAX] = {
+	{ 0, 1, 2 }, { 3, 4, 5 }, { 6, 7, 8 },
+	{ 9, 10, 11 }, { 12, 13, 14 },
+};
+
+/**
+ * Discover the number of available flow priorities.
+ *
+ * @param dev
+ *   Ethernet device.
+ *
+ * @return
+ *   On success, number of available flow priorities.
+ *   On failure, a negative errno-style code and rte_errno is set.
+ */
+int
+mlx5_flow_discover_priorities(struct rte_eth_dev *dev)
+{
+	static const uint16_t vprio[] = {8, 16};
+	const struct mlx5_priv *priv = dev->data->dev_private;
+	const struct mlx5_flow_driver_ops *fops;
+	enum mlx5_flow_drv_type type;
+	int ret;
+
+	type = mlx5_flow_os_get_type();
+	if (type == MLX5_FLOW_TYPE_MAX) {
+		type = MLX5_FLOW_TYPE_VERBS;
+		if (priv->sh->devx && priv->config.dv_flow_en)
+			type = MLX5_FLOW_TYPE_DV;
+	}
+	fops = flow_get_drv_ops(type);
+	if (fops->discover_priorities == NULL) {
+		DRV_LOG(ERR, "Priority discovery not supported");
+		rte_errno = ENOTSUP;
+		return -rte_errno;
+	}
+	ret = fops->discover_priorities(dev, vprio, RTE_DIM(vprio));
+	if (ret < 0)
+		return ret;
+	switch (ret) {
+	case 8:
+		ret = RTE_DIM(priority_map_3);
+		break;
+	case 16:
+		ret = RTE_DIM(priority_map_5);
+		break;
+	default:
+		rte_errno = ENOTSUP;
+		DRV_LOG(ERR,
+			"port %u maximum priority: %d expected 8/16",
+			dev->data->port_id, ret);
+		return -rte_errno;
+	}
+	DRV_LOG(INFO, "port %u supported flow priorities:"
+		" 0-%d for ingress or egress root table,"
+		" 0-%d for non-root table or transfer root table.",
+		dev->data->port_id, ret - 2,
+		MLX5_NON_ROOT_FLOW_MAX_PRIO - 1);
+	return ret;
+}
+
+/**
+ * Adjust flow priority based on the highest layer and the request priority.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[in] priority
+ *   The rule base priority.
+ * @param[in] subpriority
+ *   The priority based on the items.
+ *
+ * @return
+ *   The new priority.
+ */
+uint32_t
+mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
+			  uint32_t subpriority)
+{
+	uint32_t res = 0;
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	switch (priv->sh->flow_max_priority) {
+	case RTE_DIM(priority_map_3):
+		res = priority_map_3[priority][subpriority];
+		break;
+	case RTE_DIM(priority_map_5):
+		res = priority_map_5[priority][subpriority];
+		break;
+	}
+	return  res;
+}
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 5509c28f01..8b83fa6f67 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1232,6 +1232,9 @@ typedef int (*mlx5_flow_create_def_policy_t)
 			(struct rte_eth_dev *dev);
 typedef void (*mlx5_flow_destroy_def_policy_t)
 			(struct rte_eth_dev *dev);
+typedef int (*mlx5_flow_discover_priorities_t)
+			(struct rte_eth_dev *dev,
+			 const uint16_t *vprio, int vprio_n);
 
 struct mlx5_flow_driver_ops {
 	mlx5_flow_validate_t validate;
@@ -1266,6 +1269,7 @@ struct mlx5_flow_driver_ops {
 	mlx5_flow_action_update_t action_update;
 	mlx5_flow_action_query_t action_query;
 	mlx5_flow_sync_domain_t sync_domain;
+	mlx5_flow_discover_priorities_t discover_priorities;
 };
 
 /* mlx5_flow.c */
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 8962d26c75..aaf96fc297 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -17932,6 +17932,108 @@ flow_dv_sync_domain(struct rte_eth_dev *dev, uint32_t domains, uint32_t flags)
 	return 0;
 }
 
+/**
+ * Discover the number of available flow priorities
+ * by trying to create a flow with the highest priority value
+ * for each possible number.
+ *
+ * @param[in] dev
+ *   Ethernet device.
+ * @param[in] vprio
+ *   List of possible numbers of available priorities.
+ * @param[in] vprio_n
+ *   Size of @p vprio array.
+ * @return
+ *   On success, number of available flow priorities.
+ *   On failure, a negative errno-style code and rte_errno is set.
+ */
+static int
+flow_dv_discover_priorities(struct rte_eth_dev *dev,
+			    const uint16_t *vprio, int vprio_n)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_indexed_pool *pool = priv->sh->ipool[MLX5_IPOOL_MLX5_FLOW];
+	struct rte_flow_item_eth eth;
+	struct rte_flow_item item = {
+		.type = RTE_FLOW_ITEM_TYPE_ETH,
+		.spec = &eth,
+		.mask = &eth,
+	};
+	struct mlx5_flow_dv_matcher matcher = {
+		.mask = {
+			.size = sizeof(matcher.mask.buf),
+		},
+	};
+	union mlx5_flow_tbl_key tbl_key;
+	struct mlx5_flow flow;
+	void *action;
+	struct rte_flow_error error;
+	uint8_t misc_mask;
+	int i, err, ret = -ENOTSUP;
+
+	/*
+	 * Prepare a flow with a catch-all pattern and a drop action.
+	 * Use drop queue, because shared drop action may be unavailable.
+	 */
+	action = priv->drop_queue.hrxq->action;
+	if (action == NULL) {
+		DRV_LOG(ERR, "Priority discovery requires a drop action");
+		rte_errno = ENOTSUP;
+		return -rte_errno;
+	}
+	memset(&flow, 0, sizeof(flow));
+	flow.handle = mlx5_ipool_zmalloc(pool, &flow.handle_idx);
+	if (flow.handle == NULL) {
+		DRV_LOG(ERR, "Cannot create flow handle");
+		rte_errno = ENOMEM;
+		return -rte_errno;
+	}
+	flow.ingress = true;
+	flow.dv.value.size = MLX5_ST_SZ_BYTES(fte_match_param);
+	flow.dv.actions[0] = action;
+	flow.dv.actions_n = 1;
+	memset(&eth, 0, sizeof(eth));
+	flow_dv_translate_item_eth(matcher.mask.buf, flow.dv.value.buf,
+				   &item, /* inner */ false, /* group */ 0);
+	matcher.crc = rte_raw_cksum(matcher.mask.buf, matcher.mask.size);
+	for (i = 0; i < vprio_n; i++) {
+		/* Configure the next proposed maximum priority. */
+		matcher.priority = vprio[i] - 1;
+		memset(&tbl_key, 0, sizeof(tbl_key));
+		err = flow_dv_matcher_register(dev, &matcher, &tbl_key, &flow,
+					       /* tunnel */ NULL,
+					       /* group */ 0,
+					       &error);
+		if (err != 0) {
+			/* This action is pure SW and must always succeed. */
+			DRV_LOG(ERR, "Cannot register matcher");
+			ret = -rte_errno;
+			break;
+		}
+		/* Try to apply the flow to HW. */
+		misc_mask = flow_dv_matcher_enable(flow.dv.value.buf);
+		__flow_dv_adjust_buf_size(&flow.dv.value.size, misc_mask);
+		err = mlx5_flow_os_create_flow
+				(flow.handle->dvh.matcher->matcher_object,
+				 (void *)&flow.dv.value, flow.dv.actions_n,
+				 flow.dv.actions, &flow.handle->drv_flow);
+		if (err == 0) {
+			claim_zero(mlx5_flow_os_destroy_flow
+						(flow.handle->drv_flow));
+			flow.handle->drv_flow = NULL;
+		}
+		claim_zero(flow_dv_matcher_release(dev, flow.handle));
+		if (err != 0)
+			break;
+		ret = vprio[i];
+	}
+	mlx5_ipool_free(pool, flow.handle_idx);
+	/* Set rte_errno if no expected priority value matched. */
+	if (ret < 0)
+		rte_errno = -ret;
+	return ret;
+}
+
 const struct mlx5_flow_driver_ops mlx5_flow_dv_drv_ops = {
 	.validate = flow_dv_validate,
 	.prepare = flow_dv_prepare,
@@ -17965,6 +18067,7 @@ const struct mlx5_flow_driver_ops mlx5_flow_dv_drv_ops = {
 	.action_update = flow_dv_action_update,
 	.action_query = flow_dv_action_query,
 	.sync_domain = flow_dv_sync_domain,
+	.discover_priorities = flow_dv_discover_priorities,
 };
 
 #endif /* HAVE_IBV_FLOW_DV_SUPPORT */
diff --git a/drivers/net/mlx5/mlx5_flow_verbs.c b/drivers/net/mlx5/mlx5_flow_verbs.c
index 0a89a136a2..29cd694752 100644
--- a/drivers/net/mlx5/mlx5_flow_verbs.c
+++ b/drivers/net/mlx5/mlx5_flow_verbs.c
@@ -28,17 +28,6 @@
 #define VERBS_SPEC_INNER(item_flags) \
 	(!!((item_flags) & MLX5_FLOW_LAYER_TUNNEL) ? IBV_FLOW_SPEC_INNER : 0)
 
-/* Map of Verbs to Flow priority with 8 Verbs priorities. */
-static const uint32_t priority_map_3[][MLX5_PRIORITY_MAP_MAX] = {
-	{ 0, 1, 2 }, { 2, 3, 4 }, { 5, 6, 7 },
-};
-
-/* Map of Verbs to Flow priority with 16 Verbs priorities. */
-static const uint32_t priority_map_5[][MLX5_PRIORITY_MAP_MAX] = {
-	{ 0, 1, 2 }, { 3, 4, 5 }, { 6, 7, 8 },
-	{ 9, 10, 11 }, { 12, 13, 14 },
-};
-
 /* Verbs specification header. */
 struct ibv_spec_header {
 	enum ibv_flow_spec_type type;
@@ -50,13 +39,17 @@ struct ibv_spec_header {
  *
  * @param[in] dev
  *   Pointer to the Ethernet device structure.
- *
+ * @param[in] vprio
+ *   Expected result variants.
+ * @param[in] vprio_n
+ *   Number of entries in @p vprio array.
  * @return
- *   number of supported flow priority on success, a negative errno
+ *   Number of supported flow priorities on success, a negative errno
  *   value otherwise and rte_errno is set.
  */
-int
-mlx5_flow_discover_priorities(struct rte_eth_dev *dev)
+static int
+flow_verbs_discover_priorities(struct rte_eth_dev *dev,
+			       const uint16_t *vprio, int vprio_n)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct {
@@ -79,20 +72,19 @@ mlx5_flow_discover_priorities(struct rte_eth_dev *dev)
 	};
 	struct ibv_flow *flow;
 	struct mlx5_hrxq *drop = priv->drop_queue.hrxq;
-	uint16_t vprio[] = { 8, 16 };
 	int i;
 	int priority = 0;
 
 #if defined(HAVE_MLX5DV_DR_DEVX_PORT) || defined(HAVE_MLX5DV_DR_DEVX_PORT_V35)
 	/* If DevX supported, driver must support 16 verbs flow priorities. */
-	priority = RTE_DIM(priority_map_5);
+	priority = 16;
 	goto out;
 #endif
 	if (!drop->qp) {
 		rte_errno = ENOTSUP;
 		return -rte_errno;
 	}
-	for (i = 0; i != RTE_DIM(vprio); i++) {
+	for (i = 0; i != vprio_n; i++) {
 		flow_attr.attr.priority = vprio[i] - 1;
 		flow = mlx5_glue->create_flow(drop->qp, &flow_attr.attr);
 		if (!flow)
@@ -100,20 +92,6 @@ mlx5_flow_discover_priorities(struct rte_eth_dev *dev)
 		claim_zero(mlx5_glue->destroy_flow(flow));
 		priority = vprio[i];
 	}
-	switch (priority) {
-	case 8:
-		priority = RTE_DIM(priority_map_3);
-		break;
-	case 16:
-		priority = RTE_DIM(priority_map_5);
-		break;
-	default:
-		rte_errno = ENOTSUP;
-		DRV_LOG(ERR,
-			"port %u verbs maximum priority: %d expected 8/16",
-			dev->data->port_id, priority);
-		return -rte_errno;
-	}
 #if defined(HAVE_MLX5DV_DR_DEVX_PORT) || defined(HAVE_MLX5DV_DR_DEVX_PORT_V35)
 out:
 #endif
@@ -125,37 +103,6 @@ mlx5_flow_discover_priorities(struct rte_eth_dev *dev)
 	return priority;
 }
 
-/**
- * Adjust flow priority based on the highest layer and the request priority.
- *
- * @param[in] dev
- *   Pointer to the Ethernet device structure.
- * @param[in] priority
- *   The rule base priority.
- * @param[in] subpriority
- *   The priority based on the items.
- *
- * @return
- *   The new priority.
- */
-uint32_t
-mlx5_flow_adjust_priority(struct rte_eth_dev *dev, int32_t priority,
-				   uint32_t subpriority)
-{
-	uint32_t res = 0;
-	struct mlx5_priv *priv = dev->data->dev_private;
-
-	switch (priv->sh->flow_max_priority) {
-	case RTE_DIM(priority_map_3):
-		res = priority_map_3[priority][subpriority];
-		break;
-	case RTE_DIM(priority_map_5):
-		res = priority_map_5[priority][subpriority];
-		break;
-	}
-	return  res;
-}
-
 /**
  * Get Verbs flow counter by index.
  *
@@ -2095,4 +2042,5 @@ const struct mlx5_flow_driver_ops mlx5_flow_verbs_drv_ops = {
 	.destroy = flow_verbs_destroy,
 	.query = flow_verbs_query,
 	.sync_domain = flow_verbs_sync_domain,
+	.discover_priorities = flow_verbs_discover_priorities,
 };
-- 
2.25.1
* [dpdk-dev] [PATCH v6 5/6] net/mlx5: create drop queue using DevX
  2021-11-02 17:01         ` [dpdk-dev] [PATCH v6 " Dmitry Kozlyuk
                             ` (3 preceding siblings ...)
  2021-11-02 17:01           ` [dpdk-dev] [PATCH v6 4/6] net/mlx5: discover max flow priority using DevX Dmitry Kozlyuk
@ 2021-11-02 17:01           ` Dmitry Kozlyuk
  2021-11-02 17:01           ` [dpdk-dev] [PATCH v6 6/6] net/mlx5: preserve indirect actions on restart Dmitry Kozlyuk
  2021-11-02 18:02           ` [dpdk-dev] [PATCH v6 0/6] Flow entites behavior on port restart Ferruh Yigit
  6 siblings, 0 replies; 96+ messages in thread
From: Dmitry Kozlyuk @ 2021-11-02 17:01 UTC (permalink / raw)
  To: dev; +Cc: stable, Matan Azrad, Viacheslav Ovsiienko
Drop queue creation and destruction were not implemented for the DevX
flow engine, and the Verbs engine methods were used as a workaround.
Implement these methods for DevX so that a valid queue ID is always
available, regardless of the queue configuration done via API.
Cc: stable@dpdk.org
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
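A note for readers of the archive: the "NULL queues designate drop
queue" convention used in this patch can be illustrated with the
standalone sketch below. It is not the driver code. The function name,
the rq_id_of lookup table, and the cyclic wrap over the configured
queues are assumptions made for the illustration; only the NULL branch
mirrors what the patch adds to the RQT attribute setup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Fill an indirection table of rqt_n entries.
 * If queues is NULL, every entry points at the drop queue.
 * Otherwise entries cycle over the configured queues (sketch assumption).
 */
static void
sketch_fill_rq_list(uint32_t *rq_list, unsigned int rqt_n,
		    const uint16_t *queues, unsigned int queues_n,
		    const uint32_t *rq_id_of, uint32_t drop_rq_id)
{
	unsigned int i;

	assert(rq_list != NULL);
	if (queues == NULL) {
		/* No Rx queues needed: the table is valid with the drop queue. */
		for (i = 0; i < rqt_n; i++)
			rq_list[i] = drop_rq_id;
		return;
	}
	assert(queues_n > 0 && rq_id_of != NULL);
	for (i = 0; i < rqt_n; i++)
		rq_list[i] = rq_id_of[queues[i % queues_n]];
}
```

With this convention a table can be built while the port is stopped,
because the drop queue ID does not depend on any Rx queue being set up.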
 drivers/net/mlx5/linux/mlx5_os.c |   4 -
 drivers/net/mlx5/mlx5_devx.c     | 211 ++++++++++++++++++++++++++-----
 2 files changed, 180 insertions(+), 35 deletions(-)
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index f31f1e96c6..dd4fc0c716 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1690,10 +1690,6 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 	}
 	if (sh->devx && config->dv_flow_en && config->dest_tir) {
 		priv->obj_ops = devx_obj_ops;
-		priv->obj_ops.drop_action_create =
-						ibv_obj_ops.drop_action_create;
-		priv->obj_ops.drop_action_destroy =
-						ibv_obj_ops.drop_action_destroy;
 		mlx5_queue_counter_id_prepare(eth_dev);
 		priv->obj_ops.lb_dummy_queue_create =
 					mlx5_rxq_ibv_obj_dummy_lb_create;
diff --git a/drivers/net/mlx5/mlx5_devx.c b/drivers/net/mlx5/mlx5_devx.c
index 7ed774e804..424f77be79 100644
--- a/drivers/net/mlx5/mlx5_devx.c
+++ b/drivers/net/mlx5/mlx5_devx.c
@@ -226,18 +226,18 @@ mlx5_rx_devx_get_event(struct mlx5_rxq_obj *rxq_obj)
  *
  * @param dev
  *   Pointer to Ethernet device.
- * @param idx
- *   Queue index in DPDK Rx queue array.
+ * @param rxq_data
+ *   RX queue data.
  *
  * @return
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-mlx5_rxq_create_devx_rq_resources(struct rte_eth_dev *dev, uint16_t idx)
+mlx5_rxq_create_devx_rq_resources(struct rte_eth_dev *dev,
+				  struct mlx5_rxq_data *rxq_data)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_common_device *cdev = priv->sh->cdev;
-	struct mlx5_rxq_data *rxq_data = (*priv->rxqs)[idx];
 	struct mlx5_rxq_ctrl *rxq_ctrl =
 		container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
 	struct mlx5_devx_create_rq_attr rq_attr = { 0 };
@@ -290,20 +290,20 @@ mlx5_rxq_create_devx_rq_resources(struct rte_eth_dev *dev, uint16_t idx)
  *
  * @param dev
  *   Pointer to Ethernet device.
- * @param idx
- *   Queue index in DPDK Rx queue array.
+ * @param rxq_data
+ *   RX queue data.
  *
  * @return
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-mlx5_rxq_create_devx_cq_resources(struct rte_eth_dev *dev, uint16_t idx)
+mlx5_rxq_create_devx_cq_resources(struct rte_eth_dev *dev,
+				  struct mlx5_rxq_data *rxq_data)
 {
 	struct mlx5_devx_cq *cq_obj = 0;
 	struct mlx5_devx_cq_attr cq_attr = { 0 };
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_dev_ctx_shared *sh = priv->sh;
-	struct mlx5_rxq_data *rxq_data = (*priv->rxqs)[idx];
 	struct mlx5_rxq_ctrl *rxq_ctrl =
 		container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
 	unsigned int cqe_n = mlx5_rxq_cqe_num(rxq_data);
@@ -498,13 +498,13 @@ mlx5_rxq_devx_obj_new(struct rte_eth_dev *dev, uint16_t idx)
 		tmpl->fd = mlx5_os_get_devx_channel_fd(tmpl->devx_channel);
 	}
 	/* Create CQ using DevX API. */
-	ret = mlx5_rxq_create_devx_cq_resources(dev, idx);
+	ret = mlx5_rxq_create_devx_cq_resources(dev, rxq_data);
 	if (ret) {
 		DRV_LOG(ERR, "Failed to create CQ.");
 		goto error;
 	}
 	/* Create RQ using DevX API. */
-	ret = mlx5_rxq_create_devx_rq_resources(dev, idx);
+	ret = mlx5_rxq_create_devx_rq_resources(dev, rxq_data);
 	if (ret) {
 		DRV_LOG(ERR, "Port %u Rx queue %u RQ creation failure.",
 			dev->data->port_id, idx);
@@ -537,6 +537,11 @@ mlx5_rxq_devx_obj_new(struct rte_eth_dev *dev, uint16_t idx)
  *   Pointer to Ethernet device.
  * @param log_n
  *   Log of number of queues in the array.
+ * @param queues
+ *   List of RX queue indices or NULL, in which case
+ *   the attribute will be filled by drop queue ID.
+ * @param queues_n
+ *   Size of @p queues array or 0 if it is NULL.
  * @param ind_tbl
  *   DevX indirection table object.
  *
@@ -564,6 +569,11 @@ mlx5_devx_ind_table_create_rqt_attr(struct rte_eth_dev *dev,
 	}
 	rqt_attr->rqt_max_size = priv->config.ind_table_max_size;
 	rqt_attr->rqt_actual_size = rqt_n;
+	if (queues == NULL) {
+		for (i = 0; i < rqt_n; i++)
+			rqt_attr->rq_list[i] = priv->drop_queue.rxq->rq->id;
+		return rqt_attr;
+	}
 	for (i = 0; i != queues_n; ++i) {
 		struct mlx5_rxq_data *rxq = (*priv->rxqs)[queues[i]];
 		struct mlx5_rxq_ctrl *rxq_ctrl =
@@ -596,11 +606,12 @@ mlx5_devx_ind_table_new(struct rte_eth_dev *dev, const unsigned int log_n,
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_devx_rqt_attr *rqt_attr = NULL;
+	const uint16_t *queues = dev->data->dev_started ? ind_tbl->queues :
+							  NULL;
 
 	MLX5_ASSERT(ind_tbl);
-	rqt_attr = mlx5_devx_ind_table_create_rqt_attr(dev, log_n,
-							ind_tbl->queues,
-							ind_tbl->queues_n);
+	rqt_attr = mlx5_devx_ind_table_create_rqt_attr(dev, log_n, queues,
+						       ind_tbl->queues_n);
 	if (!rqt_attr)
 		return -rte_errno;
 	ind_tbl->rqt = mlx5_devx_cmd_create_rqt(priv->sh->cdev->ctx, rqt_attr);
@@ -671,7 +682,8 @@ mlx5_devx_ind_table_destroy(struct mlx5_ind_table_obj *ind_tbl)
  * @param[in] hash_fields
  *   Verbs protocol hash field to make the RSS on.
  * @param[in] ind_tbl
- *   Indirection table for TIR.
+ *   Indirection table for TIR. If table queues array is NULL,
+ *   a TIR for drop queue is assumed.
  * @param[in] tunnel
  *   Tunnel type.
  * @param[out] tir_attr
@@ -687,19 +699,27 @@ mlx5_devx_tir_attr_set(struct rte_eth_dev *dev, const uint8_t *rss_key,
 		       int tunnel, struct mlx5_devx_tir_attr *tir_attr)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_rxq_data *rxq_data = (*priv->rxqs)[ind_tbl->queues[0]];
-	struct mlx5_rxq_ctrl *rxq_ctrl =
-		container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
-	enum mlx5_rxq_type rxq_obj_type = rxq_ctrl->type;
+	enum mlx5_rxq_type rxq_obj_type;
 	bool lro = true;
 	uint32_t i;
 
-	/* Enable TIR LRO only if all the queues were configured for. */
-	for (i = 0; i < ind_tbl->queues_n; ++i) {
-		if (!(*priv->rxqs)[ind_tbl->queues[i]]->lro) {
-			lro = false;
-			break;
+	/* NULL queues designate drop queue. */
+	if (ind_tbl->queues != NULL) {
+		struct mlx5_rxq_data *rxq_data =
+					(*priv->rxqs)[ind_tbl->queues[0]];
+		struct mlx5_rxq_ctrl *rxq_ctrl =
+			container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
+		rxq_obj_type = rxq_ctrl->type;
+
+		/* Enable TIR LRO only if all the queues were configured for. */
+		for (i = 0; i < ind_tbl->queues_n; ++i) {
+			if (!(*priv->rxqs)[ind_tbl->queues[i]]->lro) {
+				lro = false;
+				break;
+			}
 		}
+	} else {
+		rxq_obj_type = priv->drop_queue.rxq->rxq_ctrl->type;
 	}
 	memset(tir_attr, 0, sizeof(*tir_attr));
 	tir_attr->disp_type = MLX5_TIRC_DISP_TYPE_INDIRECT;
@@ -858,7 +878,7 @@ mlx5_devx_hrxq_modify(struct rte_eth_dev *dev, struct mlx5_hrxq *hrxq,
 }
 
 /**
- * Create a DevX drop action for Rx Hash queue.
+ * Create a DevX drop Rx queue.
  *
  * @param dev
  *   Pointer to Ethernet device.
@@ -867,14 +887,99 @@ mlx5_devx_hrxq_modify(struct rte_eth_dev *dev, struct mlx5_hrxq *hrxq,
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-mlx5_devx_drop_action_create(struct rte_eth_dev *dev)
+mlx5_rxq_devx_obj_drop_create(struct rte_eth_dev *dev)
 {
-	(void)dev;
-	DRV_LOG(ERR, "DevX drop action is not supported yet.");
-	rte_errno = ENOTSUP;
+	struct mlx5_priv *priv = dev->data->dev_private;
+	int socket_id = dev->device->numa_node;
+	struct mlx5_rxq_ctrl *rxq_ctrl;
+	struct mlx5_rxq_data *rxq_data;
+	struct mlx5_rxq_obj *rxq = NULL;
+	int ret;
+
+	/*
+	 * Initialize dummy control structures.
+	 * They are required to hold pointers for cleanup
+	 * and are only accessible via drop queue DevX objects.
+	 */
+	rxq_ctrl = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*rxq_ctrl),
+			       0, socket_id);
+	if (rxq_ctrl == NULL) {
+		DRV_LOG(ERR, "Port %u could not allocate drop queue control",
+			dev->data->port_id);
+		rte_errno = ENOMEM;
+		goto error;
+	}
+	rxq = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*rxq), 0, socket_id);
+	if (rxq == NULL) {
+		DRV_LOG(ERR, "Port %u could not allocate drop queue object",
+			dev->data->port_id);
+		rte_errno = ENOMEM;
+		goto error;
+	}
+	rxq->rxq_ctrl = rxq_ctrl;
+	rxq_ctrl->type = MLX5_RXQ_TYPE_STANDARD;
+	rxq_ctrl->priv = priv;
+	rxq_ctrl->obj = rxq;
+	rxq_data = &rxq_ctrl->rxq;
+	/* Create CQ using DevX API. */
+	ret = mlx5_rxq_create_devx_cq_resources(dev, rxq_data);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Port %u drop queue CQ creation failed.",
+			dev->data->port_id);
+		goto error;
+	}
+	/* Create RQ using DevX API. */
+	ret = mlx5_rxq_create_devx_rq_resources(dev, rxq_data);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Port %u drop queue RQ creation failed.",
+			dev->data->port_id);
+		rte_errno = ENOMEM;
+		goto error;
+	}
+	/* Change queue state to ready. */
+	ret = mlx5_devx_modify_rq(rxq, MLX5_RXQ_MOD_RST2RDY);
+	if (ret != 0)
+		goto error;
+	/* Initialize drop queue. */
+	priv->drop_queue.rxq = rxq;
+	return 0;
+error:
+	ret = rte_errno; /* Save rte_errno before cleanup. */
+	if (rxq != NULL) {
+		if (rxq->rq_obj.rq != NULL)
+			mlx5_devx_rq_destroy(&rxq->rq_obj);
+		if (rxq->cq_obj.cq != NULL)
+			mlx5_devx_cq_destroy(&rxq->cq_obj);
+		if (rxq->devx_channel)
+			mlx5_os_devx_destroy_event_channel
+							(rxq->devx_channel);
+		mlx5_free(rxq);
+	}
+	if (rxq_ctrl != NULL)
+		mlx5_free(rxq_ctrl);
+	rte_errno = ret; /* Restore rte_errno. */
 	return -rte_errno;
 }
 
+/**
+ * Release drop Rx queue resources.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ */
+static void
+mlx5_rxq_devx_obj_drop_release(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_rxq_obj *rxq = priv->drop_queue.rxq;
+	struct mlx5_rxq_ctrl *rxq_ctrl = rxq->rxq_ctrl;
+
+	mlx5_rxq_devx_obj_release(rxq);
+	mlx5_free(rxq);
+	mlx5_free(rxq_ctrl);
+	priv->drop_queue.rxq = NULL;
+}
+
 /**
  * Release a drop hash Rx queue.
  *
@@ -884,9 +989,53 @@ mlx5_devx_drop_action_create(struct rte_eth_dev *dev)
 static void
 mlx5_devx_drop_action_destroy(struct rte_eth_dev *dev)
 {
-	(void)dev;
-	DRV_LOG(ERR, "DevX drop action is not supported yet.");
-	rte_errno = ENOTSUP;
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_hrxq *hrxq = priv->drop_queue.hrxq;
+
+	if (hrxq->tir != NULL)
+		mlx5_devx_tir_destroy(hrxq);
+	if (hrxq->ind_table->ind_table != NULL)
+		mlx5_devx_ind_table_destroy(hrxq->ind_table);
+	if (priv->drop_queue.rxq->rq != NULL)
+		mlx5_rxq_devx_obj_drop_release(dev);
+}
+
+/**
+ * Create a DevX drop action for Rx Hash queue.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_devx_drop_action_create(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_hrxq *hrxq = priv->drop_queue.hrxq;
+	int ret;
+
+	ret = mlx5_rxq_devx_obj_drop_create(dev);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Cannot create drop RX queue");
+		return ret;
+	}
+	/* hrxq->ind_table queues are NULL, drop RX queue ID will be used */
+	ret = mlx5_devx_ind_table_new(dev, 0, hrxq->ind_table);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Cannot create drop hash RX queue indirection table");
+		goto error;
+	}
+	ret = mlx5_devx_hrxq_new(dev, hrxq, /* tunnel */ false);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Cannot create drop hash RX queue");
+		goto error;
+	}
+	return 0;
+error:
+	mlx5_devx_drop_action_destroy(dev);
+	return ret;
 }
 
 /**
-- 
2.25.1
* [dpdk-dev] [PATCH v6 6/6] net/mlx5: preserve indirect actions on restart
  2021-11-02 17:01         ` [dpdk-dev] [PATCH v6 " Dmitry Kozlyuk
                             ` (4 preceding siblings ...)
  2021-11-02 17:01           ` [dpdk-dev] [PATCH v6 5/6] net/mlx5: create drop queue " Dmitry Kozlyuk
@ 2021-11-02 17:01           ` Dmitry Kozlyuk
  2021-11-02 18:02           ` [dpdk-dev] [PATCH v6 0/6] Flow entites behavior on port restart Ferruh Yigit
  6 siblings, 0 replies; 96+ messages in thread
From: Dmitry Kozlyuk @ 2021-11-02 17:01 UTC (permalink / raw)
  To: dev; +Cc: bingz, stable, Matan Azrad, Viacheslav Ovsiienko
MLX5 PMD uses reference counting to manage RX queue resources.
After port stop, shared RSS actions kept their references to RX queues,
preventing resource release. As a result, the internal PMD mempool
for such queues was exhausted after a number of port restarts.
Diagnostic message from rte_eth_dev_start():
    Rx queue allocation failed: Cannot allocate memory
Release the references to RX queues held by indirect actions on port stop
(detach) and restore them on port start (attach), so that RX queue
resources can be released while indirect RSS is kept across the port restart.
Replace queue IDs in HW with the drop queue ID on detach and restore
the actual queue IDs on attach.
When the port is stopped, create indirect RSS in the detached state.
As a result, MLX5 PMD is able to keep all its indirect actions
across port restart. Advertise this capability.
Fixes: 4b61b8774be9 ("ethdev: introduce indirect flow action")
Cc: bingz@nvidia.com
Cc: stable@dpdk.org
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
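The detach/attach scheme from the commit message can be sketched in
isolation as follows. This is a minimal toy model under stated assumptions,
not the PMD code: demo_rxq, demo_ind_tbl, demo_attach() and demo_detach()
are hypothetical names, and the HW update (pointing the table at the drop
queue on detach) is left out. It only shows that the table remembers its
queue list while the references are dropped, and that attach validates
every queue before taking any reference.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_QUEUES_MAX 4

/* Hypothetical stand-in for an Rx queue with a reference counter. */
struct demo_rxq {
	int configured; /* Non-zero if the queue exists in the current config. */
	int refcnt;     /* References held by attached indirection tables. */
};

/* Hypothetical stand-in for the indirection table of an indirect RSS. */
struct demo_ind_tbl {
	unsigned int queues[DEMO_QUEUES_MAX]; /* Remembered across restart. */
	unsigned int queues_n;
	int attached;
};

/* Port stop: drop the queue references, keep the queue list. */
void
demo_detach(struct demo_rxq *rxqs, struct demo_ind_tbl *tbl)
{
	unsigned int i;

	if (!tbl->attached)
		return;
	for (i = 0; i < tbl->queues_n; i++) {
		assert(rxqs[tbl->queues[i]].refcnt > 0);
		rxqs[tbl->queues[i]].refcnt--;
	}
	tbl->attached = 0;
}

/* Port start: validate all remembered queues, then take the references. */
int
demo_attach(struct demo_rxq *rxqs, unsigned int rxqs_n,
	    struct demo_ind_tbl *tbl)
{
	unsigned int i;

	for (i = 0; i < tbl->queues_n; i++) {
		if (tbl->queues[i] >= rxqs_n ||
		    !rxqs[tbl->queues[i]].configured)
			return -1; /* No reference taken yet, no rollback. */
	}
	for (i = 0; i < tbl->queues_n; i++)
		rxqs[tbl->queues[i]].refcnt++;
	tbl->attached = 1;
	return 0;
}
```

In this model, a table over queues {0, 2} holds one reference on each queue
while attached and none after demo_detach(), so the queues can be freed
while the port is stopped. If the port is then reconfigured with only two
queues, demo_attach() fails without touching any counter.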
 drivers/net/mlx5/mlx5_ethdev.c  |   1 +
 drivers/net/mlx5/mlx5_flow.c    | 194 ++++++++++++++++++++++++++++----
 drivers/net/mlx5/mlx5_flow.h    |   2 +
 drivers/net/mlx5/mlx5_rx.h      |   4 +
 drivers/net/mlx5/mlx5_rxq.c     |  99 ++++++++++++++--
 drivers/net/mlx5/mlx5_trigger.c |  10 ++
 6 files changed, 276 insertions(+), 34 deletions(-)
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index f2b78c3cc6..81fa8845bb 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -321,6 +321,7 @@ mlx5_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info)
 	info->rx_offload_capa = (mlx5_get_rx_port_offloads() |
 				 info->rx_queue_offload_capa);
 	info->tx_offload_capa = mlx5_get_tx_port_offloads(dev);
+	info->dev_capa = RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP;
 	info->if_index = mlx5_ifindex(dev);
 	info->reta_size = priv->reta_idx_n ?
 		priv->reta_idx_n : config->ind_table_max_size;
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 3d8dd974ce..9904bc5863 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -1594,6 +1594,58 @@ mlx5_flow_validate_action_queue(const struct rte_flow_action *action,
 	return 0;
 }
 
+/**
+ * Validate queue numbers for device RSS.
+ *
+ * @param[in] dev
+ *   Configured device.
+ * @param[in] queues
+ *   Array of queue numbers.
+ * @param[in] queues_n
+ *   Size of the @p queues array.
+ * @param[out] error
+ *   On error, filled with a textual error description.
+ * @param[out] queue_idx
+ *   On error, filled with an offending queue index in @p queues array.
+ *
+ * @return
+ *   0 on success, a negative errno code on error.
+ */
+static int
+mlx5_validate_rss_queues(const struct rte_eth_dev *dev,
+			 const uint16_t *queues, uint32_t queues_n,
+			 const char **error, uint32_t *queue_idx)
+{
+	const struct mlx5_priv *priv = dev->data->dev_private;
+	enum mlx5_rxq_type rxq_type = MLX5_RXQ_TYPE_UNDEFINED;
+	uint32_t i;
+
+	for (i = 0; i != queues_n; ++i) {
+		struct mlx5_rxq_ctrl *rxq_ctrl;
+
+		if (queues[i] >= priv->rxqs_n) {
+			*error = "queue index out of range";
+			*queue_idx = i;
+			return -EINVAL;
+		}
+		if (!(*priv->rxqs)[queues[i]]) {
+			*error = "queue is not configured";
+			*queue_idx = i;
+			return -EINVAL;
+		}
+		rxq_ctrl = container_of((*priv->rxqs)[queues[i]],
+					struct mlx5_rxq_ctrl, rxq);
+		if (i == 0)
+			rxq_type = rxq_ctrl->type;
+		if (rxq_type != rxq_ctrl->type) {
+			*error = "combining hairpin and regular RSS queues is not supported";
+			*queue_idx = i;
+			return -ENOTSUP;
+		}
+	}
+	return 0;
+}
+
 /*
  * Validate the rss action.
  *
@@ -1614,8 +1666,9 @@ mlx5_validate_action_rss(struct rte_eth_dev *dev,
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	const struct rte_flow_action_rss *rss = action->conf;
-	enum mlx5_rxq_type rxq_type = MLX5_RXQ_TYPE_UNDEFINED;
-	unsigned int i;
+	int ret;
+	const char *message;
+	uint32_t queue_idx;
 
 	if (rss->func != RTE_ETH_HASH_FUNCTION_DEFAULT &&
 	    rss->func != RTE_ETH_HASH_FUNCTION_TOEPLITZ)
@@ -1679,27 +1732,12 @@ mlx5_validate_action_rss(struct rte_eth_dev *dev,
 		return rte_flow_error_set(error, EINVAL,
 					  RTE_FLOW_ERROR_TYPE_ACTION_CONF,
 					  NULL, "No queues configured");
-	for (i = 0; i != rss->queue_num; ++i) {
-		struct mlx5_rxq_ctrl *rxq_ctrl;
-
-		if (rss->queue[i] >= priv->rxqs_n)
-			return rte_flow_error_set
-				(error, EINVAL,
-				 RTE_FLOW_ERROR_TYPE_ACTION_CONF,
-				 &rss->queue[i], "queue index out of range");
-		if (!(*priv->rxqs)[rss->queue[i]])
-			return rte_flow_error_set
-				(error, EINVAL, RTE_FLOW_ERROR_TYPE_ACTION_CONF,
-				 &rss->queue[i], "queue is not configured");
-		rxq_ctrl = container_of((*priv->rxqs)[rss->queue[i]],
-					struct mlx5_rxq_ctrl, rxq);
-		if (i == 0)
-			rxq_type = rxq_ctrl->type;
-		if (rxq_type != rxq_ctrl->type)
-			return rte_flow_error_set
-				(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ACTION_CONF,
-				 &rss->queue[i],
-				 "combining hairpin and regular RSS queues is not supported");
+	ret = mlx5_validate_rss_queues(dev, rss->queue, rss->queue_num,
+				       &message, &queue_idx);
+	if (ret != 0) {
+		return rte_flow_error_set(error, -ret,
+					  RTE_FLOW_ERROR_TYPE_ACTION_CONF,
+					  &rss->queue[queue_idx], message);
 	}
 	return 0;
 }
@@ -8786,6 +8824,116 @@ mlx5_action_handle_flush(struct rte_eth_dev *dev)
 	return ret;
 }
 
+/**
+ * Validate existing indirect actions against current device configuration
+ * and attach them to device resources.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_action_handle_attach(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_indexed_pool *ipool =
+			priv->sh->ipool[MLX5_IPOOL_RSS_SHARED_ACTIONS];
+	struct mlx5_shared_action_rss *shared_rss, *shared_rss_last;
+	int ret = 0;
+	uint32_t idx;
+
+	ILIST_FOREACH(ipool, priv->rss_shared_actions, idx, shared_rss, next) {
+		struct mlx5_ind_table_obj *ind_tbl = shared_rss->ind_tbl;
+		const char *message;
+		uint32_t queue_idx;
+
+		ret = mlx5_validate_rss_queues(dev, ind_tbl->queues,
+					       ind_tbl->queues_n,
+					       &message, &queue_idx);
+		if (ret != 0) {
+			DRV_LOG(ERR, "Port %u cannot use queue %u in RSS: %s",
+				dev->data->port_id, ind_tbl->queues[queue_idx],
+				message);
+			break;
+		}
+	}
+	if (ret != 0)
+		return ret;
+	ILIST_FOREACH(ipool, priv->rss_shared_actions, idx, shared_rss, next) {
+		struct mlx5_ind_table_obj *ind_tbl = shared_rss->ind_tbl;
+
+		ret = mlx5_ind_table_obj_attach(dev, ind_tbl);
+		if (ret != 0) {
+			DRV_LOG(ERR, "Port %u could not attach "
+				"indirection table obj %p",
+				dev->data->port_id, (void *)ind_tbl);
+			goto error;
+		}
+	}
+	return 0;
+error:
+	shared_rss_last = shared_rss;
+	ILIST_FOREACH(ipool, priv->rss_shared_actions, idx, shared_rss, next) {
+		struct mlx5_ind_table_obj *ind_tbl = shared_rss->ind_tbl;
+
+		if (shared_rss == shared_rss_last)
+			break;
+		if (mlx5_ind_table_obj_detach(dev, ind_tbl) != 0)
+			DRV_LOG(CRIT, "Port %u could not detach "
+				"indirection table obj %p on rollback",
+				dev->data->port_id, (void *)ind_tbl);
+	}
+	return ret;
+}
+
+/**
+ * Detach indirect actions of the device from its resources.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_action_handle_detach(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_indexed_pool *ipool =
+			priv->sh->ipool[MLX5_IPOOL_RSS_SHARED_ACTIONS];
+	struct mlx5_shared_action_rss *shared_rss, *shared_rss_last;
+	int ret = 0;
+	uint32_t idx;
+
+	ILIST_FOREACH(ipool, priv->rss_shared_actions, idx, shared_rss, next) {
+		struct mlx5_ind_table_obj *ind_tbl = shared_rss->ind_tbl;
+
+		ret = mlx5_ind_table_obj_detach(dev, ind_tbl);
+		if (ret != 0) {
+			DRV_LOG(ERR, "Port %u could not detach "
+				"indirection table obj %p",
+				dev->data->port_id, (void *)ind_tbl);
+			goto error;
+		}
+	}
+	return 0;
+error:
+	shared_rss_last = shared_rss;
+	ILIST_FOREACH(ipool, priv->rss_shared_actions, idx, shared_rss, next) {
+		struct mlx5_ind_table_obj *ind_tbl = shared_rss->ind_tbl;
+
+		if (shared_rss == shared_rss_last)
+			break;
+		if (mlx5_ind_table_obj_attach(dev, ind_tbl) != 0)
+			DRV_LOG(CRIT, "Port %u could not attach "
+				"indirection table obj %p on rollback",
+				dev->data->port_id, (void *)ind_tbl);
+	}
+	return ret;
+}
+
 #ifndef HAVE_MLX5DV_DR
 #define MLX5_DOMAIN_SYNC_FLOW ((1 << 0) | (1 << 1))
 #else
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 8b83fa6f67..8fbc37feb7 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1582,6 +1582,8 @@ void mlx5_flow_destroy_sub_policy_with_rxq(struct rte_eth_dev *dev,
 		struct mlx5_flow_meter_policy *mtr_policy);
 int mlx5_flow_dv_discover_counter_offset_support(struct rte_eth_dev *dev);
 int mlx5_flow_discover_dr_action_support(struct rte_eth_dev *dev);
+int mlx5_action_handle_attach(struct rte_eth_dev *dev);
+int mlx5_action_handle_detach(struct rte_eth_dev *dev);
 int mlx5_action_handle_flush(struct rte_eth_dev *dev);
 void mlx5_release_tunnel_hub(struct mlx5_dev_ctx_shared *sh, uint16_t port_id);
 int mlx5_alloc_tunnel_hub(struct mlx5_dev_ctx_shared *sh);
diff --git a/drivers/net/mlx5/mlx5_rx.h b/drivers/net/mlx5/mlx5_rx.h
index 4952fe1455..69b1263339 100644
--- a/drivers/net/mlx5/mlx5_rx.h
+++ b/drivers/net/mlx5/mlx5_rx.h
@@ -211,6 +211,10 @@ int mlx5_ind_table_obj_modify(struct rte_eth_dev *dev,
 			      struct mlx5_ind_table_obj *ind_tbl,
 			      uint16_t *queues, const uint32_t queues_n,
 			      bool standalone);
+int mlx5_ind_table_obj_attach(struct rte_eth_dev *dev,
+			      struct mlx5_ind_table_obj *ind_tbl);
+int mlx5_ind_table_obj_detach(struct rte_eth_dev *dev,
+			      struct mlx5_ind_table_obj *ind_tbl);
 struct mlx5_list_entry *mlx5_hrxq_create_cb(void *tool_ctx, void *cb_ctx);
 int mlx5_hrxq_match_cb(void *tool_ctx, struct mlx5_list_entry *entry,
 		       void *cb_ctx);
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 4f02fe02b9..9220bb2c15 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -2032,6 +2032,26 @@ mlx5_ind_table_obj_new(struct rte_eth_dev *dev, const uint16_t *queues,
 	return ind_tbl;
 }
 
+static int
+mlx5_ind_table_obj_check_standalone(struct rte_eth_dev *dev __rte_unused,
+				    struct mlx5_ind_table_obj *ind_tbl)
+{
+	uint32_t refcnt;
+
+	refcnt = __atomic_load_n(&ind_tbl->refcnt, __ATOMIC_RELAXED);
+	if (refcnt <= 1)
+		return 0;
+	/*
+	 * Modification of indirection tables having more than 1
+	 * reference is unsupported.
+	 */
+	DRV_LOG(DEBUG,
+		"Port %u cannot modify indirection table %p (refcnt %u > 1).",
+		dev->data->port_id, (void *)ind_tbl, refcnt);
+	rte_errno = EINVAL;
+	return -rte_errno;
+}
+
 /**
  * Modify an indirection table.
  *
@@ -2064,18 +2084,8 @@ mlx5_ind_table_obj_modify(struct rte_eth_dev *dev,
 
 	MLX5_ASSERT(standalone);
 	RTE_SET_USED(standalone);
-	if (__atomic_load_n(&ind_tbl->refcnt, __ATOMIC_RELAXED) > 1) {
-		/*
-		 * Modification of indirection ntables having more than 1
-		 * reference unsupported. Intended for standalone indirection
-		 * tables only.
-		 */
-		DRV_LOG(DEBUG,
-			"Port %u cannot modify indirection table (refcnt> 1).",
-			dev->data->port_id);
-		rte_errno = EINVAL;
+	if (mlx5_ind_table_obj_check_standalone(dev, ind_tbl) < 0)
 		return -rte_errno;
-	}
 	for (i = 0; i != queues_n; ++i) {
 		if (!mlx5_rxq_get(dev, queues[i])) {
 			ret = -rte_errno;
@@ -2101,6 +2111,73 @@ mlx5_ind_table_obj_modify(struct rte_eth_dev *dev,
 	return ret;
 }
 
+/**
+ * Attach an indirection table to its queues.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param ind_tbl
+ *   Indirection table to attach.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_ind_table_obj_attach(struct rte_eth_dev *dev,
+			  struct mlx5_ind_table_obj *ind_tbl)
+{
+	unsigned int i;
+	int ret;
+
+	ret = mlx5_ind_table_obj_modify(dev, ind_tbl, ind_tbl->queues,
+					ind_tbl->queues_n, true);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Port %u could not modify indirect table obj %p",
+			dev->data->port_id, (void *)ind_tbl);
+		return ret;
+	}
+	for (i = 0; i < ind_tbl->queues_n; i++)
+		mlx5_rxq_get(dev, ind_tbl->queues[i]);
+	return 0;
+}
+
+/**
+ * Detach an indirection table from its queues.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param ind_tbl
+ *   Indirection table to detach.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_ind_table_obj_detach(struct rte_eth_dev *dev,
+			  struct mlx5_ind_table_obj *ind_tbl)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	const unsigned int n = rte_is_power_of_2(ind_tbl->queues_n) ?
+			       log2above(ind_tbl->queues_n) :
+			       log2above(priv->config.ind_table_max_size);
+	unsigned int i;
+	int ret;
+
+	ret = mlx5_ind_table_obj_check_standalone(dev, ind_tbl);
+	if (ret != 0)
+		return ret;
+	MLX5_ASSERT(priv->obj_ops.ind_table_modify);
+	ret = priv->obj_ops.ind_table_modify(dev, n, NULL, 0, ind_tbl);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Port %u could not modify indirect table obj %p",
+			dev->data->port_id, (void *)ind_tbl);
+		return ret;
+	}
+	for (i = 0; i < ind_tbl->queues_n; i++)
+		mlx5_rxq_release(dev, ind_tbl->queues[i]);
+	return ret;
+}
+
 int
 mlx5_hrxq_match_cb(void *tool_ctx __rte_unused, struct mlx5_list_entry *entry,
 		   void *cb_ctx)
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index d916c8addc..ebeeae279e 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -14,6 +14,7 @@
 #include <mlx5_malloc.h>
 
 #include "mlx5.h"
+#include "mlx5_flow.h"
 #include "mlx5_rx.h"
 #include "mlx5_tx.h"
 #include "mlx5_utils.h"
@@ -1162,6 +1163,14 @@ mlx5_dev_start(struct rte_eth_dev *dev)
 	mlx5_rxq_timestamp_set(dev);
 	/* Set a mask and offset of scheduling on timestamp into Tx queues. */
 	mlx5_txq_dynf_timestamp_set(dev);
+	/* Attach indirection table objects detached on port stop. */
+	ret = mlx5_action_handle_attach(dev);
+	if (ret) {
+		DRV_LOG(ERR,
+			"port %u failed to attach indirect actions: %s",
+			dev->data->port_id, rte_strerror(rte_errno));
+		goto error;
+	}
 	/*
 	 * In non-cached mode, it only needs to start the default mreg copy
 	 * action and no flow created by application exists anymore.
@@ -1239,6 +1248,7 @@ mlx5_dev_stop(struct rte_eth_dev *dev)
 	/* All RX queue flags will be cleared in the flush interface. */
 	mlx5_flow_list_flush(dev, MLX5_FLOW_TYPE_GEN, true);
 	mlx5_flow_meter_rxq_flush(dev);
+	mlx5_action_handle_detach(dev);
 	mlx5_rx_intr_vec_disable(dev);
 	priv->sh->port[priv->dev_port - 1].ih_port_id = RTE_MAX_ETHPORTS;
 	priv->sh->port[priv->dev_port - 1].devx_ih_port_id = RTE_MAX_ETHPORTS;
-- 
2.25.1
* Re: [dpdk-dev] [PATCH v5 0/6] Flow entites behavior on port restart
  2021-11-02 14:23         ` [dpdk-dev] [PATCH v5 0/6] Flow entites behavior on port restart Ferruh Yigit
@ 2021-11-02 17:02           ` Dmitry Kozlyuk
  0 siblings, 0 replies; 96+ messages in thread
From: Dmitry Kozlyuk @ 2021-11-02 17:02 UTC (permalink / raw)
  To: Ferruh Yigit, dev
> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@intel.com>
[...]
> > v5:
> >       1. Fix rebase conflicts.
> 
> I am still getting conflicts. Did you rebase it on top of next-net?
Sorry, I was rebasing onto main instead of next-net.
Sent v6 rebased onto:
commit 87f4496c74e644d4466233315b2497e7b1acca6d (next-net/main)
Author: Lior Margalit <lmargalit@nvidia.com>
Date:   Mon Nov 1 08:38:41 2021 +0200
    net/mlx5: fix RSS expansion of ETH item with EtherType
* Re: [dpdk-dev] [PATCH v6 0/6] Flow entites behavior on port restart
  2021-11-02 17:01         ` [dpdk-dev] [PATCH v6 " Dmitry Kozlyuk
                             ` (5 preceding siblings ...)
  2021-11-02 17:01           ` [dpdk-dev] [PATCH v6 6/6] net/mlx5: preserve indirect actions on restart Dmitry Kozlyuk
@ 2021-11-02 18:02           ` Ferruh Yigit
  6 siblings, 0 replies; 96+ messages in thread
From: Ferruh Yigit @ 2021-11-02 18:02 UTC (permalink / raw)
  To: Dmitry Kozlyuk, dev
On 11/2/2021 5:01 PM, Dmitry Kozlyuk wrote:
> It is unspecified whether flow rules and indirect actions are kept
> when a port is stopped, possibly reconfigured, and started again.
> Vendors approach the topic differently, e.g. mlx5 and i40e PMD
> disagree in whether flow rules can be kept, and mlx5 PMD would keep
> indirect actions. In the end, applications are greatly affected
> by whatever contract there is and need to know it.
> 
> Applications may wish to restart the port to reconfigure it,
> e.g. switch offloads or even modify queues.
> Keeping rte_flow entities enables application improvements:
> 1. Keeping the rules across restart comes with the ability
>     to create rules before the device is started. This allows
>     all the rules to be created by the moment of start,
>     so that there is no time window when traffic is already arriving
>     but the rules are not yet created (restored).
> 2. When a rule or an indirect action has some associated state,
>     such as a counter, the application is spared the need to keep
>     additional state in order to cope with the information loss
>     that would occur if such an entity were destroyed.
> 
> It is proposed to advertise capabilities of keeping flow rules
> and indirect actions (as a special case of shared object)
> using a combination of ethdev info and rte_flow calls.
> Then a bug is fixed in mlx5 PMD that prevented indirect RSS action
> from being kept, and the driver starts advertising the new capability.
> 
> Prior discussions:
> 1) http://inbox.dpdk.org/dev/20210727073121.895620-1-dkozlyuk@nvidia.com/
> 2) http://inbox.dpdk.org/dev/20210901085516.3647814-1-dkozlyuk@nvidia.com/
> 
> v6:
>       Rebase on next-net commit 87f4496c74e6 and fix conflicts.
> v5:
>       1. Fix rebase conflicts.
>       2. Add warnings about experimental status (Andrew).
> v4:
>       1. Fix rebase conflicts (CI).
>       2. State rule behavior when a port is not started or stopped (Ori).
>       3. Improve wording on rule features, add examples (Andrew).
>       4. State that rules/actions that cannot be kept while others can be
>          must be destroyed by the application (Andrew/Ori).
>       5. Add rationale to the cover letter (Andrew).
> 
> 
> Dmitry Kozlyuk (6):
>    ethdev: add capability to keep flow rules on restart
>    ethdev: add capability to keep shared objects on restart
>    net: advertise no support for keeping flow rules
>    net/mlx5: discover max flow priority using DevX
>    net/mlx5: create drop queue using DevX
>    net/mlx5: preserve indirect actions on restart
> 
Series applied to dpdk-next-net/main, thanks.
Thread overview: 96+ messages
2021-10-05  0:52 [dpdk-dev] [PATCH 0/5] Flow entites behavior on port restart dkozlyuk
2021-10-05  0:52 ` [dpdk-dev] [PATCH 1/5] ethdev: add capability to keep flow rules on restart dkozlyuk
2021-10-06  6:15   ` Ori Kam
2021-10-06  6:55     ` Somnath Kotur
2021-10-06 17:15   ` Ajit Khaparde
2021-10-05  0:52 ` [dpdk-dev] [PATCH 2/5] ethdev: add capability to keep shared objects " dkozlyuk
2021-10-06  6:16   ` Ori Kam
2021-10-13  8:32   ` Dmitry Kozlyuk
2021-10-14 13:46     ` Ferruh Yigit
2021-10-14 21:45       ` Dmitry Kozlyuk
2021-10-14 21:48         ` Dmitry Kozlyuk
2021-10-15 11:46         ` Ferruh Yigit
2021-10-15 12:35           ` Dmitry Kozlyuk
2021-10-15 16:26             ` Ferruh Yigit
2021-10-16 20:32               ` Dmitry Kozlyuk
2021-10-18  8:42                 ` Ferruh Yigit
2021-10-18 11:13                   ` Dmitry Kozlyuk
2021-10-18 11:59                     ` Ferruh Yigit
2021-10-14 14:14     ` Dmitry Kozlyuk
2021-10-15  8:26       ` Andrew Rybchenko
2021-10-15  9:04         ` Dmitry Kozlyuk
2021-10-15  9:36           ` Andrew Rybchenko
2021-10-05  0:52 ` [dpdk-dev] [PATCH 3/5] net/mlx5: discover max flow priority using DevX dkozlyuk
2021-10-05  0:52 ` [dpdk-dev] [PATCH 4/5] net/mlx5: create drop queue " dkozlyuk
2021-10-05  0:52 ` [dpdk-dev] [PATCH 5/5] net/mlx5: preserve indirect actions on restart dkozlyuk
2021-10-15 16:18 ` [dpdk-dev] [PATCH v2 0/5] Flow entites behavior on port restart Dmitry Kozlyuk
2021-10-15 16:18   ` [dpdk-dev] [PATCH v2 1/5] ethdev: add capability to keep flow rules on restart Dmitry Kozlyuk
2021-10-18  8:56     ` Andrew Rybchenko
2021-10-19 12:38       ` Dmitry Kozlyuk
2021-10-18 13:06     ` Zhang, Qi Z
2021-10-18 22:51       ` Dmitry Kozlyuk
2021-10-19  1:00         ` Zhang, Qi Z
2021-10-15 16:18   ` [dpdk-dev] [PATCH v2 2/5] ethdev: add capability to keep shared objects " Dmitry Kozlyuk
2021-10-17  8:10     ` Ori Kam
2021-10-17  9:14       ` Dmitry Kozlyuk
2021-10-17  9:45         ` Ori Kam
2021-10-15 16:18   ` [dpdk-dev] [PATCH v2 3/5] net/mlx5: discover max flow priority using DevX Dmitry Kozlyuk
2021-10-15 16:18   ` [dpdk-dev] [PATCH v2 4/5] net/mlx5: create drop queue " Dmitry Kozlyuk
2021-10-15 16:18   ` [dpdk-dev] [PATCH v2 5/5] net/mlx5: preserve indirect actions on restart Dmitry Kozlyuk
2021-10-19 12:37   ` [dpdk-dev] [PATCH v3 0/6] Flow entites behavior on port restart Dmitry Kozlyuk
2021-10-19 12:37     ` [dpdk-dev] [PATCH v3 1/6] ethdev: add capability to keep flow rules on restart Dmitry Kozlyuk
2021-10-19 15:22       ` Ori Kam
2021-10-19 16:38       ` Ferruh Yigit
2021-10-19 17:13         ` Dmitry Kozlyuk
2021-10-20 10:39       ` Andrew Rybchenko
2021-10-20 11:40         ` Dmitry Kozlyuk
2021-10-20 13:40           ` Ori Kam
2021-10-19 12:37     ` [dpdk-dev] [PATCH v3 2/6] ethdev: add capability to keep shared objects " Dmitry Kozlyuk
2021-10-19 15:22       ` Ori Kam
2021-10-19 12:37     ` [dpdk-dev] [PATCH v3 3/6] net: advertise no support for keeping flow rules Dmitry Kozlyuk
2021-10-20 10:08       ` Andrew Rybchenko
2021-10-20 22:20         ` Dmitry Kozlyuk
2021-10-19 12:37     ` [dpdk-dev] [PATCH v3 4/6] net/mlx5: discover max flow priority using DevX Dmitry Kozlyuk
2021-10-19 12:37     ` [dpdk-dev] [PATCH v3 5/6] net/mlx5: create drop queue " Dmitry Kozlyuk
2021-10-19 12:37     ` [dpdk-dev] [PATCH v3 6/6] net/mlx5: preserve indirect actions on restart Dmitry Kozlyuk
2021-10-20 10:12     ` [dpdk-dev] [PATCH v3 0/6] Flow entites behavior on port restart Andrew Rybchenko
2021-10-20 13:21       ` Dmitry Kozlyuk
2021-10-21  6:34     ` [dpdk-dev] [PATCH v4 " Dmitry Kozlyuk
2021-10-21  6:34       ` [dpdk-dev] [PATCH v4 1/6] ethdev: add capability to keep flow rules on restart Dmitry Kozlyuk
2021-10-21  7:36         ` Ori Kam
2021-10-28 18:33         ` Ajit Khaparde
2021-11-01 15:02         ` Andrew Rybchenko
2021-11-01 15:56           ` Dmitry Kozlyuk
2021-10-21  6:34       ` [dpdk-dev] [PATCH v4 2/6] ethdev: add capability to keep shared objects " Dmitry Kozlyuk
2021-10-21  7:37         ` Ori Kam
2021-10-21 18:28         ` Ajit Khaparde
2021-11-01 15:04         ` Andrew Rybchenko
2021-10-21  6:35       ` [dpdk-dev] [PATCH v4 3/6] net: advertise no support for keeping flow rules Dmitry Kozlyuk
2021-10-21 18:26         ` Ajit Khaparde
2021-10-22  1:38           ` Somnath Kotur
2021-10-27  7:11         ` Hyong Youb Kim (hyonkim)
2021-11-01 15:06         ` Andrew Rybchenko
2021-11-01 16:59           ` Ferruh Yigit
2021-10-21  6:35       ` [dpdk-dev] [PATCH v4 4/6] net/mlx5: discover max flow priority using DevX Dmitry Kozlyuk
2021-10-21  6:35       ` [dpdk-dev] [PATCH v4 5/6] net/mlx5: create drop queue " Dmitry Kozlyuk
2021-10-21  6:35       ` [dpdk-dev] [PATCH v4 6/6] net/mlx5: preserve indirect actions on restart Dmitry Kozlyuk
2021-10-26 11:46       ` [dpdk-dev] [PATCH v4 0/6] Flow entites behavior on port restart Ferruh Yigit
2021-11-01 13:43         ` Ferruh Yigit
2021-11-02 13:49       ` Ferruh Yigit
2021-11-02 13:54       ` [dpdk-dev] [PATCH v5 " Dmitry Kozlyuk
2021-11-02 13:54         ` [dpdk-dev] [PATCH v5 1/6] ethdev: add capability to keep flow rules on restart Dmitry Kozlyuk
2021-11-02 13:54         ` [dpdk-dev] [PATCH v5 2/6] ethdev: add capability to keep shared objects " Dmitry Kozlyuk
2021-11-02 13:54         ` [dpdk-dev] [PATCH v5 3/6] net: advertise no support for keeping flow rules Dmitry Kozlyuk
2021-11-02 13:54         ` [dpdk-dev] [PATCH v5 4/6] net/mlx5: discover max flow priority using DevX Dmitry Kozlyuk
2021-11-02 13:54         ` [dpdk-dev] [PATCH v5 5/6] net/mlx5: create drop queue " Dmitry Kozlyuk
2021-11-02 13:54         ` [dpdk-dev] [PATCH v5 6/6] net/mlx5: preserve indirect actions on restart Dmitry Kozlyuk
2021-11-02 14:23         ` [dpdk-dev] [PATCH v5 0/6] Flow entites behavior on port restart Ferruh Yigit
2021-11-02 17:02           ` Dmitry Kozlyuk
2021-11-02 17:01         ` [dpdk-dev] [PATCH v6 " Dmitry Kozlyuk
2021-11-02 17:01           ` [dpdk-dev] [PATCH v6 1/6] ethdev: add capability to keep flow rules on restart Dmitry Kozlyuk
2021-11-02 17:01           ` [dpdk-dev] [PATCH v6 2/6] ethdev: add capability to keep shared objects " Dmitry Kozlyuk
2021-11-02 17:01           ` [dpdk-dev] [PATCH v6 3/6] net: advertise no support for keeping flow rules Dmitry Kozlyuk
2021-11-02 17:01           ` [dpdk-dev] [PATCH v6 4/6] net/mlx5: discover max flow priority using DevX Dmitry Kozlyuk
2021-11-02 17:01           ` [dpdk-dev] [PATCH v6 5/6] net/mlx5: create drop queue " Dmitry Kozlyuk
2021-11-02 17:01           ` [dpdk-dev] [PATCH v6 6/6] net/mlx5: preserve indirect actions on restart Dmitry Kozlyuk
2021-11-02 18:02           ` [dpdk-dev] [PATCH v6 0/6] Flow entites behavior on port restart Ferruh Yigit